Using the Pipeline
==================

``riptide`` comes with a pipeline application, ``rffa``, that can search an observation across a range of dispersion measure (DM) trials, using many CPUs in parallel. ``rffa`` automatically performs the sequence of steps described in :ref:`Quickstart Guide` for all input DM trials, groups the detected peaks into sensible clusters, and then produces a candidate file and plot for each cluster thus found.

Here we cover how to search an observation for sources with an unknown DM. ``riptide`` does not yet have a dedispersion engine, so you must dedisperse your multi-channel observation data using another software package. Here we will use PRESTO_, but the general process is similar regardless of the dedispersion engine used. **If you are not familiar with PRESTO, please first have a look at** the `PRESTO tutorial`_ and documentation.

.. _PRESTO: https://github.com/scottransom/presto
.. _`PRESTO tutorial`: https://www.cv.nrao.edu/~sransom/PRESTO_search_tutorial.pdf


RFI Mitigation
--------------

Before actually dedispersing, a sensible starting point is to produce a radio-frequency interference (RFI) mask using ``rfifind``. It will scan the input data for bad frequency channels and time intervals, and save them to a so-called mask file. This file can then be read by the dedispersion utility ``prepsubband``, which can replace the bad data sections with a sensible value. This will remove a significant number of spurious candidates from the search output down the line.


Choosing the optimal DM step
----------------------------

The next stage is to calculate how closely the consecutive DM trials should be spaced, which in PRESTO is done using ``DDplan.py``. The ideal DM step size is a function of:

* The observing band parameters: centre frequency, bandwidth, number of channels
* The sampling time of the data
* **The minimum pulse width being searched for**. When running ``DDplan.py`` this is the resolution (``-r``) command-line argument, expressed in milliseconds: it should be the minimum period you plan to search (``period_min``), divided by the minimum number of phase bins (``bins_min``) you plan to use.

**Example:** We want to search a Parkes multibeam receiver observation, with a centre frequency of 1382 MHz, a bandwidth of 400 MHz, 1024 frequency channels, and a sampling interval of 64 microseconds. We will instruct ``prepsubband`` to use 64 sub-bands for dedispersion (``-s 64`` below), and want to cover DMs from 0 to 1,000. We will then use ``riptide`` to search for periods down to 100 ms using at least 200 phase bins, which amounts to a minimum pulse width of 0.5 ms (``-r 0.5``). The call to ``DDplan.py`` is thus:

.. code-block:: console

    $ DDplan.py --loDM 0.0 --hiDM 1000.0 -f 1382.0 -b 400.0 -n 1024 -s 64 -t 0.000064 -r 0.5

    Minimum total smearing     : 0.0907 ms
    --------------------------------------------
    Minimum channel smearing   : 0 ms
    Minimum smearing across BW : 0.00629 ms
    Minimum sample time        : 0.064 ms

    Setting the new 'best' resolution to : 0.5 ms
       Note: ok_smearing > dt (i.e. data is higher resolution than needed)
             New dt is 4 x 0.064 ms = 0.256 ms
    Best guess for optimal initial dDM is 0.407

      Low DM    High DM     dDM  DownSamp  dsubDM   #DMs  DMs/call  calls  WorkFract
       0.000    585.000    0.30         4   15.00   1950        50     39     0.8211
     585.000   1010.000    0.50         8   25.00    850        50     17     0.1789


Dedispersing the data
---------------------

The table above returned by ``DDplan.py`` defines the sequence of calls to make to ``prepsubband``. In the most recent versions of PRESTO, ``DDplan.py`` writes a Python script that directly makes the right sequence of calls to ``prepsubband``. Otherwise, the sequence has to be generated by other means (a custom script), or the calls have to be made manually (**not** recommended). For example, covering the DM range 0 to 585 requires 39 consecutive calls to ``prepsubband``, each producing 50 DM trials spaced by a step of 0.30.

.. code-block:: console

    $ prepsubband -lodm 0.0 -dmstep 0.3 -numdms 50 -nsub 64 -downsamp 4 observation.fil
    $ prepsubband -lodm 15.0 -dmstep 0.3 -numdms 50 -nsub 64 -downsamp 4 observation.fil
    [...]
    $ prepsubband -lodm 570.0 -dmstep 0.3 -numdms 50 -nsub 64 -downsamp 4 observation.fil

See the `PRESTO tutorial`_ for more details. Once all calls to ``prepsubband`` have been made, we can search the resulting set of DM trials, which will consist of pairs of ``.inf`` (header) and ``.dat`` (binary data) files.


Configuring the riptide pipeline
--------------------------------

The ``rffa`` application is highly flexible and takes a YAML configuration file as an input. A `model configuration file`_, with detailed comments, can be found in the repository. This should be your starting point. Most parameters are mandatory. If the configuration file is malformed, the ``rffa`` application will raise an exception with a helpful error message.

.. _`model configuration file`: https://github.com/v-morello/riptide/blob/master/riptide/pipeline/config/example.yaml


Number of parallel processes
^^^^^^^^^^^^^^^^^^^^^^^^^^^^

The first parameter is the number of parallel processes to use for the search; each process goes through one DM trial at a time. This should be the number of cores available for the search; if you are running the code on a SLURM supercomputing facility, this should be equal to ``cpus-per-task``.

.. code-block:: YAML

    processes: 8


Data format and band parameters
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

Since version ``0.2.0``, ``riptide`` reads the observing band parameters directly from the input ``.inf`` files when using PRESTO for dedispersion. However, when using SIGPROC's dedispersion routine, the DM trial files do *not* contain that information, and it must be specified in the config file. These parameters are important at various stages of the search process.

.. code-block:: YAML

    # Input format, either 'presto' or 'sigproc'
    format: presto

    ### Observing band parameters: leave blank except for SIGPROC input data
    # Minimum observing frequency in MHz
    fmin:
    # Maximum observing frequency in MHz
    fmax:
    # Number of channels in the data
    nchans:


DM trial selection
^^^^^^^^^^^^^^^^^^

Although the pipeline can be passed a specific list of DM trial files to search, a more practical option is to pass all DM trial files and use the options below to select only a certain DM range.

.. code-block:: YAML

    dmselect:
      # Minimum DM trial in pc cm^{-3}
      # If left blank, start at the minimum available trial DM
      min: 0.0

      # Maximum DM trial in pc cm^{-3}
      # This is a hard limit, regardless of sky coordinates (see below)
      # If left blank, stop at the maximum available trial DM
      max: 1000.0

      # Maximum value of Trial_DM x |sin b| where b is the Galactic latitude of the observation.
      # This is a simple method to limit the maximum trial DM as a function of Galactic coordinates
      # Almost no Galactic pulsars are known to have DM x |sin b| > 40
      # If left blank, no latitude-dependent cap on the maximum trial DM is applied
      dmsinb_max: 45.0


Red noise subtraction
^^^^^^^^^^^^^^^^^^^^^

This section mirrors the parameters passed to the dereddening function. See :meth:`riptide.TimeSeries.deredden`.

.. code-block:: YAML

    dereddening:
      # Width of the running median window in seconds used by the median subtraction
      # routine before searching the input time series
      rmed_width: 5.0
      # 'minpts' parameter passed to the ffa_search() function
      rmed_minpts: 101


Defining the search space
^^^^^^^^^^^^^^^^^^^^^^^^^

This section defines a list of search ranges, each with a minimum and maximum trial period, and a duty cycle resolution specified via a minimum and maximum number of phase bins. Here the idea is to use more phase bins for longer search periods. Each range in the list has three sections:

* ``ffa_search``: The list of parameters passed to the :func:`riptide.ffa_search` function. Any unspecified parameters will be set to the default values in the function definition.
* ``find_peaks``: The list of parameters passed to the :func:`riptide.find_peaks` function. Unspecified parameters are also set to their default values.
* ``candidates``: The number of phase bins and sub-integrations in the candidate files produced when searching this period range.

The ``name`` attribute is only for logging purposes and can be set to anything.

.. code-block:: YAML

    ranges:
      - name: 'short'
        ffa_search:
          period_min: 0.20
          period_max: 1.00
          bins_min: 240
          bins_max: 260
        find_peaks:
          smin: 6.0
        candidates:
          bins: 256
          subints: 32

      - name: 'medium'
        ffa_search:
          period_min: 1.00
          period_max: 5.00
          bins_min: 480
          bins_max: 520
        find_peaks:
          smin: 6.0
        candidates:
          bins: 512
          subints: 32

      - name: 'long'
        ffa_search:
          period_min: 5.00
          period_max: 180.00
          bins_min: 960
          bins_max: 1040
        find_peaks:
          smin: 6.0
        candidates:
          bins: 1024
          subints: 32


Peak clustering and harmonic flagging
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

These parameters control how the many periodogram peaks found during the search across all DM trials are clustered into candidates, and how the candidates deemed to be a harmonic of another are removed. They should be left to their default values unless there is a good reason to change them. The default parameters for harmonic flagging are conservative; they should very rarely flag a real pulsar as a harmonic of a brighter RFI instance.

.. code-block:: YAML

    # Parameters of the peak clustering that is performed once all DM trials have
    # been searched
    clustering:
      # Clustering radius in units of 1 / Tobs
      # Two peaks whose frequencies are within (radius / Tobs) Hz of each other
      # are considered part of the same cluster
      radius: 0.2

    # Harmonic flagging parameters
    # See the docstring of the htest() function in harmonic_testing.py for details
    # NOTE: this is only a flagging operation, the actual *removal* of candidates
    # flagged as harmonics is entirely optional, see below
    harmonic_flagging:
      denom_max: 100
      phase_distance_max: 1.0
      dm_distance_max: 3.0
      snr_distance_max: 3.0


Candidate filters and plotting
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

Right before producing candidate files and/or plots, a list of manual filters can be applied. Candidate plots can also be generated automatically.

.. code-block:: YAML

    # Filters applied to the final list of clusters, *just before* the associated
    # candidate files are produced.
    # The cap on candidate number is applied last, after all unworthy candidates have been removed
    # Any of these fields can be left empty, in which case the corresponding filter is NOT applied
    candidate_filters:
      dm_min:
      snr_min: 7.0
      remove_harmonics: True
      max_number:

    # If True, save a PNG plot for every candidate
    # Candidate files can always be loaded and plotted later
    plot_candidates: True


Running the Pipeline
---------------------

Once the pipeline configuration file is ready, the pipeline application ``rffa`` takes two mandatory arguments: the config file via the ``-c`` option and a list of all the DM trial files to search. For example:

.. code-block:: console

    $ rffa -c myConfig.yml dedispersed_data/*.inf

There are additional options, e.g. to set a specific output directory or save a log file. See ``rffa --help``.

.. NOTE::
    ``rffa`` runs its own internal dedispersion plan to "thin out" the list of DM trials and select the minimum number necessary to cover the DM range. The actual DM step it chooses is a function of the minimum pulse width being searched (as specified in the YAML config file). This is a design choice: it allows ``rffa`` to be run alongside a standard FFT-based search code and ingest the same set of dedispersed time series files. Indeed, the DM step required for millisecond pulsar searches is much smaller than for ordinary pulsars.


Data products
-------------

Once the pipeline finishes, the following data products will be written in the specified output directory:

* A CSV table of all detected periodogram peaks across all DM trials
* A CSV table of clusters, obtained by grouping together peaks with frequencies close to each other
* A CSV table of candidates, which will have the same entries as the clusters table, unless you have enabled harmonic filtering in the config file. In this case any cluster that was flagged as a harmonic of another is removed from the final candidate list.
* One JSON file per :class:`riptide.Candidate` object, which can be loaded using :func:`riptide.load_json` and plotted / manipulated. These contain header information, a table of peaks associated with the candidate, and the sub-integrations obtained by folding the DM trial at which the candidate was detected with the highest S/N.
* One PNG plot per candidate, if the associated option was enabled in the configuration file.
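
As a supplement to the dedispersion step described earlier: when your version of ``DDplan.py`` does not write a dedispersion script, the ``prepsubband`` calls for one row of its output table can be generated with a few lines of Python. The sketch below is only an illustration. The function name ``prepsubband_calls`` and its arguments are hypothetical, it uses only the ``prepsubband`` flags shown in the example above, and it prints the command lines rather than executing them.

.. code-block:: python

    def prepsubband_calls(lodm, ddm, dms_per_call, ncalls, downsamp, nsub, fname):
        """Build the prepsubband command lines for one row of the DDplan.py table.

        Hypothetical helper: the arguments map onto the 'Low DM', 'dDM',
        'DMs/call', 'calls' and 'DownSamp' columns of the table.
        """
        # DM range covered by a single call (the 'dsubDM' column)
        dsubdm = ddm * dms_per_call
        commands = []
        for i in range(ncalls):
            # Round to avoid floating-point noise in the printed DM value
            start = round(lodm + i * dsubdm, 6)
            commands.append(
                f"prepsubband -lodm {start} -dmstep {ddm} -numdms {dms_per_call} "
                f"-nsub {nsub} -downsamp {downsamp} {fname}"
            )
        return commands


    # First row of the example plan: DMs from 0 to 585 in steps of 0.30
    calls = prepsubband_calls(0.0, 0.3, 50, 39, 4, 64, "observation.fil")
    for cmd in calls:
        print(cmd)

The second row of the table would be handled by a second call to the function with its own DM step and downsampling factor. Any extra options you need (for example an RFI mask or an output base name) would have to be appended to the command string yourself.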