Using the Pipeline

riptide comes with a pipeline application, rffa, that can search an observation across a range of dispersion measure (DM) trials, using many CPUs in parallel. rffa automatically performs the sequence of steps described in the Quickstart Guide for all input DM trials, groups the detected peaks into sensible clusters, and then produces a candidate file and plot for each cluster found. Here we cover how to search an observation for sources with an unknown DM.

riptide does not yet have a dedispersion engine, so you must dedisperse your multi-channel observation data using another software package. Here we will use PRESTO, but the general process is similar regardless of the dedispersion engine used. If you are not familiar with PRESTO, please first have a look at the PRESTO tutorial and documentation.

RFI Mitigation

Before actually dedispersing, a sensible starting point is to produce a radio-frequency interference (RFI) mask using rfifind. It will scan the input data for bad frequency channels and time intervals, and save them to a so-called mask file. This file can then be read by the dedispersion utility prepsubband, which can replace the bad data sections with a sensible value. This will remove a significant number of spurious candidates from the search output down the line.

Choosing the optimal DM step

The next stage is to calculate how closely the consecutive DM trials should be spaced, which in PRESTO is done using DDplan.py. The ideal DM step size is a function of:

The observing band parameters: centre frequency, bandwidth, number of channels
The sampling time of the data
The minimum pulse width being searched for. When running DDplan.py this is the resolution (-r) command-line argument: it should be the minimum period you plan to search (period_min), divided by the minimum number of phase bins (bins_min) you plan to use.

Example: We want to search a Parkes multibeam receiver observation, with a centre frequency of 1382 MHz, a bandwidth of 400 MHz, 1024 frequency channels, and a sampling interval of 64 microseconds. We will instruct prepsubband to use 64 sub-bands for dedispersion (-s 64 below), and want to cover DMs from 0 to 1,000. We will then use riptide to search for periods down to 100 ms using at least 200 phase bins, which amounts to a minimum pulse width of 0.5 ms (-r 0.5). The call to DDplan.py is thus:

$ DDplan.py --loDM 0.0 --hiDM 1000.0 -f 1382.0 -b 400.0 -n 1024 -s 64 -t 0.000064 -r 0.5

Minimum total smearing     : 0.0907 ms
--------------------------------------------
Minimum channel smearing   : 0 ms
Minimum smearing across BW : 0.00629 ms
Minimum sample time        : 0.064 ms

Setting the new 'best' resolution to : 0.5 ms
   Note: ok_smearing > dt (i.e. data is higher resolution than needed)
         New dt is 4 x 0.064 ms = 0.256 ms
Best guess for optimal initial dDM is 0.407

Low DM    High DM     dDM  DownSamp  dsubDM   #DMs  DMs/call  calls  WorkFract
  0.000    585.000    0.30       4   15.00    1950      50      39    0.8211
585.000   1010.000    0.50       8   25.00     850      50      17    0.1789

Dedispersing the data

The table above returned by DDplan.py defines the sequence of calls to make to prepsubband. In the most recent versions of PRESTO, DDplan.py writes a Python script that directly makes the right sequence of calls to prepsubband; otherwise the calls have to be generated by other means (e.g. a custom script) or made manually (not recommended).

For example, covering the DM range 0 to 585 requires 39 consecutive calls to prepsubband, each producing 50 DM trials spaced by a step of 0.30.

$ prepsubband -lodm 0.0 -dmstep 0.3 -numdms 50 -nsub 64 -downsamp 4 observation.fil
$ prepsubband -lodm 15.0 -dmstep 0.3 -numdms 50 -nsub 64 -downsamp 4 observation.fil
[...]
$ prepsubband -lodm 570.0 -dmstep 0.3 -numdms 50 -nsub 64 -downsamp 4 observation.fil

See the PRESTO tutorial for more details. Once all calls to prepsubband have been made, we can search the resulting set of DM trials, which will consist of pairs of .inf (header) and .dat (binary data) files.
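If your version of PRESTO does not write the dedispersion script for you, the call sequence for one row of the DDplan.py table can be generated with a few lines of Python. The sketch below is only an illustration: the function name prepsubband_calls and its arguments are hypothetical, and it uses only the prepsubband options already shown above. Each call starts dsubDM above the previous one.

```python
def prepsubband_calls(lo_dm, ddm, dsub_dm, dms_per_call, ncalls,
                      nsub, downsamp, filename):
    """Build the prepsubband command lines for one row of the DDplan.py table.

    Hypothetical helper: returns a list of argument lists, one per call,
    suitable for printing or for passing to subprocess.run().
    """
    calls = []
    for i in range(ncalls):
        # Each call covers dsub_dm units of DM, so the i-th call starts here
        call_lodm = round(lo_dm + i * dsub_dm, 6)
        calls.append([
            "prepsubband",
            "-lodm", str(call_lodm),
            "-dmstep", str(ddm),
            "-numdms", str(dms_per_call),
            "-nsub", str(nsub),
            "-downsamp", str(downsamp),
            filename,
        ])
    return calls


# First row of the table: DM 0 to 585 in 39 calls of 50 trials each
calls = prepsubband_calls(0.0, 0.3, 15.0, 50, 39, 64, 4, "observation.fil")
for argv in calls:
    print(" ".join(argv))
```

The second row of the table (DM 585 to 1010) would be handled by a second invocation with its own step, downsampling factor and number of calls.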

Configuring the riptide pipeline

The rffa application is highly flexible and takes a YAML configuration file as an input. A model configuration file, with detailed comments, can be found in the repository. This should be your starting point. Most parameters are mandatory. If the configuration file is malformed, the rffa application will raise an exception with a helpful error message.

Number of parallel processes

The first parameter is the number of parallel processes to use for the search; each process goes through one DM trial at a time. This should be the number of cores available for the search; if you are running the code on a SLURM supercomputing facility, this should be equal to cpus-per-task.

processes: 8
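One way to pick this value programmatically is sketched below. It assumes that, on a SLURM system, the job was submitted with an explicit cpus-per-task so that the SLURM_CPUS_PER_TASK environment variable is set; the function name suggested_processes is hypothetical and not part of riptide.

```python
import os


def suggested_processes():
    """Suggest a value for the 'processes' config parameter.

    Hypothetical helper: prefers the SLURM allocation when available,
    otherwise falls back to the number of CPUs reported by the OS.
    """
    slurm_cpus = os.environ.get("SLURM_CPUS_PER_TASK")
    if slurm_cpus is not None:
        return int(slurm_cpus)
    # os.cpu_count() can return None on some platforms
    return os.cpu_count() or 1


print(f"processes: {suggested_processes()}")
```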

Data format and band parameters

Since version 0.2.0, riptide reads the observing band parameters directly from the input .inf files when using PRESTO for dedispersion. However, when using SIGPROC’s dedispersion routine, the DM trial files do not contain that information, and it must be specified in the config file. These parameters are important at various stages of the search process.

# Input format, either 'presto' or 'sigproc'
format: presto

### Observing band parameters: leave blank except for SIGPROC input data
# Minimum observing frequency in MHz
fmin:

# Maximum observing frequency in MHz
fmax:

# Number of channels in the data
nchans:

DM trial selection

Although the pipeline can be passed a specific list of DM trial files to search, a more practical option is to pass all DM trial files and use the options below to select only a certain DM range.

dmselect:
   # Minimum DM trial in pc cm^{-3}
   # If left blank, start at the minimum available trial DM
   min: 0.0

   # Maximum DM trial in pc cm^{-3}
   # This is a hard limit, regardless of sky coordinates (see below)
   # If left blank, stop at the maximum available trial DM
   max: 1000.0

   # Maximum value of Trial_DM x |sin b| where b is the Galactic latitude of the observation.
   # This is a simple method to limit the maximum trial DM as a function of Galactic coordinates
   # Almost no Galactic pulsars are known to have DM x |sin b| > 40
   # If left blank, no latitude-dependent cap on the maximum trial DM is applied
   dmsinb_max: 45.0
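To see how the two upper limits interact, the sketch below computes the largest trial DM that would be searched at a given Galactic latitude. This is one reading of the comments above, not riptide's actual code: the function name effective_dm_max is hypothetical, and it assumes the latitude-dependent cap is simply dmsinb_max / |sin b|, with the hard limit max always taking precedence.

```python
import math


def effective_dm_max(glat_deg, dm_max=None, dmsinb_max=None):
    """Largest trial DM searched at Galactic latitude glat_deg (degrees).

    Hypothetical helper illustrating the 'max' and 'dmsinb_max' options;
    a value of None means the corresponding option was left blank.
    """
    caps = []
    if dm_max is not None:
        caps.append(dm_max)
    if dmsinb_max is not None:
        sinb = abs(math.sin(math.radians(glat_deg)))
        # In the Galactic plane (b = 0) the latitude cap imposes no limit
        if sinb > 0:
            caps.append(dmsinb_max / sinb)
    return min(caps) if caps else math.inf


for b in (0.0, 2.0, 30.0, 90.0):
    print(b, round(effective_dm_max(b, dm_max=1000.0, dmsinb_max=45.0), 1))
```

With the values used above, the latitude cap only becomes the limiting factor a few degrees away from the Galactic plane; closer to the plane the hard limit of 1000 applies.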

Red noise subtraction

This section mirrors the parameters passed to the dereddening function. See riptide.TimeSeries.deredden().

dereddening:
   # Width of the running median window in seconds used by the median subtraction
   # routine before searching the input time series
   rmed_width: 5.0

   # 'minpts' parameter passed to the ffa_search() function
   rmed_minpts: 101
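For intuition, the idea behind running median subtraction can be sketched as follows. This is a deliberately naive illustration and not riptide's implementation: the function name subtract_running_median is hypothetical, the window is simply truncated at the edges of the data, and the rmed_minpts parameter is not modelled at all.

```python
from statistics import median


def subtract_running_median(data, tsamp, width):
    """Subtract a running median of the given width (seconds) from data.

    Naive O(n * w) sketch: 'tsamp' is the sampling time in seconds, and the
    window is truncated where it would extend past either end of the data.
    """
    half = int(round(width / tsamp)) // 2
    n = len(data)
    out = []
    for i in range(n):
        lo = max(0, i - half)
        hi = min(n, i + half + 1)
        out.append(data[i] - median(data[lo:hi]))
    return out


# A slow ramp (red noise stand-in) with one narrow pulse at sample 10
series = [float(i) for i in range(21)]
series[10] += 100.0
cleaned = subtract_running_median(series, tsamp=1.0, width=5.0)
print(cleaned[5], cleaned[10])
```

The slow trend is removed while the narrow pulse survives, which is why the window should be much wider than the widest pulse being searched for.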

Defining the search space

This section defines a list of search ranges, each with a minimum and maximum trial period, and a duty cycle resolution specified via a minimum and maximum number of phase bins. Here the idea is to use more phase bins for longer search periods. Each range in the list has three sections:

ffa_search: The list of parameters passed to the riptide.ffa_search() function. Any unspecified parameters will be set to the default values in the function definition.
find_peaks: The list of parameters passed to the riptide.find_peaks() function. Unspecified parameters are also set to their default values.
candidates: The number of phase bins and sub-integrations in the candidate files produced when searching this period range.

The name attribute is only for logging purposes and can be set to anything.

ranges:
   - name: 'short'
     ffa_search:
         period_min: 0.20
         period_max: 1.00
         bins_min: 240
         bins_max: 260

     find_peaks:
         smin: 6.0

     candidates:
         bins: 256
         subints: 32

   - name: 'medium'
     ffa_search:
         period_min: 1.00
         period_max: 5.00
         bins_min: 480
         bins_max: 520

     find_peaks:
         smin: 6.0

     candidates:
         bins: 512
         subints: 32

   - name: 'long'
     ffa_search:
         period_min: 5.00
         period_max: 180.00
         bins_min: 960
         bins_max: 1040

     find_peaks:
         smin: 6.0

     candidates:
         bins: 1024
         subints: 32
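When editing the ranges, it can help to check what time resolution each one implies, since the smallest value of period_min / bins_min is the minimum pulse width that the dedispersion plan (the -r argument of DDplan.py) should be matched to. The sketch below does this for the three ranges above; the function name range_summary and the gap check are illustrative additions, not something rffa itself is known to do.

```python
# The ffa_search parameters of the three ranges above, copied by hand
ranges = [
    {"name": "short", "period_min": 0.20, "period_max": 1.00, "bins_min": 240},
    {"name": "medium", "period_min": 1.00, "period_max": 5.00, "bins_min": 480},
    {"name": "long", "period_min": 5.00, "period_max": 180.00, "bins_min": 960},
]


def range_summary(ranges):
    """Return {name: minimum pulse width in ms} and reject period gaps.

    Hypothetical helper: a gap means some trial periods would never be
    searched, which is usually unintended.
    """
    for prev, cur in zip(ranges, ranges[1:]):
        if cur["period_min"] > prev["period_max"]:
            raise ValueError(
                f"gap between ranges {prev['name']!r} and {cur['name']!r}"
            )
    return {
        r["name"]: 1e3 * r["period_min"] / r["bins_min"] for r in ranges
    }


widths = range_summary(ranges)
for name, width_ms in widths.items():
    print(f"{name}: {width_ms:.3f} ms")
print(f"narrowest: {min(widths.values()):.3f} ms")
```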

Peak clustering and harmonic flagging

These parameters control how the many periodogram peaks found during the search across all DM trials are clustered into candidates, and how candidates deemed to be harmonics of another are flagged (and optionally removed). They should be left at their default values unless there is a good reason to change them. The default parameters for harmonic flagging are conservative: they should very rarely flag a real pulsar as a harmonic of a brighter RFI instance.

# Parameters of the peak clustering that is performed once all DM trials have
# been searched
clustering:
   # Clustering radius in units of 1 / Tobs
   # Two peaks whose frequencies are within (radius / Tobs) Hz of each other
   # are considered part of the same cluster
   radius: 0.2


# Harmonic flagging parameters
# See the docstring of the htest() function in harmonic_testing.py for details
# NOTE: this is only a flagging operation, the actual *removal* of candidates
# flagged as harmonics is entirely optional, see below
harmonic_flagging:
   denom_max: 100
   phase_distance_max: 1.0
   dm_distance_max: 3.0
   snr_distance_max: 3.0
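The clustering rule described in the comments above can be illustrated with a simple one-dimensional friends-of-friends grouping. This is a sketch of the general idea only; the function name cluster_by_frequency is hypothetical and riptide's actual clustering code may differ in its details.

```python
def cluster_by_frequency(freqs, tobs, radius=0.2):
    """Group peak frequencies (Hz) into clusters.

    Two peaks closer than radius / tobs Hz end up in the same cluster, and
    clusters grow by chaining such neighbours. Returns a list of clusters,
    each a list of indices into 'freqs'.
    """
    max_gap = radius / tobs
    order = sorted(range(len(freqs)), key=lambda i: freqs[i])
    clusters = []
    for i in order:
        # Compare with the last (highest-frequency) member of the last cluster
        if clusters and freqs[i] - freqs[clusters[-1][-1]] <= max_gap:
            clusters[-1].append(i)
        else:
            clusters.append([i])
    return clusters


# Tobs = 1000 s and radius = 0.2 give a linking length of 2e-4 Hz
peaks = [1.0000, 1.0001, 1.00025, 2.0, 2.0003]
print(cluster_by_frequency(peaks, tobs=1000.0))
```

Expressing the radius in units of 1 / Tobs makes the setting independent of the observation length, since the frequency resolution of a search scales in the same way.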

Candidate filters and plotting

Right before producing candidate files and/or plots, a list of manual filters can be applied. Candidate plots can also be generated automatically.

# Filters applied to the final list of clusters, *just before* the associated
# candidate files are produced.
# The cap on candidate number is applied last, after all unworthy candidates have been removed
# Any of these fields can be left empty, in which case the corresponding filter is NOT applied
candidate_filters:
   dm_min:
   snr_min: 7.0
   remove_harmonics: True
   max_number:


# If True, save a PNG plot for every candidate
# Candidate files can always be loaded and plotted later
plot_candidates: True
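The order in which the filters act can be sketched as below. The dictionary keys (dm, snr, is_harmonic) and the function name apply_candidate_filters are hypothetical, and keeping the highest S/N candidates when max_number is set is an assumption; the only behaviour taken from the comments above is that blank filters are skipped and that the cap on candidate number is applied last.

```python
def apply_candidate_filters(clusters, dm_min=None, snr_min=None,
                            remove_harmonics=False, max_number=None):
    """Filter a list of cluster dicts; None means the filter is not applied."""
    kept = list(clusters)
    if dm_min is not None:
        kept = [c for c in kept if c["dm"] >= dm_min]
    if snr_min is not None:
        kept = [c for c in kept if c["snr"] >= snr_min]
    if remove_harmonics:
        kept = [c for c in kept if not c["is_harmonic"]]
    if max_number is not None:
        # Assumption: the cap keeps the brightest remaining candidates
        kept = sorted(kept, key=lambda c: c["snr"], reverse=True)[:max_number]
    return kept


clusters = [
    {"dm": 0.0, "snr": 25.0, "is_harmonic": False},
    {"dm": 52.3, "snr": 12.0, "is_harmonic": False},
    {"dm": 52.3, "snr": 9.0, "is_harmonic": True},
    {"dm": 110.0, "snr": 6.5, "is_harmonic": False},
]
# Same settings as the configuration above: snr_min 7.0, harmonics removed
for cand in apply_candidate_filters(clusters, snr_min=7.0, remove_harmonics=True):
    print(cand)
```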

Running the Pipeline

Once the pipeline configuration file is ready, the pipeline application rffa takes two mandatory arguments: the config file via the -c option and a list of all the DM trial files to search. For example:

rffa -c myConfig.yml dedispersed_data/*.inf

There are additional options, e.g. to set a specific output directory or save a log file. See rffa --help.

Note

rffa runs its own internal dedispersion plan to “thin out” the list of DM trials and select the minimum number necessary to cover the DM range. The actual DM step it chooses is a function of the minimum pulse width being searched (as specified in the YAML config file). This is a design choice: rffa can be run alongside a standard FFT-based search code and ingest the same set of dedispersed time series files. Indeed, the DM step required for millisecond pulsar searches is much smaller than for ordinary pulsars.

Data products

Once the pipeline finishes, the following data products will be written in the specified output directory:

A CSV table of all detected periodogram peaks across all DM trials
A CSV table of clusters, obtained by grouping together peaks with frequencies close to each other
A CSV table of candidates, which will have the same entries as the clusters table, unless you have enabled harmonic filtering in the config file. In this case any cluster that was flagged as a harmonic of another is removed from the final candidate list.
One JSON file per riptide.Candidate object, which can be loaded using riptide.load_json() and plotted / manipulated. Each contains header information, a table of peaks associated with the candidate, and sub-integrations obtained by folding the DM trial at which the candidate was detected with the highest S/N.
One PNG plot per candidate, if the associated option was enabled in the configuration file.