Using the Pipeline
riptide comes with a pipeline application, rffa, that can search an observation across a
range of dispersion measure (DM) trials, using many CPUs in parallel. rffa automatically
performs the sequence of steps described in Quickstart Guide for all input DM trials, groups
the detected peaks into sensible clusters, and then produces a candidate file and plot for each
cluster thus found. Here we cover how to search an observation for sources with an unknown DM.
riptide does not yet have a dedispersion engine, so you must take care of dedispersing your
multi-channel observation data using another software package. Here we will use PRESTO, but the
general process is similar regardless of the dedispersion engine used. If you are not familiar
with PRESTO, please first have a look at the PRESTO tutorial and documentation.
RFI Mitigation
Before actually dedispersing, a sensible starting point is to produce a radio-frequency
interference (RFI) mask using rfifind. It will scan the input data for bad frequency channels
and time intervals, and save them to a so-called mask file. This file can then be read by the
dedispersion utility prepsubband which can replace the bad data sections with a sensible value.
This will remove a significant number of spurious candidates from the search output down the line.
Choosing the optimal DM step
The next stage is to calculate how closely the consecutive DM trials should be spaced, which in
PRESTO is done using DDplan.py. The ideal DM step size is a function of:
The observing band parameters: centre frequency, bandwidth, number of channels
The sampling time of the data
The minimum pulse width being searched for. When running DDplan.py, this is the resolution (-r) command-line argument: it should be the minimum period you plan to search (period_min) divided by the minimum number of phase bins (bins_min) you plan to use.
Example: We want to search a Parkes multibeam receiver observation, with a centre frequency of
1382 MHz, a bandwidth of 400 MHz, 1024 frequency channels, and a sampling interval of 64
microseconds. We will instruct prepsubband to use 64 sub-bands for dedispersion (-s 64
below), and want to cover DMs from 0 to 1,000. We will then use riptide to search for periods
down to 100 ms using at least 200 phase bins, which amounts to a minimum pulse width of 0.5 ms
(-r 0.5). The call to DDplan.py is thus:
$ DDplan.py --loDM 0.0 --hiDM 1000.0 -f 1382.0 -b 400.0 -n 1024 -s 64 -t 0.000064 -r 0.5
Minimum total smearing : 0.0907 ms
--------------------------------------------
Minimum channel smearing : 0 ms
Minimum smearing across BW : 0.00629 ms
Minimum sample time : 0.064 ms
Setting the new 'best' resolution to : 0.5 ms
Note: ok_smearing > dt (i.e. data is higher resolution than needed)
New dt is 4 x 0.064 ms = 0.256 ms
Best guess for optimal initial dDM is 0.407
Low DM     High DM     dDM    DownSamp   dsubDM   #DMs   DMs/call   calls   WorkFract
0.000      585.000     0.30   4          15.00    1950   50         39      0.8211
585.000    1010.000    0.50   8          25.00    850    50         17      0.1789
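The value passed to -r follows directly from the planned search parameters. The short sketch below restates that arithmetic; the helper name ddplan_resolution_ms is hypothetical and is not part of riptide or PRESTO.

```python
def ddplan_resolution_ms(period_min, bins_min):
    """Minimum pulse width in milliseconds, i.e. period_min (seconds) / bins_min."""
    if period_min <= 0 or bins_min <= 0:
        raise ValueError("period_min and bins_min must be positive")
    return 1000.0 * period_min / bins_min


# Periods down to 100 ms, searched with at least 200 phase bins
resolution = ddplan_resolution_ms(period_min=0.100, bins_min=200)
print(f"-r {resolution:.2f}")
```

If you later change period_min or bins_min in the riptide configuration, recomputing this value is a quick way to check whether the existing dedispersion plan is still adequate.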
Dedispersing the data
The table above returned by DDplan.py defines the sequence of calls to make to prepsubband.
In the most recent versions of PRESTO, DDplan.py writes a Python script that directly makes the
right sequence of calls to prepsubband; otherwise the calls have to be generated by other means
(a custom script) or made manually (not recommended).
For example, covering the DM range 0 to 585 requires 39 consecutive calls to prepsubband, each producing 50 DM trials spaced by a step of 0.30:
$ prepsubband -lodm 0.0 -dmstep 0.3 -numdms 50 -nsub 64 -downsamp 4 observation.fil
$ prepsubband -lodm 15.0 -dmstep 0.3 -numdms 50 -nsub 64 -downsamp 4 observation.fil
[...]
$ prepsubband -lodm 570.0 -dmstep 0.3 -numdms 50 -nsub 64 -downsamp 4 observation.fil
See the PRESTO tutorial for more details. Once all calls to prepsubband have been made, we
can search the resulting set of DM trials, which will consist of pairs of .inf (header)
and .dat (binary data) files.
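If your version of DDplan.py does not write the dedispersion script for you, the sequence of calls can be generated with a few lines of Python. This is only a sketch under stated assumptions: the function name prepsubband_calls is hypothetical, it uses only the prepsubband options shown above, and it takes its numbers from a single row of the DDplan.py table (each call starts dsubDM above the previous one).

```python
def prepsubband_calls(low_dm, ddm, dsub_dm, dms_per_call, num_calls,
                      nsub, downsamp, fname):
    """Build one prepsubband command line per call for a single DDplan.py row."""
    commands = []
    for icall in range(num_calls):
        # Each call covers dsub_dm units of DM, starting where the previous one ended
        lodm = low_dm + icall * dsub_dm
        commands.append(
            f"prepsubband -lodm {lodm:.1f} -dmstep {ddm} -numdms {dms_per_call} "
            f"-nsub {nsub} -downsamp {downsamp} {fname}"
        )
    return commands


# First row of the table: DM 0 to 585, dDM = 0.30, dsubDM = 15.00, 39 calls
calls = prepsubband_calls(0.0, 0.3, 15.0, 50, 39, 64, 4, "observation.fil")
print(len(calls))
print(calls[0])
print(calls[-1])
```

The resulting strings can be written to a shell script or run one by one; add any further options you need (for example an RFI mask) according to the PRESTO documentation.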
Configuring the riptide pipeline
The rffa application is highly flexible and takes a YAML configuration file as
an input. A model configuration file, with detailed comments, can be found in the repository.
This should be your starting point. Most parameters are mandatory. If the configuration file is
malformed, the rffa application will raise an Exception with a helpful error message.
Number of parallel processes
The first parameter is the number of parallel processes to use for the search; each process goes
through one DM trial at a time. This should be the number of cores available for the search; if
you are running the code on a SLURM supercomputing facility, this should be equal to
cpus-per-task.
processes: 8
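One possible way to choose this number programmatically is sketched below. It assumes that SLURM exports the SLURM_CPUS_PER_TASK environment variable when cpus-per-task has been requested, and falls back to the local CPU count otherwise; the function name suggested_processes is hypothetical and not part of riptide.

```python
import os


def suggested_processes(environ=None):
    """Return SLURM_CPUS_PER_TASK if it is set, otherwise the local CPU count."""
    environ = os.environ if environ is None else environ
    value = environ.get("SLURM_CPUS_PER_TASK")
    if value is not None:
        return int(value)
    # os.cpu_count() may return None on unusual platforms
    return os.cpu_count() or 1


print(f"processes: {suggested_processes()}")
```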
Data format and band parameters
Since version 0.2.0, riptide reads the observing band parameters directly from the input .inf files when using PRESTO for dedispersion.
However, when using SIGPROC’s dedispersion routine, the DM trial files do not contain that information, and it must be specified in the config file.
These parameters are important at various stages of the search process.
# Input format, either 'presto' or 'sigproc'
format: presto
### Observing band parameters: leave blank except for SIGPROC input data
# Minimum observing frequency in MHz
fmin:
# Maximum observing frequency in MHz
fmax:
# Number of channels in the data
nchans:
DM trial selection
Although the pipeline can be passed a specific list of DM trial files to search, a more practical option is to pass all DM trial files and use the options below to select only a certain DM range.
dmselect:
  # Minimum DM trial in pc cm^{-3}
  # If left blank, start at the minimum available trial DM
  min: 0.0
  # Maximum DM trial in pc cm^{-3}
  # This is a hard limit, regardless of sky coordinates (see below)
  # If left blank, stop at the maximum available trial DM
  max: 1000.0
  # Maximum value of Trial_DM x |sin b| where b is the Galactic latitude of the observation.
  # This is a simple method to limit the maximum trial DM as a function of Galactic coordinates
  # Almost no Galactic pulsars are known to have DM x |sin b| > 40
  # If left blank, no latitude-dependent cap on the maximum trial DM is applied
  dmsinb_max: 45.0
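To make the effect of these three options concrete, here is a minimal sketch of the selection logic as described in the comments above. It is not riptide's actual implementation; the function name select_dm_trials and its arguments are hypothetical.

```python
import math


def select_dm_trials(dms, glat_deg, dm_min=None, dm_max=None, dmsinb_max=None):
    """Keep the trial DMs that pass the min, max and DM x |sin b| limits.

    A limit set to None is not applied, mirroring a blank field in the config.
    """
    sinb = abs(math.sin(math.radians(glat_deg)))
    selected = []
    for dm in dms:
        if dm_min is not None and dm < dm_min:
            continue
        if dm_max is not None and dm > dm_max:
            continue
        if dmsinb_max is not None and dm * sinb > dmsinb_max:
            continue
        selected.append(dm)
    return selected


trials = [0.0, 50.0, 100.0, 500.0, 1200.0]
# At Galactic latitude b = 30 deg, the latitude-dependent cap is about DM 90
high_lat = select_dm_trials(trials, 30.0, dm_min=0.0, dm_max=1000.0, dmsinb_max=45.0)
# In the Galactic plane (b = 0), only the hard limit 'max' applies
in_plane = select_dm_trials(trials, 0.0, dm_min=0.0, dm_max=1000.0, dmsinb_max=45.0)
print(high_lat)
print(in_plane)
```

With the values above, an observation far from the Galactic plane is searched over a much smaller DM range than one in the plane, which saves processing time.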
Red noise subtraction
This section mirrors the parameters passed to the dereddening function. See riptide.TimeSeries.deredden().
dereddening:
  # Width of the running median window in seconds used by the median subtraction
  # routine before searching the input time series
  rmed_width: 5.0
  # 'minpts' parameter passed to the ffa_search() function
  rmed_minpts: 101
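The idea behind running median subtraction can be illustrated with a deliberately naive pure-Python sketch. This is not riptide's routine, which is faster and also uses the rmed_minpts parameter that this sketch ignores; the function name subtract_running_median is hypothetical.

```python
from statistics import median


def subtract_running_median(data, tsamp, width):
    """Subtract a running median of 'width' seconds from a list of samples.

    tsamp is the sampling time in seconds. The window is truncated at the edges.
    """
    half = max(1, int(round(width / tsamp))) // 2
    cleaned = []
    for i, value in enumerate(data):
        window = data[max(0, i - half): i + half + 1]
        cleaned.append(value - median(window))
    return cleaned


# Constant baseline of 10 with a single bright sample
series = [10.0] * 20
series[10] = 15.0
cleaned = subtract_running_median(series, tsamp=1.0, width=5.0)
print(cleaned[0], cleaned[10])
```

The slowly varying baseline is removed while the narrow feature is preserved, which is why the window width should be much longer than the pulse widths of interest.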
Defining the search space
This section defines a list of search ranges, each with a minimum and maximum trial period, and a duty cycle resolution specified via a minimum and maximum number of phase bins. Here the idea is to use more phase bins for longer search periods. Each range in the list has three sections:
ffa_search: The list of parameters passed to the riptide.ffa_search() function. Any unspecified parameters will be set to the default values in the function definition.
find_peaks: The list of parameters passed to the riptide.find_peaks() function. Unspecified parameters are also set to their default values.
candidates: The number of phase bins and sub-integrations in the candidate files produced when searching this period range.
The name attribute is only for logging purposes and can be set to anything.
ranges:
  - name: 'short'
    ffa_search:
      period_min: 0.20
      period_max: 1.00
      bins_min: 240
      bins_max: 260
    find_peaks:
      smin: 6.0
    candidates:
      bins: 256
      subints: 32
  - name: 'medium'
    ffa_search:
      period_min: 1.00
      period_max: 5.00
      bins_min: 480
      bins_max: 520
    find_peaks:
      smin: 6.0
    candidates:
      bins: 512
      subints: 32
  - name: 'long'
    ffa_search:
      period_min: 5.00
      period_max: 180.00
      bins_min: 960
      bins_max: 1040
    find_peaks:
      smin: 6.0
    candidates:
      bins: 1024
      subints: 32
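When editing the search ranges, it can be useful to check the time resolution each one implies (period_min divided by bins_min, as for the DDplan.py resolution argument) and that the ranges leave no gap in trial period. The sketch below does this for the values above, copied by hand into a Python list; the helper names are hypothetical and not part of riptide.

```python
# ffa_search parameters copied from the three ranges above
ranges = [
    {"name": "short", "period_min": 0.20, "period_max": 1.00, "bins_min": 240},
    {"name": "medium", "period_min": 1.00, "period_max": 5.00, "bins_min": 480},
    {"name": "long", "period_min": 5.00, "period_max": 180.00, "bins_min": 960},
]


def min_pulse_width_ms(rng):
    """Narrowest pulse width resolved in this range, in ms: period_min / bins_min."""
    return 1000.0 * rng["period_min"] / rng["bins_min"]


def is_contiguous(ranges):
    """True if every range starts exactly where the previous one ends."""
    return all(a["period_max"] == b["period_min"] for a, b in zip(ranges, ranges[1:]))


for rng in ranges:
    print(f"{rng['name']:>6}: {min_pulse_width_ms(rng):.3f} ms")
print("contiguous:", is_contiguous(ranges))
```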
Peak clustering and harmonic flagging
These parameters control how the many periodogram peaks found during the search across all DM trials are clustered into candidates, and how candidates deemed to be harmonics of another candidate are flagged. They should be left at their default values unless there is a good reason to change them. The default parameters for harmonic flagging are conservative; they should very rarely flag a real pulsar as a harmonic of a brighter RFI instance.
# Parameters of the peak clustering that is performed once all DM trials have
# been searched
clustering:
  # Clustering radius in units of 1 / Tobs
  # Two peaks whose frequencies are within (radius / Tobs) Hz of each other
  # are considered part of the same cluster
  radius: 0.2
# Harmonic flagging parameters
# See the docstring of the htest() function in harmonic_testing.py for details
# NOTE: this is only a flagging operation, the actual *removal* of candidates
# flagged as harmonics is entirely optional, see below
harmonic_flagging:
  denom_max: 100
  phase_distance_max: 1.0
  dm_distance_max: 3.0
  snr_distance_max: 3.0
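The meaning of the clustering radius can be illustrated with a simple one-dimensional "friends of friends" grouping, in which sorted peak frequencies are chained together as long as consecutive ones are closer than radius / Tobs. This is a sketch of the idea described in the comments above, not riptide's actual clustering code; the function name cluster_frequencies is hypothetical.

```python
def cluster_frequencies(freqs, tobs, radius=0.2):
    """Group frequencies (Hz) so that neighbours within radius / tobs share a cluster.

    tobs is the observation length in seconds.
    """
    max_gap = radius / tobs
    clusters = []
    for freq in sorted(freqs):
        if clusters and freq - clusters[-1][-1] <= max_gap:
            clusters[-1].append(freq)
        else:
            clusters.append([freq])
    return clusters


# Tobs = 1000 s and radius = 0.2 give a linking length of 0.0002 Hz
peaks = [1.00025, 2.0, 1.0, 1.0001]
clusters = cluster_frequencies(peaks, tobs=1000.0, radius=0.2)
print(clusters)
```

Note that with this kind of chaining the first and last members of a cluster can be further apart than the radius, as long as intermediate peaks link them.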
Candidate filters and plotting
Right before producing candidate files and/or plots, a list of manual filters can be applied. Candidate plots can also be generated automatically.
# Filters applied to the final list of clusters, *just before* the associated
# candidate files are produced.
# The cap on candidate number is applied last, after all unworthy candidates have been removed
# Any of these fields can be left empty, in which case the corresponding filter is NOT applied
candidate_filters:
  dm_min:
  snr_min: 7.0
  remove_harmonics: True
  max_number:
# If True, save a PNG plot for every candidate
# Candidate files can always be loaded and plotted later
plot_candidates: True
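The order in which the filters act can be sketched as follows. This is an illustration only: the dictionary keys (dm, snr, is_harmonic) and the function name apply_candidate_filters are hypothetical, and keeping the highest S/N entries when max_number is set is an assumption rather than documented behaviour.

```python
def apply_candidate_filters(clusters, dm_min=None, snr_min=None,
                            remove_harmonics=False, max_number=None):
    """Apply the filters in sequence; a filter set to None is skipped."""
    kept = list(clusters)
    if dm_min is not None:
        kept = [c for c in kept if c["dm"] >= dm_min]
    if snr_min is not None:
        kept = [c for c in kept if c["snr"] >= snr_min]
    if remove_harmonics:
        kept = [c for c in kept if not c["is_harmonic"]]
    # The cap on candidate number comes last (assumed here: brightest first)
    kept.sort(key=lambda c: c["snr"], reverse=True)
    if max_number is not None:
        kept = kept[:max_number]
    return kept


clusters = [
    {"period": 1.337, "dm": 42.0, "snr": 18.5, "is_harmonic": False},
    {"period": 0.6685, "dm": 42.0, "snr": 9.1, "is_harmonic": True},
    {"period": 2.5, "dm": 0.0, "snr": 6.4, "is_harmonic": False},
    {"period": 7.2, "dm": 110.3, "snr": 7.8, "is_harmonic": False},
]
final = apply_candidate_filters(clusters, snr_min=7.0, remove_harmonics=True)
print([c["period"] for c in final])
```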
Running the Pipeline
Once the pipeline configuration file is ready, the pipeline application rffa takes two mandatory arguments: the config file via the -c option, and a list of all the DM trial files to search. For example:
rffa -c myConfig.yml dedispersed_data/*.inf
There are additional options, e.g. to set a specific output directory or save a log file. See rffa --help.
Note
rffa runs its own internal dedispersion plan to “thin out” the list of DM trials and select the minimum number necessary to cover the DM range. The actual DM step it chooses is a function of the minimum pulse width being searched (as specified in the YAML config file).
This is a design choice: rffa can then be run alongside a standard FFT-based search code and ingest the same set of dedispersed time series files, even though the DM step required for millisecond pulsar searches is much smaller than for ordinary pulsars.
Data products
Once the pipeline finishes, the following data products will be written in the specified output directory:
A CSV table of all detected periodogram peaks across all DM trials
A CSV table of clusters, obtained by grouping together peaks with frequencies close to each other
A CSV table of candidates, which will have the same entries as the clusters table, unless you have enabled harmonic filtering in the config file. In this case any cluster that was flagged as a harmonic of another is removed from the final candidate list.
One JSON file per riptide.Candidate object, which can be loaded using riptide.load_json() and plotted / manipulated. These contain header information, a table of peaks associated with the candidate, and sub-integration plots obtained by folding the DM trial at which they were detected with the highest S/N.
One PNG plot per candidate, if the associated option was enabled in the configuration file.