hifast.bld Baseline Fitting
Overview
The hifast.bld module is designed for fitting and subtracting baselines from spectral data. This is a critical step in radio astronomy data reduction to remove instrumental or environmental effects and isolate the astronomical signal.
The module supports various fitting algorithms (e.g., PLS, polynomial, spline) and includes robust methods to exclude signal regions during fitting. It also offers preprocessing options (smoothing, binning) and an interactive mode for parameter tuning.
Workflow
The baseline fitting process typically involves three stages:
%%{init: {'themeVariables': { 'fontSize': '50px'}, 'flowchart': {'diagramPadding': 0}}}%%
graph TD
A[Input Data] --> B[Preprocessing]
B --> C[Baseline Determination]
A --> D[Subtraction]
C --> D
D --> E{Post-processing?}
E -- Yes --> F[Post-processing]
E -- No --> G[Output Data]
F --> G
subgraph Preprocessing
B1[Time Averaging/Smoothing]
B2[Frequency Smoothing/Binning]
end
subgraph Fitting
C1[Method Selection]
C2[Iterative Reweighting]
C3[Signal Exclusion]
end
B -.-> B1
B -.-> B2
C -.-> C1
C -.-> C2
C -.-> C3
Preprocessing (Optional): Data can be smoothed or averaged along the time or frequency axis to improve the signal-to-noise ratio for baseline estimation. This is configured with parameters such as --njoin (time averaging) or --s_method_freq (frequency smoothing).
Note
The preprocessed data is used only for determining the baseline. The calculated baseline is then subtracted from the original (unprocessed) data, preserving the spectral resolution and signal characteristics.
Important
Time Domain Fitting (-T)
You can fit the baseline along the time axis instead of the frequency axis by setting the -T (or --trans) option to True.
Crucially, enabling this option transposes the data dimensions, which swaps the physical meaning of the preprocessing parameters:
- Time-related parameters (e.g., --njoin, --s_method_t) will effectively apply to the frequency axis.
- Frequency-related parameters (e.g., --average_every_freq, --s_method_freq) will effectively apply to the time axis.
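The preprocessing idea can be illustrated with a minimal, pure-Python sketch. It is not hifast's implementation: the helper names are hypothetical, and a running median stands in for whichever fitting method is selected. It shows the two points made above: the baseline is estimated from time-averaged data but subtracted from the original rows, and transposing the array makes a "time" operation act on frequency channels.

```python
from statistics import median

def average_groups(spectra, njoin):
    """Average every `njoin` adjacent rows (time samples); the last group may be shorter."""
    groups = [spectra[i:i + njoin] for i in range(0, len(spectra), njoin)]
    return [[sum(col) / len(group) for col in zip(*group)] for group in groups]

def running_median(spectrum, half_width):
    """Stand-in baseline estimator (an assumption): running median along frequency."""
    n = len(spectrum)
    return [median(spectrum[max(0, k - half_width):min(n, k + half_width + 1)])
            for k in range(n)]

def subtract_group_baselines(spectra, njoin, half_width=5):
    """Fit one baseline per averaged group, then subtract it from each original row."""
    baselines = [running_median(avg, half_width)
                 for avg in average_groups(spectra, njoin)]
    return [[value - base for value, base in zip(row, baselines[i // njoin])]
            for i, row in enumerate(spectra)]

def transpose(spectra):
    """Swap the time and frequency axes of a 2-D list."""
    return [list(col) for col in zip(*spectra)]

# Synthetic data: 4 time samples x 20 channels, flat baseline of 10,
# a narrow line of height 5 in channel 10, and a small per-sample offset.
spectra = [[10.0 + 0.1 * (-1) ** t + (5.0 if ch == 10 else 0.0) for ch in range(20)]
           for t in range(4)]
residual = subtract_group_baselines(spectra, njoin=2)

# After transposing, the same row-averaging helper joins adjacent channels instead.
channel_pairs = average_groups(transpose(spectra), 2)
```

The output keeps the original shape (4 x 20), and the per-sample offsets survive in the residual because only the baseline estimate was averaged.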
Baseline Fitting & Subtraction: The core step fits a model to the baseline and subtracts it.
- Method Selection: Choose a method using --method. Common choices include:
  - arPLS: Asymmetrically Reweighted Penalized Least Squares (robust and popular).
  - poly-asym1: Polynomial fitting with asymmetric reweighting.
  - knspline-asym1: Spline fitting.
  - Gauss-asym1: Gaussian-smoothing-based baseline.
- Parameter Tuning:
  - --lam: Smoothing parameter for PLS, spline, and Gauss methods. Larger values result in a stiffer (smoother) baseline.
  - --deg: Degree for polynomial or spline methods.
- Signal Exclusion: The module iteratively reweights the data to ignore signal regions (spectral lines).
  - --exclude_add: Additional automatic exclusion logic.
    - auto1: Excludes points where weights are very low (< 0.01) and extends the region.
    - auto2: Uses Gaussian filtering on residuals to identify and exclude outliers (> 3 sigma).
  - --src_file: Provide a catalog to explicitly mask known sources.
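To make the idea of iterative asymmetric reweighting concrete, here is a generic sketch with a straight-line model. It is an illustration only and should not be read as the algorithm behind poly-asym1 or arPLS: the function names, the binary weights, and the threshold rule are all assumptions. Points lying far above the current fit (likely emission) receive zero weight, and the fit is repeated until the weights stop changing.

```python
from math import sin, sqrt

def weighted_line_fit(x, y, w):
    """Weighted least-squares straight line; returns (slope, intercept)."""
    sw = sum(w)
    mx = sum(wi * xi for wi, xi in zip(w, x)) / sw
    my = sum(wi * yi for wi, yi in zip(w, y)) / sw
    sxx = sum(wi * (xi - mx) ** 2 for wi, xi in zip(w, x))
    sxy = sum(wi * (xi - mx) * (yi - my) for wi, xi, yi in zip(w, x, y))
    slope = sxy / sxx
    return slope, my - slope * mx

def asymmetric_line_baseline(x, y, niter=100, threshold=3.0):
    """Refit repeatedly, giving zero weight to points far ABOVE the current fit."""
    w = [1.0] * len(x)
    for _ in range(niter):
        slope, intercept = weighted_line_fit(x, y, w)
        resid = [yi - (slope * xi + intercept) for xi, yi in zip(x, y)]
        below = [r for r in resid if r < 0]
        # Noise scale from the negative residuals only, which are assumed
        # to be free of emission.
        scale = sqrt(sum(r * r for r in below) / len(below)) if below else 0.0
        new_w = [0.0 if r > threshold * scale else 1.0 for r in resid]
        if new_w == w:  # weights stopped changing: converged
            break
        w = new_w
    return slope, intercept, w

# Synthetic spectrum: linear baseline, small deterministic ripple, one emission line.
x = list(range(50))
y = [2.0 + 0.5 * xi + 0.05 * sin(1.7 * xi) for xi in x]
for ch in (24, 25, 26):
    y[ch] += 5.0
slope, intercept, weights = asymmetric_line_baseline(x, y)
```

Because only positive outliers are down-weighted, the emission line is excluded while the fit stays anchored to the line-free channels.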
Post-processing (Optional): If strong time-averaging was used in preprocessing, a secondary "post-processing" step on individual spectra might be necessary to remove residual baseline structures. This is configured using the --post_* arguments.
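As a rough illustration of why a per-spectrum second pass can help, the sketch below removes a constant residual level from each spectrum after a shared, time-averaged baseline has already been subtracted. The median is used here as a stand-in for a robust degree-0 fit; the function name and the numbers are hypothetical, and the --post_method choices in hifast are polynomial fits rather than this simplification.

```python
from statistics import median

def remove_residual_offset(spectrum):
    """Subtract a robust constant level (the median) from one spectrum."""
    level = median(spectrum)
    return [value - level for value in spectrum]

# Residuals left after subtracting a shared baseline: each spectrum keeps its
# own small offset, and channel 2 contains a line of height 5.
residuals = [
    [0.1, 0.1, 5.1, 0.1, 0.1],
    [-0.1, -0.1, 4.9, -0.1, -0.1],
]
cleaned = [remove_residual_offset(row) for row in residuals]
```

Each spectrum is now centred on zero away from the line, while the line amplitude is unchanged.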
Examples
# Basic usage with default arPLS method
python -m hifast.bld data_flux.hdf5
# Use a polynomial fit of degree 1 (linear)
python -m hifast.bld data_flux.hdf5 --method poly-asym1 --deg 1
# Preprocess by averaging 10 time samples for stable baseline estimation
python -m hifast.bld data_flux.hdf5 --njoin 10
# Interactive mode to tune parameters
python -m hifast.bld data_flux.hdf5 -i --length 50
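When several files need the same settings, the commands above can be generated from Python. The following is a small sketch rather than part of hifast: build_bld_command is a hypothetical helper, the file names are placeholders, and only options listed in the parameter reference are used.

```python
import sys

def build_bld_command(path, **options):
    """Build an argv list for `python -m hifast.bld`; keys are long option names."""
    cmd = [sys.executable, "-m", "hifast.bld", path]
    for name, value in options.items():
        cmd += [f"--{name}", str(value)]
    return cmd

# Hypothetical file names, used only for illustration.
files = ["obs1_flux.hdf5", "obs2_flux.hdf5"]
commands = [build_bld_command(f, method="poly-asym1", deg=1, njoin=10) for f in files]
for cmd in commands:
    print(" ".join(cmd))
    # To actually run a command (requires hifast to be installed):
    # import subprocess; subprocess.run(cmd, check=True)
```

Building the argument list explicitly, instead of a single shell string, avoids quoting problems with paths that contain spaces.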
Full Parameter Reference
Tip
You can also view the full list of parameters and their descriptions directly in your terminal by running:
python -m hifast.bld --help
Fit and subtract the baseline from spectral data.
usage: python -m hifast.bld [-h] [--outdir OUTDIR] [-f] [-g G] [-c MY_CONFIG]
[--frange START END] [--no_radec]
[--show_prog {True,False}] [--nproc N]
[--method {none,PLS-asym1,PLS-asym2,PLS-asym3,PLS-sym1,poly-asym1,poly-asym2,poly-asym3,poly-sym1,knpoly-asym1,knpoly-asym2,knpoly-asym3,knpoly-sym1,Gauss-asym1,Gauss-asym2,Gauss-asym3,Gauss-sym1,knspline-asym1,knspline-asym2,knspline-asym3,knspline-sym1,masPLS-asym1,masPLS-asym2,masPLS-asym3,masPLS-sym1,asPLS,arPLS,original}]
[--lam LAMBDA] [--deg DEG] [--knots JSON_FILE]
[--offset OFFSET] [--ratio RATIO] [--niter N]
[--exclude_add {none,auto1,auto2}]
[-T {True,False}] [--njoin N]
[--s_method_t {none,gaussian,boxcar,median}]
[--s_sigma_t SIGMA]
[--s_method_freq {none,gaussian,boxcar,median,PLS,fft}]
[--s_sigma_freq SIGMA] [--average_every_freq N]
[--use_pre_is_excluded {True,False}]
[--src_file CATALOG]
[--frame {BARYCENT,HELIOCEN,LSRK,LSRD}]
[--post_method {none,poly-asym1,poly-asym2,poly-asym3,poly-sym1}]
[--post_s_method_freq {none,gaussian,boxcar,median,PLS,fft}]
[--post_s_sigma_freq SIGMA]
[--post_average_every_freq N] [--post_deg DEG]
[--post_offset POST_OFFSET]
[--post_ratio POST_RATIO] [--post_niter N]
[--post_exclude_add {none,auto1,auto2}] [-i]
[--ylim YLIM [YLIM ...]] [--figsize W H]
[--length N] [--start_init INDEX]
FILE
Positional Arguments
- FILE
Path to the input HDF5 file containing spectra (e.g., “data.hdf5”).
Named Arguments
- --outdir
Directory in which to save the output file. By default, the same directory as the input file.
Default: “default”
- -f
If set, overwrite the output file if it already exists.
Default: False
- -g
Save the configuration to the given file path.
- -c, --my-config
Path to a configuration file.
- --frange
Frequency range to process [MHz]. Default: process all.
Default: [0, inf]
- --no_radec, --no-radec
Skip verification/addition of RA/Dec coordinates.
Default: False
- --show_prog, --show-prog
Possible choices: True, False
Show progress bar during processing.
Default: True
- --nproc, -n
Number of parallel processes to use.
Default: 1
Baseline Fitting Method
- --method
Possible choices: none, PLS-asym1, PLS-asym2, PLS-asym3, PLS-sym1, poly-asym1, poly-asym2, poly-asym3, poly-sym1, knpoly-asym1, knpoly-asym2, knpoly-asym3, knpoly-sym1, Gauss-asym1, Gauss-asym2, Gauss-asym3, Gauss-sym1, knspline-asym1, knspline-asym2, knspline-asym3, knspline-sym1, masPLS-asym1, masPLS-asym2, masPLS-asym3, masPLS-sym1, asPLS, arPLS, original
Baseline fitting algorithm. “arPLS” is robust for most cases. “poly-*” uses polynomial fitting.
Default: “arPLS”
- --lam
Smoothing parameter (lambda) for PLS, Spline, and Gauss methods. Larger values = smoother (stiffer) baseline. Typical range: 1e4 - 1e9.
Default: 100000000.0
- --deg
Degree for polynomial or spline methods.
Default: 2
- --knots
Path to JSON file with knots for spline methods.
- --offset
Offset parameter.
Default: 2
- --ratio
Ratio parameter.
Default: 0.01
- --niter
Maximum number of iterations.
Default: 100
- --exclude_add, --exclude_type, --exclude-add, --exclude-type
Possible choices: none, auto1, auto2
Auto-exclusion method. “auto1”: exclude low weights; “auto2”: exclude outliers (>3 sigma).
Default: “none”
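The effect of --lam (larger values give a stiffer baseline) can be seen in a minimal penalized least-squares smoother. The sketch below is a simplification under stated assumptions: it penalizes first differences so that the system is tridiagonal and can be solved in pure Python, whereas PLS-type baseline methods commonly penalize second differences and use sparse solvers. The function names are hypothetical.

```python
def whittaker_smooth(y, lam, w=None):
    """Minimize sum(w*(y-z)^2) + lam*sum((z[i+1]-z[i])^2) for z (needs len(y) >= 2)."""
    n = len(y)
    w = [1.0] * n if w is None else list(w)
    # Main diagonal of W + lam * D^T D; end points have a single neighbour.
    diag = [w[i] + lam * (1.0 if i in (0, n - 1) else 2.0) for i in range(n)]
    off = -lam  # constant sub- and super-diagonal
    rhs = [wi * yi for wi, yi in zip(w, y)]
    # Thomas algorithm: forward elimination ...
    c = [0.0] * n
    d = [0.0] * n
    c[0] = off / diag[0]
    d[0] = rhs[0] / diag[0]
    for i in range(1, n):
        denom = diag[i] - off * c[i - 1]
        c[i] = off / denom
        d[i] = (rhs[i] - off * d[i - 1]) / denom
    # ... followed by back substitution.
    z = [0.0] * n
    z[-1] = d[-1]
    for i in range(n - 2, -1, -1):
        z[i] = d[i] - c[i] * z[i + 1]
    return z

def roughness(z):
    """Sum of squared first differences: smaller means stiffer."""
    return sum((b - a) ** 2 for a, b in zip(z, z[1:]))

y = [0.0, 1.0, 0.0, 1.0, 0.0, 1.0, 0.0, 1.0]
for lam in (0.0, 1.0, 10.0, 1e6):
    print(lam, round(roughness(whittaker_smooth(y, lam)), 6))
```

With lam set to 0 the input is returned unchanged; as lam grows, the result approaches a constant equal to the (weighted) mean of the data. The optional weights w are where a reweighting scheme would plug in.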
Preprocessing (Before Fitting)
- -T, --trans
Possible choices: True, False
Fit baseline along TIME axis instead of FREQUENCY axis. WARNING: This swaps the roles of time and frequency parameters.
Default: False
- --njoin, --njoin_t, --average_every_t, --njoin-t, --average-every-t
Average N adjacent time samples to increase SNR before fitting. Reduces time resolution of the baseline model.
Default: 0
- --s_method_t, --s-method-t
Possible choices: none, gaussian, boxcar, median
Smoothing method along TIME axis to suppress noise. Options: gaussian, boxcar, median.
Default: “none”
- --s_sigma_t, --s-sigma-t
Width parameter for time-axis smoothing (sigma for Gaussian, window size for others).
Default: 5
- --s_method_freq, --s-method-freq
Possible choices: none, gaussian, boxcar, median, PLS, fft
Smoothing method along FREQUENCY axis.
Default: “none”
- --s_sigma_freq, --s-sigma-freq
Width parameter for frequency-axis smoothing.
Default: 5
- --average_every_freq, --njoin_freq, --average-every-freq, --njoin-freq
Binning factor along FREQUENCY axis.
Default: 0
Exclusion Settings
- --use_pre_is_excluded, --use-pre-is-excluded
Possible choices: True, False
Use existing “is_excluded” mask from input file (e.g., from previous RFI flagging).
Default: False
- --src_file, --src-file
Path to source catalog file for masking known sources (columns: RA, Dec, Radius, MinFreq, MaxFreq).
- --frame
Possible choices: BARYCENT, HELIOCEN, LSRK, LSRD
Velocity frame for source masking.
Default: “LSRK”
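The catalog-based masking described for --src_file can be pictured with the sketch below. It uses the column names given above (RA, Dec, Radius, MinFreq, MaxFreq), but everything else is an assumption: the units (degrees and MHz), the dictionary layout, and the function names are hypothetical, and no velocity-frame conversion (--frame) is applied.

```python
from math import asin, cos, degrees, radians, sin, sqrt

def angular_separation_deg(ra1, dec1, ra2, dec2):
    """Great-circle separation in degrees (haversine formula); inputs in degrees."""
    ra1, dec1, ra2, dec2 = map(radians, (ra1, dec1, ra2, dec2))
    h = (sin((dec2 - dec1) / 2) ** 2
         + cos(dec1) * cos(dec2) * sin((ra2 - ra1) / 2) ** 2)
    return degrees(2 * asin(sqrt(h)))

def source_mask(freqs, ra, dec, catalog):
    """Return True for channels covered by any catalog source near (ra, dec)."""
    mask = [False] * len(freqs)
    for src in catalog:
        if angular_separation_deg(ra, dec, src["RA"], src["Dec"]) <= src["Radius"]:
            for k, freq in enumerate(freqs):
                if src["MinFreq"] <= freq <= src["MaxFreq"]:
                    mask[k] = True
    return mask

# Hypothetical one-source catalog: degrees for RA/Dec/Radius, MHz for frequencies.
catalog = [{"RA": 10.0, "Dec": 20.0, "Radius": 0.1,
            "MinFreq": 1400.0, "MaxFreq": 1401.0}]
freqs = [1399.5, 1400.0, 1400.5, 1401.0, 1401.5]
near = source_mask(freqs, 10.0, 20.05, catalog)  # pointing within the radius
far = source_mask(freqs, 11.0, 20.0, catalog)    # pointing well outside it
```

Masked channels would then be excluded from the fit, for example by giving them zero weight.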
Post-processing (Optional)
- --post_method, --post-method
Possible choices: none, poly-asym1, poly-asym2, poly-asym3, poly-sym1
Method for secondary baseline fitting on individual spectra (e.g., to remove residuals after time-averaged fitting).
Default: “none”
- --post_s_method_freq, --post-s-method-freq
Possible choices: none, gaussian, boxcar, median, PLS, fft
Smoothing method for post-processing.
Default: “none”
- --post_s_sigma_freq, --post-s-sigma-freq
Smoothing sigma for post-processing.
Default: 5
- --post_average_every_freq, --post-average-every-freq
Binning factor for post-processing.
Default: 0
- --post_deg, --post-deg
Polynomial degree for post-processing.
Default: 2
- --post_offset, --post-offset
Offset for post-processing.
Default: 2
- --post_ratio, --post-ratio
Ratio for post-processing.
Default: 0.01
- --post_niter, --post-niter
Iterations for post-processing.
Default: 100
- --post_exclude_add, --post_exclude_type, --post-exclude-add, --post-exclude-type
Possible choices: none, auto1, auto2
Exclusion method for post-processing.
Default: “none”
Interaction
- -i, --interact
Enable interactive mode.
Default: False
- --ylim
Y-axis limits for plots (e.g., “auto” or “-1 5”).
Default: [‘auto’]
- --figsize
Figure size in inches.
Default: (10, 7)
- --length
Number of spectra to test in interactive mode.
Default: 20
- --start_init, --start-init
Starting spectrum index.
Default: 0