hifast.bld Baseline Fitting#

Overview#

The hifast.bld module fits baselines to spectral data and subtracts them. This is a critical step in radio astronomy data reduction: it removes instrumental and environmental effects so that the astronomical signal can be isolated.

The module supports various fitting algorithms (e.g., PLS, polynomial, spline) and includes robust methods to exclude signal regions during fitting. It also offers preprocessing options (smoothing, binning) and an interactive mode for parameter tuning.

Workflow#

The baseline fitting process typically involves three stages:

        %%{init: {'themeVariables': { 'fontSize': '50px'}, 'flowchart': {'diagramPadding': 0}}}%%
graph TD
   A[Input Data] --> B[Preprocessing]
   B --> C[Baseline Determination]
   A --> D[Subtraction]
   C --> D
   D --> E{Post-processing?}
   E -- Yes --> F[Post-processing]
   E -- No --> G[Output Data]
   F --> G

   subgraph Preprocessing
      B1[Time Averaging/Smoothing]
      B2[Frequency Smoothing/Binning]
   end

   subgraph Fitting
      C1[Method Selection]
      C2[Iterative Reweighting]
      C3[Signal Exclusion]
   end

   B -.-> B1
   B -.-> B2
   C -.-> C1
   C -.-> C2
   C -.-> C3
    
  1. Preprocessing (Optional): Data can be smoothed or averaged along the time or frequency axis to improve the signal-to-noise ratio for baseline estimation. This is configured with parameters such as --njoin (time averaging) or --s_method_freq (frequency smoothing).

    Note

    The preprocessed data is used only for determining the baseline. The calculated baseline is then subtracted from the original (unprocessed) data, preserving the spectral resolution and signal characteristics.

    Important

    Time Domain Fitting (-T)

    You can fit the baseline along the time axis instead of the frequency axis by using the -T (or --trans) flag.

    Crucially, enabling this option transposes the data dimensions, which swaps the physical meaning of the preprocessing parameters:

    • Time-related parameters (e.g., --njoin, --s_method_t) will effectively apply to the Frequency axis.

    • Frequency-related parameters (e.g., --average_every_freq, --s_method_freq) will effectively apply to the Time axis.

  2. Baseline Fitting & Subtraction: The core step fits a model to the baseline, which is then subtracted from the data.

    • Method Selection: Choose a method using --method. Common choices include:

      • arPLS: Asymmetrically Reweighted Penalized Least Squares (robust and popular).

      • poly-asym1: Polynomial fitting with asymmetric reweighting.

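      • knspline-asym1: Spline fitting (knots can be supplied with --knots). Note that the accepted --method values list the spline variants as knspline-*, not spline-*.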

      • Gauss-asym1: Gaussian smoothing based baseline.

    • Parameter Tuning:

      • --lam: Smoothing parameter for PLS, Spline, and Gauss methods. Larger values result in a stiffer (smoother) baseline.

      • --deg: Degree for polynomial or spline methods.

    • Signal Exclusion: The module iteratively reweights data to ignore signal regions (lines).

      • --exclude_add: Additional automatic exclusion logic.

        • auto1: Excludes points where weights are very low (< 0.01) and extends the region.

        • auto2: Uses Gaussian filtering on residuals to identify and exclude outliers (> 3 sigma).

      • --src_file: Provide a catalog to explicitly mask known sources.

  3. Post-processing (Optional): If strong time averaging was used in preprocessing, a secondary “post-processing” step on individual spectra may be necessary to remove residual baseline structures. This is configured using the --post_* arguments.
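The key point of the workflow above, that the baseline is estimated from preprocessed data but subtracted from the original data, can be illustrated with a minimal standard-library sketch. This is not hifast code: the helper names (`average_time_blocks`, `fit_line`, `subtract_baseline`) are hypothetical, a straight-line least-squares fit stands in for the real fitting methods, and `njoin` is assumed to be at least 1.

```python
from statistics import fmean

def average_time_blocks(spectra, njoin):
    """Average every `njoin` adjacent spectra (rows), channel by channel."""
    blocks = []
    for start in range(0, len(spectra), njoin):
        group = spectra[start:start + njoin]
        blocks.append([fmean(col) for col in zip(*group)])
    return blocks

def fit_line(y):
    """Ordinary least-squares straight line; a stand-in for the real fitter."""
    n = len(y)
    x_mean = (n - 1) / 2
    y_mean = fmean(y)
    sxx = sum((i - x_mean) ** 2 for i in range(n))
    sxy = sum((i - x_mean) * (v - y_mean) for i, v in enumerate(y))
    slope = sxy / sxx
    return [y_mean + slope * (i - x_mean) for i in range(n)]

def subtract_baseline(spectra, njoin):
    """Fit one baseline per time block, subtract it from each ORIGINAL spectrum."""
    baselines = [fit_line(avg) for avg in average_time_blocks(spectra, njoin)]
    return [
        [v - b for v, b in zip(spec, baselines[t // njoin])]
        for t, spec in enumerate(spectra)
    ]

# Synthetic data: 4 spectra x 8 channels, sloped baseline plus alternating noise.
spectra = [
    [1.0 + 0.5 * ch + (0.1 if (t + ch) % 2 else -0.1) for ch in range(8)]
    for t in range(4)
]
residual = subtract_baseline(spectra, njoin=2)
print(len(residual), len(residual[0]))  # same shape as the input: 4 8
```

Averaging pairs of spectra cancels the alternating noise in this toy data, so the fitted baseline is clean, while the output keeps all four original spectra at full resolution.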

Examples#

# Basic usage with default arPLS method
python -m hifast.bld data_flux.hdf5

# Use a polynomial fit of degree 1 (linear)
python -m hifast.bld data_flux.hdf5 --method poly-asym1 --deg 1

# Preprocess by averaging 10 time samples for stable baseline estimation
python -m hifast.bld data_flux.hdf5 --njoin 10

# Interactive mode to tune parameters
python -m hifast.bld data_flux.hdf5 -i --length 50

Full Parameter Reference#

Tip

You can also view the full list of parameters and their descriptions directly in your terminal by running:

python -m hifast.bld --help

Fit and subtract the baseline from spectral data.

usage: python -m hifast.bld [-h] [--outdir OUTDIR] [-f] [-g G] [-c MY_CONFIG]
                            [--frange START END] [--no_radec]
                            [--show_prog {True,False}] [--nproc N]
                            [--method {none,PLS-asym1,PLS-asym2,PLS-asym3,PLS-sym1,poly-asym1,poly-asym2,poly-asym3,poly-sym1,knpoly-asym1,knpoly-asym2,knpoly-asym3,knpoly-sym1,Gauss-asym1,Gauss-asym2,Gauss-asym3,Gauss-sym1,knspline-asym1,knspline-asym2,knspline-asym3,knspline-sym1,masPLS-asym1,masPLS-asym2,masPLS-asym3,masPLS-sym1,asPLS,arPLS,original}]
                            [--lam LAMBDA] [--deg DEG] [--knots JSON_FILE]
                            [--offset OFFSET] [--ratio RATIO] [--niter N]
                            [--exclude_add {none,auto1,auto2}]
                            [-T {True,False}] [--njoin N]
                            [--s_method_t {none,gaussian,boxcar,median}]
                            [--s_sigma_t SIGMA]
                            [--s_method_freq {none,gaussian,boxcar,median,PLS,fft}]
                            [--s_sigma_freq SIGMA] [--average_every_freq N]
                            [--use_pre_is_excluded {True,False}]
                            [--src_file CATALOG]
                            [--frame {BARYCENT,HELIOCEN,LSRK,LSRD}]
                            [--post_method {none,poly-asym1,poly-asym2,poly-asym3,poly-sym1}]
                            [--post_s_method_freq {none,gaussian,boxcar,median,PLS,fft}]
                            [--post_s_sigma_freq SIGMA]
                            [--post_average_every_freq N] [--post_deg DEG]
                            [--post_offset POST_OFFSET]
                            [--post_ratio POST_RATIO] [--post_niter N]
                            [--post_exclude_add {none,auto1,auto2}] [-i]
                            [--ylim YLIM [YLIM ...]] [--figsize W H]
                            [--length N] [--start_init INDEX]
                            FILE

Positional Arguments#

FILE

Path to the input HDF5 file containing spectra (e.g., “data.hdf5”).

Named Arguments#

--outdir

Directory in which to save the output file. By default, the output is written to the same directory as the input file.

Default: “default”

-f

If set, overwrite the output file if it already exists.

Default: False

-g

Save the configuration to the given file path.

-c, --my-config

Path to a configuration file to load.

--frange

Frequency range to process [MHz]. Default: process all.

Default: [0, inf]

--no_radec, --no-radec

Skip verification/addition of RA/Dec coordinates.

Default: False

--show_prog, --show-prog

Possible choices: True, False

Show progress bar during processing.

Default: True

--nproc, -n

Number of parallel processes to use.

Default: 1

Baseline Fitting Method#

--method

Possible choices: none, PLS-asym1, PLS-asym2, PLS-asym3, PLS-sym1, poly-asym1, poly-asym2, poly-asym3, poly-sym1, knpoly-asym1, knpoly-asym2, knpoly-asym3, knpoly-sym1, Gauss-asym1, Gauss-asym2, Gauss-asym3, Gauss-sym1, knspline-asym1, knspline-asym2, knspline-asym3, knspline-sym1, masPLS-asym1, masPLS-asym2, masPLS-asym3, masPLS-sym1, asPLS, arPLS, original

Baseline fitting algorithm. “arPLS” is robust for most cases. “poly-*” uses polynomial fitting.

Default: “arPLS”

--lam

Smoothing parameter (lambda) for PLS, Spline, and Gauss methods. Larger values = smoother (stiffer) baseline. Typical range: 1e4 - 1e9.

Default: 100000000.0
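To make the effect of the smoothing parameter concrete, here is a minimal sketch of a generic penalized least squares (Whittaker-style) smoother with a second-difference penalty. It is an illustration under assumptions, not the hifast implementation: the function names are hypothetical, the solver is a small dense Gaussian elimination chosen to keep the example dependency-free, and the `lam` values are scaled for a 40-channel toy spectrum rather than the 1e4 - 1e9 range quoted above for real data.

```python
import math

def second_diff_penalty(n):
    """Return D^T D for the second-difference operator D as a dense n x n list."""
    p = [[0.0] * n for _ in range(n)]
    for r in range(n - 2):
        row = {r: 1.0, r + 1: -2.0, r + 2: 1.0}
        for i, a in row.items():
            for j, b in row.items():
                p[i][j] += a * b
    return p

def solve(a, b):
    """Solve a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            f = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= f * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        s = sum(m[r][c] * x[c] for c in range(r + 1, n))
        x[r] = (m[r][n] - s) / m[r][r]
    return x

def pls_smooth(y, lam, weights=None):
    """Minimise sum w_i (y_i - z_i)^2 + lam * sum (second difference of z)^2."""
    n = len(y)
    w = weights or [1.0] * n
    a = second_diff_penalty(n)
    for i in range(n):
        for j in range(n):
            a[i][j] *= lam
        a[i][i] += w[i]
    return solve(a, [wi * yi for wi, yi in zip(w, y)])

def roughness(z):
    """Sum of squared second differences: the quantity that lam penalises."""
    return sum((z[i] - 2 * z[i + 1] + z[i + 2]) ** 2 for i in range(len(z) - 2))

# Slow ripple plus a narrow "line" at channel 20.
y = [math.sin(i / 6.0) + (3.0 if i == 20 else 0.0) for i in range(40)]
for lam in (1e0, 1e2, 1e4):
    print(f"lam={lam:g}  roughness={roughness(pls_smooth(y, lam)):.3e}")
```

The printed roughness falls as `lam` grows, which is the "stiffer baseline" behaviour described for --lam. A production implementation would normally use a sparse banded solver instead of dense elimination.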

--deg

Degree for polynomial or spline methods.

Default: 2

--knots

Path to JSON file with knots for spline methods.

--offset

Offset parameter.

Default: 2

--ratio

Ratio parameter.

Default: 0.01

--niter

Maximum number of iterations.

Default: 100

--exclude_add, --exclude_type, --exclude-add, --exclude-type

Possible choices: none, auto1, auto2

Auto-exclusion method. “auto1”: exclude low weights; “auto2”: exclude outliers (>3 sigma).

Default: “none”
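The idea behind asymmetric reweighting and the “auto1” description (exclude channels whose weight drops below 0.01, then extend the region) can be sketched as follows. Everything here is an assumption made for illustration: the logistic weight is the form commonly used for arPLS, a weighted straight line stands in for the baseline model, and the `max_iter`, `tol`, and `extend` arguments are hypothetical and are not claimed to correspond to --niter, --ratio, or any hifast internals.

```python
import math
from statistics import fmean, pstdev

def weighted_line(y, w):
    """Weighted least-squares straight line through y (x = channel index)."""
    sw = sum(w)
    xm = sum(wi * i for i, wi in enumerate(w)) / sw
    ym = sum(wi * yi for wi, yi in zip(w, y)) / sw
    sxx = sum(wi * (i - xm) ** 2 for i, wi in enumerate(w))
    sxy = sum(wi * (i - xm) * (yi - ym) for i, (wi, yi) in enumerate(zip(w, y)))
    slope = sxy / sxx
    return [ym + slope * (i - xm) for i in range(len(y))]

def asymmetric_fit(y, max_iter=100, tol=0.01):
    """Refit repeatedly, down-weighting points that sit well above the fit."""
    w = [1.0] * len(y)
    for _ in range(max_iter):
        fit = weighted_line(y, w)
        resid = [yi - fi for yi, fi in zip(y, fit)]
        below = [r for r in resid if r < 0]
        if len(below) < 2:
            break
        m = fmean(below)
        s = max(pstdev(below), 1e-12)
        # arPLS-style logistic weight: close to 1 below the fit, close to 0 far above.
        new_w = [1.0 / (1.0 + math.exp(min(2.0 * (r - (2.0 * s - m)) / s, 700.0)))
                 for r in resid]
        change = math.dist(w, new_w) / math.hypot(*w)
        w = new_w
        if change < tol:
            break
    return weighted_line(y, w), w

def exclude_low_weights(w, threshold=0.01, extend=2):
    """Flag channels with weight < threshold and grow each flagged run by `extend`."""
    n = len(w)
    excluded = [False] * n
    for i, wi in enumerate(w):
        if wi < threshold:
            for j in range(max(0, i - extend), min(n, i + extend + 1)):
                excluded[j] = True
    return excluded

# Sloped baseline with a small ripple and an emission line at channels 24-26.
y = [2.0 + 0.05 * i + 0.02 * math.sin(1.7 * i) for i in range(50)]
for ch, amp in ((24, 0.6), (25, 1.0), (26, 0.6)):
    y[ch] += amp

baseline, w = asymmetric_fit(y)
excluded = exclude_low_weights(w)
print([i for i, flag in enumerate(excluded) if flag])
```

Because the line channels end up with weights near zero, they stop pulling the fit upward, and growing the flagged region also covers line wings that may be too faint to be down-weighted on their own.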

Preprocessing (Before Fitting)#

-T, --trans

Possible choices: True, False

Fit baseline along TIME axis instead of FREQUENCY axis. WARNING: This swaps the roles of time and frequency parameters.

Default: False

--njoin, --njoin_t, --average_every_t, --njoin-t, --average-every-t

Average N adjacent time samples to increase SNR before fitting. Reduces time resolution of the baseline model.

Default: 0

--s_method_t, --s-method-t

Possible choices: none, gaussian, boxcar, median

Smoothing method along TIME axis to suppress noise. Options: gaussian, boxcar, median.

Default: “none”

--s_sigma_t, --s-sigma-t

Width parameter for time-axis smoothing (sigma for Gaussian, window size for others).

Default: 5

--s_method_freq, --s-method-freq

Possible choices: none, gaussian, boxcar, median, PLS, fft

Smoothing method along FREQUENCY axis.

Default: “none”

--s_sigma_freq, --s-sigma-freq

Width parameter for frequency-axis smoothing.

Default: 5

--average_every_freq, --njoin_freq, --average-every-freq, --njoin-freq

Binning factor along FREQUENCY axis.

Default: 0
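The warning on -T is easier to see with a toy array. The sketch below assumes, for illustration only, that spectra are stored with time along the first axis and that time averaging acts on rows; `average_rows` and `transpose` are hypothetical helpers, not hifast functions.

```python
def average_rows(data, n):
    """Average every n adjacent rows of a 2-D list (a stand-in for --njoin)."""
    out = []
    for start in range(0, len(data), n):
        group = data[start:start + n]
        out.append([sum(col) / len(group) for col in zip(*group)])
    return out

def transpose(data):
    """Swap the two axes of a 2-D list."""
    return [list(col) for col in zip(*data)]

# 4 time samples x 6 frequency channels; each value encodes (time, channel).
data = [[10 * t + ch for ch in range(6)] for t in range(4)]

normal = average_rows(data, 2)              # rows are time samples
swapped = average_rows(transpose(data), 2)  # rows are now frequency channels

print(len(normal), len(normal[0]))    # 2 6: the time axis was halved
print(len(swapped), len(swapped[0]))  # 3 4: the frequency axis was halved instead
```

The same row-averaging call therefore bins time samples in the first case and frequency channels in the second, which is why the time-named and frequency-named parameters trade roles once the data are transposed.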

Exclusion Settings#

--use_pre_is_excluded, --use-pre-is-excluded

Possible choices: True, False

Use existing “is_excluded” mask from input file (e.g., from previous RFI flagging).

Default: False

--src_file, --src-file

Path to source catalog file for masking known sources (columns: RA, Dec, Radius, MinFreq, MaxFreq).

--frame

Possible choices: BARYCENT, HELIOCEN, LSRK, LSRD

Velocity frame for source masking.

Default: “LSRK”
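As a rough sketch of how a catalog with the columns listed for --src_file could be turned into a channel mask, consider the example below. The units are assumptions (degrees for RA, Dec, and Radius; MHz for MinFreq and MaxFreq), the dictionary layout and function names are hypothetical, and any frequency shift implied by --frame is ignored.

```python
import math

def angular_separation_deg(ra1, dec1, ra2, dec2):
    """Great-circle separation in degrees (haversine formula)."""
    ra1, dec1, ra2, dec2 = map(math.radians, (ra1, dec1, ra2, dec2))
    h = (math.sin((dec2 - dec1) / 2) ** 2
         + math.cos(dec1) * math.cos(dec2) * math.sin((ra2 - ra1) / 2) ** 2)
    return math.degrees(2 * math.asin(min(1.0, math.sqrt(h))))

def source_mask(ra, dec, freqs, catalog):
    """Return True for channels that fall on a catalogued source at this pointing."""
    mask = [False] * len(freqs)
    for src in catalog:
        if angular_separation_deg(ra, dec, src["RA"], src["Dec"]) > src["Radius"]:
            continue  # pointing is outside this source's masking radius
        for i, f in enumerate(freqs):
            if src["MinFreq"] <= f <= src["MaxFreq"]:
                mask[i] = True
    return mask

# One hypothetical source; units assumed to be degrees and MHz.
catalog = [{"RA": 150.0, "Dec": 2.0, "Radius": 0.1,
            "MinFreq": 1400.0, "MaxFreq": 1402.0}]
freqs = [1398.0 + 0.5 * i for i in range(12)]  # 1398.0 ... 1403.5 MHz

on_source = source_mask(150.02, 2.01, freqs, catalog)
off_source = source_mask(151.0, 2.0, freqs, catalog)
print(sum(on_source), sum(off_source))  # 5 0
```

Masked channels would then be left out of the baseline fit for spectra taken near the source, while spectra far from it are fitted over the full band.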

Post-processing (Optional)#

--post_method, --post-method

Possible choices: none, poly-asym1, poly-asym2, poly-asym3, poly-sym1

Method for secondary baseline fitting on individual spectra (e.g., to remove residuals after time-averaged fitting).

Default: “none”

--post_s_method_freq, --post-s-method-freq

Possible choices: none, gaussian, boxcar, median, PLS, fft

Smoothing method for post-processing.

Default: “none”

--post_s_sigma_freq, --post-s-sigma-freq

Smoothing sigma for post-processing.

Default: 5

--post_average_every_freq, --post-average-every-freq

Binning factor for post-processing.

Default: 0

--post_deg, --post-deg

Polynomial degree for post-processing.

Default: 2

--post_offset, --post-offset

Offset for post-processing.

Default: 2

--post_ratio, --post-ratio

Ratio for post-processing.

Default: 0.01

--post_niter, --post-niter

Iterations for post-processing.

Default: 100

--post_exclude_add, --post_exclude_type, --post-exclude-add, --post-exclude-type

Possible choices: none, auto1, auto2

Exclusion method for post-processing.

Default: “none”

Interaction#

-i, --interact

Enable interactive mode.

Default: False

--ylim

Y-axis limits for plots (e.g., “auto” or “-1 5”).

Default: [‘auto’]

--figsize

Figure size in inches.

Default: (10, 7)

--length

Number of spectra to test in interactive mode.

Default: 20

--start_init, --start-init

Starting spectrum index.

Default: 0