API Reference
This page contains the complete API reference for all classes and functions in the HybrA-Filterbanks library.
Core Filterbanks
ISAC - Invertible and Stable Auditory Filterbank
- class hybra.ISAC(kernel_size: int | None = 128, num_channels: int = 40, fc_max: float | int | None = None, stride: int | None = None, fs: int = None, L: int = None, supp_mult: float = 1, scale: str = 'mel', tighten=False, is_encoder_learnable=False, fit_decoder=False, is_decoder_learnable=False, verbose: bool = True)
Bases:
Module
ISAC (Invertible and Stable Auditory filterbank with Customizable kernels).
ISAC filterbanks are invertible and stable, perceptually-motivated filterbanks specifically designed for machine learning integration. They provide perfect reconstruction properties with customizable kernel sizes and auditory-inspired frequency decomposition.
- Parameters:
kernel_size (int) – Size of the filter kernels. Default: 128
num_channels (int) – Number of frequency channels. Default: 40
fc_max (float, optional) – Maximum frequency on the auditory scale in Hz. If None, uses fs//2. Default: None
stride (int, optional) – Stride of the filterbank. If None, uses 25% overlap. Default: None
fs (int) – Sampling frequency in Hz. Default: None (required)
L (int) – Signal length in samples. Default: None (required)
supp_mult (float) – Support multiplier for kernel sizing. Default: 1.0
scale (str) – Auditory scale type. One of {‘mel’, ‘erb’, ‘bark’, ‘log10’, ‘elelog’}. ‘elelog’ is adapted for elephant hearing. Default: ‘mel’
tighten (bool) – Whether to apply tightening for better frame bounds. Default: False
is_encoder_learnable (bool) – Whether encoder kernels are learnable parameters. Default: False
fit_decoder (bool) – Whether to compute approximate perfect reconstruction decoder. Default: False
is_decoder_learnable (bool) – Whether decoder kernels are learnable parameters. Default: False
verbose (bool) – Whether to print filterbank information during initialization. Default: True
Note
ISAC filterbanks provide invertible and stable transforms with perfect reconstruction. The filters have user-defined maximum temporal support and can serve as learnable convolutional kernels. The frame bounds can be controlled through the tighten parameter for numerical stability.
Example
>>> import torch
>>> from hybra import ISAC
>>> filterbank = ISAC(kernel_size=128, num_channels=40, fs=16000, L=16000)
>>> x = torch.randn(1, 16000)
>>> coeffs = filterbank(x)
>>> reconstructed = filterbank.decoder(coeffs)
- ISACgram(x: Tensor, fmax: float | None = None, vmin: float | None = None, log_scale: bool = False) -> None
Plot time-frequency representation of the signal.
- Parameters:
x (torch.Tensor) – Input signal to visualize
fmax (float, optional) – Maximum frequency to display in Hz. Default: None
vmin (float, optional) – Minimum value for dynamic range clipping. Default: None
log_scale (bool) – Whether to apply log scaling to coefficients. Default: False
Note
This method displays a plot and does not return values.
- property condition_number: Tensor
Compute condition number of the analysis filterbank.
- Returns:
Condition number of the frame operator
- Return type:
torch.Tensor
Note
Lower condition numbers indicate better numerical stability. Values close to 1.0 indicate tight frames.
- property condition_number_decoder: Tensor
Compute condition number of the synthesis filterbank.
- Returns:
Condition number of the decoder frame operator
- Return type:
torch.Tensor
Note
Lower condition numbers indicate better numerical stability for reconstruction.
- decoder(x: Tensor) -> Tensor
Reconstruct signal from ISAC coefficients.
- Parameters:
x (torch.Tensor) – Filterbank coefficients of shape (batch_size, num_channels, num_frames)
- Returns:
Reconstructed signal of shape (batch_size, signal_length)
- Return type:
torch.Tensor
Note
Uses frame bounds normalization for approximate perfect reconstruction.
- forward(x: Tensor) -> Tensor
Forward pass through the ISAC filterbank.
- Parameters:
x (torch.Tensor) – Input signal of shape (batch_size, signal_length) or (signal_length,)
- Returns:
Filterbank coefficients of shape (batch_size, num_channels, num_frames)
- Return type:
torch.Tensor
HybrA - Hybrid Auditory Filterbank
- class hybra.HybrA(kernel_size: int = 128, learned_kernel_size: int = 23, num_channels: int = 40, stride: int | None = None, fc_max: float | int | None = None, fs: int = None, L: int = None, supp_mult: float = 1, scale: str = 'mel', tighten: bool = False, det_init: bool = False, verbose: bool = True)
Bases:
Module
Hybrid Auditory filterbank combining fixed and learnable components.
HybrA (Hybrid Auditory) filterbanks extend ISAC by combining fixed auditory-inspired filters with learnable filters through channel-wise convolution. This hybrid approach enables data-driven adaptation while maintaining perceptual auditory characteristics and frame-theoretic stability guarantees.
- Parameters:
kernel_size (int) – Kernel size of the auditory filterbank. Default: 128
learned_kernel_size (int) – Kernel size of the learned filterbank. Default: 23
num_channels (int) – Number of frequency channels. Default: 40
stride (int, optional) – Stride of the auditory filterbank. If None, uses 25% overlap. Default: None
fc_max (float, optional) – Maximum frequency on the auditory scale in Hz. If None, uses fs//2. Default: None
fs (int) – Sampling frequency in Hz. Default: None (required)
L (int) – Signal length in samples. Default: None (required)
supp_mult (float) – Support multiplier for kernel sizing. Default: 1.0
scale (str) – Auditory scale type. One of {‘mel’, ‘erb’, ‘bark’, ‘log10’, ‘elelog’}. ‘elelog’ is adapted for elephant hearing. Default: ‘mel’
tighten (bool) – Whether to apply tightening to improve frame bounds. Default: False
det_init (bool) – Whether to initialize learned filters as diracs (True) or randomly (False). Default: False
verbose (bool) – Whether to print filterbank information during initialization. Default: True
Note
The hybrid construction h_m = g_m ⊛ ℓ_m combines ISAC auditory filters (g_m) with compact learnable filters (ℓ_m) through convolution. This maintains the perceptual benefits of auditory scales while enabling data-driven optimization and preserving perfect reconstruction properties.
Example
>>> import torch
>>> from hybra import HybrA
>>> filterbank = HybrA(kernel_size=128, num_channels=40, fs=16000, L=16000)
>>> x = torch.randn(1, 16000)
>>> coeffs = filterbank(x)
>>> reconstructed = filterbank.decoder(coeffs)
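The hybrid construction h_m = g_m ⊛ ℓ_m from the Note can be illustrated without the library. The following is a minimal sketch in plain Python with toy values; `conv_full` and `hybrid_kernels` are hypothetical helper names, and the use of full linear convolution is an assumption made for illustration, not a statement about how hybra implements the operation.

```python
def conv_full(g, l):
    """Full linear convolution of two 1-D sequences."""
    out = [0.0] * (len(g) + len(l) - 1)
    for i, gi in enumerate(g):
        for j, lj in enumerate(l):
            out[i + j] += gi * lj
    return out


def hybrid_kernels(aud_kernels, learned_kernels):
    """Channel-wise convolution: h_m = g_m * l_m for every channel m."""
    return [conv_full(g, l) for g, l in zip(aud_kernels, learned_kernels)]


# Two toy "auditory" filters g_m (stand-ins for the fixed ISAC kernels).
g = [[1.0, 2.0, 3.0], [0.5, -0.5, 0.25]]

# Dirac-initialised learned filters leave the auditory filters unchanged
# (apart from zero padding); an averaging filter smooths them.
dirac = [[1.0, 0.0], [1.0, 0.0]]
smooth = [[0.5, 0.5], [0.5, 0.5]]

h_dirac = hybrid_kernels(g, dirac)
h_smooth = hybrid_kernels(g, smooth)

print(h_dirac[0])   # [1.0, 2.0, 3.0, 0.0]
print(h_smooth[0])  # [0.5, 1.5, 2.5, 1.5]
```

The dirac case shows why det_init=True starts training from the plain auditory filterbank: convolving with a unit impulse returns the original filter.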
- ISACgram(x: Tensor, fmax: float | None = None) -> None
Plot time-frequency representation of the signal.
- Parameters:
x (torch.Tensor) – Input signal to visualize
fmax (float, optional) – Maximum frequency to display in Hz. Default: None
Note
This method displays a plot and does not return values.
- property condition_number: Tensor | float
Compute condition number of the filterbank.
- Parameters:
learnable (bool) – If True, returns tensor for gradient computation. If False, returns scalar value. Default: False
- Returns:
Condition number of the frame operator
- Return type:
Union[torch.Tensor, float]
Note
Lower condition numbers indicate better numerical stability. Values close to 1.0 indicate tight frames.
- decoder(x: Tensor) -> Tensor
Reconstruct signal from filterbank coefficients.
- Parameters:
x (torch.Tensor) – Filterbank coefficients of shape (batch_size, num_channels, num_frames)
- Returns:
Reconstructed signal of shape (batch_size, signal_length)
- Return type:
torch.Tensor
Note
Uses frame bounds normalization for approximate perfect reconstruction.
- encoder(x: Tensor) -> Tensor
Encode signal using fixed hybrid kernels (no gradient computation).
- Parameters:
x (torch.Tensor) – Input signal of shape (batch_size, signal_length) or (signal_length,)
- Returns:
Filterbank coefficients of shape (batch_size, num_channels, num_frames)
- Return type:
torch.Tensor
Note
Use the forward() method during training to enable gradient computation. This method uses pre-computed kernels for inference.
- forward(x: Tensor) -> Tensor
Forward pass through the HybrA filterbank.
- Parameters:
x (torch.Tensor) – Input signal of shape (batch_size, signal_length) or (signal_length,)
- Returns:
Filterbank coefficients of shape (batch_size, num_channels, num_frames)
- Return type:
torch.Tensor
Spectrogram and Cepstral Variants
ISACSpec - ISAC Spectrogram
- class hybra.ISACSpec(kernel_size: int | None = None, num_channels: int = 40, stride: int | None = None, fc_max: float | int | None = None, fmax: int | None = None, fs: int = None, L: int = None, supp_mult: float = 1, scale: str = 'mel', power: float = 2.0, avg_size: int = None, is_log=False, is_encoder_learnable=False, is_avg_learnable=False, verbose: bool = True)
Bases:
Module
ISAC spectrogram filterbank for time-frequency analysis.
ISACSpec combines ISAC (Invertible and Stable Auditory filterbank with Customizable kernels) with temporal averaging to produce spectrogram-like representations. The filterbank applies auditory-inspired filters followed by temporal smoothing for robust feature extraction.
- Parameters:
kernel_size (int, optional) – Size of the filter kernels. If None, computed automatically. Default: None
num_channels (int) – Number of frequency channels. Default: 40
stride (int, optional) – Stride of the filterbank. If None, uses 25% overlap. Default: None
fc_max (float, optional) – Maximum frequency on the auditory scale in Hz. If None, uses fs//2. Default: None
fmax (float, optional) – Maximum frequency for output truncation in Hz. Default: None
fs (int) – Sampling frequency in Hz. Default: None (required)
L (int) – Signal length in samples. Default: None (required)
supp_mult (float) – Support multiplier for kernel sizing. Default: 1.0
scale (str) – Auditory scale type. One of {‘mel’, ‘erb’, ‘bark’, ‘log10’, ‘elelog’}. ‘elelog’ is adapted for elephant hearing. Default: ‘mel’
power (float) – Power applied to coefficients before averaging. Default: 2.0
avg_size (int, optional) – Size of the temporal averaging kernel. If None, computed automatically. Default: None
is_log (bool) – Whether to apply logarithm to the output. Default: False
is_encoder_learnable (bool) – Whether encoder kernels are learnable parameters. Default: False
is_avg_learnable (bool) – Whether averaging kernels are learnable parameters. Default: False
verbose (bool) – Whether to print filterbank information during initialization. Default: True
Note
The temporal averaging provides robustness to time variations while preserving spectral characteristics. The power parameter controls the nonlinearity applied before averaging.
Example
>>> import torch
>>> from hybra import ISACSpec
>>> spectrogram = ISACSpec(num_channels=40, fs=16000, L=16000, power=2.0)
>>> x = torch.randn(1, 16000)
>>> spec = spectrogram(x)
- ISACgram(x: Tensor, fmax: float | None = None, vmin: float | None = None, log_scale: bool = False) -> None
Plot time-frequency spectrogram representation of the signal.
- Parameters:
x (torch.Tensor) – Input signal to visualize
fmax (float, optional) – Maximum frequency to display in Hz. Default: None
vmin (float, optional) – Minimum value for dynamic range clipping. Default: None
log_scale (bool) – Whether to apply log scaling to coefficients. Default: False
Note
This method displays a plot and does not return values.
- forward(x: Tensor) -> Tensor
Forward pass through the ISACSpec filterbank.
- Parameters:
x (torch.Tensor) – Input signal of shape (batch_size, signal_length) or (signal_length,)
- Returns:
Spectrogram coefficients of shape (batch_size, num_channels, num_frames)
- Return type:
torch.Tensor
Note
The output is temporally averaged and optionally log-scaled for robustness.
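The processing chain described for ISACSpec (power nonlinearity, temporal averaging, optional logarithm) can be sketched for a single channel in plain Python. This is an illustration only: the uniform averaging window, the "valid" boundary handling, the `eps` offset inside the logarithm, and the function name `spectrogram_channel` are all assumptions, not details taken from the library.

```python
import math


def spectrogram_channel(coeffs, power=2.0, avg_size=3, is_log=False, eps=1e-10):
    """Power -> moving average -> optional log, for one channel of coefficients."""
    # 1) Pointwise nonlinearity: |c| ** power (works for real or complex c).
    powered = [abs(c) ** power for c in coeffs]
    # 2) Uniform moving average over avg_size consecutive frames.
    averaged = [
        sum(powered[i:i + avg_size]) / avg_size
        for i in range(len(powered) - avg_size + 1)
    ]
    # 3) Optional logarithm; eps guards against log(0).
    if is_log:
        averaged = [math.log(v + eps) for v in averaged]
    return averaged


coeffs = [1.0, -2.0, 3.0, -4.0, 5.0]
spec = spectrogram_channel(coeffs, power=2.0, avg_size=3)
spec_log = spectrogram_channel(coeffs, power=2.0, avg_size=3, is_log=True)
print(spec)
```

With power=2.0 the powered sequence is [1, 4, 9, 16, 25], so each output value is the mean energy of three neighbouring frames.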
ISACCC - ISAC Cepstral Coefficients
- class hybra.ISACCC(kernel_size: int | None = None, num_channels: int = 40, stride: int | None = None, num_cc: int = 13, fc_max: float | int | None = None, fmax: float | int | None = None, fs: int = 16000, L: int = 16000, supp_mult: float = 1, power: float = 2.0, scale: str = 'mel', is_log: bool = False, verbose: bool = True)
Bases:
Module
ISAC Cepstral Coefficients (ISACCC) extractor for speech features.
ISACCC computes cepstral coefficients from ISAC (Invertible and Stable Auditory filterbank with Customizable kernels) spectrograms using the Discrete Cosine Transform (DCT). This provides compact features suitable for speech recognition and audio classification tasks.
- Parameters:
kernel_size (int, optional) – Size of the filter kernels. If None, computed automatically. Default: None
num_channels (int) – Number of frequency channels. Default: 40
stride (int, optional) – Stride of the filterbank. If None, uses 25% overlap. Default: None
num_cc (int) – Number of cepstral coefficients to extract. Default: 13
fc_max (float, optional) – Maximum frequency on the auditory scale in Hz. If None, uses fs//2. Default: None
fmax (float, optional) – Maximum frequency for ISACSpec computation in Hz. Default: None
fs (int) – Sampling frequency in Hz. Default: 16000
L (int) – Signal length in samples. Default: 16000
supp_mult (float) – Support multiplier for kernel sizing. Default: 1.0
power (float) – Power applied to ISACSpec coefficients. Default: 2.0
scale (str) – Auditory scale type. One of {‘mel’, ‘erb’, ‘bark’, ‘log10’, ‘elelog’}. ‘elelog’ is adapted for elephant hearing. Default: ‘mel’
is_log (bool) – Whether to apply log instead of dB conversion. Default: False
verbose (bool) – Whether to print filterbank information during initialization. Default: True
- Raises:
ValueError – If num_cc > num_channels
Note
The DCT is applied with orthonormal basis functions for energy preservation. The number of cepstral coefficients should typically be much smaller than the number of frequency channels for dimensionality reduction.
Example
>>> import torch
>>> from hybra import ISACCC
>>> mfcc_extractor = ISACCC(num_channels=40, num_cc=13, fs=16000, L=16000)
>>> x = torch.randn(1, 16000)
>>> cepstral_coeffs = mfcc_extractor(x)
- ISACgram(x: Tensor) -> None
Plot cepstral coefficients representation.
- Parameters:
x (torch.Tensor) – Input signal to visualize
Note
This method displays a plot of the cepstral coefficients and does not return values.
- forward(x: Tensor) -> Tensor
Forward pass to compute ISAC cepstral coefficients.
- Parameters:
x (torch.Tensor) – Input signal of shape (batch_size, signal_length) or (signal_length,)
- Returns:
Cepstral coefficients of shape (batch_size, num_cc, num_frames)
- Return type:
torch.Tensor
Note
The process involves: ISAC spectrogram -> log/dB conversion -> DCT transform.
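The last two steps of that pipeline (logarithm, then an orthonormal DCT across channels) can be sketched for one time frame in plain Python. The DCT-II variant, the natural logarithm with an `eps` offset, and the helper names `dct_ortho` and `cepstral_frame` are assumptions chosen for illustration; the library's exact conventions may differ.

```python
import math


def dct_ortho(x):
    """Orthonormal DCT-II of a 1-D sequence."""
    n = len(x)
    out = []
    for k in range(n):
        s = sum(x[i] * math.cos(math.pi * (i + 0.5) * k / n) for i in range(n))
        scale = math.sqrt(1.0 / n) if k == 0 else math.sqrt(2.0 / n)
        out.append(scale * s)
    return out


def cepstral_frame(spec_frame, num_cc, eps=1e-10):
    """Log-compress one spectrogram frame and keep the first num_cc DCT terms."""
    if num_cc > len(spec_frame):
        raise ValueError("num_cc must not exceed the number of channels")
    log_spec = [math.log(v + eps) for v in spec_frame]
    return dct_ortho(log_spec)[:num_cc]


# One frame of an 8-channel spectrogram (toy values).
frame = [1.0, 2.0, 4.0, 8.0, 4.0, 2.0, 1.0, 0.5]
cc = cepstral_frame(frame, num_cc=3)

# Orthonormal basis functions preserve energy when all coefficients are kept.
full = dct_ortho(frame)
energy_in = sum(v * v for v in frame)
energy_out = sum(v * v for v in full)
print(len(cc), abs(energy_in - energy_out) < 1e-9)
```

Truncating to num_cc coefficients is what gives the dimensionality reduction mentioned in the class Note; the energy check only holds for the untruncated transform.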
Utility Functions
Frame Theory Functions
- hybra.utils.frame_bounds(w: Tensor, d: int, Ls: int | None = None) -> Tuple[Tensor, Tensor]
Compute frame bounds of a filterbank using polyphase representation.
Frame bounds characterize the numerical stability and invertibility of the filterbank transform. Tight frames (A ≈ B) provide optimal stability.
- Parameters:
w (torch.Tensor) – Impulse responses of shape (num_channels, length)
d (int) – Decimation (stride) factor
Ls (int, optional) – Signal length. If None, computed automatically. Default: None
- Returns:
Lower and upper frame bounds (A, B)
- Return type:
Tuple[torch.Tensor, torch.Tensor]
Note
For d=1, reduces to computing min/max of power spectral density. For d>1, uses polyphase analysis to compute worst-case eigenvalues.
Example
>>> import torch
>>> from hybra.utils import frame_bounds
>>> w = torch.randn(40, 128)
>>> A, B = frame_bounds(w, d=4)
>>> condition_number = B / A
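For the d=1 case mentioned in the Note, the frame bounds are the minimum and maximum over frequency of the summed squared magnitude responses. The sketch below computes them with a naive DFT in plain Python; `dft` and `frame_bounds_d1` are hypothetical helpers written for this illustration and do not cover the polyphase analysis needed for d>1.

```python
import cmath


def dft(x, n):
    """Length-n DFT of x, zero-padded to n samples."""
    padded = list(x) + [0.0] * (n - len(x))
    return [
        sum(padded[t] * cmath.exp(-2j * cmath.pi * k * t / n) for t in range(n))
        for k in range(n)
    ]


def frame_bounds_d1(filters, n):
    """Frame bounds for stride 1: min and max of the summed power spectrum."""
    psd = [0.0] * n
    for g in filters:
        for k, val in enumerate(dft(g, n)):
            psd[k] += abs(val) ** 2
    return min(psd), max(psd)


# A Haar-like pair has a flat summed spectrum, so the frame is tight (A == B).
s = 2 ** -0.5
A_tight, B_tight = frame_bounds_d1([[s, s], [s, -s]], n=8)

# A single filter [1, 0.5] has an uneven spectrum, so B / A > 1.
A, B = frame_bounds_d1([[1.0, 0.5]], n=8)
kappa = B / A
print(A_tight, B_tight, kappa)
```

For the single filter, |1 + 0.5 e^{-iω}|² ranges from 0.25 to 2.25 on the sampled grid, giving a condition number of about 9.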
- hybra.utils.condition_number(w: Tensor, d: int, Ls: int | None = None) -> Tensor
Compute condition number of a filterbank frame operator.
The condition number κ = B/A quantifies numerical stability, where A and B are the lower and upper frame bounds. Lower values indicate better stability.
- Parameters:
w (torch.Tensor) – Impulse responses of shape (num_channels, signal_length)
d (int) – Decimation factor (stride)
Ls (int, optional) – Signal length. If None, computed automatically. Default: None
- Returns:
Condition number κ = B/A
- Return type:
torch.Tensor
Note
κ = 1 indicates a tight frame (optimal stability). κ >> 1 suggests potential numerical instability.
Example
>>> import torch
>>> from hybra.utils import condition_number
>>> w = torch.randn(40, 128)
>>> kappa = condition_number(w, d=4)
>>> print(f"Condition number: {kappa.item():.2f}")
Auditory Scale Conversions
- hybra.utils.freqtoaud(freq: float | int | Tensor, scale: str = 'erb', fs: int | None = None) -> Tensor
Convert frequencies from Hz to auditory scale units.
Transforms linear frequency values to perceptually-motivated auditory scales that better reflect human frequency discrimination.
- Parameters:
freq (Union[float, int, torch.Tensor]) – Frequency value(s) in Hz
scale (str) – Auditory scale type. One of {‘erb’, ‘mel’, ‘bark’, ‘log10’, ‘elelog’}. Default: ‘erb’
fs (int, optional) – Sampling frequency (required for ‘elelog’ scale). Default: None
- Returns:
Corresponding auditory scale units
- Return type:
torch.Tensor
- Raises:
ValueError – If unsupported scale is specified or fs is missing for ‘elelog’
Note
ERB: Equivalent Rectangular Bandwidth (Glasberg & Moore)
MEL: Mel scale (perceptually uniform pitch)
Bark: Bark scale (critical band rate)
elelog: Logarithmic scale adapted for elephant hearing
Example
>>> import torch
>>> from hybra.utils import freqtoaud
>>> freq_hz = torch.tensor([100, 1000, 8000])
>>> mel_units = freqtoaud(freq_hz, scale='mel')
- hybra.utils.audtofreq(aud: float | int | Tensor, scale: str = 'erb', fs: int | None = None) -> Tensor
Convert auditory scale units back to frequencies in Hz.
- Parameters:
aud (Union[float, int, torch.Tensor]) – Auditory scale values
scale (str) – Auditory scale type. One of {‘erb’, ‘mel’, ‘bark’, ‘log10’, ‘elelog’}. Default: ‘erb’
fs (int, optional) – Sampling frequency (required for ‘elelog’ scale). Default: None
- Returns:
Corresponding frequencies in Hz
- Return type:
torch.Tensor
Example
>>> import torch
>>> from hybra.utils import audtofreq
>>> mel_units = torch.tensor([100, 1000, 2000])
>>> freq_hz = audtofreq(mel_units, scale='mel')
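To make the Hz-to-scale round trip concrete, here is a minimal sketch in plain Python. It assumes the widely used mel formula 2595·log10(1 + f/700); hybra may use different constants, so the numbers below should not be read as the library's output. The names `hz_to_mel`, `mel_to_hz` and `mel_spaced` are hypothetical.

```python
import math


def hz_to_mel(f):
    """Hz -> mel, using the common 2595 * log10(1 + f / 700) convention."""
    return 2595.0 * math.log10(1.0 + f / 700.0)


def mel_to_hz(m):
    """Inverse of hz_to_mel."""
    return 700.0 * (10.0 ** (m / 2595.0) - 1.0)


def mel_spaced(fmin, fmax, num):
    """Frequencies in Hz that are equidistant on the mel scale."""
    lo, hi = hz_to_mel(fmin), hz_to_mel(fmax)
    step = (hi - lo) / (num - 1)
    return [mel_to_hz(lo + i * step) for i in range(num)]


roundtrip = mel_to_hz(hz_to_mel(1000.0))
centers = mel_spaced(100.0, 8000.0, 5)
gaps = [b - a for a, b in zip(centers, centers[1:])]
print(roundtrip, centers)
```

Points that are evenly spaced on the auditory scale are increasingly far apart in Hz, which is the reason filterbanks built on such scales have denser coverage at low frequencies.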
- hybra.utils.audspace(fmin: float | int | Tensor, fmax: float | int | Tensor, num_channels: int, scale: str = 'erb')
Computes a vector of values equidistantly spaced on the selected auditory scale.
- Parameters:
- Returns:
y (ndarray): Array of frequencies equidistantly scaled on the auditory scale.
- Return type:
ndarray
Filterbank Construction
- hybra.utils.audfilters(kernel_size: int | None = None, num_channels: int = 96, fc_max: float | int | None = None, fs: int = None, L: int = None, supp_mult: float = 1, scale: str = 'mel') -> Tuple[Tensor, int, Tensor, int | float, int | float, int, int, int, Tensor]
Generate auditory-inspired FIR filterbank kernels.
Creates a bank of bandpass filters with center frequencies distributed according to perceptual auditory scales (mel, erb, bark, etc.). Filters are designed with variable bandwidths matching critical bands of human auditory perception.
- Parameters:
kernel_size (int, optional) – Maximum filter kernel size. If None, computed automatically. Default: None
num_channels (int) – Number of frequency channels. Default: 96
fc_max (float, optional) – Maximum center frequency in Hz. If None, uses fs//2. Default: None
fs (int) – Sampling frequency in Hz. Default: None (required)
L (int) – Signal length in samples. If None, uses fs. Default: None
supp_mult (float) – Support multiplier for kernel sizing. Default: 1.0
scale (str) – Auditory scale. One of {‘mel’, ‘erb’, ‘bark’, ‘log10’, ‘elelog’}. Default: ‘mel’
- Returns:
kernels (torch.Tensor): Filter kernels of shape (num_channels, kernel_size)
d (int): Recommended stride for 25% overlap
fc (torch.Tensor): Center frequencies in Hz
fc_min (Union[int, float]): Minimum center frequency
fc_max (Union[int, float]): Maximum center frequency
kernel_min (int): Minimum kernel size used
kernel_size (int): Maximum kernel size used
Ls (int): Adjusted signal length
tsupp (torch.Tensor): Time support for each filter
- Return type:
Tuple[Tensor, int, Tensor, int | float, int | float, int, int, int, Tensor]
- Raises:
ValueError – If parameters are invalid (negative values, unsupported scale, etc.)
Note
The filterbank construction follows auditory modeling principles where:
- Low frequencies use longer filters (better frequency resolution)
- High frequencies use shorter filters (better time resolution)
- Bandwidth scales according to critical band theory
Example
>>> from hybra.utils import audfilters
>>> kernels, stride, fc, _, _, _, _, Ls, _ = audfilters(
...     kernel_size=128, num_channels=40, fs=16000, scale='mel'
... )
>>> print(f"Generated {kernels.shape[0]} filters with stride {stride}")
- hybra.utils.fctobw(fc: float | int | Tensor, scale='erb')
Computes the critical bandwidth of a filter at a given center frequency.
- Parameters:
- Returns:
Critical bandwidth at each center frequency.
- Return type:
ndarray or float
- hybra.utils.bwtofc(bw: float | int | Tensor, scale='erb')
Computes the center frequency corresponding to a given critical bandwidth.
- Parameters:
- Returns:
Center frequency corresponding to the given bandwidth.
- Return type:
ndarray or float
Convolution Operations
- hybra.utils.circ_conv(x: Tensor, kernels: Tensor, d: int = 1) -> Tensor
Circular convolution with optional downsampling.
Performs efficient circular convolution using FFT, followed by downsampling. The kernels are automatically centered for proper phase alignment.
- Parameters:
x (torch.Tensor) – Input signal of shape (…, signal_length)
kernels (torch.Tensor) – Filter kernels of shape (num_channels, 1, kernel_length) or (num_channels, kernel_length)
d (int) – Downsampling factor (stride). Default: 1
- Returns:
Convolved and downsampled output of shape (…, num_channels, output_length)
- Return type:
torch.Tensor
Note
Uses circular convolution which assumes periodic boundary conditions. Kernels are automatically zero-padded and centered.
Example
>>> import torch
>>> from hybra.utils import circ_conv
>>> x = torch.randn(1, 1000)
>>> kernels = torch.randn(40, 128)
>>> y = circ_conv(x, kernels, d=4)
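The operation itself (periodic convolution followed by keeping every d-th sample) can be written directly in the time domain. The sketch below is plain Python with hypothetical helper names `circ_conv_1d` and `circ_conv_bank`; unlike the library function it uses no FFT and does not center the kernels, so sample alignment may differ from hybra's output.

```python
def circ_conv_1d(x, kernel, d=1):
    """Circular convolution of x with kernel, then keep every d-th sample."""
    n = len(x)
    full = [
        sum(kernel[j] * x[(t - j) % n] for j in range(len(kernel)))
        for t in range(n)
    ]
    return full[::d]


def circ_conv_bank(x, kernels, d=1):
    """Apply every kernel to the same signal: output is (num_channels, n // d)."""
    return [circ_conv_1d(x, k, d) for k in kernels]


x = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0]
kernels = [
    [1.0],        # identity
    [0.0, 1.0],   # one-sample delay (wraps around)
    [1.0, -1.0],  # first difference
]
y = circ_conv_bank(x, kernels, d=2)
for row in y:
    print(row)
# [1.0, 3.0, 5.0, 7.0]
# [8.0, 2.0, 4.0, 6.0]
# [-7.0, 1.0, 1.0, 1.0]
```

The leading 8.0 and -7.0 show the periodic boundary condition at work: the sample before x[0] is taken to be x[-1].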
- hybra.utils.circ_conv_transpose(y: Tensor, kernels: Tensor, d: int = 1) -> Tensor
Transpose (adjoint) of circular convolution with upsampling.
Implements the adjoint operation of circ_conv for signal reconstruction. Used in synthesis/decoder operations of filterbanks.
- Parameters:
y (torch.Tensor) – Input coefficients of shape (…, num_channels, num_frames)
kernels (torch.Tensor) – Filter kernels of shape (num_channels, 1, kernel_length) or (num_channels, kernel_length)
d (int) – Upsampling factor (stride). Default: 1
- Returns:
Reconstructed signal of shape (…, 1, signal_length)
- Return type:
torch.Tensor
Note
This is the mathematical adjoint, not the true inverse. For perfect reconstruction, appropriate dual frame filters should be used.
Example
>>> import torch
>>> from hybra.utils import circ_conv_transpose
>>> coeffs = torch.randn(1, 40, 250)
>>> kernels = torch.randn(40, 128)
>>> x_recon = circ_conv_transpose(coeffs, kernels, d=4)
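The adjoint property can be checked numerically: for any signal x and coefficients c, the inner product ⟨Ax, c⟩ should equal ⟨x, A*c⟩. The following plain-Python sketch does this for an uncentered time-domain version of the two operations; all function names are hypothetical and the code is meant as an illustration of the identity, not as the library's implementation.

```python
def analysis(x, kernels, d):
    """Circular convolution with each kernel, then downsampling by d."""
    n = len(x)
    return [
        [sum(k[j] * x[(t - j) % n] for j in range(len(k))) for t in range(0, n, d)]
        for k in kernels
    ]


def synthesis(coeffs, kernels, d, n):
    """Adjoint of analysis: upsample by d, correlate with each kernel, sum channels."""
    out = [0.0] * n
    for c, k in zip(coeffs, kernels):
        up = [0.0] * n
        for i, v in enumerate(c):
            up[i * d] = v
        for s in range(n):
            out[s] += sum(k[j] * up[(s + j) % n] for j in range(len(k)))
    return out


def inner(a, b):
    return sum(p * q for p, q in zip(a, b))


x = [1.0, -2.0, 0.5, 3.0, -1.0, 2.0, 0.0, 1.5]
kernels = [[1.0, 0.5], [0.25, -1.0, 2.0]]
d = 2
c = [[1.0, 0.0, -1.0, 2.0], [0.5, 0.5, -0.5, 1.0]]

lhs = sum(inner(a, b) for a, b in zip(analysis(x, kernels, d), c))
rhs = inner(x, synthesis(c, kernels, d, len(x)))
print(abs(lhs - rhs) < 1e-9)

# The adjoint is not an inverse: applying it to the analysis output
# does not return x for these (non-tight) kernels.
x_rec = synthesis(analysis(x, kernels, d), kernels, d, len(x))
print(max(abs(a - b) for a, b in zip(x, x_rec)) > 1e-6)
```

Recovering x exactly requires either a tight filterbank or dual (decoder) filters, which is the point made in the Note.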
Visualization Functions
- hybra.utils.ISACgram(c: Tensor, fc: Tensor | None = None, L: int | None = None, fs: int | None = None, fmax: float | None = None, log_scale: bool = False, vmin: float | None = None, cmap: str = 'inferno') -> None
Plot time-frequency representation of filterbank coefficients.
Creates a spectrogram-like visualization with frequency on y-axis and time on x-axis. Supports logarithmic scaling and frequency range limitation for better visualization.
- Parameters:
c (torch.Tensor) – Filterbank coefficients of shape (batch_size, num_channels, num_frames)
fc (torch.Tensor, optional) – Center frequencies in Hz for y-axis labeling. Default: None
L (int, optional) – Original signal length for time axis scaling. Default: None
fs (int, optional) – Sampling frequency for time axis scaling. Default: None
fmax (float, optional) – Maximum frequency to display in Hz. Default: None
log_scale (bool) – Whether to apply log10 scaling to coefficients. Default: False
vmin (float, optional) – Minimum value for dynamic range clipping. Default: None
cmap (str) – Matplotlib colormap name. Default: ‘inferno’
Note
This function displays a plot and does not return values. Only processes the first batch element if batch_size > 1.
Example
>>> import torch
>>> from hybra.utils import ISACgram
>>> coeffs = torch.randn(1, 40, 250)
>>> fc = torch.linspace(100, 8000, 40)
>>> ISACgram(coeffs, fc=fc, L=16000, fs=16000, log_scale=True)
- hybra.utils.plot_response(g: ndarray, fs: int, scale: str = 'mel', plot_scale: bool = False, fc_min: float | None = None, fc_max: float | None = None, kernel_min: int | None = None, decoder: bool = False) -> None
Plot frequency responses and auditory scale visualization of filters.
Creates comprehensive visualization showing individual filter responses, total power spectral density, and optional auditory scale mapping.
- Parameters:
g (np.ndarray) – Filter kernels of shape (num_channels, kernel_size)
fs (int) – Sampling frequency in Hz for frequency axis scaling
scale (str) – Auditory scale name for scale plotting. Default: ‘mel’
plot_scale (bool) – Whether to plot the auditory scale mapping. Default: False
fc_min (float, optional) – Lower transition frequency for scale visualization. Default: None
fc_max (float, optional) – Upper transition frequency for scale visualization. Default: None
kernel_min (int, optional) – Minimum kernel size for annotations. Default: None
decoder (bool) – Whether filters are for synthesis (affects plot titles). Default: False
Note
This function displays plots and does not return values. Creates 2-3 subplots depending on plot_scale parameter.
Example
>>> import numpy as np
>>> from hybra.utils import plot_response
>>> filters = np.random.randn(40, 128)
>>> plot_response(filters, fs=16000, scale='mel', plot_scale=True)
- hybra.utils.response(g: ndarray, fs: int) -> ndarray
Compute frequency responses of filter kernels.
- Parameters:
g (np.ndarray) – Filter kernels of shape (num_channels, kernel_size)
fs (int) – Sampling frequency for frequency axis scaling
- Returns:
Magnitude-squared frequency responses of shape (2*num_channels, fs//2)
- Return type:
np.ndarray
Note
Computes responses for both analysis and conjugate filters.
Frame Analysis Functions
- hybra.utils.frequency_correlation(w: Tensor, d: int, Ls: int | None = None, diag_only: bool = False) -> Tensor
Computes the frequency correlation functions (vectorized version).
- Parameters:
w (torch.Tensor) – Impulse responses of shape (J, K)
d (int) – Decimation factor
Ls (int, optional) – FFT length. Default: nearest multiple of d ≥ 2K-1
diag_only (bool) – If True, only return the diagonal (i.e., the PSD). Default: False
- Returns:
G: Complex tensor of shape (d, Ls) with the frequency correlations
- Return type:
torch.Tensor
- hybra.utils.alias(w: Tensor, d: int, Ls: int | None = None, diag_only: bool = False) -> Tensor
Computes the norm of the aliasing terms.
- Parameters:
w (torch.Tensor) – Impulse responses of the filterbank as a 2-D tensor of shape (num_channels, sig_length)
d (int) – Decimation factor; must divide the filter length
- Returns:
A: Energy of the aliasing terms
- hybra.utils.can_tight(w: Tensor, d: int, Ls: int) -> Tensor
Computes the canonical tight filterbank of w (time domain) using the polyphase representation.
- Parameters:
w (torch.Tensor) – Impulse responses of the filterbank as a 2-D tensor of shape (num_channels, signal_length)
d (int) – Decimation factor; must divide signal_length
- Returns:
W: Canonical tight filterbank of w, of shape (num_channels, signal_length)
- Return type:
torch.Tensor
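For intuition, the canonical tight filterbank has a simple closed form in the special case d=1: each filter's spectrum is divided by the square root of the summed power spectrum. The plain-Python sketch below implements only that special case with a naive DFT; `can_tight_d1` and the other helpers are hypothetical, and the general d>1 case handled by the library through the polyphase representation is not covered.

```python
import cmath


def dft(x):
    n = len(x)
    return [
        sum(x[t] * cmath.exp(-2j * cmath.pi * k * t / n) for t in range(n))
        for k in range(n)
    ]


def idft(X):
    n = len(X)
    return [
        sum(X[k] * cmath.exp(2j * cmath.pi * k * t / n) for k in range(n)) / n
        for t in range(n)
    ]


def summed_psd(filters):
    """Sum over channels of |DFT|^2 at every frequency bin."""
    spectra = [dft(g) for g in filters]
    return [sum(abs(s[k]) ** 2 for s in spectra) for k in range(len(filters[0]))]


def can_tight_d1(filters):
    """Canonical tight filterbank for stride 1 (requires a strictly positive PSD)."""
    psd = summed_psd(filters)
    tight = []
    for g in filters:
        scaled = [v / psd[k] ** 0.5 for k, v in enumerate(dft(g))]
        # Real input filters give a real result up to rounding error.
        tight.append([v.real for v in idft(scaled)])
    return tight


# Two filters zero-padded to the signal length 8.
w = [
    [1.0, 0.5, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0],
    [0.25, -1.0, 2.0, 0.0, 0.0, 0.0, 0.0, 0.0],
]
psd_w = summed_psd(w)
psd_tight = summed_psd(can_tight_d1(w))
print(max(psd_w) / min(psd_w))               # condition number before
print(max(psd_tight) / min(psd_tight))       # close to 1 after tightening
```

Note that the tightened filters generally fill the whole signal length, which is why a separate fixed-support procedure is useful when short kernels are required.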
- hybra.utils.fir_tightener3000(w: Tensor, supp: int, d: int, eps: float = 1.01, Ls: int | None = None)
Iterative tightening procedure with fixed support for a given filterbank w.
- Parameters:
w (torch.Tensor) – Impulse responses of the filterbank as a 2-D tensor of shape (num_channels, signal_length)
supp (int) – Desired support of the resulting filterbank
d (int) – Decimation factor; must divide the filter length
eps (float) – Desired precision for the condition number. Default: 1.01
Ls (int, optional) – System length (if not already given by w). If set, the resulting filterbank is padded with zeros to length Ls. Default: None
- Returns:
Filterbank with condition number eps and support length supp. If length=supp then the resulting filterbank is the canonical tight filterbank of w.
Helper Functions
- hybra.utils.freqtoaud_mod(freq: float | int | Tensor, fc_low: float | int | Tensor, fc_high: float | int | Tensor, scale='erb', fs=None)
Modified auditory scale function with linear region below fc_crit.