API Reference

This page contains the complete API reference for all classes and functions in the HybrA-Filterbanks library.

Core Filterbanks

ISAC - Invertible and Stable Auditory Filterbank

class hybra.ISAC(kernel_size: int | None = 128, num_channels: int = 40, fc_max: float | int | None = None, stride: int | None = None, fs: int = None, L: int = None, supp_mult: float = 1, scale: str = 'mel', tighten=False, is_encoder_learnable=False, fit_decoder=False, is_decoder_learnable=False, verbose: bool = True)[source]¶

Bases: Module

ISAC (Invertible and Stable Auditory filterbank with Customizable kernels) filterbank.

ISAC filterbanks are invertible and stable, perceptually-motivated filterbanks specifically designed for machine learning integration. They provide perfect reconstruction properties with customizable kernel sizes and auditory-inspired frequency decomposition.

Parameters:

kernel_size (int) – Size of the filter kernels. Default: 128
num_channels (int) – Number of frequency channels. Default: 40
fc_max (float, optional) – Maximum frequency on the auditory scale in Hz. If None, uses fs//2. Default: None
stride (int, optional) – Stride of the filterbank. If None, uses 25% overlap. Default: None
fs (int) – Sampling frequency in Hz. Default: None (required)
L (int) – Signal length in samples. Default: None (required)
supp_mult (float) – Support multiplier for kernel sizing. Default: 1.0
scale (str) – Auditory scale type. One of {‘mel’, ‘erb’, ‘bark’, ‘log10’, ‘elelog’}. ‘elelog’ is adapted for elephant hearing. Default: ‘mel’
tighten (bool) – Whether to apply tightening for better frame bounds. Default: False
is_encoder_learnable (bool) – Whether encoder kernels are learnable parameters. Default: False
fit_decoder (bool) – Whether to compute approximate perfect reconstruction decoder. Default: False
is_decoder_learnable (bool) – Whether decoder kernels are learnable parameters. Default: False
verbose (bool) – Whether to print filterbank information during initialization. Default: True

Note

ISAC filterbanks provide invertible and stable transforms with perfect reconstruction. The filters have user-defined maximum temporal support and can serve as learnable convolutional kernels. The frame bounds can be controlled through the tighten parameter for numerical stability.

Example

>>> import torch
>>> from hybra import ISAC
>>> filterbank = ISAC(kernel_size=128, num_channels=40, fs=16000, L=16000)
>>> x = torch.randn(1, 16000)
>>> coeffs = filterbank(x)
>>> reconstructed = filterbank.decoder(coeffs)

ISACgram(x: Tensor, fmax: float | None = None, vmin: float | None = None, log_scale: bool = False) → None[source]¶

Plot time-frequency representation of the signal.

Parameters:

x (torch.Tensor) – Input signal to visualize
fmax (float, optional) – Maximum frequency to display in Hz. Default: None
vmin (float, optional) – Minimum value for dynamic range clipping. Default: None
log_scale (bool) – Whether to apply log scaling to coefficients. Default: False

Note

This method displays a plot and does not return values.

__init__(kernel_size: int | None = 128, num_channels: int = 40, fc_max: float | int | None = None, stride: int | None = None, fs: int = None, L: int = None, supp_mult: float = 1, scale: str = 'mel', tighten=False, is_encoder_learnable=False, fit_decoder=False, is_decoder_learnable=False, verbose: bool = True)

Initialize internal Module state, shared by both nn.Module and ScriptModule.

property condition_number: Tensor¶

Compute condition number of the analysis filterbank.

Returns: Condition number of the frame operator
Return type: torch.Tensor

Note

Lower condition numbers indicate better numerical stability. Values close to 1.0 indicate tight frames.

property condition_number_decoder: Tensor¶

Compute condition number of the synthesis filterbank.

Returns: Condition number of the decoder frame operator
Return type: torch.Tensor

Note

Lower condition numbers indicate better numerical stability for reconstruction.

decoder(x: Tensor) → Tensor[source]¶

Reconstruct signal from ISAC coefficients.

Parameters: x (torch.Tensor) – Filterbank coefficients of shape (batch_size, num_channels, num_frames)
Returns: Reconstructed signal of shape (batch_size, signal_length)
Return type: torch.Tensor

Note

Uses frame bounds normalization for approximate perfect reconstruction.

forward(x: Tensor) → Tensor[source]¶

Forward pass through the ISAC filterbank.

Parameters: x (torch.Tensor) – Input signal of shape (batch_size, signal_length) or (signal_length,)
Returns: Filterbank coefficients of shape (batch_size, num_channels, num_frames)
Return type: torch.Tensor

plot_decoder_response() → None

Plot frequency response of the synthesis (decoder) filters.

Note

This method displays a plot and does not return values.

plot_response() → None

Plot frequency response of the analysis filters.

Note

This method displays a plot and does not return values.

HybrA - Hybrid Auditory Filterbank

class hybra.HybrA(kernel_size: int = 128, learned_kernel_size: int = 23, num_channels: int = 40, stride: int | None = None, fc_max: float | int | None = None, fs: int = None, L: int = None, supp_mult: float = 1, scale: str = 'mel', tighten: bool = False, det_init: bool = False, verbose: bool = True)[source]¶

Bases: Module

Hybrid Auditory filterbank combining fixed and learnable components.

HybrA (Hybrid Auditory) filterbanks extend ISAC by combining fixed auditory-inspired filters with learnable filters through channel-wise convolution. This hybrid approach enables data-driven adaptation while maintaining perceptual auditory characteristics and frame-theoretic stability guarantees.

Parameters:

kernel_size (int) – Kernel size of the auditory filterbank. Default: 128
learned_kernel_size (int) – Kernel size of the learned filterbank. Default: 23
num_channels (int) – Number of frequency channels. Default: 40
stride (int, optional) – Stride of the auditory filterbank. If None, uses 25% overlap. Default: None
fc_max (float, optional) – Maximum frequency on the auditory scale in Hz. If None, uses fs//2. Default: None
fs (int) – Sampling frequency in Hz. Default: None (required)
L (int) – Signal length in samples. Default: None (required)
supp_mult (float) – Support multiplier for kernel sizing. Default: 1.0
scale (str) – Auditory scale type. One of {‘mel’, ‘erb’, ‘bark’, ‘log10’, ‘elelog’}. ‘elelog’ is adapted for elephant hearing. Default: ‘mel’
tighten (bool) – Whether to apply tightening to improve frame bounds. Default: False
det_init (bool) – Whether to initialize learned filters as diracs (True) or randomly (False). Default: False
verbose (bool) – Whether to print filterbank information during initialization. Default: True

Note

The hybrid construction h_m = g_m ⊛ ℓ_m combines ISAC auditory filters (g_m) with compact learnable filters (ℓ_m) through convolution. This maintains the perceptual benefits of auditory scales while enabling data-driven optimization and preserving perfect reconstruction properties.

Example

>>> import torch
>>> from hybra import HybrA
>>> filterbank = HybrA(kernel_size=128, num_channels=40, fs=16000, L=16000)
>>> x = torch.randn(1, 16000)
>>> coeffs = filterbank(x)
>>> reconstructed = filterbank.decoder(coeffs)
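
The hybrid construction h_m = g_m ⊛ ℓ_m can be illustrated without the library. The following is a minimal pure-Python sketch, not HybrA's implementation: the helper `convolve` and the toy filters are hypothetical, and plain linear convolution is assumed (the library may use a circular or FFT-based variant). It shows that a Dirac learned filter, as selected by det_init=True, leaves the auditory filter unchanged up to a delay and zero-padding.

```python
def convolve(g, ell):
    """Linear convolution of two 1-D filters given as lists of floats."""
    out = [0.0] * (len(g) + len(ell) - 1)
    for i, g_i in enumerate(g):
        for j, ell_j in enumerate(ell):
            out[i + j] += g_i * ell_j
    return out


g = [0.25, 0.5, 0.25]      # toy stand-in for one fixed auditory filter g_m
dirac = [0.0, 1.0, 0.0]    # learned filter initialised as a Dirac impulse
difference = [1.0, -1.0]   # toy stand-in for a trained learned filter

h_dirac = convolve(g, dirac)         # g_m delayed by one sample, zero-padded
h_trained = convolve(g, difference)  # a genuinely different hybrid filter

# Under the linear-convolution assumption, a 128-tap auditory filter and a
# 23-tap learned filter give a hybrid kernel with 128 + 23 - 1 = 150 taps.
hybrid_length = len(convolve([0.0] * 128, [0.0] * 23))
```

Starting from Dirac filters means training begins at the plain auditory filterbank and only departs from it as the learned filters are updated.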

ISACgram(x: Tensor, fmax: float | None = None) → None[source]¶

Plot time-frequency representation of the signal.

Parameters:

x (torch.Tensor) – Input signal to visualize
fmax (float, optional) – Maximum frequency to display in Hz. Default: None

Note

This method displays a plot and does not return values.

__init__(kernel_size: int = 128, learned_kernel_size: int = 23, num_channels: int = 40, stride: int | None = None, fc_max: float | int | None = None, fs: int = None, L: int = None, supp_mult: float = 1, scale: str = 'mel', tighten: bool = False, det_init: bool = False, verbose: bool = True)

Initialize internal Module state, shared by both nn.Module and ScriptModule.

property condition_number: Tensor | float¶

Compute condition number of the filterbank.

Parameters: learnable (bool) – If True, returns tensor for gradient computation. If False, returns scalar value. Default: False
Returns: Condition number of the frame operator
Return type: Union[torch.Tensor, float]

Note

Lower condition numbers indicate better numerical stability. Values close to 1.0 indicate tight frames.

decoder(x: Tensor) → Tensor[source]¶

Reconstruct signal from filterbank coefficients.

Parameters: x (torch.Tensor) – Filterbank coefficients of shape (batch_size, num_channels, num_frames)
Returns: Reconstructed signal of shape (batch_size, signal_length)
Return type: torch.Tensor

Note

Uses frame bounds normalization for approximate perfect reconstruction.

encoder(x: Tensor) → Tensor[source]¶

Encode signal using fixed hybrid kernels (no gradient computation).

Parameters: x (torch.Tensor) – Input signal of shape (batch_size, signal_length) or (signal_length,)
Returns: Filterbank coefficients of shape (batch_size, num_channels, num_frames)
Return type: torch.Tensor

Note

Use forward() method during training to enable gradient computation. This method uses pre-computed kernels for inference.

forward(x: Tensor) → Tensor[source]¶

Forward pass through the HybrA filterbank.

Parameters: x (torch.Tensor) – Input signal of shape (batch_size, signal_length) or (signal_length,)
Returns: Filterbank coefficients of shape (batch_size, num_channels, num_frames)
Return type: torch.Tensor

plot_decoder_response() → None

Plot frequency response of the synthesis (decoder) filters.

Note

This method displays a plot and does not return values.

plot_response() → None

Plot frequency response of the analysis filters.

Note

This method displays a plot and does not return values.

Spectrogram and Cepstral Variants

ISACSpec - ISAC Spectrogram

class hybra.ISACSpec(kernel_size: int | None = None, num_channels: int = 40, stride: int | None = None, fc_max: float | int | None = None, fmax: int | None = None, fs: int = None, L: int = None, supp_mult: float = 1, scale: str = 'mel', power: float = 2.0, avg_size: int = None, is_log=False, is_encoder_learnable=False, is_avg_learnable=False, verbose: bool = True)[source]¶

Bases: Module

ISAC spectrogram filterbank for time-frequency analysis.

ISACSpec combines ISAC (Invertible and Stable Auditory filterbank with Customizable kernels) with temporal averaging to produce spectrogram-like representations. The filterbank applies auditory-inspired filters followed by temporal smoothing for robust feature extraction.

Parameters:

kernel_size (int, optional) – Size of the filter kernels. If None, computed automatically. Default: None
num_channels (int) – Number of frequency channels. Default: 40
stride (int, optional) – Stride of the filterbank. If None, uses 25% overlap. Default: None
fc_max (float, optional) – Maximum frequency on the auditory scale in Hz. If None, uses fs//2. Default: None
fmax (float, optional) – Maximum frequency for output truncation in Hz. Default: None
fs (int) – Sampling frequency in Hz. Default: None (required)
L (int) – Signal length in samples. Default: None (required)
supp_mult (float) – Support multiplier for kernel sizing. Default: 1.0
scale (str) – Auditory scale type. One of {‘mel’, ‘erb’, ‘bark’, ‘log10’, ‘elelog’}. ‘elelog’ is adapted for elephant hearing. Default: ‘mel’
power (float) – Power applied to coefficients before averaging. Default: 2.0
avg_size (int, optional) – Size of the temporal averaging kernel. If None, computed automatically. Default: None
is_log (bool) – Whether to apply logarithm to the output. Default: False
is_encoder_learnable (bool) – Whether encoder kernels are learnable parameters. Default: False
is_avg_learnable (bool) – Whether averaging kernels are learnable parameters. Default: False
verbose (bool) – Whether to print filterbank information during initialization. Default: True

Note

The temporal averaging provides robustness to time variations while preserving spectral characteristics. The power parameter controls the nonlinearity applied before averaging.

Example

>>> import torch
>>> from hybra import ISACSpec
>>> spectrogram = ISACSpec(num_channels=40, fs=16000, L=16000, power=2.0)
>>> x = torch.randn(1, 16000)
>>> spec = spectrogram(x)
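
The "power, then temporal averaging" step described in the note can be sketched for a single channel as follows. This is an illustrative sketch only: the helper `power_and_average` is hypothetical, and a uniform, circular moving average of odd length is assumed, whereas the library's averaging kernel and boundary handling may differ.

```python
def power_and_average(coeffs, power=2.0, avg_size=3):
    """Raise |coeffs| to `power`, then smooth with a circular moving average.

    `avg_size` is assumed odd so that the averaging window is centred.
    """
    powered = [abs(c) ** power for c in coeffs]
    n = len(powered)
    half = avg_size // 2
    return [
        sum(powered[(t + s) % n] for s in range(-half, half + 1)) / avg_size
        for t in range(n)
    ]


# One channel of filterbank coefficients containing a single burst.
channel = [0.0, 0.0, 2.0, 0.0, 0.0, 0.0]
smoothed = power_and_average(channel, power=2.0, avg_size=3)

# Coefficients of equal magnitude but different phase give a flat output.
flat = power_and_average([1j, -1, 1, -1j], power=2.0, avg_size=3)
```

The burst energy is spread over neighbouring frames while its total is preserved, which is the sense in which averaging gives robustness to small time shifts.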

ISACgram(x: Tensor, fmax: float | None = None, vmin: float | None = None, log_scale: bool = False) → None[source]¶

Plot time-frequency spectrogram representation of the signal.

Parameters:

x (torch.Tensor) – Input signal to visualize
fmax (float, optional) – Maximum frequency to display in Hz. Default: None
vmin (float, optional) – Minimum value for dynamic range clipping. Default: None
log_scale (bool) – Whether to apply log scaling to coefficients. Default: False

Note

This method displays a plot and does not return values.

__init__(kernel_size: int | None = None, num_channels: int = 40, stride: int | None = None, fc_max: float | int | None = None, fmax: int | None = None, fs: int = None, L: int = None, supp_mult: float = 1, scale: str = 'mel', power: float = 2.0, avg_size: int = None, is_log=False, is_encoder_learnable=False, is_avg_learnable=False, verbose: bool = True)

Initialize internal Module state, shared by both nn.Module and ScriptModule.

forward(x: Tensor) → Tensor[source]¶

Forward pass through the ISACSpec filterbank.

Parameters: x (torch.Tensor) – Input signal of shape (batch_size, signal_length) or (signal_length,)
Returns: Spectrogram coefficients of shape (batch_size, num_channels, num_frames)
Return type: torch.Tensor

Note

The output is temporally averaged and optionally log-scaled for robustness.

plot_response() → None

Plot frequency response of the analysis filters.

Note

This method displays a plot and does not return values.

ISACCC - ISAC Cepstral Coefficients

class hybra.ISACCC(kernel_size: int | None = None, num_channels: int = 40, stride: int | None = None, num_cc: int = 13, fc_max: float | int | None = None, fmax: float | int | None = None, fs: int = 16000, L: int = 16000, supp_mult: float = 1, power: float = 2.0, scale: str = 'mel', is_log: bool = False, verbose: bool = True)[source]¶

Bases: Module

ISAC Cepstral Coefficients (ISACCC) extractor for speech features.

ISACCC computes cepstral coefficients from ISAC (Invertible and Stable Auditory filterbank with Customizable kernels) spectrograms using the Discrete Cosine Transform (DCT). This provides compact features suitable for speech recognition and audio classification tasks.

Parameters:

kernel_size (int, optional) – Size of the filter kernels. If None, computed automatically. Default: None
num_channels (int) – Number of frequency channels. Default: 40
stride (int, optional) – Stride of the filterbank. If None, uses 25% overlap. Default: None
num_cc (int) – Number of cepstral coefficients to extract. Default: 13
fc_max (float, optional) – Maximum frequency on the auditory scale in Hz. If None, uses fs//2. Default: None
fmax (float, optional) – Maximum frequency for ISACSpec computation in Hz. Default: None
fs (int) – Sampling frequency in Hz. Default: 16000
L (int) – Signal length in samples. Default: 16000
supp_mult (float) – Support multiplier for kernel sizing. Default: 1.0
power (float) – Power applied to ISACSpec coefficients. Default: 2.0
scale (str) – Auditory scale type. One of {‘mel’, ‘erb’, ‘bark’, ‘log10’, ‘elelog’}. ‘elelog’ is adapted for elephant hearing. Default: ‘mel’
is_log (bool) – Whether to apply log instead of dB conversion. Default: False
verbose (bool) – Whether to print filterbank information during initialization. Default: True

Raises:

ValueError – If num_cc > num_channels

Note

The DCT is applied with orthonormal basis functions for energy preservation. The number of cepstral coefficients should typically be much smaller than the number of frequency channels for dimensionality reduction.

Example

>>> import torch
>>> from hybra import ISACCC
>>> cc_extractor = ISACCC(num_channels=40, num_cc=13, fs=16000, L=16000)
>>> x = torch.randn(1, 16000)
>>> cepstral_coeffs = cc_extractor(x)
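
The orthonormal DCT mentioned in the note can be sketched for a single frame as below. This is a hand-written DCT-II with orthonormal scaling, shown only to illustrate energy preservation and truncation to num_cc coefficients; the helper `dct_ortho` and the 8-channel frame are hypothetical, and the library's own DCT routine is not shown here.

```python
import math


def dct_ortho(v):
    """Orthonormal DCT-II of a list of floats."""
    n = len(v)
    out = []
    for k in range(n):
        scale = math.sqrt(1.0 / n) if k == 0 else math.sqrt(2.0 / n)
        out.append(
            scale * sum(v[t] * math.cos(math.pi * (t + 0.5) * k / n) for t in range(n))
        )
    return out


# Hypothetical log-compressed spectrogram frame with 8 channels.
log_frame = [math.log(1.0 + c) for c in range(1, 9)]
cepstrum = dct_ortho(log_frame)

num_cc = 3
features = cepstrum[:num_cc]   # keep only the first num_cc coefficients

energy_in = sum(v * v for v in log_frame)
energy_out = sum(c * c for c in cepstrum)   # equal to energy_in up to rounding
```

Because the transform is orthonormal, truncating to the first few coefficients discards only the energy carried by the dropped ones.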

ISACgram(x: Tensor) → None[source]¶

Plot cepstral coefficients representation.

Parameters: x (torch.Tensor) – Input signal to visualize

Note

This method displays a plot of the cepstral coefficients and does not return values.

__init__(kernel_size: int | None = None, num_channels: int = 40, stride: int | None = None, num_cc: int = 13, fc_max: float | int | None = None, fmax: float | int | None = None, fs: int = 16000, L: int = 16000, supp_mult: float = 1, power: float = 2.0, scale: str = 'mel', is_log: bool = False, verbose: bool = True)

Initialize internal Module state, shared by both nn.Module and ScriptModule.

forward(x: Tensor) → Tensor[source]¶

Forward pass to compute ISAC cepstral coefficients.

Parameters: x (torch.Tensor) – Input signal of shape (batch_size, signal_length) or (signal_length,)
Returns: Cepstral coefficients of shape (batch_size, num_cc, num_frames)
Return type: torch.Tensor

Note

The process involves: ISAC spectrogram -> log/dB conversion -> DCT transform.

plot_response() → None

Plot frequency response of the underlying ISAC filters.

Note

This method displays a plot and does not return values.

Utility Functions

Frame Theory Functions

hybra.utils.frame_bounds(w: Tensor, d: int, Ls: int | None = None) → Tuple[Tensor, Tensor][source]¶

Compute frame bounds of a filterbank using polyphase representation.

Frame bounds characterize the numerical stability and invertibility of the filterbank transform. Tight frames (A ≈ B) provide optimal stability.

Parameters:

w (torch.Tensor) – Impulse responses of shape (num_channels, length)
d (int) – Decimation (stride) factor
Ls (int, optional) – Signal length. If None, computed automatically. Default: None

Returns:

Lower and upper frame bounds (A, B)

Return type:

Tuple[torch.Tensor, torch.Tensor]

Note

For d=1, reduces to computing min/max of power spectral density. For d>1, uses polyphase analysis to compute worst-case eigenvalues.

Example

>>> import torch
>>> from hybra.utils import frame_bounds
>>> w = torch.randn(40, 128)
>>> A, B = frame_bounds(w, d=4)
>>> condition_number = B / A
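
For the d=1 case described in the note, the frame bounds are the minimum and maximum of the power spectral density. The following pure-Python sketch illustrates that special case only; the helpers `dft` and `frame_bounds_d1` are hypothetical and use a naive DFT, and the polyphase computation for d>1 is not reproduced.

```python
import cmath
import math


def dft(x, n):
    """Zero-pad x to length n and return its DFT (naive O(n^2) version)."""
    padded = list(x) + [0.0] * (n - len(x))
    return [
        sum(padded[t] * cmath.exp(-2j * math.pi * k * t / n) for t in range(n))
        for k in range(n)
    ]


def frame_bounds_d1(filters, n):
    """Frame bounds (A, B) of an undecimated filterbank on n frequency bins.

    The power spectral density is the sum over channels of |DFT(w_k)|^2;
    A and B are its minimum and maximum.
    """
    psd = [0.0] * n
    for w in filters:
        for k, value in enumerate(dft(w, n)):
            psd[k] += abs(value) ** 2
    return min(psd), max(psd)


s = 1 / math.sqrt(2)
haar = [[s, s], [s, -s]]                 # lowpass/highpass pair with flat PSD
A, B = frame_bounds_d1(haar, n=16)
kappa_tight = B / A                      # close to 1: tight frame

skewed = [[s, s], [0.5 * s, -0.5 * s]]   # attenuated highpass breaks tightness
A2, B2 = frame_bounds_d1(skewed, n=16)
kappa_skewed = B2 / A2                   # PSD ranges from 0.5 to 2.0
```

The first pair has a flat power spectral density, so A = B; attenuating one channel makes the density uneven and raises the condition number.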

hybra.utils.condition_number(w: Tensor, d: int, Ls: int | None = None) → Tensor[source]¶

Compute condition number of a filterbank frame operator.

The condition number κ = B/A quantifies numerical stability, where A and B are the lower and upper frame bounds. Lower values indicate better stability.

Parameters:

w (torch.Tensor) – Impulse responses of shape (num_channels, signal_length)
d (int) – Decimation factor (stride)
Ls (int, optional) – Signal length. If None, computed automatically. Default: None

Returns:

Condition number κ = B/A

Return type:

torch.Tensor

Note

κ = 1 indicates a tight frame (optimal stability). κ >> 1 suggests potential numerical instability.

Example

>>> import torch
>>> from hybra.utils import condition_number
>>> w = torch.randn(40, 128)
>>> kappa = condition_number(w, d=4)
>>> print(f"Condition number: {kappa.item():.2f}")

Auditory Scale Conversions

hybra.utils.freqtoaud(freq: float | int | Tensor, scale: str = 'erb', fs: int | None = None) → Tensor[source]¶

Convert frequencies from Hz to auditory scale units.

Transforms linear frequency values to perceptually-motivated auditory scales that better reflect human frequency discrimination.

Parameters:

freq (Union[float, int, torch.Tensor]) – Frequency value(s) in Hz
scale (str) – Auditory scale type. One of {‘erb’, ‘mel’, ‘bark’, ‘log10’, ‘elelog’}. Default: ‘erb’
fs (int, optional) – Sampling frequency (required for ‘elelog’ scale). Default: None

Returns:

Corresponding auditory scale units

Return type:

torch.Tensor

Raises:

ValueError – If unsupported scale is specified or fs is missing for ‘elelog’

Note

ERB: Equivalent Rectangular Bandwidth (Glasberg & Moore)
MEL: Mel scale (perceptually uniform pitch)
Bark: Bark scale (critical band rate)
elelog: Logarithmic scale adapted for elephant hearing

Example

>>> import torch
>>> from hybra.utils import freqtoaud
>>> freq_hz = torch.tensor([100, 1000, 8000])
>>> mel_units = freqtoaud(freq_hz, scale='mel')

hybra.utils.audtofreq(aud: float | int | Tensor, scale: str = 'erb', fs: int | None = None) → Tensor[source]¶

Convert auditory scale units back to frequencies in Hz.

Parameters:

aud (Union[float, int, torch.Tensor]) – Auditory scale values
scale (str) – Auditory scale type. One of {‘erb’, ‘mel’, ‘bark’, ‘log10’, ‘elelog’}. Default: ‘erb’
fs (int, optional) – Sampling frequency (required for ‘elelog’ scale). Default: None

Returns:

Corresponding frequencies in Hz

Return type:

torch.Tensor

Example

>>> import torch
>>> from hybra.utils import audtofreq
>>> mel_units = torch.tensor([100, 1000, 2000])
>>> freq_hz = audtofreq(mel_units, scale='mel')
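
The two conversions are inverses of each other. The sketch below shows the round trip for the mel scale, assuming the widely used formula mel = 2595 · log10(1 + f / 700); this is an assumption for illustration, the helpers `hz_to_mel` and `mel_to_hz` are hypothetical, and the constants or normalization used by freqtoaud and audtofreq may differ.

```python
import math


def hz_to_mel(freq_hz):
    """Hz to mel, using the common 2595 * log10(1 + f / 700) convention."""
    return 2595.0 * math.log10(1.0 + freq_hz / 700.0)


def mel_to_hz(mel):
    """Inverse of hz_to_mel."""
    return 700.0 * (10.0 ** (mel / 2595.0) - 1.0)


freqs_hz = [100.0, 1000.0, 8000.0]
mels = [hz_to_mel(f) for f in freqs_hz]
round_trip = [mel_to_hz(m) for m in mels]   # recovers freqs_hz up to rounding
```

Under this convention 1000 Hz maps to roughly 1000 mel, and the scale compresses higher frequencies progressively.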

hybra.utils.audspace(fmin: float | int | Tensor, fmax: float | int | Tensor, num_channels: int, scale: str = 'erb')[source]¶

Computes a vector of values equidistantly spaced on the selected auditory scale.

Parameters:

fmin (float) – Minimum frequency in Hz.
fmax (float) – Maximum frequency in Hz.
num_channels (int) – Number of points in the output vector.
scale (str) – Auditory scale. Default: ‘erb’

Returns:

y (ndarray): Array of frequencies equidistantly scaled on the auditory scale.

Return type:

tuple
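
Equidistant spacing on an auditory scale amounts to converting the endpoints to the scale, spacing linearly there, and converting back. The sketch below assumes the ERB-rate formula of Glasberg & Moore, 21.4 · log10(1 + 0.00437 f); the helper names are hypothetical, the library's constants may differ slightly, and only the frequency vector is computed.

```python
import math


def hz_to_erb(freq_hz):
    """ERB-rate scale (Glasberg & Moore form, assumed here)."""
    return 21.4 * math.log10(1.0 + 0.00437 * freq_hz)


def erb_to_hz(erb):
    """Inverse of hz_to_erb."""
    return (10.0 ** (erb / 21.4) - 1.0) / 0.00437


def audspace_sketch(fmin, fmax, num_channels):
    """Frequencies in Hz that are equally spaced on the ERB-rate scale.

    Assumes num_channels >= 2.
    """
    lo, hi = hz_to_erb(fmin), hz_to_erb(fmax)
    step = (hi - lo) / (num_channels - 1)
    return [erb_to_hz(lo + i * step) for i in range(num_channels)]


centers = audspace_sketch(100.0, 8000.0, num_channels=5)
gaps_hz = [b - a for a, b in zip(centers, centers[1:])]   # widen with frequency
```

The points are evenly spaced in ERB units, so their spacing in Hz grows toward high frequencies.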

hybra.utils.audspace_mod(fc_low: float | int | Tensor, fc_high: float | int | Tensor, fs: int, num_channels: int, scale: str = 'erb')[source]¶

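Generate num_channels frequency samples that are equidistant on the modified auditory scale.

Parameters:

fc_low (float) – Lower transition frequency in Hz.
fc_high (float) – Upper transition frequency in Hz.
fs (int) – Sampling rate in Hz.
num_channels (int) – Number of filters/channels.
scale (str) – Auditory scale. Default: ‘erb’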

Returns:

Frequency values in Hz and in the auditory scale.

Return type:

ndarray

Filterbank Construction

hybra.utils.audfilters(kernel_size: int | None = None, num_channels: int = 96, fc_max: float | int | None = None, fs: int = None, L: int = None, supp_mult: float = 1, scale: str = 'mel') → Tuple[Tensor, int, Tensor, int | float, int | float, int, int, int, Tensor][source]¶

Generate auditory-inspired FIR filterbank kernels.

Creates a bank of bandpass filters with center frequencies distributed according to perceptual auditory scales (mel, erb, bark, etc.). Filters are designed with variable bandwidths matching critical bands of human auditory perception.

Parameters:

kernel_size (int, optional) – Maximum filter kernel size. If None, computed automatically. Default: None
num_channels (int) – Number of frequency channels. Default: 96
fc_max (float, optional) – Maximum center frequency in Hz. If None, uses fs//2. Default: None
fs (int) – Sampling frequency in Hz. Default: None (required)
L (int) – Signal length in samples. If None, uses fs. Default: None
supp_mult (float) – Support multiplier for kernel sizing. Default: 1.0
scale (str) – Auditory scale. One of {‘mel’, ‘erb’, ‘bark’, ‘log10’, ‘elelog’}. Default: ‘mel’

Returns:

Tuple containing:

kernels (torch.Tensor): Filter kernels of shape (num_channels, kernel_size)
d (int): Recommended stride for 25% overlap
fc (torch.Tensor): Center frequencies in Hz
fc_min (Union[int, float]): Minimum center frequency
fc_max (Union[int, float]): Maximum center frequency
kernel_min (int): Minimum kernel size used
kernel_size (int): Maximum kernel size used
Ls (int): Adjusted signal length
tsupp (torch.Tensor): Time support for each filter

Return type:

Tuple[Tensor, int, Tensor, int | float, int | float, int, int, int, Tensor]

Raises:

ValueError – If parameters are invalid (negative values, unsupported scale, etc.)

Note

The filterbank construction follows auditory modeling principles where:
- Low frequencies use longer filters (better frequency resolution)
- High frequencies use shorter filters (better time resolution)
- Bandwidth scales according to critical band theory

Example

>>> from hybra.utils import audfilters
>>> kernels, stride, fc, _, _, _, _, Ls, _ = audfilters(
...     kernel_size=128, num_channels=40, fs=16000, scale='mel'
... )
>>> print(f"Generated {kernels.shape[0]} filters with stride {stride}")

hybra.utils.fctobw(fc: float | int | Tensor, scale='erb')[source]¶

Computes the critical bandwidth of a filter at a given center frequency.

Parameters:

fc (float or ndarray) – Center frequency in Hz. Must be non-negative.
scale (str) – Auditory scale. Supported values are:
- ‘erb’: Equivalent Rectangular Bandwidth (default)
- ‘bark’: Bark scale
- ‘mel’: Mel scale
- ‘log10’: Logarithmic scale

Returns:

Critical bandwidth at each center frequency.

Return type:

ndarray or float

hybra.utils.bwtofc(bw: float | int | Tensor, scale='erb')[source]¶

Computes the center frequency corresponding to a given critical bandwidth.

Parameters:

bw (float or ndarray) – Critical bandwidth. Must be non-negative.
scale (str) – Auditory scale. Supported values are:
- ‘erb’: Equivalent Rectangular Bandwidth
- ‘bark’: Bark scale
- ‘mel’: Mel scale
- ‘log10’: Logarithmic scale

Returns:

Center frequency corresponding to the given bandwidth.

Return type:

ndarray or float
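
For the ‘erb’ scale, the relation between center frequency and critical bandwidth can be sketched with the Glasberg & Moore formula ERB(f) = 24.7 · (4.37 f / 1000 + 1), with f in Hz. This is an illustrative assumption about the constants; the helper names are hypothetical and the values returned by fctobw and bwtofc may differ.

```python
def erb_bandwidth(fc_hz):
    """Equivalent rectangular bandwidth in Hz at center frequency fc_hz."""
    return 24.7 * (4.37 * fc_hz / 1000.0 + 1.0)


def center_from_erb_bandwidth(bw_hz):
    """Inverse of erb_bandwidth: center frequency in Hz for a given bandwidth."""
    return (bw_hz / 24.7 - 1.0) * 1000.0 / 4.37


bw_at_1k = erb_bandwidth(1000.0)                  # about 132.6 Hz
fc_back = center_from_erb_bandwidth(bw_at_1k)     # about 1000 Hz
```

Bandwidth grows linearly with center frequency under this model, which is why higher channels can use shorter filters.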

hybra.utils.firwin(kernel_size: int, padto: int = None)[source]¶

Generate an FIR window.

Parameters:

kernel_size (int) – Length of the window.
padto (int, optional) – Length to which the window should be padded. Default: None

Returns:

g: FIR window.

Return type:

ndarray

hybra.utils.modulate(g: Tensor, fc: float | int | Tensor, fs: int)[source]¶

Modulate filters to the given center frequencies.

Parameters:

g (list of torch.Tensor) – Filters.
fc (list) – Center frequencies.
fs (int) – Sampling rate.

Returns:

g_mod: Modulated filters.

Return type:

list of torch.Tensor
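
Modulation shifts a lowpass prototype to a center frequency by multiplying it with a complex exponential. The sketch below assumes the standard form g[n] · exp(2πi · fc · n / fs); the helpers `modulate_sketch` and `dft_magnitudes` are hypothetical, and the library's version may additionally center or normalize the filters.

```python
import cmath
import math


def modulate_sketch(g, fc, fs):
    """Shift prototype g to center frequency fc (Hz) at sampling rate fs (Hz)."""
    return [g[n] * cmath.exp(2j * math.pi * fc * n / fs) for n in range(len(g))]


def dft_magnitudes(x):
    """Magnitudes of the naive DFT of x."""
    n = len(x)
    return [
        abs(sum(x[t] * cmath.exp(-2j * math.pi * k * t / n) for t in range(n)))
        for k in range(n)
    ]


fs = 64
n = 64   # with n == fs, DFT bin k corresponds to k Hz
window = [0.5 - 0.5 * math.cos(2 * math.pi * t / n) for t in range(n)]  # Hann prototype

original_mags = dft_magnitudes(window)
modulated_mags = dft_magnitudes(modulate_sketch(window, fc=8, fs=fs))

original_peak = max(range(n), key=lambda k: original_mags[k])    # bin 0
modulated_peak = max(range(n), key=lambda k: modulated_mags[k])  # bin 8
```

The envelope of the filter is unchanged; only the location of its passband moves.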

Convolution Operations

hybra.utils.circ_conv(x: Tensor, kernels: Tensor, d: int = 1) → Tensor[source]¶

Circular convolution with optional downsampling.

Performs efficient circular convolution using FFT, followed by downsampling. The kernels are automatically centered for proper phase alignment.

Parameters:

x (torch.Tensor) – Input signal of shape (…, signal_length)
kernels (torch.Tensor) – Filter kernels of shape (num_channels, 1, kernel_length) or (num_channels, kernel_length)
d (int) – Downsampling factor (stride). Default: 1

Returns:

Convolved and downsampled output of shape (…, num_channels, output_length)

Return type:

torch.Tensor

Note

Uses circular convolution which assumes periodic boundary conditions. Kernels are automatically zero-padded and centered.

Example

>>> import torch
>>> from hybra.utils import circ_conv
>>> x = torch.randn(1, 1000)
>>> kernels = torch.randn(40, 128)
>>> y = circ_conv(x, kernels, d=4)
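
Circular convolution with downsampling can be written either as a product of spectra or as a wrap-around sum. The following pure-Python sketch compares the two for one kernel; the helper names are hypothetical, a naive DFT stands in for the FFT, and the kernel centering performed by circ_conv is deliberately omitted.

```python
import cmath
import math


def dft(x, inverse=False):
    """Naive O(n^2) DFT; fine for a sketch, use an FFT in practice."""
    n = len(x)
    sign = 1 if inverse else -1
    out = [
        sum(x[t] * cmath.exp(sign * 2j * math.pi * k * t / n) for t in range(n))
        for k in range(n)
    ]
    return [v / n for v in out] if inverse else out


def circ_conv_fourier(x, kernel, d=1):
    """Multiply spectra, transform back, then keep every d-th sample."""
    n = len(x)
    padded = list(kernel) + [0.0] * (n - len(kernel))   # zero-pad to signal length
    product = [a * b for a, b in zip(dft(x), dft(padded))]
    return [v.real for v in dft(product, inverse=True)][::d]


def circ_conv_direct(x, kernel, d=1):
    """The same operation as a sum with periodic (wrap-around) indexing."""
    n = len(x)
    full = [
        sum(kernel[j] * x[(t - j) % n] for j in range(len(kernel)))
        for t in range(n)
    ]
    return full[::d]


x = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0]
kernel = [1.0, -1.0, 0.5]
y_fourier = circ_conv_fourier(x, kernel, d=2)
y_direct = circ_conv_direct(x, kernel, d=2)
```

The first output sample depends on the last samples of x, which is the periodic boundary condition mentioned in the note.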

hybra.utils.circ_conv_transpose(y: Tensor, kernels: Tensor, d: int = 1) → Tensor[source]¶

Transpose (adjoint) of circular convolution with upsampling.

Implements the adjoint operation of circ_conv for signal reconstruction. Used in synthesis/decoder operations of filterbanks.

Parameters:

y (torch.Tensor) – Input coefficients of shape (…, num_channels, num_frames)
kernels (torch.Tensor) – Filter kernels of shape (num_channels, 1, kernel_length) or (num_channels, kernel_length)
d (int) – Upsampling factor (stride). Default: 1

Returns:

Reconstructed signal of shape (…, 1, signal_length)

Return type:

torch.Tensor

Note

This is the mathematical adjoint, not the true inverse. For perfect reconstruction, appropriate dual frame filters should be used.

Example

>>> import torch
>>> from hybra.utils import circ_conv_transpose
>>> coeffs = torch.randn(1, 40, 250)
>>> kernels = torch.randn(40, 128)
>>> x_recon = circ_conv_transpose(coeffs, kernels, d=4)
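
The adjoint property can be checked numerically: for an analysis operator A and its adjoint A*, the inner products ⟨Ax, y⟩ and ⟨x, A*y⟩ must agree. The sketch below does this in pure Python for real kernels; the helpers `analysis` and `synthesis_adjoint` are hypothetical, use un-centered kernels, and assume the signal length is a multiple of d.

```python
import random


def analysis(x, kernels, d):
    """Circular convolution with each kernel, keeping every d-th sample."""
    n = len(x)
    return [
        [sum(k[j] * x[(i * d - j) % n] for j in range(len(k))) for i in range(n // d)]
        for k in kernels
    ]


def synthesis_adjoint(y, kernels, d, n):
    """Adjoint of `analysis`: scatter each coefficient back through its kernel."""
    out = [0.0] * n
    for k, coeffs in zip(kernels, y):
        for i, c in enumerate(coeffs):
            for j, k_j in enumerate(k):
                out[(i * d - j) % n] += c * k_j
    return out


rng = random.Random(0)
n, d = 12, 3
kernels = [[rng.uniform(-1, 1) for _ in range(4)] for _ in range(2)]
x = [rng.uniform(-1, 1) for _ in range(n)]
y = [[rng.uniform(-1, 1) for _ in range(n // d)] for _ in kernels]

Ax = analysis(x, kernels, d)
Aty = synthesis_adjoint(y, kernels, d, n)

lhs = sum(a * b for row_a, row_b in zip(Ax, y) for a, b in zip(row_a, row_b))
rhs = sum(a * b for a, b in zip(x, Aty))   # equals lhs up to rounding
```

Agreement of the two inner products confirms the adjoint relation; it says nothing about A* inverting A, which requires dual filters as the note explains.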

Visualization Functions

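ISACgram(c: Tensor, fc: Tensor | None = None, L: int | None = None, fs: int | None = None, fmax: float | None = None, log_scale: bool = False, vmin: float | None = None, cmap: str = 'inferno') → None

Plot time-frequency representation of filterbank coefficients.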

Creates a spectrogram-like visualization with frequency on y-axis and time on x-axis. Supports logarithmic scaling and frequency range limitation for better visualization.

Parameters:

c (torch.Tensor) – Filterbank coefficients of shape (batch_size, num_channels, num_frames)
fc (torch.Tensor, optional) – Center frequencies in Hz for y-axis labeling. Default: None
L (int, optional) – Original signal length for time axis scaling. Default: None
fs (int, optional) – Sampling frequency for time axis scaling. Default: None
fmax (float, optional) – Maximum frequency to display in Hz. Default: None
log_scale (bool) – Whether to apply log10 scaling to coefficients. Default: False
vmin (float, optional) – Minimum value for dynamic range clipping. Default: None
cmap (str) – Matplotlib colormap name. Default: ‘inferno’

Note

This function displays a plot and does not return values. Only processes the first batch element if batch_size > 1.

Example

>>> coeffs = torch.randn(1, 40, 250)
>>> fc = torch.linspace(100, 8000, 40)
>>> ISACgram(coeffs, fc=fc, L=16000, fs=16000, log_scale=True)

hybra.utils.plot_response(g: ndarray, fs: int, scale: str = 'mel', plot_scale: bool = False, fc_min: float | None = None, fc_max: float | None = None, kernel_min: int | None = None, decoder: bool = False) → None[source]¶

Plot frequency responses and auditory scale visualization of filters.

Creates comprehensive visualization showing individual filter responses, total power spectral density, and optional auditory scale mapping.

Parameters:

g (np.ndarray) – Filter kernels of shape (num_channels, kernel_size)
fs (int) – Sampling frequency in Hz for frequency axis scaling
scale (str) – Auditory scale name for scale plotting. Default: ‘mel’
plot_scale (bool) – Whether to plot the auditory scale mapping. Default: False
fc_min (float, optional) – Lower transition frequency for scale visualization. Default: None
fc_max (float, optional) – Upper transition frequency for scale visualization. Default: None
kernel_min (int, optional) – Minimum kernel size for annotations. Default: None
decoder (bool) – Whether filters are for synthesis (affects plot titles). Default: False

Note

This function displays plots and does not return values. Creates 2-3 subplots depending on plot_scale parameter.

Example

>>> import numpy as np
>>> from hybra.utils import plot_response
>>> filters = np.random.randn(40, 128)
>>> plot_response(filters, fs=16000, scale='mel', plot_scale=True)

hybra.utils.response(g: ndarray, fs: int) → ndarray[source]¶

Compute frequency responses of filter kernels.

Parameters:

g (np.ndarray) – Filter kernels of shape (num_channels, kernel_size)
fs (int) – Sampling frequency for frequency axis scaling

Returns:

Magnitude-squared frequency responses of shape (2*num_channels, fs//2)

Return type:

np.ndarray

Note

Computes responses for both analysis and conjugate filters.

Frame Analysis Functions

hybra.utils.frequency_correlation(w: Tensor, d: int, Ls: int | None = None, diag_only: bool = False) → Tensor[source]¶

Compute the frequency correlation functions (vectorized version).

Parameters:

w (torch.Tensor) – Impulse responses of shape (J, K)
d (int) – Decimation factor
Ls (int, optional) – FFT length. Default: nearest multiple of d ≥ 2K-1
diag_only (bool) – If True, only return the diagonal (i.e., the PSD). Default: False

Returns:

G: Complex tensor of shape (d, Ls) with the frequency correlations

Return type:

torch.Tensor

hybra.utils.alias(w: Tensor, d: int, Ls: int | None = None, diag_only: bool = False) → Tensor[source]¶

Compute the norm of the aliasing terms.

Parameters:

w (torch.Tensor) – Impulse responses of the filterbank, of shape (num_channels, sig_length)
d (int) – Decimation factor. Must divide the filter length.

Returns:

A: Energy of the aliasing terms

Return type:

torch.Tensor

hybra.utils.can_tight(w: Tensor, d: int, Ls: int) → Tensor[source]¶

Compute the canonical tight filterbank of w (time domain) using the polyphase representation.

Parameters:

w (torch.Tensor) – Impulse responses of the filterbank, of shape (num_channels, signal_length)
d (int) – Decimation factor. Must divide signal_length.

Returns:

W: Canonical tight filterbank of w, of shape (num_channels, signal_length)

Return type:

torch.Tensor
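
In the undecimated case (d=1) the canonical tight filterbank has a simple closed form: each filter's frequency response is divided by the square root of the power spectral density. The sketch below illustrates only that special case in pure Python; the helpers `dft` and `can_tight_d1` are hypothetical, a strictly positive power spectral density is assumed, and the polyphase computation that can_tight uses for general d is not reproduced.

```python
import cmath
import math


def dft(x, inverse=False):
    """Naive O(n^2) DFT of a list of numbers."""
    n = len(x)
    sign = 1 if inverse else -1
    out = [
        sum(x[t] * cmath.exp(sign * 2j * math.pi * k * t / n) for t in range(n))
        for k in range(n)
    ]
    return [v / n for v in out] if inverse else out


def psd_of(filters, n):
    """Sum over channels of |DFT(w_k)|^2 on n bins (filters zero-padded to n)."""
    spectra = [dft(list(w) + [0.0] * (n - len(w))) for w in filters]
    return [sum(abs(s[k]) ** 2 for s in spectra) for k in range(n)]


def can_tight_d1(filters, n):
    """Canonical tight filterbank for d=1: divide each spectrum by sqrt(PSD)."""
    psd = psd_of(filters, n)
    tight = []
    for w in filters:
        spectrum = dft(list(w) + [0.0] * (n - len(w)))
        scaled = [spectrum[k] / math.sqrt(psd[k]) for k in range(n)]
        # Real input filters give a conjugate-symmetric spectrum, so the
        # inverse transform is real up to rounding.
        tight.append([v.real for v in dft(scaled, inverse=True)])
    return tight


s = 1 / math.sqrt(2)
filters = [[s, s], [0.5 * s, -0.5 * s]]     # not tight: PSD varies with frequency
tight_filters = can_tight_d1(filters, n=8)
tight_psd = psd_of(tight_filters, n=8)      # flat, equal to 1 at every bin
```

The tightened filters generally fill the whole length n, which is why a separate fixed-support procedure is needed when short kernels must be kept.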

hybra.utils.fir_tightener3000(w: Tensor, supp: int, d: int, eps: float = 1.01, Ls: int | None = None)[source]¶

Iterative tightening procedure with fixed support for a given filterbank w.

Parameters:

w (torch.Tensor) – Impulse responses of the filterbank, of shape (num_channels, signal_length)
supp (int) – Desired support of the resulting filterbank
d (int) – Decimation factor. Must divide the filter length.
eps (float) – Desired precision for the condition number. Default: 1.01
Ls (int, optional) – System length (if not already given by w). If set, the resulting filterbank is padded with zeros to length Ls. Default: None

Returns:

Filterbank with condition number eps and support length supp. If length=supp, the resulting filterbank is the canonical tight filterbank of w.

Helper Functions

hybra.utils.upsample(x: Tensor, d: int) → Tensor[source]¶

hybra.utils.freqtoaud_mod(freq: float | int | Tensor, fc_low: float | int | Tensor, fc_high: float | int | Tensor, scale='erb', fs=None)[source]¶

Modified auditory scale function with linear region below fc_crit.

Parameters:

freq (ndarray) – Frequency values in Hz.
fc_low (float) – Lower transition frequency in Hz.
fc_high (float) – Upper transition frequency in Hz.

Returns:

Values on the modified auditory scale.

Return type:

ndarray

hybra.utils.audtofreq_mod(aud: float | int | Tensor, fc_low: float | int | Tensor, fc_high: float | int | Tensor, scale='erb', fs=None)[source]¶

Inverse of freqtoaud_mod to map auditory scale back to frequency.

Parameters:

aud (ndarray) – Auditory scale values.
fc_low (float) – Lower transition frequency in Hz.
fc_high (float) – Upper transition frequency in Hz.

Returns:

Frequency values in Hz

Return type:

ndarray
