hybra

class hybra.HybrA(kernel_size: int = 128, learned_kernel_size: int = 23, num_channels: int = 40, stride: int = None, fc_max: float | int = None, fs: int = 16000, L: int = 16000, bw_multiplier: float = 1, scale: str = 'erb', tighten: bool = True, det_init: bool = False)
decoder(x_real: Tensor, x_imag: Tensor) → Tensor

Forward pass of the dual HybridFilterbank.

Parameters:

x_real, x_imag (torch.Tensor) - real and imaginary parts of the coefficients, each of shape (batch_size, num_channels, signal_length//hop_length)

Returns:

x (torch.Tensor) - output tensor of shape (batch_size, signal_length)

encoder(x: Tensor)

For learning, use the forward method instead.

forward(x: Tensor) → Tensor

Forward pass of the HybridFilterbank.

Parameters:

x (torch.Tensor) - input tensor of shape (batch_size, 1, signal_length)

Returns:

x (torch.Tensor) - output tensor of shape (batch_size, num_channels, signal_length//hop_length)
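
Example (a minimal usage sketch based on the signatures above; that forward returns a complex-valued coefficient tensor whose real and imaginary parts can be passed to decoder is an assumption, not something this reference guarantees):

>>> import torch
>>> from hybra import HybrA
>>> fb = HybrA(num_channels=40, fs=16000, L=16000)
>>> x = torch.randn(1, 1, 16000)                    # (batch_size, 1, signal_length)
>>> coeffs = fb(x)                                  # (batch_size, num_channels, signal_length//hop_length)
>>> x_hat = fb.decoder(coeffs.real, coeffs.imag)    # (batch_size, signal_length), assumes complex coeffs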

class hybra.ISAC(kernel_size: int | None = 128, num_channels: int = 40, fc_max: float | int | None = None, stride: int = None, fs: int = 16000, L: int = 16000, supp_mult: float = 1, scale: str = 'erb', tighten=False, is_encoder_learnable=False, use_decoder=False, is_decoder_learnable=False)
decoder(x_real: Tensor, x_imag: Tensor) → Tensor

Filterbank synthesis.

Parameters:

x_real, x_imag (torch.Tensor) - real and imaginary parts of the coefficients, each of shape (batch_size, num_channels, signal_length//hop_length)

Returns:

x (torch.Tensor) - output tensor of shape (batch_size, signal_length)

forward(x)

Define the computation performed at every call.

Should be overridden by all subclasses.

Note

Although the recipe for forward pass needs to be defined within this function, one should call the Module instance afterwards instead of this since the former takes care of running the registered hooks while the latter silently ignores them.
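
Example (a minimal sketch; the input shape and the complex-valued output are assumed by analogy with HybrA above, since the forward docstring here is the generic nn.Module one, and the decoder is presumably only available with use_decoder=True):

>>> import torch
>>> from hybra import ISAC
>>> fb = ISAC(num_channels=40, fs=16000, L=16000, use_decoder=True)
>>> x = torch.randn(1, 1, 16000)                    # assumed (batch_size, 1, signal_length)
>>> coeffs = fb(x)                                  # filterbank analysis coefficients
>>> x_hat = fb.decoder(coeffs.real, coeffs.imag)    # (batch_size, signal_length)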

class hybra.ISACMelSpectrogram(kernel_size: int | None = None, num_channels: int = 40, fc_max: float | int | None = None, stride: int = None, fs: int = 16000, L: int = 16000, bw_multiplier: float = 1, scale: str = 'erb', is_encoder_learnable=False, is_averaging_kernel_learnable=False, is_log=False)
forward(x: Tensor) → Tensor

Define the computation performed at every call.

Should be overridden by all subclasses.

Note

Although the recipe for forward pass needs to be defined within this function, one should call the Module instance afterwards instead of this since the former takes care of running the registered hooks while the latter silently ignores them.
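
Example (a minimal sketch; the input shape is assumed by analogy with HybrA above, and the exact output shape of the mel-like coefficients is not specified in this reference):

>>> import torch
>>> from hybra import ISACMelSpectrogram
>>> mel = ISACMelSpectrogram(num_channels=40, fs=16000, L=16000, is_log=True)
>>> x = torch.randn(1, 1, 16000)
>>> spec = mel(x)    # one coefficient channel per filter, optionally log-scaled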

hybra.utils

hybra.utils.ISACgram(coefficients, fc, L, fs, fc_max=None, log_scale=False, vmin=None, cmap='inferno')

Plot the ISAC coefficients with optional log scaling and colorbar.

Parameters:
  • coefficients (numpy.ndarray or torch.Tensor) – Filterbank coefficients.

  • fc (numpy.ndarray) – Center frequencies.

  • L (int) – Signal length.

  • fs (int) – Sampling rate.

  • fc_max (float or None) – Max frequency to display.

  • log_scale (bool) – Apply log scaling to coefficients.

  • vmin (float or None) – Minimum value for dynamic range clipping.

  • cmap (str) – Matplotlib colormap name.
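
Example (a minimal sketch with hypothetical inputs; in practice the coefficients would come from one of the filterbanks above and fc from audfilters or from the filterbank object itself, and whether a batch dimension is expected is not specified here):

>>> import numpy as np, torch
>>> from hybra.utils import ISACgram
>>> coeffs = torch.randn(40, 100).abs()    # hypothetical coefficient matrix (num_channels x frames)
>>> fc = np.linspace(50, 8000, 40)         # hypothetical center frequencies in Hz
>>> ISACgram(coeffs, fc, L=16000, fs=16000, log_scale=True)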

hybra.utils.alias(w: Tensor, d: int, Ls: int | None = None, diag_only: bool = False) → Tensor

Computes the norm of the aliasing terms.

Parameters:
  • w (torch.Tensor) – Impulse responses of the filterbank as a 2-D tensor of shape [num_channels, sig_length].

  • d (int) – Decimation factor; must divide the filter length.

Returns:

A: Energy of the aliasing terms.
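
Example (a minimal sketch; the filterbank and downsampling factor are taken from audfilters, documented below, and it is assumed that kernels of length kernel_size together with Ls=L are accepted here):

>>> from hybra.utils import audfilters, alias
>>> kernels, d, fc, fc_min, fc_max, kernel_min, kernel_size, L = audfilters(fs=16000, L=16000)
>>> A = alias(kernels, d, Ls=L)    # energy of the aliasing terms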

hybra.utils.audfilters(kernel_size: int | None = None, num_channels: int = 96, fc_max: float | int | None = None, fs: int = 16000, L: int = 16000, supp_mult: float = 1, scale: str = 'mel') → tuple[Tensor, int, int, int | float, int | float, int, int, int]

Generate FIR filter kernels with length kernel_size equidistantly spaced on auditory frequency scales.

Parameters:
  • kernel_size (int) – Size of the filter kernels (equals maximum window length).

  • num_channels (int) – Number of channels.

  • fc_max (int) – Maximum frequency (in Hz) that should lie on the aud scale.

  • fs (int) – Sampling rate.

  • L (int) – Signal length.

  • supp_mult (float) – Support multiplier for the filter kernels.

  • scale (str) – Auditory scale.

Returns:

kernels (torch.Tensor): Generated kernels.
d (int): Downsampling rates.
fc (list): Center frequencies.
fc_min (int, float): First transition frequency.
fc_max (int, float): Second transition frequency.
kernel_min (int): Minimum kernel size.
kernel_size (int): Maximum kernel size.
L (int): Admissible signal length.

Return type:

tuple
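
Example (a minimal sketch that simply unpacks the returned tuple in the order listed above):

>>> from hybra.utils import audfilters
>>> (kernels, d, fc, fc_min, fc_max,
...  kernel_min, kernel_size, L) = audfilters(kernel_size=128, num_channels=96,
...                                           fs=16000, L=16000, scale='mel')
>>> kernels.shape    # one kernel per channel, each of length kernel_size (assumed)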

hybra.utils.audspace(fmin: float | int | Tensor, fmax: float | int | Tensor, num_channels: int, scale: str = 'erb')

Computes a vector of values equidistantly spaced on the selected auditory scale.

Parameters:
  • fmin (float) – Minimum frequency in Hz.

  • fmax (float) – Maximum frequency in Hz.

  • num_channels (int) – Number of points in the output vector.

  • scale (str) – Auditory scale (default is ‘erb’).

Returns:

y (ndarray): Array of frequencies equidistantly spaced on the auditory scale.

Return type:

tuple
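
Example (a minimal sketch based on the signature above):

>>> from hybra.utils import audspace
>>> fc = audspace(fmin=50, fmax=8000, num_channels=40, scale='erb')    # 40 ERB-equidistant frequencies in Hz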

hybra.utils.audspace_mod(fc_low: float | int | Tensor, fc_high: float | int | Tensor, fs: int, num_channels: int, scale: str = 'erb')

Generate num_channels frequency samples that are equidistant on the modified auditory scale.

Parameters:
  • fc_low (float) – Lower transition frequency in Hz.

  • fc_high (float) – Upper transition frequency in Hz.

  • fs (int) – Sampling rate in Hz.

  • num_channels (int) – Number of filters/channels.

Returns:

Frequency values in Hz and in the auditory scale.

Return type:

ndarray

hybra.utils.audtofreq_mod(aud: float | int | Tensor, fc_low: float | int | Tensor, fc_high: float | int | Tensor, scale='erb', fs=None)

Inverse of freqtoaud_mod to map auditory scale back to frequency.

Parameters:
  • aud (ndarray) – Auditory scale values.

  • fc_low (float) – Lower transition frequency in Hz.

  • fc_high (float) – Upper transition frequency in Hz.

Returns:

Frequency values in Hz

Return type:

ndarray

hybra.utils.bwtofc(bw: float | int | Tensor, scale='erb')

Computes the center frequency corresponding to a given critical bandwidth.

Parameters:
  • bw (float or ndarray) – Critical bandwidth. Must be non-negative.

  • scale (str) – Auditory scale. Supported values are:
    - ‘erb’: Equivalent Rectangular Bandwidth
    - ‘bark’: Bark scale
    - ‘mel’: Mel scale
    - ‘log10’: Logarithmic scale

Returns:

Center frequency corresponding to the given bandwidth.

Return type:

ndarray or float

hybra.utils.can_tight(w: Tensor, d: int, Ls: int) → Tensor

Computes the canonical tight filterbank of w (time domain) using the polyphase representation.

Parameters:
  • w (torch.Tensor) – Impulse responses of the filterbank as a 2-D tensor of shape [num_channels, signal_length].

  • d (int) – Decimation factor; must divide signal_length.

Returns:

Canonical tight filterbank of w.

Return type:

W (torch.Tensor[num_channels, signal_length])

hybra.utils.condition_number(w: Tensor, d: int, Ls: int = None) → Tensor

Computes the condition number of a filterbank.

Parameters:
  • w (torch.Tensor) – Impulse responses of the filterbank as a 2-D tensor of shape [num_channels, signal_length].

  • d (int) – Decimation factor (stride); must divide signal_length.

Returns:

Condition number of the filterbank.

Return type:

kappa (torch.Tensor)
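
Example (a minimal sketch combining can_tight and condition_number; the filters come from audfilters and it is assumed that kernel-length filters together with Ls=L are handled by the padding these functions offer):

>>> from hybra.utils import audfilters, can_tight, condition_number
>>> kernels, d, fc, *_, L = audfilters(fs=16000, L=16000)
>>> kappa = condition_number(kernels, d, Ls=L)          # condition number of the original filterbank
>>> w_tight = can_tight(kernels, d, L)                  # canonical tight version
>>> kappa_tight = condition_number(w_tight, d, Ls=L)    # expected to be close to 1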

hybra.utils.fctobw(fc: float | int | Tensor, scale='erb')

Computes the critical bandwidth of a filter at a given center frequency.

Parameters:
  • fc (float or ndarray) – Center frequency in Hz. Must be non-negative.

  • scale (str) – Auditory scale. Supported values are:
    - ‘erb’: Equivalent Rectangular Bandwidth (default)
    - ‘bark’: Bark scale
    - ‘mel’: Mel scale
    - ‘log10’: Logarithmic scale

Returns:

Critical bandwidth at each center frequency.

Return type:

ndarray or float
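
Example (a minimal sketch treating bwtofc as the inverse of fctobw, which is how the two docstrings describe them):

>>> from hybra.utils import fctobw, bwtofc
>>> bw = fctobw(1000.0, scale='erb')    # critical bandwidth at 1 kHz on the ERB scale
>>> fc = bwtofc(bw, scale='erb')        # maps back to (approximately) 1000.0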

hybra.utils.fir_tightener3000(w: Tensor, supp: int, d: int, eps: float = 1.01, Ls: int | None = None)

Iterative tightening procedure with fixed support for a given filterbank w.

Parameters:
  • w (torch.Tensor) – Impulse responses of the filterbank as a 2-D tensor of shape [num_channels, signal_length].

  • supp (int) – Desired support of the resulting filterbank.

  • d (int) – Decimation factor; must divide the filter length.

  • eps (float) – Desired precision for the condition number.

  • Ls (int) – System length (if not already given by w). If set, the resulting filterbank is padded with zeros to length Ls.

Returns:

Filterbank with condition number eps and support length supp. If the filter length equals supp, the resulting filterbank is the canonical tight filterbank of w.
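
Example (a minimal sketch; the filterbank comes from audfilters and supp is set to the kernel size, so the result should be close to the canonical tight filterbank, as described above):

>>> from hybra.utils import audfilters, fir_tightener3000, condition_number
>>> kernels, d, *_, kernel_size, L = audfilters(fs=16000, L=16000)
>>> w_tight = fir_tightener3000(kernels, supp=kernel_size, d=d, eps=1.01, Ls=L)
>>> condition_number(w_tight, d, Ls=L)    # expected to be at most roughly eps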

hybra.utils.firwin(kernel_size: int, padto: int = None)

FIR window generation in Python.

Parameters:
  • kernel_size (int) – Length of the window.

  • padto (int) – Length to which the window is zero-padded.

Returns:

FIR window.

Return type:

g (ndarray)
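
Example (a minimal sketch based on the signature above):

>>> from hybra.utils import firwin
>>> g = firwin(kernel_size=128)                        # prototype window of length 128
>>> g_padded = firwin(kernel_size=128, padto=16000)    # same window, zero-padded to length 16000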

hybra.utils.frame_bounds(w: Tensor, d: int, Ls: int = None) → Tuple[Tensor, Tensor]

Computes the frame bounds of a filterbank given by its impulse responses, using the polyphase representation.

Parameters:
  • w (torch.Tensor) – Impulse responses of the filterbank as a 2-D tensor of shape [num_channels, length].

  • d (int) – Decimation (or downsampling) factor; must divide the filter length.

Returns:

A, B: Frame bounds

Return type:

tuple
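
Example (a minimal sketch; the filterbank comes from audfilters and Ls is used for padding, as in the condition_number example above):

>>> from hybra.utils import audfilters, frame_bounds
>>> kernels, d, *_, L = audfilters(fs=16000, L=16000)
>>> A, B = frame_bounds(kernels, d, Ls=L)    # lower and upper frame bounds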

hybra.utils.freqtoaud(freq: float | int | Tensor, scale: str = 'erb', fs: int = None)

Converts frequencies (Hz) to auditory scale units.

Parameters:
  • freq (float or ndarray) – Frequency value(s) in Hz.

  • scale (str) – Auditory scale. Supported values are:
    - ‘erb’ (default)
    - ‘mel’
    - ‘bark’
    - ‘log10’

Returns:

Corresponding auditory scale units.

Return type:

float or ndarray
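
Example (a minimal sketch based on the signature above):

>>> from hybra.utils import freqtoaud
>>> freqtoaud(1000.0, scale='erb')    # ERB-scale value of 1 kHz
>>> freqtoaud(1000.0, scale='mel')    # mel-scale value of 1 kHz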

hybra.utils.freqtoaud_mod(freq: float | int | Tensor, fc_low: float | int | Tensor, fc_high: float | int | Tensor, scale='erb', fs=None)

Modified auditory scale function with linear regions below fc_low and above fc_high.

Parameters:
  • freq (ndarray) – Frequency values in Hz.

  • fc_low (float) – Lower transition frequency in Hz.

  • fc_high (float) – Upper transition frequency in Hz.

Returns:

Values on the modified auditory scale.

Return type:

ndarray
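
Example (a minimal round-trip sketch; scalar inputs are assumed to be accepted, as the type annotations suggest):

>>> from hybra.utils import freqtoaud_mod, audtofreq_mod
>>> aud = freqtoaud_mod(1000.0, fc_low=100.0, fc_high=7000.0, scale='erb')
>>> audtofreq_mod(aud, fc_low=100.0, fc_high=7000.0, scale='erb')    # maps back to roughly 1000.0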

hybra.utils.frequency_correlation(w: Tensor, d: int, Ls: int = None, diag_only: bool = False) → Tensor

Computes the frequency correlation functions.

Parameters:
  • w (torch.Tensor) – Impulse responses of the filterbank as a 2-D tensor of shape [num_channels, sig_length].

  • d (int) – Decimation factor; must divide the filter length.

Returns:

G: (Ls x d) matrix with the frequency correlations as columns.

hybra.utils.modulate(g: Tensor, fc: float | int | Tensor, fs: int)

Modulate a set of filters to the given center frequencies.

Parameters:
  • g (list of torch.Tensor) – Filters.

  • fc (list) – Center frequencies.

  • fs (int) – Sampling rate.

Returns:

Modulated filters.

Return type:

g_mod (list of torch.Tensor)
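
Example (a minimal sketch; the signature above takes a Tensor while the parameter description speaks of a list of tensors, so the list form is assumed here, and the prototype comes from firwin):

>>> import torch
>>> from hybra.utils import firwin, modulate
>>> g = [torch.as_tensor(firwin(kernel_size=128))]    # prototype window as a one-element list
>>> g_mod = modulate(g, fc=[1000.0], fs=16000)        # prototype shifted to a 1 kHz center frequency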

hybra.utils.plot_response(g, fs, scale='erb', plot_scale=False, fc_min=None, fc_max=None, kernel_min=None, decoder=False)

Plotting routine for the frequency scale and the frequency responses of the filters.

Parameters:
  • g (numpy.ndarray) – Filters.

  • fs (int) – Sampling rate in Hz (used for plotting).

  • scale (str) – Auditory scale.

  • plot_scale (bool) – Plot the scale or not.

  • fc_min (float) – Lower transition frequency in Hz.

  • fc_max (float) – Upper transition frequency in Hz.

  • kernel_min (int) – Minimum kernel size.

  • decoder (bool) – Plot the synthesis filterbank instead.

hybra.utils.response(g, fs)

Frequency response of the filters (Total power spectral density).

Parameters:
  • g (numpy.ndarray) – Filter kernels.

  • fs (int) – Sampling rate in Hz.
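
Example (a minimal sketch; the docstrings ask for numpy arrays, so the kernels from audfilters are converted, assuming they are CPU tensors):

>>> from hybra.utils import audfilters, response, plot_response
>>> kernels, *_ = audfilters(fs=16000, L=16000, scale='erb')
>>> g = kernels.numpy()
>>> psd = response(g, fs=16000)                               # total power spectral density of the filters
>>> plot_response(g, fs=16000, scale='erb', plot_scale=True)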