.. HybrA-Filterbanks documentation master file, created by
   sphinx-quickstart on Wed May 21 11:34:47 2025.
   You can adapt this file completely to your liking, but it should at least
   contain the root `toctree` directive.

HybrA-Filterbanks
=================

**Auditory-inspired filterbanks for deep learning**

Welcome to HybrA-Filterbanks, a PyTorch library of auditory-inspired filterbanks for audio processing and deep learning applications.

Overview
--------

This library contains the official implementations of:

* **ISAC** (arXiv:2505.07709): Invertible and Stable Auditory filterbank with Customizable kernels for ML integration
* **HybrA** (arXiv:2408.17358): Hybrid Auditory filterbank that extends ISAC with learnable filters
* **ISACSpec**: Spectrogram variant with temporal averaging for robust feature extraction
* **ISACCC**: Cepstral coefficient extractor for speech recognition applications

Key Features
------------

✨ **PyTorch Integration**: All filterbanks are implemented as ``nn.Module``, so they can be used directly as layers in a neural network

🎯 **Auditory Modeling**: Based on human auditory perception principles (mel, ERB, and Bark scales)

⚡ **Fast Implementation**: Optimized using FFT-based circular convolution

🔧 **Flexible Configuration**: Customizable kernel sizes, frequency ranges, and scales

📊 **Frame Theory**: Built-in functions for frame bounds, condition numbers, and stability analysis

🎨 **Visualization**: Plotting functions for filter responses and time-frequency representations

Installation
------------

We publish all releases on PyPI. Install the current version with:

::

    pip install hybra

Quick Start
-----------

Basic ISAC Filterbank
~~~~~~~~~~~~~~~~~~~~~

.. code-block:: python
   :linenos:

   import torch
   from hybra import ISAC

   # Create ISAC filterbank
   filterbank = ISAC(
       kernel_size=128,
       num_channels=40,
       fs=16000,
       L=16000,
       scale='mel'
   )

   # Process audio signal
   x = torch.randn(1, 16000)  # Random signal for demo
   coefficients = filterbank(x)
   reconstructed = filterbank.decoder(coefficients)

   # Visualize
   filterbank.plot_response()
   filterbank.ISACgram(x, log_scale=True)

HybrA with Learnable Filters
~~~~~~~~~~~~~~~~~~~~~~~~~~~~

.. code-block:: python
   :linenos:

   import torch
   from hybra import HybrA

   # Create hybrid filterbank with learnable components
   hybrid_fb = HybrA(
       kernel_size=128,
       learned_kernel_size=23,
       num_channels=40,
       fs=16000,
       L=16000
   )

   # Forward pass (supports gradients)
   x = torch.randn(1, 16000, requires_grad=True)
   y = hybrid_fb(x)

   # Check condition number for stability
   print(f"Condition number: {hybrid_fb.condition_number():.2f}")

ISAC Spectrograms and MFCCs
~~~~~~~~~~~~~~~~~~~~~~~~~~~

.. code-block:: python
   :linenos:

   import torch
   from hybra import ISACSpec, ISACCC

   # Spectrogram with temporal averaging
   spectrogram = ISACSpec(
       num_channels=40,
       fs=16000,
       L=16000,
       power=2.0,
       is_log=True
   )

   # MFCC-like cepstral coefficients
   mfcc_extractor = ISACCC(
       num_channels=40,
       num_cc=13,
       fs=16000,
       L=16000
   )

   x = torch.randn(1, 16000)
   spec = spectrogram(x)
   mfccs = mfcc_extractor(x)

The filterbanks can also be included in any model, e.g., as an encoder/decoder pair.

.. code-block:: python
   :linenos:
   :caption: HybrA model example

   import torch
   import torch.nn as nn
   import torchaudio
   from hybra import HybrA

   class Net(nn.Module):
       """Mask estimator operating on 40-channel filterbank features."""

       def __init__(self):
           super().__init__()
           self.linear_before = nn.Linear(40, 400)
           self.gru = nn.GRU(
               input_size=400,
               hidden_size=400,
               num_layers=2,
               batch_first=True,
           )
           self.linear_after = nn.Linear(400, 600)
           self.linear_after2 = nn.Linear(600, 600)
           self.linear_after3 = nn.Linear(600, 40)

       def forward(self, x):
           # (batch, channels, time) -> (batch, time, channels)
           x = x.permute(0, 2, 1)
           x = torch.relu(self.linear_before(x))
           x, _ = self.gru(x)
           x = torch.relu(self.linear_after(x))
           x = torch.relu(self.linear_after2(x))
           # Sigmoid keeps the mask values in (0, 1)
           x = torch.sigmoid(self.linear_after3(x))
           # Back to (batch, channels, time)
           x = x.permute(0, 2, 1)
           return x

   class HybridfilterbankModel(nn.Module):
       def __init__(self):
           super().__init__()
           self.nsnet = Net()
           # Default configuration; Net assumes it yields 40 channels
           self.fb = HybrA()

       def forward(self, x):
           # Encode, estimate a mask from the log-power coefficients,
           # then decode the masked coefficients
           x = self.fb(x)
           log_power = torch.log10(torch.clamp(x.abs() ** 2, min=1e-8))
           mask = self.nsnet(log_power)
           return self.fb.decoder(x * mask)

   if __name__ == '__main__':
       audio, fs = torchaudio.load('your_audio.wav')
       model = HybridfilterbankModel()
       enhanced = model(audio)

Citation
--------

If you find our work valuable, please cite:

::

    @article{HaiderTight2024,
      title={Hold me Tight: Trainable and stable hybrid auditory filterbanks for speech enhancement},
      author={Haider, Daniel and Perfler, Felix and Lostanlen, Vincent and Ehler, Martin and Balazs, Peter},
      journal={arXiv preprint arXiv:2408.17358},
      year={2024}
    }

    @article{HaiderISAC2025,
      title={ISAC: An Invertible and Stable Auditory Filter Bank with Customizable Kernels for ML Integration},
      author={Haider, Daniel and Perfler, Felix and Balazs, Peter and Hollomey, Clara and Holighaus, Nicki},
      journal={arXiv preprint arXiv:2505.07709},
      year={2025}
    }

.. toctree::
   :maxdepth: 2
   :caption: Documentation:

   api
   examples
   mathematical_background

.. toctree::
   :maxdepth: 1
   :caption: Links:

   GitHub Repository
   PyPI Package

Indices and tables
==================

* :ref:`genindex`
* :ref:`modindex`
* :ref:`search`