Decoding Sound: I Built an AI Audio Analyzer & Vocoder in the Browser

By Ilias Laoukili · November 20, 2025 · 3 min read

Sound is more than just waves—it’s data. Whether it’s the rhythmic beat of a song or the subtle tremor in a nervous speaker's voice, audio carries a wealth of hidden information.

For my latest academic project, I wanted to bridge the gap between manipulating sound (DSP) and understanding it (Machine Learning). The result is Sentim, a Python-based web application that you can run directly in your browser.

🚀 Launch the Live App

The Playground (DSP)

The first part of the app is a custom Vocoder. It demonstrates how we can use mathematics to alter the physical properties of sound without destroying its content. I used librosa and scipy to build tools that allow you to:

  • Time-Stretch: Slow down audio without the "slow-motion voice" effect (using Phase Vocoding).
  • Pitch-Shift: Change the key of a voice without changing the speed.
  • Robotize: Apply Ring Modulation for a classic sci-fi aesthetic.
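Conceptually, each of these effects is only a few lines of Python. The sketch below implements the ring-modulation "robotize" effect directly with numpy (the carrier frequency is an assumed parameter, not necessarily what the app uses), and notes the librosa one-liners behind the other two effects:

```python
import numpy as np

def robotize(y, sr, carrier_hz=200.0):
    """Ring modulation: multiply the signal by a sine carrier.

    This shifts the signal's spectrum by +/- carrier_hz, which flattens
    natural pitch variation and produces the classic "robot voice" sound.
    """
    t = np.arange(len(y)) / sr
    return y * np.sin(2 * np.pi * carrier_hz * t)

# Time-stretch and pitch-shift are one-liners with librosa's built-in
# phase-vocoder-based effects (assuming librosa is installed):
#   slow = librosa.effects.time_stretch(y, rate=0.5)         # half speed, same pitch
#   up   = librosa.effects.pitch_shift(y, sr=sr, n_steps=4)  # up 4 semitones, same speed
```

Because ring modulation is a pure element-wise multiply, the output has exactly the same length as the input, and its amplitude never exceeds the original's.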

The Intelligence (Emotion Analysis)

The second part answers a complex question: Can we quantify emotion?

Using the RAVDESS dataset (Ryerson Audio-Visual Database of Emotional Speech and Song), I trained a Random Forest Classifier. The app extracts acoustic features—like timbre (MFCCs), pitch, and spectral contrast—to predict if an uploaded audio clip conveys Joy, Anger, Sadness, or a Neutral state.
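The pipeline boils down to "summarize each clip as a fixed-length feature vector, then fit a classifier on labeled vectors." Here is a minimal sketch of that shape using scikit-learn; the feature values and the 20-dimensional size are placeholders (random data standing in for RAVDESS features), and the librosa calls in the comments show how such features are typically extracted:

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier

# In the app, each clip becomes one fixed-length vector, e.g. by averaging
# frame-level features over time (librosa calls, assuming it is installed):
#   mfcc     = librosa.feature.mfcc(y=y, sr=sr, n_mfcc=13).mean(axis=1)
#   contrast = librosa.feature.spectral_contrast(y=y, sr=sr).mean(axis=1)

EMOTIONS = ["joy", "anger", "sadness", "neutral"]  # label set from the post

rng = np.random.default_rng(0)
X = rng.normal(size=(200, 20))     # placeholder feature vectors
y = rng.integers(0, 4, size=200)   # placeholder emotion labels

clf = RandomForestClassifier(n_estimators=100, random_state=0).fit(X, y)
prediction = EMOTIONS[clf.predict(X[:1])[0]]  # one of the four emotion labels
```

A Random Forest is a sensible choice here: with only a few hundred labeled clips and a few dozen features, it handles the small-data regime far more gracefully than a deep network would.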

It also features a "Heuristic Mode," allowing you to compare modern Machine Learning predictions against a traditional rule-based baseline.
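To make the comparison concrete, a rule-based baseline might look like the toy sketch below. The specific rules and thresholds here are hypothetical, invented for illustration; the app's actual heuristics may differ:

```python
import numpy as np

def heuristic_emotion(y, sr):
    """Toy rule-based baseline (hypothetical rules, not the app's actual ones).

    Uses two cheap acoustic cues: loudness (RMS energy) and zero-crossing
    rate (a rough proxy for spectral brightness / agitation).
    """
    rms = np.sqrt(np.mean(y ** 2))
    zcr = np.mean(np.abs(np.diff(np.signbit(y).astype(int))))
    if rms > 0.1 and zcr > 0.1:
        return "anger"    # loud and agitated
    if rms > 0.1:
        return "joy"      # loud but smooth
    if zcr > 0.1:
        return "neutral"  # quiet but bright
    return "sadness"      # quiet and dull
```

Putting such a baseline next to the Random Forest makes the trade-off visible: the rules are transparent and explainable, while the learned model captures interactions no hand-written threshold would.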

Under the Hood

This project was built entirely in Python 3.9+, leveraging the power of open-source data science:

  • Streamlit: For the interactive web interface.
  • Librosa: For feature extraction and STFT operations.