Combining Multiple Public Pre-Trained Models for Better Audio Segmentation

The session explains how combining Whisper, WebRTC-VAD, and other public models into a multi-model pipeline improves audio segmentation without extra preprocessing.

Overview

The work in progress includes an audio processing pipeline that segments and transcribes audio, then extracts linguistic features from the transcribed text. Using publicly available pre-trained AI models has produced high-quality results. The next development stage will focus on finer segmentation of the audio, using the outputs of these public models.
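Here is a rough illustration of how the outputs of two public models might be combined for finer segmentation. The sketch assumes that WebRTC-VAD has already produced one speech/non-speech flag per 30 ms frame (webrtcvad's `Vad.is_speech` accepts 10, 20 or 30 ms frames of 16-bit mono PCM). It also assumes that Whisper has produced word-level timestamps, for example via `word_timestamps=True` in the openai-whisper `transcribe` call, converted from seconds to integer milliseconds. The names `vad_spans`, `split_words_by_vad`, `Segment`, and the 300 ms gap threshold are hypothetical and not taken from the project.

```python
from dataclasses import dataclass
from typing import List, Sequence, Tuple

# A word with timing in integer milliseconds: (text, start_ms, end_ms).
# Hypothetical format, e.g. converted from Whisper word timestamps (seconds).
Word = Tuple[str, int, int]


@dataclass(frozen=True)
class Segment:
    start_ms: int
    end_ms: int
    text: str


def vad_spans(flags: Sequence[bool], frame_ms: int = 30,
              min_gap_ms: int = 300) -> List[Tuple[int, int]]:
    """Merge per-frame speech flags (one flag per fixed-length frame, such as
    the results of webrtcvad's Vad.is_speech) into (start_ms, end_ms) spans.
    Silences of at most min_gap_ms are bridged so short pauses do not split a span."""
    spans: List[List[int]] = []
    for i, is_speech in enumerate(flags):
        if not is_speech:
            continue
        start, end = i * frame_ms, (i + 1) * frame_ms
        if spans and start - spans[-1][1] <= min_gap_ms:
            spans[-1][1] = end  # contiguous frame or short pause: extend current span
        else:
            spans.append([start, end])  # long silence: open a new span
    return [(s, e) for s, e in spans]


def split_words_by_vad(words: Sequence[Word],
                       spans: Sequence[Tuple[int, int]]) -> List[Segment]:
    """Regroup time-sorted words into one segment per VAD span.

    Each word is assigned to the span containing its midpoint; words whose
    midpoint falls outside every span are skipped in this sketch."""
    segments: List[Segment] = []
    for span_start, span_end in spans:
        members = [w for w in words if span_start <= (w[1] + w[2]) / 2 < span_end]
        if members:
            # Whisper words typically carry a leading space, so strip before joining.
            text = " ".join(w[0].strip() for w in members)
            segments.append(Segment(members[0][1], members[-1][2], text))
    return segments


if __name__ == "__main__":
    # 30 ms frames: speech, a 330 ms silence, then speech again.
    flags = [False] + [True] * 2 + [False] * 11 + [True] * 3 + [False]
    spans = vad_spans(flags)
    print(spans)  # [(30, 90), (420, 510)]
    words = [(" Hello", 30, 60), (" there", 60, 90), (" um", 200, 260),
             (" next", 420, 470), (" part", 470, 510)]
    for seg in split_words_by_vad(words, spans):
        print(seg)
```

Working in integer milliseconds keeps the span arithmetic exact. Assigning words by their midpoint is only one simple rule; whether words that fall outside every speech span should be dropped or kept is a design decision left open here.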

Links

https://ietech.ca/
Helps businesses transform using data science, digital strategy, and analytics expertise.

Tech stack