On Mac & Linux: Using OpenAI Whisper to Transcribe a Live Interview

Sean Song
Jun 10, 2023

While experimenting with voice-oriented AI applications, I found an interesting project, https://github.com/SevaSk/ecoute, which transcribes an interview live while an OpenAI chat window offers background on the topics being discussed. It works like a job interview guide or assistant.

Ecoute demo

It is pretty fun to play with, but the project was Windows-only because it relied on PyAudioWPatch to manage audio devices. The major advantage of PyAudioWPatch over the generic PyAudio is its support for Windows loopback recording, which lets any speaker (sink) signal be captured or reused at the driver level. For example, many VoIP applications rely on acoustic echo cancellation (AEC) to remove the speaker's echo from the microphone signal, and AEC needs that loopback copy of the speaker signal as its reference.

AEC illustrated
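
To make the loopback point concrete, here is a minimal sketch of one textbook way to do AEC: a normalized least-mean-squares (NLMS) adaptive filter that takes the loopback (speaker) signal as a reference, learns the speaker-to-microphone echo path, and subtracts the estimated echo from the microphone. This is not how PyAudioWPatch or the HiDock work internally (I don't know the dock's algorithm); the function names, tap count, step size and the synthetic echo path are all assumptions for illustration, written in pure Python so it runs without extra packages.

```python
import math
import random


def nlms_echo_cancel(mic, reference, taps=16, mu=0.2, eps=1e-6):
    """Subtract an adaptive estimate of the echo of `reference` from `mic`.

    mic: microphone samples (near-end speech plus speaker echo)
    reference: loopback samples of what the speaker played
    Returns the echo-cancelled microphone signal.
    """
    w = [0.0] * taps        # current estimate of the speaker-to-mic echo path
    history = [0.0] * taps  # latest reference samples, newest first
    out = []
    for d, x in zip(mic, reference):
        history = [x] + history[:-1]
        echo_est = sum(wi * xi for wi, xi in zip(w, history))
        e = d - echo_est                              # what is left after cancelling
        norm = eps + sum(xi * xi for xi in history)   # input power for normalization
        w = [wi + mu * e * xi / norm for wi, xi in zip(w, history)]
        out.append(e)
    return out


def simulate(n=4000, seed=0):
    """Synthetic call: white-noise far end, a made-up echo path, a sine as near-end speech."""
    rng = random.Random(seed)
    ref = [rng.uniform(-1.0, 1.0) for _ in range(n)]
    # Hypothetical echo path: two delayed, attenuated copies of the speaker signal.
    echo = [0.6 * (ref[i - 3] if i >= 3 else 0.0) + 0.3 * (ref[i - 7] if i >= 7 else 0.0)
            for i in range(n)]
    speech = [0.1 * math.sin(2 * math.pi * 0.01 * i) for i in range(n)]
    mic = [s + e for s, e in zip(speech, echo)]
    return ref, echo, speech, nlms_echo_cancel(mic, ref)


if __name__ == "__main__":
    ref, echo, speech, out = simulate()
    tail = range(3000, 4000)  # measure after the filter has converged
    residual = sum((out[i] - speech[i]) ** 2 for i in tail)
    echo_energy = sum(echo[i] ** 2 for i in tail)
    print(f"echo reduced by {10 * math.log10(echo_energy / residual):.1f} dB")
```

The key input is `reference`: without a loopback copy of the speaker signal, the filter has nothing to learn the echo from.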

After some intensive research, I didn't find any straightforward alternative to PyAudioWPatch on Mac or Linux. There are a couple of tutorials for Mac, but they require installing extra audio tools to loop back the sink signal.

My solution to this issue was simple: I have a HiDock, which performs very well in terms of AEC. Its microphone removes the dock's own speaker signal by default, so I don't have to worry about the microphone input to the computer containing any speaker audio. That makes it perfect for Ecoute-like applications. So I just hooked the HiDock up to my computer and modified the Ecoute code a little for Linux and Mac. It worked!

HiDock with Linux SBC
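
Since the dock's microphone already removes its own speaker signal, the software side mostly needs to record from the dock instead of the system default input. As an illustration (not necessarily what my fork does), here is a small helper that picks an input device by name from a list of device-info dictionaries shaped like the ones PyAudio's `get_device_info_by_index()` returns (`index`, `name`, `maxInputChannels`). The function name and the "HiDock USB Audio" device name are hypothetical; check what your OS actually reports.

```python
from typing import Any, Dict, List, Optional


def pick_input_device(devices: List[Dict[str, Any]], name_hint: str) -> Optional[int]:
    """Return the index of the first input-capable device whose name contains name_hint."""
    hint = name_hint.lower()
    for info in devices:
        has_input = info.get("maxInputChannels", 0) > 0   # skip output-only devices
        if has_input and hint in str(info.get("name", "")).lower():
            return int(info["index"])
    return None  # no matching microphone found


if __name__ == "__main__":
    # Example list in the same shape PyAudio returns; the names are made up.
    sample = [
        {"index": 0, "name": "MacBook Pro Speakers", "maxInputChannels": 0},
        {"index": 1, "name": "MacBook Pro Microphone", "maxInputChannels": 1},
        {"index": 2, "name": "HiDock USB Audio", "maxInputChannels": 2},
    ]
    print(pick_input_device(sample, "hidock"))  # → 2
```

On a real machine you could build `devices` with `p = pyaudio.PyAudio()` and `[p.get_device_info_by_index(i) for i in range(p.get_device_count())]`, then pass the returned index as `input_device_index` to `p.open(..., input=True)`.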

Code is here: https://github.com/oldsongsz/ecoute for Mac and Linux.

Update: the author, https://github.com/SevaSk, has since updated Ecoute to support both offline Whisper and the online OpenAI API; the latter runs much faster. By default, it also supports multilingual translation. Pretty cool, and a good project for learning audio transcription from scratch.
