Most of the attention in the AI text space is on chatbots like ChatGPT, Claude and Google Gemini, but some of the most powerful features are hidden away in APIs and studios. One example is Google AI Studio, where Gemini 1.5 Flash can do everything from transcription to video analysis.
Gemini 1.5 Flash is one of the most recent models unveiled by Google at its I/O event earlier this year and, as well as being among its most capable, is also cheaper and faster to run. It is currently free to use in Google AI Studio, so it's a great time to put it to the test.
Inspiration for this article came after a request to transcribe a 15-minute video. While I might usually have run Whisper (OpenAI's open-source AI transcription model) locally on my Mac, this comes with the drawback of dumping the transcript as a single block of text with no breakdown by speaker. So I tried a few alternatives, including Otter and Rev.
Both are useful but come with a cost. The impressive Plaud Note can now break a recording down by speaker, but that is primarily for recordings you've made rather than audio files you already have, so I tried Google AI Studio. I decided to first convert the video to audio for the sake of file size (the video was 5 GB) and then I loaded it into Gemini 1.5 Flash. It gave me a block of text like Whisper, so I asked it to identify speakers and display it in a readable format. It split it out and displayed it like a script, even picking u.
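
For anyone who would rather script these steps than click through the web interface, here is a minimal sketch of the same workflow in Python. It is not what I did (I used the AI Studio page directly), and it rests on a few assumptions: that ffmpeg is installed, that the `google-generativeai` package is available, and that an API key is stored in an environment variable named `GOOGLE_API_KEY`. The prompt wording, the audio settings and the function names are all illustrative.

```python
import os
import subprocess

# Hypothetical prompt wording; the article only says the model was asked to
# identify speakers and display the result in a readable format.
SPEAKER_PROMPT = (
    "Transcribe this audio. Identify each speaker and format the result "
    "like a script, with a speaker label at the start of every turn."
)


def build_ffmpeg_command(video_path: str, audio_path: str) -> list[str]:
    """Build an ffmpeg command that drops the video stream and keeps a small mono audio track."""
    return [
        "ffmpeg",
        "-y",            # overwrite the output file if it already exists
        "-i", video_path,
        "-vn",           # discard the video stream
        "-ac", "1",      # downmix to mono
        "-ar", "16000",  # 16 kHz sample rate, an assumed setting for speech
        "-b:a", "64k",   # assumed audio bitrate
        audio_path,
    ]


def extract_audio(video_path: str, audio_path: str) -> None:
    """Run ffmpeg; raises CalledProcessError if the conversion fails."""
    subprocess.run(build_ffmpeg_command(video_path, audio_path), check=True)


def transcribe_with_speakers(audio_path: str, model_name: str = "gemini-1.5-flash") -> str:
    """Upload the audio file and ask the model for a speaker-labelled transcript."""
    # Imported here so the helpers above work without the SDK installed.
    import google.generativeai as genai

    genai.configure(api_key=os.environ["GOOGLE_API_KEY"])  # assumed env var name
    audio_file = genai.upload_file(path=audio_path)
    model = genai.GenerativeModel(model_name)
    response = model.generate_content([SPEAKER_PROMPT, audio_file])
    return response.text
```

Calling `extract_audio("interview.mp4", "interview.mp3")` followed by `transcribe_with_speakers("interview.mp3")` (both file names are placeholders) would mirror the convert-then-prompt order described above. Asking for speaker labels in the first prompt, rather than as a follow-up, is a design choice that saves a round trip.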
