Transcribe a talk into a blog post

A couple of years ago, Diane Mueller shared with me her experience converting her talks into blog posts or other written content to reuse later. I’ve tried the same approach with one of my latest talks, to easily turn it into a draft for a blog post. How was the experience?

Getting the audio from the talk

There are several ways to get the audio from your own talks. The one I’ve used the most is the audio recording feature on my phone. No extra apps or hardware needed.

In other cases, if the talk is published on YouTube, you can download it with youtube-dl:

$ youtube-dl <YOUTUBE-VIDEO-URL>

In that case, to get the audio from the mp4 file you can use ffmpeg:

$ ffmpeg -i <VIDEO-FILE-NAME>.mp4 -f mp3 -ab 192000 -vn <AUDIO-FILE-NAME>.mp3

Getting the text from the talk

There are several speech-to-text services. Some are fully automatic, and some include partial or full human curation. I decided to try AWS Transcribe to test the outcome of AI transcription.

The process is not very straightforward, but it’s simple enough for me. First, you need to upload the file to an AWS S3 bucket. Once it is uploaded, copy the file address, because you’ll need it later. It’ll be something like:

s3://<S3-BUCKET-NAME>/<AUDIO-FILE-NAME>.mp3
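If you prefer the terminal, the upload can probably be done with the AWS CLI too. This is only a sketch: it assumes the aws command is installed and configured with credentials that can write to the bucket, and upload_talk_audio is a made-up helper name, not something AWS provides.

```shell
# Hypothetical helper: copy a local audio file into an existing S3 bucket
# and print the resulting S3 address, which is needed in the next step.
upload_talk_audio() {
  file="$1"    # local file, e.g. <AUDIO-FILE-NAME>.mp3
  bucket="$2"  # existing bucket, e.g. <S3-BUCKET-NAME>
  uri="s3://$bucket/$(basename "$file")"
  # Send the CLI progress messages to stderr so stdout holds only the address.
  aws s3 cp "$file" "$uri" >&2 && printf '%s\n' "$uri"
}

# Usage:
# upload_talk_audio <AUDIO-FILE-NAME>.mp3 <S3-BUCKET-NAME>
```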

In AWS Transcribe, create a new job by giving it a name and the S3 address of the file to transcribe:

AWS Transcribe job set up
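The same job can presumably be created from the command line. The sketch below assumes a configured AWS CLI and a talk given in English (hence en-US, which is my assumption and should be changed for other languages). Both function names are hypothetical.

```shell
# Hypothetical helper: start a transcription job for an audio file in S3.
start_talk_transcription() {
  job_name="$1"   # any name that is unique in your account
  media_uri="$2"  # e.g. s3://<S3-BUCKET-NAME>/<AUDIO-FILE-NAME>.mp3
  aws transcribe start-transcription-job \
    --transcription-job-name "$job_name" \
    --language-code en-US \
    --media "MediaFileUri=$media_uri"
}

# Hypothetical helper: print the job status
# (QUEUED, IN_PROGRESS, COMPLETED or FAILED).
talk_transcription_status() {
  aws transcribe get-transcription-job \
    --transcription-job-name "$1" \
    --query 'TranscriptionJob.TranscriptionJobStatus' \
    --output text
}

# Usage:
# start_talk_transcription <JOB-NAME> s3://<S3-BUCKET-NAME>/<AUDIO-FILE-NAME>.mp3
# talk_transcription_status <JOB-NAME>
```

Transcription is not instant, so expect to check the status a few times before it reaches COMPLETED.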

There are some limitations. For example, it’s not possible to transcribe audio files longer than 4 hours. Well, I can speak a lot during a talk, but not that much before people fall asleep.

As an outcome, it produces a JSON file in which one of the fields (transcript, inside the items of results.transcripts) holds the transcription. You can see a preview on the job outcome page:

AWS Transcribe job outcome preview
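To get just the text out of that JSON file, something like the following should do. It is a sketch that assumes the job output has already been downloaded to a local file and that either jq or python3 is available. The extract_transcript name is made up for illustration.

```shell
# Hypothetical helper: print only the transcript text from the JSON
# file produced by the transcription job.
extract_transcript() {
  if command -v jq >/dev/null 2>&1; then
    # First item of results.transcripts, printed as raw text.
    jq -r '.results.transcripts[0].transcript' "$1"
  else
    # Fallback for machines without jq.
    python3 -c 'import json, sys
with open(sys.argv[1]) as f:
    print(json.load(f)["results"]["transcripts"][0]["transcript"])' "$1"
  fi
}

# Usage:
# extract_transcript <JOB-OUTPUT-FILE>.json > talk-draft.txt
```

The resulting text file is a raw draft: it still needs paragraphs, headings and a careful read to fix whatever the service misheard.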

Testing with a real talk

I’ve tried it with my talk about the Open Source Program Office (OSPO) from the last Southern California Linux Expo (SCaLE):

Of course, English is not my mother tongue, which could explain some big discrepancies between my talk and the generated transcription.

What do you think about these AI solutions? Have you already tried one? What was your experience? Please feel free to share your answers as comments on this post. Thank you!