I think we can make the Golden Globe winner Bong Joon-ho’s wish even easier, thanks to technology.

The Idea

Recently I became extremely interested in all sorts of podcasts. They fit well into my daily routine, whether on my London commute to work or on the numerous flights I took at Christmas to join my very international family, and I simply enjoy the format. When I hear something interesting, my first thought is to share it with people around me who might be interested in the topic. Recently I wanted to share an English podcast with my mother. She doesn’t speak fluent English, so I started looking into ways of translating it into French for her. This is how my side project began.

As Tim Berners-Lee says, “The dream behind the Web is of a common information space in which we communicate by sharing information.”

So let’s share this information with everyone regardless of which language they speak, by using technology.

Problem

Since this was a podcast-centric idea, I reached out to Spotify, but unfortunately they didn’t have the stats I was looking for. During my search I did find this old (2015) article that summarises my point:

72% of (non-English speaking) consumers spend most of their time on a very, very small fraction of the web

Mission

Use technology to start the move towards content accessibility in any language.

Baby steps

It has been a while since I touched a line of code, so I had to install pretty much everything from scratch. I used Visual Studio Code (as suggested by Carbs). I didn’t really know which language to code in. Python seemed like a good idea, even though I had never coded in it before, as my background is mainly in C/C++. I have to say I really enjoyed the foreignness of Python.

After a few failures and realisations, I quickly had to fall back on an agile approach and build something much smaller than my original idea (I keep reminding myself that that’s OK). This was a bit frustrating, as it was difficult to let go of the bigger, beautiful picture I had in my head. But it is important to hold a retrospective and re-adjust.

It is normal to feel overwhelmed and have 100 tabs open while trying to find out which dependencies are missing and how to fix them, only for each fix to create another error, and so on.

Fortunately, after a fair bit of research and quite a few hours/days spent on YouTube tutorials, Stack Overflow and rescues from Carbs, I managed to move on to the next exciting step. If I had to draw a graph of what happened it would probably look like this:

Objective

  1. Speech to text
    1. Input speech to text or Input .wav file to text
    2. Output speechtotext.txt file
  2. Text translate
    1. Input speechtotext.txt file
    2. Output translated.txt
  3. Text to speech in the translated language
    1. Input translated.txt
    2. Output texttospeech.mp3
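
The three steps above can be sketched roughly as follows. This is a minimal sketch, not the code from my repository: the function names (transcribe_wav, synthesize_mp3, run_pipeline) are made up for illustration, and it assumes the SpeechRecognition and gTTS packages are installed. The translation step is deliberately left as a function you pass in, because translation client libraries differ between versions.

```python
from pathlib import Path


def transcribe_wav(wav_path: str) -> str:
    """Step 1: .wav file -> text (assumes the SpeechRecognition package)."""
    import speech_recognition as sr  # imported lazily so the rest runs without it

    recognizer = sr.Recognizer()
    with sr.AudioFile(wav_path) as source:
        audio = recognizer.record(source)  # read the whole file into memory
    return recognizer.recognize_google(audio)


def synthesize_mp3(text: str, mp3_path: str, lang: str = "fr") -> None:
    """Step 3: translated text -> spoken .mp3 (assumes the gTTS package)."""
    from gtts import gTTS

    gTTS(text=text, lang=lang).save(mp3_path)


def run_pipeline(wav_path, transcribe, translate, synthesize, workdir="."):
    """Chain the three steps, writing the intermediate files listed above.

    transcribe(wav_path) -> str, translate(text) -> str and
    synthesize(text, mp3_path) are plain callables, so any stage can be swapped.
    """
    workdir = Path(workdir)

    # 1. Speech to text -> speechtotext.txt
    text = transcribe(wav_path)
    (workdir / "speechtotext.txt").write_text(text, encoding="utf-8")

    # 2. Text translate -> translated.txt
    translated = translate(text)
    (workdir / "translated.txt").write_text(translated, encoding="utf-8")

    # 3. Text to speech in the translated language -> texttospeech.mp3
    mp3_path = workdir / "texttospeech.mp3"
    synthesize(translated, str(mp3_path))
    return mp3_path
```

With the real libraries installed, a call would look something like run_pipeline("podcast.wav", transcribe_wav, my_translate, synthesize_mp3), where my_translate is a hypothetical stand-in for whichever translation call you use.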

How to set up the venv and run your code:

  1. Activate the gtts env: in the terminal, run “source venv30/bin/activate”.
    1. You first need to create an environment in which to run your code, so that you can be sure all the right versions and dependencies are installed in one place. I followed this tutorial to create the venv30 environment you can see in my video recording.
  2. Then run your Python code with the command “python your_file_name.py”.
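
If you are unsure whether the environment is really active, a small check at the top of your script can help. This is only a sketch: the helper names are invented for illustration, and the module names speech_recognition and gtts are my assumption about what the project imports. One standard way to create such an environment in the first place is “python3 -m venv venv30”, though the tutorial may do it differently.

```python
import importlib.util
import sys


def in_virtualenv() -> bool:
    """True when the interpreter is running inside a venv."""
    # Inside a venv, sys.prefix points at the venv while sys.base_prefix
    # still points at the system-wide Python installation.
    return sys.prefix != getattr(sys, "base_prefix", sys.prefix)


def missing_modules(names):
    """Return the module names that cannot be imported from this environment."""
    return [name for name in names if importlib.util.find_spec(name) is None]


# Example use (module names are assumptions, adjust to your own imports):
#   if not in_virtualenv():
#       print("Warning: venv30 does not seem to be active")
#   print("Missing:", missing_modules(["speech_recognition", "gtts"]))
```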

Demo

Here’s a demo video – make sure your sound is on. 

GitHub link 

Notes

I found out that Google’s APIs may not understand their own Google voice very well. Also, a .wav file may be transcribed into text with a bunch of extra file information, which needs to be cleaned up before passing it on to the translate API. It’s also worth mentioning that there is a timeout parameter that behaves a bit randomly, but I decided not to spend too much time on that.

Last but not least, if you decide to use your own voice rather than a .wav file for the speech to text, you may need to turn the volume down manually in your settings once you are done talking. The speech recognition is very sensitive: it captures every single noise and will not stop listening until there is absolute silence. I didn’t spend much time on that either; it may be worth exploring at a later stage.
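
A possible clean-up step is sketched below. It rests on an assumption: that the extra information is the dictionary the SpeechRecognition library returns from recognize_google when called with show_all=True (a list of “alternative” entries, each with a “transcript” and sometimes a “confidence”), rather than a plain string. If your output looks different, the helper (extract_transcript, a name invented here) would need adapting.

```python
def extract_transcript(result) -> str:
    """Reduce a recognition result to plain text, ready for translation.

    Accepts either a plain string or the assumed show_all=True shape:
    {"alternative": [{"transcript": "...", "confidence": 0.9}, ...], "final": True}
    """
    if isinstance(result, str):
        return result.strip()

    if isinstance(result, dict):
        alternatives = result.get("alternative", [])
        if alternatives:
            # Prefer the alternative with the highest confidence; entries
            # without a confidence score are treated as 0.0.
            best = max(alternatives, key=lambda alt: alt.get("confidence", 0.0))
            return best.get("transcript", "").strip()

    # Anything else (for example an empty list when nothing was recognised)
    return ""
```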
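
One thing that might help with both the timeout and the never-ending listening is sketched here, untested against my own setup. It assumes the SpeechRecognition library, whose Recognizer has adjust_for_ambient_noise and a listen method that takes timeout (how long to wait for speech to start) and phrase_time_limit (how long a phrase may last before it is cut off). The wrapper name capture_phrase and the default values are made up for illustration.

```python
def capture_phrase(recognizer, source, timeout=5, phrase_time_limit=15,
                   calibrate_seconds=1):
    """Record one phrase without waiting for absolute silence.

    recognizer: a speech_recognition.Recognizer (or anything with the same
                adjust_for_ambient_noise / listen methods)
    source:     an already opened audio source such as sr.Microphone()
    """
    # Sample the background noise first so the recogniser is less likely
    # to treat every small sound as speech.
    recognizer.adjust_for_ambient_noise(source, duration=calibrate_seconds)

    # timeout: give up if nobody starts talking within this many seconds.
    # phrase_time_limit: stop recording after this many seconds of speech,
    # even if the room never becomes completely silent.
    return recognizer.listen(source, timeout=timeout,
                             phrase_time_limit=phrase_time_limit)


# Example use (needs a microphone and the PyAudio package):
#   import speech_recognition as sr
#   r = sr.Recognizer()
#   with sr.Microphone() as source:
#       audio = capture_phrase(r, source)
#   print(r.recognize_google(audio))
```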

Conclusion

I really enjoyed going back to programming and plugging all these different APIs together. Having an idea and beginning to realise it is a truly empowering start to the year.

Take a look at my files on GitHub and feel free to build on them! I can’t wait to see what will happen with the help of a larger community.

Acknowledgment

I would like to thank my mother for unknowingly inspiring this idea.

A big thank you to Carbs, who helped me with some great technical suggestions and unblocked my code a few times.

And of course thank you to NotBinary for supporting me and allowing me to have a few days working on this side project.

References:

  • Google speech recognition accepts wav but not mp3; here is a site I used for the mp3 to wav conversion.
  • I followed this video for the Google Translate step
  • This was helpful for setting up an environment in Python
  • Text to speech video and this video
  • Speech recognition video and this article
