Bypassing reCAPTCHA with Azure Speech to Text and Python

The “I’m not a robot” checkbox can be annoying when you’re trying to automate the boring stuff. I love hunting for bargain-basement flight deals, and reCAPTCHA really hinders my ability to automate that. This is where Azure’s Speech to Text comes in handy: with 5 free hours a month and a quick conversion time, we can take reCAPTCHA’s audio challenge and convert it into text almost instantly. Read on to find out how you can do it too.


Pre-requisites

  • Azure Account (you can do this with AWS as well, but this tutorial will focus on Azure)
  • Python 3 with the following libraries: pydub, azure-cognitiveservices-speech
    • pip3 install pydub azure-cognitiveservices-speech
  • FFmpeg
  • Optional: BypassRecaptcha Github

Step 1: Generate a cognitive services API key

After you have downloaded all the pre-requisites and created an Azure account, you will want to generate an Azure Speech API key.

  1. Go to the Speech service create page in the Azure portal.
  2. Select the subscription you want, a unique name, your preferred region, a resource group, and F0 (free) for the pricing tier so you don’t get charged.
  3. Click Create. Once it’s done deploying, click “Go to resource” and select “Keys and Endpoint”. Save the key and region, as we will need them later in the tutorial.
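
Hardcoding the key in a script makes it easy to leak if the file is ever shared or committed. As one possible alternative, the sketch below reads the key and region from environment variables. The function name load_speech_credentials and the variable names AZURE_SPEECH_KEY and AZURE_SPEECH_REGION are assumptions made for this example, not something the Speech SDK requires.

```python
import os


def load_speech_credentials(environ=None):
    """Return (key, region) read from environment variables.

    AZURE_SPEECH_KEY and AZURE_SPEECH_REGION are hypothetical names
    chosen for this sketch; use whatever names suit your setup.
    """
    env = os.environ if environ is None else environ
    key = env.get("AZURE_SPEECH_KEY", "").strip()
    region = env.get("AZURE_SPEECH_REGION", "").strip()
    if not key or not region:
        # Fail early with a clear message instead of a confusing SDK error later.
        raise RuntimeError(
            "Set AZURE_SPEECH_KEY and AZURE_SPEECH_REGION before running."
        )
    return key, region
```

The returned pair can be used in the next step wherever the key and region strings are needed.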

Step 2: Set up the azure_speech_to_text function

  1. First let’s import the Azure Speech library
import azure.cognitiveservices.speech as speechsdk
  2. Next let’s set up the azure_speech_to_text function. Replace "KEY" and "REGION" with the key and region you saved in Step 1
def azure_speech_to_text(file):
    try:
        # Replace KEY and REGION with the values saved in Step 1
        speech_key, service_region = "KEY", "REGION"
        speech_config = speechsdk.SpeechConfig(subscription=speech_key, region=service_region)
        audio_input = speechsdk.AudioConfig(filename=file)
        speech_recognizer = speechsdk.SpeechRecognizer(speech_config=speech_config, audio_config=audio_input)
        print("Recognizing first result...")
        # recognize_once() returns after the first recognized utterance
        result = speech_recognizer.recognize_once()
        if result.reason == speechsdk.ResultReason.RecognizedSpeech:
            print("Recognized: {}".format(result.text))
            # Strip periods from the transcript before returning it
            return result.text.replace(".", "")
        elif result.reason == speechsdk.ResultReason.NoMatch:
            print("No speech could be recognized: {}".format(result.no_match_details))
        elif result.reason == speechsdk.ResultReason.Canceled:
            cancellation_details = result.cancellation_details
            print("Speech Recognition canceled: {}".format(cancellation_details.reason))
            if cancellation_details.reason == speechsdk.CancellationReason.Error:
                print("Error details: {}".format(cancellation_details.error_details))
    except Exception as e:
        print("Could not request results from Microsoft Azure Voice Recognition service; {0}".format(e))
    # Nothing was recognized, or the request failed
    return False
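
The azure_speech_to_text function only strips periods from the transcript, but speech services often add commas, capital letters, and question marks as well. A minimal sketch of a fuller cleanup step follows; the helper name clean_transcript is hypothetical and is not part of the Azure SDK.

```python
import string


def clean_transcript(text):
    """Lowercase the transcript and drop punctuation and extra spaces."""
    # Remove every ASCII punctuation character, not only periods.
    stripped = text.translate(str.maketrans("", "", string.punctuation))
    # Collapse runs of whitespace and normalise case.
    return " ".join(stripped.lower().split())
```

Under this assumption, you would call clean_transcript on the recognized text before using it, in place of the single replace(".", "") call.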

Step 3: Set up the audio file converter function

  1. Next, we need to set up our file conversion tool. Here we will be using AudioSegment from the PyDub library, and urlretrieve from the standard library’s urllib.request
from pydub import AudioSegment
import urllib.request
  2. This is the code snippet you’ll be using to download the audio file and convert it from MP3 to WAV so it works with Azure Speech to Text.
def convert_audio(download_url):
    urllib.request.urlretrieve(download_url, 'audio.mp3')
    sound = AudioSegment.from_mp3("audio.mp3")
    sound.export("audio.wav", format="wav")
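
Before sending the converted file to Azure, it can help to confirm what the export actually produced. The sketch below uses only the standard library’s wave module to report the channel count, sample rate, and bit depth of a WAV file. The helper name describe_wav is hypothetical, and the 16 kHz, 16-bit mono target mentioned afterwards is my understanding of the Speech SDK’s default input format, so check it against the current Azure documentation.

```python
import wave


def describe_wav(path):
    """Return (channels, sample_rate_hz, bits_per_sample) for a WAV file."""
    with wave.open(path, "rb") as wav:
        # getsampwidth() is in bytes, so multiply by 8 to get bits.
        return wav.getnchannels(), wav.getframerate(), wav.getsampwidth() * 8
```

If the result is not (1, 16000, 16), one option is to call pydub’s set_frame_rate(16000) and set_channels(1) on the AudioSegment before exporting it.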

Step 4: Putting it all together

Now that we have our file download/conversion function and our Azure speech-to-text function, we need to put them to use. I typically use Selenium for this work. Here is an excerpt of it in action.

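from selenium.webdriver.common.by import By
from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.support import expected_conditions as EC
from selenium import webdriver
from selenium.webdriver.common.action_chains import ActionChains
from random import uniform
from time import sleep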

chromeOpts = webdriver.ChromeOptions()
chromeOpts.add_experimental_option("excludeSwitches", ["enable-automation"])

cd_path = 'driver/chromedriver.exe'

driver = webdriver.Chrome(executable_path=cd_path, options=chromeOpts)
iframes = driver.find_elements_by_tag_name("iframe")
action = ActionChains(driver)
action.move_by_offset(uniform(0.5, 8.1), uniform(0.5, 8.1))
sleep(uniform(3.7, 6.1))
sleep(uniform(3.4, 6.9))
sleep(uniform(4.1, 6.7))

sleep(uniform(6.7, 9.1))
if driver.find_element_by_xpath('//*[@id="recaptcha-audio-button"]'):
    wait = WebDriverWait(driver, 10)
    element = wait.until(EC.element_to_be_clickable((By.ID, 'recaptcha-audio-button')))
    action = ActionChains(driver)
    action.move_by_offset(uniform(0.5, 2.1), uniform(0.5, 2.1))

    sleep(uniform(1.00, 1.99))

    noaudio = None

    download_url = driver.find_element_by_xpath(
    sleep(uniform(1.00, 2.99))

    # convert_audio downloads the MP3 and writes audio.wav
    convert_audio(download_url)

    sleep(uniform(1.00, 1.99))
    # Transcribe the converted file with the function from Step 2
    result = azure_speech_to_text("audio.wav")
    sleep(uniform(1.00, 2.99))

    action = ActionChains(driver)
    action.pause(uniform(0.102, 0.351))
    action.move_by_offset(uniform(0.5, 2.1), uniform(0.5, 2.1))
    action.pause(uniform(0.102, 0.351))
    action.pause(uniform(1.7, 3.5))
    for character in result:
        action.pause(uniform(0.103, 0.401))

    sleep(uniform(0.5, 1.1))

    action = ActionChains(driver)
    action.pause(uniform(0.102, 0.351))
    action.move_by_offset(uniform(0.5, 2.1), uniform(0.5, 2.1))
    action.pause(uniform(0.101, 0.351))

    print("--Captcha good")

You can find the full script at BypassRecaptcha Github. Feel free to share this post or comment if you have any questions!
