Speech-to-Text

OCI Speech Real-time Capture

Posted on

Capturing Speech-to-Text is a straight forward step. I’ve written previously about this, giving an example. But what if you want the code to constantly monitor for text input, giving a continuous. For this we need to use the asyncio python library. Using the OCI Speech-to-Text API in combination with asyncio we can monitor a microphone (speech input) on a continuous basis.

There are a few additional configuration settings needed, including configuring a speech-to-text listener. Here is an example of what is needed

lass MyListener(RealtimeSpeechClientListener):
    def on_result(self, result):
        if result["transcriptions"][0]["isFinal"]:
            print(f"1-Received final results: {transcription}")
        else:
            print(f"2-{result['transcriptions'][0]['transcription']} \n")

    def on_ack_message(self, ackmessage):
        return super().on_ack_message(ackmessage)

    def on_connect(self):
        return super().on_connect()

    def on_connect_message(self, connectmessage):
        return super().on_connect_message(connectmessage)

    def on_network_event(self, ackmessage):
        return super().on_network_event(ackmessage)

    def on_error(self, error_message):
        return super().on_error(error_message)

    def on_close(self, error_code, error_message):
        print(f'\nOCI connection closing.')

async def start_realtime_session(customizations=[], compartment_id=None, region=None):
    rt_client = RealtimeSpeechClient(
        config=config,
        realtime_speech_parameters=realtime_speech_parameters,
        listener=MyListener(),
        service_endpoint=realtime_speech_url,
        signer=None,   #authenticator(),
        compartment_id=compartment_id,
    )
    asyncio.create_task(send_audio(rt_client))

if __name__ == "__main__":
    asyncio.run(
        start_realtime_session(
            customizations=customization_ids,
            compartment_id=COMPARTMENT_ID,
            region=REGION_ID,
        )
    )

Additional customizations can be added to the Listener, for example, what to do with the Audio captured, what to do with the text, how to mange the speech-to-text (there are lots of customizations)