How to use Twilio's bi-directional stream feature to play raw audio data

I'm using Twilio Programmable Voice to process phone calls.

I want to use the bi-directional stream feature to send raw audio data for Twilio to play. The initialization code looks like this:

from twilio.twiml.voice_response import Connect, VoiceResponse, Stream

response = VoiceResponse()
connect = Connect()
connect.stream(url='wss://mystream.ngrok.io/audiostream')
response.append(connect)

Then, once I get the WebSocket (wss) connection from Twilio, I start sending raw audio data to Twilio, like this:

    async def send_raw_audio(self, ws, stream_sid):
        print('send raw audio')
        import base64
        import json
        with open('test.wav', 'rb') as wav:
            while True:
                frame_data = wav.read(1024)
                if len(frame_data) == 0:
                    print('no more data')
                    break
                base64_data = base64.b64encode(frame_data).decode('utf-8')
                print('send base64 data')
                media_data = {
                    "event": "media",
                    "streamSid": stream_sid,
                    "media": {
                        "playload": base64_data
                    }
                }
                media = json.dumps(media_data)
                print(f"media: {media}")
                await ws.send(media)
            print('finished sending')

test.wav is a wav file encoded audio/x-mulaw with a sample rate of 8000.

But when I run this, I can't hear anything, and the Twilio console reports:

31951 - Stream - Protocol - Invalid Message
Possible Causes
 - Message does not have JSON format
 - Unknown message type
 - Missing or extra field in message
 - Wrong Stream SID used in message

I have no idea which part is wrong. Does anyone know what my problem is? I can't find an example for this scenario; I just followed the instructions here. I would really appreciate it if someone could point me to an example. Thanks.



Solution 1:[1]

Not sure if this will fix it, but I use .decode("ascii") rather than .decode("utf-8").
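
For what it's worth, Base64 output contains only ASCII characters, so the two decodings should give the same string. A quick check, using arbitrary sample bytes chosen here purely for illustration:

```python
import base64

# Arbitrary sample bytes (an assumption for illustration, not real call audio).
frame_data = bytes(range(256))

encoded = base64.b64encode(frame_data)
as_ascii = encoded.decode("ascii")
as_utf8 = encoded.decode("utf-8")

# Base64 only emits A-Z, a-z, 0-9, '+', '/' and '=', all of which are ASCII,
# so both decodings produce the same Python string.
same = as_ascii == as_utf8
print(same)
```

So switching the codec is harmless, but it is unlikely to be what triggers error 31951.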

Solution 2:[2]

The question is probably not relevant anymore, but I came across it while debugging my own bi-directional stream, so this might be useful to someone:

  1. The main reason you were receiving this error is a typo in the JSON content: you are sending "playload" instead of "payload".
  2. Another issue when sending data to a Twilio stream is that you should send a mark message at the end of the data stream to notify Twilio that the complete payload was sent. https://www.twilio.com/docs/voice/twiml/stream#message-mark-to-twilio
  3. When sending data back to the Twilio stream, be aware that the payload should not contain the audio file's header bytes, so make sure you remove them from your recording or, alternatively, skip them while sending data to Twilio.
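
The three points above can be sketched roughly as follows. This is a minimal sketch, not a definitive implementation: extract_wav_data and twilio_messages are hypothetical helper names, the mark name "end-of-audio" is an arbitrary label, and the code assumes test.wav is an ordinary RIFF/WAVE file whose data chunk already holds 8000 Hz mu-law samples.

```python
import base64
import json
import struct


def extract_wav_data(wav_bytes: bytes) -> bytes:
    """Return only the body of the RIFF 'data' chunk, dropping all headers."""
    if wav_bytes[:4] != b"RIFF" or wav_bytes[8:12] != b"WAVE":
        raise ValueError("not a RIFF/WAVE file")
    offset = 12  # first sub-chunk starts right after 'RIFF', size, 'WAVE'
    while offset + 8 <= len(wav_bytes):
        chunk_id = wav_bytes[offset:offset + 4]
        (chunk_size,) = struct.unpack("<I", wav_bytes[offset + 4:offset + 8])
        body_start = offset + 8
        if chunk_id == b"data":
            return wav_bytes[body_start:body_start + chunk_size]
        # RIFF chunks are padded to an even number of bytes.
        offset = body_start + chunk_size + (chunk_size & 1)
    raise ValueError("no 'data' chunk found")


def twilio_messages(stream_sid, audio, chunk_size=1024, mark_name="end-of-audio"):
    """Yield JSON strings: one media message per chunk, then one mark message."""
    for start in range(0, len(audio), chunk_size):
        chunk = audio[start:start + chunk_size]
        yield json.dumps({
            "event": "media",
            "streamSid": stream_sid,
            # Note the key is "payload", not "playload".
            "media": {"payload": base64.b64encode(chunk).decode("ascii")},
        })
    # Trailing mark message; the name is an arbitrary label (assumption).
    yield json.dumps({
        "event": "mark",
        "streamSid": stream_sid,
        "mark": {"name": mark_name},
    })
```

Inside the question's coroutine, the read loop could then become something like: read the whole file, call extract_wav_data on it, and run `await ws.send(message)` for each message produced by twilio_messages(stream_sid, audio).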

Sources

This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.

Source: Stack Overflow

Solution Source
Solution 1 Albert Ko
Solution 2 aron23