PyAudio Recording and Playing Back in Real Time

I am trying to record audio from the microphone and then play that audio through the speakers. Eventually I want to modify the audio before playing it back, but I'm having trouble taking the input data and successfully playing it back through the speakers.

The format for the input stream I'm using is Int16 and for the output stream is Float32. These were the only ones which made any sound at all (albeit a demonic one).

First I tried simply putting the input data into the output stream. This outputs a demonic sound:

import pyaudio
import numpy as np
import struct

FORMATIN = pyaudio.paInt16
FORMATOUT = pyaudio.paFloat32
CHANNELS = 1
RATE = 44100
CHUNK = 1024


audio = pyaudio.PyAudio()

# start Recording
streamIn = audio.open(format=FORMATIN, channels=CHANNELS,
                      rate=RATE, input=True, input_device_index=0,
                      frames_per_buffer=CHUNK)
streamOut = audio.open(format=FORMATOUT, channels=CHANNELS,
                       rate=RATE, output=True, input_device_index=0,
                       frames_per_buffer=CHUNK)
print("recording...")


while True:
    in_data = streamIn.read(CHUNK)
    streamOut.write(in_data)

in_data is as follows when printed:

1\x00\x12\x00\x0f\x00\x05\x00\x14\x00\x1e\x00\x16\x00\x14\x00\x12\x00\x10\x00\x02\x00\xf7\xff\xf7\xff\xd4\xff\xde\xff\xf8\xff\xd3\xff\xe9\xff\x14\x00@\x00Z\x00\xb9\xfft\xff\xce\x00\x93\x01\xc2\xff\xe4\xfe\x93\x00d\x00\xca\xff\x94\x01V\x01\xc8\xffS\x00t\x00\xc4\xffi\x00\xaf\x01l\x00\xdb\xfeM\xffw\xffp\x01\xf5\xffr\xfc\x97\x00~\x02S\x00\x97\x00v\x00\x87\xfe\xb7\xfc\x81\xff\xf6\x00\xef\x00\xc4\x03\x84\x02\x99\xfd`\xfc\xe2\x01b\x03\xda\xfe\xc4\xff\xfd\x00:\x00\xc6\x00\xf1\xfcV\xfd\xf0\x02\xdc\xff&\xff\xa1\x02\xc7\xff\xf5\xfe\xa9\xfe\x99\xfa\x06\xfdo\x04\xaa\x02\x8f\xfe\xec\x00\x1b\xffZ\xfe;\x01t\xfe<\xffd\x02<\x02\x04\x02\xcd\xfd\xe8\xfd\xf3\x00i\xfcD\xfa\x86\xfe\xb3\x01\xea\x00$\x00q\x00\x03\x022\x00d\xf9\x14\xfa\x86\xfdQ\xfd\xc5\xfe\x81\x02\xc2\x02=\x01\xfc\x00\xe5\xfd\t\xff\x93\xff\x83\xffd\x00(\xfeQ\xffM\x01\xb1\x01\xde\xfdE\xfd\xfe\xff\x00\x00\x06\x00\x02\xffV\xff\xcd\xffJ\xff\xfb\xfc\x86\xfd^\x00\x8d\x00\x91\xff\xb6\xfe\xf7\x00\x95\x01E\x00\x1b\xff9\xfe8\xff\xa7\xff\xd4\xff\xdd\xff\xb0\x00\x97\x01\xe8\x00\xa7\xff\xd8\xfe\x89\xff\x0c\x00\x81\xff\x81\xfe\xd1\xfeN\x00\x1a\x01\xcb\x00\x19\x00\x90\x00`\x00\x93\xff5\xff\x9b\xff\\\x00\x08\x00\xc0\xff,\x00\xc0\x00\xba\x00\x83\x00\x0f\x00\xf5\xffY\x00\x19\

Then I tried changing in_data to Float32, but that did not work either:

in_data = np.frombuffer(in_data, np.float32)

I tried various clipping and packing of the data, none of which worked:

in_data = np.clip(in_data, -2**15+1, 2**15-1)
in_data = struct.pack('d' * 1024, *in_data)

Does anyone know how to record audio from the microphone and then output it through speakers? Thank you.



Solution 1:[1]

Set FORMATOUT = FORMATIN.


Currently, your code does the following:

  • 44100 times per second, a frame is recorded.
  • Each frame is a 16-bit signed integer (16-bit LPCM), so it takes 2 bytes to encode a frame. This is the FORMATIN = pyaudio.paInt16 setting you chose.
  • When 1024 frames have been recorded (this takes about 23 ms), they are returned as a bytes object in Python. It consists of 2048 bytes. You call this variable in_data.
  • Then you pass these 2048 bytes to the output device via the .write call.
  • The output device works in pyaudio.paFloat32, which means that it believes each frame is 32 bits (4 bytes). It concludes that you have provided 2048/4 = 512 frames to play back. The output stream is set to 44100 Hz as well, so these take only about 12 ms to play back. The values it plays back are a mess, since it tries to interpret integers as floats. Both the bit depth and the encoding are mismatched, and the sound from your speaker seems to come from tormented souls in purgatory.
  • Then the whole process repeats.

Matching the input and output formats should resolve these issues.
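
The arithmetic above can be checked without any audio hardware. The following is a minimal sketch using only the standard library. The helper name reinterpret_as_float32 and the synthetic ramp of samples are illustrative and not part of the original code.

```python
import array

RATE = 44100   # frames per second, as in the question
CHUNK = 1024   # frames per read, as in the question


def reinterpret_as_float32(raw: bytes) -> array.array:
    """Read raw bytes the way a paFloat32 stream would (hypothetical helper)."""
    floats = array.array("f")
    floats.frombytes(raw)
    return floats


# Synthetic stand-in for one chunk from streamIn.read(CHUNK):
# 1024 signed 16-bit samples.
samples = array.array("h", [(i % 200) - 100 for i in range(CHUNK)])
in_data = samples.tobytes()

misread = reinterpret_as_float32(in_data)

print(len(in_data))                           # bytes in one recorded chunk
print(len(misread))                           # frames seen by a float32 stream
print(round(1000 * CHUNK / RATE, 1))          # milliseconds recorded
print(round(1000 * len(misread) / RATE, 1))   # milliseconds played back
```

The same 2048 bytes hold 1024 frames for the input stream but only 512 frames for the output stream, so each chunk plays back in half the time it took to record, with values that bear no relation to the original samples.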

Solution 2:[2]

Audio data in 16-bit signed integer format has values between -32768 and 32767. Floating-point audio data (32-bit or 64-bit) is normally in the range -1.0 to 1.0.

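I would recommend doing all processing in floating point in Python. So first turn the raw bytes into an array, for example with np.frombuffer(in_data, dtype=np.int16), and then do in_data = (in_data / 32768) before processing or sending to the output. If the output stream is paFloat32, cast the result to np.float32 and convert it back to bytes before writing it.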
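
As a sketch of that suggestion, the helpers below convert a chunk of paInt16 bytes to float32 values in [-1.0, 1.0) and back to bytes suitable for a paFloat32 output stream. The names int16_bytes_to_float32, float32_to_bytes and run_passthrough are illustrative, not from the original answer, and the loop assumes the default input and output devices support mono audio at 44100 Hz.

```python
import numpy as np

RATE = 44100
CHUNK = 1024


def int16_bytes_to_float32(raw: bytes) -> np.ndarray:
    """Decode paInt16 bytes and scale them to floats in [-1.0, 1.0)."""
    ints = np.frombuffer(raw, dtype=np.int16)
    return ints.astype(np.float32) / 32768.0


def float32_to_bytes(samples: np.ndarray) -> bytes:
    """Clip to the valid float range and encode for a paFloat32 stream."""
    clipped = np.clip(samples, -1.0, 1.0)
    return clipped.astype(np.float32).tobytes()


def run_passthrough() -> None:
    """Record from the default microphone and play back until Ctrl+C."""
    import pyaudio  # imported here so the helpers work without audio hardware

    audio = pyaudio.PyAudio()
    stream_in = audio.open(format=pyaudio.paInt16, channels=1, rate=RATE,
                           input=True, frames_per_buffer=CHUNK)
    stream_out = audio.open(format=pyaudio.paFloat32, channels=1, rate=RATE,
                            output=True, frames_per_buffer=CHUNK)
    try:
        while True:
            raw = stream_in.read(CHUNK)
            samples = int16_bytes_to_float32(raw)
            # Modify `samples` here, e.g. samples * 0.5 to halve the volume.
            stream_out.write(float32_to_bytes(samples))
    except KeyboardInterrupt:
        pass
    finally:
        stream_in.stop_stream()
        stream_in.close()
        stream_out.stop_stream()
        stream_out.close()
        audio.terminate()
```

Calling run_passthrough() starts the loop. Keeping the conversion in small pure functions means any later processing step can be tested on plain arrays without a microphone attached.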

Solution 3:[3]

If you are using Linux with PulseAudio, you can put os.system("pactl load-module module-loopback latency_msec=1")

at the beginning of the script and os.system("/usr/bin/pulseaudio --kill") at the end.

Let me know if that works.
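
A variant of this idea, sketched under the assumption that PulseAudio and its pactl tool are installed: pactl load-module prints the index of the module it loaded, so that module can be removed again with pactl unload-module instead of killing the whole PulseAudio daemon. This swaps the shutdown step of the answer for a narrower one. The helper names loopback_command and pulseaudio_loopback are hypothetical.

```python
import subprocess
from contextlib import contextmanager


def loopback_command(latency_msec=1):
    """Build the pactl command that loads the loopback module."""
    return ["pactl", "load-module", "module-loopback",
            f"latency_msec={latency_msec}"]


@contextmanager
def pulseaudio_loopback(latency_msec=1):
    """Route the microphone to the speakers for the duration of the block."""
    result = subprocess.run(loopback_command(latency_msec),
                            check=True, capture_output=True, text=True)
    module_index = result.stdout.strip()  # pactl prints the module index
    try:
        yield module_index
    finally:
        # Unload only the module loaded above; PulseAudio keeps running.
        subprocess.run(["pactl", "unload-module", module_index], check=True)
```

It could be used as `with pulseaudio_loopback(): input("Press Enter to stop")`. Note that this routes audio inside PulseAudio itself, so the Python script never sees the samples and cannot modify them.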

Sources

This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.

Source: Stack Overflow

Solution Source
Solution 1 LudvigH
Solution 2 Jon Nordby
Solution 3 eyal