Can gRPC be used for audio-video streaming over the Internet?

I understand that in a client-server model gRPC can do bidirectional streaming of data. I have not tried it yet, but I want to know whether it is possible to stream audio and video data from a source to a cloud server using gRPC and then broadcast it to multiple clients, all in real time.



Solution 1:[1]

TLDR: I would not recommend video over gRPC, primarily because it wasn't designed for it, so doing it would take a lot of hacking. You should probably take a look at WebRTC plus a specific video codec.

More information below:

gRPC has no video compression

  • When sending video, we want to send things efficiently, because sending it raw could require on the order of 1 GB/s of bandwidth (for example, uncompressed 4K at 60 fps with 3 bytes per pixel is roughly 1.5 GB/s).
  • So we use video compression / video encoding, for example H.264, VP8 or AV1.
  • It helps to understand how video compression works (e.g. it saves bandwidth by not re-sending data that is shared between consecutive frames of a video).
  • There is no video encoder for protobufs (the format used by gRPC).
  • You could instead try image compression and store each image in a bytes field (e.g. bytes image_frame = 1;), but this is less efficient than a video codec and takes up unnecessary space for video.
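To put a number on the bandwidth claim above, here is a small back-of-the-envelope calculation. It assumes 3 bytes per pixel (8-bit RGB, no chroma subsampling); the function name and the two example resolutions are chosen only for illustration.

```python
def raw_video_rate(width, height, fps, bytes_per_pixel=3):
    """Return the uncompressed video data rate in bytes per second."""
    return width * height * bytes_per_pixel * fps


# Illustrative resolutions; 3 bytes per pixel is an assumption (8-bit RGB).
examples = {
    "1080p60": (1920, 1080, 60),
    "4K60": (3840, 2160, 60),
}

for name, (w, h, fps) in examples.items():
    rate = raw_video_rate(w, h, fps)
    print(f"{name}: {rate / 1e6:.0f} MB/s ({rate * 8 / 1e9:.2f} Gbit/s)")
```

Under these assumptions, 1080p at 60 fps already needs about 373 MB/s, and 4K at 60 fps about 1.5 GB/s, which is why a video codec is not optional.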

It's probably possible to encode frames with a video encoder (e.g. H.264), carry the encoded bytes in protobufs, and then decode them for playback in applications. However, it might take a lot of hacking/engineering effort. This use case is not what gRPC/protobufs are designed for, and it is not commonly done. Let me know if you hack something together; I would be curious.
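As a rough sketch of what that hack might look like, the snippet below splits an already-encoded byte stream into sequence-numbered chunks and reassembles it on the other side. Everything here is hypothetical: VideoChunk is a plain dataclass standing in for a protobuf message with a bytes field, chunk_stream and reassemble are made-up helper names, and no real gRPC or codec library is involved. In an actual service, the generator would presumably be the body of a streaming RPC.

```python
from dataclasses import dataclass
from typing import Iterable, Iterator


@dataclass
class VideoChunk:
    # Hypothetical stand-in for a protobuf message such as:
    #   message VideoChunk { uint64 seq = 1; bytes payload = 2; }
    seq: int
    payload: bytes


def chunk_stream(encoded: bytes, max_size: int) -> Iterator[VideoChunk]:
    """Split an encoded byte stream into numbered chunks of at most max_size bytes."""
    for seq, start in enumerate(range(0, len(encoded), max_size)):
        yield VideoChunk(seq=seq, payload=encoded[start:start + max_size])


def reassemble(chunks: Iterable[VideoChunk]) -> bytes:
    """Rebuild the byte stream on the receiver, ordering chunks by sequence number."""
    return b"".join(c.payload for c in sorted(chunks, key=lambda c: c.seq))


# Placeholder bytes standing in for encoder output (not real H.264 data).
encoded = bytes(range(256)) * 10
chunks = list(chunk_stream(encoded, max_size=1000))
print(len(chunks), reassemble(chunks) == encoded)
```

The encoder and decoder, which are the hard part, are deliberately left out; this only shows the transport envelope.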

gRPC is reliable

  • gRPC uses TCP (not UDP), which is reliable.
  • At a glance, reliability might seem handy for avoiding corrupted or lost data. However, depending on the use case (realtime video or audio calls), we may prefer to skip frames that are dropped or delayed. The losses may be unnoticeable or painless to the user.
    • If a packet is delayed, TCP waits for it before delivering the data behind it, so playback stalls (aka head-of-line blocking).
    • If a packet is dropped, TCP resends it (aka retransmission), which adds further delay.
  • Therefore, video conferencing apps usually use WebRTC/RTP (configured to be unreliable).
  • Having said that, it looks like Zoom was able to implement video over WebSockets, which is also a reliable transport (over TCP). So it's not "game over", just strongly discouraged and a lot more effort. They have since moved over to WebRTC, though.

Data received on the WebSockets goes into a WebAssembly (WASM) based decoder. Audio is fed to an AudioWorklet in browsers that support that. From there the decoded audio is played using the WebAudio “magic” destination node.
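The reliability trade-off described above can be illustrated with a toy simulation. It is only a sketch: the arrival times and playout deadlines (in milliseconds) are made up, and the two functions are simplified models of in-order reliable delivery and of a receiver that skips late frames, not real TCP or RTP implementations.

```python
def reliable_playout(arrivals):
    """In-order, reliable delivery (TCP-like model): a frame is usable only
    once it and every earlier frame have arrived."""
    ready = []
    latest = 0
    for t in arrivals:
        latest = max(latest, t)
        ready.append(latest)
    return ready


def lossy_playout(arrivals, deadlines):
    """Unreliable delivery (RTP-like model): a frame that misses its playout
    deadline is skipped, represented here as None."""
    return [t if t <= d else None for t, d in zip(arrivals, deadlines)]


# Five frames sent 40 ms apart; frame 2 is lost once and only arrives at 400 ms.
arrivals = [20, 60, 400, 140, 180]
# Hypothetical playout deadlines: send time plus a 100 ms budget.
deadlines = [100, 140, 180, 220, 260]

print(reliable_playout(arrivals))          # [20, 60, 400, 400, 400]
print(lossy_playout(arrivals, deadlines))  # [20, 60, None, 140, 180]
```

In the reliable model, one late frame holds back every frame after it until 400 ms. In the lossy model, only that single frame is skipped and the rest play on time.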

Sources

This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.

Source: Stack Overflow

Solution Source
Solution 1