FFmpeg - How can I create HLS streams in multiple languages and multiple qualities?

Preface

I'm working on converting videos from 4K to multiple qualities with multiple languages, but I'm having issues with overlaying the languages: the output sometimes loses quality and is sometimes out of sync. (This is less of a problem with the German audio, as it is a voice-over anyway.)

Our team is completely new to video, audio and HLS. I'm a front-end developer with no experience in this area, so apologies if my question is poorly phrased.


Videos

I have the video in 4K format and have removed the original sound, as I have English and German audio files that need to be overlaid. I then take these files and combine them into a .ts file like this:

ffmpeg -i ep03-ns-4k.mp4 -i nkit-ep3-de-output.m4a -i nkit-ep3-en-output.m4a \
-threads 0 -muxdelay 0 -y \
-map 0:v -map 1 -map 2 -movflags +faststart -refs 1 \
-vcodec libx264 -acodec aac -profile:v baseline -level 30 -ar 44100 -ab 64k -f mpegts out.ts

This outputs a 4k out.ts video, with both audio tracks playing.

The hard part

This is where I'm finding it tricky. I now need to convert this single file into multiple quality levels (480, 720, 1080, 1920), which I attempt with the following command:

ffmpeg -hide_banner -y -i out.ts \
-crf 20 -sc_threshold 0 -g 48 -keyint_min 48 -ar 48000 \
-map 0:v:0 -map 0:v:0 -map 0:v:0 -map 0:v:0 \
-c:v:0 h264 -profile:v:0 main -filter:v:0 "scale=w=848:h=480:force_original_aspect_ratio=decrease" -b:v:0 1400k -maxrate:v:0 1498k -bufsize:v:0 2100k \
-c:v:1 h264 -profile:v:1 main -filter:v:1 "scale=w=1280:h=720:force_original_aspect_ratio=decrease" -b:v:1 2800k -maxrate:v:1 2996k -bufsize:v:1 4200k \
-c:v:2 h264 -profile:v:2 main -filter:v:2 "scale=w=1920:h=1080:force_original_aspect_ratio=decrease" -b:v:2 5600k -maxrate:v:2 5992k -bufsize:v:2 8400k \
-c:v:3 h264 -profile:v:3 main -filter:v:3 "scale=w=3840:h=1920:force_original_aspect_ratio=decrease" -b:v:3 11200k -maxrate:v:3 11984k -bufsize:v:3 16800k \
-var_stream_map "v:0 v:1 v:2 v:3" \
-master_pl_name master.m3u8 \
-f hls -hls_time 4 -hls_playlist_type vod -hls_list_size 0 \
-hls_segment_filename "%v/episode-%03d.ts" "%v/episode.m3u8"

This creates the required qualities, but I'm now at a loss as to how this might work with the audio.

Audio

For the audio I run this command:

ffmpeg -i out.ts -threads 0 -muxdelay 0 -y -map 0:a:0 -codec copy -f segment -segment_time 4 -segment_list_size 0 -segment_list audio-de/audio-de.m3u8 -segment_format mpegts audio-de/audio-de_%d.aac
ffmpeg -i out.ts -threads 0 -muxdelay 0 -y -map 0:a:1 -codec copy -f segment -segment_time 4 -segment_list_size 0 -segment_list audio-en/audio-en.m3u8 -segment_format mpegts audio-en/audio-en_%d.aac

This creates the required audio segments.

The question

I realise this is quite an ask, but is there anything wrong with our inputs? Is there a more streamlined way to do this?

Any answers are greatly appreciated.



Solution 1:[1]

Let's say you have:

VideoA

AudioB -> Language 1

AudioC -> Language 2

AudioD -> Language 3

Although it can be done all together, it is better to use different commands for each language instance.

Note that the following commands are schematic only: you will need to fill in some values and parameters yourself. They do, however, show how to connect the inputs and outputs. Also, I have simply set the output size with -s and NOT used a scale filter. You can use a scale filter instead; it goes in place of the size parameter (-s 1280x720 etc.).

ffmpeg -i VideoA -i AudioB \
-map 0:v -map 1:a -s 1280x720 -acodec aac -b:a 128k \
-vcodec libx264 -pix_fmt yuv420p [your other parameters go here] -movflags +faststart \
OutputAB_720p.mp4 \
-map 0:v -map 1:a -s 1920x1080 -acodec aac -b:a 128k \
-vcodec libx264 -pix_fmt yuv420p [your other parameters go here] -movflags +faststart \
OutputAB_1080p.mp4
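
As a rough sketch of the scale-filter variant mentioned earlier, each -s can be swapped for a -vf scale=... option on the same output. A width of -2 asks the scale filter to choose a width that keeps the aspect ratio and is divisible by 2. The file names are the same placeholders as before, and the RUN=1 switch is only an illustrative convention of this script, not an ffmpeg feature.

```shell
#!/bin/sh
# Sketch: the same two-output scheme, using the scale filter instead of -s.
# VideoA, AudioB and the output names are placeholders.

set -- -i VideoA -i AudioB \
  -map 0:v -map 1:a -vf "scale=-2:720" \
  -c:v libx264 -pix_fmt yuv420p -c:a aac -b:a 128k \
  -movflags +faststart OutputAB_720p.mp4 \
  -map 0:v -map 1:a -vf "scale=-2:1080" \
  -c:v libx264 -pix_fmt yuv420p -c:a aac -b:a 128k \
  -movflags +faststart OutputAB_1080p.mp4

# Print the command for inspection (shell quoting is not preserved here).
scale_cmd="ffmpeg $*"
echo "$scale_cmd"

# Only run the encode when explicitly requested with RUN=1.
if [ "${RUN:-0}" = "1" ]; then
  ffmpeg "$@"
fi
```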

The above shows a scheme for 2 resolutions, 720p and 1080p, merging VideoA with AudioB. To do the same scheme for AudioC you would repeat:

ffmpeg -i VideoA -i AudioC \
-map 0:v -map 1:a -s 1280x720 -acodec aac -b:a 128k \
-vcodec libx264 -pix_fmt yuv420p [your other parameters go here] -movflags +faststart \
OutputAC_720p.mp4 \
-map 0:v -map 1:a -s 1920x1080 -acodec aac -b:a 128k \
-vcodec libx264 -pix_fmt yuv420p [your other parameters go here] -movflags +faststart \
OutputAC_1080p.mp4
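
Repeating the command per language could be scripted rather than copied by hand. The following is a minimal sketch, assuming a POSIX shell; the function name encode_language, the output tags and the RUN=1 switch are hypothetical helpers introduced for illustration, and the encoder settings are the placeholder values used above.

```shell
#!/bin/sh
# Sketch: one ffmpeg command per language, each writing 720p and 1080p files.
# VideoA/AudioB/... are placeholders; set RUN=1 to actually invoke ffmpeg.

encode_language() {
  audio=$1   # audio input for this language, e.g. AudioB
  tag=$2     # label used in the output file names, e.g. AB

  # Build the argument list: shared video input plus this language's audio.
  set -- -i VideoA -i "$audio"
  for size in 1280x720 1920x1080; do
    height=${size#*x}   # "1280x720" -> "720"
    set -- "$@" -map 0:v -map 1:a -s "$size" \
      -c:v libx264 -pix_fmt yuv420p -c:a aac -b:a 128k \
      -movflags +faststart "Output${tag}_${height}p.mp4"
  done

  # Print the command for inspection; run it only when RUN=1.
  echo "ffmpeg $*"
  if [ "${RUN:-0}" = "1" ]; then
    ffmpeg "$@"
  fi
}

encode_language AudioB AB
encode_language AudioC AC
encode_language AudioD AD
```

Adding a language then means adding one more encode_language line, which keeps each individual ffmpeg command short.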

You could put all the inputs together:

ffmpeg -i VideoA -i AudioB -i AudioC -i AudioD

and accordingly map each for every language:

-map 0:v -map 1:a
-map 0:v -map 2:a
-map 0:v -map 3:a
etc.
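
Put together, a single invocation along these lines might look like the sketch below. It builds the argument list step by step so the mapping stays visible: every audio file becomes an extra input, and each output maps the video (input 0) plus one audio input. Only a 720p output per language is shown; the file names are placeholders and the RUN=1 switch is an illustrative convention, not part of ffmpeg.

```shell
#!/bin/sh
# Sketch: one ffmpeg invocation with all inputs, one output file per language.
# VideoA/AudioB/AudioC/AudioD and the output names are placeholders.

# Input 0 is the video; inputs 1, 2, 3 are the audio files.
set -- -i VideoA
for audio in AudioB AudioC AudioD; do
  set -- "$@" -i "$audio"
done

# For every language, map the video plus that language's audio input.
idx=1
for tag in B C D; do
  set -- "$@" \
    -map 0:v -map "$idx:a" -s 1280x720 \
    -c:v libx264 -pix_fmt yuv420p -c:a aac -b:a 128k \
    -movflags +faststart "OutputA${tag}_720p.mp4"
  idx=$((idx + 1))
done

# Print the command for inspection; run it only when RUN=1.
combined_cmd="ffmpeg $*"
echo "$combined_cmd"
if [ "${RUN:-0}" = "1" ]; then
  ffmpeg "$@"
fi
```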

But I feel that the resulting long commands are difficult to read, maintain and correct.
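
For the HLS output the question asks about, the same idea of mapping one video against several audio inputs might be carried over to ffmpeg's hls muxer, whose -var_stream_map option accepts an agroup key for grouping audio-only variants as alternate audio. This is a sketch under assumptions, not a tested recipe for these files: the input names, bitrates, language codes, group name and variant names are illustrative, the name key needs a reasonably recent ffmpeg build, and the GOP settings are taken from the question's command.

```shell
#!/bin/sh
# Sketch: HLS with two video renditions and two audio languages in one group.
# All names and bitrates are placeholders; set RUN=1 to invoke ffmpeg.

# Each space-separated entry is one variant. "agroup" links the audio-only
# variants to the video variants as alternate audio renditions.
stream_map="v:0,agroup:audio,name:720p v:1,agroup:audio,name:1080p"
stream_map="$stream_map a:0,agroup:audio,language:de,name:audio-de"
stream_map="$stream_map a:1,agroup:audio,language:en,name:audio-en,default:yes"

set -- -i VideoA -i AudioB -i AudioC \
  -map 0:v -map 0:v -map 1:a -map 2:a \
  -filter:v:0 "scale=-2:720" -filter:v:1 "scale=-2:1080" \
  -c:v libx264 -pix_fmt yuv420p -b:v:0 2800k -b:v:1 5600k \
  -g 48 -keyint_min 48 -sc_threshold 0 \
  -c:a aac -b:a 128k \
  -f hls -hls_time 4 -hls_playlist_type vod \
  -var_stream_map "$stream_map" \
  -master_pl_name master.m3u8 \
  -hls_segment_filename "%v/episode-%03d.ts" "%v/episode.m3u8"

# Print the command for inspection (shell quoting is not preserved here).
hls_cmd="ffmpeg $*"
echo "$hls_cmd"
if [ "${RUN:-0}" = "1" ]; then
  ffmpeg "$@"
fi
```

Because video and audio are segmented by the same process, this avoids building an intermediate out.ts and cutting the audio in a separate pass.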

Sources

This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.

Source: Stack Overflow

Solution Source
Solution 1 Rajib