Recently I decided to get some video cameras for my bike. These cameras, filming both the front and rear of my bike, capture footage in five-minute-long files.

Though my cameras come with an app, and both are made by the same manufacturer, there is no easy way to combine a long ride into a single video for easy (or uneasy) viewing. So I set about creating a workflow to make a complete video of my rides. My desired end goal looks something like this (the image layout, not the impending sense of doom):

Point of view image of a cyclist being side-swiped with a rear-view inset image.

Another beautiful day by the beach.

Tools and Requirements

ffmpeg is the go-to for many command-line approaches to processing a variety of moving images, and this use case is not particularly special. However, I had not used ffmpeg before, let alone constructed a semi-complicated workflow with it. I began with almost no knowledge of ffmpeg, so I didn't know what requirements would be reasonable.

After a little testing and playing around, I developed the following list:

  1. Picture in picture; a larger view of the main camera, with an inset view of the rear camera.
  2. The inset image should have a frame, to subtly visually separate the two images.
  3. The images had to be exactly aligned in time (more on this later).
  4. The inset image should look similar to an Australian ‘rear view mirror’ perspective. I tried other approaches but they made less visual sense to me than looking up and to the left to see a horizontally flipped image of the rear view.
  5. Make the dull images from my bike cameras ‘pop’ a bit more.

File Combination

The files from the microSD cards come across as a series of MP4 chunks, each five minutes long.

Sample listing of files created by the front and rear cameras.

There are a few ways to concatenate these, but the quickest result from a web search was what I ended up using. This is not that interesting:

# front
ls front/*.MP4 | awk '{print "file " "'\''" $0 "'\''"}' | tee front-files.txt
ffmpeg -f concat -i front-files.txt -c copy -flags +global_header -c:a aac output/front.mp4
rm front-files.txt

# back
ls back/*.MP4 | awk '{print "file " "'\''" $0 "'\''"}' | tee back-files.txt
ffmpeg -f concat -i back-files.txt -c copy -flags +global_header -c:a aac output/back.mp4
rm back-files.txt

Complex filter

I had not come across the filter_complex argument in ffmpeg before, but it’s neat and easy to use. The -i (input) arguments you give to ffmpeg correspond, in order, to [0], [1], and so on. You can then process, mix and mash them, like this:

ffmpeg -i A.mp4 -i B.mp4 \
-filter_complex "[1]scale=iw/4:ih/4 [m]; [m]hflip[pre]; \
[pre]pad=w=10+iw:h=10+ih:x=512:y=512:color=black[pip]; \
[0][pip] overlay=main_w/20:main_h/20" \
-vcodec h264 -c:a aac pip.mp4

Any of ffmpeg’s many filters can be used and chained together like this into quite complex manipulation pipelines. The graph above takes the second input ([1]), scales it to a quarter of its size, flips it horizontally, pads it with a thin black frame, and finally overlays it near the top-left corner of the first input ([0]).
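
As an aside, for a single input the same comma-chaining works with the simpler -vf option; the labelled pads like [m] and [pip] only become necessary once a graph has multiple inputs or branches. A minimal sketch, with hypothetical filenames:

# Chain two filters on a single input: mirror it, then boost saturation
ffmpeg -i A.mp4 -vf "hflip,eq=saturation=1.3" -vcodec h264 -c:a copy flipped.mp4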

Synchronisation

The cameras both start recording as soon as they are switched on. I usually switch them on at around the same time, but they have differing and variable start-up times, so the footage is all but guaranteed to be out of sync. When you’re trying to present a realistic ‘rear view’ mirror experience, even a one-tenth-of-a-second error can be obvious to the viewer in some circumstances. I was able to hand-tune a few videos, but this was a laborious process, requiring me to change, validate, and re-encode each time I wanted to fine-tune the delay. I wanted a robust algorithmic solution.

I had some vague ideas about writing an auto-correlation function to line up the audio, but thankfully people at the BBC who know signal processing better than I do have written a tool called audio-offset-finder that fits the use case perfectly.

After a few trials with low-certainty results, I found that clapping a few times at the start of the recording, and running the tool over only these first few moments, aligns the two video files well enough.
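
The front_test.mp4 and back_test.mp4 files used below were cut from the start of each combined video; something like this does the job, where the 30-second duration is an arbitrary choice of mine that comfortably covers the claps:

# Cut the first 30 seconds of each video for the offset search
ffmpeg -i output/front.mp4 -t 30 -c copy ~/video_process/front_test.mp4
ffmpeg -i output/back.mp4 -t 30 -c copy ~/video_process/back_test.mp4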

Run as follows, the tool will even save a chart of the candidate delays and its most confident match:

~/audio-offset-finder/bin/audio-offset-finder \
--find-offset-of ~/video_process/front_test.mp4 \
--within ~/video_process/back_test.mp4 \
--save-plot ~/video_process/offset.png

A delay of 3.312 seconds

I then run a command to create back2.mp4 from back.mp4: a slightly trimmed version, ready to be merged with front.mp4.
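
That command is just a seek and re-encode; a sketch, assuming the rear camera has the 3.312-second head start found above:

# Trim the measured offset from the start of the rear video.
# -ss before -i seeks the input; re-encoding (rather than -c copy)
# keeps the cut frame-accurate instead of snapping to a keyframe.
ffmpeg -ss 3.312 -i output/back.mp4 -vcodec h264 -c:a aac output/back2.mp4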

PTS-STARTPTS

Sometimes the inset picture would not appear for the first few frames. After some reading about Presentation Time Stamps (PTS), I found that the rear video had a non-zero starting point, which was somehow giving it a delay of ~18 frames. By inserting the setpts=PTS-STARTPTS filter on both video inputs, I got picture-in-picture results from the very first frame.
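
You can check whether a file has this problem with ffprobe (which ships with ffmpeg), by asking it to report the container’s start time; for example:

# Print the container start time; anything non-zero delays that stream
ffprobe -v error -show_entries format=start_time \
-of default=noprint_wrappers=1 output/back2.mp4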

Final Product

The command that brings together the combined, time-aligned files is the following:

ffmpeg -i output/front.mp4 -i output/back2.mp4 \
-filter_complex "[0]setpts=PTS-STARTPTS,eq=saturation=1.3:gamma=1.05[main]; \
[1]setpts=PTS-STARTPTS,scale=iw/4:ih/4 [m]; [m]hflip[pre]; \
[pre]pad=w=10+iw:h=10+ih:x=512:y=512:color=black[pip]; \
[main][pip] overlay=main_w/20:main_h/20" \
-vcodec h264 -c:a aac pip.mp4

And then you can make fun gifs of people doing things like this:

Point of view image of a cyclist again almost being side-swiped, with a rear-view inset image.

Please do not do this.
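
Cutting a gif like that out of the finished video is a one-liner; the start time, duration, frame rate, and size here are all placeholders:

# Cut a short clip from the combined video and convert it to a gif
ffmpeg -ss 00:02:05 -t 4 -i pip.mp4 -vf "fps=12,scale=480:-1" clip.gif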

Bonus Features

Watermark

If you wanted to turn your footage into a branded experience, you might run something like:

ffmpeg -i output/front_test.mp4 -i output/back_test.mp4 \
-i watermark.png \
-filter_complex "[0]setpts=PTS-STARTPTS,eq=saturation=1.3:gamma=1.05[main]; \
[1]setpts=PTS-STARTPTS,scale=iw/4:ih/4 [m]; [m]hflip[pre]; \
[pre]pad=w=10+iw:h=10+ih:x=512:y=512:color=black[pip]; \
[main][pip] overlay=main_w/20:main_h/20[vid]; \
[2]scale=128:-1 [ovrl]; \
[vid][ovrl] overlay=main_w-overlay_w-100:main_h-overlay_h-50" \
-vcodec h264 -c:a aac pip_watermark.mp4

Sharpening

I tried the following command, but I didn’t much like the results, and it didn’t improve licence-plate readability, which was my main reason for investigating it.

ffmpeg -i output/front.mp4 -i output/back2.mp4 \
-filter_complex "[0]eq=saturation=1.3:gamma=1.05[main-presharpen]; \
[main-presharpen]unsharp=3:3:1.5[main]; [1]unsharp=3:3:1.5[pip-sharpened]; \
[pip-sharpened]scale=iw/4:ih/4 [m]; [m]hflip[pre]; \
[pre]pad=w=10+iw:h=10+ih:x=512:y=512:color=black[pip]; \
[main][pip] overlay=main_w/20:main_h/20" \
-vcodec h264 -c:a aac pip_sharpened.mp4

References

https://www.oodlestechnologies.com/blogs/PICTURE-IN-PICTURE-effect-using-FFMPEG/