My Bike and Multi-camera Post-ride Processing Workflow
Update: I have now published a GitHub gist that automates the video processing described below, and can run against my video files immediately after each ride to create a single, time-synchronised video output.
Recently I decided to get some video cameras for my bike. These cameras, filming both the front and rear of my bike, capture footage in 5-minute files.
Though the cameras come with an app, and both are made by the same manufacturer, there is no easy way to combine a long ride into a single video for easy (or uneasy) viewing. I set about creating a workflow to make a complete video of my rides. My desired end goal looks something like this (the image layout, not the impending sense of doom):
Tools and Requirements
ffmpeg is the go-to for many command-line approaches to processing a variety of moving images, and this use-case is not particularly special. However, I had not used ffmpeg before, let alone constructed a semi-complicated workflow with it. I began with almost no knowledge of ffmpeg, so didn’t know what requirements would be reasonable.
After a little testing and playing around, I developed the following list:
- Picture in picture; a larger view of the main camera, with an inset view of the rear camera.
- The inset image should have a frame, to subtly visually separate the two images.
- The images had to be exactly aligned in time (more on this later).
- The inset image should look similar to an Australian ‘rear view mirror’ perspective. I tried other approaches but they made less visual sense to me than looking up and to the left to see a horizontally flipped image of the rear view.
- Make the dull images from my bike cameras ‘pop’ a bit more.
File Combination
The files from the microSD cards come across as 5-minute MP4 chunks.
There are a few ways to concatenate these, but the quickest result from a web search was what I ended up using. This is not that interesting:
# front: build a list in the format the concat demuxer expects ("file 'path'"),
# then join the chunks, stream-copying the video and re-encoding only the audio to AAC
ls front/*.MP4 | awk '{print "file " "'\''" $0 "'\''"}' | tee front-files.txt
ffmpeg -f concat -i front-files.txt -c copy -flags +global_header -c:a aac output/front.mp4
rm front-files.txt
# back: the same again for the rear camera
ls back/*.MP4 | awk '{print "file " "'\''" $0 "'\''"}' | tee back-files.txt
ffmpeg -f concat -i back-files.txt -c copy -flags +global_header -c:a aac output/back.mp4
rm back-files.txt
Complex filter
I had not come across the filter_complex argument in ffmpeg before, but it’s neat and easy to use. The -i (input) arguments you give to ffmpeg correspond respectively to [0], [1], and so on. You can then process, mix and mash, like this:
# take the second input [1], scale it to quarter size, mirror it, then pad it
# 10px larger in each dimension for a thin black frame (the out-of-range
# x=512:y=512 offsets leave ffmpeg to centre the inset), and overlay on input [0]
ffmpeg -i A.mp4 -i B.mp4 \
-filter_complex "[1]scale=iw/4:ih/4 [m]; [m]hflip[pre]; \
[pre]pad=w=10+iw:h=10+ih:x=512:y=512:color=black[pip]; \
[0][pip] overlay=main_w/20:main_h/20" \
-vcodec h264 -c:a aac pip.mp4
You can use any of the various filter options in ffmpeg, and chain them together to create what can be a complex manipulation process.
Synchronisation
The cameras both start recording as soon as they are switched on. I often switch them on around the same time, but they have differing and variable start-up times, so the footage is all but guaranteed to be out of sync. When you’re trying to present a realistic ‘rear view’ mirror experience, even one tenth of a second in error can be obvious to the viewer in some circumstances. I was able to hand-tune a few videos, but this was a laborious process requiring me to change, validate, and re-encode each time I wanted to fine-tune the delay. I wanted a robust algorithmic solution.
I had some vague ideas of writing an auto-correlation function to line up the audio, but thankfully people who know signal processing better than me over at the BBC have written a tool called audio-offset-finder that fits the use-case perfectly.
After a few trials and low certainty results, I found that clapping a few times at the start of the recording, and running the tool over only these first few moments, seems to work to sufficiently align the two video files.
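One way to cut those short test clips (a sketch; the 30-second duration here is a placeholder for ‘the first few moments’, not a value from my workflow):
# stream-copy only the opening seconds of each file for offset detection
ffmpeg -t 30 -i output/front.mp4 -c copy ~/video_process/front_test.mp4
ffmpeg -t 30 -i output/back.mp4 -c copy ~/video_process/back_test.mp4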
With the following command, the tool will even create a chart of possible delays and its most confident match.
~/audio-offset-finder/bin/audio-offset-finder \
--find-offset-of ~/video_process/front_test.mp4 \
--within ~/video_process/back_test.mp4 \
--save-plot ~/video_process/offset.png
I then run a command to create a back2.mp4 from back.mp4, which is a slightly cut version, to be merged with front.mp4.
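The cut itself can be a straight stream copy with -ss; a minimal sketch, where the 4.2-second value is a placeholder for whatever offset audio-offset-finder reported:
# drop the first ~4.2 s of the rear footage so it lines up with the front.
# Note: -ss with -c copy snaps to a keyframe; re-encode instead if the
# alignment still looks off by a few frames
ffmpeg -ss 4.2 -i output/back.mp4 -c copy output/back2.mp4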
PTS-STARTPTS
Sometimes the inset picture would not start for a few frames. After some reading about Presentation Time Stamps (PTS), I found that the rear video had a non-zero starting point, which was somehow giving it a delay of ~18 frames. By inserting a setpts=PTS-STARTPTS filter on both video inputs, I was able to get picture-in-picture results from the first frame.
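To see whether a file actually has a non-zero start point, ffprobe can report it directly (the filename here is illustrative):
# print the container's start_time in seconds; a non-zero value is
# what setpts=PTS-STARTPTS corrects for
ffprobe -v error -show_entries format=start_time \
-of default=noprint_wrappers=1:nokey=1 output/back.mp4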
Final Product
The command to bring together the combined and then time-aligned files is the following:
# as above, plus eq=saturation=1.3:gamma=1.05 to make the dull footage 'pop'
ffmpeg -i output/front.mp4 -i output/back2.mp4 \
-filter_complex "[0]setpts=PTS-STARTPTS,eq=saturation=1.3:gamma=1.05[main]; \
[1]setpts=PTS-STARTPTS,scale=iw/4:ih/4 [m]; [m]hflip[pre]; \
[pre]pad=w=10+iw:h=10+ih:x=512:y=512:color=black[pip]; \
[main][pip] overlay=main_w/20:main_h/20" \
-vcodec h264 -c:a aac pip.mp4
And then you can make fun gifs of people doing things like this:
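If you want to do the same, the usual palettegen/paletteuse recipe works; a sketch, with placeholder timestamps and output size:
# cut a 3-second clip to a gif via a generated colour palette
ffmpeg -ss 00:01:23 -t 3 -i pip.mp4 \
-filter_complex "fps=12,scale=480:-1:flags=lanczos,split[a][b]; \
[a]palettegen[p]; [b][p]paletteuse" \
fun.gif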
Bonus Features
Watermark
If you wanted to turn your footage into a branded experience, you might run something like
ffmpeg -i output/front_test.mp4 -i output/back_test.mp4 \
-i watermark.png \
-filter_complex "[0]setpts=PTS-STARTPTS,eq=saturation=1.3:gamma=1.05[main]; \
[1]setpts=PTS-STARTPTS,scale=iw/4:ih/4 [m]; \
[m]hflip[pre]; [pre]pad=w=10+iw:h=10+ih:x=512:y=512:color=black[pip]; \
[main][pip] overlay=main_w/20:main_h/20[vid]; \
[2]scale=128:-1 [ovrl]; \
[vid][ovrl] overlay=main_w-overlay_w-100:main_h-overlay_h-50" \
-vcodec h264 -c:a aac pip_watermark.mp4
Sharpening
I tried the following command, but didn’t much like the results, and didn’t find that it improved licence plate readability, which was my main reason for investigating it.
ffmpeg -i output/front.mp4 -i output/back2.mp4 \
-filter_complex "[0]eq=saturation=1.3:gamma=1.05[main-presharpen]; \
[main-presharpen]unsharp=3:3:1.5[main]; [1]unsharp=3:3:1.5[pip-sharpened]; \
[pip-sharpened]scale=iw/4:ih/4 [m]; [m]hflip[pre]; \
[pre]pad=w=10+iw:h=10+ih:x=512:y=512:color=black[pip]; \
[main][pip] overlay=main_w/20:main_h/20" \
-vcodec h264 -c:a aac pip_sharpened.mp4
References
https://www.oodlestechnologies.com/blogs/PICTURE-IN-PICTURE-effect-using-FFMPEG/