Francisco G.

Stitching together videos and images with FFmpeg

I wanted to do a quick write-up about something I figured out with code.

Back in early January I was trying to create one of those year-end highlight reels you see on social media that quickly go through a bunch of photos and videos. Google Photos has a feature where you can combine photos and videos into a "movie", but it has a limit of 50 items and I had 124 😬. I knew there was a way to do this with FFmpeg, but I had never tried it before. So I took a crack at it, and it wasn't that hard in the end.

The data source:

Desired outcome:

I'm going to start by showing you an example of what the CLI command looks like and then I'll break down and explain each part.

In this command we are combining 2 videos and 2 images into 1 video.

ffmpeg \
-loop 1 -t 1 -i photo1.jpg \
-t 5 -i video1.mp4 \
-loop 1 -t 1 -i photo2.jpg \
-t 5 -i video2.mp4 \
-f lavfi -t 1 -i anullsrc \
-filter_complex "\
[0:v]scale=1080:-1,setsar=1,pad=1080:1920:0:(oh-ih)/2[v0];\
[1:v]scale=1080:-1,setsar=1,pad=1080:1920:0:(oh-ih)/2[v1];\
[2:v]scale=1080:-1,setsar=1,pad=1080:1920:0:(oh-ih)/2[v2];\
[3:v]scale=1080:-1,setsar=1,pad=1080:1920:0:(oh-ih)/2[v3];\
[v0][4:a][v1][1:a][v2][4:a][v3][3:a]concat=n=4:v=1:a=1[v][a]" \
-map "[v]" -map "[a]" "output.mp4"

Let's break this down.

Part 1: The Inputs

-loop 1 -t 1 -i photo1.jpg \
-t 5 -i video1.mp4 \
-loop 1 -t 1 -i photo2.jpg \
-t 5 -i video2.mp4 \
-f lavfi -t 1 -i anullsrc \

This one is a little different, but it's important.

-f lavfi -t 1 -i anullsrc
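To make the pattern behind these inputs explicit: an image gets `-loop 1 -t 1` so the still frame is repeated for 1 second, a video gets `-t 5` so at most its first 5 seconds are read, and the silent audio comes from the `lavfi` virtual input running the `anullsrc` source for 1 second. The sketch below captures that as a small Bash helper. `input_args` is a name made up for illustration, not something from the original command, and it assumes only `.jpg` and `.mp4` files are involved.

```shell
#!/bin/bash
# Hypothetical helper (not part of the original command): print the ffmpeg
# input arguments for one source, one argument per line.
input_args() {
    local source=$1
    case $source in
        *.jpg)     printf '%s\n' -loop 1 -t 1 -i "$source" ;;   # still image, shown for 1 second
        *.mp4)     printf '%s\n' -t 5 -i "$source" ;;           # video, first 5 seconds at most
        anullsrc*) printf '%s\n' -f lavfi -t 1 -i "$source" ;;  # generated silent audio, 1 second
        *)         echo "unsupported source: $source" >&2; return 1 ;;
    esac
}

# Print the arguments for each of the five inputs in the example command.
for source in photo1.jpg video1.mp4 photo2.jpg video2.mp4 anullsrc; do
    input_args "$source" | tr '\n' ' '
    echo
done
```

Printing one argument per line keeps each argument separate, so a caller could read them into an array instead of one long string.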

Part 2: The -filter_complex

You'll see this repeated a few times:

[0:v]scale=1080:-1,setsar=1,pad=1080:1920:0:(oh-ih)/2[v0];
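Reading the chain left to right: `[0:v]` picks the video stream of the first input, `scale=1080:-1` resizes it to 1080 pixels wide while keeping the aspect ratio, `setsar=1` resets the sample aspect ratio to square pixels, and `pad=1080:1920:0:(oh-ih)/2` places the result on a 1080x1920 canvas, centered vertically. `[v0]` is the label that later filters use to refer to the result. Since only the index changes from line to line, the chain can be generated. The following is a minimal sketch, where `normalize_chain` and its default canvas size are assumptions made for illustration.

```shell
#!/bin/bash
# Hypothetical helper (not part of the original command): print the
# scale/setsar/pad chain for input number $1 on a canvas of $2 x $3 pixels.
normalize_chain() {
    local index=$1 width=${2:-1080} height=${3:-1920}
    printf '[%s:v]scale=%s:-1,setsar=1,pad=%s:%s:0:(oh-ih)/2[v%s];' \
        "$index" "$width" "$width" "$height" "$index"
}

# Build the chains for the four visual inputs of the example command.
chains=""
for i in 0 1 2 3; do
    chains+=$(normalize_chain "$i")
done
printf '%s\n' "$chains"
```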

Part 3: The concat filter

[v0][4:a][v1][1:a][v2][4:a][v3][3:a]

This pairs each labeled video stream with an audio stream. You'll notice that [4:a] is used twice: it refers to the silent audio track, which is paired with every image input. Remember that we have 5 inputs: 4 files, plus the generated silent audio as the 5th. Inputs are 0-indexed, so [4:a] refers to the audio of that 5th input.
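The pairing rule can be sketched as a small function: a video uses its own audio, and an image borrows the silent audio, whose index equals the number of files because it is added last. `concat_pairs` is a hypothetical name, and the sketch assumes only `.jpg` and `.mp4` files.

```shell
#!/bin/bash
# Hypothetical helper (not part of the original command): print the
# [video][audio] pairs that feed concat for the given files, in order.
concat_pairs() {
    local silent_index=$#   # inputs are 0-indexed, so with N files the silent audio is input N
    local index=0 pairs="" file
    for file in "$@"; do
        case $file in
            *.jpg) pairs+="[v$index][$silent_index:a]" ;;  # image: borrow the silent audio
            *.mp4) pairs+="[v$index][$index:a]" ;;         # video: use its own audio
            *) echo "unsupported file: $file" >&2; return 1 ;;
        esac
        index=$((index + 1))
    done
    printf '%s\n' "$pairs"
}

concat_pairs photo1.jpg video1.mp4 photo2.jpg video2.mp4
# prints [v0][4:a][v1][1:a][v2][4:a][v3][3:a]
```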

concat=n=4:v=1:a=1[v][a]

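This is what concatenates the input streams. n=4 is the number of segments to join, and v=1 and a=1 mean that each segment has one video stream and one audio stream. The joined video and audio come out labeled [v] and [a].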
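One consequence is worth checking before running a long command: concat expects exactly n × (v + a) input labels, which is 4 × (1 + 1) = 8 here. Below is a minimal sanity check, assuming every input is written as a bracketed label. `check_concat_inputs` is a hypothetical helper, not an FFmpeg feature.

```shell
#!/bin/bash
# Hypothetical sanity check (not an FFmpeg feature): succeed only if the number
# of bracketed labels in $1 equals n * (v + a), given n=$2, v=$3 and a=$4.
check_concat_inputs() {
    local labels=$1 n=$2 v=$3 a=$4
    local stripped=${labels//\[/}               # the same string with every '[' removed
    local count=$(( ${#labels} - ${#stripped} ))  # number of '[' characters, one per label
    [ "$count" -eq "$(( n * (v + a) ))" ]
}

if check_concat_inputs "[v0][4:a][v1][1:a][v2][4:a][v3][3:a]" 4 1 1; then
    echo "input count matches"
else
    echo "input count does not match"
fi
```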

Part 4: Map to output

-map "[v]" -map "[a]" "output.mp4"

Automate

As you can see, the command for just 4 files is not very small. I had 124 files to handle. I wrote a Bash script to help me with this:

#!/bin/bash
# Usage: ./concat.sh <folder> <output_file>
# Example: ./concat.sh /home/user/videos output.mp4

if [ "$#" -ne 2 ]; then
    echo "Usage: $0 <folder> <output_file>"
    exit 1
fi

folder=$1
output_file=$2

# Collect only the files we can handle, so the input indexes stay in sync
files=()
for file in "$folder"/*; do
    if [[ $file == *.jpg || $file == *.mp4 ]]; then
        files+=("$file")
    fi
done

# count all the files because this will be used as the index for the silent audio
totalFiles=${#files[@]}
echo "Total files: $totalFiles"

if [ "$totalFiles" -eq 0 ]; then
    echo "No .jpg or .mp4 files found in $folder"
    exit 1
fi

inputs=() # an array keeps file names with spaces intact
filter_complex=""
concat_filter=""
index=0

for file in "${files[@]}"; do
    if [[ $file == *.jpg ]]; then
        inputs+=(-loop 1 -t 1 -i "$file")
        concat_filter+="[v$index][$totalFiles:a]" # [$totalFiles:a] is the silent audio
    else
        inputs+=(-t 5 -i "$file")
        concat_filter+="[v$index][$index:a]"
    fi
    filter_complex+="[$index:v]scale=1080:-1,setsar=1,pad=1080:1920:0:(oh-ih)/2[v$index];"
    index=$((index + 1))
done

# Add silent audio for the images
inputs+=(-f lavfi -t 1 -i "anullsrc=channel_layout=stereo:sample_rate=44100")

# Concatenate all video streams
concat_filter+="concat=n=$index:v=1:a=1[v][a]"

ffmpeg "${inputs[@]}" -filter_complex "$filter_complex $concat_filter" -map "[v]" -map "[a]" "$output_file"

That's it! Hope you found this useful and hope I didn't get anything too glaringly wrong. This script worked perfectly for me, but I'm sure there are even better and more concise ways of doing it.