Video Background Music Generation
with Controllable Music Transformer

Shangzhe Di¹*
Zeren Jiang¹*
Si Liu¹
Zhaokai Wang¹
Leyan Zhu¹
Zexin He¹
Hongming Liu²
Shuicheng Yan³
¹Beihang University
²Charterhouse School
³Sea AI Lab

*Equal Contribution

Demo Videos

The background music in these videos is generated by our method. Note the rhythmic relationship between each video and its music.

For a carefully edited video [original]

For a raw video shot with iPhone XR

For an animation video [original]

Our story vlog


In this work, we address the task of video background music generation. Previous works can generate music effectively but cannot generate melodious music tailored to a given video, and none of them considers rhythmic consistency between video and music. To generate background music that matches the given video well, we first establish the rhythmic relationships between video and background music. In particular, we connect timing, motion speed, and motion saliency from video with beat, simu-note density, and simu-note strength from music, respectively. We then propose CMT, a Controllable Music Transformer that enables local control of these rhythmic features, as well as global control of the music genre and instruments specified by users. Objective and subjective evaluations show that the generated background music achieves satisfactory compatibility with the input videos and, at the same time, impressive music quality.
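To make the two music-side rhythmic features more concrete, here is a minimal Python sketch. This page does not define them, so the sketch rests on assumptions: a simu-note is taken to be a group of notes sharing the same onset, simu-note density is the number of simu-notes in a bar, and simu-note strength is the number of notes in one simu-note. The function name `simu_note_features`, the `(onset_tick, pitch)` note format, and the `ticks_per_bar` parameter are hypothetical and are not taken from the CMT code.

```python
from collections import defaultdict


def simu_note_features(notes, ticks_per_bar=16):
    """Compute assumed simu-note density and strength.

    notes: iterable of (onset_tick, pitch) pairs (hypothetical format).
    Returns (density, strength), where
      density[bar_index]  = number of distinct onsets in that bar,
      strength[onset_tick] = number of notes starting at that onset.
    """
    # Group pitches by onset: each group is one assumed "simu-note".
    groups = defaultdict(list)
    for onset, pitch in notes:
        groups[onset].append(pitch)

    # Strength: how many notes sound together at each onset.
    strength = {onset: len(pitches) for onset, pitches in groups.items()}

    # Density: how many simu-notes fall inside each bar.
    density = defaultdict(int)
    for onset in groups:
        density[onset // ticks_per_bar] += 1

    return dict(density), strength


# Toy example: a three-note chord, a single note, a two-note chord,
# then one note at the start of the next bar.
notes = [(0, 60), (0, 64), (0, 67), (4, 62), (8, 60), (8, 67), (16, 65)]
density, strength = simu_note_features(notes)
print(density)   # simu-notes per bar
print(strength)  # notes per simu-note
```

Under these assumptions, a video segment with faster motion would be paired with bars of higher density, and a more salient motion with a simu-note of higher strength. How CMT itself extracts and conditions on these features is not described on this page.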


Shangzhe Di*, Zeren Jiang*, Si Liu, Zhaokai Wang, Leyan Zhu, Zexin He, Hongming Liu, Shuicheng Yan
ACM Multimedia, 2021