WuYun: Skeleton-Guided Melody Generation with Long-Term Structure

Abstract

Deep learning has significantly advanced melody generation, but generating structured melodies remains an important challenge. Previous works typically use either single-stage models, which map inputs directly to outputs, or two-stage frameworks that exploit structural similarities such as repetition and variation. However, neither approach explicitly distinguishes musical events by their structural importance when modeling the hierarchical structure of melody. In this paper, we introduce WuYun, a novel two-stage Transformer-based melody generation framework guided by melodic skeletons. The framework first constructs the melodic skeleton by generating structurally important notes, and then realizes melodic prolongation by infilling the skeleton with decorative notes to produce the complete melody. Specifically, we propose a knowledge-based and data-driven method that identifies and extracts structurally important notes from three aspects (meter, rhythm, and harmony) to form the melodic skeleton. The extracted skeletons serve a dual purpose: they are used to train an autoregressive model that generates new melodic skeletons, and they provide structural guidance for melody generation. Both subjective and objective evaluations show that WuYun generates melodies with better long-term structure and musicality than other state-of-the-art methods.
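The abstract names the three criteria used to pick skeleton notes (meter, rhythm, harmony) but not the exact rules. As a rough illustration only, the toy Python sketch below scores each note with three simple heuristics and keeps notes that satisfy enough of them. It assumes 4/4 time, one chord per bar, and a hypothetical majority vote; the names Note, is_metrically_strong, is_rhythmically_long, is_chord_tone and extract_skeleton are invented for this example. It is not the WuYun authors' knowledge-based and data-driven extraction method.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Note:
    onset: float     # start time in beats (quarter notes)
    duration: float  # length in beats
    pitch: int       # MIDI pitch number


def is_metrically_strong(note, beats_per_bar=4):
    # Hypothetical meter rule: in 4/4, beats 1 and 3 count as strong positions.
    return (note.onset % beats_per_bar) in (0.0, 2.0)


def is_rhythmically_long(note, min_duration=1.0):
    # Hypothetical rhythm rule: notes lasting at least one beat are "long".
    return note.duration >= min_duration


def is_chord_tone(note, chord_pcs):
    # Harmony rule: the note's pitch class belongs to the bar's chord.
    return note.pitch % 12 in chord_pcs


def extract_skeleton(notes, chords, beats_per_bar=4, min_score=2):
    """Keep notes that pass at least `min_score` of the three heuristics.

    `chords` maps a bar index to a set of pitch classes (an assumption:
    one chord per bar).
    """
    skeleton = []
    for note in notes:
        bar = int(note.onset // beats_per_bar)
        chord_pcs = chords.get(bar, set())
        # Booleans add as 0/1, so the score ranges from 0 to 3.
        score = (
            is_metrically_strong(note, beats_per_bar)
            + is_rhythmically_long(note)
            + is_chord_tone(note, chord_pcs)
        )
        if score >= min_score:
            skeleton.append(note)
    return skeleton


# Usage: two bars over C major (C, E, G) then F major (F, A, C).
melody = [
    Note(0.0, 1.0, 60), Note(1.0, 0.5, 62), Note(1.5, 0.5, 64), Note(2.0, 2.0, 67),
    Note(4.0, 0.5, 71), Note(4.5, 1.5, 69), Note(6.0, 2.0, 65),
]
chords = {0: {0, 4, 7}, 1: {5, 9, 0}}
print([n.pitch for n in extract_skeleton(melody, chords)])  # [60, 67, 69, 65]
```

In this sketch the passing tones (D, E and the off-chord B) are treated as decoration, which is the material a second-stage model would infill around the skeleton. Raising `min_score` to 3 yields a sparser skeleton ([60, 67, 65]).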


WuYun Overview

WuYun Samples

Generated Melody with Chord | Generated Melody w/o Chord
midi | midi
midi | midi
midi | midi

Baseline Models

Here are melodies generated by the baseline models:
MT | midi | midi | midi
CWT | midi | midi | midi
Melons | midi | midi | midi
WuYun-Base | midi | midi | midi