WuYun: Skeleton-Guided Melody Generation with Long-Term Structure

Abstract

Deep learning has significantly advanced melody generation, but generating structured melodies remains an important challenge. Previous works typically use either single-stage models, which map inputs directly to outputs, or two-stage frameworks that incorporate structural similarities such as repetition and variation. However, neither approach explicitly differentiates musical events by their structural importance when modeling the hierarchical structure of a melody. In this paper, we introduce WuYun, a novel two-stage Transformer-based melody generation framework guided by melodic skeletons. The framework first constructs the melodic skeleton by generating structurally important notes, then realizes melodic prolongation by infilling the skeleton with decorative notes to produce a complete melody. Specifically, we propose a knowledge-based and data-driven method that identifies and extracts structurally important notes along three dimensions (meter, rhythm, and harmony) to form the melodic skeleton. The extracted skeletons serve a dual purpose: they are used to train an autoregressive model that generates new melodic skeletons, and they provide structural guidance for melody generation. Both subjective and objective evaluations show that WuYun generates melodies with improved long-term structure and musicality, outperforming other state-of-the-art methods.

WuYun Overview
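
To make the idea of a melodic skeleton concrete, here is a minimal Python sketch of how structurally important notes might be selected using the three aspects named in the abstract: meter, rhythm, and harmony. It is a toy illustration only, not WuYun's extraction method. The `Note` type, the three rule functions, the thresholds, and the voting scheme are all hypothetical and chosen for readability. The Transformer stages that generate and infill skeletons are not shown.

```python
from dataclasses import dataclass
from typing import List, Set


@dataclass(frozen=True)
class Note:
    pitch: int       # MIDI pitch number
    start: float     # onset in beats from the start of the piece
    duration: float  # length in beats


def is_metrically_strong(note: Note, beats_per_bar: int = 4) -> bool:
    # Assumed meter rule: the onset falls on the downbeat or the mid-bar beat.
    position = note.start % beats_per_bar
    return position in (0.0, beats_per_bar / 2)


def is_rhythmically_long(note: Note, min_duration: float = 1.0) -> bool:
    # Assumed rhythm rule: notes lasting at least one beat count as salient.
    return note.duration >= min_duration


def is_chord_tone(note: Note, chord_pitch_classes: Set[int]) -> bool:
    # Assumed harmony rule: the pitch class belongs to the accompanying chord.
    return note.pitch % 12 in chord_pitch_classes


def extract_skeleton(
    melody: List[Note], chord_pitch_classes: Set[int], min_votes: int = 2
) -> List[Note]:
    """Keep notes that satisfy at least `min_votes` of the three toy rules."""
    skeleton = []
    for note in melody:
        votes = (
            is_metrically_strong(note)
            + is_rhythmically_long(note)
            + is_chord_tone(note, chord_pitch_classes)
        )
        if votes >= min_votes:
            skeleton.append(note)
    return skeleton


# Example: a two-bar fragment over a C major chord (pitch classes C, E, G).
melody = [
    Note(60, 0.0, 1.0),  # C on the downbeat, one beat long
    Note(62, 1.0, 0.5),  # D, passing note
    Note(64, 1.5, 0.5),  # E, short and off the strong beats
    Note(67, 2.0, 2.0),  # G on the mid-bar beat, two beats long
    Note(65, 4.0, 0.5),  # F on the downbeat, short non-chord tone
    Note(64, 4.5, 1.5),  # E, long chord tone off the beat
]
skeleton = extract_skeleton(melody, chord_pitch_classes={0, 4, 7})
skeleton_pitches = [note.pitch for note in skeleton]
```

In this sketch the remaining notes (the D, the short E, and the F) play the role of the decorative notes that the second stage would infill around the skeleton.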

WuYun Samples

Generated Melody with Chord		Generated Melody w/o Chord
	midi		midi
	midi		midi
	midi		midi

Baseline Models

Here are melodies generated by the baseline models:


MT	midi	midi	midi
CWT	midi	midi	midi
Melons	midi	midi	midi
WuYun-Base	midi	midi	midi