Generative Adversarial Post-Training Mitigates Reward Hacking in Live Human-AI Music Interaction

Anonymous authors

We evaluate Generative Adversarial Post-Training (GAPT) for real-time melody-to-chord accompaniment. The examples below show piano-roll visualizations of four accompaniments for each melody: the ground truth, GAPT (ours), GAPT without adversarial post-training, and the ReaLchords baseline.

Examples from the Hooktheory test set

Hooktheory Example 1

Hooktheory example 103 comparison

Ground truth

GAPT (ours)

GAPT w/o adversarial post-training

ReaLchords baseline

Hooktheory Example 2

Hooktheory example 106 comparison

Ground truth

GAPT (ours)

GAPT w/o adversarial post-training

ReaLchords baseline

Hooktheory Example 3

Hooktheory example 109 comparison

Ground truth

GAPT (ours)

GAPT w/o adversarial post-training

ReaLchords baseline

Hooktheory Example 4

Hooktheory example 113 comparison

Ground truth

GAPT (ours)

GAPT w/o adversarial post-training

ReaLchords baseline

Examples from the Nottingham test set

Nottingham Example 1

Nottingham example 4 comparison

Ground truth

GAPT (ours)

GAPT w/o adversarial post-training

ReaLchords baseline

Nottingham Example 2

Nottingham example 11 comparison

Ground truth

GAPT (ours)

GAPT w/o adversarial post-training

ReaLchords baseline

Nottingham Example 3

Nottingham example 15 comparison

Ground truth

GAPT (ours)

GAPT w/o adversarial post-training

ReaLchords baseline

Nottingham Example 4

Nottingham example 34 comparison

Ground truth

GAPT (ours)

GAPT w/o adversarial post-training

ReaLchords baseline