Deep Dive: Implementing the Game Boy Audio Processing Unit

A comprehensive exploration of the Game Boy's sound hardware emulation, covering everything from register-level implementation to modern audio pipeline integration.

This is the first post in a series following my implementation of a Game Boy emulator in C++. I have very little prior experience building emulators (only a CHIP-8 emulator before this), so I am learning a lot as I go, and I will be documenting my progress and sharing the code along the way. This post focuses on the Game Boy's audio processing unit (APU). I started with the APU because I had heard it was the most complex part of the system and I wanted to get it out of the way. I also thought it would be a good way to learn about digital audio synthesis and how to implement it in C++.


APU Architecture and Fundamentals

The Game Boy's APU is a sophisticated piece of hardware for its time, operating at the system's master clock frequency of 4.194304 MHz. Unlike modern audio processors, it generates sound entirely through digital synthesis rather than using samples or advanced DSP.

Key architectural features:

  • 4 Independent Channels: Each with dedicated registers and unique synthesis methods
  • Frame Sequencer: A 512Hz timer that coordinates envelope and length operations
  • Digital-to-Analog Conversion: Each channel has its own DAC before mixing
  • Stereo Panning: Individual left/right control per channel
  • Master Control: NR52 register enables/disables entire APU

The APU's memory map occupies registers from FF10-FF3F, with each channel having specific register allocations:

Channel                        Registers   Control Bits
Channel 1 (Pulse with Sweep)   FF10-FF14   Sweep, Duty, Volume, Frequency
Channel 2 (Pulse)              FF16-FF19   Duty, Volume, Frequency
Channel 3 (Wave)               FF1A-FF1E   Enable, Length, Volume, Frequency
                               FF30-FF3F   Wave Pattern RAM
Channel 4 (Noise)              FF20-FF23   Length, Volume, Polynomial, Clock
Control                        FF24-FF26   Volume, Panning, Power

The frame sequencer is particularly interesting: it is an 8-step counter clocked at 512Hz that triggers different events on specific steps, which gives the length counters a 256Hz clock, the sweep unit a 128Hz clock, and the envelopes a 64Hz clock:

  • Step 0: Length
  • Step 1: Nothing
  • Step 2: Length and Sweep
  • Step 3: Nothing
  • Step 4: Length
  • Step 5: Nothing
  • Step 6: Length and Sweep
  • Step 7: Envelope
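
A minimal sketch of how such a step schedule could be dispatched. The names FrameSequencer and FrameEvents are hypothetical and not taken from the emulator's source, and the schedule used is the one documented in Pan Docs (length on even steps, sweep on steps 2 and 6, envelope on step 7):

```cpp
// Hypothetical helper types, for illustration only.
struct FrameEvents {
  bool length = false;    // fires at 256 Hz
  bool sweep = false;     // fires at 128 Hz
  bool envelope = false;  // fires at 64 Hz
};

class FrameSequencer {
public:
  // Call once per 512 Hz tick (every 8192 T-cycles at 4.194304 MHz).
  FrameEvents tick() {
    FrameEvents ev;
    ev.length = (step % 2 == 0);          // steps 0, 2, 4, 6
    ev.sweep = (step == 2 || step == 6);  // steps 2 and 6
    ev.envelope = (step == 7);            // step 7 only
    step = (step + 1) % 8;
    return ev;
  }

private:
  int step = 0;
};
```

Returning a small struct of flags keeps the sequencer independent of the channel classes; the APU can then forward each event to whichever channels need it.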

Channel Implementation Deep Dives

Channel 1: Pulse Wave with Sweep

Duty Cycle Patterns:

12.5%: _-------_-------
25%:   __------__------
50%:   ____----____----
75%:   ______--______--
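
In code, duty patterns like these are often stored as a small lookup table indexed by the duty setting and the current step. The sketch below is an assumption rather than the emulator's actual table: it uses one commonly listed bit encoding in which the percentage is the fraction of the 8 steps that output high, and the names kDutyTable and dutyOutput are made up for this example.

```cpp
#include <array>
#include <cstdint>

// Assumed encoding: 1 = high, 0 = low, 8 steps per waveform cycle.
constexpr std::array<std::array<std::uint8_t, 8>, 4> kDutyTable = {{
    {0, 0, 0, 0, 0, 0, 0, 1},  // 12.5%
    {1, 0, 0, 0, 0, 0, 0, 1},  // 25%
    {1, 0, 0, 0, 0, 1, 1, 1},  // 50%
    {0, 1, 1, 1, 1, 1, 1, 0},  // 75%
}};

// Returns 0 or 1 for a duty setting (0-3) and a step within the cycle (0-7).
inline int dutyOutput(int duty, int step) {
  return kDutyTable[duty & 3][step & 7];
}
```

The channel's output at any moment is then dutyOutput(duty, step) multiplied by the current envelope volume.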
                

The sweep unit is one of the most complex parts of Channel 1. It periodically modifies the channel's frequency according to the pace (how many sweep ticks pass between updates), direction, and step (shift amount) set in the NR10 register. The sweep unit is clocked at 128Hz by the frame sequencer.

void ChannelOne::sweepIteration() {
  if (getSweepPace() == 0 || !isEnabled()) return;
  
  state.sweepTimer--;
  if (state.sweepTimer <= 0) {
    state.sweepTimer = getSweepPace();
    int period = getPeriod();
    int shift = getSweepStep();
    
    if (getSweepDirection()) {
      // Increasing frequency
      int newPeriod = period + (period / (1 << shift));
      if (newPeriod > 2047) {
        setEnabled(false); // Overflow disables channel
      } else {
        setPeriod(newPeriod);
      }
    } else {
      // Decreasing frequency
      int newPeriod = period - (period / (1 << shift));
      if (newPeriod >= 0) {
        setPeriod(newPeriod);
      }
    }
  }
}
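
As a worked example of the arithmetic in sweepIteration(), the calculation can be isolated into a pure function. This is a sketch: nextSweepPeriod is a hypothetical helper, not part of the emulator, and it only covers the period calculation and the overflow check shown above.

```cpp
#include <optional>

// Returns the next 11-bit period value, or std::nullopt when the result
// overflows 2047 (the case in which the channel is disabled).
inline std::optional<int> nextSweepPeriod(int period, int shift, bool add) {
  int delta = period >> shift;  // same as period / (1 << shift)
  int next = add ? period + delta : period - delta;
  if (next > 2047) return std::nullopt;
  return next;  // never negative, because delta <= period
}
```

With period = 1024 and shift = 1, an additive sweep gives 1024 + 512 = 1536. With period = 1500 and shift = 1, the result would be 2250, which exceeds 2047 and disables the channel.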

The envelope generator is clocked at 64Hz (triggered by the frame sequencer) and controls the channel's volume. The envelope period is set in the NR12 register; a period of 0 disables the envelope.

void ChannelOne::updateEnvelope() {
  if (state.envelopePeriod == 0) return;
  
  state.envelopeTimer--;
  if (state.envelopeTimer <= 0) {
    state.envelopeTimer = state.envelopePeriod;
    if (state.envelopeDirection && state.volume < 15) {
      state.volume++;
    } else if (!state.envelopeDirection && state.volume > 0) {
      state.volume--;
    }
  }
}

Channel 2 is identical to channel 1 except that it has no sweep unit, and therefore no equivalent of the NR10 register.

Channel 3: Wave Pattern Generator

The wave channel is unique in that it plays back user-defined 4-bit samples from its 16-byte wave RAM (32 samples × 4 bits each). The sample playback timing is calculated as:

void ChannelThree::updateSampleTimer(int cycles) {
  if (!isEnabled()) return;
  
  state.sampleTimer -= cycles;
  // Loop and add (rather than assign) so that any overshoot carries over
  // into the next sample period instead of being discarded
  while (state.sampleTimer <= 0) {
    // One sample step every (2048 - period) * 2 cycles of the 4MHz clock
    state.sampleTimer += (2048 - getPeriod()) * 2;
    state.sampleSelection = (state.sampleSelection + 1) % 32;
    
    // Get 4-bit sample (even samples: upper nibble, odd samples: lower nibble)
    bool upperNibble = (state.sampleSelection % 2 == 0);
    int ramIndex = state.sampleSelection / 2;
    state.sampleBuffer = getNibbleWavePatternRAM(ramIndex, upperNibble);
  }
}
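
The body of getNibbleWavePatternRAM() is not shown above. A minimal stand-in, assuming wave RAM is held as a 16-byte array, might look like the sketch below; waveSample is a hypothetical name, and it takes the sample index (0-31) directly rather than a byte index plus a flag.

```cpp
#include <array>
#include <cstdint>

// Hypothetical stand-in for the nibble lookup: returns the 4-bit sample
// at position sampleIndex (wrapped to 0-31) from 16 bytes of wave RAM.
inline std::uint8_t waveSample(const std::array<std::uint8_t, 16>& waveRam,
                               int sampleIndex) {
  int i = sampleIndex & 31;            // wrap to the 32-sample cycle
  std::uint8_t byte = waveRam[i / 2];  // two samples per byte
  // Even samples live in the upper nibble, odd samples in the lower nibble.
  return static_cast<std::uint8_t>((i % 2 == 0) ? (byte >> 4) : (byte & 0x0F));
}
```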

The wave channel's volume control is implemented through bit shifting:

float ChannelThree::getSample() {
  if (!state.dacEnabled) return 0.0f;
  
  int sample = state.sampleBuffer;
  switch (getOutputLevel()) {
    case 0: return 0.0f;       // Mute
    case 1: return sample;     // 100%
    case 2: return sample >> 1; // 50%
    case 3: return sample >> 2; // 25%
    default: return 0.0f;
  }
}

Channel 4: Noise Generator

The noise channel uses a 15-bit or 7-bit linear feedback shift register (LFSR) for pseudo-random noise generation. The LFSR is clocked at:

f = 524288 Hz / r / 2^(s+1)

Here r is the clock divider (a value of 0 is treated as 0.5) and s is the clock shift.

void ChannelFour::updateLFSR(int cycles) {
  state.lfsrTimer -= cycles;
  // Loop and add (rather than assign) so that any overshoot carries over
  while (state.lfsrTimer <= 0) {
    // Period in cycles: 16 * r * 2^s, with r = 0 treated as 0.5 (8 * 2^s)
    int divider = getClockDivider();
    state.lfsrTimer += (divider == 0 ? 8 : 16 * divider) << getClockShift();
    
    // XOR bits 0 and 1
    int xorResult = (state.lfsr & 1) ^ ((state.lfsr >> 1) & 1);
    
    // Shift right; the XOR result always goes to bit 14
    state.lfsr = (state.lfsr >> 1) | (xorResult << 14);
    if (state.lfsrWidth) {
      // 7-bit mode: the result is also copied to bit 6, which must be
      // cleared first because the shift may have left it set
      state.lfsr = (state.lfsr & ~(1 << 6)) | (xorResult << 6);
    }
  }
}
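
The timer reload value follows from the frequency formula: dividing the 4194304 Hz master clock by f = 524288 / r / 2^(s+1) gives 16 * r * 2^s cycles per LFSR step, or 8 * 2^s when r is 0. A small sketch of that arithmetic in integer form (lfsrPeriodCycles is a hypothetical helper name, not taken from the emulator):

```cpp
// Number of master-clock cycles (4194304 Hz) between LFSR steps.
// divider is the 3-bit r value, shift is the 4-bit s value.
inline int lfsrPeriodCycles(int divider, int shift) {
  // 4194304 / (524288 / r / 2^(s+1)) = 16 * r * 2^s; r = 0 acts as 0.5.
  int base = (divider == 0) ? 8 : 16 * divider;
  return base << shift;
}
```

Keeping the calculation in integers avoids the int/double mix that a literal 0.5 would introduce. For r = 0 and s = 0 the period is 8 cycles, which corresponds to the maximum LFSR rate of 524288 Hz.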

SDL Audio Backend Implementation

The SDL audio implementation bridges the gap between the Game Boy's digital audio synthesis and modern operating system audio APIs. Key implementation details include:

Initialization and Configuration:

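APU::APU() {
  SDL_AudioSpec audioSpec;
  SDL_zero(audioSpec);             // Clear fields that are not set below
  audioSpec.freq = 44100;          // CD-quality sample rate
  audioSpec.format = AUDIO_F32SYS; // 32-bit float samples
  audioSpec.channels = 2;          // Stereo output
  audioSpec.samples = 4096;        // Buffer size
  audioSpec.callback = NULL;       // Using push architecture
  if (SDL_OpenAudio(&audioSpec, &obtainedSpec) < 0) {
    SDL_Log("SDL_OpenAudio failed: %s", SDL_GetError());
    return;
  }
  SDL_PauseAudio(0);               // The device opens paused; start playback
}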

Key Design Decisions:

  • Float32 format preserves dynamic range during mixing
  • Stereo panning through NR51 register bitmask processing
  • Asynchronous queued audio to prevent buffer underruns
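
A minimal sketch of the NR51 bitmask processing mentioned above. In NR51, bits 4-7 route channels 1-4 to the left output and bits 0-3 route them to the right output. The Panning struct and decodeNR51 function are hypothetical names chosen for this example, and using 0.0/1.0 float gains is an assumption that fits a float mixing pipeline.

```cpp
#include <array>
#include <cstdint>

// Per-channel gains: index 0 is channel 1, index 3 is channel 4.
struct Panning {
  std::array<float, 4> left{};
  std::array<float, 4> right{};
};

// Decodes NR51 into 0.0/1.0 gains for each channel and side.
inline Panning decodeNR51(std::uint8_t nr51) {
  Panning p;
  for (int ch = 0; ch < 4; ++ch) {
    p.right[ch] = ((nr51 >> ch) & 1) ? 1.0f : 0.0f;       // bits 0-3
    p.left[ch] = ((nr51 >> (ch + 4)) & 1) ? 1.0f : 0.0f;  // bits 4-7
  }
  return p;
}
```

Decoding once on every NR51 write, rather than on every sample, keeps the per-sample mixing loop down to multiplications and additions.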

The audio mixing pipeline follows this sequence:

  1. Normalize each channel's 4-bit output to 0.0-1.0 range
  2. Apply per-channel panning from NR51 register
  3. Sum contributions for left/right channels
  4. Apply master volume scaling from NR50 register

// Simplified mixing implementation
void APU::getAudioSample() {
  // Sum normalized channel contributions. Dividing by 15 normalizes each
  // 4-bit channel; dividing by 4 keeps the sum of four channels within 0.0-1.0
  float left = (ch1 * panLeft[0] + ch2 * panLeft[1] + 
              ch3 * panLeft[2] + ch4 * panLeft[3]) / (15.0f * 4.0f);
              
  float right = (ch1 * panRight[0] + ch2 * panRight[1] + 
               ch3 * panRight[2] + ch4 * panRight[3]) / (15.0f * 4.0f);

  // Apply master volume (NR50): the 3-bit value 0-7 maps to a gain of
  // 1/8 to 8/8, so a value of 0 is very quiet rather than silent
  left *= (volLeft + 1) / 8.0f;
  right *= (volRight + 1) / 8.0f;

  // Store in the buffer that is later queued to SDL
  buffer[bufferFill++] = left;
  buffer[bufferFill++] = right;
}

Synchronization Challenges

The implementation must balance:

  • Game Boy's 4.19MHz clock domain
  • SDL's fixed 44.1kHz sample rate
  • Real-time audio buffer maintenance

This is solved using cycle-counting with overflow handling:

void cpuUpdate(int cycles) {
  cyclesAccumulated += cycles;
  while (cyclesAccumulated >= CYCLES_PER_SAMPLE) {
    apu.generateSample();
    cyclesAccumulated -= CYCLES_PER_SAMPLE;
  }
}
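
One detail worth noting: 4194304 / 44100 is roughly 95.1, so if CYCLES_PER_SAMPLE is an integer constant the output rate drifts slightly from 44.1kHz. A sketch of one way to avoid that, using a scaled integer accumulator (SampleClock is a hypothetical class, not the approach confirmed to be used in the emulator):

```cpp
#include <cstdint>

// Hypothetical resampling clock: tracks how many output samples are due
// without rounding the cycles-per-sample ratio.
class SampleClock {
public:
  SampleClock(std::uint64_t clockHz, std::uint64_t sampleHz)
      : clockHz_(clockHz), sampleHz_(sampleHz) {}

  // Advances by `cycles` emulated cycles; returns the number of samples due.
  int advance(int cycles) {
    acc_ += static_cast<std::uint64_t>(cycles) * sampleHz_;
    int due = 0;
    while (acc_ >= clockHz_) {  // each full clockHz_ in acc_ is one sample
      acc_ -= clockHz_;
      ++due;
    }
    return due;
  }

private:
  std::uint64_t clockHz_;
  std::uint64_t sampleHz_;
  std::uint64_t acc_ = 0;
};
```

Over exactly one emulated second (4194304 cycles) this yields exactly 44100 samples, whatever the size of the individual steps.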

Buffer Management Strategy:

  • Double-buffering to prevent audio glitches
  • Non-blocking SDL_QueueAudio() calls
  • Automatic sample rate adaptation through cycle counting
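
A sketch of the push-style queueing only (not of double-buffering). SampleQueue is a hypothetical class: it collects stereo frames and hands them to a sink once enough have accumulated. The sink is an assumption standing in for a call such as SDL_QueueAudio(device, data, byteCount), which keeps the example free of any SDL dependency.

```cpp
#include <cstddef>
#include <functional>
#include <utility>
#include <vector>

// Hypothetical staging buffer for interleaved stereo float samples.
class SampleQueue {
public:
  // The sink receives a pointer to the samples and their size in bytes.
  using Sink = std::function<void(const float*, std::size_t)>;

  SampleQueue(std::size_t framesPerFlush, Sink sink)
      : capacity_(framesPerFlush * 2), sink_(std::move(sink)) {
    buffer_.reserve(capacity_);
  }

  // Appends one stereo frame and flushes when the buffer is full.
  void push(float left, float right) {
    buffer_.push_back(left);
    buffer_.push_back(right);
    if (buffer_.size() >= capacity_) flush();
  }

  // Hands any buffered samples to the sink and empties the buffer.
  void flush() {
    if (buffer_.empty()) return;
    sink_(buffer_.data(), buffer_.size() * sizeof(float));
    buffer_.clear();
  }

private:
  std::size_t capacity_;
  Sink sink_;
  std::vector<float> buffer_;
};
```

Batching frames this way means the audio API is called once per block instead of once per sample.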

Challenges

The hardest part of this was simply understanding how everything fit together. Since I had not worked on many emulators before, working out where each channel and timer fit in the grand scheme of things was challenging. Eventually I got it all working, but it took a lot of trial and error, along with a lot of reading of manuals and documentation (Thanks Pan Docs!).

One thing that stood out was channel 3's tone frequency. Channel 3 steps through its samples faster than channels 1 and 2 step through their duty patterns, but its waveform cycle is longer (32 samples rather than 8 steps), so for the same period value channel 3 should produce a lower tone than channels 1 and 2. When I first tested it, however, channel 3 produced a higher tone. After some debugging, I found that I had misunderstood what the waveform cycle was: the waveform I was testing channel 3 with was only 1/4 of the proper cycle length (32 samples), which led to a higher tone than expected. After correcting this mistake, I got the proper channel 3 tone frequency. This bug was very frustrating to track down because it was in my test rather than in my implementation. I think I lost a day to it, but it was a good learning experience. I learned that testing is just as important as implementation and that you should always double-check that your tests are correct.
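
The relationship can be checked with a little arithmetic. The sketch below assumes the timings described earlier in this post: the wave channel advances one sample every (2048 - period) * 2 cycles, and the pulse channels are assumed to advance one duty step every (2048 - period) * 4 cycles, with 8 steps per cycle. Both function names are made up for this example.

```cpp
// Tone frequency of a pulse channel: 4194304 / 4 = 1048576 duty steps per
// second at period 2047, divided across 8 steps per waveform cycle.
inline double pulseToneHz(int period) {
  return 1048576.0 / (2048 - period) / 8.0;
}

// Tone frequency of the wave channel: 4194304 / 2 = 2097152 samples per
// second at period 2047. samplesPerCycle is how many samples pass before
// the waveform repeats (32 for a waveform that fills all of wave RAM).
inline double waveToneHz(int period, int samplesPerCycle = 32) {
  return 2097152.0 / (2048 - period) / samplesPerCycle;
}
```

For period = 1024, the pulse channels give 128 Hz and the wave channel gives 64 Hz, one octave lower. A test waveform that repeats every 8 samples gives 256 Hz instead, which is higher than the pulse channels and matches the symptom described above.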


Conclusion

Implementing the Game Boy APU has been an incredibly educational experience in both retro hardware architecture and audio programming. Some key takeaways:

  • The importance of cycle-accurate timing in faithful emulation
  • How simple digital synthesis can create rich soundscapes
  • The challenges of bridging vintage hardware designs with modern audio APIs

Now that the APU is done, it is time to move on to the next part. I think I will tackle timers and interrupts next. There will probably be no post about that, since it is not as interesting as the APU, but definitely expect one for the graphics (GPU) and cartridge systems, because those look really interesting.

The complete source code is available on GitHub.

Check out Pan Docs for great documentation on Game Boy hardware.


Posted by: Aidan Vidal

Posted on: April 7, 2025