Deep Dive: Implementing the Game Boy Graphics Processing Unit

After tackling the APU, I turned my attention to the Game Boy's GPU—a fascinating mix of simplicity and clever design that brings iconic 8-bit worlds to life. This post breaks down the technical journey of emulating its tile-based rendering system, sprite handling, and the quirks of integrating it with modern SDL.


GPU Architecture and Registers

The GPU operates in lockstep with the CPU, managing a 160x144 pixel display through a set of memory-mapped registers and two key memory regions:

  • VRAM (0x8000-0x9FFF): Stores tile data and background/window maps.
  • OAM (0xFE00-0xFE9F): Holds sprite attributes (position, tile index, flags).

Key registers control the display:

  • LCDC (0xFF40): Master switch for display, background, sprites, and window.
  • STAT (0xFF41): Tracks current mode (HBlank, VBlank, OAM Search, Pixel Transfer) and triggers interrupts.
  • SCY/SCX (0xFF42-43): Scroll offsets for the background.
  • BGP/OBP0/OBP1 (0xFF47-49): Define grayscale palettes for backgrounds and sprites.
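
To make the "master switch" role of LCDC concrete, here is a minimal sketch of unpacking its bits. The bit layout follows Pan Docs; the constant names and the decode_lcdc helper are hypothetical and not taken from the actual implementation.

```python
# Bit masks for LCDC (0xFF40), following the layout documented in Pan Docs.
LCDC_BG_ENABLE = 0x01      # DMG: background on/off
LCDC_OBJ_ENABLE = 0x02     # sprites on/off
LCDC_OBJ_SIZE = 0x04       # 0 = 8x8 sprites, 1 = 8x16
LCDC_BG_MAP = 0x08         # 0 = BG map at 0x9800, 1 = 0x9C00
LCDC_TILE_DATA = 0x10      # 0 = 0x8800 (signed), 1 = 0x8000 (unsigned)
LCDC_WINDOW_ENABLE = 0x20  # window on/off
LCDC_WINDOW_MAP = 0x40     # 0 = window map at 0x9800, 1 = 0x9C00
LCDC_LCD_ENABLE = 0x80     # master display switch


def decode_lcdc(value):
    """Unpack an LCDC byte into named settings (hypothetical helper)."""
    return {
        "lcd_on": bool(value & LCDC_LCD_ENABLE),
        "window_map": 0x9C00 if value & LCDC_WINDOW_MAP else 0x9800,
        "window_on": bool(value & LCDC_WINDOW_ENABLE),
        "unsigned_tiles": bool(value & LCDC_TILE_DATA),
        "bg_map": 0x9C00 if value & LCDC_BG_MAP else 0x9800,
        "sprite_height": 16 if value & LCDC_OBJ_SIZE else 8,
        "sprites_on": bool(value & LCDC_OBJ_ENABLE),
        "bg_on": bool(value & LCDC_BG_ENABLE),
    }
```

For example, decode_lcdc(0x91) reports the display and background on, sprites off, and unsigned tile addressing.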

The GPU moves through four modes, synchronized to the CPU’s clock. The first three repeat on each of the 144 visible scanlines; VBlank follows once per frame:

  1. OAM Search: Identifies sprites visible on the current scanline (max 10 per line).
  2. Pixel Transfer: Renders tiles and sprites to a line buffer.
  3. HBlank: Idles for the rest of the scanline; the CPU can freely access VRAM and OAM during this time.
  4. VBlank: Triggers an interrupt and pauses rendering for 10 scanlines.
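
The mode sequence can be sketched as a small cycle-driven state machine. This is a simplified illustration, not the real implementation: the 80/172/204 split of the 456-cycle scanline is a common approximation (on hardware, Pixel Transfer varies in length), and the GpuTiming class and its field names are assumptions made for this example.

```python
# Mode numbers as reported in the low two bits of STAT.
MODE_HBLANK, MODE_VBLANK, MODE_OAM_SEARCH, MODE_PIXEL_TRANSFER = 0, 1, 2, 3

CYCLES_PER_LINE = 456
OAM_SEARCH_CYCLES = 80
PIXEL_TRANSFER_CYCLES = 172  # assumption: fixed length; hardware varies
HBLANK_CYCLES = CYCLES_PER_LINE - OAM_SEARCH_CYCLES - PIXEL_TRANSFER_CYCLES
VISIBLE_LINES = 144
TOTAL_LINES = 154  # 144 visible lines + 10 VBlank lines

MODE_LENGTH = {
    MODE_OAM_SEARCH: OAM_SEARCH_CYCLES,
    MODE_PIXEL_TRANSFER: PIXEL_TRANSFER_CYCLES,
    MODE_HBLANK: HBLANK_CYCLES,
    MODE_VBLANK: CYCLES_PER_LINE,  # VBlank advances one full line at a time
}


class GpuTiming:
    """Tracks the current mode and scanline (hypothetical class)."""

    def __init__(self):
        self.mode = MODE_OAM_SEARCH
        self.line = 0
        self.cycles = 0
        self.vblank_interrupt = False  # set when line 144 is reached

    def step(self, cycles):
        """Advance by the number of cycles the CPU just consumed."""
        self.cycles += cycles
        while self.cycles >= MODE_LENGTH[self.mode]:
            self.cycles -= MODE_LENGTH[self.mode]
            self._next_mode()

    def _next_mode(self):
        if self.mode == MODE_OAM_SEARCH:
            self.mode = MODE_PIXEL_TRANSFER
        elif self.mode == MODE_PIXEL_TRANSFER:
            self.mode = MODE_HBLANK
        elif self.mode == MODE_HBLANK:
            self.line += 1
            if self.line == VISIBLE_LINES:
                self.mode = MODE_VBLANK
                self.vblank_interrupt = True
            else:
                self.mode = MODE_OAM_SEARCH
        else:  # MODE_VBLANK
            self.line += 1
            if self.line == TOTAL_LINES:
                self.line = 0
                self.mode = MODE_OAM_SEARCH
```

Calling step() after every CPU instruction keeps the two components in lockstep; a real emulator would also render the scanline on the transition into HBlank and raise STAT interrupts on mode changes.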

Rendering Pipeline Breakdown

Background & Window

The background is a grid of 32x32 tiles, each 8x8 pixels, for a 256x256 pixel map. Scrolling (SCX/SCY) moves the 160x144 viewport over this grid, wrapping around at the edges. Tile data is fetched from VRAM using one of two addressing modes: 0x8000 (unsigned tile index) or 0x8800 (signed tile index, relative to 0x9000). Each 8-pixel row of a tile is stored as two bytes, giving two bits per pixel; the resulting color ID (0-3) is mapped to one of four shades of gray through the BGP palette.
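
As a rough sketch of the tile fetch described above, the helpers below compute a tile's VRAM address in either addressing mode, decode one row of 2-bit pixels, and look a color ID up in BGP. The function names are hypothetical; the bit layout follows Pan Docs.

```python
def tile_data_address(tile_index, unsigned_mode):
    """Return the VRAM address of a tile's first byte (16 bytes per tile)."""
    if unsigned_mode:
        # 0x8000 mode: index 0-255 counts up from 0x8000.
        return 0x8000 + tile_index * 16
    # 0x8800 mode: index is a signed byte relative to 0x9000.
    signed_index = tile_index - 256 if tile_index >= 128 else tile_index
    return 0x9000 + signed_index * 16


def decode_tile_row(low_byte, high_byte):
    """Combine the two bytes of a tile row into eight color IDs (0-3).

    Bit 7 is the leftmost pixel; the first byte supplies the low bit of
    each color ID and the second byte supplies the high bit.
    """
    return [
        (((high_byte >> bit) & 1) << 1) | ((low_byte >> bit) & 1)
        for bit in range(7, -1, -1)
    ]


def apply_palette(palette, color_id):
    """Map a color ID to a shade (0-3) using a BGP-style palette byte."""
    return (palette >> (color_id * 2)) & 0b11
```

With the row bytes 0x3C and 0x7E, decode_tile_row yields color IDs 0, 2, 3, 3, 3, 3, 2, 0, and a BGP value of 0xE4 maps each color ID to the shade of the same number.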

The window overlays a fixed image (like a HUD) using its own position registers (WX/WY). It shares tile data with the background but renders independently, keeping an internal line counter that advances only on scanlines where the window is actually drawn.
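
The window's internal line counter is easy to get wrong, so here is a minimal sketch of the idea. It is deliberately simplified: the visibility test (window enabled, LY at or past WY, WX no greater than 166) ignores some hardware edge cases, and the WindowState class is a hypothetical name for this example.

```python
class WindowState:
    """Tracks which window row to draw next (hypothetical class)."""

    def __init__(self):
        self.line_counter = 0

    def start_frame(self):
        # The counter restarts at the top of every frame.
        self.line_counter = 0

    def row_for_scanline(self, ly, wy, wx, window_enabled):
        """Return the window row to draw on scanline ly, or None.

        The counter advances only when the window is drawn, so hiding
        the window for a few lines does not skip any window rows.
        """
        visible = window_enabled and ly >= wy and wx <= 166
        if not visible:
            return None
        row = self.line_counter
        self.line_counter += 1
        return row
```

Using the counter instead of LY - WY matters when a game disables the window partway through a frame and re-enables it later: the window resumes from the row where it stopped.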

Sprite Rendering

Sprites are 8x8 or 8x16 pixels, positioned via OAM entries. During OAM Search, the GPU checks which sprites intersect the current scanline. When sprites overlap, DMG mode gives priority to the sprite with the lower X-coordinate, with OAM order breaking ties; in CGB mode, OAM order alone decides. Each sprite’s attributes control flipping, palette selection, and background priority.
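
A rough sketch of OAM Search, assuming OAM is available as a 160-byte buffer of 4-byte entries (Y, X, tile index, flags) with the usual hardware offsets of 16 for Y and 8 for X. The function names and the dictionary layout are hypothetical.

```python
MAX_SPRITES_PER_LINE = 10


def oam_search(oam, ly, sprite_height):
    """Return up to 10 sprites that intersect scanline ly, in OAM order."""
    visible = []
    for index in range(40):
        base = index * 4
        top = oam[base] - 16  # stored Y is the screen Y plus 16
        if top <= ly < top + sprite_height:
            visible.append({
                "oam_index": index,
                "x": oam[base + 1] - 8,  # stored X is the screen X plus 8
                "y": top,
                "tile": oam[base + 2],
                "flags": oam[base + 3],
            })
            if len(visible) == MAX_SPRITES_PER_LINE:
                break  # hardware stops after ten matches
    return visible


def dmg_priority_order(sprites):
    """Sort sprites from highest to lowest priority under DMG rules."""
    return sorted(sprites, key=lambda s: (s["x"], s["oam_index"]))
```

For CGB mode the list returned by oam_search is already in priority order, since OAM index alone decides. A renderer can draw in reverse priority order so that the highest-priority sprite is written last.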

Priority Rules:

  • Transparent pixels (color 0) are skipped.
  • In DMG mode, sprites with higher priority (lower X) overwrite lower ones.
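  • Background pixels can cover a sprite when the sprite’s background-priority flag is set (or, in CGB mode, the tile’s own priority attribute), provided the background color is not 0.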
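
The per-pixel decision can be sketched as a pair of small functions. This reflects my reading of the priority rules in Pan Docs rather than any particular implementation; the function and parameter names are hypothetical, and sprite_color_id is None where no sprite covers the pixel.

```python
def dmg_pixel_source(bg_color_id, sprite_color_id, sprite_behind_bg):
    """Decide whether the background or the sprite supplies a pixel (DMG)."""
    if sprite_color_id is None or sprite_color_id == 0:
        return "bg"  # no sprite here, or the sprite pixel is transparent
    if sprite_behind_bg and bg_color_id != 0:
        return "bg"  # sprite sits behind background colors 1-3
    return "sprite"


def cgb_pixel_source(bg_color_id, sprite_color_id, sprite_behind_bg,
                     bg_tile_priority, master_priority):
    """Same decision under CGB rules.

    master_priority mirrors LCDC bit 0; bg_tile_priority mirrors bit 7
    of the tile's attribute byte.
    """
    if sprite_color_id is None or sprite_color_id == 0:
        return "bg"
    if not master_priority:
        return "sprite"  # LCDC bit 0 clear: sprites always win
    if bg_color_id == 0:
        return "sprite"  # background color 0 never covers a sprite
    if sprite_behind_bg or bg_tile_priority:
        return "bg"
    return "sprite"
```

Note that in both variants background color 0 never hides an opaque sprite pixel, regardless of the priority flags.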

Game Boy Color Enhancements

Emulating CGB mode added layers of complexity. The GPU supports:

  • VRAM Banking: Two 8KB banks for expanded tile data.
  • Per-Tile Attributes: Palettes, X/Y flipping, and VRAM bank selection.
  • Dynamic Palettes: 8 background and 8 sprite palettes, each with 4 RGB555 colors.

Palette updates go through dedicated index registers (BCPS/OCPS), which can auto-increment after each write to the matching data registers (BCPD/OCPD) to streamline color data writes. Converting RGB555 to modern ARGB8888 required careful bit-shifting and masking to preserve the original’s muted charm.
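
Here is a minimal sketch of both pieces: palette RAM with an auto-incrementing index, and a plain RGB555 to ARGB8888 expansion. The PaletteRam class and its method names are hypothetical. The conversion uses simple bit replication with no color correction, which is an assumption for this example; an emulator aiming for the muted look of the original LCD would apply an additional color-correction step.

```python
def rgb555_to_argb8888(color):
    """Expand a 15-bit color (red in the low 5 bits) to opaque ARGB8888."""
    r5 = color & 0x1F
    g5 = (color >> 5) & 0x1F
    b5 = (color >> 10) & 0x1F
    # Replicate the top bits into the low bits so 0x1F maps to 0xFF.
    r8 = (r5 << 3) | (r5 >> 2)
    g8 = (g5 << 3) | (g5 >> 2)
    b8 = (b5 << 3) | (b5 >> 2)
    return 0xFF000000 | (r8 << 16) | (g8 << 8) | b8


class PaletteRam:
    """64 bytes of CGB palette memory: 8 palettes x 4 colors x 2 bytes."""

    def __init__(self):
        self.data = bytearray(64)
        self.index = 0
        self.auto_increment = False

    def write_spec(self, value):
        """Handle a write to BCPS/OCPS: bits 0-5 index, bit 7 auto-increment."""
        self.index = value & 0x3F
        self.auto_increment = bool(value & 0x80)

    def write_data(self, value):
        """Handle a write to BCPD/OCPD."""
        self.data[self.index] = value & 0xFF
        if self.auto_increment:
            self.index = (self.index + 1) & 0x3F

    def color(self, palette, color_id):
        """Return the little-endian RGB555 value for a palette entry."""
        offset = palette * 8 + color_id * 2
        return self.data[offset] | (self.data[offset + 1] << 8)
```

A game typically writes 0x80 to BCPS once and then streams bytes into BCPD; with this sketch, two data writes of 0x1F and 0x00 leave pure red in palette 0, color 0.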


Challenges & Debugging Sagas

The biggest hurdle was synchronizing the GPU’s state machine with the CPU’s clock. A single misaligned cycle could desync interrupts or corrupt the frame. I spent days troubleshooting flickering sprites, only to realize my OAM Search wasn’t resetting properly between frames.

Another headache was CGB priority handling. Testing with Pokémon Crystal revealed invisible menus—turns out I’d misread the priority bit logic, letting background tiles override sprites even when they shouldn’t. Fixing it felt like finally solving a puzzle where the pieces were all 8x8 pixels.


Conclusion

Building the GPU taught me to appreciate the elegance of constraint. Every hardware quirk—from the 10-sprite-per-line limit to the rigid 456-cycles-per-scanline rule—shaped the look and feel of Game Boy games. Key takeaways:

  • Tile-based rendering is a masterclass in memory efficiency.
  • Hardware-level timing is non-negotiable for accurate emulation.
  • Sprite priority logic is deceptively complex (but oh-so-satisfying when it clicks).

Next up: cartridge emulation and MBC chips. Will I finally get Link’s Awakening to boot? Stay tuned.

The full GPU implementation is available on GitHub. Thanks again to Pan Docs for the great documentation.


Posted by: Aidan Vidal

Posted on: April 16, 2025