
---
project: server-stability-mysql-oom
type: session-notes
status: active
tags:
  - pbs
  - docker
  - production
  - wordpress
created: 2026-03-23
updated: 2026-03-23
path: PBS/Tech/Sessions/
---

# Server Stability - MySQL OOM Fix & Memory Monitoring

## Summary

Two server crashes in 48 hours (Saturday March 22 ~6AM ET, Monday March 23 ~6:20AM ET) traced to MySQL being OOM-killed by the Linux kernel. Root cause: MySQL had no memory limits and was consuming ~1.8GB before the OOM killer intervened, triggering a cascading failure that made the server completely unresponsive.

## Investigation Findings

### OOM Kill Evidence (from systemd journal)

- Saturday crash: `Out of memory: Killed process 4138817 (mysqld) total-vm:1841380kB`
- Monday crash: `Out of memory: Killed process 13015 (mysqld) total-vm:1828060kB`
- Both crashes followed the same pattern: MySQL OOM-killed → Docker restarts MySQL → system still memory-starved → `swapoff` killed → complete server lockup → manual Linode reboot required

### Crash Timeline

- Both crashes occurred around 6:00-6:20 AM Eastern (10:00-10:20 UTC — the server runs in UTC)
- WooCommerce was installed Saturday — first crash Saturday night, second Monday morning
- WooCommerce Action Scheduler showed no failed/stuck tasks — likely not the direct trigger
- Wordfence scan logs showed a ~1 minute scan on March 19 at ~10 PM ET — does not align with the crash window
- Wordfence scan scheduling is automatic on the free tier (no manual schedule control)

### Ruled Out

- WooCommerce Action Scheduler runaway tasks (all showed completed status)
- Wordfence scan timing (didn't align with the crash window)
- Multiple MySQL instances (htop showed threads, not separate processes — press `H` in htop to toggle thread view)

### Not Yet Determined

- Exact trigger causing MySQL to balloon to 1.8 GB overnight
- Whether WooCommerce's added baseline DB load is the tipping point
- `apt-daily.service` was running during Monday's crash — may be contributing to memory pressure

## Changes Made

### MySQL Memory Cap & Tuning (compose.yml)

Added to the `mysql` service in `/opt/docker/wordpress/compose.yml`:

```yaml
  mysql:
    image: mysql:8.0
    container_name: wordpress_mysql
    restart: unless-stopped
    deploy:
      resources:
        limits:
          memory: 768M
        reservations:
          memory: 256M
    environment:
      MYSQL_DATABASE: wordpress
      MYSQL_USER: wordpress
      MYSQL_PASSWORD: ${MYSQL_PASSWORD}
      MYSQL_ROOT_PASSWORD: ${MYSQL_ROOT_PASSWORD}
    volumes:
      - mysql_data:/var/lib/mysql
    networks:
      - internal
    command: >-
      --default-authentication-plugin=mysql_native_password
      --innodb-buffer-pool-size=256M
      --innodb-log-buffer-size=16M
      --max-connections=50
      --key-buffer-size=16M
      --tmp-table-size=32M
      --max-heap-table-size=32M
      --table-open-cache=256
      --performance-schema=OFF
```

What each setting does:

- `limits: memory: 768M` — the container's cgroup OOM-kills MySQL at 768 MB and Docker restarts it (a contained restart instead of a system-wide kernel OOM)
- `reservations: memory: 256M` — guarantees MySQL at least 256 MB
- `innodb-buffer-pool-size=256M` — caps the InnoDB buffer pool (MySQL's biggest memory consumer)
- `max-connections=50` — reduced from the default 151 (less per-connection memory)
- `performance-schema=OFF` — saves ~200-400 MB (internal MySQL instrumentation, not needed here)
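A quick sanity check that the cap actually took effect after `docker compose up -d` (this assumes Compose v2, which applies `deploy.resources.limits` outside swarm mode; older docker-compose needed `--compatibility`):

```shell
# Read back the memory limit Docker applied to the container, in bytes
docker inspect wordpress_mysql --format '{{.HostConfig.Memory}}'
# 768M should report as 768 * 1024 * 1024 = 805306368 bytes
```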

Result:

| Metric | Before | After |
| --- | --- | --- |
| MySQL memory usage | 474 MB (uncapped, spiked to 1.8 GB) | 225 MB (capped at 768 MB) |
| MySQL % of cap | N/A | 29% |
| Total stack memory | ~2.05 GB | ~2.0 GB |

## Memory Monitoring Script

Created `/usr/local/bin/docker-mem-log.sh` — logs per-container memory usage every 5 minutes via cron:

```bash
#!/bin/bash
LOG_FILE="/var/log/pbs-monitoring/container-memory.log"
echo "$(date -u '+%Y-%m-%d %H:%M:%S UTC') | $(docker stats --no-stream --format '{{.Name}}:{{.MemUsage}}' | tr '\n' ' ')" >> "$LOG_FILE"
```
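One-time setup for the script (a sketch using the paths assumed in this note — the log directory must exist before the first cron run, or the append fails):

```shell
# Create the monitoring log directory and make the script executable
install -d -m 755 /var/log/pbs-monitoring
chmod +x /usr/local/bin/docker-mem-log.sh
```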

Cron entry at `/etc/cron.d/docker-mem-monitor`:

```
*/5 * * * * root /usr/local/bin/docker-mem-log.sh
```

Check logs with: `tail -20 /var/log/pbs-monitoring/container-memory.log`
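To pull a single container's readings out of the combined log, a grep over the `name:usage` tokens produced by the `--format` string works (note it captures only the usage portion before the ` / limit` part of `MemUsage`):

```shell
# Show the five most recent wordpress_mysql memory readings
grep -o 'wordpress_mysql:[^ ]*' /var/log/pbs-monitoring/container-memory.log | tail -n 5
```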

## Journal Persistence Confirmed

- `/var/log/journal` exists and is retaining logs across reboots
- `journalctl --list-boots` shows 5 boot sessions dating back to May 2025
- OOM kill evidence was successfully retrieved from previous boots
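For reference, the kind of query that pulls the OOM evidence out of a previous boot (timestamps here are the Monday crash window, as an example; `-b -1` selects the previous boot, `-k` restricts to kernel messages):

```shell
# Kernel messages from the previous boot, narrowed to the crash window
journalctl -b -1 -k --since "2026-03-23 10:00" --until "2026-03-23 10:30" | grep -i 'out of memory'
```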

## Current Server Memory Snapshot (post-fix)

| Container | Memory | % of Limit |
| --- | --- | --- |
| wordpress | 1.11 GB | 29% (of system) |
| wordpress_mysql | 225 MB | 29% (of 768 MB cap) |
| n8n | 200 MB | 5% |
| uptime-kuma | 100 MB | 3% |
| traefik | 37 MB | 1% |
| pbs-api | 28 MB | 1% |
| redis | 13 MB | 2% (of 640 MB cap) |
| wpcron | 8 MB | <1% |

## Still Open

- Monitor overnight stability — check memory logs tomorrow AM
- Add log rotation for `/var/log/pbs-monitoring/container-memory.log`
- Investigate `apt-daily.service` — consider disabling automatic apt updates
- Server sizing discussion: 4 GB may be tight for adding Gitea + Authelia
- Determine if the Wordfence free-tier scan is contributing to memory pressure
- Consider setting the server timezone to Eastern for easier log reading
- Investigate the root cause of MySQL memory bloat (WooCommerce correlation still strong)
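For the log-rotation item above, a minimal logrotate sketch (hypothetical file `/etc/logrotate.d/pbs-monitoring`; all directives are standard logrotate):

```
/var/log/pbs-monitoring/container-memory.log {
    weekly
    rotate 4
    compress
    missingok
    notifempty
}
```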

## Key Learnings

- htop shows threads, not processes — press `H` to toggle thread visibility; one MySQL process can show as dozens of rows
- The systemd journal persists across reboots if `/var/log/journal` exists and `Storage=auto` or `Storage=persistent` is set
- `journalctl -b -1` shows previous-boot logs; use `--since`/`--until` for large time ranges to avoid hanging
- `performance-schema=OFF` in MySQL saves ~200-400 MB with no downside for production WordPress
- Docker `deploy.resources.limits.memory` provides a controlled cap — Docker restarts the container instead of the kernel OOM-killing it and cascading
- The server timezone is UTC — subtract 4 hours for Eastern time when reading logs
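Rather than subtracting 4 hours by hand, GNU `date` can do the UTC → Eastern conversion on demand, and it is DST-aware (a fixed 4-hour offset breaks in winter). Assumes tzdata is installed, which it is on stock Debian/Ubuntu:

```shell
# Convert a UTC log timestamp to Eastern time (DST-aware)
TZ="America/New_York" date -d '2026-03-23 10:20:00 UTC' '+%Y-%m-%d %H:%M %Z'
# → 2026-03-23 06:20 EDT
```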