---
project: server-stability-mysql-oom
type: session-notes
status: active
tags:
  - pbs
  - docker
  - production
  - wordpress
created: 2026-03-23
updated: 2026-03-23
path: PBS/Tech/Sessions/
---

# Server Stability - MySQL OOM Fix & Memory Monitoring

## Summary

Two server crashes in 48 hours (Saturday March 22 ~6AM ET, Monday March 23 ~6:20AM ET) traced to MySQL being OOM-killed by the Linux kernel. Root cause: MySQL had no memory limits and was consuming ~1.8GB before the OOM killer intervened, triggering a cascading failure that made the server completely unresponsive.

## Investigation Findings

### OOM Kill Evidence (from systemd journal)

- **Saturday crash:** `Out of memory: Killed process 4138817 (mysqld) total-vm:1841380kB`
- **Monday crash:** `Out of memory: Killed process 13015 (mysqld) total-vm:1828060kB`
- Both crashes followed the same pattern: MySQL OOM-killed → Docker restarts MySQL → system still memory-starved → swapoff killed → complete server lockup → manual Linode reboot required

### Crash Timeline

- Both crashes occurred around 6:00-6:20 AM Eastern (10:00-10:20 UTC — server runs in UTC)
- WooCommerce installed Saturday — first crash Saturday night, second Monday morning
- WooCommerce Action Scheduler showed no failed/stuck tasks — likely not the direct trigger
- Wordfence scan logs showed a ~1 minute scan on March 19 at ~10PM ET — does not align with crash window
- Wordfence scan scheduling is automatic on free tier (no manual schedule control)

### Ruled Out

- WooCommerce Action Scheduler runaway tasks (all showed completed status)
- Wordfence scan timing (didn't align with crash window)
- Multiple MySQL instances (htop showed threads, not separate processes — press `H` in htop to toggle thread view)

### Not Yet Determined

- Exact trigger causing MySQL to balloon to 1.8GB overnight
- Whether WooCommerce's added baseline DB load is the tipping point
- `apt-daily.service` was running during Monday's crash — may be contributing to memory pressure
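The OOM evidence above can be re-pulled from the persistent journal at any time. A minimal sketch — the live command needs the server itself, so a sample line (hostname and seconds are illustrative, reconstructed from the Monday crash entry) stands in for journal output here:

```shell
# Search kernel messages from the previous boot for OOM kills.
# On the server: journalctl -b -1 -k | grep -iE 'out of memory|oom-killer'
# Captured sample line standing in for live journal output:
sample='Mar 23 10:20:14 server kernel: Out of memory: Killed process 13015 (mysqld) total-vm:1828060kB'
echo "$sample" | grep -iE 'out of memory|oom-killer'
```

The `-k` flag restricts output to kernel messages, which is where the OOM killer logs; `-b -1` selects the previous boot, which is what made post-reboot forensics possible here.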
## Changes Made

### MySQL Memory Cap & Tuning (compose.yml)

Added to the `mysql` service in `/opt/docker/wordpress/compose.yml`:

```yaml
mysql:
  image: mysql:8.0
  container_name: wordpress_mysql
  restart: unless-stopped
  deploy:
    resources:
      limits:
        memory: 768M
      reservations:
        memory: 256M
  environment:
    MYSQL_DATABASE: wordpress
    MYSQL_USER: wordpress
    MYSQL_PASSWORD: ${MYSQL_PASSWORD}
    MYSQL_ROOT_PASSWORD: ${MYSQL_ROOT_PASSWORD}
  volumes:
    - mysql_data:/var/lib/mysql
  networks:
    - internal
  command: >-
    --default-authentication-plugin=mysql_native_password
    --innodb-buffer-pool-size=256M
    --innodb-log-buffer-size=16M
    --max-connections=50
    --key-buffer-size=16M
    --tmp-table-size=32M
    --max-heap-table-size=32M
    --table-open-cache=256
    --performance-schema=OFF
```

**What each setting does:**

- `limits: memory: 768M` — Docker kills MySQL if it exceeds 768MB (controlled restart vs kernel OOM)
- `reservations: memory: 256M` — Guarantees MySQL gets at least 256MB
- `innodb-buffer-pool-size=256M` — Caps InnoDB cache (MySQL's biggest memory consumer)
- `max-connections=50` — Reduced from the default of 151 (less memory per connection)
- `performance-schema=OFF` — Saves ~200-400MB (internal MySQL monitoring not needed)

**Result:**

| Metric | Before | After |
|--------|--------|-------|
| MySQL memory usage | 474MB (uncapped, spiked to 1.8GB) | 225MB (capped at 768MB) |
| MySQL % of cap | N/A | 29% |
| Total stack memory | ~2.05GB | ~2.0GB |

### Memory Monitoring Script

Created `/usr/local/bin/docker-mem-log.sh` — logs per-container memory usage every 5 minutes via cron:

```bash
#!/bin/bash
LOG_FILE="/var/log/pbs-monitoring/container-memory.log"
echo "$(date -u '+%Y-%m-%d %H:%M:%S UTC') | $(docker stats --no-stream --format '{{.Name}}:{{.MemUsage}}' | tr '\n' ' ')" >> "$LOG_FILE"
```

Cron entry at `/etc/cron.d/docker-mem-monitor`:

```
*/5 * * * * root /usr/local/bin/docker-mem-log.sh
```

**Check logs with:** `tail -20 /var/log/pbs-monitoring/container-memory.log`

### Journal Persistence Confirmed
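As written, the monitoring log grows without bound. A minimal logrotate sketch — the `/etc/logrotate.d/docker-mem-monitor` filename and the weekly/keep-4 policy are proposals, not something already deployed:

```
# /etc/logrotate.d/docker-mem-monitor (proposed)
/var/log/pbs-monitoring/container-memory.log {
    weekly
    rotate 4
    compress
    missingok
    notifempty
}
```

At one line every 5 minutes the file stays small, so weekly rotation with four compressed generations is more than enough headroom.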
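To track one container's trend out of that log, a simple `grep -o` works on the script's one-line-per-sample format. A sketch with a hardcoded sample line for illustration (values taken from the memory snapshot below; the 3.8GiB system figure is an assumption for a 4GB Linode) — a live run would pipe in the log file instead:

```shell
# Extract the wordpress_mysql readings from docker-mem-log.sh output.
# On the server: grep -o 'wordpress_mysql:[^ ]*' /var/log/pbs-monitoring/container-memory.log
line='2026-03-23 11:05:00 UTC | wordpress:1.11GiB / 3.8GiB wordpress_mysql:225MiB / 768MiB n8n:200MiB / 3.8GiB '
echo "$line" | grep -o 'wordpress_mysql:[^ ]*'
# → wordpress_mysql:225MiB
```

Note that `{{.MemUsage}}` contains spaces ("225MiB / 768MiB"), so the `[^ ]*` match deliberately stops at the usage figure and drops the "/ limit" tail.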
- `/var/log/journal` exists and is retaining logs across reboots
- `journalctl --list-boots` shows 5 boot sessions dating back to May 2025
- OOM kill evidence was successfully retrieved from previous boots

## Current Server Memory Snapshot (post-fix)

| Container | Memory | % of Limit |
|-----------|--------|------------|
| wordpress | 1.11 GB | 29% (of system) |
| wordpress_mysql | 225 MB | 29% (of 768MB cap) |
| n8n | 200 MB | 5% |
| uptime-kuma | 100 MB | 3% |
| traefik | 37 MB | 1% |
| pbs-api | 28 MB | 1% |
| redis | 13 MB | 2% (of 640MB cap) |
| wpcron | 8 MB | <1% |

## Still Open

- [ ] Monitor overnight stability — check memory logs tomorrow AM
- [ ] Add log rotation for `/var/log/pbs-monitoring/container-memory.log`
- [ ] Investigate `apt-daily.service` — consider disabling automatic apt updates
- [ ] Server sizing discussion: 4GB may be tight for adding Gitea + Authelia
- [ ] Determine if Wordfence free-tier scan is contributing to memory pressure
- [ ] Consider setting server timezone to Eastern for easier log reading
- [ ] Investigate root cause of MySQL memory bloat (WooCommerce correlation still strong)

## Key Learnings

- **htop shows threads, not processes** — press `H` to toggle thread visibility; one MySQL process can show as dozens of rows
- **systemd journal persists across reboots** if `/var/log/journal` exists and `Storage=auto` or `Storage=persistent` is set
- **`journalctl -b -1`** shows previous boot logs; use `--since`/`--until` for large time ranges to avoid hanging
- **`performance-schema=OFF`** in MySQL saves ~200-400MB with no downside for production WordPress
- **Docker `deploy.resources.limits.memory`** provides a controlled cap — Docker restarts the container instead of the kernel OOM-killing it and cascading
- **Server timezone is UTC** — subtract 4 hours for Eastern time when reading logs
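That last conversion can be delegated to GNU `date` instead of mental math; this also guards against DST, since the 4-hour offset only holds while Eastern is on daylight time (assumes tzdata is installed on the server):

```shell
# Convert a UTC journal timestamp to Eastern time (GNU date).
# Monday's crash window, 10:20 UTC, reads as 06:20 EDT.
TZ=America/New_York date -d '2026-03-23 10:20 UTC' '+%Y-%m-%d %H:%M %Z'
# → 2026-03-23 06:20 EDT
```

In winter the same command would print EST and a 5-hour offset, which is exactly the kind of slip worth automating away when correlating crash windows.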