| project | type | status | tags | created | updated | path |
|---|---|---|---|---|---|---|
| server-stability-mysql-oom | session-notes | active | | 2026-03-23 | 2026-03-23 | PBS/Tech/Sessions/ |
# Server Stability - MySQL OOM Fix & Memory Monitoring

## Summary
Two server crashes in 48 hours (Saturday March 22 ~6AM ET, Monday March 23 ~6:20AM ET) traced to MySQL being OOM-killed by the Linux kernel. Root cause: MySQL had no memory limits and was consuming ~1.8GB before the OOM killer intervened, triggering a cascading failure that made the server completely unresponsive.
## Investigation Findings

### OOM Kill Evidence (from systemd journal)

- Saturday crash: `Out of memory: Killed process 4138817 (mysqld) total-vm:1841380kB`
- Monday crash: `Out of memory: Killed process 13015 (mysqld) total-vm:1828060kB`
- Both crashes followed the same pattern: MySQL OOM-killed → Docker restarts MySQL → system still memory-starved → swapoff killed → complete server lockup → manual Linode reboot required
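For reference, evidence like the lines above can be pulled from a previous boot's kernel log with `journalctl` (a sketch; the exact OOM message text varies slightly by kernel version, and `--since`/`--until` here use the server's UTC clock):

```shell
# List the boots the journal knows about, then search the previous
# boot's kernel messages for OOM-killer activity around the crash window.
journalctl --list-boots
journalctl -b -1 -k --since "09:30" --until "10:45" | grep -iE 'out of memory|oom-killer'
```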
### Crash Timeline
- Both crashes occurred around 6:00-6:20 AM Eastern (10:00-10:20 UTC — server runs in UTC)
- WooCommerce installed Saturday — first crash Saturday night, second Monday morning
- WooCommerce Action Scheduler showed no failed/stuck tasks — likely not the direct trigger
- Wordfence scan logs showed a ~1 minute scan on March 19 at ~10PM ET — does not align with crash window
- Wordfence scan scheduling is automatic on free tier (no manual schedule control)
### Ruled Out
- WooCommerce Action Scheduler runaway tasks (all showed completed status)
- Wordfence scan timing (didn't align with crash window)
- Multiple MySQL instances (htop showed threads, not separate processes — press `H` in htop to toggle thread view)
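The threads-vs-processes distinction can also be confirmed outside htop; `pgrep` counts actual processes while `ps -L` emits one row per thread (a sketch, assuming `mysqld` threads are visible from the host):

```shell
# Number of actual mysqld processes (expected: 1)
pgrep -c mysqld
# Number of mysqld threads (one row per LWP, so this is much larger;
# the [m] trick keeps grep from matching itself)
ps -eLf | grep -c '[m]ysqld'
```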
### Not Yet Determined
- Exact trigger causing MySQL to balloon to 1.8GB overnight
- Whether WooCommerce's added baseline DB load is the tipping point
- `apt-daily.service` was running during Monday's crash — may be contributing to memory pressure
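One way to check whether `apt-daily.service` overlaps the crash window is to list its timers; times are shown in the server's timezone, UTC here (a sketch; disabling is only worth doing if the overlap is confirmed):

```shell
# Show last/next trigger times for the apt maintenance timers
systemctl list-timers 'apt-daily*'
# If it proves to be a contributor, the timers can be disabled:
# systemctl disable --now apt-daily.timer apt-daily-upgrade.timer
```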
## Changes Made

### MySQL Memory Cap & Tuning (compose.yml)

Added to the `mysql` service in `/opt/docker/wordpress/compose.yml`:
```yaml
mysql:
  image: mysql:8.0
  container_name: wordpress_mysql
  restart: unless-stopped
  deploy:
    resources:
      limits:
        memory: 768M
      reservations:
        memory: 256M
  environment:
    MYSQL_DATABASE: wordpress
    MYSQL_USER: wordpress
    MYSQL_PASSWORD: ${MYSQL_PASSWORD}
    MYSQL_ROOT_PASSWORD: ${MYSQL_ROOT_PASSWORD}
  volumes:
    - mysql_data:/var/lib/mysql
  networks:
    - internal
  command: >-
    --default-authentication-plugin=mysql_native_password
    --innodb-buffer-pool-size=256M
    --innodb-log-buffer-size=16M
    --max-connections=50
    --key-buffer-size=16M
    --tmp-table-size=32M
    --max-heap-table-size=32M
    --table-open-cache=256
    --performance-schema=OFF
```
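After a `docker compose up -d`, the cap can be verified from the container's host config; a 768M limit reports as bytes (the container name `wordpress_mysql` comes from the compose file above):

```shell
# Print the enforced memory limit in bytes
# (768 * 1024 * 1024 = 805306368; 0 would mean no limit was applied)
docker inspect wordpress_mysql --format '{{.HostConfig.Memory}}'
```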
**What each setting does:**

- `limits: memory: 768M` — Docker kills MySQL if it exceeds 768MB (controlled restart vs kernel OOM)
- `reservations: memory: 256M` — guarantees MySQL gets at least 256MB
- `innodb-buffer-pool-size=256M` — caps the InnoDB cache (MySQL's biggest memory consumer)
- `max-connections=50` — reduced from the default 151 (less memory per connection)
- `performance-schema=OFF` — saves ~200-400MB (internal MySQL monitoring not needed)
**Result:**
| Metric | Before | After |
|---|---|---|
| MySQL memory usage | 474MB (uncapped, spiked to 1.8GB) | 225MB (capped at 768MB) |
| MySQL % of cap | N/A | 29% |
| Total stack memory | ~2.05GB | ~2.0GB |
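To confirm the tuning flags actually took effect inside the container, the live values can be queried directly (a sketch, assuming `MYSQL_ROOT_PASSWORD` is exported in the invoking shell as it is in the compose environment):

```shell
# innodb_buffer_pool_size should report 268435456 (256 * 1024 * 1024),
# max_connections should report 50, performance_schema should report OFF
docker exec wordpress_mysql \
  mysql -uroot -p"$MYSQL_ROOT_PASSWORD" \
  -e "SHOW VARIABLES WHERE Variable_name IN ('innodb_buffer_pool_size', 'max_connections', 'performance_schema');"
```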
### Memory Monitoring Script

Created `/usr/local/bin/docker-mem-log.sh` — logs per-container memory usage every 5 minutes via cron:
```bash
#!/bin/bash
LOG_FILE="/var/log/pbs-monitoring/container-memory.log"
echo "$(date -u '+%Y-%m-%d %H:%M:%S UTC') | $(docker stats --no-stream --format '{{.Name}}:{{.MemUsage}}' | tr '\n' ' ')" >> "$LOG_FILE"
```
Cron entry at `/etc/cron.d/docker-mem-monitor`:

```
*/5 * * * * root /usr/local/bin/docker-mem-log.sh
```

Check logs with: `tail -20 /var/log/pbs-monitoring/container-memory.log`
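To eyeball MySQL growth over time, the `wordpress_mysql` field can be extracted from each sample; a sketch against the log format the script emits (`timestamp | name:usage / limit name:usage / limit ...`):

```shell
# Print the timestamp and wordpress_mysql usage from each 5-minute sample
awk -F'|' '{ split($2, f, "wordpress_mysql:"); split(f[2], m, " "); print $1, m[1] }' \
  /var/log/pbs-monitoring/container-memory.log
```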
### Journal Persistence Confirmed

- `/var/log/journal` exists and is retaining logs across reboots
- `journalctl --list-boots` shows 5 boot sessions dating back to May 2025
- OOM kill evidence was successfully retrieved from previous boots
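For reference, persistent journaling is controlled in `/etc/systemd/journald.conf`; on a host that only has volatile logging, creating the directory and setting storage is enough (a config sketch — not needed on this server, where persistence is already active):

```shell
# Create the persistent journal directory
mkdir -p /var/log/journal
# In /etc/systemd/journald.conf:
#   [Journal]
#   Storage=persistent
# Then apply:
systemctl restart systemd-journald
```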
## Current Server Memory Snapshot (post-fix)
| Container | Memory | % of Limit |
|---|---|---|
| wordpress | 1.11 GB | 29% (of system) |
| wordpress_mysql | 225 MB | 29% (of 768MB cap) |
| n8n | 200 MB | 5% |
| uptime-kuma | 100 MB | 3% |
| traefik | 37 MB | 1% |
| pbs-api | 28 MB | 1% |
| redis | 13 MB | 2% (of 640MB cap) |
| wpcron | 8 MB | <1% |
## Still Open

- Monitor overnight stability — check memory logs tomorrow AM
- Add log rotation for `/var/log/pbs-monitoring/container-memory.log`
- Investigate `apt-daily.service` — consider disabling automatic apt updates
- Server sizing discussion: 4GB may be tight for adding Gitea + Authelia
- Determine if Wordfence free-tier scan is contributing to memory pressure
- Consider setting server timezone to Eastern for easier log reading
- Investigate root cause of MySQL memory bloat (WooCommerce correlation still strong)
## Key Learnings

- htop shows threads, not processes — press `H` to toggle thread visibility; one MySQL process can show as dozens of rows
- systemd journal persists across reboots if `/var/log/journal` exists and `Storage=auto` or `Storage=persistent` is set
- `journalctl -b -1` shows previous boot logs; use `--since`/`--until` for large time ranges to avoid hanging
- `performance-schema=OFF` in MySQL saves ~200-400MB with no downside for production WordPress
- Docker `deploy.resources.limits.memory` provides a controlled cap — Docker restarts the container instead of the kernel OOM-killing it and cascading
- Server timezone is UTC — subtract 4 hours for Eastern time when reading logs
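The UTC-to-Eastern conversion in the last point can be delegated to GNU `date` instead of mental math (note the 4-hour offset only holds during daylight saving time; in winter Eastern is UTC-5):

```shell
# Render a UTC log timestamp in Eastern time
TZ=America/New_York date -d '2026-03-23 10:20 UTC' '+%Y-%m-%d %H:%M %Z'
# -> 2026-03-23 06:20 EDT
```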