---
project: server-stability-mysql-oom
type: session-notes
status: active
tags:
  - pbs
  - docker
  - production
  - wordpress
created: 2026-03-23
updated: 2026-03-23
path: PBS/Tech/Sessions/
---

# Server Stability - MySQL OOM Fix & Memory Monitoring

## Summary

Two server crashes in 48 hours (Saturday March 22 ~6AM ET, Monday March 23 ~6:20AM ET) were traced to the Linux kernel OOM-killing MySQL. Root cause: MySQL ran with no memory limit and had grown to ~1.8GB of virtual memory (the `total-vm` figure in the kernel log) when the OOM killer intervened, triggering a cascading failure that left the server completely unresponsive.

## Investigation Findings

### OOM Kill Evidence (from systemd journal)
- **Saturday crash:** `Out of memory: Killed process 4138817 (mysqld) total-vm:1841380kB`
- **Monday crash:** `Out of memory: Killed process 13015 (mysqld) total-vm:1828060kB`
- Both crashes followed the same pattern: MySQL OOM-killed → Docker restarts MySQL → system still memory-starved → swapoff killed → complete server lockup → manual Linode reboot required

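The kernel lines quoted above can be condensed into one row per kill. The following is a minimal sketch, not the command used during the session: `oom_summary` is a hypothetical helper name, and it only assumes the `Killed process <pid> (<name>) ... total-vm:<n>kB` layout shown in the evidence. Note that `total-vm` is virtual size, which is usually larger than resident memory.

```bash
#!/usr/bin/env bash
# Summarise kernel OOM-kill lines read from stdin.
# Prints pid, process name, and total-vm converted to whole MiB.
# oom_summary is a hypothetical helper name used for illustration only.
oom_summary() {
  sed -nE 's/.*Killed process ([0-9]+) \(([^)]+)\).*total-vm:([0-9]+)kB.*/\1 \2 \3/p' |
    while read -r pid name kb; do
      printf 'pid=%s name=%s total_vm_mib=%d\n' "$pid" "$name" "$((kb / 1024))"
    done
}
```

To run it against the previous boot's kernel messages, something like `journalctl -k -b -1 --no-pager | oom_summary` should work on a host with a persistent journal.
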
### Crash Timeline
- Both crashes occurred around 6:00-6:20 AM Eastern (10:00-10:20 UTC — server runs in UTC)
- WooCommerce installed Saturday — first crash Saturday night, second Monday morning
- WooCommerce Action Scheduler showed no failed/stuck tasks — likely not the direct trigger
- Wordfence scan logs showed a ~1 minute scan on March 19 at ~10PM ET — does not align with crash window
- Wordfence scan scheduling is automatic on free tier (no manual schedule control)

### Ruled Out
- WooCommerce Action Scheduler runaway tasks (all showed completed status)
- Wordfence scan timing (didn't align with crash window)
- Multiple MySQL instances (the extra rows in htop were threads of a single `mysqld` process, not separate processes — press `H` in htop to toggle thread view)

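One way to double-check the threads-versus-processes point without htop is to read the thread count straight from `/proc`. This is a small sketch that assumes a Linux host; `thread_count` is a hypothetical helper name, not something from the session.

```bash
#!/usr/bin/env bash
# Print the number of threads belonging to one process, read from Linux /proc.
# A single mysqld can report dozens of threads while still being one process.
# thread_count is a hypothetical helper name used for illustration only.
thread_count() {
  local pid="$1"
  awk '/^Threads:/ { print $2 }' "/proc/${pid}/status"
}
```

For example, `thread_count "$(pgrep -o mysqld)"` would report the threads of the oldest `mysqld` process, and `pgrep -c mysqld` the number of matching processes.
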
### Not Yet Determined
- Exact trigger causing MySQL to balloon to 1.8GB overnight
- Whether WooCommerce's added baseline DB load is the tipping point
- `apt-daily.service` was running during Monday's crash — may be contributing to memory pressure

## Changes Made

### MySQL Memory Cap & Tuning (compose.yml)
Added to the `mysql` service in `/opt/docker/wordpress/compose.yml`:

```yaml
mysql:
  image: mysql:8.0
  container_name: wordpress_mysql
  restart: unless-stopped
  deploy:
    resources:
      limits:
        memory: 768M
      reservations:
        memory: 256M
  environment:
    MYSQL_DATABASE: wordpress
    MYSQL_USER: wordpress
    MYSQL_PASSWORD: ${MYSQL_PASSWORD}
    MYSQL_ROOT_PASSWORD: ${MYSQL_ROOT_PASSWORD}
  volumes:
    - mysql_data:/var/lib/mysql
  networks:
    - internal
  command: >-
    --default-authentication-plugin=mysql_native_password
    --innodb-buffer-pool-size=256M
    --innodb-log-buffer-size=16M
    --max-connections=50
    --key-buffer-size=16M
    --tmp-table-size=32M
    --max-heap-table-size=32M
    --table-open-cache=256
    --performance-schema=OFF
```

**What each setting does:**
- `limits: memory: 768M` — Hard cap on the container's cgroup: if MySQL exceeds 768MB, only that container is OOM-killed and `restart: unless-stopped` brings it back (a contained restart vs a host-wide kernel OOM)
- `reservations: memory: 256M` — Reserves 256MB for MySQL; on a single Docker host this acts as a soft limit under memory pressure rather than a hard guarantee
- `innodb-buffer-pool-size=256M` — Caps InnoDB cache (MySQL's biggest memory consumer)
- `max-connections=50` — Reduced from default 151 (fewer connections means less total per-connection buffer memory)
- `performance-schema=OFF` — Saves ~200-400MB (internal MySQL monitoring not needed)

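As a sanity check on how these settings relate to the 768MB cap, the arithmetic can be sketched as global buffers plus a per-connection allowance. This is a rough estimate only: `mysql_mem_ceiling_mib` is a hypothetical helper, the 2 MiB per-connection figure is an illustrative assumption rather than a measured value, and in-memory temporary tables (up to 32M each with the settings above) are left out.

```bash
#!/usr/bin/env bash
# Rough MySQL memory ceiling in MiB: global buffers plus per-connection memory.
# Args: global_mib max_connections per_conn_mib
# mysql_mem_ceiling_mib is a hypothetical helper name used for illustration only.
mysql_mem_ceiling_mib() {
  local global_mib="$1" max_conn="$2" per_conn_mib="$3"
  echo $(( global_mib + max_conn * per_conn_mib ))
}

# Global buffers from the compose command: buffer pool + log buffer + key buffer.
global=$(( 256 + 16 + 16 ))
# ASSUMPTION: 2 MiB per connection is illustrative, not measured on this server.
mysql_mem_ceiling_mib "$global" 50 2   # prints 388
```

Under those assumptions the estimate stays well below the cap, which fits the observed post-fix usage, but heavy temporary-table use per connection could push real usage higher.
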
**Result:**

| Metric | Before | After |
|--------|--------|-------|
| MySQL memory usage | 474MB (uncapped, spiked to 1.8GB) | 225MB (capped at 768MB) |
| MySQL % of cap | N/A | 29% |
| Total stack memory | ~2.05GB | ~2.0GB |

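It may be worth confirming that the cap is actually applied to the running container, since some older `docker-compose` v1 releases are known to ignore `deploy` keys outside Swarm unless run with `--compatibility`. The sketch below covers only the arithmetic; `bytes_to_mib` and `pct_of_cap` are hypothetical helper names.

```bash
#!/usr/bin/env bash
# Convert a byte count (the unit Docker reports for HostConfig.Memory) to whole MiB.
# bytes_to_mib and pct_of_cap are hypothetical helper names for illustration only.
bytes_to_mib() {
  echo $(( $1 / 1024 / 1024 ))
}

# Integer percentage of a cap in use, rounded to the nearest whole percent.
pct_of_cap() {
  local used_mib="$1" cap_mib="$2"
  echo $(( (used_mib * 100 + cap_mib / 2) / cap_mib ))
}

bytes_to_mib 805306368   # 768M expressed in bytes; prints 768
pct_of_cap 225 768       # prints 29
```

On the server, `bytes_to_mib "$(docker inspect --format '{{.HostConfig.Memory}}' wordpress_mysql)"` should print 768 if the limit took effect; a value of 0 would mean no limit is set.
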
### Memory Monitoring Script
Created `/usr/local/bin/docker-mem-log.sh` — logs per-container memory usage every 5 minutes via cron:

```bash
#!/bin/bash
# Append one timestamped line of per-container memory usage to the log.
LOG_FILE="/var/log/pbs-monitoring/container-memory.log"
# Create the log directory on first run so the append below cannot fail.
mkdir -p "$(dirname "$LOG_FILE")"
echo "$(date -u '+%Y-%m-%d %H:%M:%S UTC') | $(docker stats --no-stream --format '{{.Name}}:{{.MemUsage}}' | tr '\n' ' ')" >> "$LOG_FILE"
```

Cron entry at `/etc/cron.d/docker-mem-monitor`:
```
*/5 * * * * root /usr/local/bin/docker-mem-log.sh
```

**Check logs with:** `tail -20 /var/log/pbs-monitoring/container-memory.log`

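Because each log line packs every container onto one row, pulling out a single container's history takes a little parsing. The following is a sketch under the assumption that lines look like `<timestamp> | name:used / limit name:used / limit ...`, which is what the script above should produce; `container_usage` is a hypothetical helper name.

```bash
#!/usr/bin/env bash
# Print "timestamp usage" for one container from monitoring-log lines on stdin.
# Arg: container name exactly as docker stats reports it.
# container_usage is a hypothetical helper name used for illustration only.
container_usage() {
  local name="$1"
  awk -v n="$name" -F' [|] ' '{
    count = split($2, tok, " ")
    for (i = 1; i <= count; i++)
      # Match only tokens that start with "<name>:" so that
      # "wordpress" does not also match "wordpress_mysql".
      if (index(tok[i], n ":") == 1)
        print $1, substr(tok[i], length(n) + 2)
  }'
}
```

For example, `container_usage wordpress_mysql < /var/log/pbs-monitoring/container-memory.log | tail -20` would list the most recent MySQL readings, one per line.
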
### Journal Persistence Confirmed
- `/var/log/journal` exists and is retaining logs across reboots
- `journalctl --list-boots` shows 5 boot sessions dating back to May 2025
- OOM kill evidence was successfully retrieved from previous boots

## Current Server Memory Snapshot (post-fix)

| Container | Memory | % of Limit (system RAM where uncapped) |
|-----------|--------|------------|
| wordpress | 1.11 GB | 29% (of system) |
| wordpress_mysql | 225 MB | 29% (of 768MB cap) |
| n8n | 200 MB | 5% |
| uptime-kuma | 100 MB | 3% |
| traefik | 37 MB | 1% |
| pbs-api | 28 MB | 1% |
| redis | 13 MB | 2% (of 640MB cap) |
| wpcron | 8 MB | <1% |

## Still Open

- [ ] Monitor overnight stability — check memory logs tomorrow AM
- [ ] Add log rotation for `/var/log/pbs-monitoring/container-memory.log`
- [ ] Investigate `apt-daily.service` — consider disabling automatic apt updates
- [ ] Server sizing discussion: 4GB may be tight for adding Gitea + Authelia
- [ ] Determine if Wordfence free-tier scan is contributing to memory pressure
- [ ] Consider setting server timezone to Eastern for easier log reading
- [ ] Investigate root cause of MySQL memory bloat (WooCommerce correlation still strong)

## Key Learnings

- **htop shows threads as separate rows by default** — press `H` to toggle thread visibility; one MySQL process can show as dozens of rows
- **systemd journal persists across reboots** if `Storage=persistent` is set, or if `Storage=auto` (the default) is in effect and `/var/log/journal` exists
- **`journalctl -b -1`** shows previous boot logs; use `--since`/`--until` for large time ranges to avoid hanging
- **`performance-schema=OFF`** in MySQL saves ~200-400MB; the trade-off is losing Performance Schema diagnostics, which this production WordPress stack does not need
- **Docker `deploy.resources.limits.memory`** provides a contained cap: an over-limit container is OOM-killed within its own cgroup and restarted by its restart policy, instead of the host-wide OOM killer picking a victim and cascading
- **Server timezone is UTC** — subtract 4 hours for Eastern time when reading logs (5 hours outside daylight saving time)

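
Rather than doing the UTC offset by hand, a log timestamp can be converted on the command line. This is a sketch that assumes GNU `date` (for `-d`); `utc_to_eastern` is a hypothetical helper name, and the POSIX `TZ` rule used here encodes the current US daylight saving schedule so the result does not depend on the server's own timezone setting.

```bash
#!/usr/bin/env bash
# Convert a UTC timestamp to US Eastern time. Assumes GNU date (for -d).
# The POSIX TZ rule spells out the US DST schedule explicitly:
# DST starts the second Sunday of March and ends the first Sunday of November.
# utc_to_eastern is a hypothetical helper name used for illustration only.
utc_to_eastern() {
  TZ='EST5EDT,M3.2.0,M11.1.0' date -d "$1 UTC" '+%Y-%m-%d %H:%M %Z'
}

utc_to_eastern '2026-03-23 10:20'
```

This leaves the server itself on UTC, which keeps journal and container timestamps consistent while still making crash windows easy to read in local time.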