---
project: server-stability-mysql-oom
type: session-notes
status: active
tags:
  - pbs
  - docker
  - production
  - wordpress
created: 2026-03-23
updated: 2026-03-23
path: PBS/Tech/Sessions/
---

# Server Stability - MySQL OOM Fix & Memory Monitoring

## Summary

Two server crashes in 48 hours (Saturday, March 22, ~6:00 AM ET; Monday, March 23, ~6:20 AM ET) traced to MySQL being OOM-killed by the Linux kernel. Root cause: MySQL had no memory limits and was consuming ~1.8GB before the OOM killer intervened, triggering a cascading failure that made the server completely unresponsive.

## Investigation Findings

### OOM Kill Evidence (from systemd journal)

- **Saturday crash:** `Out of memory: Killed process 4138817 (mysqld) total-vm:1841380kB`
- **Monday crash:** `Out of memory: Killed process 13015 (mysqld) total-vm:1828060kB`
- Both crashes followed the same pattern: MySQL OOM-killed → Docker restarts MySQL → system still memory-starved → swapoff killed → complete server lockup → manual Linode reboot required

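The evidence above can be pulled from the persisted journal with a quick filter; a sketch (the `-b -1` flag assumes the previous boot's journal survived, which the persistence check below confirms):

```shell
# Kernel messages from the previous boot, filtered for OOM-killer events
journalctl -k -b -1 --no-pager | grep -i "out of memory"

# Or let journald do the filtering itself across the current boot
journalctl -k --grep="Out of memory" --no-pager
```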
### Crash Timeline

- Both crashes occurred around 6:00-6:20 AM Eastern (10:00-10:20 UTC — server runs in UTC)
- WooCommerce was installed Saturday — first crash Saturday night, second Monday morning
- WooCommerce Action Scheduler showed no failed/stuck tasks — likely not the direct trigger
- Wordfence scan logs showed a ~1-minute scan on March 19 at ~10 PM ET — does not align with the crash window
- Wordfence scan scheduling is automatic on the free tier (no manual schedule control)

### Ruled Out

- WooCommerce Action Scheduler runaway tasks (all showed completed status)
- Wordfence scan timing (didn't align with crash window)
- Multiple MySQL instances (htop showed threads, not separate processes — press `H` in htop to toggle thread view)

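The thread-vs-process distinction is easy to confirm without htop; a sketch (assumes `mysqld` is the process name visible on the host):

```shell
# One line per PROCESS — expect a single mysqld
ps -C mysqld -o pid=,rss=

# One line per THREAD (LWP) — this is what htop shows by default
ps -C mysqld -L -o lwp= | wc -l
```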
### Not Yet Determined

- Exact trigger causing MySQL to balloon to 1.8GB overnight
- Whether WooCommerce's added baseline DB load is the tipping point
- `apt-daily.service` was running during Monday's crash — may be contributing to memory pressure

## Changes Made

### MySQL Memory Cap & Tuning (compose.yml)

Added to the `mysql` service in `/opt/docker/wordpress/compose.yml`:

```yaml
mysql:
  image: mysql:8.0
  container_name: wordpress_mysql
  restart: unless-stopped
  deploy:
    resources:
      limits:
        memory: 768M
      reservations:
        memory: 256M
  environment:
    MYSQL_DATABASE: wordpress
    MYSQL_USER: wordpress
    MYSQL_PASSWORD: ${MYSQL_PASSWORD}
    MYSQL_ROOT_PASSWORD: ${MYSQL_ROOT_PASSWORD}
  volumes:
    - mysql_data:/var/lib/mysql
  networks:
    - internal
  command: >-
    --default-authentication-plugin=mysql_native_password
    --innodb-buffer-pool-size=256M
    --innodb-log-buffer-size=16M
    --max-connections=50
    --key-buffer-size=16M
    --tmp-table-size=32M
    --max-heap-table-size=32M
    --table-open-cache=256
    --performance-schema=OFF
```

**What each setting does:**

- `limits: memory: 768M` — Docker kills MySQL if it exceeds 768MB (controlled restart vs. kernel OOM)
- `reservations: memory: 256M` — guarantees MySQL gets at least 256MB
- `innodb-buffer-pool-size=256M` — caps the InnoDB cache (MySQL's biggest memory consumer)
- `max-connections=50` — reduced from the default 151 (less memory per connection)
- `performance-schema=OFF` — saves ~200-400MB (internal MySQL monitoring, not needed here)

**Result:**

| Metric | Before | After |
|--------|--------|-------|
| MySQL memory usage | 474MB (uncapped, spiked to 1.8GB) | 225MB (capped at 768MB) |
| MySQL % of cap | N/A | 29% |
| Total stack memory | ~2.05GB | ~2.0GB |

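The cap can be verified against the live container rather than trusting the compose file; a sketch using the container name from the service definition above:

```shell
# HostConfig.Memory is reported in bytes: 768 * 1024 * 1024 = 805306368
docker inspect wordpress_mysql --format '{{.HostConfig.Memory}}'

# Current usage vs. the cap
docker stats --no-stream --format '{{.Name}}: {{.MemUsage}} ({{.MemPerc}})' wordpress_mysql
```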
### Memory Monitoring Script

Created `/usr/local/bin/docker-mem-log.sh` — logs per-container memory usage every 5 minutes via cron:

```bash
#!/bin/bash
# Append one timestamped line with every container's current memory usage.
LOG_FILE="/var/log/pbs-monitoring/container-memory.log"
echo "$(date -u '+%Y-%m-%d %H:%M:%S UTC') | $(docker stats --no-stream --format '{{.Name}}:{{.MemUsage}}' | tr '\n' ' ')" >> "$LOG_FILE"
```

Cron entry at `/etc/cron.d/docker-mem-monitor`:

```
*/5 * * * * root /usr/local/bin/docker-mem-log.sh
```

**Check logs with:** `tail -20 /var/log/pbs-monitoring/container-memory.log`

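Once a few hours of samples accumulate, the MySQL column can be pulled out of the log to watch for overnight growth; a sketch (log format as written by the script above):

```shell
# Extract timestamp + MySQL usage from each sample line
awk -F'|' '{ ts=$1; if (match($2, /wordpress_mysql:[^ ]*/)) print ts, substr($2, RSTART, RLENGTH) }' \
    /var/log/pbs-monitoring/container-memory.log | tail -10
```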
### Journal Persistence Confirmed

- `/var/log/journal` exists and is retaining logs across reboots
- `journalctl --list-boots` shows 5 boot sessions dating back to May 2025
- OOM kill evidence was successfully retrieved from previous boots

## Current Server Memory Snapshot (post-fix)

| Container | Memory | % of Limit |
|-----------|--------|------------|
| wordpress | 1.11 GB | 29% (of system) |
| wordpress_mysql | 225 MB | 29% (of 768MB cap) |
| n8n | 200 MB | 5% |
| uptime-kuma | 100 MB | 3% |
| traefik | 37 MB | 1% |
| pbs-api | 28 MB | 1% |
| redis | 13 MB | 2% (of 640MB cap) |
| wpcron | 8 MB | <1% |

## Still Open

- [ ] Monitor overnight stability — check memory logs tomorrow AM
- [ ] Add log rotation for `/var/log/pbs-monitoring/container-memory.log`
- [ ] Investigate `apt-daily.service` — consider disabling automatic apt updates
- [ ] Server sizing discussion: 4GB may be tight for adding Gitea + Authelia
- [ ] Determine if the Wordfence free-tier scan is contributing to memory pressure
- [ ] Consider setting the server timezone to Eastern for easier log reading
- [ ] Investigate root cause of MySQL memory bloat (WooCommerce correlation still strong)

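For the log-rotation item, a minimal logrotate stanza would cover it; a sketch (weekly cadence and 4-file retention are assumptions, the file would go in `/etc/logrotate.d/pbs-monitoring`):

```
/var/log/pbs-monitoring/container-memory.log {
    weekly
    rotate 4
    compress
    missingok
    notifempty
}
```

At 12 samples/hour of short lines, even a month of data stays small, so retention here is about tidiness rather than disk pressure.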
## Key Learnings

- **htop shows threads, not processes** — press `H` to toggle thread visibility; one MySQL process can show as dozens of rows
- **systemd journal persists across reboots** if `/var/log/journal` exists and `Storage=auto` or `Storage=persistent` is set in `journald.conf`
- **`journalctl -b -1`** shows previous-boot logs; use `--since`/`--until` for large time ranges to avoid hanging
- **`performance-schema=OFF`** in MySQL saves ~200-400MB with no downside for production WordPress
- **Docker `deploy.resources.limits.memory`** provides a controlled cap — Docker restarts the container instead of the kernel OOM-killing it and cascading
- **Server timezone is UTC** — subtract 4 hours for Eastern time when reading logs