
---
project: server-stability-mysql-oom
type: session-notes
status: active
tags:
  - pbs
  - docker
  - production
  - wordpress
created: 2026-03-23
updated: 2026-03-23
path: PBS/Tech/Sessions/
---

# Server Stability - MySQL OOM Fix & Memory Monitoring

## Summary

Two server crashes in 48 hours (Saturday March 22 ~6AM ET, Monday March 23 ~6:20AM ET) traced to MySQL being OOM-killed by the Linux kernel. Root cause: MySQL had no memory limits and was consuming ~1.8GB before the OOM killer intervened, triggering a cascading failure that made the server completely unresponsive.

## Investigation Findings

### OOM Kill Evidence (from systemd journal)

- Saturday crash: `Out of memory: Killed process 4138817 (mysqld) total-vm:1841380kB`
- Monday crash: `Out of memory: Killed process 13015 (mysqld) total-vm:1828060kB`
- Both crashes followed the same pattern: MySQL OOM-killed → Docker restarts MySQL → system still memory-starved → `swapoff` killed → complete server lockup → manual Linode reboot required

### Crash Timeline

- Both crashes occurred around 6:00-6:20 AM Eastern (10:00-10:20 UTC — the server runs in UTC)
- WooCommerce was installed Saturday — first crash Saturday night, second Monday morning
- WooCommerce Action Scheduler showed no failed/stuck tasks — likely not the direct trigger
- Wordfence scan logs showed a ~1 minute scan on March 19 at ~10 PM ET — does not align with the crash window
- Wordfence scan scheduling is automatic on the free tier (no manual schedule control)

### Ruled Out

- WooCommerce Action Scheduler runaway tasks (all showed completed status)
- Wordfence scan timing (didn't align with the crash window)
- Multiple MySQL instances (htop showed threads, not separate processes — press `H` in htop to toggle thread view)

### Not Yet Determined

- Exact trigger causing MySQL to balloon to 1.8 GB overnight
- Whether WooCommerce's added baseline DB load is the tipping point
- `apt-daily.service` was running during Monday's crash — may be contributing to memory pressure

## Changes Made

### MySQL Memory Cap & Tuning (compose.yml)

Added to the `mysql` service in `/opt/docker/wordpress/compose.yml`:

```yaml
  mysql:
    image: mysql:8.0
    container_name: wordpress_mysql
    restart: unless-stopped
    deploy:
      resources:
        limits:
          memory: 768M
        reservations:
          memory: 256M
    environment:
      MYSQL_DATABASE: wordpress
      MYSQL_USER: wordpress
      MYSQL_PASSWORD: ${MYSQL_PASSWORD}
      MYSQL_ROOT_PASSWORD: ${MYSQL_ROOT_PASSWORD}
    volumes:
      - mysql_data:/var/lib/mysql
    networks:
      - internal
    command: >-
      --default-authentication-plugin=mysql_native_password
      --innodb-buffer-pool-size=256M
      --innodb-log-buffer-size=16M
      --max-connections=50
      --key-buffer-size=16M
      --tmp-table-size=32M
      --max-heap-table-size=32M
      --table-open-cache=256
      --performance-schema=OFF
```

What each setting does:

- `limits: memory: 768M` — the container's cgroup OOM-kills MySQL at 768 MB and Docker restarts it (a contained restart instead of a system-wide kernel OOM)
- `reservations: memory: 256M` — guarantees MySQL at least 256 MB
- `innodb-buffer-pool-size=256M` — caps the InnoDB buffer pool (MySQL's biggest memory consumer)
- `max-connections=50` — reduced from the default 151 (less per-connection memory)
- `performance-schema=OFF` — saves ~200-400 MB (internal MySQL instrumentation, not needed here)
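A quick sanity check that the cap actually took effect after `docker compose up -d` (this assumes Compose v2, which applies `deploy.resources.limits` outside swarm mode; older docker-compose needed `--compatibility`):

```shell
# Read back the memory limit Docker applied to the container, in bytes
docker inspect wordpress_mysql --format '{{.HostConfig.Memory}}'
# 768M should report as 768 * 1024 * 1024 = 805306368 bytes
```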

Result:

| Metric | Before | After |
| --- | --- | --- |
| MySQL memory usage | 474 MB (uncapped, spiked to 1.8 GB) | 225 MB (capped at 768 MB) |
| MySQL % of cap | N/A | 29% |
| Total stack memory | ~2.05 GB | ~2.0 GB |

## Memory Monitoring Script

Created `/usr/local/bin/docker-mem-log.sh` — logs per-container memory usage every 5 minutes via cron:

```bash
#!/bin/bash
LOG_FILE="/var/log/pbs-monitoring/container-memory.log"
echo "$(date -u '+%Y-%m-%d %H:%M:%S UTC') | $(docker stats --no-stream --format '{{.Name}}:{{.MemUsage}}' | tr '\n' ' ')" >> "$LOG_FILE"
```
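One-time setup for the script (a sketch using the paths assumed in this note — the log directory must exist before the first cron run, or the append fails):

```shell
# Create the monitoring log directory and make the script executable
install -d -m 755 /var/log/pbs-monitoring
chmod +x /usr/local/bin/docker-mem-log.sh
```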

Cron entry at `/etc/cron.d/docker-mem-monitor`:

```
*/5 * * * * root /usr/local/bin/docker-mem-log.sh
```

Check logs with: `tail -20 /var/log/pbs-monitoring/container-memory.log`
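To pull a single container's readings out of the combined log, a grep over the `name:usage` tokens produced by the `--format` string works (note it captures only the usage portion before the ` / limit` part of `MemUsage`):

```shell
# Show the five most recent wordpress_mysql memory readings
grep -o 'wordpress_mysql:[^ ]*' /var/log/pbs-monitoring/container-memory.log | tail -n 5
```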

## Journal Persistence Confirmed

- `/var/log/journal` exists and is retaining logs across reboots
- `journalctl --list-boots` shows 5 boot sessions dating back to May 2025
- OOM kill evidence was successfully retrieved from previous boots
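For reference, the kind of query that pulls the OOM evidence out of a previous boot (timestamps here are the Monday crash window, as an example; `-b -1` selects the previous boot, `-k` restricts to kernel messages):

```shell
# Kernel messages from the previous boot, narrowed to the crash window
journalctl -b -1 -k --since "2026-03-23 10:00" --until "2026-03-23 10:30" | grep -i 'out of memory'
```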

## Current Server Memory Snapshot (post-fix)

| Container | Memory | % of Limit |
| --- | --- | --- |
| wordpress | 1.11 GB | 29% (of system) |
| wordpress_mysql | 225 MB | 29% (of 768 MB cap) |
| n8n | 200 MB | 5% |
| uptime-kuma | 100 MB | 3% |
| traefik | 37 MB | 1% |
| pbs-api | 28 MB | 1% |
| redis | 13 MB | 2% (of 640 MB cap) |
| wpcron | 8 MB | <1% |

## Still Open

- Monitor overnight stability — check memory logs tomorrow AM
- Add log rotation for `/var/log/pbs-monitoring/container-memory.log`
- Investigate `apt-daily.service` — consider disabling automatic apt updates
- Server sizing discussion: 4 GB may be tight for adding Gitea + Authelia
- Determine if the Wordfence free-tier scan is contributing to memory pressure
- Consider setting the server timezone to Eastern for easier log reading
- Investigate the root cause of MySQL memory bloat (WooCommerce correlation still strong)
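For the log-rotation item above, a minimal logrotate sketch (hypothetical file `/etc/logrotate.d/pbs-monitoring`; all directives are standard logrotate):

```
/var/log/pbs-monitoring/container-memory.log {
    weekly
    rotate 4
    compress
    missingok
    notifempty
}
```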

## Key Learnings

- htop shows threads, not processes — press `H` to toggle thread visibility; one MySQL process can show as dozens of rows
- The systemd journal persists across reboots if `/var/log/journal` exists and `Storage=auto` or `Storage=persistent` is set
- `journalctl -b -1` shows previous-boot logs; use `--since`/`--until` for large time ranges to avoid hanging
- `performance-schema=OFF` in MySQL saves ~200-400 MB with no downside for production WordPress
- Docker `deploy.resources.limits.memory` provides a controlled cap — Docker restarts the container instead of the kernel OOM-killing it and cascading
- The server timezone is UTC — subtract 4 hours for Eastern time when reading logs
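Rather than subtracting 4 hours by hand, GNU `date` can do the UTC → Eastern conversion on demand, and it is DST-aware (a fixed 4-hour offset breaks in winter). Assumes tzdata is installed, which it is on stock Debian/Ubuntu:

```shell
# Convert a UTC log timestamp to Eastern time (DST-aware)
TZ="America/New_York" date -d '2026-03-23 10:20:00 UTC' '+%Y-%m-%d %H:%M %Z'
# → 2026-03-23 06:20 EDT
```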