Create pbs-seo-automation.md via n8n
---
project: pbs-seo-automation
type: project-plan
status: active
tags:
- pbs
- seo
- python
- automation
- n8n
- wordpress
- yoast
- streamlit
- analytics
- google-search-console
created: 2026-03-23
updated: 2026-03-23
path: PBS/Tech/Projects/
---

# PBS SEO Automation Pipeline

## Project Goal
Build a self-hosted SEO automation pipeline that replaces manual copy-paste SEO workflows with an automated system: collecting search performance data, tracking rankings over time, researching keywords, generating optimized meta content via the Claude API, and pushing it back to WordPress — all orchestrated through n8n and visualized in Streamlit.

## Why This Matters
PBS is already getting organic search traffic, which means SEO is working to some degree. But optimizing it is currently a manual, disjointed process — Yoast shows fields but doesn't help fill them intelligently, and there's no automation connecting keyword research to content optimization. This project turns SEO from a chore into a data-driven pipeline that works across all PBS content types.

## Content Types Covered
- **Recipes** (live) — highest SEO value, drives organic discovery
- **Blog/editorial** (live) — builds authority, targets informational queries
- **Cookbook landing pages** (future) — transactional/promotional SEO
- **Merch pages** (future) — Product schema, transactional keywords
- **Membership/classes** (future) — funnel-driven, conversion-focused

The pipeline is designed to handle all content types from day one, even if only recipes and blog posts exist today.

---

## Architecture Overview

```
Google Search Console API (free)
              │
Python Collector (PyCharm + UV)
              │
       SQLite Database
              │
 ┌────────────┴────────────┐
 │                         │
n8n                    Streamlit
(orchestration)        (dashboard)
 │
 ├─ Claude API (generate meta titles/descriptions)
 ├─ WordPress REST API (push meta back to Yoast)
 └─ Google Chat (alerts & digests)
```

### Shared Infrastructure with YouTube Analytics Project
- Same Streamlit instance (separate pages/tabs)
- Same n8n server for orchestration
- Separate SQLite database (keeps projects independent)
- Same Traefik reverse proxy for dashboard access
- Same Google Cloud project for API credentials

---

## Phase 1: Google Search Console Setup
**Estimated Time:** 1-2 hours
**Goal:** Connect Search Console to the PBS site and verify API access

### Tasks
- [ ] Verify Google Search Console is connected for plantbasedsoutherner.com
  - If yes: confirm data is flowing, and check how far back it goes
  - If no: add the property, verify via DNS (Cloudflare), and wait for data collection to begin
- [ ] Enable the Google Search Console API in the Google Cloud project
  - Can reuse the same project created for YouTube Analytics
- [ ] Create a service account OR extend existing OAuth credentials with the scope
  `https://www.googleapis.com/auth/webmasters.readonly`
- [ ] Test API access — pull a sample query report to confirm data flows

### Key Details
- Search Console retains 16 months of historical data
- Data is typically delayed 2-3 days
- The API uses `google-api-python-client` (same library as the YouTube project)
- Service account auth is simpler for automated/server-side collection (no browser needed)
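
The API smoke test could look roughly like this. A minimal sketch, assuming a service-account JSON key (the `service-account.json` path and the date range are placeholders); the `searchconsole` v1 service and its `searchanalytics.query` method are the standard `google-api-python-client` surface:

```python
# Search Console connectivity check (sketch): pull top query/page rows.
SCOPES = ["https://www.googleapis.com/auth/webmasters.readonly"]
SITE_URL = "https://plantbasedsoutherner.com/"  # must match the verified property

def build_request(start_date: str, end_date: str, row_limit: int = 10) -> dict:
    """Request body for searchanalytics.query: top query/page pairs in the window."""
    return {
        "startDate": start_date,
        "endDate": end_date,
        "dimensions": ["query", "page"],
        "rowLimit": row_limit,
    }

def fetch_sample(key_file: str) -> list:
    # Imports kept local so build_request stays usable without these packages installed.
    from google.oauth2 import service_account
    from googleapiclient.discovery import build

    creds = service_account.Credentials.from_service_account_file(key_file, scopes=SCOPES)
    service = build("searchconsole", "v1", credentials=creds)
    body = build_request("2026-02-01", "2026-02-28")  # placeholder window
    resp = service.searchanalytics().query(siteUrl=SITE_URL, body=body).execute()
    return resp.get("rows", [])

if __name__ == "__main__":
    for row in fetch_sample("service-account.json"):
        print(row["keys"], row["clicks"], row["impressions"], row["position"])
```

If rows come back, Phase 1 is done; an empty response usually just means the property is new and Google hasn't accumulated data yet.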

### Deliverable
Working API access to PBS search performance data

---

## Phase 2: Search Data Collector
**Estimated Time:** 3-4 hours
**Goal:** Python script that pulls search performance data into SQLite
**Tools:** PyCharm Professional, UV package manager

### Tasks
- [ ] Initialize the project with UV (`uv init pbs-seo-analytics`)
- [ ] Install dependencies: `google-api-python-client`, `google-auth`
- [ ] Build the auth module (service account preferred for server-side)
- [ ] Build the search query collector (queries, impressions, clicks, CTR, position by page)
- [ ] Build the page performance collector (aggregate metrics per URL)
- [ ] Build the device/country breakdown collector
- [ ] Design and create the SQLite schema
- [ ] Implement data ingestion with upsert logic (idempotent runs)
- [ ] Add a CLI interface for manual runs and backfill (up to 16 months)
- [ ] Run an initial backfill of all available historical data

### SQLite Schema (Initial Design)

```sql
-- Pages tracked on the site
CREATE TABLE pages (
    url TEXT PRIMARY KEY,
    page_type TEXT CHECK(page_type IN ('recipe', 'blog', 'merch',
        'cookbook', 'membership', 'landing', 'other')),
    title TEXT,
    first_seen TEXT,
    created_at TEXT DEFAULT CURRENT_TIMESTAMP,
    updated_at TEXT DEFAULT CURRENT_TIMESTAMP
);

-- Daily search performance per query per page
CREATE TABLE search_queries (
    date TEXT NOT NULL,
    query TEXT NOT NULL,
    page_url TEXT NOT NULL,
    clicks INTEGER DEFAULT 0,
    impressions INTEGER DEFAULT 0,
    ctr REAL DEFAULT 0,
    avg_position REAL DEFAULT 0,
    device TEXT DEFAULT 'all',
    country TEXT DEFAULT 'all',
    created_at TEXT DEFAULT CURRENT_TIMESTAMP,
    PRIMARY KEY (date, query, page_url, device, country)
);

-- Daily aggregate performance per page
CREATE TABLE page_daily_metrics (
    date TEXT NOT NULL,
    page_url TEXT NOT NULL,
    total_clicks INTEGER DEFAULT 0,
    total_impressions INTEGER DEFAULT 0,
    avg_ctr REAL DEFAULT 0,
    avg_position REAL DEFAULT 0,
    created_at TEXT DEFAULT CURRENT_TIMESTAMP,
    PRIMARY KEY (date, page_url)
);

-- Keyword tracking: queries we want to monitor over time
CREATE TABLE tracked_keywords (
    keyword TEXT PRIMARY KEY,
    category TEXT,
    target_page_url TEXT,
    added_at TEXT DEFAULT CURRENT_TIMESTAMP,
    notes TEXT
);

-- Snapshot of rank position for tracked keywords
CREATE TABLE keyword_rank_history (
    keyword TEXT NOT NULL,
    date TEXT NOT NULL,
    avg_position REAL,
    impressions INTEGER DEFAULT 0,
    clicks INTEGER DEFAULT 0,
    best_page_url TEXT,
    created_at TEXT DEFAULT CURRENT_TIMESTAMP,
    PRIMARY KEY (keyword, date),
    FOREIGN KEY (keyword) REFERENCES tracked_keywords(keyword)
);

-- SEO meta content generated and applied
CREATE TABLE seo_meta_log (
    page_url TEXT NOT NULL,
    generated_at TEXT NOT NULL,
    meta_title TEXT,
    meta_description TEXT,
    focus_keyword TEXT,
    model_used TEXT DEFAULT 'claude-sonnet',
    pushed_to_wordpress INTEGER DEFAULT 0,
    pushed_at TEXT,
    PRIMARY KEY (page_url, generated_at)
);

-- Site-level daily summary
CREATE TABLE site_daily_metrics (
    date TEXT PRIMARY KEY,
    total_clicks INTEGER DEFAULT 0,
    total_impressions INTEGER DEFAULT 0,
    avg_ctr REAL DEFAULT 0,
    avg_position REAL DEFAULT 0,
    unique_queries INTEGER DEFAULT 0,
    created_at TEXT DEFAULT CURRENT_TIMESTAMP
);
```
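
The "upsert logic (idempotent runs)" task can be sketched with SQLite's `ON CONFLICT` clause, keyed on the `search_queries` primary key so that re-running a day's pull overwrites metrics rather than duplicating rows. A minimal sketch against the schema above:

```python
# Idempotent ingestion sketch: re-running the same day's pull must not duplicate rows.
import sqlite3

SEARCH_QUERIES_DDL = """
CREATE TABLE IF NOT EXISTS search_queries (
    date TEXT NOT NULL,
    query TEXT NOT NULL,
    page_url TEXT NOT NULL,
    clicks INTEGER DEFAULT 0,
    impressions INTEGER DEFAULT 0,
    ctr REAL DEFAULT 0,
    avg_position REAL DEFAULT 0,
    device TEXT DEFAULT 'all',
    country TEXT DEFAULT 'all',
    created_at TEXT DEFAULT CURRENT_TIMESTAMP,
    PRIMARY KEY (date, query, page_url, device, country)
)
"""

# Conflict target matches the table's primary key; on a repeat run the
# metric columns are refreshed from the incoming row (`excluded`).
UPSERT = """
INSERT INTO search_queries (date, query, page_url, clicks, impressions, ctr, avg_position)
VALUES (:date, :query, :page_url, :clicks, :impressions, :ctr, :avg_position)
ON CONFLICT(date, query, page_url, device, country) DO UPDATE SET
    clicks = excluded.clicks,
    impressions = excluded.impressions,
    ctr = excluded.ctr,
    avg_position = excluded.avg_position
"""

def ingest(conn: sqlite3.Connection, rows: list) -> None:
    """Insert-or-update a batch of collected rows in one transaction."""
    conn.executemany(UPSERT, rows)
    conn.commit()
```

Running the same batch twice leaves the row count unchanged, which is what makes both the historical backfill and the daily scheduled run safely repeatable.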

### What the Data Tells You
The Google Search Console API returns four core metrics per query/page combination:
- **Clicks** — how many times someone clicked through to your site
- **Impressions** — how many times your page appeared in search results
- **CTR (Click-Through Rate)** — clicks / impressions
- **Average Position** — where you ranked (1 = top of page 1)

You can slice these by date, query, page, device (mobile/desktop/tablet), country, and search type (web/image/video).
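
Rolling the raw query/page rows up into the `page_daily_metrics` shape requires an aggregation choice not spelled out above, so this sketch makes its assumptions explicit: CTR is recomputed from summed clicks and impressions, and position is impression-weighted rather than naively averaged across queries:

```python
# Aggregate raw per-query rows into per-page daily metrics (sketch).
from collections import defaultdict

def aggregate_pages(rows: list) -> dict:
    """rows: dicts with page_url, clicks, impressions, avg_position (one day)."""
    acc = defaultdict(lambda: {"clicks": 0, "impressions": 0, "pos_weight": 0.0})
    for r in rows:
        p = acc[r["page_url"]]
        p["clicks"] += r["clicks"]
        p["impressions"] += r["impressions"]
        # Weight each query's position by its impressions so rare queries
        # don't dominate the page-level average.
        p["pos_weight"] += r["avg_position"] * r["impressions"]
    out = {}
    for url, p in acc.items():
        imps = p["impressions"]
        out[url] = {
            "total_clicks": p["clicks"],
            "total_impressions": imps,
            "avg_ctr": p["clicks"] / imps if imps else 0.0,
            "avg_position": p["pos_weight"] / imps if imps else 0.0,
        }
    return out
```

Note this differs slightly from what Search Console itself reports when aggregating by page, but it is consistent and derivable from the stored rows.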

### Deliverable
Python CLI tool that backfills and incrementally collects PBS search data into SQLite

---

## Phase 3: n8n Orchestration + LLM Meta Generation
**Estimated Time:** 4-5 hours
**Goal:** Automate data collection, generate SEO meta content with Claude, and push it to WordPress

### Tasks
- [ ] Create an n8n workflow: daily scheduled trigger → Execute Command → Python collector
- [ ] Build the Claude API integration for meta generation:
  - Input: page content, currently ranking keywords, content type
  - Output: optimized meta title, meta description, focus keyword
  - System prompt tuned for the PBS brand voice (whole food plant based, southern, warm, NOT "vegan")
- [ ] Build the WordPress REST API integration to push meta back to Yoast fields:
  - `_yoast_wpseo_title` (meta title)
  - `_yoast_wpseo_metadesc` (meta description)
  - `_yoast_wpseo_focuskw` (focus keyword)
- [ ] Add a WPCode snippet to expose Yoast fields via the WordPress REST API (required for write access)
- [ ] Create the approval workflow: generate meta → notify Travis/Jenny via Google Chat → approve/reject → push to WordPress
- [ ] Create a weekly SEO digest alert for Google Chat
- [ ] Add error handling and failure notifications

### LLM Meta Generation Flow
```
n8n detects new/updated post in WordPress
        │
Fetch page content + current search queries ranking for that URL
        │
Send to Claude API with SEO-optimized system prompt
        │
Claude generates: meta title, meta description, focus keyword
        │
Store in seo_meta_log table
        │
Send to Google Chat for approval
        │
On approval: push to WordPress via REST API
```
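
The Claude step could be sketched as below. The system prompt wording, the JSON output contract, and the model string are assumptions to tune; only the Messages API shape (`client.messages.create`) comes from Anthropic's Python SDK:

```python
# Meta-generation call sketch for the flow above.
import json

# Assumed brand-voice prompt; adjust wording and length limits as needed.
SYSTEM_PROMPT = (
    "You write SEO meta for Plant Based Southerner: whole food plant based, "
    "southern, warm, and never the word 'vegan'. Reply with JSON only: "
    '{"meta_title": "...", "meta_description": "...", "focus_keyword": "..."}. '
    "Keep titles under 60 characters and descriptions under 155."
)

def build_user_prompt(page_content: str, ranking_queries: list, content_type: str) -> str:
    """Bundle the inputs named in the Phase 3 tasks into one user message."""
    return (
        f"Content type: {content_type}\n"
        f"Queries this page already ranks for: {', '.join(ranking_queries)}\n\n"
        f"Page content:\n{page_content}"
    )

def generate_meta(page_content: str, ranking_queries: list, content_type: str) -> dict:
    import anthropic  # lazy: the prompt builders above work without the SDK

    client = anthropic.Anthropic()  # reads ANTHROPIC_API_KEY from the environment
    msg = client.messages.create(
        model="claude-sonnet-4-20250514",  # placeholder model id; pin whatever is current
        max_tokens=500,
        system=SYSTEM_PROMPT,
        messages=[{
            "role": "user",
            "content": build_user_prompt(page_content, ranking_queries, content_type),
        }],
    )
    return json.loads(msg.content[0].text)
```

The returned dict maps directly onto the `seo_meta_log` columns, so storing it before the approval step is a single insert.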

### WordPress Integration Detail
Yoast's REST API is read-only by default. To write meta fields, we need a small WPCode snippet that registers the Yoast fields on the WordPress REST API. This is a lightweight approach — about 20 lines of PHP via WPCode Lite (already installed), with no additional plugins needed.

Alternatively, n8n can update post meta directly via the WordPress API using the `meta` field in a PUT request to `/wp-json/wp/v2/posts/`.
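
That update path could be sketched in Python like this, assuming the WPCode snippet has already exposed the Yoast keys in post `meta` and that a WordPress application password is used for Basic auth (the site URL and credentials are placeholders):

```python
# Push generated meta back to WordPress (sketch).
import base64
import json
import urllib.request

WP_BASE = "https://plantbasedsoutherner.com/wp-json/wp/v2"  # placeholder

def build_meta_payload(title: str, description: str, focus_kw: str) -> dict:
    """The `meta` body WordPress expects once the Yoast keys are registered."""
    return {
        "meta": {
            "_yoast_wpseo_title": title,
            "_yoast_wpseo_metadesc": description,
            "_yoast_wpseo_focuskw": focus_kw,
        }
    }

def push_meta(post_id: int, payload: dict, user: str, app_password: str) -> dict:
    token = base64.b64encode(f"{user}:{app_password}".encode()).decode()
    req = urllib.request.Request(
        f"{WP_BASE}/posts/{post_id}",
        data=json.dumps(payload).encode(),
        headers={
            "Authorization": f"Basic {token}",
            "Content-Type": "application/json",
        },
        method="POST",  # WordPress accepts POST (or PUT) for updates
    )
    with urllib.request.urlopen(req) as resp:
        return json.loads(resp.read())
```

In practice n8n's HTTP Request node would send the same payload; this is just the shape of the call.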

### Alert Ideas
- **New content alert:** "Jenny published a new recipe. Claude generated meta — approve?"
- **Weekly digest:** Top gaining keywords, biggest position changes, pages needing optimization
- **Opportunity alert:** "You're ranking #11 for 'plant based collard greens' — a small push could hit page 1"
- **Cannibalization alert:** Multiple PBS pages competing for the same keyword

### Deliverable
Fully automated pipeline: collect → analyze → generate → approve → publish SEO meta

---

## Phase 4: Streamlit SEO Dashboard
**Estimated Time:** 4-6 hours
**Goal:** Visual SEO analytics dashboard integrated alongside YouTube analytics

### Tasks
- [ ] Add SEO pages to the existing Streamlit app (or create a separate app)
- [ ] Build the search performance overview (clicks, impressions, CTR trends)
- [ ] Build the keyword rank tracker (position changes over time)
- [ ] Build a page-level deep dive (which queries drive traffic to each page)
- [ ] Build a content gap analysis view (queries with high impressions but low CTR)
- [ ] Build a content type comparison (recipe SEO vs blog SEO performance)
- [ ] Build an "opportunities" view (keywords close to page 1, quick wins)
- [ ] Build a meta generation log view (what Claude generated, what was approved)

### Dashboard Pages (Initial Concept)
1. **Search Overview** — total clicks/impressions/CTR trend, top queries, top pages
2. **Keyword Tracker** — track specific keywords over time, position change alerts
3. **Page Deep Dive** — select a page, see all queries driving traffic, position trends
4. **Content Gaps** — high impression / low click pages (title/description need work)
5. **Opportunities** — keywords ranking positions 8-20 (striking distance of page 1)
6. **Content Type Breakdown** — SEO performance by content type (recipe vs blog vs merch)
7. **Meta Generation Log** — what Claude generated, approval status, before/after
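
The Opportunities page can be sketched as one query over the Phase 2 schema plus a thin Streamlit wrapper. The 8-20 position band and the 28-day window are this sketch's assumptions:

```python
# "Opportunities" page sketch: keywords in striking distance of page 1.
import sqlite3

OPPORTUNITIES_SQL = """
SELECT query,
       page_url,
       AVG(avg_position) AS position,
       SUM(impressions) AS impressions,
       SUM(clicks) AS clicks
FROM search_queries
WHERE date >= date('now', '-28 days')
GROUP BY query, page_url
HAVING AVG(avg_position) BETWEEN 8 AND 20
ORDER BY impressions DESC
"""

def load_opportunities(db_path: str) -> list:
    with sqlite3.connect(db_path) as conn:
        return conn.execute(OPPORTUNITIES_SQL).fetchall()

def render_page(db_path: str) -> None:
    import streamlit as st  # lazy: keeps the query usable outside the dashboard

    st.title("Opportunities: striking distance of page 1")
    st.dataframe(load_opportunities(db_path))
```

Sorting by impressions surfaces the keywords where a small position gain buys the most clicks, which is exactly the "quick wins" framing above.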

### Deliverable
Live SEO dashboard with actionable insights for content strategy

---

## Phase 5: Competitor Intelligence (Open — Free vs Paid)
**Estimated Time:** TBD based on approach
**Goal:** Understand the competitive landscape and find content opportunities

### Option A: DIY / Free Approach
- **Manual competitor research:** Periodically Google target keywords and note who ranks
- **Python scraping:** Build a lightweight rank checker that searches Google for target keywords and records positions (note: Google may rate-limit or block; use responsibly)
- **Free tools:** Google Trends for search interest over time, AnswerThePublic for question-based keyword ideas
- **Search Console mining:** Analyze existing query data to find patterns and gaps — you'd be surprised how much insight is already in your own data
- **Cost:** $0
- **Limitation:** No competitor backlink data, no domain authority scores, limited keyword volume estimates

### Option B: Budget Paid Tools (~$50-75/month)
- **SerpApi or DataForSEO:** Programmatic access to Google search results
  - Track competitor rankings for your target keywords
  - Get search volume estimates
  - API-friendly, integrates cleanly with the Python pipeline
- **Best for:** Automated daily rank tracking beyond what Search Console provides
- **Cost:** ~$50-75/month depending on query volume

### Option C: Full SEO Platform (~$99-200+/month)
- **Ahrefs, SEMrush, or Moz:** Comprehensive SEO intelligence
  - Competitor keyword analysis (what they rank for that you don't)
  - Backlink profiles and domain authority
  - Content gap analysis at scale
  - Keyword difficulty scores
- **Best for:** When you've outgrown Search Console data and need competitive intelligence
- **Cost:** $99-200+/month

### Recommendation
Start with Option A (free). Build the pipeline around Google Search Console data first. After 1-2 months of collecting data, evaluate which questions you can't answer with free data alone. That will tell you whether Option B or C is worth the investment. Many sites PBS's size never need to go past Option A.

### Deliverable
Decision on a competitive intelligence approach, based on data from the earlier phases

---

## Phase 6: Advanced SEO Automation & Iteration
**Estimated Time:** Ongoing
**Goal:** Deepen automation and cross-platform insights

### Future Ideas
- [ ] Auto-detect new WordPress posts and trigger SEO meta generation without manual intervention
- [ ] Cross-reference YouTube retention data with recipe page SEO performance (which videos drive search traffic?)
- [ ] Automated internal linking suggestions (connect related recipes/blog posts)
- [ ] Schema markup validation and monitoring (ensure WPRM recipe schema stays healthy)
- [ ] Page speed monitoring integration (Core Web Vitals affect rankings)
- [ ] Seasonal keyword planning (predict trending search terms by season for recipe content)
- [ ] A/B test meta titles: generate two versions, measure the CTR difference
- [ ] Content calendar integration: use keyword gaps to suggest what Jenny should create next
- [ ] Extend to merch, cookbook, and membership pages as they launch

---

## Prerequisites & Dependencies

| Requirement | Status | Notes |
|---|---|---|
| Google Search Console verified | Needs check | May already be connected via Workspace |
| Google Cloud project | Shared | Same project as YouTube Analytics |
| Search Console API enabled | Needed | Free, quota-based |
| OAuth/Service Account credentials | Needed | Can extend existing YouTube creds |
| Python + UV | Ready | Travis's local dev setup |
| Anthropic API key | Needed | For Claude meta generation |
| WPCode Lite (WordPress) | Ready | Already installed — needed for REST API Yoast fields |
| n8n | Ready | Already running on Linode |
| Streamlit | Shared | Same instance as YouTube dashboard |

---

## API Quotas & Costs

| Service | Quota/Cost | Notes |
|---|---|---|
| Google Search Console API | 2000 queries/day (free) | More than enough for PBS |
| Claude API (Sonnet) | ~$0.003 per meta generation | Pennies per recipe |
| WordPress REST API | Unlimited (self-hosted) | No external cost |
| Google Chat webhooks | Unlimited (free) | Already configured for n8n |

---

## Key Decisions

| Decision | Choice | Rationale |
|---|---|---|
| Data source | Google Search Console (free) | Actual Google data, not estimates. 16 months of history. Sufficient for PBS scale. |
| Competitor intelligence | Deferred (Phase 5) | Start free; evaluate need after collecting own data. |
| LLM for meta generation | Claude API (Anthropic) | Consistent with PBS brand, excellent at structured content, cost-effective. |
| Meta push to WordPress | REST API via WPCode snippet | Lightweight, no extra plugins, uses existing WPCode Lite install. |
| Dashboard | Streamlit (shared with YouTube) | Single analytics platform for all PBS data. |
| Approval workflow | Google Chat notification | Keeps a human in the loop before meta goes live. Jenny/Travis approve. |

---

## Sequencing & Priority
1. **Phase 1** (Search Console Setup) → unblocks data collection
2. **Phase 2** (Data Collector) → starts building the historical dataset, enables analysis
3. **Phase 3** (n8n + LLM Meta Generation) → the automation sweet spot — no more copy-paste
4. **Phase 4** (Streamlit Dashboard) → visualize what's working, find opportunities
5. **Phase 5** (Competitor Intelligence) → evaluate free vs paid based on real needs
6. **Phase 6** (Advanced) → cross-platform insights, deeper automation

---

## Relationship to Other PBS Projects
- **YouTube Analytics Pipeline:** Shared Streamlit dashboard, shared Google Cloud project, parallel development
- **PBS Content Hub (Phase 5):** The SEO dashboard could become a Content Hub tab
- **Instagram Automation:** Cross-platform content performance analysis (search + social)
- **WordPress-to-MySQL sync:** Trigger SEO meta generation when new recipes are synced
- **Authelia SSO:** Will protect Streamlit dashboard access
- **Yoast SEO plugin:** Stays installed for technical plumbing (sitemaps, canonical URLs, Open Graph) — but meta content is now generated and pushed by the pipeline, not manually entered

---

## Note on Yoast
Yoast stays installed, but its role changes. It continues handling:
- XML sitemap generation
- Canonical URL management
- Open Graph / social sharing meta tags
- Basic schema markup (supplementing WPRM's recipe schema)

What it NO LONGER does:
- You stop manually filling in meta titles/descriptions (the pipeline does this)
- You ignore the content scoring stoplight (Claude's output is smarter than Yoast's rules)
- Focus keywords are set by data-driven keyword research, not gut feeling

Yoast becomes invisible plumbing. The pipeline becomes the brain.

---

*Next Step: Phase 1 — Check if Google Search Console is connected for plantbasedsoutherner.com*