
---
project: pbs-seo-automation
type: project-plan
status: active
tags:
  - pbs
  - seo
  - python
  - automation
  - n8n
  - wordpress
  - yoast
  - streamlit
  - analytics
  - google-search-console
created: 2026-03-23
updated: 2026-03-23
path: PBS/Tech/Projects/
---

PBS SEO Automation Pipeline

Project Goal

Build a self-hosted SEO automation pipeline that replaces manual copy-paste SEO workflows with an automated system: collecting search performance data, tracking rankings over time, researching keywords, generating optimized meta content via Claude API, and pushing it back to WordPress — all orchestrated through n8n and visualized in Streamlit.

Why This Matters

PBS is already getting organic search traffic, which means SEO is working to some degree. But optimizing it is currently a manual, disjointed process — Yoast shows fields but doesn't help fill them intelligently, and there's no automation connecting keyword research to content optimization. This project turns SEO from a chore into a data-driven pipeline that works across all PBS content types.

Content Types Covered

  • Recipes (live) — highest SEO value, drives organic discovery
  • Blog/editorial (live) — builds authority, targets informational queries
  • Cookbook landing pages (future) — transactional/promotional SEO
  • Merch pages (future) — Product schema, transactional keywords
  • Membership/classes (future) — funnel-driven, conversion-focused

The pipeline is designed to handle all content types from day one, even if only recipes and blog posts exist today.


Architecture Overview

Google Search Console API (free)
        |
   Python Collector (PyCharm + UV)
        |
   SQLite Database
        |
   ┌────────────┴────────────┐
   │                         │
  n8n                    Streamlit
(orchestration)         (dashboard)
   │
   ├─ Claude API (generate meta titles/descriptions)
   ├─ WordPress REST API (push meta back to Yoast)
   └─ Google Chat (alerts & digests)

Shared Infrastructure with YouTube Analytics Project

  • Same Streamlit instance (separate pages/tabs)
  • Same n8n server for orchestration
  • Separate SQLite database (keeps projects independent)
  • Same Traefik reverse proxy for dashboard access
  • Same Google Cloud project for API credentials

Phase 1: Google Search Console Setup

Estimated Time: 1-2 hours
Goal: Connect Search Console to PBS site and verify API access

Tasks

  • Verify Google Search Console is connected for plantbasedsoutherner.com
    • If yes: confirm data is flowing, check how far back data goes
    • If no: add property, verify via DNS (Cloudflare), wait for data collection to begin
  • Enable Google Search Console API in Google Cloud project
    • Can reuse the same project created for YouTube Analytics
  • Create service account OR extend existing OAuth credentials with scope: https://www.googleapis.com/auth/webmasters.readonly
  • Test API access — pull a sample query report to confirm data flows

Key Details

  • Search Console retains 16 months of historical data
  • Data is typically delayed 2-3 days
  • API uses google-api-python-client (same library as YouTube project)
  • Service account auth is simpler for automated/server-side collection (no browser needed)
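
The "test API access" task above can be sketched in Python with the same google-api-python-client library. This is a minimal smoke test, assuming service account auth; the key file path and property URL are placeholders, and the service account must first be added as a user on the Search Console property.

```python
"""Search Console API smoke test: pull a small sample query report."""
from datetime import date, timedelta

SCOPES = ["https://www.googleapis.com/auth/webmasters.readonly"]
SITE_URL = "sc-domain:plantbasedsoutherner.com"  # or the https:// URL-prefix property

def build_sample_body(days_back: int = 7) -> dict:
    """Request body for a small query report (GSC data lags ~2-3 days)."""
    end = date.today() - timedelta(days=3)
    start = end - timedelta(days=days_back)
    return {
        "startDate": start.isoformat(),
        "endDate": end.isoformat(),
        "dimensions": ["query", "page"],
        "rowLimit": 10,
    }

def fetch_sample(key_file: str = "service-account.json") -> list[dict]:
    # Third-party imports kept local so the pure helper above stays testable.
    from google.oauth2 import service_account
    from googleapiclient.discovery import build

    creds = service_account.Credentials.from_service_account_file(key_file, scopes=SCOPES)
    service = build("searchconsole", "v1", credentials=creds)
    resp = service.searchanalytics().query(siteUrl=SITE_URL, body=build_sample_body()).execute()
    return resp.get("rows", [])

if __name__ == "__main__":
    for row in fetch_sample():
        print(row["keys"], row["clicks"], row["impressions"])
```

If rows come back, Phase 1 is done; an empty response on a verified property usually just means the date window predates data collection.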

Deliverable

Working API access to PBS search performance data


Phase 2: Search Data Collector

Estimated Time: 3-4 hours
Goal: Python script that pulls search performance data into SQLite
Tools: PyCharm Professional, UV package manager

Tasks

  • Initialize project with UV (uv init pbs-seo-analytics)
  • Install dependencies: google-api-python-client, google-auth
  • Build auth module (service account preferred for server-side)
  • Build search query collector (queries, impressions, clicks, CTR, position by page)
  • Build page performance collector (aggregate metrics per URL)
  • Build device/country breakdown collector
  • Design and create SQLite schema
  • Implement data ingestion with upsert logic (idempotent runs)
  • Add CLI interface for manual runs and backfill (up to 16 months)
  • Initial backfill of all available historical data
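
The CLI task above could take a shape like this argparse skeleton; the subcommand and flag names are illustrative, not final.

```python
"""CLI skeleton for the collector: incremental runs plus historical backfill."""
import argparse

def build_parser() -> argparse.ArgumentParser:
    parser = argparse.ArgumentParser(prog="pbs-seo-analytics")
    sub = parser.add_subparsers(dest="command", required=True)

    daily = sub.add_parser("collect", help="incremental pull of recent days")
    daily.add_argument("--days", type=int, default=7, help="lookback window")

    backfill = sub.add_parser("backfill", help="pull all available history")
    backfill.add_argument("--months", type=int, default=16,
                          help="GSC retains at most 16 months")
    return parser

if __name__ == "__main__":
    args = build_parser().parse_args()
    print(args)
```

A 7-day default lookback on `collect` gives late-arriving GSC data several chances to be re-ingested, which is why the ingest layer needs to be idempotent.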

SQLite Schema (Initial Design)

-- Pages tracked on the site
CREATE TABLE pages (
    url TEXT PRIMARY KEY,
    page_type TEXT CHECK(page_type IN ('recipe', 'blog', 'merch',
'cookbook', 'membership', 'landing', 'other')),
    title TEXT,
    first_seen TEXT,
    created_at TEXT DEFAULT CURRENT_TIMESTAMP,
    updated_at TEXT DEFAULT CURRENT_TIMESTAMP
);

-- Daily search performance per query per page
CREATE TABLE search_queries (
    date TEXT NOT NULL,
    query TEXT NOT NULL,
    page_url TEXT NOT NULL,
    clicks INTEGER DEFAULT 0,
    impressions INTEGER DEFAULT 0,
    ctr REAL DEFAULT 0,
    avg_position REAL DEFAULT 0,
    device TEXT DEFAULT 'all',
    country TEXT DEFAULT 'all',
    created_at TEXT DEFAULT CURRENT_TIMESTAMP,
    PRIMARY KEY (date, query, page_url, device, country)
);

-- Daily aggregate performance per page
CREATE TABLE page_daily_metrics (
    date TEXT NOT NULL,
    page_url TEXT NOT NULL,
    total_clicks INTEGER DEFAULT 0,
    total_impressions INTEGER DEFAULT 0,
    avg_ctr REAL DEFAULT 0,
    avg_position REAL DEFAULT 0,
    created_at TEXT DEFAULT CURRENT_TIMESTAMP,
    PRIMARY KEY (date, page_url)
);

-- Keyword tracking: queries we want to monitor over time
CREATE TABLE tracked_keywords (
    keyword TEXT PRIMARY KEY,
    category TEXT,
    target_page_url TEXT,
    added_at TEXT DEFAULT CURRENT_TIMESTAMP,
    notes TEXT
);

-- Snapshot of rank position for tracked keywords
CREATE TABLE keyword_rank_history (
    keyword TEXT NOT NULL,
    date TEXT NOT NULL,
    avg_position REAL,
    impressions INTEGER DEFAULT 0,
    clicks INTEGER DEFAULT 0,
    best_page_url TEXT,
    created_at TEXT DEFAULT CURRENT_TIMESTAMP,
    PRIMARY KEY (keyword, date),
    FOREIGN KEY (keyword) REFERENCES tracked_keywords(keyword)
);

-- SEO meta content generated and applied
CREATE TABLE seo_meta_log (
    page_url TEXT NOT NULL,
    generated_at TEXT NOT NULL,
    meta_title TEXT,
    meta_description TEXT,
    focus_keyword TEXT,
    model_used TEXT DEFAULT 'claude-sonnet',
    pushed_to_wordpress INTEGER DEFAULT 0,
    pushed_at TEXT,
    PRIMARY KEY (page_url, generated_at)
);

-- Site-level daily summary
CREATE TABLE site_daily_metrics (
    date TEXT PRIMARY KEY,
    total_clicks INTEGER DEFAULT 0,
    total_impressions INTEGER DEFAULT 0,
    avg_ctr REAL DEFAULT 0,
    avg_position REAL DEFAULT 0,
    unique_queries INTEGER DEFAULT 0,
    created_at TEXT DEFAULT CURRENT_TIMESTAMP
);
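
The "upsert logic (idempotent runs)" requirement maps directly onto SQLite's ON CONFLICT clause against the composite primary key above. A minimal sketch, assuming rows arrive as dicts from the collector:

```python
"""Idempotent ingest: re-running the same day's pull must not duplicate rows."""
import sqlite3

SEARCH_QUERIES_DDL = """
CREATE TABLE IF NOT EXISTS search_queries (
    date TEXT NOT NULL,
    query TEXT NOT NULL,
    page_url TEXT NOT NULL,
    clicks INTEGER DEFAULT 0,
    impressions INTEGER DEFAULT 0,
    ctr REAL DEFAULT 0,
    avg_position REAL DEFAULT 0,
    device TEXT DEFAULT 'all',
    country TEXT DEFAULT 'all',
    created_at TEXT DEFAULT CURRENT_TIMESTAMP,
    PRIMARY KEY (date, query, page_url, device, country)
)"""

UPSERT = """
INSERT INTO search_queries (date, query, page_url, clicks, impressions, ctr, avg_position)
VALUES (:date, :query, :page_url, :clicks, :impressions, :ctr, :avg_position)
ON CONFLICT (date, query, page_url, device, country) DO UPDATE SET
    clicks = excluded.clicks,
    impressions = excluded.impressions,
    ctr = excluded.ctr,
    avg_position = excluded.avg_position
"""

def ingest(conn: sqlite3.Connection, rows: list[dict]) -> None:
    """Insert new rows; overwrite metrics for rows already collected."""
    conn.executemany(UPSERT, rows)
    conn.commit()
```

Overwriting on conflict (rather than ignoring) matters because GSC sometimes revises recent days' numbers after the fact.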

What the Data Tells You

Google Search Console API returns four core metrics per query/page combination:

  • Clicks — how many times someone clicked through to your site
  • Impressions — how many times your page appeared in search results
  • CTR (Click-Through Rate) — clicks / impressions
  • Average Position — where you ranked (1 = top of page 1)

You can slice these by: date, query, page, device (mobile/desktop/tablet), country, and search type (web/image/video).
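
Those slices correspond to fields in the Search Analytics request body. A sketch of one combination (mobile/desktop queries from the US for a single page, web search only); the recipe URL is hypothetical:

```python
# Example searchanalytics.query request body: slice one page's queries
# by date, device, and country, restricted to web search results.
body = {
    "startDate": "2026-01-01",
    "endDate": "2026-03-20",
    "dimensions": ["date", "query", "device", "country"],
    "type": "web",  # vs "image" / "video"
    "dimensionFilterGroups": [{
        "filters": [
            {"dimension": "page", "operator": "equals",
             "expression": "https://plantbasedsoutherner.com/recipes/collard-greens/"},  # hypothetical URL
        ]
    }],
    "rowLimit": 25000,  # API maximum per request; page with startRow for more
}
```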

Deliverable

Python CLI tool that backfills and incrementally collects PBS search data into SQLite


Phase 3: n8n Orchestration + LLM Meta Generation

Estimated Time: 4-5 hours
Goal: Automate data collection, generate SEO meta content with Claude, push to WordPress

Tasks

  • Create n8n workflow: daily scheduled trigger → Execute Command → Python collector
  • Build Claude API integration for meta generation:
    • Input: page content, current keywords ranking, content type
    • Output: optimized meta title, meta description, focus keyword
    • System prompt tuned for PBS brand voice (whole food plant based, southern, warm, NOT "vegan")
  • Build WordPress REST API integration to push meta back to Yoast fields:
    • _yoast_wpseo_title (meta title)
    • _yoast_wpseo_metadesc (meta description)
    • _yoast_wpseo_focuskw (focus keyword)
  • Add WPCode snippet to expose Yoast fields via WordPress REST API (required for write access)
  • Create approval workflow: generate meta → notify Travis/Jenny via Google Chat → approve/reject → push to WordPress
  • Create weekly SEO digest alert for Google Chat
  • Error handling and failure notifications

LLM Meta Generation Flow

n8n detects new/updated post in WordPress
        │
  Fetch page content + current search queries ranking for that URL
        │
  Send to Claude API with SEO-optimized system prompt
        │
  Claude generates: meta title, meta description, focus keyword
        │
  Store in seo_meta_log table
        │
  Send to Google Chat for approval
        │
  On approval: push to WordPress via REST API
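
The "send to Claude" step in that flow might look like the sketch below, using the Anthropic Python SDK. The model ID and prompt wording are placeholders, not final choices; the real system prompt would carry the full PBS brand-voice guidance.

```python
"""Meta generation step, sketched with the Anthropic Python SDK."""
import json

SYSTEM_PROMPT = (
    "You write SEO meta content for Plant Based Southerner: whole food "
    "plant based, southern, warm in tone. Never use the word 'vegan'. "
    'Reply with JSON: {"meta_title": ..., "meta_description": ..., "focus_keyword": ...}. '
    "Meta title under 60 characters; meta description under 155."
)

def build_user_prompt(page_content: str, ranking_queries: list[str], content_type: str) -> str:
    return (
        f"Content type: {content_type}\n"
        f"Queries currently ranking for this page: {', '.join(ranking_queries)}\n\n"
        f"Page content:\n{page_content}"
    )

def generate_meta(page_content: str, ranking_queries: list[str], content_type: str) -> dict:
    import anthropic  # third-party; reads ANTHROPIC_API_KEY from the environment

    client = anthropic.Anthropic()
    msg = client.messages.create(
        model="claude-sonnet-4-20250514",  # placeholder; pin whichever Sonnet model is current
        max_tokens=400,
        system=SYSTEM_PROMPT,
        messages=[{"role": "user",
                   "content": build_user_prompt(page_content, ranking_queries, content_type)}],
    )
    return json.loads(msg.content[0].text)
```

Feeding the page's currently ranking queries into the prompt is what makes this data-driven rather than a generic title rewrite.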

WordPress Integration Detail

Yoast's REST API is read-only by default. To write meta fields, we need a small WPCode snippet that registers Yoast fields on the WordPress REST API. This is a lightweight approach — about 20 lines of PHP via WPCode Lite (already installed), no additional plugins needed.

Alternatively, n8n can update post meta directly via the WordPress API using the meta field in a PUT request to /wp-json/wp/v2/posts/<id>.
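
Either way, the push itself is a small authenticated request. A sketch, assuming the WPCode snippet has registered the three Yoast keys for REST writes and that an Application Password exists for a bot user; the credentials are placeholders:

```python
"""Push generated meta to a post's Yoast fields via the WordPress REST API."""

def build_meta_payload(meta_title: str, meta_description: str, focus_keyword: str) -> dict:
    return {
        "meta": {
            "_yoast_wpseo_title": meta_title,
            "_yoast_wpseo_metadesc": meta_description,
            "_yoast_wpseo_focuskw": focus_keyword,
        }
    }

def push_meta(post_id: int, payload: dict) -> int:
    import requests  # third-party; imported locally so the payload builder stays testable

    resp = requests.post(  # WordPress accepts POST (or PUT) for post updates
        f"https://plantbasedsoutherner.com/wp-json/wp/v2/posts/{post_id}",
        json=payload,
        auth=("seo-bot", "app-password-here"),  # placeholder Application Password
        timeout=15,
    )
    resp.raise_for_status()
    return resp.status_code
```

If the WPCode snippet isn't in place, WordPress silently drops unregistered meta keys, so verifying the write (re-fetching the post) is worth a task of its own.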

Alert Ideas

  • New content alert: "Jenny published a new recipe. Claude generated meta — approve?"
  • Weekly digest: Top gaining keywords, biggest position changes, pages needing optimization
  • Opportunity alert: "You're ranking #11 for 'plant based collard greens' — small push could hit page 1"
  • Cannibalization alert: Multiple PBS pages competing for the same keyword

Deliverable

Fully automated pipeline: collect → analyze → generate → approve → publish SEO meta


Phase 4: Streamlit SEO Dashboard

Estimated Time: 4-6 hours
Goal: Visual SEO analytics dashboard integrated alongside YouTube analytics

Tasks

  • Add SEO pages to existing Streamlit app (or create separate app)
  • Build search performance overview (clicks, impressions, CTR trends)
  • Build keyword rank tracker (position changes over time)
  • Build page-level deep dive (which queries drive traffic to each page)
  • Build content gap analysis view (queries with high impressions but low CTR)
  • Build content type comparison (recipe SEO vs blog SEO performance)
  • Build "opportunities" view (keywords close to page 1, quick wins)
  • Build meta generation log view (what Claude generated, what was approved)

Dashboard Pages (Initial Concept)

  1. Search Overview — total clicks/impressions/CTR trend, top queries, top pages
  2. Keyword Tracker — track specific keywords over time, position change alerts
  3. Page Deep Dive — select a page, see all queries driving traffic, position trends
  4. Content Gaps — high impression / low click pages (title/description need work)
  5. Opportunities — keywords ranking positions 8-20 (striking distance of page 1)
  6. Content Type Breakdown — SEO performance by content type (recipe vs blog vs merch)
  7. Meta Generation Log — what Claude generated, approval status, before/after
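
The Opportunities page reduces to one query against the Phase 2 schema: keywords averaging positions 8-20 over a recent window, sorted by impressions. A sketch (the 28-day window and 50-impression floor are illustrative thresholds):

```python
"""'Striking distance' query behind the Opportunities dashboard page."""
import sqlite3

OPPORTUNITIES_SQL = """
SELECT query,
       AVG(avg_position) AS position,
       SUM(impressions)  AS impressions,
       SUM(clicks)       AS clicks
FROM search_queries
WHERE date >= date('now', '-28 days')
GROUP BY query
HAVING AVG(avg_position) BETWEEN 8 AND 20
   AND SUM(impressions) >= 50
ORDER BY SUM(impressions) DESC
"""

def find_opportunities(conn: sqlite3.Connection) -> list[tuple]:
    """Keywords within striking distance of page 1, biggest audiences first."""
    return conn.execute(OPPORTUNITIES_SQL).fetchall()
```

The Content Gaps page is the same shape with a different HAVING clause (high impressions, CTR below some threshold).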

Deliverable

Live SEO dashboard with actionable insights for content strategy


Phase 5: Competitor Intelligence (Open — Free vs Paid)

Estimated Time: TBD based on approach
Goal: Understand competitive landscape and find content opportunities

Option A: DIY / Free Approach

  • Manual competitor research: Periodically Google target keywords and note who ranks
  • Python scraping: Build a lightweight rank checker that searches Google for target keywords and records positions (note: Google may rate-limit or block; use responsibly)
  • Free tools: Google Trends API for search interest over time, AnswerThePublic for question-based keyword ideas
  • Search Console mining: Analyze existing query data to find patterns and gaps — you'd be surprised how much insight is already in your own data
  • Cost: $0
  • Limitation: No competitor backlink data, no domain authority scores, limited keyword volume estimates

Option B: Budget Paid Tools (~$50-75/month)

  • SERPApi or DataForSEO: Programmatic access to Google search results
    • Track competitor rankings for your target keywords
    • Get search volume estimates
    • API-friendly, integrates cleanly with Python pipeline
  • Best for: Automated daily rank tracking beyond what Search Console provides
  • Cost: ~$50-75/month depending on query volume

Option C: Full SEO Platform (~$99-200+/month)

  • Ahrefs, SEMrush, or Moz: Comprehensive SEO intelligence
    • Competitor keyword analysis (what they rank for that you don't)
    • Backlink profiles and domain authority
    • Content gap analysis at scale
    • Keyword difficulty scores
  • Best for: When you've outgrown Search Console data and need competitive intelligence
  • Cost: $99-200+/month

Recommendation

Start with Option A (free). Build the pipeline around Google Search Console data first. After 1-2 months of collecting data, evaluate what questions you can't answer with free data alone. That will tell you whether Option B or C is worth the investment. Many sites PBS's size never need to go past Option A.

Deliverable

Decision on competitive intelligence approach based on data from earlier phases


Phase 6: Advanced SEO Automation & Iteration

Estimated Time: Ongoing
Goal: Deepen automation and cross-platform insights

Future Ideas

  • Auto-detect new WordPress posts and trigger SEO meta generation without manual intervention
  • Cross-reference YouTube retention data with recipe page SEO performance (which videos drive search traffic?)
  • Automated internal linking suggestions (connect related recipes/blog posts)
  • Schema markup validation and monitoring (ensure WPRM recipe schema stays healthy)
  • Page speed monitoring integration (Core Web Vitals affect rankings)
  • Seasonal keyword planning (predict trending search terms by season for recipe content)
  • A/B test meta titles: generate two versions, measure CTR difference
  • Content calendar integration: use keyword gaps to suggest what Jenny should create next
  • Extend to merch, cookbook, and membership pages as they launch

Prerequisites & Dependencies

| Requirement | Status | Notes |
|---|---|---|
| Google Search Console verified | Needs check | May already be connected via Workspace |
| Google Cloud project | Shared | Same project as YouTube Analytics |
| Search Console API enabled | Needed | Free, quota-based |
| OAuth/Service Account credentials | Needed | Can extend existing YouTube creds |
| Python + UV | Ready | Travis's local dev setup |
| Anthropic API key | Needed | For Claude meta generation |
| WPCode Lite (WordPress) | Ready | Already installed; needed for REST API Yoast fields |
| n8n | Ready | Already running on Linode |
| Streamlit | Shared | Same instance as YouTube dashboard |

API Quotas & Costs

| Service | Quota/Cost | Notes |
|---|---|---|
| Google Search Console API | 2000 queries/day (free) | More than enough for PBS |
| Claude API (Sonnet) | ~$0.003 per meta generation | Pennies per recipe |
| WordPress REST API | Unlimited (self-hosted) | No external cost |
| Google Chat webhooks | Unlimited (free) | Already configured for n8n |

Key Decisions

| Decision | Choice | Rationale |
|---|---|---|
| Data source | Google Search Console (free) | Actual Google data, not estimates. 16 months history. Sufficient for PBS scale. |
| Competitor intelligence | Deferred (Phase 5) | Start free, evaluate need after collecting own data. |
| LLM for meta generation | Claude API (Anthropic) | Consistent with PBS brand, excellent at structured content, cost-effective. |
| Meta push to WordPress | REST API via WPCode snippet | Lightweight, no extra plugins, uses existing WPCode Lite install. |
| Dashboard | Streamlit (shared with YouTube) | Single analytics platform for all PBS data. |
| Approval workflow | Google Chat notification | Keeps human in the loop before meta goes live. Jenny/Travis approve. |

Sequencing & Priority

  1. Phase 1 (Search Console Setup) → unblocks data collection
  2. Phase 2 (Data Collector) → starts building historical dataset, enables analysis
  3. Phase 3 (n8n + LLM Meta Generation) → the automation sweet spot — no more copy-paste
  4. Phase 4 (Streamlit Dashboard) → visualize what's working, find opportunities
  5. Phase 5 (Competitor Intelligence) → evaluate free vs paid based on real needs
  6. Phase 6 (Advanced) → cross-platform insights, deeper automation

Relationship to Other PBS Projects

  • YouTube Analytics Pipeline: Shared Streamlit dashboard, shared Google Cloud project, parallel development
  • PBS Content Hub (Phase 5): SEO dashboard could become a Content Hub tab
  • Instagram Automation: Cross-platform content performance analysis (search + social)
  • WordPress-to-MySQL sync: Trigger SEO meta generation when new recipes are synced
  • Authelia SSO: Will protect Streamlit dashboard access
  • Yoast SEO plugin: Stays installed for technical plumbing (sitemaps, canonical URLs, Open Graph) — but meta content is now generated and pushed by the pipeline, not manually entered

Note on Yoast

Yoast stays installed but its role changes. It continues handling:

  • XML sitemap generation
  • Canonical URL management
  • Open Graph / social sharing meta tags
  • Basic schema markup (supplementing WPRM's recipe schema)

What changes on the human side:

  • You stop manually filling in meta titles/descriptions (the pipeline does this)
  • You ignore the content scoring stoplight (Claude's output is smarter than Yoast's rules)
  • Focus keywords are set by data-driven keyword research, not gut feeling

Yoast becomes invisible plumbing. The pipeline becomes the brain.


Next Step: Phase 1 — Check if Google Search Console is connected for plantbasedsoutherner.com