
---
project: pbs-seo-automation
type: project-plan
status: active
tags:
  - pbs
  - seo
  - python
  - automation
  - n8n
  - wordpress
  - yoast
  - streamlit
  - analytics
  - google-search-console
created: 2026-03-23
updated: 2026-03-23
path: PBS/Tech/Projects/
---

PBS SEO Automation Pipeline

Project Goal

Build a self-hosted SEO automation pipeline that replaces manual copy-paste SEO workflows with an automated system: collecting search performance data, tracking rankings over time, researching keywords, generating optimized meta content via Claude API, and pushing it back to WordPress — all orchestrated through n8n and visualized in Streamlit.

Why This Matters

PBS is already getting organic search traffic, which means SEO is working to some degree. But optimizing it is currently a manual, disjointed process — Yoast shows fields but doesn't help fill them intelligently, and there's no automation connecting keyword research to content optimization. This project turns SEO from a chore into a data-driven pipeline that works across all PBS content types.

Content Types Covered

  • Recipes (live) — highest SEO value, drives organic discovery
  • Blog/editorial (live) — builds authority, targets informational queries
  • Cookbook landing pages (future) — transactional/promotional SEO
  • Merch pages (future) — Product schema, transactional keywords
  • Membership/classes (future) — funnel-driven, conversion-focused

The pipeline is designed to handle all content types from day one, even if only recipes and blog posts exist today.


Architecture Overview

Google Search Console API (free)
        |
   Python Collector (PyCharm + UV)
        |
   SQLite Database
        |
   ┌────────────┴────────────┐
   │                         │
  n8n                    Streamlit
(orchestration)         (dashboard)
   │
   ├─ Claude API (generate meta titles/descriptions)
   ├─ WordPress REST API (push meta back to Yoast)
   └─ Google Chat (alerts & digests)

Shared Infrastructure with YouTube Analytics Project

  • Same Streamlit instance (separate pages/tabs)
  • Same n8n server for orchestration
  • Separate SQLite database (keeps projects independent)
  • Same Traefik reverse proxy for dashboard access
  • Same Google Cloud project for API credentials

Phase 1: Google Search Console Setup

Estimated Time: 1-2 hours
Goal: Connect Search Console to PBS site and verify API access

Tasks

  • Verify Google Search Console is connected for plantbasedsoutherner.com
    • If yes: confirm data is flowing, check how far back data goes
    • If no: add property, verify via DNS (Cloudflare), wait for data collection to begin
  • Enable Google Search Console API in Google Cloud project
    • Can reuse the same project created for YouTube Analytics
  • Create service account OR extend existing OAuth credentials with scope: https://www.googleapis.com/auth/webmasters.readonly
  • Test API access — pull a sample query report to confirm data flows

Key Details

  • Search Console retains 16 months of historical data
  • Data is typically delayed 2-3 days
  • API uses google-api-python-client (same library as YouTube project)
  • Service account auth is simpler for automated/server-side collection (no browser needed)
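
The "test API access" task above can be sketched in Python with the same google-api-python-client library. This is a minimal smoke test, assuming service account auth; the key file path and property URL are placeholders, and the service account must first be added as a user on the Search Console property.

```python
"""Search Console API smoke test: pull a small sample query report."""
from datetime import date, timedelta

SCOPES = ["https://www.googleapis.com/auth/webmasters.readonly"]
SITE_URL = "sc-domain:plantbasedsoutherner.com"  # or the https:// URL-prefix property

def build_sample_body(days_back: int = 7) -> dict:
    """Request body for a small query report (GSC data lags ~2-3 days)."""
    end = date.today() - timedelta(days=3)
    start = end - timedelta(days=days_back)
    return {
        "startDate": start.isoformat(),
        "endDate": end.isoformat(),
        "dimensions": ["query", "page"],
        "rowLimit": 10,
    }

def fetch_sample(key_file: str = "service-account.json") -> list[dict]:
    # Third-party imports kept local so the pure helper above stays testable.
    from google.oauth2 import service_account
    from googleapiclient.discovery import build

    creds = service_account.Credentials.from_service_account_file(key_file, scopes=SCOPES)
    service = build("searchconsole", "v1", credentials=creds)
    resp = service.searchanalytics().query(siteUrl=SITE_URL, body=build_sample_body()).execute()
    return resp.get("rows", [])

if __name__ == "__main__":
    for row in fetch_sample():
        print(row["keys"], row["clicks"], row["impressions"])
```

If rows come back, Phase 1 is done; an empty response on a verified property usually just means the date window predates data collection.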

Deliverable

Working API access to PBS search performance data


Phase 2: Search Data Collector

Estimated Time: 3-4 hours
Goal: Python script that pulls search performance data into SQLite
Tools: PyCharm Professional, UV package manager

Tasks

  • Initialize project with UV (uv init pbs-seo-analytics)
  • Install dependencies: google-api-python-client, google-auth
  • Build auth module (service account preferred for server-side)
  • Build search query collector (queries, impressions, clicks, CTR, position by page)
  • Build page performance collector (aggregate metrics per URL)
  • Build device/country breakdown collector
  • Design and create SQLite schema
  • Implement data ingestion with upsert logic (idempotent runs)
  • Add CLI interface for manual runs and backfill (up to 16 months)
  • Initial backfill of all available historical data
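
The CLI task above could take a shape like this argparse skeleton; the subcommand and flag names are illustrative, not final.

```python
"""CLI skeleton for the collector: incremental runs plus historical backfill."""
import argparse

def build_parser() -> argparse.ArgumentParser:
    parser = argparse.ArgumentParser(prog="pbs-seo-analytics")
    sub = parser.add_subparsers(dest="command", required=True)

    daily = sub.add_parser("collect", help="incremental pull of recent days")
    daily.add_argument("--days", type=int, default=7, help="lookback window")

    backfill = sub.add_parser("backfill", help="pull all available history")
    backfill.add_argument("--months", type=int, default=16,
                          help="GSC retains at most 16 months")
    return parser

if __name__ == "__main__":
    args = build_parser().parse_args()
    print(args)
```

A 7-day default lookback on `collect` gives late-arriving GSC data several chances to be re-ingested, which is why the ingest layer needs to be idempotent.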

SQLite Schema (Initial Design)

-- Pages tracked on the site
CREATE TABLE pages (
    url TEXT PRIMARY KEY,
    page_type TEXT CHECK(page_type IN ('recipe', 'blog', 'merch',
'cookbook', 'membership', 'landing', 'other')),
    title TEXT,
    first_seen TEXT,
    created_at TEXT DEFAULT CURRENT_TIMESTAMP,
    updated_at TEXT DEFAULT CURRENT_TIMESTAMP
);

-- Daily search performance per query per page
CREATE TABLE search_queries (
    date TEXT NOT NULL,
    query TEXT NOT NULL,
    page_url TEXT NOT NULL,
    clicks INTEGER DEFAULT 0,
    impressions INTEGER DEFAULT 0,
    ctr REAL DEFAULT 0,
    avg_position REAL DEFAULT 0,
    device TEXT DEFAULT 'all',
    country TEXT DEFAULT 'all',
    created_at TEXT DEFAULT CURRENT_TIMESTAMP,
    PRIMARY KEY (date, query, page_url, device, country)
);

-- Daily aggregate performance per page
CREATE TABLE page_daily_metrics (
    date TEXT NOT NULL,
    page_url TEXT NOT NULL,
    total_clicks INTEGER DEFAULT 0,
    total_impressions INTEGER DEFAULT 0,
    avg_ctr REAL DEFAULT 0,
    avg_position REAL DEFAULT 0,
    created_at TEXT DEFAULT CURRENT_TIMESTAMP,
    PRIMARY KEY (date, page_url)
);

-- Keyword tracking: queries we want to monitor over time
CREATE TABLE tracked_keywords (
    keyword TEXT PRIMARY KEY,
    category TEXT,
    target_page_url TEXT,
    added_at TEXT DEFAULT CURRENT_TIMESTAMP,
    notes TEXT
);

-- Snapshot of rank position for tracked keywords
CREATE TABLE keyword_rank_history (
    keyword TEXT NOT NULL,
    date TEXT NOT NULL,
    avg_position REAL,
    impressions INTEGER DEFAULT 0,
    clicks INTEGER DEFAULT 0,
    best_page_url TEXT,
    created_at TEXT DEFAULT CURRENT_TIMESTAMP,
    PRIMARY KEY (keyword, date),
    FOREIGN KEY (keyword) REFERENCES tracked_keywords(keyword)
);

-- SEO meta content generated and applied
CREATE TABLE seo_meta_log (
    page_url TEXT NOT NULL,
    generated_at TEXT NOT NULL,
    meta_title TEXT,
    meta_description TEXT,
    focus_keyword TEXT,
    model_used TEXT DEFAULT 'claude-sonnet',
    pushed_to_wordpress INTEGER DEFAULT 0,
    pushed_at TEXT,
    PRIMARY KEY (page_url, generated_at)
);

-- Site-level daily summary
CREATE TABLE site_daily_metrics (
    date TEXT PRIMARY KEY,
    total_clicks INTEGER DEFAULT 0,
    total_impressions INTEGER DEFAULT 0,
    avg_ctr REAL DEFAULT 0,
    avg_position REAL DEFAULT 0,
    unique_queries INTEGER DEFAULT 0,
    created_at TEXT DEFAULT CURRENT_TIMESTAMP
);
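
The "upsert logic (idempotent runs)" requirement maps directly onto SQLite's ON CONFLICT clause against the composite primary key above. A minimal sketch, assuming rows arrive as dicts from the collector:

```python
"""Idempotent ingest: re-running the same day's pull must not duplicate rows."""
import sqlite3

SEARCH_QUERIES_DDL = """
CREATE TABLE IF NOT EXISTS search_queries (
    date TEXT NOT NULL,
    query TEXT NOT NULL,
    page_url TEXT NOT NULL,
    clicks INTEGER DEFAULT 0,
    impressions INTEGER DEFAULT 0,
    ctr REAL DEFAULT 0,
    avg_position REAL DEFAULT 0,
    device TEXT DEFAULT 'all',
    country TEXT DEFAULT 'all',
    created_at TEXT DEFAULT CURRENT_TIMESTAMP,
    PRIMARY KEY (date, query, page_url, device, country)
)"""

UPSERT = """
INSERT INTO search_queries (date, query, page_url, clicks, impressions, ctr, avg_position)
VALUES (:date, :query, :page_url, :clicks, :impressions, :ctr, :avg_position)
ON CONFLICT (date, query, page_url, device, country) DO UPDATE SET
    clicks = excluded.clicks,
    impressions = excluded.impressions,
    ctr = excluded.ctr,
    avg_position = excluded.avg_position
"""

def ingest(conn: sqlite3.Connection, rows: list[dict]) -> None:
    """Insert new rows; overwrite metrics for rows already collected."""
    conn.executemany(UPSERT, rows)
    conn.commit()
```

Overwriting on conflict (rather than ignoring) matters because GSC sometimes revises recent days' numbers after the fact.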

What the Data Tells You

Google Search Console API returns four core metrics per query/page combination:

  • Clicks — how many times someone clicked through to your site
  • Impressions — how many times your page appeared in search results
  • CTR (Click-Through Rate) — clicks / impressions
  • Average Position — where you ranked (1 = top of page 1)

You can slice these by: date, query, page, device (mobile/desktop/tablet), country, and search type (web/image/video).
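
Those slices correspond to fields in the Search Analytics request body. A sketch of one combination (mobile/desktop queries from the US for a single page, web search only); the recipe URL is hypothetical:

```python
# Example searchanalytics.query request body: slice one page's queries
# by date, device, and country, restricted to web search results.
body = {
    "startDate": "2026-01-01",
    "endDate": "2026-03-20",
    "dimensions": ["date", "query", "device", "country"],
    "type": "web",  # vs "image" / "video"
    "dimensionFilterGroups": [{
        "filters": [
            {"dimension": "page", "operator": "equals",
             "expression": "https://plantbasedsoutherner.com/recipes/collard-greens/"},  # hypothetical URL
        ]
    }],
    "rowLimit": 25000,  # API maximum per request; page with startRow for more
}
```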

Deliverable

Python CLI tool that backfills and incrementally collects PBS search data into SQLite


Phase 3: n8n Orchestration + LLM Meta Generation

Estimated Time: 4-5 hours
Goal: Automate data collection, generate SEO meta content with Claude, push to WordPress

Tasks

  • Create n8n workflow: daily scheduled trigger → Execute Command → Python collector
  • Build Claude API integration for meta generation:
    • Input: page content, current keywords ranking, content type
    • Output: optimized meta title, meta description, focus keyword
    • System prompt tuned for PBS brand voice (whole food plant based, southern, warm, NOT "vegan")
  • Build WordPress REST API integration to push meta back to Yoast fields:
    • _yoast_wpseo_title (meta title)
    • _yoast_wpseo_metadesc (meta description)
    • _yoast_wpseo_focuskw (focus keyword)
  • Add WPCode snippet to expose Yoast fields via WordPress REST API (required for write access)
  • Create approval workflow: generate meta → notify Travis/Jenny via Google Chat → approve/reject → push to WordPress
  • Create weekly SEO digest alert for Google Chat
  • Error handling and failure notifications

LLM Meta Generation Flow

n8n detects new/updated post in WordPress
        │
  Fetch page content + current search queries ranking for that URL
        │
  Send to Claude API with SEO-optimized system prompt
        │
  Claude generates: meta title, meta description, focus keyword
        │
  Store in seo_meta_log table
        │
  Send to Google Chat for approval
        │
  On approval: push to WordPress via REST API
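
The "send to Claude" step in that flow might look like the sketch below, using the Anthropic Python SDK. The model ID and prompt wording are placeholders, not final choices; the real system prompt would carry the full PBS brand-voice guidance.

```python
"""Meta generation step, sketched with the Anthropic Python SDK."""
import json

SYSTEM_PROMPT = (
    "You write SEO meta content for Plant Based Southerner: whole food "
    "plant based, southern, warm in tone. Never use the word 'vegan'. "
    'Reply with JSON: {"meta_title": ..., "meta_description": ..., "focus_keyword": ...}. '
    "Meta title under 60 characters; meta description under 155."
)

def build_user_prompt(page_content: str, ranking_queries: list[str], content_type: str) -> str:
    return (
        f"Content type: {content_type}\n"
        f"Queries currently ranking for this page: {', '.join(ranking_queries)}\n\n"
        f"Page content:\n{page_content}"
    )

def generate_meta(page_content: str, ranking_queries: list[str], content_type: str) -> dict:
    import anthropic  # third-party; reads ANTHROPIC_API_KEY from the environment

    client = anthropic.Anthropic()
    msg = client.messages.create(
        model="claude-sonnet-4-20250514",  # placeholder; pin whichever Sonnet model is current
        max_tokens=400,
        system=SYSTEM_PROMPT,
        messages=[{"role": "user",
                   "content": build_user_prompt(page_content, ranking_queries, content_type)}],
    )
    return json.loads(msg.content[0].text)
```

Feeding the page's currently ranking queries into the prompt is what makes this data-driven rather than a generic title rewrite.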

WordPress Integration Detail

Yoast's REST API is read-only by default. To write meta fields, we need a small WPCode snippet that registers Yoast fields on the WordPress REST API. This is a lightweight approach — about 20 lines of PHP via WPCode Lite (already installed), no additional plugins needed.

Alternatively, n8n can update post meta directly via the WordPress API using the meta field in a PUT request to /wp-json/wp/v2/posts/<id>.
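
Either way, the push itself is a small authenticated request. A sketch, assuming the WPCode snippet has registered the three Yoast keys for REST writes and that an Application Password exists for a bot user; the credentials are placeholders:

```python
"""Push generated meta to a post's Yoast fields via the WordPress REST API."""

def build_meta_payload(meta_title: str, meta_description: str, focus_keyword: str) -> dict:
    return {
        "meta": {
            "_yoast_wpseo_title": meta_title,
            "_yoast_wpseo_metadesc": meta_description,
            "_yoast_wpseo_focuskw": focus_keyword,
        }
    }

def push_meta(post_id: int, payload: dict) -> int:
    import requests  # third-party; imported locally so the payload builder stays testable

    resp = requests.post(  # WordPress accepts POST (or PUT) for post updates
        f"https://plantbasedsoutherner.com/wp-json/wp/v2/posts/{post_id}",
        json=payload,
        auth=("seo-bot", "app-password-here"),  # placeholder Application Password
        timeout=15,
    )
    resp.raise_for_status()
    return resp.status_code
```

If the WPCode snippet isn't in place, WordPress silently drops unregistered meta keys, so verifying the write (re-fetching the post) is worth a task of its own.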

Alert Ideas

  • New content alert: "Jenny published a new recipe. Claude generated meta — approve?"
  • Weekly digest: Top gaining keywords, biggest position changes, pages needing optimization
  • Opportunity alert: "You're ranking #11 for 'plant based collard greens' — small push could hit page 1"
  • Cannibalization alert: Multiple PBS pages competing for the same keyword

Deliverable

Fully automated pipeline: collect → analyze → generate → approve → publish SEO meta


Phase 4: Streamlit SEO Dashboard

Estimated Time: 4-6 hours
Goal: Visual SEO analytics dashboard integrated alongside YouTube analytics

Tasks

  • Add SEO pages to existing Streamlit app (or create separate app)
  • Build search performance overview (clicks, impressions, CTR trends)
  • Build keyword rank tracker (position changes over time)
  • Build page-level deep dive (which queries drive traffic to each page)
  • Build content gap analysis view (queries with high impressions but low CTR)
  • Build content type comparison (recipe SEO vs blog SEO performance)
  • Build "opportunities" view (keywords close to page 1, quick wins)
  • Build meta generation log view (what Claude generated, what was approved)

Dashboard Pages (Initial Concept)

  1. Search Overview — total clicks/impressions/CTR trend, top queries, top pages
  2. Keyword Tracker — track specific keywords over time, position change alerts
  3. Page Deep Dive — select a page, see all queries driving traffic, position trends
  4. Content Gaps — high impression / low click pages (title/description need work)
  5. Opportunities — keywords ranking positions 8-20 (striking distance of page 1)
  6. Content Type Breakdown — SEO performance by content type (recipe vs blog vs merch)
  7. Meta Generation Log — what Claude generated, approval status, before/after
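
The Opportunities page reduces to one query against the Phase 2 schema: keywords averaging positions 8-20 over a recent window, sorted by impressions. A sketch (the 28-day window and 50-impression floor are illustrative thresholds):

```python
"""'Striking distance' query behind the Opportunities dashboard page."""
import sqlite3

OPPORTUNITIES_SQL = """
SELECT query,
       AVG(avg_position) AS position,
       SUM(impressions)  AS impressions,
       SUM(clicks)       AS clicks
FROM search_queries
WHERE date >= date('now', '-28 days')
GROUP BY query
HAVING AVG(avg_position) BETWEEN 8 AND 20
   AND SUM(impressions) >= 50
ORDER BY SUM(impressions) DESC
"""

def find_opportunities(conn: sqlite3.Connection) -> list[tuple]:
    """Keywords within striking distance of page 1, biggest audiences first."""
    return conn.execute(OPPORTUNITIES_SQL).fetchall()
```

The Content Gaps page is the same shape with a different HAVING clause (high impressions, CTR below some threshold).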

Deliverable

Live SEO dashboard with actionable insights for content strategy


Phase 5: Competitor Intelligence (Open — Free vs Paid)

Estimated Time: TBD based on approach
Goal: Understand competitive landscape and find content opportunities

Option A: DIY / Free Approach

  • Manual competitor research: Periodically Google target keywords and note who ranks
  • Python scraping: Build a lightweight rank checker that searches Google for target keywords and records positions (note: Google may rate-limit or block; use responsibly)
  • Free tools: Google Trends API for search interest over time, AnswerThePublic for question-based keyword ideas
  • Search Console mining: Analyze existing query data to find patterns and gaps — you'd be surprised how much insight is already in your own data
  • Cost: $0
  • Limitation: No competitor backlink data, no domain authority scores, limited keyword volume estimates

Option B: Budget Paid Tools (~$50-75/month)

  • SERPApi or DataForSEO: Programmatic access to Google search results
    • Track competitor rankings for your target keywords
    • Get search volume estimates
    • API-friendly, integrates cleanly with Python pipeline
  • Best for: Automated daily rank tracking beyond what Search Console provides
  • Cost: ~$50-75/month depending on query volume

Option C: Full SEO Platform (~$99-200+/month)

  • Ahrefs, SEMrush, or Moz: Comprehensive SEO intelligence
    • Competitor keyword analysis (what they rank for that you don't)
    • Backlink profiles and domain authority
    • Content gap analysis at scale
    • Keyword difficulty scores
  • Best for: When you've outgrown Search Console data and need competitive intelligence
  • Cost: $99-200+/month

Recommendation

Start with Option A (free). Build the pipeline around Google Search Console data first. After 1-2 months of collecting data, evaluate what questions you can't answer with free data alone. That will tell you whether Option B or C is worth the investment. Many sites PBS's size never need to go past Option A.

Deliverable

Decision on competitive intelligence approach based on data from earlier phases


Phase 6: Advanced SEO Automation & Iteration

Estimated Time: Ongoing
Goal: Deepen automation and cross-platform insights

Future Ideas

  • Auto-detect new WordPress posts and trigger SEO meta generation without manual intervention
  • Cross-reference YouTube retention data with recipe page SEO performance (which videos drive search traffic?)
  • Automated internal linking suggestions (connect related recipes/blog posts)
  • Schema markup validation and monitoring (ensure WPRM recipe schema stays healthy)
  • Page speed monitoring integration (Core Web Vitals affect rankings)
  • Seasonal keyword planning (predict trending search terms by season for recipe content)
  • A/B test meta titles: generate two versions, measure CTR difference
  • Content calendar integration: use keyword gaps to suggest what Jenny should create next
  • Extend to merch, cookbook, and membership pages as they launch

Prerequisites & Dependencies

| Requirement | Status | Notes |
|---|---|---|
| Google Search Console verified | Needs check | May already be connected via Workspace |
| Google Cloud project | Shared | Same project as YouTube Analytics |
| Search Console API enabled | Needed | Free, quota-based |
| OAuth/Service Account credentials | Needed | Can extend existing YouTube creds |
| Python + UV | Ready | Travis's local dev setup |
| Anthropic API key | Needed | For Claude meta generation |
| WPCode Lite (WordPress) | Ready | Already installed; needed for REST API Yoast fields |
| n8n | Ready | Already running on Linode |
| Streamlit | Shared | Same instance as YouTube dashboard |

API Quotas & Costs

| Service | Quota/Cost | Notes |
|---|---|---|
| Google Search Console API | 2000 queries/day (free) | More than enough for PBS |
| Claude API (Sonnet) | ~$0.003 per meta generation | Pennies per recipe |
| WordPress REST API | Unlimited (self-hosted) | No external cost |
| Google Chat webhooks | Unlimited (free) | Already configured for n8n |

Key Decisions

| Decision | Choice | Rationale |
|---|---|---|
| Data source | Google Search Console (free) | Actual Google data, not estimates. 16 months history. Sufficient for PBS scale. |
| Competitor intelligence | Deferred (Phase 5) | Start free, evaluate need after collecting own data. |
| LLM for meta generation | Claude API (Anthropic) | Consistent with PBS brand, excellent at structured content, cost-effective. |
| Meta push to WordPress | REST API via WPCode snippet | Lightweight, no extra plugins, uses existing WPCode Lite install. |
| Dashboard | Streamlit (shared with YouTube) | Single analytics platform for all PBS data. |
| Approval workflow | Google Chat notification | Keeps human in the loop before meta goes live. Jenny/Travis approve. |

Sequencing & Priority

  1. Phase 1 (Search Console Setup) → unblocks data collection
  2. Phase 2 (Data Collector) → starts building historical dataset, enables analysis
  3. Phase 3 (n8n + LLM Meta Generation) → the automation sweet spot — no more copy-paste
  4. Phase 4 (Streamlit Dashboard) → visualize what's working, find opportunities
  5. Phase 5 (Competitor Intelligence) → evaluate free vs paid based on real needs
  6. Phase 6 (Advanced) → cross-platform insights, deeper automation

Relationship to Other PBS Projects

  • YouTube Analytics Pipeline: Shared Streamlit dashboard, shared Google Cloud project, parallel development
  • PBS Content Hub (Phase 5): SEO dashboard could become a Content Hub tab
  • Instagram Automation: Cross-platform content performance analysis (search + social)
  • WordPress-to-MySQL sync: Trigger SEO meta generation when new recipes are synced
  • Authelia SSO: Will protect Streamlit dashboard access
  • Yoast SEO plugin: Stays installed for technical plumbing (sitemaps, canonical URLs, Open Graph) — but meta content is now generated and pushed by the pipeline, not manually entered

Note on Yoast

Yoast stays installed but its role changes. It continues handling:

  • XML sitemap generation
  • Canonical URL management
  • Open Graph / social sharing meta tags
  • Basic schema markup (supplementing WPRM's recipe schema)

What changes on the human side:

  • You stop manually filling in meta titles/descriptions (the pipeline does this)
  • You ignore the content scoring stoplight (Claude's output is smarter than Yoast's rules)
  • Focus keywords are set by data-driven keyword research, not gut feeling

Yoast becomes invisible plumbing. The pipeline becomes the brain.


Next Step: Phase 1 — Check if Google Search Console is connected for plantbasedsoutherner.com