---
project: pbs-seo-automation
type: project-plan
status: active
tags:
  - pbs
  - seo
  - python
  - automation
  - n8n
  - wordpress
  - yoast
  - streamlit
  - analytics
  - google-search-console
created: 2026-03-23
updated: 2026-03-23
path: PBS/Tech/Projects/
---

# PBS SEO Automation Pipeline

## Project Goal

Build a self-hosted SEO automation pipeline that replaces manual copy-paste SEO workflows with an automated system: collecting search performance data, tracking rankings over time, researching keywords, generating optimized meta content via the Claude API, and pushing it back to WordPress — all orchestrated through n8n and visualized in Streamlit.

## Why This Matters

PBS is already getting organic search traffic, which means SEO is working to some degree. But optimizing it is currently a manual, disjointed process — Yoast shows fields but doesn't help fill them intelligently, and there's no automation connecting keyword research to content optimization. This project turns SEO from a chore into a data-driven pipeline that works across all PBS content types.

## Content Types Covered

- **Recipes** (live) — highest SEO value, drives organic discovery
- **Blog/editorial** (live) — builds authority, targets informational queries
- **Cookbook landing pages** (future) — transactional/promotional SEO
- **Merch pages** (future) — Product schema, transactional keywords
- **Membership/classes** (future) — funnel-driven, conversion-focused

The pipeline is designed to handle all content types from day one, even if only recipes and blog posts exist today.
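Since every page gets a `page_type` in the database, the collector needs a way to tag URLs by content type. A minimal sketch of one approach, assuming hypothetical permalink patterns (the actual PBS URL structure may differ, so the regexes below are illustrative placeholders):

```python
import re

# Hypothetical permalink patterns; the real PBS URL structure may differ,
# so treat these regexes as placeholders to adjust once verified.
PAGE_TYPE_PATTERNS = [
    (r"/recipes?/", "recipe"),
    (r"/blog/", "blog"),
    (r"/cookbook", "cookbook"),
    (r"/shop/|/merch/", "merch"),
    (r"/membership|/classes", "membership"),
]


def classify_page(url: str) -> str:
    """Map a URL to one of the content types above, falling back to 'other'."""
    for pattern, page_type in PAGE_TYPE_PATTERNS:
        if re.search(pattern, url):
            return page_type
    return "other"
```

New content types launching later (merch, membership) then only require adding a pattern, not touching the rest of the pipeline.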
---

## Architecture Overview

```
Google Search Console API (free)
            |
Python Collector (PyCharm + UV)
            |
     SQLite Database
            |
 ┌──────────┴──────────┐
 │                     │
n8n                Streamlit
(orchestration)   (dashboard)
 │
 ├─ Claude API (generate meta titles/descriptions)
 ├─ WordPress REST API (push meta back to Yoast)
 └─ Google Chat (alerts & digests)
```

### Shared Infrastructure with YouTube Analytics Project

- Same Streamlit instance (separate pages/tabs)
- Same n8n server for orchestration
- Separate SQLite database (keeps projects independent)
- Same Traefik reverse proxy for dashboard access
- Same Google Cloud project for API credentials

---

## Phase 1: Google Search Console Setup

**Estimated Time:** 1-2 hours
**Goal:** Connect Search Console to the PBS site and verify API access

### Tasks

- [ ] Verify Google Search Console is connected for plantbasedsoutherner.com
  - If yes: confirm data is flowing, check how far back data goes
  - If no: add property, verify via DNS (Cloudflare), wait for data collection to begin
- [ ] Enable Google Search Console API in Google Cloud project
  - Can reuse the same project created for YouTube Analytics
- [ ] Create service account OR extend existing OAuth credentials with scope: `https://www.googleapis.com/auth/webmasters.readonly`
- [ ] Test API access — pull a sample query report to confirm data flows

### Key Details

- Search Console retains 16 months of historical data
- Data is typically delayed 2-3 days
- API uses `google-api-python-client` (same library as YouTube project)
- Service account auth is simpler for automated/server-side collection (no browser needed)

### Deliverable

Working API access to PBS search performance data

---

## Phase 2: Search Data Collector

**Estimated Time:** 3-4 hours
**Goal:** Python script that pulls search performance data into SQLite
**Tools:** PyCharm Professional, UV package manager

### Tasks

- [ ] Initialize project with UV (`uv init pbs-seo-analytics`)
- [ ] Install dependencies: `google-api-python-client`, `google-auth`
- [ ] Build auth module (service account preferred for server-side)
- [ ] Build search query collector (queries, impressions, clicks, CTR, position by page)
- [ ] Build page performance collector (aggregate metrics per URL)
- [ ] Build device/country breakdown collector
- [ ] Design and create SQLite schema
- [ ] Implement data ingestion with upsert logic (idempotent runs)
- [ ] Add CLI interface for manual runs and backfill (up to 16 months)
- [ ] Initial backfill of all available historical data

### SQLite Schema (Initial Design)

```sql
-- Pages tracked on the site
CREATE TABLE pages (
    url TEXT PRIMARY KEY,
    page_type TEXT CHECK(page_type IN ('recipe', 'blog', 'merch', 'cookbook', 'membership', 'landing', 'other')),
    title TEXT,
    first_seen TEXT,
    created_at TEXT DEFAULT CURRENT_TIMESTAMP,
    updated_at TEXT DEFAULT CURRENT_TIMESTAMP
);

-- Daily search performance per query per page
CREATE TABLE search_queries (
    date TEXT NOT NULL,
    query TEXT NOT NULL,
    page_url TEXT NOT NULL,
    clicks INTEGER DEFAULT 0,
    impressions INTEGER DEFAULT 0,
    ctr REAL DEFAULT 0,
    avg_position REAL DEFAULT 0,
    device TEXT DEFAULT 'all',
    country TEXT DEFAULT 'all',
    created_at TEXT DEFAULT CURRENT_TIMESTAMP,
    PRIMARY KEY (date, query, page_url, device, country)
);

-- Daily aggregate performance per page
CREATE TABLE page_daily_metrics (
    date TEXT NOT NULL,
    page_url TEXT NOT NULL,
    total_clicks INTEGER DEFAULT 0,
    total_impressions INTEGER DEFAULT 0,
    avg_ctr REAL DEFAULT 0,
    avg_position REAL DEFAULT 0,
    created_at TEXT DEFAULT CURRENT_TIMESTAMP,
    PRIMARY KEY (date, page_url)
);

-- Keyword tracking: queries we want to monitor over time
CREATE TABLE tracked_keywords (
    keyword TEXT PRIMARY KEY,
    category TEXT,
    target_page_url TEXT,
    added_at TEXT DEFAULT CURRENT_TIMESTAMP,
    notes TEXT
);

-- Snapshot of rank position for tracked keywords
CREATE TABLE keyword_rank_history (
    keyword TEXT NOT NULL,
    date TEXT NOT NULL,
    avg_position REAL,
    impressions INTEGER DEFAULT 0,
    clicks INTEGER DEFAULT 0,
    best_page_url TEXT,
    created_at TEXT DEFAULT CURRENT_TIMESTAMP,
    PRIMARY KEY (keyword, date),
    FOREIGN KEY (keyword) REFERENCES tracked_keywords(keyword)
);

-- SEO meta content generated and applied
CREATE TABLE seo_meta_log (
    page_url TEXT NOT NULL,
    generated_at TEXT NOT NULL,
    meta_title TEXT,
    meta_description TEXT,
    focus_keyword TEXT,
    model_used TEXT DEFAULT 'claude-sonnet',
    pushed_to_wordpress INTEGER DEFAULT 0,
    pushed_at TEXT,
    PRIMARY KEY (page_url, generated_at)
);

-- Site-level daily summary
CREATE TABLE site_daily_metrics (
    date TEXT PRIMARY KEY,
    total_clicks INTEGER DEFAULT 0,
    total_impressions INTEGER DEFAULT 0,
    avg_ctr REAL DEFAULT 0,
    avg_position REAL DEFAULT 0,
    unique_queries INTEGER DEFAULT 0,
    created_at TEXT DEFAULT CURRENT_TIMESTAMP
);
```

### What the Data Tells You

The Google Search Console API returns four core metrics per query/page combination:

- **Clicks** — how many times someone clicked through to your site
- **Impressions** — how many times your page appeared in search results
- **CTR (Click-Through Rate)** — clicks / impressions
- **Average Position** — where you ranked (1 = top of page 1)

You can slice these by: date, query, page, device (mobile/desktop/tablet), country, and search type (web/image/video).
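The collector and upsert tasks above could be sketched roughly as follows. This is a sketch, not the final module: it assumes service-account auth, `KEY_FILE` is a placeholder path, and the date range and dimensions are illustrative. The `ON CONFLICT` clause makes re-running the same date range idempotent, matching the primary key on `search_queries`.

```python
import sqlite3

SITE_URL = "https://plantbasedsoutherner.com/"  # assumed Search Console property URL
KEY_FILE = "service-account.json"               # placeholder credential path


def fetch_search_rows(start_date: str, end_date: str, row_limit: int = 25000):
    """Pull one date range of query/page rows from the Search Console API."""
    # Imported here so the rest of the module works without the Google libraries.
    from google.oauth2 import service_account
    from googleapiclient.discovery import build

    creds = service_account.Credentials.from_service_account_file(
        KEY_FILE, scopes=["https://www.googleapis.com/auth/webmasters.readonly"]
    )
    service = build("searchconsole", "v1", credentials=creds)
    body = {
        "startDate": start_date,
        "endDate": end_date,
        "dimensions": ["date", "query", "page"],
        "rowLimit": row_limit,
    }
    response = service.searchanalytics().query(siteUrl=SITE_URL, body=body).execute()
    return response.get("rows", [])


def upsert_rows(conn: sqlite3.Connection, rows) -> None:
    """Idempotent ingestion: re-running a date range overwrites, never duplicates."""
    conn.executemany(
        """INSERT INTO search_queries
               (date, query, page_url, clicks, impressions, ctr, avg_position)
           VALUES (?, ?, ?, ?, ?, ?, ?)
           ON CONFLICT(date, query, page_url, device, country) DO UPDATE SET
               clicks = excluded.clicks,
               impressions = excluded.impressions,
               ctr = excluded.ctr,
               avg_position = excluded.avg_position""",
        [
            (r["keys"][0], r["keys"][1], r["keys"][2],
             r["clicks"], r["impressions"], r["ctr"], r["position"])
            for r in rows
        ],
    )
    conn.commit()
```

The backfill CLI would then just loop `fetch_search_rows` over month-sized windows and feed each batch into `upsert_rows`.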
### Deliverable

Python CLI tool that backfills and incrementally collects PBS search data into SQLite

---

## Phase 3: n8n Orchestration + LLM Meta Generation

**Estimated Time:** 4-5 hours
**Goal:** Automate data collection, generate SEO meta content with Claude, push to WordPress

### Tasks

- [ ] Create n8n workflow: daily scheduled trigger → Execute Command → Python collector
- [ ] Build Claude API integration for meta generation:
  - Input: page content, queries currently ranking for the page, content type
  - Output: optimized meta title, meta description, focus keyword
  - System prompt tuned for PBS brand voice (whole food plant based, southern, warm, NOT "vegan")
- [ ] Build WordPress REST API integration to push meta back to Yoast fields:
  - `_yoast_wpseo_title` (meta title)
  - `_yoast_wpseo_metadesc` (meta description)
  - `_yoast_wpseo_focuskw` (focus keyword)
- [ ] Add WPCode snippet to expose Yoast fields via the WordPress REST API (required for write access)
- [ ] Create approval workflow: generate meta → notify Travis/Jenny via Google Chat → approve/reject → push to WordPress
- [ ] Create weekly SEO digest alert for Google Chat
- [ ] Error handling and failure notifications

### LLM Meta Generation Flow

```
n8n detects new/updated post in WordPress
        │
Fetch page content + current search queries ranking for that URL
        │
Send to Claude API with SEO-optimized system prompt
        │
Claude generates: meta title, meta description, focus keyword
        │
Store in seo_meta_log table
        │
Send to Google Chat for approval
        │
On approval: push to WordPress via REST API
```

### WordPress Integration Detail

Yoast's REST API is read-only by default. To write meta fields, we need a small WPCode snippet that registers the Yoast fields with the WordPress REST API. This is a lightweight approach — about 20 lines of PHP via WPCode Lite (already installed), no additional plugins needed.

Alternatively, n8n can update post meta directly via the WordPress API using the `meta` field in a PUT request to `/wp-json/wp/v2/posts/`.
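The PUT-based push described above might look like this. A sketch under assumptions: the WPCode snippet has already registered the three Yoast fields for REST access, and authentication uses a WordPress application password (the auth method is an assumption, not specified in this plan):

```python
WP_BASE = "https://plantbasedsoutherner.com/wp-json/wp/v2"  # assumed REST base URL


def build_meta_payload(title: str, description: str, focus_kw: str) -> dict:
    """Request body for PUT /wp-json/wp/v2/posts/<id>. These meta keys are only
    writable once the WPCode snippet has registered them with the REST API."""
    return {
        "meta": {
            "_yoast_wpseo_title": title,
            "_yoast_wpseo_metadesc": description,
            "_yoast_wpseo_focuskw": focus_kw,
        }
    }


def push_meta(post_id: int, payload: dict, auth: tuple) -> int:
    """PUT the approved meta back to WordPress; auth = (username, app_password)."""
    # Imported here so payload building stays testable without the requests library.
    import requests

    resp = requests.put(f"{WP_BASE}/posts/{post_id}", json=payload, auth=auth, timeout=30)
    resp.raise_for_status()
    return resp.status_code
```

In the n8n flow, the approval step would gate the call to `push_meta`, so nothing reaches the live site without a human sign-off.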
### Alert Ideas

- **New content alert:** "Jenny published a new recipe. Claude generated meta — approve?"
- **Weekly digest:** Top gaining keywords, biggest position changes, pages needing optimization
- **Opportunity alert:** "You're ranking #11 for 'plant based collard greens' — small push could hit page 1"
- **Cannibalization alert:** Multiple PBS pages competing for the same keyword

### Deliverable

Fully automated pipeline: collect → analyze → generate → approve → publish SEO meta

---

## Phase 4: Streamlit SEO Dashboard

**Estimated Time:** 4-6 hours
**Goal:** Visual SEO analytics dashboard integrated alongside YouTube analytics

### Tasks

- [ ] Add SEO pages to existing Streamlit app (or create separate app)
- [ ] Build search performance overview (clicks, impressions, CTR trends)
- [ ] Build keyword rank tracker (position changes over time)
- [ ] Build page-level deep dive (which queries drive traffic to each page)
- [ ] Build content gap analysis view (queries with high impressions but low CTR)
- [ ] Build content type comparison (recipe SEO vs blog SEO performance)
- [ ] Build "opportunities" view (keywords close to page 1, quick wins)
- [ ] Build meta generation log view (what Claude generated, what was approved)

### Dashboard Pages (Initial Concept)

1. **Search Overview** — total clicks/impressions/CTR trend, top queries, top pages
2. **Keyword Tracker** — track specific keywords over time, position change alerts
3. **Page Deep Dive** — select a page, see all queries driving traffic, position trends
4. **Content Gaps** — high impression / low click pages (title/description need work)
5. **Opportunities** — keywords ranking positions 8-20 (striking distance of page 1)
6. **Content Type Breakdown** — SEO performance by content type (recipe vs blog vs merch)
7. **Meta Generation Log** — what Claude generated, approval status, before/after

### Deliverable

Live SEO dashboard with actionable insights for content strategy

---

## Phase 5: Competitor Intelligence (Open — Free vs Paid)

**Estimated Time:** TBD based on approach
**Goal:** Understand competitive landscape and find content opportunities

### Option A: DIY / Free Approach

- **Manual competitor research:** Periodically Google target keywords and note who ranks
- **Python scraping:** Build a lightweight rank checker that searches Google for target keywords and records positions (note: Google may rate-limit or block; use responsibly)
- **Free tools:** Google Trends API for search interest over time, AnswerThePublic for question-based keyword ideas
- **Search Console mining:** Analyze existing query data to find patterns and gaps — you'd be surprised how much insight is already in your own data
- **Cost:** $0
- **Limitation:** No competitor backlink data, no domain authority scores, limited keyword volume estimates

### Option B: Budget Paid Tools (~$50-75/month)

- **SERPApi or DataForSEO:** Programmatic access to Google search results
  - Track competitor rankings for your target keywords
  - Get search volume estimates
  - API-friendly, integrates cleanly with Python pipeline
- **Best for:** Automated daily rank tracking beyond what Search Console provides
- **Cost:** ~$50-75/month depending on query volume

### Option C: Full SEO Platform (~$99-200+/month)

- **Ahrefs, SEMrush, or Moz:** Comprehensive SEO intelligence
  - Competitor keyword analysis (what they rank for that you don't)
  - Backlink profiles and domain authority
  - Content gap analysis at scale
  - Keyword difficulty scores
- **Best for:** When you've outgrown Search Console data and need competitive intelligence
- **Cost:** $99-200+/month

### Recommendation

Start with Option A (free). Build the pipeline around Google Search Console data first.
After 1-2 months of collecting data, evaluate what questions you can't answer with free data alone. That will tell you whether Option B or C is worth the investment. Many sites of PBS's size never need to go past Option A.

### Deliverable

Decision on competitive intelligence approach based on data from earlier phases

---

## Phase 6: Advanced SEO Automation & Iteration

**Estimated Time:** Ongoing
**Goal:** Deepen automation and cross-platform insights

### Future Ideas

- [ ] Auto-detect new WordPress posts and trigger SEO meta generation without manual intervention
- [ ] Cross-reference YouTube retention data with recipe page SEO performance (which videos drive search traffic?)
- [ ] Automated internal linking suggestions (connect related recipes/blog posts)
- [ ] Schema markup validation and monitoring (ensure WPRM recipe schema stays healthy)
- [ ] Page speed monitoring integration (Core Web Vitals affect rankings)
- [ ] Seasonal keyword planning (predict trending search terms by season for recipe content)
- [ ] A/B test meta titles: generate two versions, measure CTR difference
- [ ] Content calendar integration: use keyword gaps to suggest what Jenny should create next
- [ ] Extend to merch, cookbook, and membership pages as they launch

---

## Prerequisites & Dependencies

| Requirement | Status | Notes |
|---|---|---|
| Google Search Console verified | Needs check | May already be connected via Workspace |
| Google Cloud project | Shared | Same project as YouTube Analytics |
| Search Console API enabled | Needed | Free, quota-based |
| OAuth/Service Account credentials | Needed | Can extend existing YouTube creds |
| Python + UV | Ready | Travis's local dev setup |
| Anthropic API key | Needed | For Claude meta generation |
| WPCode Lite (WordPress) | Ready | Already installed — needed for REST API Yoast fields |
| n8n | Ready | Already running on Linode |
| Streamlit | Shared | Same instance as YouTube dashboard |

---

## API Quotas & Costs

| Service | Quota/Cost | Notes |
|---|---|---|
| Google Search Console API | 2000 queries/day (free) | More than enough for PBS |
| Claude API (Sonnet) | ~$0.003 per meta generation | Pennies per recipe |
| WordPress REST API | Unlimited (self-hosted) | No external cost |
| Google Chat webhooks | Unlimited (free) | Already configured for n8n |

---

## Key Decisions

| Decision | Choice | Rationale |
|---|---|---|
| Data source | Google Search Console (free) | Actual Google data, not estimates. 16 months of history. Sufficient for PBS scale. |
| Competitor intelligence | Deferred (Phase 5) | Start free, evaluate need after collecting own data. |
| LLM for meta generation | Claude API (Anthropic) | Consistent with PBS brand, excellent at structured content, cost-effective. |
| Meta push to WordPress | REST API via WPCode snippet | Lightweight, no extra plugins, uses existing WPCode Lite install. |
| Dashboard | Streamlit (shared with YouTube) | Single analytics platform for all PBS data. |
| Approval workflow | Google Chat notification | Keeps a human in the loop before meta goes live. Jenny/Travis approve. |

---

## Sequencing & Priority

1. **Phase 1** (Search Console Setup) → unblocks data collection
2. **Phase 2** (Data Collector) → starts building historical dataset, enables analysis
3. **Phase 3** (n8n + LLM Meta Generation) → the automation sweet spot — no more copy-paste
4. **Phase 4** (Streamlit Dashboard) → visualize what's working, find opportunities
5. **Phase 5** (Competitor Intelligence) → evaluate free vs paid based on real needs
6. **Phase 6** (Advanced) → cross-platform insights, deeper automation

---

## Relationship to Other PBS Projects

- **YouTube Analytics Pipeline:** Shared Streamlit dashboard, shared Google Cloud project, parallel development
- **PBS Content Hub (Phase 5):** SEO dashboard could become a Content Hub tab
- **Instagram Automation:** Cross-platform content performance analysis (search + social)
- **WordPress-to-MySQL sync:** Trigger SEO meta generation when new recipes are synced
- **Authelia SSO:** Will protect Streamlit dashboard access
- **Yoast SEO plugin:** Stays installed for technical plumbing (sitemaps, canonical URLs, Open Graph) — but meta content is now generated and pushed by the pipeline, not manually entered

---

## Note on Yoast

Yoast stays installed but its role changes. It continues handling:

- XML sitemap generation
- Canonical URL management
- Open Graph / social sharing meta tags
- Basic schema markup (supplementing WPRM's recipe schema)

What it NO LONGER does:

- You stop manually filling in meta titles/descriptions (the pipeline does this)
- You ignore the content scoring stoplight (Claude's output is smarter than Yoast's rules)
- Focus keywords are set by data-driven keyword research, not gut feeling

Yoast becomes invisible plumbing. The pipeline becomes the brain.

---

*Next Step: Phase 1 — Check if Google Search Console is connected for plantbasedsoutherner.com*