| project | type | status | tags | created | updated | path |
|---|---|---|---|---|---|---|
| pbs-seo-automation | project-plan | active | | 2026-03-23 | 2026-03-23 | PBS/Tech/Projects/ |
# PBS SEO Automation Pipeline

## Project Goal
Build a self-hosted SEO automation pipeline that replaces manual copy-paste SEO workflows with an automated system: collecting search performance data, tracking rankings over time, researching keywords, generating optimized meta content via Claude API, and pushing it back to WordPress — all orchestrated through n8n and visualized in Streamlit.
## Why This Matters
PBS is already getting organic search traffic, which means SEO is working to some degree. But optimizing it is currently a manual, disjointed process — Yoast shows fields but doesn't help fill them intelligently, and there's no automation connecting keyword research to content optimization. This project turns SEO from a chore into a data-driven pipeline that works across all PBS content types.
## Content Types Covered
- Recipes (live) — highest SEO value, drives organic discovery
- Blog/editorial (live) — builds authority, targets informational queries
- Cookbook landing pages (future) — transactional/promotional SEO
- Merch pages (future) — Product schema, transactional keywords
- Membership/classes (future) — funnel-driven, conversion-focused
The pipeline is designed to handle all content types from day one, even if only recipes and blog posts exist today.
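Handling all content types from day one mostly means tagging each URL with a `page_type` at ingest. A minimal sketch, where the path prefixes are assumptions about PBS permalink structure and would need adjusting to match the live site:

```python
# Hypothetical URL -> page_type classifier. The PATH_PREFIXES mapping is an
# assumption about the site's permalink structure, not confirmed paths.
from urllib.parse import urlparse

PATH_PREFIXES = {
    "/recipes/": "recipe",
    "/blog/": "blog",
    "/shop/": "merch",
    "/cookbook/": "cookbook",
    "/membership/": "membership",
}

def classify_page(url: str) -> str:
    """Map a URL to one of the page_type values tracked in the pipeline."""
    path = urlparse(url).path
    for prefix, page_type in PATH_PREFIXES.items():
        if path.startswith(prefix):
            return page_type
    return "other"
```

Anything that doesn't match falls into `other`, so new content types can launch before the classifier knows about them.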
## Architecture Overview

```text
Google Search Console API (free)
            │
Python Collector (PyCharm + UV)
            │
      SQLite Database
            │
     ┌──────┴──────┐
     │             │
    n8n        Streamlit
(orchestration) (dashboard)
     │
     ├─ Claude API (generate meta titles/descriptions)
     ├─ WordPress REST API (push meta back to Yoast)
     └─ Google Chat (alerts & digests)
```
## Shared Infrastructure with YouTube Analytics Project
- Same Streamlit instance (separate pages/tabs)
- Same n8n server for orchestration
- Separate SQLite database (keeps projects independent)
- Same Traefik reverse proxy for dashboard access
- Same Google Cloud project for API credentials
## Phase 1: Google Search Console Setup

Estimated Time: 1-2 hours. Goal: Connect Search Console to PBS site and verify API access.
### Tasks
- Verify Google Search Console is connected for plantbasedsoutherner.com
  - If yes: confirm data is flowing, check how far back data goes
  - If no: add property, verify via DNS (Cloudflare), wait for data collection to begin
- Enable Google Search Console API in Google Cloud project
  - Can reuse the same project created for YouTube Analytics
- Create service account OR extend existing OAuth credentials with scope `https://www.googleapis.com/auth/webmasters.readonly`
- Test API access — pull a sample query report to confirm data flows
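The smoke test at the end of the list can be sketched as follows. The key file name, property form, and dates are placeholders; this assumes a service account whose email has been granted access to the property in Search Console:

```python
# Search Console API smoke test sketch. "service-account.json" and the dates
# are placeholders; the helper builds the request body and the API call only
# runs when the key file is present.
from pathlib import Path

SCOPE = "https://www.googleapis.com/auth/webmasters.readonly"
SITE = "sc-domain:plantbasedsoutherner.com"  # domain-property form

def sample_query_body(start: str, end: str, row_limit: int = 10) -> dict:
    """Request body for a small query/page report."""
    return {
        "startDate": start,
        "endDate": end,
        "dimensions": ["query", "page"],
        "rowLimit": row_limit,
    }

if Path("service-account.json").exists():  # guard so this runs only with creds
    from google.oauth2 import service_account
    from googleapiclient.discovery import build

    creds = service_account.Credentials.from_service_account_file(
        "service-account.json", scopes=[SCOPE])
    service = build("searchconsole", "v1", credentials=creds)
    resp = service.searchanalytics().query(
        siteUrl=SITE, body=sample_query_body("2026-02-01", "2026-03-01")
    ).execute()
    for row in resp.get("rows", []):
        print(row["keys"], row["clicks"], row["impressions"], row["position"])
```

If rows come back, Phase 1 is done; an empty response usually means the property has no data yet or the date window is too recent.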
### Key Details

- Search Console retains 16 months of historical data
- Data is typically delayed 2-3 days
- API uses `google-api-python-client` (same library as YouTube project)
- Service account auth is simpler for automated/server-side collection (no browser needed)
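The retention and delay numbers above directly shape the collector's date windows: daily runs should end a few days in the past, and backfills shouldn't reach beyond roughly 16 months. A small sketch of that logic (the lag and retention constants are tunable assumptions):

```python
# Compute the (start, end) date window for a collection run. DATA_LAG_DAYS
# and RETENTION_DAYS encode the GSC behavior noted above; both are rough
# values to tune, not API-enforced constants.
from datetime import date, timedelta

DATA_LAG_DAYS = 3          # GSC data is typically final after ~2-3 days
RETENTION_DAYS = 16 * 30   # ~16 months, approximated in days

def collection_window(today: date, backfill: bool = False) -> tuple[str, str]:
    """Return ISO (start, end) dates for an incremental run or a full backfill."""
    end = today - timedelta(days=DATA_LAG_DAYS)
    start = end - timedelta(days=RETENTION_DAYS if backfill else 1)
    return start.isoformat(), end.isoformat()
```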
### Deliverable
Working API access to PBS search performance data
## Phase 2: Search Data Collector

Estimated Time: 3-4 hours. Goal: Python script that pulls search performance data into SQLite. Tools: PyCharm Professional, UV package manager.
### Tasks
- Initialize project with UV (`uv init pbs-seo-analytics`)
- Install dependencies: `google-api-python-client`, `google-auth`
- Build auth module (service account preferred for server-side)
- Build search query collector (queries, impressions, clicks, CTR, position by page)
- Build page performance collector (aggregate metrics per URL)
- Build device/country breakdown collector
- Design and create SQLite schema
- Implement data ingestion with upsert logic (idempotent runs)
- Add CLI interface for manual runs and backfill (up to 16 months)
- Initial backfill of all available historical data
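The "upsert logic (idempotent runs)" task can be handled with SQLite's `ON CONFLICT` clause keyed on the table's primary key. A sketch against a trimmed copy of the `search_queries` table defined below:

```python
# Idempotent ingestion sketch: re-running a day's collection updates metrics
# in place instead of inserting duplicate rows.
import sqlite3

# Trimmed copy of the search_queries table for the demo
SCHEMA = """
CREATE TABLE IF NOT EXISTS search_queries (
    date TEXT NOT NULL,
    query TEXT NOT NULL,
    page_url TEXT NOT NULL,
    clicks INTEGER DEFAULT 0,
    impressions INTEGER DEFAULT 0,
    ctr REAL DEFAULT 0,
    avg_position REAL DEFAULT 0,
    device TEXT DEFAULT 'all',
    country TEXT DEFAULT 'all',
    PRIMARY KEY (date, query, page_url, device, country)
)
"""

UPSERT = """
INSERT INTO search_queries (date, query, page_url, clicks, impressions, ctr, avg_position)
VALUES (:date, :query, :page_url, :clicks, :impressions, :ctr, :avg_position)
ON CONFLICT(date, query, page_url, device, country) DO UPDATE SET
    clicks = excluded.clicks,
    impressions = excluded.impressions,
    ctr = excluded.ctr,
    avg_position = excluded.avg_position
"""

def ingest(conn: sqlite3.Connection, rows: list[dict]) -> None:
    with conn:  # commit on success, roll back on error
        conn.executemany(UPSERT, rows)

conn = sqlite3.connect(":memory:")
conn.execute(SCHEMA)
row = {"date": "2026-03-20", "query": "plant based collard greens",
       "page_url": "/recipes/collard-greens/", "clicks": 4,
       "impressions": 120, "ctr": 4 / 120, "avg_position": 8.6}
ingest(conn, [row])
ingest(conn, [dict(row, clicks=5)])  # re-run: updates in place, no duplicate
```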
### SQLite Schema (Initial Design)
```sql
-- Pages tracked on the site
CREATE TABLE pages (
    url TEXT PRIMARY KEY,
    page_type TEXT CHECK(page_type IN ('recipe', 'blog', 'merch',
        'cookbook', 'membership', 'landing', 'other')),
    title TEXT,
    first_seen TEXT,
    created_at TEXT DEFAULT CURRENT_TIMESTAMP,
    updated_at TEXT DEFAULT CURRENT_TIMESTAMP
);

-- Daily search performance per query per page
CREATE TABLE search_queries (
    date TEXT NOT NULL,
    query TEXT NOT NULL,
    page_url TEXT NOT NULL,
    clicks INTEGER DEFAULT 0,
    impressions INTEGER DEFAULT 0,
    ctr REAL DEFAULT 0,
    avg_position REAL DEFAULT 0,
    device TEXT DEFAULT 'all',
    country TEXT DEFAULT 'all',
    created_at TEXT DEFAULT CURRENT_TIMESTAMP,
    PRIMARY KEY (date, query, page_url, device, country)
);

-- Daily aggregate performance per page
CREATE TABLE page_daily_metrics (
    date TEXT NOT NULL,
    page_url TEXT NOT NULL,
    total_clicks INTEGER DEFAULT 0,
    total_impressions INTEGER DEFAULT 0,
    avg_ctr REAL DEFAULT 0,
    avg_position REAL DEFAULT 0,
    created_at TEXT DEFAULT CURRENT_TIMESTAMP,
    PRIMARY KEY (date, page_url)
);

-- Keyword tracking: queries we want to monitor over time
CREATE TABLE tracked_keywords (
    keyword TEXT PRIMARY KEY,
    category TEXT,
    target_page_url TEXT,
    added_at TEXT DEFAULT CURRENT_TIMESTAMP,
    notes TEXT
);

-- Snapshot of rank position for tracked keywords
CREATE TABLE keyword_rank_history (
    keyword TEXT NOT NULL,
    date TEXT NOT NULL,
    avg_position REAL,
    impressions INTEGER DEFAULT 0,
    clicks INTEGER DEFAULT 0,
    best_page_url TEXT,
    created_at TEXT DEFAULT CURRENT_TIMESTAMP,
    PRIMARY KEY (keyword, date),
    FOREIGN KEY (keyword) REFERENCES tracked_keywords(keyword)
);

-- SEO meta content generated and applied
CREATE TABLE seo_meta_log (
    page_url TEXT NOT NULL,
    generated_at TEXT NOT NULL,
    meta_title TEXT,
    meta_description TEXT,
    focus_keyword TEXT,
    model_used TEXT DEFAULT 'claude-sonnet',
    pushed_to_wordpress INTEGER DEFAULT 0,
    pushed_at TEXT,
    PRIMARY KEY (page_url, generated_at)
);

-- Site-level daily summary
CREATE TABLE site_daily_metrics (
    date TEXT PRIMARY KEY,
    total_clicks INTEGER DEFAULT 0,
    total_impressions INTEGER DEFAULT 0,
    avg_ctr REAL DEFAULT 0,
    avg_position REAL DEFAULT 0,
    unique_queries INTEGER DEFAULT 0,
    created_at TEXT DEFAULT CURRENT_TIMESTAMP
);
```
### What the Data Tells You
Google Search Console API returns four core metrics per query/page combination:
- Clicks — how many times someone clicked through to your site
- Impressions — how many times your page appeared in search results
- CTR (Click-Through Rate) — clicks / impressions
- Average Position — where you ranked (1 = top of page 1)
You can slice these by: date, query, page, device (mobile/desktop/tablet), country, and search type (web/image/video).
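One caveat when rolling these metrics up to the page or site level (for `page_daily_metrics` / `site_daily_metrics`): CTR must be recomputed from totals, and average position should be impression-weighted. Naively averaging per-row CTR or position values skews the result toward low-volume queries. A sketch:

```python
# Correct roll-up of query-level rows: recompute CTR from totals and weight
# position by impressions, rather than averaging per-row ratios.
def aggregate(rows: list[dict]) -> dict:
    clicks = sum(r["clicks"] for r in rows)
    impressions = sum(r["impressions"] for r in rows)
    weighted = sum(r["avg_position"] * r["impressions"] for r in rows)
    return {
        "total_clicks": clicks,
        "total_impressions": impressions,
        "avg_ctr": clicks / impressions if impressions else 0.0,
        "avg_position": weighted / impressions if impressions else 0.0,
    }
```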
### Deliverable
Python CLI tool that backfills and incrementally collects PBS search data into SQLite
## Phase 3: n8n Orchestration + LLM Meta Generation

Estimated Time: 4-5 hours. Goal: Automate data collection, generate SEO meta content with Claude, push to WordPress.
### Tasks
- Create n8n workflow: daily scheduled trigger → Execute Command → Python collector
- Build Claude API integration for meta generation:
- Input: page content, current keywords ranking, content type
- Output: optimized meta title, meta description, focus keyword
- System prompt tuned for PBS brand voice (whole food plant based, southern, warm, NOT "vegan")
- Build WordPress REST API integration to push meta back to Yoast fields:
  - `_yoast_wpseo_title` (meta title)
  - `_yoast_wpseo_metadesc` (meta description)
  - `_yoast_wpseo_focuskw` (focus keyword)
- Add WPCode snippet to expose Yoast fields via WordPress REST API (required for write access)
- Create approval workflow: generate meta → notify Travis/Jenny via Google Chat → approve/reject → push to WordPress
- Create weekly SEO digest alert for Google Chat
- Error handling and failure notifications
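The Claude integration step can be sketched in Python. The system prompt wording, the JSON output contract, and the model id are assumptions to tune; the actual API call is guarded so the prompt helpers work without credentials:

```python
# Sketch of the meta-generation call. SYSTEM_PROMPT wording, the JSON output
# contract, and the model id are placeholders/assumptions, not final values.
import json
import os

SYSTEM_PROMPT = (
    "You write SEO meta for Plant Based Southerner: whole food plant based, "
    "Southern, warm. Never use the word 'vegan'. Respond with JSON only: "
    '{"meta_title": ..., "meta_description": ..., "focus_keyword": ...}. '
    "Keep titles under 60 characters and descriptions under 155."
)

def build_user_prompt(page: dict) -> str:
    """Bundle page content, content type, and current ranking queries."""
    return (
        f"Content type: {page['page_type']}\n"
        f"URL: {page['url']}\n"
        f"Queries currently ranking: {', '.join(page['queries'])}\n\n"
        f"Page content:\n{page['content']}"
    )

page = {
    "page_type": "recipe",
    "url": "/recipes/collard-greens/",
    "queries": ["plant based collard greens"],
    "content": "Slow-simmered collard greens without the ham hock...",
}

if os.environ.get("ANTHROPIC_API_KEY"):  # only call the API when configured
    import anthropic

    client = anthropic.Anthropic()
    msg = client.messages.create(
        model="claude-sonnet-4-20250514",  # placeholder model id
        max_tokens=500,
        system=SYSTEM_PROMPT,
        messages=[{"role": "user", "content": build_user_prompt(page)}],
    )
    meta = json.loads(msg.content[0].text)  # expects the JSON contract above
```

In n8n this could run either as an Execute Command step around a script like this or via an HTTP Request node against the Anthropic API; the parsed `meta` dict then lands in `seo_meta_log`.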
### LLM Meta Generation Flow

```text
n8n detects new/updated post in WordPress
        │
Fetch page content + current search queries ranking for that URL
        │
Send to Claude API with SEO-optimized system prompt
        │
Claude generates: meta title, meta description, focus keyword
        │
Store in seo_meta_log table
        │
Send to Google Chat for approval
        │
On approval: push to WordPress via REST API
```
### WordPress Integration Detail
Yoast's REST API is read-only by default. To write meta fields, we need a small WPCode snippet that registers Yoast fields on the WordPress REST API. This is a lightweight approach — about 20 lines of PHP via WPCode Lite (already installed), no additional plugins needed.
Alternatively, n8n can update post meta directly via the WordPress API using the `meta` field in a `PUT` request to `/wp-json/wp/v2/posts/`.
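Either way, the push itself is one authenticated request. A Python sketch, assuming the WPCode snippet has exposed the three Yoast keys in the post's REST `meta` object and that a WordPress application password is used for auth:

```python
# Push generated meta to WordPress. Assumes the Yoast keys are registered in
# the REST `meta` object (via the WPCode snippet) and application-password
# auth; base_url, post_id, and credentials are placeholders.
def yoast_meta_payload(meta: dict) -> dict:
    """REST body mapping pipeline output onto the three Yoast meta keys."""
    return {"meta": {
        "_yoast_wpseo_title": meta["meta_title"],
        "_yoast_wpseo_metadesc": meta["meta_description"],
        "_yoast_wpseo_focuskw": meta["focus_keyword"],
    }}

def push_meta(base_url: str, post_id: int, meta: dict, auth) -> None:
    """Update a post's Yoast meta; auth is e.g. (user, application_password)."""
    import requests  # deferred so yoast_meta_payload stays dependency-free

    resp = requests.post(
        f"{base_url}/wp-json/wp/v2/posts/{post_id}",
        json=yoast_meta_payload(meta),
        auth=auth,
        timeout=30,
    )
    resp.raise_for_status()
```

On success, WordPress returns the updated post and the new meta shows up in Yoast's fields immediately.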
### Alert Ideas
- New content alert: "Jenny published a new recipe. Claude generated meta — approve?"
- Weekly digest: Top gaining keywords, biggest position changes, pages needing optimization
- Opportunity alert: "You're ranking #11 for 'plant based collard greens' — small push could hit page 1"
- Cannibalization alert: Multiple PBS pages competing for the same keyword
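Alerts like these boil down to formatting a message and posting it to a Google Chat incoming webhook. A minimal sketch for the weekly digest (the webhook URL comes from the existing n8n setup; the digest content here is illustrative):

```python
# Weekly digest sender for a Google Chat incoming webhook. The message format
# is illustrative; webhook_url is supplied by the existing n8n configuration.
import json
import urllib.request

def digest_text(movers: list[tuple[str, float]]) -> str:
    """movers: (keyword, position_delta) pairs; negative delta = moved up."""
    lines = ["*PBS Weekly SEO Digest*"]
    for keyword, delta in movers:
        direction = "up" if delta < 0 else "down"
        lines.append(f"- '{keyword}' moved {direction} {abs(delta):.1f} positions")
    return "\n".join(lines)

def send_to_chat(webhook_url: str, text: str) -> None:
    """POST a simple text message to a Google Chat incoming webhook."""
    req = urllib.request.Request(
        webhook_url,
        data=json.dumps({"text": text}).encode("utf-8"),
        headers={"Content-Type": "application/json; charset=UTF-8"},
    )
    urllib.request.urlopen(req)
```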
### Deliverable
Fully automated pipeline: collect → analyze → generate → approve → publish SEO meta
## Phase 4: Streamlit SEO Dashboard

Estimated Time: 4-6 hours. Goal: Visual SEO analytics dashboard integrated alongside YouTube analytics.
### Tasks
- Add SEO pages to existing Streamlit app (or create separate app)
- Build search performance overview (clicks, impressions, CTR trends)
- Build keyword rank tracker (position changes over time)
- Build page-level deep dive (which queries drive traffic to each page)
- Build content gap analysis view (queries with high impressions but low CTR)
- Build content type comparison (recipe SEO vs blog SEO performance)
- Build "opportunities" view (keywords close to page 1, quick wins)
- Build meta generation log view (what Claude generated, what was approved)
### Dashboard Pages (Initial Concept)
- Search Overview — total clicks/impressions/CTR trend, top queries, top pages
- Keyword Tracker — track specific keywords over time, position change alerts
- Page Deep Dive — select a page, see all queries driving traffic, position trends
- Content Gaps — high impression / low click pages (title/description need work)
- Opportunities — keywords ranking positions 8-20 (striking distance of page 1)
- Content Type Breakdown — SEO performance by content type (recipe vs blog vs merch)
- Meta Generation Log — what Claude generated, approval status, before/after
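The Opportunities page mostly reduces to one query over the collected data: keywords whose impression-weighted position over a recent window sits in striking distance. A sketch against the `search_queries` table (thresholds and window are tunable; the demo table is trimmed):

```python
# "Striking distance" query: impression-weighted position 8-20 per keyword
# over a date window. The 8-20 band and the window are tunable assumptions.
import sqlite3

OPPORTUNITIES_SQL = """
SELECT query,
       SUM(clicks) AS clicks,
       SUM(impressions) AS impressions,
       SUM(avg_position * impressions) / SUM(impressions) AS position
FROM search_queries
WHERE date >= :since
GROUP BY query
HAVING SUM(avg_position * impressions) / SUM(impressions) BETWEEN 8 AND 20
ORDER BY impressions DESC
"""

def opportunities(conn: sqlite3.Connection, since: str) -> list[tuple]:
    return conn.execute(OPPORTUNITIES_SQL, {"since": since}).fetchall()

# Demo with a trimmed search_queries table
conn = sqlite3.connect(":memory:")
conn.execute("""CREATE TABLE search_queries (
    date TEXT, query TEXT, page_url TEXT,
    clicks INTEGER, impressions INTEGER, avg_position REAL)""")
conn.executemany("INSERT INTO search_queries VALUES (?, ?, ?, ?, ?, ?)", [
    ("2026-03-20", "plant based collard greens", "/recipes/collard-greens/", 3, 200, 11.0),
    ("2026-03-20", "southern cornbread", "/recipes/cornbread/", 40, 500, 2.1),
])
```

In Streamlit the result set would feed straight into a table or chart widget; the query itself is the interesting part.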
### Deliverable
Live SEO dashboard with actionable insights for content strategy
## Phase 5: Competitor Intelligence (Open — Free vs Paid)

Estimated Time: TBD based on approach. Goal: Understand competitive landscape and find content opportunities.
### Option A: DIY / Free Approach
- Manual competitor research: Periodically Google target keywords and note who ranks
- Python scraping: Build a lightweight rank checker that searches Google for target keywords and records positions (note: Google may rate-limit or block; use responsibly)
- Free tools: Google Trends API for search interest over time, AnswerThePublic for question-based keyword ideas
- Search Console mining: Analyze existing query data to find patterns and gaps — you'd be surprised how much insight is already in your own data
- Cost: $0
- Limitation: No competitor backlink data, no domain authority scores, limited keyword volume estimates
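One concrete example of Search Console mining: surface question-style queries the site already earns impressions for, which are free content-gap ideas for blog posts. A sketch over `(query, impressions)` pairs pulled from the `search_queries` table (the question-word list is a heuristic):

```python
# Heuristic miner for question-style queries. QUESTION_STARTS is a rough,
# extendable list, not an exhaustive taxonomy.
QUESTION_STARTS = ("how ", "what ", "why ", "can ", "is ", "are ", "does ")

def question_queries(rows: list[tuple[str, int]]) -> list[tuple[str, int]]:
    """Queries phrased as questions, sorted by impressions (biggest first)."""
    hits = [(q, imp) for q, imp in rows if q.lower().startswith(QUESTION_STARTS)]
    return sorted(hits, key=lambda r: -r[1])
```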
### Option B: Budget Paid Tools (~$50-75/month)
- SERPApi or DataForSEO: Programmatic access to Google search results
- Track competitor rankings for your target keywords
- Get search volume estimates
- API-friendly, integrates cleanly with Python pipeline
- Best for: Automated daily rank tracking beyond what Search Console provides
- Cost: ~$50-75/month depending on query volume
### Option C: Full SEO Platform (~$99-200+/month)
- Ahrefs, SEMrush, or Moz: Comprehensive SEO intelligence
- Competitor keyword analysis (what they rank for that you don't)
- Backlink profiles and domain authority
- Content gap analysis at scale
- Keyword difficulty scores
- Best for: When you've outgrown Search Console data and need competitive intelligence
- Cost: $99-200+/month
### Recommendation
Start with Option A (free). Build the pipeline around Google Search Console data first. After 1-2 months of collecting data, evaluate what questions you can't answer with free data alone. That will tell you whether Option B or C is worth the investment. Many sites PBS's size never need to go past Option A.
### Deliverable
Decision on competitive intelligence approach based on data from earlier phases
## Phase 6: Advanced SEO Automation & Iteration

Estimated Time: Ongoing. Goal: Deepen automation and cross-platform insights.
### Future Ideas
- Auto-detect new WordPress posts and trigger SEO meta generation without manual intervention
- Cross-reference YouTube retention data with recipe page SEO performance (which videos drive search traffic?)
- Automated internal linking suggestions (connect related recipes/blog posts)
- Schema markup validation and monitoring (ensure WPRM recipe schema stays healthy)
- Page speed monitoring integration (Core Web Vitals affect rankings)
- Seasonal keyword planning (predict trending search terms by season for recipe content)
- A/B test meta titles: generate two versions, measure CTR difference
- Content calendar integration: use keyword gaps to suggest what Jenny should create next
- Extend to merch, cookbook, and membership pages as they launch
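For the A/B meta-title idea, a quick stdlib significance check can tell you whether a measured CTR difference is likely real or just noise. A sketch using a two-proportion z-test (a rough screen under standard normal-approximation assumptions, not a full experiment framework):

```python
# Two-sided two-proportion z-test on CTRs: p-value for "variant A's CTR
# differs from variant B's". Rough screen; assumes independent impressions.
import math

def ctr_ab_pvalue(clicks_a: int, imps_a: int, clicks_b: int, imps_b: int) -> float:
    p_a, p_b = clicks_a / imps_a, clicks_b / imps_b
    pooled = (clicks_a + clicks_b) / (imps_a + imps_b)
    se = math.sqrt(pooled * (1 - pooled) * (1 / imps_a + 1 / imps_b))
    if se == 0:
        return 1.0
    z = abs(p_a - p_b) / se
    return 2 * (1 - 0.5 * (1 + math.erf(z / math.sqrt(2))))  # 2 * (1 - Phi(z))
```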
## Prerequisites & Dependencies

| Requirement | Status | Notes |
|---|---|---|
| Google Search Console verified | Needs check | May already be connected via Workspace |
| Google Cloud project | Shared | Same project as YouTube Analytics |
| Search Console API enabled | Needed | Free, quota-based |
| OAuth/Service Account credentials | Needed | Can extend existing YouTube creds |
| Python + UV | Ready | Travis's local dev setup |
| Anthropic API key | Needed | For Claude meta generation |
| WPCode Lite (WordPress) | Ready | Already installed — needed for REST API Yoast fields |
| n8n | Ready | Already running on Linode |
| Streamlit | Shared | Same instance as YouTube dashboard |
## API Quotas & Costs

| Service | Quota/Cost | Notes |
|---|---|---|
| Google Search Console API | 2000 queries/day (free) | More than enough for PBS |
| Claude API (Sonnet) | ~$0.003 per meta generation | Pennies per recipe |
| WordPress REST API | Unlimited (self-hosted) | No external cost |
| Google Chat webhooks | Unlimited (free) | Already configured for n8n |
## Key Decisions

| Decision | Choice | Rationale |
|---|---|---|
| Data source | Google Search Console (free) | Actual Google data, not estimates. 16 months history. Sufficient for PBS scale. |
| Competitor intelligence | Deferred (Phase 5) | Start free, evaluate need after collecting own data. |
| LLM for meta generation | Claude API (Anthropic) | Consistent with PBS brand, excellent at structured content, cost-effective. |
| Meta push to WordPress | REST API via WPCode snippet | Lightweight, no extra plugins, uses existing WPCode Lite install. |
| Dashboard | Streamlit (shared with YouTube) | Single analytics platform for all PBS data. |
| Approval workflow | Google Chat notification | Keeps human in the loop before meta goes live. Jenny/Travis approve. |
## Sequencing & Priority
- Phase 1 (Search Console Setup) → unblocks data collection
- Phase 2 (Data Collector) → starts building historical dataset, enables analysis
- Phase 3 (n8n + LLM Meta Generation) → the automation sweet spot — no more copy-paste
- Phase 4 (Streamlit Dashboard) → visualize what's working, find opportunities
- Phase 5 (Competitor Intelligence) → evaluate free vs paid based on real needs
- Phase 6 (Advanced) → cross-platform insights, deeper automation
## Relationship to Other PBS Projects
- YouTube Analytics Pipeline: Shared Streamlit dashboard, shared Google Cloud project, parallel development
- PBS Content Hub (Phase 5): SEO dashboard could become a Content Hub tab
- Instagram Automation: Cross-platform content performance analysis (search + social)
- WordPress-to-MySQL sync: Trigger SEO meta generation when new recipes are synced
- Authelia SSO: Will protect Streamlit dashboard access
- Yoast SEO plugin: Stays installed for technical plumbing (sitemaps, canonical URLs, Open Graph) — but meta content is now generated and pushed by the pipeline, not manually entered
## Note on Yoast
Yoast stays installed but its role changes. It continues handling:
- XML sitemap generation
- Canonical URL management
- Open Graph / social sharing meta tags
- Basic schema markup (supplementing WPRM's recipe schema)
What it NO LONGER does:
- Meta titles/descriptions are no longer filled in manually (the pipeline generates them)
- The content scoring stoplight can be ignored (Claude's output is smarter than Yoast's rules)
- Focus keywords are set by data-driven keyword research, not gut feeling
Yoast becomes invisible plumbing. The pipeline becomes the brain.
**Next Step:** Phase 1 — Check whether Google Search Console is connected for plantbasedsoutherner.com