---
project: pbs-seo-automation
type: project-plan
status: active
tags:
- pbs
- seo
- python
- automation
- n8n
- wordpress
- yoast
- streamlit
- analytics
- google-search-console
created: 2026-03-23
updated: 2026-03-23
path: PBS/Tech/Projects/
---
# PBS SEO Automation Pipeline
## Project Goal
Build a self-hosted SEO automation pipeline that replaces manual copy-paste
SEO workflows with an automated system: collecting search performance data,
tracking rankings over time, researching keywords, generating optimized
meta content via Claude API, and pushing it back to WordPress — all
orchestrated through n8n and visualized in Streamlit.
## Why This Matters
PBS is already getting organic search traffic, which means SEO is working
to some degree. But optimizing it is currently a manual, disjointed process
— Yoast shows fields but doesn't help fill them intelligently, and there's
no automation connecting keyword research to content optimization. This
project turns SEO from a chore into a data-driven pipeline that works
across all PBS content types.
## Content Types Covered
- **Recipes** (live) — highest SEO value, drives organic discovery
- **Blog/editorial** (live) — builds authority, targets informational
queries
- **Cookbook landing pages** (future) — transactional/promotional SEO
- **Merch pages** (future) — Product schema, transactional keywords
- **Membership/classes** (future) — funnel-driven, conversion-focused
The pipeline is designed to handle all content types from day one, even if
only recipes and blog posts exist today.
---
## Architecture Overview
```
Google Search Console API (free)
                |
Python Collector (PyCharm + UV)
                |
        SQLite Database
                |
        ┌───────┴───────┐
        │               │
       n8n          Streamlit
 (orchestration)   (dashboard)
        ├─ Claude API (generate meta titles/descriptions)
        ├─ WordPress REST API (push meta back to Yoast)
        └─ Google Chat (alerts & digests)
```
### Shared Infrastructure with YouTube Analytics Project
- Same Streamlit instance (separate pages/tabs)
- Same n8n server for orchestration
- Separate SQLite database (keeps projects independent)
- Same Traefik reverse proxy for dashboard access
- Same Google Cloud project for API credentials
---
## Phase 1: Google Search Console Setup
**Estimated Time:** 1-2 hours
**Goal:** Connect Search Console to PBS site and verify API access
### Tasks
- [ ] Verify Google Search Console is connected for plantbasedsoutherner.com
- If yes: confirm data is flowing, check how far back data goes
- If no: add property, verify via DNS (Cloudflare), wait for data
collection to begin
- [ ] Enable Google Search Console API in Google Cloud project
- Can reuse the same project created for YouTube Analytics
- [ ] Create service account OR extend existing OAuth credentials with
scope:
`https://www.googleapis.com/auth/webmasters.readonly`
- [ ] Test API access — pull a sample query report to confirm data flows
### Key Details
- Search Console retains 16 months of historical data
- Data is typically delayed 2-3 days
- API uses `google-api-python-client` (same library as YouTube project)
- Service account auth is simpler for automated/server-side collection (no
browser needed)
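The API smoke test can be sketched as below, a minimal pull of one week of query/page data. Assumptions: the service-account key filename is hypothetical, and the account has been added as a user on the Search Console property (shown here as a Domain property, `sc-domain:...`; a URL-prefix property would use the full URL instead).

```python
# Minimal sketch of the Phase 1 smoke test. Assumes a service-account JSON
# key (filename below is hypothetical) granted access to the property.
SCOPES = ["https://www.googleapis.com/auth/webmasters.readonly"]
SITE_URL = "sc-domain:plantbasedsoutherner.com"  # or the URL-prefix property


def build_sample_request(start_date: str, end_date: str, row_limit: int = 25) -> dict:
    """Request body for the Search Analytics query endpoint."""
    return {
        "startDate": start_date,
        "endDate": end_date,
        "dimensions": ["query", "page"],
        "rowLimit": row_limit,
    }


def fetch_sample(key_file: str = "service-account.json") -> list[dict]:
    # Deferred imports so the pure helper above works without the client libs.
    from google.oauth2 import service_account
    from googleapiclient.discovery import build

    creds = service_account.Credentials.from_service_account_file(
        key_file, scopes=SCOPES)
    service = build("searchconsole", "v1", credentials=creds)
    body = build_sample_request("2026-03-01", "2026-03-07")
    resp = service.searchanalytics().query(siteUrl=SITE_URL, body=body).execute()
    return resp.get("rows", [])


if __name__ == "__main__":
    for row in fetch_sample():
        print(row["keys"], row["clicks"], row["impressions"], row["position"])
```

If this prints rows, Phase 1 is done; an empty response usually means the property identifier is wrong or the service account hasn't been added as a user.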
### Deliverable
Working API access to PBS search performance data
---
## Phase 2: Search Data Collector
**Estimated Time:** 3-4 hours
**Goal:** Python script that pulls search performance data into SQLite
**Tools:** PyCharm Professional, UV package manager
### Tasks
- [ ] Initialize project with UV (`uv init pbs-seo-analytics`)
- [ ] Install dependencies: `google-api-python-client`, `google-auth`
- [ ] Build auth module (service account preferred for server-side)
- [ ] Build search query collector (queries, impressions, clicks, CTR,
position by page)
- [ ] Build page performance collector (aggregate metrics per URL)
- [ ] Build device/country breakdown collector
- [ ] Design and create SQLite schema
- [ ] Implement data ingestion with upsert logic (idempotent runs)
- [ ] Add CLI interface for manual runs and backfill (up to 16 months)
- [ ] Initial backfill of all available historical data
### SQLite Schema (Initial Design)
```sql
-- Pages tracked on the site
CREATE TABLE pages (
url TEXT PRIMARY KEY,
page_type TEXT CHECK(page_type IN ('recipe', 'blog', 'merch',
'cookbook', 'membership', 'landing', 'other')),
title TEXT,
first_seen TEXT,
created_at TEXT DEFAULT CURRENT_TIMESTAMP,
updated_at TEXT DEFAULT CURRENT_TIMESTAMP
);
-- Daily search performance per query per page
CREATE TABLE search_queries (
date TEXT NOT NULL,
query TEXT NOT NULL,
page_url TEXT NOT NULL,
clicks INTEGER DEFAULT 0,
impressions INTEGER DEFAULT 0,
ctr REAL DEFAULT 0,
avg_position REAL DEFAULT 0,
device TEXT DEFAULT 'all',
country TEXT DEFAULT 'all',
created_at TEXT DEFAULT CURRENT_TIMESTAMP,
PRIMARY KEY (date, query, page_url, device, country)
);
-- Daily aggregate performance per page
CREATE TABLE page_daily_metrics (
date TEXT NOT NULL,
page_url TEXT NOT NULL,
total_clicks INTEGER DEFAULT 0,
total_impressions INTEGER DEFAULT 0,
avg_ctr REAL DEFAULT 0,
avg_position REAL DEFAULT 0,
created_at TEXT DEFAULT CURRENT_TIMESTAMP,
PRIMARY KEY (date, page_url)
);
-- Keyword tracking: queries we want to monitor over time
CREATE TABLE tracked_keywords (
keyword TEXT PRIMARY KEY,
category TEXT,
target_page_url TEXT,
added_at TEXT DEFAULT CURRENT_TIMESTAMP,
notes TEXT
);
-- Snapshot of rank position for tracked keywords
CREATE TABLE keyword_rank_history (
keyword TEXT NOT NULL,
date TEXT NOT NULL,
avg_position REAL,
impressions INTEGER DEFAULT 0,
clicks INTEGER DEFAULT 0,
best_page_url TEXT,
created_at TEXT DEFAULT CURRENT_TIMESTAMP,
PRIMARY KEY (keyword, date),
FOREIGN KEY (keyword) REFERENCES tracked_keywords(keyword)
);
-- SEO meta content generated and applied
CREATE TABLE seo_meta_log (
page_url TEXT NOT NULL,
generated_at TEXT NOT NULL,
meta_title TEXT,
meta_description TEXT,
focus_keyword TEXT,
model_used TEXT DEFAULT 'claude-sonnet',
pushed_to_wordpress INTEGER DEFAULT 0,
pushed_at TEXT,
PRIMARY KEY (page_url, generated_at)
);
-- Site-level daily summary
CREATE TABLE site_daily_metrics (
date TEXT PRIMARY KEY,
total_clicks INTEGER DEFAULT 0,
total_impressions INTEGER DEFAULT 0,
avg_ctr REAL DEFAULT 0,
avg_position REAL DEFAULT 0,
unique_queries INTEGER DEFAULT 0,
created_at TEXT DEFAULT CURRENT_TIMESTAMP
);
```
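The "idempotent runs" requirement maps directly onto SQLite's `ON CONFLICT` upsert against the `search_queries` primary key: re-pulling a day overwrites that day's rows instead of duplicating them. A minimal sketch (the DDL is repeated here only so the snippet runs standalone):

```python
import sqlite3

# Same table as the schema above, repeated so this sketch is self-contained.
DDL = """
CREATE TABLE IF NOT EXISTS search_queries (
    date TEXT NOT NULL,
    query TEXT NOT NULL,
    page_url TEXT NOT NULL,
    clicks INTEGER DEFAULT 0,
    impressions INTEGER DEFAULT 0,
    ctr REAL DEFAULT 0,
    avg_position REAL DEFAULT 0,
    device TEXT DEFAULT 'all',
    country TEXT DEFAULT 'all',
    created_at TEXT DEFAULT CURRENT_TIMESTAMP,
    PRIMARY KEY (date, query, page_url, device, country)
);
"""

UPSERT = """
INSERT INTO search_queries (date, query, page_url, clicks, impressions, ctr, avg_position)
VALUES (:date, :query, :page_url, :clicks, :impressions, :ctr, :avg_position)
ON CONFLICT (date, query, page_url, device, country) DO UPDATE SET
    clicks = excluded.clicks,
    impressions = excluded.impressions,
    ctr = excluded.ctr,
    avg_position = excluded.avg_position;
"""


def upsert_rows(conn: sqlite3.Connection, rows: list[dict]) -> None:
    """Idempotent: re-running a day's pull overwrites, never duplicates."""
    with conn:  # one transaction per batch
        conn.executemany(UPSERT, rows)
```

`ON CONFLICT ... DO UPDATE` needs SQLite 3.24+ (2018), which any current Python ships with.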
### What the Data Tells You
Google Search Console API returns four core metrics per query/page
combination:
- **Clicks** — how many times someone clicked through to your site
- **Impressions** — how many times your page appeared in search results
- **CTR (Click-Through Rate)** — clicks / impressions
- **Average Position** — where you ranked (1 = top of page 1)
You can slice these by: date, query, page, device (mobile/desktop/tablet),
country, and search type (web/image/video).
### Deliverable
Python CLI tool that backfills and incrementally collects PBS search data
into SQLite
---
## Phase 3: n8n Orchestration + LLM Meta Generation
**Estimated Time:** 4-5 hours
**Goal:** Automate data collection, generate SEO meta content with Claude,
push to WordPress
### Tasks
- [ ] Create n8n workflow: daily scheduled trigger → Execute Command →
Python collector
- [ ] Build Claude API integration for meta generation:
- Input: page content, current keywords ranking, content type
- Output: optimized meta title, meta description, focus keyword
- System prompt tuned for PBS brand voice (whole food plant based,
southern, warm, NOT "vegan")
- [ ] Build WordPress REST API integration to push meta back to Yoast
fields:
- `_yoast_wpseo_title` (meta title)
- `_yoast_wpseo_metadesc` (meta description)
- `_yoast_wpseo_focuskw` (focus keyword)
- [ ] Add WPCode snippet to expose Yoast fields via WordPress REST API
(required for write access)
- [ ] Create approval workflow: generate meta → notify Travis/Jenny via
Google Chat → approve/reject → push to WordPress
- [ ] Create weekly SEO digest alert for Google Chat
- [ ] Error handling and failure notifications
### LLM Meta Generation Flow
```
n8n detects new/updated post in WordPress
        ↓
Fetch page content + current search queries ranking for that URL
        ↓
Send to Claude API with SEO-optimized system prompt
        ↓
Claude generates: meta title, meta description, focus keyword
        ↓
Store in seo_meta_log table
        ↓
Send to Google Chat for approval
        ↓
On approval: push to WordPress via REST API
```
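The generation step can be sketched with the `anthropic` Python SDK. The system prompt below is a placeholder for the real brand-voice prompt, the model id is illustrative, and the three-line reply format is an assumption chosen to keep parsing trivial:

```python
# Sketch of the Claude meta-generation call. The prompt, reply format, and
# model id are illustrative assumptions, not the production prompt.
SYSTEM_PROMPT = (
    "You write SEO meta for Plant Based Southerner: whole food plant based, "
    "southern, warm, never 'vegan'. Reply with exactly three lines:\n"
    "TITLE: ...\nDESCRIPTION: ...\nKEYWORD: ..."
)


def parse_meta(text: str) -> dict:
    """Parse the TITLE/DESCRIPTION/KEYWORD reply into a dict."""
    meta = {}
    for line in text.strip().splitlines():
        key, _, value = line.partition(":")
        meta[key.strip().lower()] = value.strip()
    return meta


def generate_meta(page_content: str, ranking_queries: list[str]) -> dict:
    import anthropic  # deferred so parse_meta is usable without the SDK

    client = anthropic.Anthropic()  # reads ANTHROPIC_API_KEY from the env
    msg = client.messages.create(
        model="claude-sonnet-4-5",  # illustrative model id
        max_tokens=300,
        system=SYSTEM_PROMPT,
        messages=[{
            "role": "user",
            "content": (
                "Queries already ranking for this page: "
                f"{', '.join(ranking_queries)}\n\nPage content:\n{page_content}"
            ),
        }],
    )
    return parse_meta(msg.content[0].text)
```

The parsed dict maps straight onto the `seo_meta_log` columns (`meta_title`, `meta_description`, `focus_keyword`).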
### WordPress Integration Detail
Yoast's REST API is read-only by default. To write meta fields, we need a
small WPCode snippet that registers Yoast fields on the WordPress REST API.
This is a lightweight approach — about 20 lines of PHP via WPCode Lite
(already installed), no additional plugins needed.
Alternatively, n8n can update post meta directly via the WordPress REST API
using the `meta` field in a POST or PUT request to
`/wp-json/wp/v2/posts/<post_id>`.
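The push step can be sketched as below. Assumptions: the WPCode snippet has already registered the three Yoast keys as writable `meta` on the posts endpoint, and authentication uses a WordPress Application Password:

```python
# Sketch of pushing Yoast meta via the WordPress REST API. Assumes the
# WPCode snippet has exposed the three keys for REST writes.
WP_BASE = "https://plantbasedsoutherner.com/wp-json/wp/v2"


def yoast_meta_payload(title: str, description: str, keyword: str) -> dict:
    """The `meta` body WordPress accepts once the keys are registered."""
    return {"meta": {
        "_yoast_wpseo_title": title,
        "_yoast_wpseo_metadesc": description,
        "_yoast_wpseo_focuskw": keyword,
    }}


def push_meta(post_id: int, title: str, description: str,
              keyword: str, auth: tuple[str, str]) -> dict:
    import requests  # deferred so the payload helper stays dependency-free

    resp = requests.post(  # WordPress accepts POST or PUT for updates
        f"{WP_BASE}/posts/{post_id}",
        json=yoast_meta_payload(title, description, keyword),
        auth=auth,  # e.g. an Application Password pair ("user", "xxxx xxxx ...")
        timeout=30,
    )
    resp.raise_for_status()
    return resp.json()
```

On success, the n8n workflow would set `pushed_to_wordpress = 1` and `pushed_at` in `seo_meta_log`.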
### Alert Ideas
- **New content alert:** "Jenny published a new recipe. Claude generated
meta — approve?"
- **Weekly digest:** Top gaining keywords, biggest position changes, pages
needing optimization
- **Opportunity alert:** "You're ranking #11 for 'plant based collard
greens' — small push could hit page 1"
- **Cannibalization alert:** Multiple PBS pages competing for the same
keyword
### Deliverable
Fully automated pipeline: collect → analyze → generate → approve → publish
SEO meta
---
## Phase 4: Streamlit SEO Dashboard
**Estimated Time:** 4-6 hours
**Goal:** Visual SEO analytics dashboard integrated alongside YouTube
analytics
### Tasks
- [ ] Add SEO pages to existing Streamlit app (or create separate app)
- [ ] Build search performance overview (clicks, impressions, CTR trends)
- [ ] Build keyword rank tracker (position changes over time)
- [ ] Build page-level deep dive (which queries drive traffic to each page)
- [ ] Build content gap analysis view (queries with high impressions but
low CTR)
- [ ] Build content type comparison (recipe SEO vs blog SEO performance)
- [ ] Build "opportunities" view (keywords close to page 1, quick wins)
- [ ] Build meta generation log view (what Claude generated, what was
approved)
### Dashboard Pages (Initial Concept)
1. **Search Overview** — total clicks/impressions/CTR trend, top queries,
top pages
2. **Keyword Tracker** — track specific keywords over time, position change
alerts
3. **Page Deep Dive** — select a page, see all queries driving traffic,
position trends
4. **Content Gaps** — high impression / low click pages (title/description
need work)
5. **Opportunities** — keywords ranking positions 8-20 (striking distance
of page 1)
6. **Content Type Breakdown** — SEO performance by content type (recipe vs
blog vs merch)
7. **Meta Generation Log** — what Claude generated, approval status,
before/after
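The Opportunities page reduces to one query over the Phase 2 `search_queries` table: queries averaging positions 8-20 over the last 28 days with meaningful impressions. A sketch (the 28-day window and 50-impression floor are illustrative thresholds):

```python
import sqlite3

# Striking-distance query: positions 8-20 over the last 28 days, with at
# least 50 impressions. Both thresholds are illustrative and easy to tune.
OPPORTUNITIES_SQL = """
SELECT query,
       page_url,
       AVG(avg_position) AS position,
       SUM(impressions)  AS impressions,
       SUM(clicks)       AS clicks
FROM search_queries
WHERE date >= date('now', '-28 days')
GROUP BY query, page_url
HAVING AVG(avg_position) BETWEEN 8 AND 20
   AND SUM(impressions) >= 50
ORDER BY SUM(impressions) DESC;
"""


def find_opportunities(conn: sqlite3.Connection) -> list[tuple]:
    return conn.execute(OPPORTUNITIES_SQL).fetchall()
```

In Streamlit this becomes a `st.dataframe` plus a slider for the position and impression thresholds; the Content Gaps page is the same shape with a `ctr` condition instead of a position band.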
### Deliverable
Live SEO dashboard with actionable insights for content strategy
---
## Phase 5: Competitor Intelligence (Open — Free vs Paid)
**Estimated Time:** TBD based on approach
**Goal:** Understand competitive landscape and find content opportunities
### Option A: DIY / Free Approach
- **Manual competitor research:** Periodically Google target keywords and
note who ranks
- **Python scraping:** Build a lightweight rank checker that searches
Google for target keywords and records positions (note: Google may
rate-limit or block; use responsibly)
- **Free tools:** Google Trends API for search interest over time,
AnswerThePublic for question-based keyword ideas
- **Search Console mining:** Analyze existing query data to find patterns
and gaps — you'd be surprised how much insight is already in your own data
- **Cost:** $0
- **Limitation:** No competitor backlink data, no domain authority scores,
limited keyword volume estimates
### Option B: Budget Paid Tools (~$50-75/month)
- **SERPApi or DataForSEO:** Programmatic access to Google search results
- Track competitor rankings for your target keywords
- Get search volume estimates
- API-friendly, integrates cleanly with Python pipeline
- **Best for:** Automated daily rank tracking beyond what Search Console
provides
- **Cost:** ~$50-75/month depending on query volume
### Option C: Full SEO Platform (~$99-200+/month)
- **Ahrefs, SEMrush, or Moz:** Comprehensive SEO intelligence
- Competitor keyword analysis (what they rank for that you don't)
- Backlink profiles and domain authority
- Content gap analysis at scale
- Keyword difficulty scores
- **Best for:** When you've outgrown Search Console data and need
competitive intelligence
- **Cost:** $99-200+/month
### Recommendation
Start with Option A (free). Build the pipeline around Google Search Console
data first. After 1-2 months of collecting data, evaluate what questions
you can't answer with free data alone. That will tell you whether Option B
or C is worth the investment. Many sites PBS's size never need to go past
Option A.
### Deliverable
Decision on competitive intelligence approach based on data from earlier
phases
---
## Phase 6: Advanced SEO Automation & Iteration
**Estimated Time:** Ongoing
**Goal:** Deepen automation and cross-platform insights
### Future Ideas
- [ ] Auto-detect new WordPress posts and trigger SEO meta generation
without manual intervention
- [ ] Cross-reference YouTube retention data with recipe page SEO
performance (which videos drive search traffic?)
- [ ] Automated internal linking suggestions (connect related recipes/blog
posts)
- [ ] Schema markup validation and monitoring (ensure WPRM recipe schema
stays healthy)
- [ ] Page speed monitoring integration (Core Web Vitals affect rankings)
- [ ] Seasonal keyword planning (predict trending search terms by season
for recipe content)
- [ ] A/B test meta titles: generate two versions, measure CTR difference
- [ ] Content calendar integration: use keyword gaps to suggest what Jenny
should create next
- [ ] Extend to merch, cookbook, and membership pages as they launch
---
## Prerequisites & Dependencies
| Requirement | Status | Notes |
|---|---|---|
| Google Search Console verified | Needs check | May already be connected
via Workspace |
| Google Cloud project | Shared | Same project as YouTube Analytics |
| Search Console API enabled | Needed | Free, quota-based |
| OAuth/Service Account credentials | Needed | Can extend existing YouTube
creds |
| Python + UV | Ready | Travis's local dev setup |
| Anthropic API key | Needed | For Claude meta generation |
| WPCode Lite (WordPress) | Ready | Already installed — needed for REST API
Yoast fields |
| n8n | Ready | Already running on Linode |
| Streamlit | Shared | Same instance as YouTube dashboard |
---
## API Quotas & Costs
| Service | Quota/Cost | Notes |
|---|---|---|
| Google Search Console API | 2000 queries/day (free) | More than enough
for PBS |
| Claude API (Sonnet) | ~$0.003 per meta generation | Pennies per recipe |
| WordPress REST API | Unlimited (self-hosted) | No external cost |
| Google Chat webhooks | Unlimited (free) | Already configured for n8n |
---
## Key Decisions
| Decision | Choice | Rationale |
|---|---|---|
| Data source | Google Search Console (free) | Actual Google data, not
estimates. 16 months history. Sufficient for PBS scale. |
| Competitor intelligence | Deferred (Phase 5) | Start free, evaluate need
after collecting own data. |
| LLM for meta generation | Claude API (Anthropic) | Consistent with PBS
brand, excellent at structured content, cost-effective. |
| Meta push to WordPress | REST API via WPCode snippet | Lightweight, no
extra plugins, uses existing WPCode Lite install. |
| Dashboard | Streamlit (shared with YouTube) | Single analytics platform
for all PBS data. |
| Approval workflow | Google Chat notification | Keeps human in the loop
before meta goes live. Jenny/Travis approve. |
---
## Sequencing & Priority
1. **Phase 1** (Search Console Setup) → unblocks data collection
2. **Phase 2** (Data Collector) → starts building historical dataset,
enables analysis
3. **Phase 3** (n8n + LLM Meta Generation) → the automation sweet spot — no
more copy-paste
4. **Phase 4** (Streamlit Dashboard) → visualize what's working, find
opportunities
5. **Phase 5** (Competitor Intelligence) → evaluate free vs paid based on
real needs
6. **Phase 6** (Advanced) → cross-platform insights, deeper automation
---
## Relationship to Other PBS Projects
- **YouTube Analytics Pipeline:** Shared Streamlit dashboard, shared Google
Cloud project, parallel development
- **PBS Content Hub (Phase 5):** SEO dashboard could become a Content Hub
tab
- **Instagram Automation:** Cross-platform content performance analysis
(search + social)
- **WordPress-to-MySQL sync:** Trigger SEO meta generation when new recipes
are synced
- **Authelia SSO:** Will protect Streamlit dashboard access
- **Yoast SEO plugin:** Stays installed for technical plumbing (sitemaps,
canonical URLs, Open Graph) — but meta content is now generated and pushed
by the pipeline, not manually entered
---
## Note on Yoast
Yoast stays installed but its role changes. It continues handling:
- XML sitemap generation
- Canonical URL management
- Open Graph / social sharing meta tags
- Basic schema markup (supplementing WPRM's recipe schema)
What changes:
- Meta titles and descriptions are no longer entered by hand (the pipeline
generates and pushes them)
- The content scoring stoplight can be ignored (Claude works from actual
search data, not Yoast's generic rules)
- Focus keywords come from data-driven keyword research, not gut feeling
Yoast becomes invisible plumbing. The pipeline becomes the brain.
---
*Next Step: Phase 1 — Check if Google Search Console is connected for
plantbasedsoutherner.com*