Create pbs-seo-automation.md via n8n
---
project: pbs-seo-automation
type: project-plan
status: active
tags:
- pbs
- seo
- python
- automation
- n8n
- wordpress
- yoast
- streamlit
- analytics
- google-search-console
created: 2026-03-23
updated: 2026-03-23
path: PBS/Tech/Projects/
---

# PBS SEO Automation Pipeline

## Project Goal
Build a self-hosted SEO automation pipeline that replaces manual copy-paste SEO workflows with an automated system: collecting search performance data, tracking rankings over time, researching keywords, generating optimized meta content via the Claude API, and pushing it back to WordPress — all orchestrated through n8n and visualized in Streamlit.

## Why This Matters
PBS is already getting organic search traffic, which means SEO is working to some degree. But optimizing it is currently a manual, disjointed process — Yoast shows fields but doesn't help fill them intelligently, and there's no automation connecting keyword research to content optimization. This project turns SEO from a chore into a data-driven pipeline that works across all PBS content types.

## Content Types Covered
- **Recipes** (live) — highest SEO value, drives organic discovery
- **Blog/editorial** (live) — builds authority, targets informational queries
- **Cookbook landing pages** (future) — transactional/promotional SEO
- **Merch pages** (future) — Product schema, transactional keywords
- **Membership/classes** (future) — funnel-driven, conversion-focused

The pipeline is designed to handle all content types from day one, even if only recipes and blog posts exist today.

---

## Architecture Overview

```
Google Search Console API (free)
              │
Python Collector (PyCharm + UV)
              │
       SQLite Database
              │
 ┌────────────┴────────────┐
 │                         │
n8n                    Streamlit
(orchestration)        (dashboard)
 │
 ├─ Claude API (generate meta titles/descriptions)
 ├─ WordPress REST API (push meta back to Yoast)
 └─ Google Chat (alerts & digests)
```

### Shared Infrastructure with YouTube Analytics Project
- Same Streamlit instance (separate pages/tabs)
- Same n8n server for orchestration
- Separate SQLite database (keeps projects independent)
- Same Traefik reverse proxy for dashboard access
- Same Google Cloud project for API credentials

---

## Phase 1: Google Search Console Setup
**Estimated Time:** 1-2 hours
**Goal:** Connect Search Console to the PBS site and verify API access

### Tasks
- [ ] Verify Google Search Console is connected for plantbasedsoutherner.com
  - If yes: confirm data is flowing, and check how far back it goes
  - If no: add the property, verify via DNS (Cloudflare), and wait for data collection to begin
- [ ] Enable the Google Search Console API in the Google Cloud project
  - Can reuse the same project created for YouTube Analytics
- [ ] Create a service account OR extend existing OAuth credentials with the scope
  `https://www.googleapis.com/auth/webmasters.readonly`
- [ ] Test API access — pull a sample query report to confirm data flows

### Key Details
- Search Console retains 16 months of historical data
- Data is typically delayed 2-3 days
- The API uses `google-api-python-client` (same library as the YouTube project)
- Service account auth is simpler for automated/server-side collection (no browser needed)
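
The API smoke test could look roughly like this. A minimal sketch, assuming a service-account JSON key (the `service-account.json` path and the date range are placeholders); the `searchconsole` v1 service and its `searchanalytics.query` method are the standard `google-api-python-client` surface:

```python
# Search Console connectivity check (sketch): pull top query/page rows.
SCOPES = ["https://www.googleapis.com/auth/webmasters.readonly"]
SITE_URL = "https://plantbasedsoutherner.com/"  # must match the verified property

def build_request(start_date: str, end_date: str, row_limit: int = 10) -> dict:
    """Request body for searchanalytics.query: top query/page pairs in the window."""
    return {
        "startDate": start_date,
        "endDate": end_date,
        "dimensions": ["query", "page"],
        "rowLimit": row_limit,
    }

def fetch_sample(key_file: str) -> list:
    # Imports kept local so build_request stays usable without these packages installed.
    from google.oauth2 import service_account
    from googleapiclient.discovery import build

    creds = service_account.Credentials.from_service_account_file(key_file, scopes=SCOPES)
    service = build("searchconsole", "v1", credentials=creds)
    body = build_request("2026-02-01", "2026-02-28")  # placeholder window
    resp = service.searchanalytics().query(siteUrl=SITE_URL, body=body).execute()
    return resp.get("rows", [])

if __name__ == "__main__":
    for row in fetch_sample("service-account.json"):
        print(row["keys"], row["clicks"], row["impressions"], row["position"])
```

If rows come back, Phase 1 is done; an empty response usually just means the property is new and Google hasn't accumulated data yet.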

### Deliverable
Working API access to PBS search performance data

---

## Phase 2: Search Data Collector
**Estimated Time:** 3-4 hours
**Goal:** Python script that pulls search performance data into SQLite
**Tools:** PyCharm Professional, UV package manager

### Tasks
- [ ] Initialize the project with UV (`uv init pbs-seo-analytics`)
- [ ] Install dependencies: `google-api-python-client`, `google-auth`
- [ ] Build the auth module (service account preferred for server-side)
- [ ] Build the search query collector (queries, impressions, clicks, CTR, position by page)
- [ ] Build the page performance collector (aggregate metrics per URL)
- [ ] Build the device/country breakdown collector
- [ ] Design and create the SQLite schema
- [ ] Implement data ingestion with upsert logic (idempotent runs)
- [ ] Add a CLI interface for manual runs and backfill (up to 16 months)
- [ ] Run an initial backfill of all available historical data

### SQLite Schema (Initial Design)

```sql
-- Pages tracked on the site
CREATE TABLE pages (
    url TEXT PRIMARY KEY,
    page_type TEXT CHECK(page_type IN ('recipe', 'blog', 'merch',
        'cookbook', 'membership', 'landing', 'other')),
    title TEXT,
    first_seen TEXT,
    created_at TEXT DEFAULT CURRENT_TIMESTAMP,
    updated_at TEXT DEFAULT CURRENT_TIMESTAMP
);

-- Daily search performance per query per page
CREATE TABLE search_queries (
    date TEXT NOT NULL,
    query TEXT NOT NULL,
    page_url TEXT NOT NULL,
    clicks INTEGER DEFAULT 0,
    impressions INTEGER DEFAULT 0,
    ctr REAL DEFAULT 0,
    avg_position REAL DEFAULT 0,
    device TEXT DEFAULT 'all',
    country TEXT DEFAULT 'all',
    created_at TEXT DEFAULT CURRENT_TIMESTAMP,
    PRIMARY KEY (date, query, page_url, device, country)
);

-- Daily aggregate performance per page
CREATE TABLE page_daily_metrics (
    date TEXT NOT NULL,
    page_url TEXT NOT NULL,
    total_clicks INTEGER DEFAULT 0,
    total_impressions INTEGER DEFAULT 0,
    avg_ctr REAL DEFAULT 0,
    avg_position REAL DEFAULT 0,
    created_at TEXT DEFAULT CURRENT_TIMESTAMP,
    PRIMARY KEY (date, page_url)
);

-- Keyword tracking: queries we want to monitor over time
CREATE TABLE tracked_keywords (
    keyword TEXT PRIMARY KEY,
    category TEXT,
    target_page_url TEXT,
    added_at TEXT DEFAULT CURRENT_TIMESTAMP,
    notes TEXT
);

-- Snapshot of rank position for tracked keywords
CREATE TABLE keyword_rank_history (
    keyword TEXT NOT NULL,
    date TEXT NOT NULL,
    avg_position REAL,
    impressions INTEGER DEFAULT 0,
    clicks INTEGER DEFAULT 0,
    best_page_url TEXT,
    created_at TEXT DEFAULT CURRENT_TIMESTAMP,
    PRIMARY KEY (keyword, date),
    FOREIGN KEY (keyword) REFERENCES tracked_keywords(keyword)
);

-- SEO meta content generated and applied
CREATE TABLE seo_meta_log (
    page_url TEXT NOT NULL,
    generated_at TEXT NOT NULL,
    meta_title TEXT,
    meta_description TEXT,
    focus_keyword TEXT,
    model_used TEXT DEFAULT 'claude-sonnet',
    pushed_to_wordpress INTEGER DEFAULT 0,
    pushed_at TEXT,
    PRIMARY KEY (page_url, generated_at)
);

-- Site-level daily summary
CREATE TABLE site_daily_metrics (
    date TEXT PRIMARY KEY,
    total_clicks INTEGER DEFAULT 0,
    total_impressions INTEGER DEFAULT 0,
    avg_ctr REAL DEFAULT 0,
    avg_position REAL DEFAULT 0,
    unique_queries INTEGER DEFAULT 0,
    created_at TEXT DEFAULT CURRENT_TIMESTAMP
);
```
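
The "upsert logic (idempotent runs)" task can be sketched with SQLite's `ON CONFLICT` clause, keyed on the `search_queries` primary key so that re-running a day's pull overwrites metrics rather than duplicating rows. A minimal sketch against the schema above:

```python
# Idempotent ingestion sketch: re-running the same day's pull must not duplicate rows.
import sqlite3

SEARCH_QUERIES_DDL = """
CREATE TABLE IF NOT EXISTS search_queries (
    date TEXT NOT NULL,
    query TEXT NOT NULL,
    page_url TEXT NOT NULL,
    clicks INTEGER DEFAULT 0,
    impressions INTEGER DEFAULT 0,
    ctr REAL DEFAULT 0,
    avg_position REAL DEFAULT 0,
    device TEXT DEFAULT 'all',
    country TEXT DEFAULT 'all',
    created_at TEXT DEFAULT CURRENT_TIMESTAMP,
    PRIMARY KEY (date, query, page_url, device, country)
)
"""

# Conflict target matches the table's primary key; on a repeat run the
# metric columns are refreshed from the incoming row (`excluded`).
UPSERT = """
INSERT INTO search_queries (date, query, page_url, clicks, impressions, ctr, avg_position)
VALUES (:date, :query, :page_url, :clicks, :impressions, :ctr, :avg_position)
ON CONFLICT(date, query, page_url, device, country) DO UPDATE SET
    clicks = excluded.clicks,
    impressions = excluded.impressions,
    ctr = excluded.ctr,
    avg_position = excluded.avg_position
"""

def ingest(conn: sqlite3.Connection, rows: list) -> None:
    """Insert-or-update a batch of collected rows in one transaction."""
    conn.executemany(UPSERT, rows)
    conn.commit()
```

Running the same batch twice leaves the row count unchanged, which is what makes both the historical backfill and the daily scheduled run safely repeatable.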

### What the Data Tells You
The Google Search Console API returns four core metrics per query/page combination:
- **Clicks** — how many times someone clicked through to your site
- **Impressions** — how many times your page appeared in search results
- **CTR (Click-Through Rate)** — clicks / impressions
- **Average Position** — where you ranked (1 = top of page 1)

You can slice these by date, query, page, device (mobile/desktop/tablet), country, and search type (web/image/video).
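
Rolling the raw query/page rows up into the `page_daily_metrics` shape requires an aggregation choice not spelled out above, so this sketch makes its assumptions explicit: CTR is recomputed from summed clicks and impressions, and position is impression-weighted rather than naively averaged across queries:

```python
# Aggregate raw per-query rows into per-page daily metrics (sketch).
from collections import defaultdict

def aggregate_pages(rows: list) -> dict:
    """rows: dicts with page_url, clicks, impressions, avg_position (one day)."""
    acc = defaultdict(lambda: {"clicks": 0, "impressions": 0, "pos_weight": 0.0})
    for r in rows:
        p = acc[r["page_url"]]
        p["clicks"] += r["clicks"]
        p["impressions"] += r["impressions"]
        # Weight each query's position by its impressions so rare queries
        # don't dominate the page-level average.
        p["pos_weight"] += r["avg_position"] * r["impressions"]
    out = {}
    for url, p in acc.items():
        imps = p["impressions"]
        out[url] = {
            "total_clicks": p["clicks"],
            "total_impressions": imps,
            "avg_ctr": p["clicks"] / imps if imps else 0.0,
            "avg_position": p["pos_weight"] / imps if imps else 0.0,
        }
    return out
```

Note this differs slightly from what Search Console itself reports when aggregating by page, but it is consistent and derivable from the stored rows.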

### Deliverable
Python CLI tool that backfills and incrementally collects PBS search data into SQLite

---

## Phase 3: n8n Orchestration + LLM Meta Generation
**Estimated Time:** 4-5 hours
**Goal:** Automate data collection, generate SEO meta content with Claude, and push it to WordPress

### Tasks
- [ ] Create an n8n workflow: daily scheduled trigger → Execute Command → Python collector
- [ ] Build the Claude API integration for meta generation:
  - Input: page content, currently ranking keywords, content type
  - Output: optimized meta title, meta description, focus keyword
  - System prompt tuned for the PBS brand voice (whole food plant based, southern, warm, NOT "vegan")
- [ ] Build the WordPress REST API integration to push meta back to Yoast fields:
  - `_yoast_wpseo_title` (meta title)
  - `_yoast_wpseo_metadesc` (meta description)
  - `_yoast_wpseo_focuskw` (focus keyword)
- [ ] Add a WPCode snippet to expose Yoast fields via the WordPress REST API (required for write access)
- [ ] Create the approval workflow: generate meta → notify Travis/Jenny via Google Chat → approve/reject → push to WordPress
- [ ] Create a weekly SEO digest alert for Google Chat
- [ ] Add error handling and failure notifications

### LLM Meta Generation Flow
```
n8n detects new/updated post in WordPress
        │
Fetch page content + current search queries ranking for that URL
        │
Send to Claude API with SEO-optimized system prompt
        │
Claude generates: meta title, meta description, focus keyword
        │
Store in seo_meta_log table
        │
Send to Google Chat for approval
        │
On approval: push to WordPress via REST API
```
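
The Claude step could be sketched as below. The system prompt wording, the JSON output contract, and the model string are assumptions to tune; only the Messages API shape (`client.messages.create`) comes from Anthropic's Python SDK:

```python
# Meta-generation call sketch for the flow above.
import json

# Assumed brand-voice prompt; adjust wording and length limits as needed.
SYSTEM_PROMPT = (
    "You write SEO meta for Plant Based Southerner: whole food plant based, "
    "southern, warm, and never the word 'vegan'. Reply with JSON only: "
    '{"meta_title": "...", "meta_description": "...", "focus_keyword": "..."}. '
    "Keep titles under 60 characters and descriptions under 155."
)

def build_user_prompt(page_content: str, ranking_queries: list, content_type: str) -> str:
    """Bundle the inputs named in the Phase 3 tasks into one user message."""
    return (
        f"Content type: {content_type}\n"
        f"Queries this page already ranks for: {', '.join(ranking_queries)}\n\n"
        f"Page content:\n{page_content}"
    )

def generate_meta(page_content: str, ranking_queries: list, content_type: str) -> dict:
    import anthropic  # lazy: the prompt builders above work without the SDK

    client = anthropic.Anthropic()  # reads ANTHROPIC_API_KEY from the environment
    msg = client.messages.create(
        model="claude-sonnet-4-20250514",  # placeholder model id; pin whatever is current
        max_tokens=500,
        system=SYSTEM_PROMPT,
        messages=[{
            "role": "user",
            "content": build_user_prompt(page_content, ranking_queries, content_type),
        }],
    )
    return json.loads(msg.content[0].text)
```

The returned dict maps directly onto the `seo_meta_log` columns, so storing it before the approval step is a single insert.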

### WordPress Integration Detail
Yoast's REST API is read-only by default. To write meta fields, we need a small WPCode snippet that registers the Yoast fields on the WordPress REST API. This is a lightweight approach — about 20 lines of PHP via WPCode Lite (already installed), with no additional plugins needed.

Alternatively, n8n can update post meta directly via the WordPress API using the `meta` field in a PUT request to `/wp-json/wp/v2/posts/`.
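
That update path could be sketched in Python like this, assuming the WPCode snippet has already exposed the Yoast keys in post `meta` and that a WordPress application password is used for Basic auth (the site URL and credentials are placeholders):

```python
# Push generated meta back to WordPress (sketch).
import base64
import json
import urllib.request

WP_BASE = "https://plantbasedsoutherner.com/wp-json/wp/v2"  # placeholder

def build_meta_payload(title: str, description: str, focus_kw: str) -> dict:
    """The `meta` body WordPress expects once the Yoast keys are registered."""
    return {
        "meta": {
            "_yoast_wpseo_title": title,
            "_yoast_wpseo_metadesc": description,
            "_yoast_wpseo_focuskw": focus_kw,
        }
    }

def push_meta(post_id: int, payload: dict, user: str, app_password: str) -> dict:
    token = base64.b64encode(f"{user}:{app_password}".encode()).decode()
    req = urllib.request.Request(
        f"{WP_BASE}/posts/{post_id}",
        data=json.dumps(payload).encode(),
        headers={
            "Authorization": f"Basic {token}",
            "Content-Type": "application/json",
        },
        method="POST",  # WordPress accepts POST (or PUT) for updates
    )
    with urllib.request.urlopen(req) as resp:
        return json.loads(resp.read())
```

In practice n8n's HTTP Request node would send the same payload; this is just the shape of the call.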

### Alert Ideas
- **New content alert:** "Jenny published a new recipe. Claude generated meta — approve?"
- **Weekly digest:** Top gaining keywords, biggest position changes, pages needing optimization
- **Opportunity alert:** "You're ranking #11 for 'plant based collard greens' — a small push could hit page 1"
- **Cannibalization alert:** Multiple PBS pages competing for the same keyword

### Deliverable
Fully automated pipeline: collect → analyze → generate → approve → publish SEO meta

---

## Phase 4: Streamlit SEO Dashboard
**Estimated Time:** 4-6 hours
**Goal:** Visual SEO analytics dashboard integrated alongside YouTube analytics

### Tasks
- [ ] Add SEO pages to the existing Streamlit app (or create a separate app)
- [ ] Build the search performance overview (clicks, impressions, CTR trends)
- [ ] Build the keyword rank tracker (position changes over time)
- [ ] Build a page-level deep dive (which queries drive traffic to each page)
- [ ] Build a content gap analysis view (queries with high impressions but low CTR)
- [ ] Build a content type comparison (recipe SEO vs blog SEO performance)
- [ ] Build an "opportunities" view (keywords close to page 1, quick wins)
- [ ] Build a meta generation log view (what Claude generated, what was approved)

### Dashboard Pages (Initial Concept)
1. **Search Overview** — total clicks/impressions/CTR trend, top queries, top pages
2. **Keyword Tracker** — track specific keywords over time, position change alerts
3. **Page Deep Dive** — select a page, see all queries driving traffic, position trends
4. **Content Gaps** — high impression / low click pages (title/description need work)
5. **Opportunities** — keywords ranking positions 8-20 (striking distance of page 1)
6. **Content Type Breakdown** — SEO performance by content type (recipe vs blog vs merch)
7. **Meta Generation Log** — what Claude generated, approval status, before/after
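
The Opportunities page can be sketched as one query over the Phase 2 schema plus a thin Streamlit wrapper. The 8-20 position band and the 28-day window are this sketch's assumptions:

```python
# "Opportunities" page sketch: keywords in striking distance of page 1.
import sqlite3

OPPORTUNITIES_SQL = """
SELECT query,
       page_url,
       AVG(avg_position) AS position,
       SUM(impressions) AS impressions,
       SUM(clicks) AS clicks
FROM search_queries
WHERE date >= date('now', '-28 days')
GROUP BY query, page_url
HAVING AVG(avg_position) BETWEEN 8 AND 20
ORDER BY impressions DESC
"""

def load_opportunities(db_path: str) -> list:
    with sqlite3.connect(db_path) as conn:
        return conn.execute(OPPORTUNITIES_SQL).fetchall()

def render_page(db_path: str) -> None:
    import streamlit as st  # lazy: keeps the query usable outside the dashboard

    st.title("Opportunities: striking distance of page 1")
    st.dataframe(load_opportunities(db_path))
```

Sorting by impressions surfaces the keywords where a small position gain buys the most clicks, which is exactly the "quick wins" framing above.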

### Deliverable
Live SEO dashboard with actionable insights for content strategy

---

## Phase 5: Competitor Intelligence (Open — Free vs Paid)
**Estimated Time:** TBD based on approach
**Goal:** Understand the competitive landscape and find content opportunities

### Option A: DIY / Free Approach
- **Manual competitor research:** Periodically Google target keywords and note who ranks
- **Python scraping:** Build a lightweight rank checker that searches Google for target keywords and records positions (note: Google may rate-limit or block; use responsibly)
- **Free tools:** Google Trends for search interest over time, AnswerThePublic for question-based keyword ideas
- **Search Console mining:** Analyze existing query data to find patterns and gaps — you'd be surprised how much insight is already in your own data
- **Cost:** $0
- **Limitation:** No competitor backlink data, no domain authority scores, limited keyword volume estimates

### Option B: Budget Paid Tools (~$50-75/month)
- **SerpApi or DataForSEO:** Programmatic access to Google search results
  - Track competitor rankings for your target keywords
  - Get search volume estimates
  - API-friendly, integrates cleanly with the Python pipeline
- **Best for:** Automated daily rank tracking beyond what Search Console provides
- **Cost:** ~$50-75/month depending on query volume

### Option C: Full SEO Platform (~$99-200+/month)
- **Ahrefs, SEMrush, or Moz:** Comprehensive SEO intelligence
  - Competitor keyword analysis (what they rank for that you don't)
  - Backlink profiles and domain authority
  - Content gap analysis at scale
  - Keyword difficulty scores
- **Best for:** When you've outgrown Search Console data and need competitive intelligence
- **Cost:** $99-200+/month

### Recommendation
Start with Option A (free). Build the pipeline around Google Search Console data first. After 1-2 months of collecting data, evaluate which questions you can't answer with free data alone. That will tell you whether Option B or C is worth the investment. Many sites PBS's size never need to go past Option A.

### Deliverable
Decision on a competitive intelligence approach, based on data from the earlier phases

---

## Phase 6: Advanced SEO Automation & Iteration
**Estimated Time:** Ongoing
**Goal:** Deepen automation and cross-platform insights

### Future Ideas
- [ ] Auto-detect new WordPress posts and trigger SEO meta generation without manual intervention
- [ ] Cross-reference YouTube retention data with recipe page SEO performance (which videos drive search traffic?)
- [ ] Automated internal linking suggestions (connect related recipes/blog posts)
- [ ] Schema markup validation and monitoring (ensure WPRM recipe schema stays healthy)
- [ ] Page speed monitoring integration (Core Web Vitals affect rankings)
- [ ] Seasonal keyword planning (predict trending search terms by season for recipe content)
- [ ] A/B test meta titles: generate two versions, measure the CTR difference
- [ ] Content calendar integration: use keyword gaps to suggest what Jenny should create next
- [ ] Extend to merch, cookbook, and membership pages as they launch

---

## Prerequisites & Dependencies

| Requirement | Status | Notes |
|---|---|---|
| Google Search Console verified | Needs check | May already be connected via Workspace |
| Google Cloud project | Shared | Same project as YouTube Analytics |
| Search Console API enabled | Needed | Free, quota-based |
| OAuth/Service Account credentials | Needed | Can extend existing YouTube creds |
| Python + UV | Ready | Travis's local dev setup |
| Anthropic API key | Needed | For Claude meta generation |
| WPCode Lite (WordPress) | Ready | Already installed — needed for REST API Yoast fields |
| n8n | Ready | Already running on Linode |
| Streamlit | Shared | Same instance as YouTube dashboard |

---

## API Quotas & Costs

| Service | Quota/Cost | Notes |
|---|---|---|
| Google Search Console API | 2000 queries/day (free) | More than enough for PBS |
| Claude API (Sonnet) | ~$0.003 per meta generation | Pennies per recipe |
| WordPress REST API | Unlimited (self-hosted) | No external cost |
| Google Chat webhooks | Unlimited (free) | Already configured for n8n |

---

## Key Decisions

| Decision | Choice | Rationale |
|---|---|---|
| Data source | Google Search Console (free) | Actual Google data, not estimates. 16 months of history. Sufficient for PBS scale. |
| Competitor intelligence | Deferred (Phase 5) | Start free; evaluate need after collecting own data. |
| LLM for meta generation | Claude API (Anthropic) | Consistent with PBS brand, excellent at structured content, cost-effective. |
| Meta push to WordPress | REST API via WPCode snippet | Lightweight, no extra plugins, uses existing WPCode Lite install. |
| Dashboard | Streamlit (shared with YouTube) | Single analytics platform for all PBS data. |
| Approval workflow | Google Chat notification | Keeps a human in the loop before meta goes live. Jenny/Travis approve. |

---

## Sequencing & Priority
1. **Phase 1** (Search Console Setup) → unblocks data collection
2. **Phase 2** (Data Collector) → starts building the historical dataset, enables analysis
3. **Phase 3** (n8n + LLM Meta Generation) → the automation sweet spot — no more copy-paste
4. **Phase 4** (Streamlit Dashboard) → visualize what's working, find opportunities
5. **Phase 5** (Competitor Intelligence) → evaluate free vs paid based on real needs
6. **Phase 6** (Advanced) → cross-platform insights, deeper automation

---

## Relationship to Other PBS Projects
- **YouTube Analytics Pipeline:** Shared Streamlit dashboard, shared Google Cloud project, parallel development
- **PBS Content Hub (Phase 5):** The SEO dashboard could become a Content Hub tab
- **Instagram Automation:** Cross-platform content performance analysis (search + social)
- **WordPress-to-MySQL sync:** Trigger SEO meta generation when new recipes are synced
- **Authelia SSO:** Will protect Streamlit dashboard access
- **Yoast SEO plugin:** Stays installed for technical plumbing (sitemaps, canonical URLs, Open Graph) — but meta content is now generated and pushed by the pipeline, not manually entered

---

## Note on Yoast
Yoast stays installed, but its role changes. It continues handling:
- XML sitemap generation
- Canonical URL management
- Open Graph / social sharing meta tags
- Basic schema markup (supplementing WPRM's recipe schema)

What it NO LONGER does:
- You stop manually filling in meta titles/descriptions (the pipeline does this)
- You ignore the content scoring stoplight (Claude's output is smarter than Yoast's rules)
- Focus keywords are set by data-driven keyword research, not gut feeling

Yoast becomes invisible plumbing. The pipeline becomes the brain.

---

*Next Step: Phase 1 — Check if Google Search Console is connected for plantbasedsoutherner.com*