diff --git a/PBS/Tech/Projects/pbs-seo-automation.md b/PBS/Tech/Projects/pbs-seo-automation.md new file mode 100644 index 0000000..372235f --- /dev/null +++ b/PBS/Tech/Projects/pbs-seo-automation.md @@ -0,0 +1,507 @@ +--- +project: pbs-seo-automation +type: project-plan +status: active +tags: + - pbs + - seo + - python + - automation + - n8n + - wordpress + - yoast + - streamlit + - analytics + - google-search-console +created: 2026-03-23 +updated: 2026-03-23 +path: PBS/Tech/Projects/ +--- + +# PBS SEO Automation Pipeline + +## Project Goal +Build a self-hosted SEO automation pipeline that replaces manual copy-paste +SEO workflows with an automated system: collecting search performance data, +tracking rankings over time, researching keywords, generating optimized +meta content via Claude API, and pushing it back to WordPress — all +orchestrated through n8n and visualized in Streamlit. + +## Why This Matters +PBS is already getting organic search traffic, which means SEO is working +to some degree. But optimizing it is currently a manual, disjointed process +— Yoast shows fields but doesn't help fill them intelligently, and there's +no automation connecting keyword research to content optimization. This +project turns SEO from a chore into a data-driven pipeline that works +across all PBS content types. + +## Content Types Covered +- **Recipes** (live) — highest SEO value, drives organic discovery +- **Blog/editorial** (live) — builds authority, targets informational +queries +- **Cookbook landing pages** (future) — transactional/promotional SEO +- **Merch pages** (future) — Product schema, transactional keywords +- **Membership/classes** (future) — funnel-driven, conversion-focused + +The pipeline is designed to handle all content types from day one, even if +only recipes and blog posts exist today. 
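One practical consequence of covering all content types from day one: the collector needs to tag every URL with a content type as pages are discovered, matching the `page_type` values used by the database schema in Phase 2. A minimal sketch — the URL prefixes below are assumptions about the site's permalink structure, not confirmed:

```python
# Sketch: map a PBS URL to the page_type enum used by the pages table.
# The URL prefixes are ASSUMPTIONS -- adjust to the site's real permalinks.
from urllib.parse import urlparse

PAGE_TYPE_PATTERNS = [
    ("/recipes/", "recipe"),
    ("/blog/", "blog"),
    ("/shop/", "merch"),
    ("/cookbook", "cookbook"),
    ("/membership", "membership"),
]

def classify_page(url: str) -> str:
    """Return the page_type for a URL, defaulting to 'other'."""
    path = urlparse(url).path
    if path in ("", "/"):
        return "landing"
    for prefix, page_type in PAGE_TYPE_PATTERNS:
        if path.startswith(prefix):
            return page_type
    return "other"
```

New content types (merch, cookbook, membership) then only require adding a prefix, not a schema change.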
+ +--- + +## Architecture Overview + +``` +Google Search Console API (free) + | + Python Collector (PyCharm + UV) + | + SQLite Database + | + ┌────────────┴────────────┐ + │ │ + n8n Streamlit +(orchestration) (dashboard) + │ + ├─ Claude API (generate meta titles/descriptions) + ├─ WordPress REST API (push meta back to Yoast) + └─ Google Chat (alerts & digests) +``` + +### Shared Infrastructure with YouTube Analytics Project +- Same Streamlit instance (separate pages/tabs) +- Same n8n server for orchestration +- Separate SQLite database (keeps projects independent) +- Same Traefik reverse proxy for dashboard access +- Same Google Cloud project for API credentials + +--- + +## Phase 1: Google Search Console Setup +**Estimated Time:** 1-2 hours +**Goal:** Connect Search Console to PBS site and verify API access + +### Tasks +- [ ] Verify Google Search Console is connected for plantbasedsoutherner.com + - If yes: confirm data is flowing, check how far back data goes + - If no: add property, verify via DNS (Cloudflare), wait for data +collection to begin +- [ ] Enable Google Search Console API in Google Cloud project + - Can reuse the same project created for YouTube Analytics +- [ ] Create service account OR extend existing OAuth credentials with +scope: + `https://www.googleapis.com/auth/webmasters.readonly` +- [ ] Test API access — pull a sample query report to confirm data flows + +### Key Details +- Search Console retains 16 months of historical data +- Data is typically delayed 2-3 days +- API uses `google-api-python-client` (same library as YouTube project) +- Service account auth is simpler for automated/server-side collection (no +browser needed) + +### Deliverable +Working API access to PBS search performance data + +--- + +## Phase 2: Search Data Collector +**Estimated Time:** 3-4 hours +**Goal:** Python script that pulls search performance data into SQLite +**Tools:** PyCharm Professional, UV package manager + +### Tasks +- [ ] Initialize project with UV 
(`uv init pbs-seo-analytics`) +- [ ] Install dependencies: `google-api-python-client`, `google-auth` +- [ ] Build auth module (service account preferred for server-side) +- [ ] Build search query collector (queries, impressions, clicks, CTR, +position by page) +- [ ] Build page performance collector (aggregate metrics per URL) +- [ ] Build device/country breakdown collector +- [ ] Design and create SQLite schema +- [ ] Implement data ingestion with upsert logic (idempotent runs) +- [ ] Add CLI interface for manual runs and backfill (up to 16 months) +- [ ] Initial backfill of all available historical data + +### SQLite Schema (Initial Design) + +```sql +-- Pages tracked on the site +CREATE TABLE pages ( + url TEXT PRIMARY KEY, + page_type TEXT CHECK(page_type IN ('recipe', 'blog', 'merch', +'cookbook', 'membership', 'landing', 'other')), + title TEXT, + first_seen TEXT, + created_at TEXT DEFAULT CURRENT_TIMESTAMP, + updated_at TEXT DEFAULT CURRENT_TIMESTAMP +); + +-- Daily search performance per query per page +CREATE TABLE search_queries ( + date TEXT NOT NULL, + query TEXT NOT NULL, + page_url TEXT NOT NULL, + clicks INTEGER DEFAULT 0, + impressions INTEGER DEFAULT 0, + ctr REAL DEFAULT 0, + avg_position REAL DEFAULT 0, + device TEXT DEFAULT 'all', + country TEXT DEFAULT 'all', + created_at TEXT DEFAULT CURRENT_TIMESTAMP, + PRIMARY KEY (date, query, page_url, device, country) +); + +-- Daily aggregate performance per page +CREATE TABLE page_daily_metrics ( + date TEXT NOT NULL, + page_url TEXT NOT NULL, + total_clicks INTEGER DEFAULT 0, + total_impressions INTEGER DEFAULT 0, + avg_ctr REAL DEFAULT 0, + avg_position REAL DEFAULT 0, + created_at TEXT DEFAULT CURRENT_TIMESTAMP, + PRIMARY KEY (date, page_url) +); + +-- Keyword tracking: queries we want to monitor over time +CREATE TABLE tracked_keywords ( + keyword TEXT PRIMARY KEY, + category TEXT, + target_page_url TEXT, + added_at TEXT DEFAULT CURRENT_TIMESTAMP, + notes TEXT +); + +-- Snapshot of rank position for 
tracked keywords +CREATE TABLE keyword_rank_history ( + keyword TEXT NOT NULL, + date TEXT NOT NULL, + avg_position REAL, + impressions INTEGER DEFAULT 0, + clicks INTEGER DEFAULT 0, + best_page_url TEXT, + created_at TEXT DEFAULT CURRENT_TIMESTAMP, + PRIMARY KEY (keyword, date), + FOREIGN KEY (keyword) REFERENCES tracked_keywords(keyword) +); + +-- SEO meta content generated and applied +CREATE TABLE seo_meta_log ( + page_url TEXT NOT NULL, + generated_at TEXT NOT NULL, + meta_title TEXT, + meta_description TEXT, + focus_keyword TEXT, + model_used TEXT DEFAULT 'claude-sonnet', + pushed_to_wordpress INTEGER DEFAULT 0, + pushed_at TEXT, + PRIMARY KEY (page_url, generated_at) +); + +-- Site-level daily summary +CREATE TABLE site_daily_metrics ( + date TEXT PRIMARY KEY, + total_clicks INTEGER DEFAULT 0, + total_impressions INTEGER DEFAULT 0, + avg_ctr REAL DEFAULT 0, + avg_position REAL DEFAULT 0, + unique_queries INTEGER DEFAULT 0, + created_at TEXT DEFAULT CURRENT_TIMESTAMP +); +``` + +### What the Data Tells You +Google Search Console API returns four core metrics per query/page +combination: +- **Clicks** — how many times someone clicked through to your site +- **Impressions** — how many times your page appeared in search results +- **CTR (Click-Through Rate)** — clicks / impressions +- **Average Position** — where you ranked (1 = top of page 1) + +You can slice these by: date, query, page, device (mobile/desktop/tablet), +country, and search type (web/image/video). 
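As a hedged sketch of how the collector ties this together — credentials, the CLI wrapper, and `SITE_URL` are assumptions, not shown — the Search Analytics request body and the idempotent upsert into `search_queries` might look like:

```python
# Sketch: pull query/page rows from the Search Console API and upsert them
# into search_queries. Assumes the schema above exists and service-account
# credentials are already built; SITE_URL is an assumption.
import sqlite3

SITE_URL = "https://plantbasedsoutherner.com/"

def build_query_body(start_date: str, end_date: str, row_limit: int = 25000) -> dict:
    """Request body for searchanalytics().query() with query/page dimensions."""
    return {
        "startDate": start_date,
        "endDate": end_date,
        "dimensions": ["date", "query", "page"],
        "rowLimit": row_limit,
    }

def upsert_rows(conn: sqlite3.Connection, rows: list) -> None:
    """Idempotent ingest: re-running the same day overwrites, never duplicates."""
    conn.executemany(
        """
        INSERT INTO search_queries
            (date, query, page_url, clicks, impressions, ctr, avg_position)
        VALUES (?, ?, ?, ?, ?, ?, ?)
        ON CONFLICT(date, query, page_url, device, country) DO UPDATE SET
            clicks = excluded.clicks,
            impressions = excluded.impressions,
            ctr = excluded.ctr,
            avg_position = excluded.avg_position
        """,
        [
            (r["keys"][0], r["keys"][1], r["keys"][2],
             r["clicks"], r["impressions"], r["ctr"], r["position"])
            for r in rows
        ],
    )
    conn.commit()

# The actual API call (requires credentials; shown for shape only):
# service = build("searchconsole", "v1", credentials=creds)
# resp = service.searchanalytics().query(
#     siteUrl=SITE_URL, body=build_query_body("2026-03-20", "2026-03-20")
# ).execute()
# upsert_rows(conn, resp.get("rows", []))
```

The `ON CONFLICT` clause targets the table's composite primary key, which is what makes backfill runs safe to repeat.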
+ +### Deliverable +Python CLI tool that backfills and incrementally collects PBS search data +into SQLite + +--- + +## Phase 3: n8n Orchestration + LLM Meta Generation +**Estimated Time:** 4-5 hours +**Goal:** Automate data collection, generate SEO meta content with Claude, +push to WordPress + +### Tasks +- [ ] Create n8n workflow: daily scheduled trigger → Execute Command → +Python collector +- [ ] Build Claude API integration for meta generation: + - Input: page content, current keywords ranking, content type + - Output: optimized meta title, meta description, focus keyword + - System prompt tuned for PBS brand voice (whole food plant based, +southern, warm, NOT "vegan") +- [ ] Build WordPress REST API integration to push meta back to Yoast +fields: + - `_yoast_wpseo_title` (meta title) + - `_yoast_wpseo_metadesc` (meta description) + - `_yoast_wpseo_focuskw` (focus keyword) +- [ ] Add WPCode snippet to expose Yoast fields via WordPress REST API +(required for write access) +- [ ] Create approval workflow: generate meta → notify Travis/Jenny via +Google Chat → approve/reject → push to WordPress +- [ ] Create weekly SEO digest alert for Google Chat +- [ ] Error handling and failure notifications + +### LLM Meta Generation Flow +``` +n8n detects new/updated post in WordPress + │ + Fetch page content + current search queries ranking for that URL + │ + Send to Claude API with SEO-optimized system prompt + │ + Claude generates: meta title, meta description, focus keyword + │ + Store in seo_meta_log table + │ + Send to Google Chat for approval + │ + On approval: push to WordPress via REST API +``` + +### WordPress Integration Detail +Yoast's REST API is read-only by default. To write meta fields, we need a +small WPCode snippet that registers Yoast fields on the WordPress REST API. +This is a lightweight approach — about 20 lines of PHP via WPCode Lite +(already installed), no additional plugins needed. 
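Once the snippet has registered those three keys on the REST `meta` object, the push itself is a plain authenticated request. A minimal Python sketch — the post ID, username, and application password are placeholders, and the registered meta keys are assumed to be exactly the three Yoast fields listed above:

```python
# Sketch: push generated meta to a WordPress post via the REST API.
# ASSUMES the WPCode snippet has exposed the three Yoast keys on the
# REST `meta` object; WP_BASE and the credentials are placeholders.
import base64
import json
import urllib.request

WP_BASE = "https://plantbasedsoutherner.com/wp-json/wp/v2"

def build_meta_payload(meta_title: str, meta_description: str,
                       focus_keyword: str) -> dict:
    """Body for a request to /posts/<id> updating the Yoast meta fields."""
    return {
        "meta": {
            "_yoast_wpseo_title": meta_title,
            "_yoast_wpseo_metadesc": meta_description,
            "_yoast_wpseo_focuskw": focus_keyword,
        }
    }

def push_meta(post_id: int, payload: dict, user: str, app_password: str) -> int:
    """Send the update using a WordPress application password; returns HTTP status."""
    token = base64.b64encode(f"{user}:{app_password}".encode()).decode()
    req = urllib.request.Request(
        f"{WP_BASE}/posts/{post_id}",
        data=json.dumps(payload).encode(),
        headers={"Authorization": f"Basic {token}",
                 "Content-Type": "application/json"},
        method="POST",
    )
    with urllib.request.urlopen(req) as resp:
        return resp.status

# Usage (hypothetical post ID and credentials):
# status = push_meta(1234, build_meta_payload(
#     "Southern Collard Greens (Plant Based)",
#     "Slow-simmered collard greens, no ham hock needed.",
#     "plant based collard greens",
# ), user="travis", app_password="app-password-here")
```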
+ +Alternatively, n8n can update post meta directly via the WordPress API +using the `meta` field in a PUT request to `/wp-json/wp/v2/posts/`. + +### Alert Ideas +- **New content alert:** "Jenny published a new recipe. Claude generated +meta — approve?" +- **Weekly digest:** Top gaining keywords, biggest position changes, pages +needing optimization +- **Opportunity alert:** "You're ranking #11 for 'plant based collard +greens' — small push could hit page 1" +- **Cannibalization alert:** Multiple PBS pages competing for the same +keyword + +### Deliverable +Fully automated pipeline: collect → analyze → generate → approve → publish +SEO meta + +--- + +## Phase 4: Streamlit SEO Dashboard +**Estimated Time:** 4-6 hours +**Goal:** Visual SEO analytics dashboard integrated alongside YouTube +analytics + +### Tasks +- [ ] Add SEO pages to existing Streamlit app (or create separate app) +- [ ] Build search performance overview (clicks, impressions, CTR trends) +- [ ] Build keyword rank tracker (position changes over time) +- [ ] Build page-level deep dive (which queries drive traffic to each page) +- [ ] Build content gap analysis view (queries with high impressions but +low CTR) +- [ ] Build content type comparison (recipe SEO vs blog SEO performance) +- [ ] Build "opportunities" view (keywords close to page 1, quick wins) +- [ ] Build meta generation log view (what Claude generated, what was +approved) + +### Dashboard Pages (Initial Concept) +1. **Search Overview** — total clicks/impressions/CTR trend, top queries, +top pages +2. **Keyword Tracker** — track specific keywords over time, position change +alerts +3. **Page Deep Dive** — select a page, see all queries driving traffic, +position trends +4. **Content Gaps** — high impression / low click pages (title/description +need work) +5. **Opportunities** — keywords ranking positions 8-20 (striking distance +of page 1) +6. **Content Type Breakdown** — SEO performance by content type (recipe vs +blog vs merch) +7. 
**Meta Generation Log** — what Claude generated, approval status, +before/after + +### Deliverable +Live SEO dashboard with actionable insights for content strategy + +--- + +## Phase 5: Competitor Intelligence (Open — Free vs Paid) +**Estimated Time:** TBD based on approach +**Goal:** Understand competitive landscape and find content opportunities + +### Option A: DIY / Free Approach +- **Manual competitor research:** Periodically Google target keywords and +note who ranks +- **Python scraping:** Build a lightweight rank checker that searches +Google for target keywords and records positions (note: Google may +rate-limit or block; use responsibly) +- **Free tools:** Google Trends API for search interest over time, +AnswerThePublic for question-based keyword ideas +- **Search Console mining:** Analyze existing query data to find patterns +and gaps — you'd be surprised how much insight is already in your own data +- **Cost:** $0 +- **Limitation:** No competitor backlink data, no domain authority scores, +limited keyword volume estimates + +### Option B: Budget Paid Tools (~$50-75/month) +- **SERPApi or DataForSEO:** Programmatic access to Google search results + - Track competitor rankings for your target keywords + - Get search volume estimates + - API-friendly, integrates cleanly with Python pipeline +- **Best for:** Automated daily rank tracking beyond what Search Console +provides +- **Cost:** ~$50-75/month depending on query volume + +### Option C: Full SEO Platform (~$99-200+/month) +- **Ahrefs, SEMrush, or Moz:** Comprehensive SEO intelligence + - Competitor keyword analysis (what they rank for that you don't) + - Backlink profiles and domain authority + - Content gap analysis at scale + - Keyword difficulty scores +- **Best for:** When you've outgrown Search Console data and need +competitive intelligence +- **Cost:** $99-200+/month + +### Recommendation +Start with Option A (free). Build the pipeline around Google Search Console +data first. 
After 1-2 months of collecting data, evaluate what questions +you can't answer with free data alone. That will tell you whether Option B +or C is worth the investment. Many sites PBS's size never need to go past +Option A. + +### Deliverable +Decision on competitive intelligence approach based on data from earlier +phases + +--- + +## Phase 6: Advanced SEO Automation & Iteration +**Estimated Time:** Ongoing +**Goal:** Deepen automation and cross-platform insights + +### Future Ideas +- [ ] Auto-detect new WordPress posts and trigger SEO meta generation +without manual intervention +- [ ] Cross-reference YouTube retention data with recipe page SEO +performance (which videos drive search traffic?) +- [ ] Automated internal linking suggestions (connect related recipes/blog +posts) +- [ ] Schema markup validation and monitoring (ensure WPRM recipe schema +stays healthy) +- [ ] Page speed monitoring integration (Core Web Vitals affect rankings) +- [ ] Seasonal keyword planning (predict trending search terms by season +for recipe content) +- [ ] A/B test meta titles: generate two versions, measure CTR difference +- [ ] Content calendar integration: use keyword gaps to suggest what Jenny +should create next +- [ ] Extend to merch, cookbook, and membership pages as they launch + +--- + +## Prerequisites & Dependencies + +| Requirement | Status | Notes | +|---|---|---| +| Google Search Console verified | Needs check | May already be connected +via Workspace | +| Google Cloud project | Shared | Same project as YouTube Analytics | +| Search Console API enabled | Needed | Free, quota-based | +| OAuth/Service Account credentials | Needed | Can extend existing YouTube +creds | +| Python + UV | Ready | Travis's local dev setup | +| Anthropic API key | Needed | For Claude meta generation | +| WPCode Lite (WordPress) | Ready | Already installed — needed for REST API +Yoast fields | +| n8n | Ready | Already running on Linode | +| Streamlit | Shared | Same instance as YouTube 
dashboard | + +--- + +## API Quotas & Costs + +| Service | Quota/Cost | Notes | +|---|---|---| +| Google Search Console API | 2000 queries/day (free) | More than enough +for PBS | +| Claude API (Sonnet) | ~$0.003 per meta generation | Pennies per recipe | +| WordPress REST API | Unlimited (self-hosted) | No external cost | +| Google Chat webhooks | Unlimited (free) | Already configured for n8n | + +--- + +## Key Decisions + +| Decision | Choice | Rationale | +|---|---|---| +| Data source | Google Search Console (free) | Actual Google data, not +estimates. 16 months history. Sufficient for PBS scale. | +| Competitor intelligence | Deferred (Phase 5) | Start free, evaluate need +after collecting own data. | +| LLM for meta generation | Claude API (Anthropic) | Consistent with PBS +brand, excellent at structured content, cost-effective. | +| Meta push to WordPress | REST API via WPCode snippet | Lightweight, no +extra plugins, uses existing WPCode Lite install. | +| Dashboard | Streamlit (shared with YouTube) | Single analytics platform +for all PBS data. | +| Approval workflow | Google Chat notification | Keeps human in the loop +before meta goes live. Jenny/Travis approve. | + +--- + +## Sequencing & Priority +1. **Phase 1** (Search Console Setup) → unblocks data collection +2. **Phase 2** (Data Collector) → starts building historical dataset, +enables analysis +3. **Phase 3** (n8n + LLM Meta Generation) → the automation sweet spot — no +more copy-paste +4. **Phase 4** (Streamlit Dashboard) → visualize what's working, find +opportunities +5. **Phase 5** (Competitor Intelligence) → evaluate free vs paid based on +real needs +6. 
**Phase 6** (Advanced) → cross-platform insights, deeper automation + +--- + +## Relationship to Other PBS Projects +- **YouTube Analytics Pipeline:** Shared Streamlit dashboard, shared Google +Cloud project, parallel development +- **PBS Content Hub (Phase 5):** SEO dashboard could become a Content Hub +tab +- **Instagram Automation:** Cross-platform content performance analysis +(search + social) +- **WordPress-to-MySQL sync:** Trigger SEO meta generation when new recipes +are synced +- **Authelia SSO:** Will protect Streamlit dashboard access +- **Yoast SEO plugin:** Stays installed for technical plumbing (sitemaps, +canonical URLs, Open Graph) — but meta content is now generated and pushed +by the pipeline, not manually entered + +--- + +## Note on Yoast +Yoast stays installed but its role changes. It continues handling: +- XML sitemap generation +- Canonical URL management +- Open Graph / social sharing meta tags +- Basic schema markup (supplementing WPRM's recipe schema) + +What it NO LONGER does: +- You stop manually filling in meta titles/descriptions (the pipeline does +this) +- You ignore the content scoring stoplight (Claude's output is smarter than +Yoast's rules) +- Focus keywords are set by data-driven keyword research, not gut feeling + +Yoast becomes invisible plumbing. The pipeline becomes the brain. + +--- + +*Next Step: Phase 1 — Check if Google Search Console is connected for +plantbasedsoutherner.com* \ No newline at end of file