---
project: pbs-seo-automation
type: project-plan
status: active
tags:
  - pbs
  - seo
  - python
  - automation
  - n8n
  - wordpress
  - yoast
  - streamlit
  - analytics
  - google-search-console
created: 2026-03-23
updated: 2026-03-23
path: PBS/Tech/Projects/
---

# PBS SEO Automation Pipeline

## Project Goal

Build a self-hosted SEO automation pipeline that replaces manual copy-paste SEO workflows with an automated system: collecting search performance data, tracking rankings over time, researching keywords, generating optimized meta content via the Claude API, and pushing it back to WordPress — all orchestrated through n8n and visualized in Streamlit.

## Why This Matters

PBS is already getting organic search traffic, which means SEO is working to some degree. But optimizing it is currently a manual, disjointed process — Yoast shows fields but doesn't help fill them intelligently, and there's no automation connecting keyword research to content optimization. This project turns SEO from a chore into a data-driven pipeline that works across all PBS content types.

## Content Types Covered

- **Recipes** (live) — highest SEO value, drives organic discovery
- **Blog/editorial** (live) — builds authority, targets informational queries
- **Cookbook landing pages** (future) — transactional/promotional SEO
- **Merch pages** (future) — Product schema, transactional keywords
- **Membership/classes** (future) — funnel-driven, conversion-focused

The pipeline is designed to handle all content types from day one, even if only recipes and blog posts exist today.
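Since every page gets a `page_type` in the database, the collector needs a way to tag URLs by content type. A minimal sketch of one approach, assuming hypothetical permalink patterns (the actual PBS URL structure may differ, so the regexes below are illustrative placeholders):

```python
import re

# Hypothetical permalink patterns; the real PBS URL structure may differ,
# so treat these regexes as placeholders to adjust once verified.
PAGE_TYPE_PATTERNS = [
    (r"/recipes?/", "recipe"),
    (r"/blog/", "blog"),
    (r"/cookbook", "cookbook"),
    (r"/shop/|/merch/", "merch"),
    (r"/membership|/classes", "membership"),
]


def classify_page(url: str) -> str:
    """Map a URL to one of the content types above, falling back to 'other'."""
    for pattern, page_type in PAGE_TYPE_PATTERNS:
        if re.search(pattern, url):
            return page_type
    return "other"
```

New content types launching later (merch, membership) then only require adding a pattern, not touching the rest of the pipeline.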
---

## Architecture Overview

```
Google Search Console API (free)
            |
Python Collector (PyCharm + UV)
            |
     SQLite Database
            |
 ┌──────────┴──────────┐
 │                     │
n8n                Streamlit
(orchestration)   (dashboard)
 │
 ├─ Claude API (generate meta titles/descriptions)
 ├─ WordPress REST API (push meta back to Yoast)
 └─ Google Chat (alerts & digests)
```

### Shared Infrastructure with YouTube Analytics Project

- Same Streamlit instance (separate pages/tabs)
- Same n8n server for orchestration
- Separate SQLite database (keeps projects independent)
- Same Traefik reverse proxy for dashboard access
- Same Google Cloud project for API credentials

---

## Phase 1: Google Search Console Setup

**Estimated Time:** 1-2 hours
**Goal:** Connect Search Console to the PBS site and verify API access

### Tasks

- [ ] Verify Google Search Console is connected for plantbasedsoutherner.com
  - If yes: confirm data is flowing, check how far back data goes
  - If no: add property, verify via DNS (Cloudflare), wait for data collection to begin
- [ ] Enable Google Search Console API in Google Cloud project
  - Can reuse the same project created for YouTube Analytics
- [ ] Create service account OR extend existing OAuth credentials with scope: `https://www.googleapis.com/auth/webmasters.readonly`
- [ ] Test API access — pull a sample query report to confirm data flows

### Key Details

- Search Console retains 16 months of historical data
- Data is typically delayed 2-3 days
- API uses `google-api-python-client` (same library as YouTube project)
- Service account auth is simpler for automated/server-side collection (no browser needed)

### Deliverable

Working API access to PBS search performance data

---

## Phase 2: Search Data Collector

**Estimated Time:** 3-4 hours
**Goal:** Python script that pulls search performance data into SQLite
**Tools:** PyCharm Professional, UV package manager

### Tasks

- [ ] Initialize project with UV (`uv init pbs-seo-analytics`)
- [ ] Install dependencies: `google-api-python-client`, `google-auth`
- [ ] Build auth module (service account preferred for server-side)
- [ ] Build search query collector (queries, impressions, clicks, CTR, position by page)
- [ ] Build page performance collector (aggregate metrics per URL)
- [ ] Build device/country breakdown collector
- [ ] Design and create SQLite schema
- [ ] Implement data ingestion with upsert logic (idempotent runs)
- [ ] Add CLI interface for manual runs and backfill (up to 16 months)
- [ ] Initial backfill of all available historical data

### SQLite Schema (Initial Design)

```sql
-- Pages tracked on the site
CREATE TABLE pages (
    url TEXT PRIMARY KEY,
    page_type TEXT CHECK(page_type IN ('recipe', 'blog', 'merch', 'cookbook', 'membership', 'landing', 'other')),
    title TEXT,
    first_seen TEXT,
    created_at TEXT DEFAULT CURRENT_TIMESTAMP,
    updated_at TEXT DEFAULT CURRENT_TIMESTAMP
);

-- Daily search performance per query per page
CREATE TABLE search_queries (
    date TEXT NOT NULL,
    query TEXT NOT NULL,
    page_url TEXT NOT NULL,
    clicks INTEGER DEFAULT 0,
    impressions INTEGER DEFAULT 0,
    ctr REAL DEFAULT 0,
    avg_position REAL DEFAULT 0,
    device TEXT DEFAULT 'all',
    country TEXT DEFAULT 'all',
    created_at TEXT DEFAULT CURRENT_TIMESTAMP,
    PRIMARY KEY (date, query, page_url, device, country)
);

-- Daily aggregate performance per page
CREATE TABLE page_daily_metrics (
    date TEXT NOT NULL,
    page_url TEXT NOT NULL,
    total_clicks INTEGER DEFAULT 0,
    total_impressions INTEGER DEFAULT 0,
    avg_ctr REAL DEFAULT 0,
    avg_position REAL DEFAULT 0,
    created_at TEXT DEFAULT CURRENT_TIMESTAMP,
    PRIMARY KEY (date, page_url)
);

-- Keyword tracking: queries we want to monitor over time
CREATE TABLE tracked_keywords (
    keyword TEXT PRIMARY KEY,
    category TEXT,
    target_page_url TEXT,
    added_at TEXT DEFAULT CURRENT_TIMESTAMP,
    notes TEXT
);

-- Snapshot of rank position for tracked keywords
CREATE TABLE keyword_rank_history (
    keyword TEXT NOT NULL,
    date TEXT NOT NULL,
    avg_position REAL,
    impressions INTEGER DEFAULT 0,
    clicks INTEGER DEFAULT 0,
    best_page_url TEXT,
    created_at TEXT DEFAULT CURRENT_TIMESTAMP,
    PRIMARY KEY (keyword, date),
    FOREIGN KEY (keyword) REFERENCES tracked_keywords(keyword)
);

-- SEO meta content generated and applied
CREATE TABLE seo_meta_log (
    page_url TEXT NOT NULL,
    generated_at TEXT NOT NULL,
    meta_title TEXT,
    meta_description TEXT,
    focus_keyword TEXT,
    model_used TEXT DEFAULT 'claude-sonnet',
    pushed_to_wordpress INTEGER DEFAULT 0,
    pushed_at TEXT,
    PRIMARY KEY (page_url, generated_at)
);

-- Site-level daily summary
CREATE TABLE site_daily_metrics (
    date TEXT PRIMARY KEY,
    total_clicks INTEGER DEFAULT 0,
    total_impressions INTEGER DEFAULT 0,
    avg_ctr REAL DEFAULT 0,
    avg_position REAL DEFAULT 0,
    unique_queries INTEGER DEFAULT 0,
    created_at TEXT DEFAULT CURRENT_TIMESTAMP
);
```

### What the Data Tells You

The Google Search Console API returns four core metrics per query/page combination:

- **Clicks** — how many times someone clicked through to your site
- **Impressions** — how many times your page appeared in search results
- **CTR (Click-Through Rate)** — clicks / impressions
- **Average Position** — where you ranked (1 = top of page 1)

You can slice these by: date, query, page, device (mobile/desktop/tablet), country, and search type (web/image/video).
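The collector and upsert tasks above could be sketched roughly as follows. This is a sketch, not the final module: it assumes service-account auth, `KEY_FILE` is a placeholder path, and the date range and dimensions are illustrative. The `ON CONFLICT` clause makes re-running the same date range idempotent, matching the primary key on `search_queries`.

```python
import sqlite3

SITE_URL = "https://plantbasedsoutherner.com/"  # assumed Search Console property URL
KEY_FILE = "service-account.json"               # placeholder credential path


def fetch_search_rows(start_date: str, end_date: str, row_limit: int = 25000):
    """Pull one date range of query/page rows from the Search Console API."""
    # Imported here so the rest of the module works without the Google libraries.
    from google.oauth2 import service_account
    from googleapiclient.discovery import build

    creds = service_account.Credentials.from_service_account_file(
        KEY_FILE, scopes=["https://www.googleapis.com/auth/webmasters.readonly"]
    )
    service = build("searchconsole", "v1", credentials=creds)
    body = {
        "startDate": start_date,
        "endDate": end_date,
        "dimensions": ["date", "query", "page"],
        "rowLimit": row_limit,
    }
    response = service.searchanalytics().query(siteUrl=SITE_URL, body=body).execute()
    return response.get("rows", [])


def upsert_rows(conn: sqlite3.Connection, rows) -> None:
    """Idempotent ingestion: re-running a date range overwrites, never duplicates."""
    conn.executemany(
        """INSERT INTO search_queries
               (date, query, page_url, clicks, impressions, ctr, avg_position)
           VALUES (?, ?, ?, ?, ?, ?, ?)
           ON CONFLICT(date, query, page_url, device, country) DO UPDATE SET
               clicks = excluded.clicks,
               impressions = excluded.impressions,
               ctr = excluded.ctr,
               avg_position = excluded.avg_position""",
        [
            (r["keys"][0], r["keys"][1], r["keys"][2],
             r["clicks"], r["impressions"], r["ctr"], r["position"])
            for r in rows
        ],
    )
    conn.commit()
```

The backfill CLI would then just loop `fetch_search_rows` over month-sized windows and feed each batch into `upsert_rows`.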
### Deliverable

Python CLI tool that backfills and incrementally collects PBS search data into SQLite

---

## Phase 3: n8n Orchestration + LLM Meta Generation

**Estimated Time:** 4-5 hours
**Goal:** Automate data collection, generate SEO meta content with Claude, push to WordPress

### Tasks

- [ ] Create n8n workflow: daily scheduled trigger → Execute Command → Python collector
- [ ] Build Claude API integration for meta generation:
  - Input: page content, queries currently ranking for the page, content type
  - Output: optimized meta title, meta description, focus keyword
  - System prompt tuned for PBS brand voice (whole food plant based, southern, warm, NOT "vegan")
- [ ] Build WordPress REST API integration to push meta back to Yoast fields:
  - `_yoast_wpseo_title` (meta title)
  - `_yoast_wpseo_metadesc` (meta description)
  - `_yoast_wpseo_focuskw` (focus keyword)
- [ ] Add WPCode snippet to expose Yoast fields via the WordPress REST API (required for write access)
- [ ] Create approval workflow: generate meta → notify Travis/Jenny via Google Chat → approve/reject → push to WordPress
- [ ] Create weekly SEO digest alert for Google Chat
- [ ] Error handling and failure notifications

### LLM Meta Generation Flow

```
n8n detects new/updated post in WordPress
        │
Fetch page content + current search queries ranking for that URL
        │
Send to Claude API with SEO-optimized system prompt
        │
Claude generates: meta title, meta description, focus keyword
        │
Store in seo_meta_log table
        │
Send to Google Chat for approval
        │
On approval: push to WordPress via REST API
```

### WordPress Integration Detail

Yoast's REST API is read-only by default. To write meta fields, we need a small WPCode snippet that registers the Yoast fields with the WordPress REST API. This is a lightweight approach — about 20 lines of PHP via WPCode Lite (already installed), no additional plugins needed.

Alternatively, n8n can update post meta directly via the WordPress API using the `meta` field in a PUT request to `/wp-json/wp/v2/posts/`.
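The PUT-based push described above might look like this. A sketch under assumptions: the WPCode snippet has already registered the three Yoast fields for REST access, and authentication uses a WordPress application password (the auth method is an assumption, not specified in this plan):

```python
WP_BASE = "https://plantbasedsoutherner.com/wp-json/wp/v2"  # assumed REST base URL


def build_meta_payload(title: str, description: str, focus_kw: str) -> dict:
    """Request body for PUT /wp-json/wp/v2/posts/<id>. These meta keys are only
    writable once the WPCode snippet has registered them with the REST API."""
    return {
        "meta": {
            "_yoast_wpseo_title": title,
            "_yoast_wpseo_metadesc": description,
            "_yoast_wpseo_focuskw": focus_kw,
        }
    }


def push_meta(post_id: int, payload: dict, auth: tuple) -> int:
    """PUT the approved meta back to WordPress; auth = (username, app_password)."""
    # Imported here so payload building stays testable without the requests library.
    import requests

    resp = requests.put(f"{WP_BASE}/posts/{post_id}", json=payload, auth=auth, timeout=30)
    resp.raise_for_status()
    return resp.status_code
```

In the n8n flow, the approval step would gate the call to `push_meta`, so nothing reaches the live site without a human sign-off.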
### Alert Ideas

- **New content alert:** "Jenny published a new recipe. Claude generated meta — approve?"
- **Weekly digest:** Top gaining keywords, biggest position changes, pages needing optimization
- **Opportunity alert:** "You're ranking #11 for 'plant based collard greens' — small push could hit page 1"
- **Cannibalization alert:** Multiple PBS pages competing for the same keyword

### Deliverable

Fully automated pipeline: collect → analyze → generate → approve → publish SEO meta

---

## Phase 4: Streamlit SEO Dashboard

**Estimated Time:** 4-6 hours
**Goal:** Visual SEO analytics dashboard integrated alongside YouTube analytics

### Tasks

- [ ] Add SEO pages to existing Streamlit app (or create separate app)
- [ ] Build search performance overview (clicks, impressions, CTR trends)
- [ ] Build keyword rank tracker (position changes over time)
- [ ] Build page-level deep dive (which queries drive traffic to each page)
- [ ] Build content gap analysis view (queries with high impressions but low CTR)
- [ ] Build content type comparison (recipe SEO vs blog SEO performance)
- [ ] Build "opportunities" view (keywords close to page 1, quick wins)
- [ ] Build meta generation log view (what Claude generated, what was approved)

### Dashboard Pages (Initial Concept)

1. **Search Overview** — total clicks/impressions/CTR trend, top queries, top pages
2. **Keyword Tracker** — track specific keywords over time, position change alerts
3. **Page Deep Dive** — select a page, see all queries driving traffic, position trends
4. **Content Gaps** — high impression / low click pages (title/description need work)
5. **Opportunities** — keywords ranking positions 8-20 (striking distance of page 1)
6. **Content Type Breakdown** — SEO performance by content type (recipe vs blog vs merch)
7. **Meta Generation Log** — what Claude generated, approval status, before/after

### Deliverable

Live SEO dashboard with actionable insights for content strategy

---

## Phase 5: Competitor Intelligence (Open — Free vs Paid)

**Estimated Time:** TBD based on approach
**Goal:** Understand competitive landscape and find content opportunities

### Option A: DIY / Free Approach

- **Manual competitor research:** Periodically Google target keywords and note who ranks
- **Python scraping:** Build a lightweight rank checker that searches Google for target keywords and records positions (note: Google may rate-limit or block; use responsibly)
- **Free tools:** Google Trends API for search interest over time, AnswerThePublic for question-based keyword ideas
- **Search Console mining:** Analyze existing query data to find patterns and gaps — you'd be surprised how much insight is already in your own data
- **Cost:** $0
- **Limitation:** No competitor backlink data, no domain authority scores, limited keyword volume estimates

### Option B: Budget Paid Tools (~$50-75/month)

- **SERPApi or DataForSEO:** Programmatic access to Google search results
  - Track competitor rankings for your target keywords
  - Get search volume estimates
  - API-friendly, integrates cleanly with Python pipeline
- **Best for:** Automated daily rank tracking beyond what Search Console provides
- **Cost:** ~$50-75/month depending on query volume

### Option C: Full SEO Platform (~$99-200+/month)

- **Ahrefs, SEMrush, or Moz:** Comprehensive SEO intelligence
  - Competitor keyword analysis (what they rank for that you don't)
  - Backlink profiles and domain authority
  - Content gap analysis at scale
  - Keyword difficulty scores
- **Best for:** When you've outgrown Search Console data and need competitive intelligence
- **Cost:** $99-200+/month

### Recommendation

Start with Option A (free). Build the pipeline around Google Search Console data first.
After 1-2 months of collecting data, evaluate what questions you can't answer with free data alone. That will tell you whether Option B or C is worth the investment. Many sites of PBS's size never need to go past Option A.

### Deliverable

Decision on competitive intelligence approach based on data from earlier phases

---

## Phase 6: Advanced SEO Automation & Iteration

**Estimated Time:** Ongoing
**Goal:** Deepen automation and cross-platform insights

### Future Ideas

- [ ] Auto-detect new WordPress posts and trigger SEO meta generation without manual intervention
- [ ] Cross-reference YouTube retention data with recipe page SEO performance (which videos drive search traffic?)
- [ ] Automated internal linking suggestions (connect related recipes/blog posts)
- [ ] Schema markup validation and monitoring (ensure WPRM recipe schema stays healthy)
- [ ] Page speed monitoring integration (Core Web Vitals affect rankings)
- [ ] Seasonal keyword planning (predict trending search terms by season for recipe content)
- [ ] A/B test meta titles: generate two versions, measure CTR difference
- [ ] Content calendar integration: use keyword gaps to suggest what Jenny should create next
- [ ] Extend to merch, cookbook, and membership pages as they launch

---

## Prerequisites & Dependencies

| Requirement | Status | Notes |
|---|---|---|
| Google Search Console verified | Needs check | May already be connected via Workspace |
| Google Cloud project | Shared | Same project as YouTube Analytics |
| Search Console API enabled | Needed | Free, quota-based |
| OAuth/Service Account credentials | Needed | Can extend existing YouTube creds |
| Python + UV | Ready | Travis's local dev setup |
| Anthropic API key | Needed | For Claude meta generation |
| WPCode Lite (WordPress) | Ready | Already installed — needed for REST API Yoast fields |
| n8n | Ready | Already running on Linode |
| Streamlit | Shared | Same instance as YouTube dashboard |

---

## API Quotas & Costs

| Service | Quota/Cost | Notes |
|---|---|---|
| Google Search Console API | 2000 queries/day (free) | More than enough for PBS |
| Claude API (Sonnet) | ~$0.003 per meta generation | Pennies per recipe |
| WordPress REST API | Unlimited (self-hosted) | No external cost |
| Google Chat webhooks | Unlimited (free) | Already configured for n8n |

---

## Key Decisions

| Decision | Choice | Rationale |
|---|---|---|
| Data source | Google Search Console (free) | Actual Google data, not estimates. 16 months of history. Sufficient for PBS scale. |
| Competitor intelligence | Deferred (Phase 5) | Start free, evaluate need after collecting own data. |
| LLM for meta generation | Claude API (Anthropic) | Consistent with PBS brand, excellent at structured content, cost-effective. |
| Meta push to WordPress | REST API via WPCode snippet | Lightweight, no extra plugins, uses existing WPCode Lite install. |
| Dashboard | Streamlit (shared with YouTube) | Single analytics platform for all PBS data. |
| Approval workflow | Google Chat notification | Keeps a human in the loop before meta goes live. Jenny/Travis approve. |

---

## Sequencing & Priority

1. **Phase 1** (Search Console Setup) → unblocks data collection
2. **Phase 2** (Data Collector) → starts building historical dataset, enables analysis
3. **Phase 3** (n8n + LLM Meta Generation) → the automation sweet spot — no more copy-paste
4. **Phase 4** (Streamlit Dashboard) → visualize what's working, find opportunities
5. **Phase 5** (Competitor Intelligence) → evaluate free vs paid based on real needs
6. **Phase 6** (Advanced) → cross-platform insights, deeper automation

---

## Relationship to Other PBS Projects

- **YouTube Analytics Pipeline:** Shared Streamlit dashboard, shared Google Cloud project, parallel development
- **PBS Content Hub (Phase 5):** SEO dashboard could become a Content Hub tab
- **Instagram Automation:** Cross-platform content performance analysis (search + social)
- **WordPress-to-MySQL sync:** Trigger SEO meta generation when new recipes are synced
- **Authelia SSO:** Will protect Streamlit dashboard access
- **Yoast SEO plugin:** Stays installed for technical plumbing (sitemaps, canonical URLs, Open Graph) — but meta content is now generated and pushed by the pipeline, not manually entered

---

## Note on Yoast

Yoast stays installed but its role changes. It continues handling:

- XML sitemap generation
- Canonical URL management
- Open Graph / social sharing meta tags
- Basic schema markup (supplementing WPRM's recipe schema)

What it NO LONGER does:

- You stop manually filling in meta titles/descriptions (the pipeline does this)
- You ignore the content scoring stoplight (Claude's output is smarter than Yoast's rules)
- Focus keywords are set by data-driven keyword research, not gut feeling

Yoast becomes invisible plumbing. The pipeline becomes the brain.

---

*Next Step: Phase 1 — Check if Google Search Console is connected for plantbasedsoutherner.com*