
---
project: pbs-youtube-analytics
type: project-plan
status: active
tags:
  - pbs
  - youtube
  - python
  - automation
  - n8n
  - flask
  - streamlit
  - analytics
created: 2026-03-23
updated: 2026-03-23
path: PBS/Tech/Projects/
---

PBS YouTube Analytics Pipeline

Project Goal

Build a self-hosted YouTube analytics pipeline for the PBS channel that collects video performance data (with a focus on audience retention), stores it in SQLite, automates collection via n8n, sends alerts to Google Chat, and visualizes insights through a Streamlit dashboard.

Why This Matters

YouTube Studio's built-in analytics are limited and don't let us slice data the way we need. By owning the raw data, Travis can do proper analysis in Python/R, and Jenny gets a clean dashboard showing what's actually working in our content — especially where viewers drop off or rewatch.


Architecture Overview

YouTube Analytics API
        |
   Python Collector Script (PyCharm + UV)
        |
   SQLite Database (self-contained file)
        |
   ┌────┴────┐
   │         │
  n8n     Streamlit
(schedule  (dashboard
+ alerts)  via Traefik)

  • Data Collection: Python script using google-api-python-client + google-auth-oauthlib
  • Storage: SQLite database file (lightweight, portable, perfect for read-heavy analytics)
  • Automation: n8n triggers collection on schedule, sends Google Chat alerts
  • Visualization: Streamlit app served as Docker container behind Traefik

Phase 1: Google Cloud + API Setup

Estimated Time: 1-2 hours
Goal: Get API credentials and verify access to PBS YouTube data

Tasks

  • Create Google Cloud project (or use existing PBS project)
  • Enable YouTube Data API v3
  • Enable YouTube Analytics API v2
  • Configure OAuth consent screen (Internal if using Workspace, External otherwise)
  • Create OAuth 2.0 Desktop App credentials
  • Download client_secret.json
  • Test OAuth flow — authorize and confirm access to PBS channel data

Key Details

  • Required OAuth scope: https://www.googleapis.com/auth/yt-analytics.readonly
  • Additional scope for video metadata: https://www.googleapis.com/auth/youtube.readonly
  • OAuth tokens will be stored securely and refreshed automatically
  • First auth requires browser interaction; subsequent runs use refresh token
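The token-persistence flow described above can be sketched as follows. This is a minimal illustration, assuming the client_secret.json downloaded in this phase and a token.json cache file (the file names are illustrative choices, not fixed by the plan):

```python
# Sketch of the OAuth flow with token persistence: first run opens a
# browser, later runs refresh silently from the stored token.
import os

SCOPES = [
    "https://www.googleapis.com/auth/yt-analytics.readonly",
    "https://www.googleapis.com/auth/youtube.readonly",
]

def load_credentials(client_secret="client_secret.json", token_file="token.json"):
    """Return valid credentials, refreshing or re-authorizing as needed."""
    # Google libraries imported here so the module can be inspected
    # without them installed.
    from google.oauth2.credentials import Credentials
    from google.auth.transport.requests import Request
    from google_auth_oauthlib.flow import InstalledAppFlow

    creds = None
    if os.path.exists(token_file):
        creds = Credentials.from_authorized_user_file(token_file, SCOPES)
    if creds and creds.expired and creds.refresh_token:
        creds.refresh(Request())  # silent refresh on subsequent runs
    elif not creds or not creds.valid:
        flow = InstalledAppFlow.from_client_secrets_file(client_secret, SCOPES)
        creds = flow.run_local_server(port=0)  # browser prompt, first run only
    with open(token_file, "w") as fh:
        fh.write(creds.to_json())  # persist refresh token for next run
    return creds
```

The returned credentials object would then be passed to googleapiclient.discovery.build() when constructing the Analytics service client.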

Deliverable

Working OAuth credentials that can query the PBS channel's analytics data


Phase 2: Python Data Collector

Estimated Time: 3-4 hours
Goal: Python script that pulls video stats and retention data into SQLite
Tools: PyCharm Professional, UV package manager

Tasks

  • Initialize project with UV (uv init pbs-youtube-analytics)
  • Install dependencies: google-api-python-client, google-auth-oauthlib, google-auth-httplib2
  • Build OAuth2 auth module with token persistence (refresh token stored in JSON)
  • Build video list collector (pulls all PBS videos/shorts with metadata)
  • Build retention data collector (audience retention curves per video)
  • Build general metrics collector (views, watch time, likes, traffic sources, etc.)
  • Design and create SQLite schema
  • Implement data ingestion with upsert logic (idempotent runs)
  • Add CLI interface for manual runs and backfill
  • Test with real PBS channel data

SQLite Schema (Initial Design)

```sql
-- Video metadata from Data API
CREATE TABLE videos (
    video_id TEXT PRIMARY KEY,
    title TEXT NOT NULL,
    published_at TEXT NOT NULL,
    duration_seconds INTEGER,
    video_type TEXT CHECK(video_type IN ('video', 'short')),
    thumbnail_url TEXT,
    description TEXT,
    created_at TEXT DEFAULT CURRENT_TIMESTAMP,
    updated_at TEXT DEFAULT CURRENT_TIMESTAMP
);

-- Daily aggregate metrics from Analytics API
CREATE TABLE video_daily_metrics (
    video_id TEXT NOT NULL,
    date TEXT NOT NULL,
    views INTEGER DEFAULT 0,
    estimated_minutes_watched REAL DEFAULT 0,
    average_view_duration_seconds REAL DEFAULT 0,
    average_view_percentage REAL DEFAULT 0,
    likes INTEGER DEFAULT 0,
    dislikes INTEGER DEFAULT 0,
    comments INTEGER DEFAULT 0,
    shares INTEGER DEFAULT 0,
    subscribers_gained INTEGER DEFAULT 0,
    subscribers_lost INTEGER DEFAULT 0,
    impressions INTEGER DEFAULT 0,
    impressions_ctr REAL DEFAULT 0,
    created_at TEXT DEFAULT CURRENT_TIMESTAMP,
    PRIMARY KEY (video_id, date),
    FOREIGN KEY (video_id) REFERENCES videos(video_id)
);

-- Audience retention curve (100 data points per video)
CREATE TABLE video_retention (
    video_id TEXT NOT NULL,
    elapsed_ratio REAL NOT NULL,
    audience_watch_ratio REAL NOT NULL,
    relative_retention_performance REAL,
    fetched_at TEXT DEFAULT CURRENT_TIMESTAMP,
    PRIMARY KEY (video_id, elapsed_ratio),
    FOREIGN KEY (video_id) REFERENCES videos(video_id)
);

-- Traffic source breakdown per video per day
CREATE TABLE video_traffic_sources (
    video_id TEXT NOT NULL,
    date TEXT NOT NULL,
    traffic_source TEXT NOT NULL,
    views INTEGER DEFAULT 0,
    estimated_minutes_watched REAL DEFAULT 0,
    PRIMARY KEY (video_id, date, traffic_source),
    FOREIGN KEY (video_id) REFERENCES videos(video_id)
);

-- Channel-level daily summary
CREATE TABLE channel_daily_metrics (
    date TEXT PRIMARY KEY,
    total_views INTEGER DEFAULT 0,
    total_estimated_minutes_watched REAL DEFAULT 0,
    subscribers_gained INTEGER DEFAULT 0,
    subscribers_lost INTEGER DEFAULT 0,
    net_subscribers INTEGER DEFAULT 0,
    created_at TEXT DEFAULT CURRENT_TIMESTAMP
);
```
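The "upsert logic (idempotent runs)" task can be sketched with SQLite's ON CONFLICT clause against the videos table above. A minimal, trimmed-column example (the sample video values are illustrative):

```python
# Idempotent upsert into the videos table: re-running the collector
# updates existing rows instead of failing on the primary key.
import sqlite3

def upsert_video(conn, video):
    conn.execute(
        """
        INSERT INTO videos (video_id, title, published_at, duration_seconds, video_type)
        VALUES (:video_id, :title, :published_at, :duration_seconds, :video_type)
        ON CONFLICT(video_id) DO UPDATE SET
            title = excluded.title,
            duration_seconds = excluded.duration_seconds,
            updated_at = CURRENT_TIMESTAMP
        """,
        video,
    )

conn = sqlite3.connect(":memory:")
# Trimmed version of the schema above, enough to demonstrate the upsert.
conn.execute("""
    CREATE TABLE videos (
        video_id TEXT PRIMARY KEY,
        title TEXT NOT NULL,
        published_at TEXT NOT NULL,
        duration_seconds INTEGER,
        video_type TEXT CHECK(video_type IN ('video', 'short')),
        updated_at TEXT DEFAULT CURRENT_TIMESTAMP
    )
""")
row = {"video_id": "abc123", "title": "Draft title", "published_at": "2026-03-01",
       "duration_seconds": 58, "video_type": "short"}
upsert_video(conn, row)
row["title"] = "Final title"
upsert_video(conn, row)  # second run updates in place, no duplicate row
```

The same pattern extends to the composite-key tables (e.g. `ON CONFLICT(video_id, date)` for video_daily_metrics).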

API Details — Retention Data

  • Endpoint: YouTube Analytics API v2 reports.query
  • Dimension: elapsedVideoTimeRatio (100 data points, values 0.01 to 1.0)
  • Metrics available:
    • audienceWatchRatio — absolute retention (can exceed 1.0 for rewatched segments)
    • relativeRetentionPerformance — compared to similar-length YouTube videos (0 to 1 scale)
    • startedWatching — how often viewers started watching at this point
    • stoppedWatching — how often viewers stopped watching at this point
  • Limitation: Retention data is per-video only (one video per API call, no further dimension splits)
  • Note: For a 60-second Short, each data point ≈ 0.6 seconds. For a 10-minute video, each ≈ 6 seconds.
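Putting the details above together, a retention query and its flattening into video_retention rows might look like this. The request parameters are the real reports.query names; the sample response is an illustrative shape, not real PBS data:

```python
# Parameters for one retention query (one video per call, per the
# limitation above). The resulting dict would be passed to
# youtubeAnalytics.reports().query(**params).execute().
def retention_request_params(video_id, start="2026-01-01", end="2026-03-23"):
    return {
        "ids": "channel==MINE",
        "startDate": start,
        "endDate": end,
        "metrics": "audienceWatchRatio,relativeRetentionPerformance",
        "dimensions": "elapsedVideoTimeRatio",
        "filters": f"video=={video_id}",
    }

# The API responds with columnHeaders plus rows ordered by elapsed
# ratio; flatten them into tuples ready for the video_retention table.
def parse_retention(video_id, response):
    return [(video_id, r[0], r[1], r[2]) for r in response.get("rows", [])]

sample = {"rows": [[0.01, 0.98, 0.74], [0.02, 0.95, 0.71]]}  # illustrative
rows = parse_retention("abc123", sample)
```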

Deliverable

Python CLI tool that pulls all PBS video data + retention curves into a local SQLite database


Phase 3: n8n Automation + Alerts

Estimated Time: 2-3 hours
Goal: Automate daily data collection and send performance alerts to Google Chat

Tasks

  • Deploy collector script to Linode server (alongside n8n)
  • Create n8n workflow: daily scheduled trigger → Execute Command node → runs Python collector
  • Add error handling: notify Google Chat on collection failures
  • Create weekly digest alert: top performing videos, notable retention patterns
  • Create threshold alerts: video crosses view milestones, unusual engagement spikes
  • Test scheduled execution end-to-end

Alert Ideas

  • Weekly Digest (for Jenny): Top 5 videos this week by views, best retention video, shorts vs long-form comparison
  • Spike Alert: Video gets 2x+ its average daily views
  • Milestone Alert: Video crosses 1K, 5K, 10K views
  • New Video Check-in: 48-hour performance report for newly published content
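The spike alert above reduces to a small query over video_daily_metrics: compare the latest day's views against a trailing average. A minimal sketch (trimmed table and sample numbers are illustrative):

```python
# Spike-alert check: flag a video whose latest daily views are at least
# 2x its trailing average over the preceding window.
import sqlite3

def is_spiking(conn, video_id, factor=2.0, window=14):
    rows = conn.execute(
        """
        SELECT views FROM video_daily_metrics
        WHERE video_id = ?
        ORDER BY date DESC
        LIMIT ?
        """,
        (video_id, window + 1),
    ).fetchall()
    if len(rows) < 2:
        return False  # not enough history to compare
    latest = rows[0][0]
    baseline = sum(v for (v,) in rows[1:]) / len(rows[1:])
    return baseline > 0 and latest >= factor * baseline

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE video_daily_metrics (video_id TEXT, date TEXT, views INTEGER)")
for day, views in [("2026-03-20", 100), ("2026-03-21", 110), ("2026-03-22", 400)]:
    conn.execute("INSERT INTO video_daily_metrics VALUES ('abc123', ?, ?)", (day, views))

spiking = is_spiking(conn, "abc123")  # 400 views vs. ~105 average → True
```

In the n8n workflow this check would run after each collection pass, with a positive result triggering the Google Chat message node.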

Deliverable

Automated daily collection with Google Chat alerts for notable events


Phase 4: Streamlit Dashboard

Estimated Time: 4-6 hours
Goal: Interactive web dashboard for Jenny and Travis to explore PBS YouTube performance

Tasks

  • Initialize Streamlit project with UV
  • Build retention heatmap view (the star feature)
  • Build video comparison view (side-by-side retention curves)
  • Build channel overview page (trends over time)
  • Build shorts vs long-form comparison view
  • Build traffic source analysis view
  • Dockerize Streamlit app
  • Add to docker-compose with Traefik labels
  • Deploy to staging first, then production
  • Secure with Authelia (when SSO rollout happens) or basic auth initially

Dashboard Pages (Initial Concept)

  1. Channel Overview — subscriber trend, total views/watch time over time, publishing cadence
  2. Video Deep Dive — select a video, see retention curve, daily metrics, traffic sources
  3. Retention Heatmap — all videos on one view, color-coded by retention quality at each time segment
  4. Shorts Lab — Shorts-specific view comparing hook effectiveness (first 3 seconds), rewatch rates
  5. What's Working — auto-surfaced insights: best retention patterns, top traffic sources, optimal video length
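The Retention Heatmap page is mostly a pivot problem: turn video_retention rows into one row of ratios per video. A sketch of that data prep (sample values are illustrative; the rendering itself would be whatever Streamlit charting component is chosen):

```python
# Pivot (video_id, elapsed_ratio, audience_watch_ratio) rows into one
# ordered list of retention values per video, ready for a heatmap.
from collections import defaultdict

def retention_matrix(rows):
    by_video = defaultdict(list)
    # Sorting orders by video_id, then elapsed_ratio, so each video's
    # values come out in timeline order.
    for video_id, elapsed_ratio, watch_ratio in sorted(rows):
        by_video[video_id].append(watch_ratio)
    return dict(by_video)

rows = [
    ("vid_a", 0.01, 1.02),  # >1.0 indicates a rewatched segment
    ("vid_a", 0.02, 0.97),
    ("vid_b", 0.01, 0.88),
    ("vid_b", 0.02, 0.80),
]
matrix = retention_matrix(rows)
```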

Deployment

  • Streamlit container behind Traefik at analytics.plantbasedsoutherner.com (or similar subdomain)
  • Reads from same SQLite file populated by the collector
  • Protected by basic auth initially, Authelia later
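The docker-compose service for that deployment could look roughly like this. Every name here is a placeholder (service name, network name, cert resolver) to be aligned with whatever the existing Traefik stack uses; port 8501 is Streamlit's default:

```yaml
# Sketch only: service/network/resolver names must match the existing stack.
services:
  pbs-analytics:
    build: ./streamlit
    volumes:
      - ./data/pbs_youtube.db:/app/pbs_youtube.db:ro  # same file the collector writes
    networks:
      - traefik
    labels:
      - traefik.enable=true
      - traefik.http.routers.pbs-analytics.rule=Host(`analytics.plantbasedsoutherner.com`)
      - traefik.http.routers.pbs-analytics.tls.certresolver=letsencrypt
      - traefik.http.services.pbs-analytics.loadbalancer.server.port=8501
```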

Deliverable

Live dashboard accessible to Jenny and Travis showing PBS YouTube performance with retention analysis


Phase 5: Advanced Analysis & Iteration

Estimated Time: Ongoing
Goal: Leverage the data for deeper content strategy insights

Future Ideas

  • Correlate retention patterns with recipe categories (link to pbs_recipes table)
  • A/B analysis: compare thumbnail styles, intro approaches, video lengths
  • Optimal posting time analysis using traffic source timing data
  • Export data to R for statistical modeling
  • Instagram vs YouTube cross-platform performance comparison
  • Automated content recommendations based on what's performing

Prerequisites & Dependencies

| Requirement | Status | Notes |
| --- | --- | --- |
| Google Cloud project | Needed | May already exist for Google Workspace |
| YouTube Analytics API enabled | Needed | Free, quota-based |
| OAuth 2.0 credentials | Needed | Desktop app type |
| Python + UV | Ready | Travis's local dev setup |
| Linode server access | Ready | Same server running n8n |
| n8n operational | Ready | Already running PBS automation |
| Traefik reverse proxy | Ready | For Streamlit subdomain |
| SQLite | Ready | Ships with Python, no setup needed |

API Quotas & Limits

  • YouTube Analytics API: 200 queries/day default (can request increase)
  • YouTube Data API v3: 10,000 units/day (listing videos costs ~1-3 units each)
  • Retention data: one video per API call (plan batch collection accordingly)
  • Data availability: typically 2-3 day delay from YouTube
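Since retention is one video per call and the default quota is around 200 queries/day, batch collection can rotate through the catalog, one chunk per day. A minimal sketch (the per-day budget of 150 is an assumed number leaving headroom for other queries):

```python
# Rotate retention collection through the video list so daily API usage
# stays under quota: one chunk per day, wrapping around.
def daily_chunk(video_ids, day_index, per_day=150):
    """Return the slice of videos to refresh on a given day."""
    if not video_ids:
        return []
    chunks = [video_ids[i:i + per_day] for i in range(0, len(video_ids), per_day)]
    return chunks[day_index % len(chunks)]

videos = [f"vid_{n}" for n in range(320)]  # illustrative catalog size
```

With 320 videos and 150 calls/day, the full catalog refreshes roughly every three days, which fits comfortably inside YouTube's own 2-3 day data delay.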

Key Decisions

| Decision | Choice | Rationale |
| --- | --- | --- |
| Database | SQLite | Self-contained, portable, perfect for a read-heavy analytics workload. No server process needed. |
| Dashboard | Streamlit | Python-native, fast to build, interactive. Travis can leverage data analyst skills directly. |
| API approach | YouTube Analytics API (targeted queries) | Real-time, flexible dimensions/metrics. Better than the Reporting API for our scale. |
| Hosting | Linode (same server) | Keeps everything centralized with existing PBS infrastructure. |

Sequencing & Priority

  1. Phase 1 (API Setup) → unblocks everything
  2. Phase 2 (Python Collector) → gets data flowing, enables ad-hoc analysis immediately
  3. Phase 3 (n8n Automation) → removes manual collection burden
  4. Phase 4 (Streamlit Dashboard) → gives Jenny self-service access to insights
  5. Phase 5 (Advanced Analysis) → ongoing value extraction

Relationship to Other PBS Projects

  • PBS Content Hub (Phase 5): Dashboard could eventually be a tab within the Content Hub
  • Authelia SSO: Will protect the Streamlit dashboard once rolled out
  • WordPress-to-MySQL sync: Could correlate website recipe traffic with YouTube performance
  • Instagram automation: Cross-platform analysis potential (YouTube + Instagram data in one place)

Next Step: Phase 1 — Set up Google Cloud project and enable YouTube APIs