Compare commits
220 Commits
7ddd625bf8
...
v1.0
| Author | SHA1 | Date | |
|---|---|---|---|
| 5970f555fa | |||
| 9b7a51ff33 | |||
| 2d39ea6e66 | |||
| c1e5482f55 | |||
| b2d7f6edaf | |||
| 10efa664df | |||
| 3db7c1d3ae | |||
| 72e17e900e | |||
| 7b9a17f395 | |||
| 0a396dd504 | |||
| c6e8144116 | |||
| 760d2daf7f | |||
| ca38b992eb | |||
| ee9c7b4ab2 | |||
| 703a7c435c | |||
| 02ba727d05 | |||
| 76591bc89e | |||
| e35e51d295 | |||
| d2fe637743 | |||
| e1831aab7d | |||
| a3ef5a5655 | |||
| 5f943ce733 | |||
| 9964a919c3 | |||
| c11434344a | |||
| bc356848ef | |||
| 047427432f | |||
| d0d02e9ebf | |||
| 68342606e3 | |||
| afae7f42a1 | |||
| 4dd2721e98 | |||
| 99afe82464 | |||
| 8c44df94c0 | |||
| 42905cc547 | |||
| ec64551881 | |||
| e274b8295a | |||
| 3df6776111 | |||
| a347869353 | |||
| 8b4e13702e | |||
| 8fa4f3fbdf | |||
| c6cae040f0 | |||
| addc1d4087 | |||
| 225133a074 | |||
| e903e1b738 | |||
| 0c4dc02852 | |||
| 33e4291def | |||
| cedbce128e | |||
| 107dae0e95 | |||
| 23833e2c5b | |||
| f2b6917f1f | |||
| b57a8d3c65 | |||
| ac65e26eab | |||
| 6efa75dfe6 | |||
| de61e7653f | |||
| 98aa04256b | |||
| 5f81c51979 | |||
| 361b532766 | |||
| 9ef96661fc | |||
| 9375abded5 | |||
| 74ecdf238a | |||
| b85987e179 | |||
| 37d08c63b8 | |||
| 1482e96051 | |||
| cd6030a760 | |||
| 6378015726 | |||
| 430793cd09 | |||
| b270ed03ae | |||
| 1dde5f7b08 | |||
| a841c6f6a1 | |||
| 2045ccebb5 | |||
| efb4c8384d | |||
| 75fd042d74 | |||
| e776ef53ac | |||
| f996b38fa5 | |||
| 6d8ae3e811 | |||
| 376773a0cc | |||
| aae10c4d9d | |||
| 8730af146d | |||
| 7716ee0bff | |||
| 97e897c240 | |||
| c3762f189c | |||
| 078716754c | |||
| e43eae5afd | |||
| b537b5ef16 | |||
| acc591ff1e | |||
| e054997bb1 | |||
| e5414befa7 | |||
| 86926898ce | |||
| b1177540a1 | |||
| f604fcc531 | |||
| b7aec2b0ea | |||
| 1446dd176d | |||
| c215024ef2 | |||
| 17ef42e548 | |||
| 7e4a91bb5e | |||
| 436549641f | |||
| 3e78a54388 | |||
| 71998c450e | |||
| 2a00384a55 | |||
| 8372aa7278 | |||
| 7b5a939271 | |||
| 2fa1dff4b7 | |||
| 31fb275ee3 | |||
| 8a0f6e71e8 | |||
| 9093059d05 | |||
| 8a13444b16 | |||
| 3468fdc2ea | |||
| 09a4f9036f | |||
| 97fccd073b | |||
| 94befb61c5 | |||
| 12f5953146 | |||
| 5b0441c34b | |||
| d2b919cd66 | |||
| 062937ec3c | |||
| 2a00795cc2 | |||
| c990f29645 | |||
| 8a423b2a29 | |||
| d96f459104 | |||
| 162a4de64e | |||
| 6684780d23 | |||
| c12f1b4371 | |||
| 01d6bd0164 | |||
| 12cbc24074 | |||
| 0658713f42 | |||
| b2ae1a9f70 | |||
| eff416c34e | |||
| 524c9c50a0 | |||
| 2ab74d922a | |||
| d520e2af98 | |||
| 8fe84a30f6 | |||
| dc330b87b9 | |||
| 7ccc934f71 | |||
| a3dbe04a57 | |||
| a65c4a461c | |||
| 15704a0782 | |||
| 6ec47256d0 | |||
| 2572664e26 | |||
| 17bd4702b2 | |||
| 53cb5c2ea5 | |||
| 0866dda8b3 | |||
| 5ccb2e73cd | |||
| 2a8d7c7972 | |||
| e7a8c17be4 | |||
| cc799f7368 | |||
| 262a70dbf3 | |||
| ca444e9cb0 | |||
| 738af5415b | |||
| 2b14a8a417 | |||
| a154b25415 | |||
| eb273efe61 | |||
| a9001c79e1 | |||
| eec8f2417e | |||
| f5835b5a97 | |||
| 64e3f9eea8 | |||
| 4f01bf0419 | |||
| 6948891677 | |||
| f1f33e2fe4 | |||
| e20d0689e8 | |||
| fcdac6f3bb | |||
| 5fc1f1532f | |||
| 24277e0104 | |||
| 4e99b77492 | |||
| b6815c490a | |||
| 29c90ddfff | |||
| 3fe08b9c67 | |||
| f9bc9cf9c9 | |||
| 249528bb5c | |||
| bd0e1a9050 | |||
| e2ac4495fd | |||
| f3b48525e2 | |||
| 55319461e5 | |||
| 531ddb0467 | |||
| d11c5acb77 | |||
| f63f4e5f10 | |||
| 23c58e20ae | |||
| 207c4b67da | |||
| 772205d3df | |||
| b6de100a17 | |||
| 5310568631 | |||
| 4b33f17b4b | |||
| 64783e764d | |||
| 8ac5207a11 | |||
| 090a57f4dd | |||
| c1a0324a03 | |||
| cade2b1866 | |||
| 6e263cf30b | |||
| 9d1e8960fc | |||
| 0ede7fe071 | |||
| eb4187c559 | |||
| 63cd465189 | |||
| f93e45b827 | |||
| 075e1fba85 | |||
| a4c527ce5b | |||
| 6d60820800 | |||
| 3772f83d11 | |||
| f4894759d7 | |||
| 3a58705635 | |||
| 2e0e842525 | |||
| 14b472ea60 | |||
| c767f59b26 | |||
| cc71c80df7 | |||
| 6248b32ce2 | |||
| 07a3b204bf | |||
| 87bdc0245a | |||
| 8b8462fd58 | |||
| 36bede42d9 | |||
| 4bec0dd32c | |||
| 4961ddc349 | |||
| 45229a3f04 | |||
| c9151da643 | |||
| 18c8539646 | |||
| 6d8f2fa4e0 | |||
| 1f6d92b1a8 | |||
| 2ae9479943 | |||
| dd44fad294 | |||
| 5ea71023b5 | |||
| 37cb2c9ff4 | |||
| 82a98f84bd | |||
| 8b4adf4a63 | |||
| a6adea5a7d | |||
| 47be4d9586 |
5
.gitignore
vendored
@@ -10,4 +10,7 @@ __pycache__/
|
||||
node_modules/
|
||||
dist/
|
||||
|
||||
*.sh
|
||||
helper
|
||||
db
|
||||
report/build
|
||||
.DS_Store
|
||||
19
Dockerfile
Normal file
@@ -0,0 +1,19 @@
|
||||
# Use slim to reduce size
|
||||
FROM python:3.13-slim
|
||||
|
||||
# Prevent Python from buffering stdout
|
||||
ENV PYTHONUNBUFFERED=1
|
||||
|
||||
# System deps required for psycopg2 + torch
|
||||
RUN apt-get update && apt-get install -y \
|
||||
build-essential \
|
||||
libpq-dev \
|
||||
gcc \
|
||||
curl \
|
||||
&& rm -rf /var/lib/apt/lists/*
|
||||
|
||||
WORKDIR /app
|
||||
COPY requirements.txt .
|
||||
RUN pip install --no-cache-dir -r requirements.txt
|
||||
COPY . .
|
||||
CMD ["python", "main.py"]
|
||||
60
README.md
@@ -1,29 +1,49 @@
|
||||
# crosspost
|
||||
**crosspost** is a browser-based tool designed to support *digital ethnography*, the study of how people interact, communicate, and form culture in online spaces such as forums, social media platforms, and comment-driven communities.
|
||||
A web-based analytics platform for exploring online communities. Built as a final year CS project at UCC, crosspost ingests data from Reddit, YouTube, and Boards.ie, runs NLP analysis on it (emotion detection, topic classification, named entity recognition, stance markers), and surfaces the results through an interactive dashboard.
|
||||
The motivating use case is digital ethnography — studying how people talk, what they talk about, and how culture forms in online spaces. The included dataset is centred on Cork, Ireland.
|
||||
|
||||
The project aims to make it easier for students, researchers, and journalists to collect, organise, and explore online discourse in a structured and ethical way, without requiring deep technical expertise.
|
||||
## What it does
|
||||
- Fetch posts and comments from Reddit, YouTube, and Boards.ie (or upload your own .jsonl file)
|
||||
- Normalise everything into a unified schema regardless of source
|
||||
- Run NLP analysis asynchronously in the background via Celery workers
|
||||
- Explore results through a tabbed dashboard: temporal patterns, word clouds, emotion breakdowns, user activity, interaction graphs, topic clusters, and more
|
||||
- Multi-user support — each user has their own datasets, isolated from everyone else
|
||||
|
||||
By combining data ingestion, analysis, and visualisation in a single system, crosspost turns raw online interactions into meaningful insights about how conversations emerge, evolve, and spread across platforms.
|
||||
# Prerequisites
|
||||
- Docker & Docker Compose
|
||||
- A Reddit App (client id & secret)
|
||||
- YouTube Data v3 API Key
|
||||
|
||||
## Goals for this project
|
||||
- Collect data ethically: enable users to link/upload text, images, and interaction data (messages etc) from specified online communities. Potentially, an automated method for importing (using APIs or scraping techniques) could be included as well.
|
||||
- Organise content: Store gathered material in a structured database with tagging for themes, dates, and sources.
|
||||
- Analyse patterns: Use natural language processing (NLP) to detect frequent keywords, sentiment, and interaction networks.
|
||||
- Visualise insights: Present findings as charts, timelines, and network diagrams to reveal how conversations and topics evolve.
|
||||
- Have clearly stated and explained ethical and privacy guidelines for users. The student will design the architecture, implement data pipelines, integrate basic NLP models, and create an interactive dashboard.
|
||||
# Setup
|
||||
1) **Clone the Repo**
|
||||
```
|
||||
git clone https://github.com/your-username/crosspost.git
|
||||
cd crosspost
|
||||
```
|
||||
|
||||
Beyond programming, the project involves applying ethical research principles, handling data responsibly, and designing for non-technical users. By the end, the project will demonstrate how computer science can bridge technology and social research — turning raw online interactions into meaningful cultural insights.
|
||||
2) **Configure Environment Variables**
|
||||
```
|
||||
cp example.env .env
|
||||
```
|
||||
Fill in each required empty environment variable. Some are already filled in; these are sensible defaults that usually don't need to be changed.
|
||||
|
||||
## Scope
|
||||
3) **Start everything**
|
||||
```
|
||||
docker compose up -d
|
||||
```
|
||||
|
||||
This project focuses on:
|
||||
- Designing a modular data ingestion pipeline
|
||||
- Implementing backend data processing and storage
|
||||
- Integrating lightweight NLP-based analysis
|
||||
- Building a simple, accessible frontend for exploration and visualisation
|
||||
This starts:
|
||||
- `crosspost_db` — PostgreSQL on port 5432
|
||||
- `crosspost_redis` — Redis on port 6379
|
||||
- `crosspost_flask` — Flask API on port 5000
|
||||
- `crosspost_worker` — Celery worker for background NLP/fetching tasks
|
||||
- `crosspost_frontend` — Vite dev server on port 5173
|
||||
|
||||
# Requirements
|
||||
# Data Format for Manual Uploads
|
||||
If you want to upload your own data rather than fetch it via the connectors, the expected format is newline-delimited JSON (.jsonl) where each line is a post object:
|
||||
```json
|
||||
{"id": "abc123", "author": "username", "title": "Post title", "content": "Post body", "url": "https://...", "timestamp": 1700000000.0, "source": "reddit", "comments": []}
|
||||
```
|
||||
|
||||
- **Python** ≥ 3.9
|
||||
- **Python packages** listed in `requirements.txt`
|
||||
- npm ≥ version 11
|
||||
# Notes
|
||||
- **GPU support**: The Celery worker is configured with `--pool=solo` to avoid memory conflicts when multiple NLP models are loaded. If you have an NVIDIA GPU, uncomment the deploy.resources block in docker-compose.yml and make sure the NVIDIA Container Toolkit is installed.
|
||||
@@ -1,178 +0,0 @@
|
||||
import requests
|
||||
import logging
|
||||
import time
|
||||
|
||||
from dto.post import Post
|
||||
from dto.user import User
|
||||
from dto.comment import Comment
|
||||
|
||||
logger = logging.getLogger(__name__)
|
||||
|
||||
class RedditAPI:
    """Connector for Reddit's public JSON endpoints (no OAuth).

    Fetches posts, comments, and user profiles and maps them onto the
    project's Post/Comment/User DTOs. All network access funnels through
    `_fetch_post_overviews`, which retries with exponential backoff.
    """

    def __init__(self):
        self.url = "https://www.reddit.com/"
        self.source_name = "Reddit"

    # Public Methods #
    def search_new_subreddit_posts(self, search: str, subreddit: str, limit: int) -> list[Post]:
        """Search `subreddit` for `search`, newest first, up to `limit` posts."""
        params = {
            'q': search,
            'limit': limit,
            'restrict_sr': 'on',
            'sort': 'new'
        }

        logger.info(f"Searching subreddit '{subreddit}' for '{search}' with limit {limit}")
        url = f"r/{subreddit}/search.json"
        posts = []
        after = None

        while len(posts) < limit:
            # Reddit caps a single listing request at 100 items.
            params['limit'] = min(100, limit - len(posts))
            # Advance the listing cursor; without this the same first page
            # would be fetched over and over, duplicating results.
            params['after'] = after

            data = self._fetch_post_overviews(url, params)
            batch_posts = self._parse_posts(data)

            logger.debug(f"Fetched {len(batch_posts)} posts from search in subreddit {subreddit}")

            if not batch_posts:
                break

            posts.extend(batch_posts)
            after = data.get('data', {}).get('after')
            if not after:
                break

        return posts

    def get_new_subreddit_posts(self, subreddit: str, limit: int = 10) -> list[Post]:
        """Fetch the newest posts from `subreddit`, paging until `limit` is reached."""
        posts = []
        after = None
        url = f"r/{subreddit}/new.json"

        logger.info(f"Fetching new posts from subreddit: {subreddit}")

        while len(posts) < limit:
            batch_limit = min(100, limit - len(posts))
            params = {
                'limit': batch_limit,
                'after': after
            }

            data = self._fetch_post_overviews(url, params)
            batch_posts = self._parse_posts(data)

            logger.debug(f"Fetched {len(batch_posts)} new posts from subreddit {subreddit}")

            if not batch_posts:
                break

            posts.extend(batch_posts)
            after = data.get('data', {}).get('after')
            if not after:
                break

        return posts

    def get_user(self, username: str) -> User:
        """Fetch a user's public profile and map it onto the User DTO."""
        data = self._fetch_post_overviews(f"user/{username}/about.json", {})
        return self._parse_user(data)

    ## Private Methods ##
    def _parse_posts(self, data) -> list[Post]:
        """Map a Reddit listing payload onto Post DTOs.

        Tolerates an empty/failed payload ({}) by returning [] instead of
        raising KeyError, so callers can treat fetch failures as "no data".
        """
        posts = []
        children = data.get('data', {}).get('children', []) if data else []
        total_num_posts = len(children)

        for current_index, item in enumerate(children, start=1):
            logger.debug(f"Parsing post {current_index} of {total_num_posts}")

            post_data = item['data']
            post = Post(
                id=post_data['id'],
                author=post_data['author'],
                title=post_data['title'],
                content=post_data.get('selftext', ''),
                url=post_data['url'],
                timestamp=post_data['created_utc'],
                source=self.source_name,
                comments=self._get_post_comments(post_data['id']))
            # Reddit-specific extras not covered by the generic Post schema.
            post.subreddit = post_data['subreddit']
            post.upvotes = post_data['ups']

            posts.append(post)
        return posts

    def _get_post_comments(self, post_id: str) -> list[Comment]:
        """Fetch and flatten the full comment tree of a post into Comment DTOs."""
        comments: list[Comment] = []
        url = f"comments/{post_id}.json"

        data = self._fetch_post_overviews(url, {})
        # The comments endpoint returns [post_listing, comment_listing];
        # anything shorter means the fetch failed or there are no comments.
        if len(data) < 2:
            return comments

        comment_data = data[1]['data']['children']

        def _parse_comment_tree(items, parent_id=None):
            for item in items:
                # Skip non-comment entries ('more' stubs etc.).
                if item['kind'] != 't1':
                    continue

                comment_info = item['data']
                comment = Comment(
                    id=comment_info['id'],
                    post_id=post_id,
                    author=comment_info['author'],
                    content=comment_info.get('body', ''),
                    timestamp=comment_info['created_utc'],
                    reply_to=parent_id or comment_info.get('parent_id', None),
                    source=self.source_name
                )

                comments.append(comment)

                # Process replies recursively.
                replies = comment_info.get('replies')
                if replies and isinstance(replies, dict):
                    reply_items = replies.get('data', {}).get('children', [])
                    _parse_comment_tree(reply_items, parent_id=comment.id)

        _parse_comment_tree(comment_data)
        return comments

    def _parse_user(self, data) -> User:
        """Map a user 'about' payload onto the User DTO."""
        user_data = data['data']
        user = User(
            username=user_data['name'],
            created_utc=user_data['created_utc'])
        user.karma = user_data['total_karma']
        return user

    def _fetch_post_overviews(self, endpoint: str, params: dict) -> dict:
        """GET a Reddit JSON endpoint with retry/backoff; return {} on failure.

        Retries on 429 (honouring Retry-After), 500, and transport errors.
        """
        url = f"{self.url}{endpoint}"
        max_retries = 15
        backoff = 1  # seconds

        for attempt in range(max_retries):
            try:
                response = requests.get(
                    url,
                    headers={'User-agent': 'python:ethnography-college-project:0.1 (by /u/ThisBirchWood)'},
                    params=params)

                if response.status_code == 429:
                    # Retry-After is a *string* header (and may be an HTTP
                    # date); coerce to a number or fall back to our backoff,
                    # otherwise time.sleep() raises TypeError.
                    try:
                        wait_time = float(response.headers.get("Retry-After", backoff))
                    except (TypeError, ValueError):
                        wait_time = backoff

                    logger.warning(f"Rate limited by Reddit API. Retrying in {wait_time} seconds...")

                    time.sleep(wait_time)
                    backoff *= 2
                    continue

                if response.status_code == 500:
                    logger.warning("Server error from Reddit API. Retrying...")
                    time.sleep(backoff)
                    backoff *= 2
                    continue

                response.raise_for_status()
                return response.json()
            except requests.RequestException as e:
                # Transient transport errors get the same backoff treatment
                # instead of aborting the whole retry loop on first failure.
                logger.error(f"Error fetching data from Reddit API: {e}")
                time.sleep(backoff)
                backoff *= 2

        logger.error(f"Giving up on {url} after {max_retries} attempts")
        return {}
|
||||
@@ -1,84 +0,0 @@
|
||||
import os
|
||||
import datetime
|
||||
|
||||
from dotenv import load_dotenv
|
||||
from googleapiclient.discovery import build
|
||||
from googleapiclient.errors import HttpError
|
||||
from dto.post import Post
|
||||
from dto.comment import Comment
|
||||
|
||||
load_dotenv()
|
||||
|
||||
API_KEY = os.getenv("YOUTUBE_API_KEY")
|
||||
|
||||
class YouTubeAPI:
|
||||
def __init__(self):
|
||||
self.youtube = build('youtube', 'v3', developerKey=API_KEY)
|
||||
|
||||
def search_videos(self, query, limit):
|
||||
request = self.youtube.search().list(
|
||||
q=query,
|
||||
part='snippet',
|
||||
type='video',
|
||||
maxResults=limit
|
||||
)
|
||||
response = request.execute()
|
||||
return response.get('items', [])
|
||||
|
||||
def get_video_comments(self, video_id, limit):
|
||||
request = self.youtube.commentThreads().list(
|
||||
part='snippet',
|
||||
videoId=video_id,
|
||||
maxResults=limit,
|
||||
textFormat='plainText'
|
||||
)
|
||||
|
||||
try:
|
||||
response = request.execute()
|
||||
except HttpError as e:
|
||||
print(f"Error fetching comments for video {video_id}: {e}")
|
||||
return []
|
||||
return response.get('items', [])
|
||||
|
||||
def fetch_videos(self, query, video_limit, comment_limit) -> list[Post]:
|
||||
videos = self.search_videos(query, video_limit)
|
||||
posts = []
|
||||
|
||||
for video in videos:
|
||||
video_id = video['id']['videoId']
|
||||
snippet = video['snippet']
|
||||
title = snippet['title']
|
||||
description = snippet['description']
|
||||
published_at = datetime.datetime.strptime(snippet['publishedAt'], "%Y-%m-%dT%H:%M:%SZ").timestamp()
|
||||
channel_title = snippet['channelTitle']
|
||||
|
||||
comments = []
|
||||
comments_data = self.get_video_comments(video_id, comment_limit)
|
||||
for comment_thread in comments_data:
|
||||
comment_snippet = comment_thread['snippet']['topLevelComment']['snippet']
|
||||
comment = Comment(
|
||||
id=comment_thread['id'],
|
||||
post_id=video_id,
|
||||
content=comment_snippet['textDisplay'],
|
||||
author=comment_snippet['authorDisplayName'],
|
||||
timestamp=datetime.datetime.strptime(comment_snippet['publishedAt'], "%Y-%m-%dT%H:%M:%SZ").timestamp(),
|
||||
reply_to=None,
|
||||
source="YouTube"
|
||||
)
|
||||
|
||||
comments.append(comment)
|
||||
|
||||
post = Post(
|
||||
id=video_id,
|
||||
content=f"{title}\n\n{description}",
|
||||
author=channel_title,
|
||||
timestamp=published_at,
|
||||
url=f"https://www.youtube.com/watch?v={video_id}",
|
||||
title=title,
|
||||
source="YouTube",
|
||||
comments=comments
|
||||
)
|
||||
|
||||
posts.append(post)
|
||||
|
||||
return posts
|
||||
@@ -1,43 +0,0 @@
|
||||
import json
|
||||
import logging
|
||||
from connectors.reddit_api import RedditAPI
|
||||
from connectors.boards_api import BoardsAPI
|
||||
from connectors.youtube_api import YouTubeAPI
|
||||
|
||||
posts_file = 'posts_test.jsonl'
|
||||
|
||||
reddit_connector = RedditAPI()
|
||||
boards_connector = BoardsAPI()
|
||||
youtube_connector = YouTubeAPI()
|
||||
|
||||
logging.basicConfig(level=logging.DEBUG)
|
||||
logging.getLogger("urllib3").setLevel(logging.WARNING)
|
||||
|
||||
def remove_empty_posts(posts):
    """Return only the posts whose content is not blank (empty/whitespace)."""
    kept = []
    for candidate in posts:
        if candidate.content.strip() != "":
            kept.append(candidate)
    return kept
|
||||
|
||||
def save_to_jsonl(filename, posts):
    """Append each post to `filename` as one JSON line (via its to_dict())."""
    with open(filename, 'a', encoding='utf-8') as sink:
        sink.writelines(json.dumps(post.to_dict()) + '\n' for post in posts)
|
||||
|
||||
|
||||
def main():
    """Fetch Cork-centred posts from each connector and append them to posts_file.

    Each batch is written as soon as it is fetched, so a failure in a later
    connector does not lose the data already collected.
    """
    save_to_jsonl(posts_file, boards_connector.get_new_category_posts('cork-city', 1200, 1200))

    cork_posts = remove_empty_posts(reddit_connector.get_new_subreddit_posts('cork', 1200))
    save_to_jsonl(posts_file, cork_posts)

    cork_in_ireland = remove_empty_posts(reddit_connector.search_new_subreddit_posts('cork', 'ireland', 1200))
    save_to_jsonl(posts_file, cork_in_ireland)

    save_to_jsonl(posts_file, youtube_connector.fetch_videos('cork city', 1200, 1200))
|
||||
|
||||
if __name__ == "__main__":
|
||||
main()
|
||||
138
db/database.py
@@ -1,138 +0,0 @@
|
||||
import os
|
||||
import psycopg2
|
||||
import pandas as pd
|
||||
from psycopg2.extras import RealDictCursor
|
||||
from psycopg2.extras import execute_batch, Json
|
||||
|
||||
|
||||
class PostgresConnector:
    """
    Simple PostgreSQL connector (single connection).

    Connection parameters come from POSTGRES_* environment variables with
    local-development defaults. Not thread-safe: one shared connection.
    """

    def __init__(self):
        self.connection = psycopg2.connect(
            host=os.getenv("POSTGRES_HOST", "localhost"),
            port=os.getenv("POSTGRES_PORT", 5432),
            user=os.getenv("POSTGRES_USER", "postgres"),
            password=os.getenv("POSTGRES_PASSWORD", "postgres"),
            database=os.getenv("POSTGRES_DB", "postgres"),
        )
        # Explicit commits so multi-statement work is transactional.
        self.connection.autocommit = False

    def execute(self, query, params=None, fetch=False) -> list | None:
        """Run one statement, commit, and return rows when fetch=True.

        Committing on the fetch path matters: `save_dataset_info` runs an
        INSERT ... RETURNING with fetch=True, which previously was never
        committed and so was rolled back when the connection closed.
        """
        with self.connection.cursor(cursor_factory=RealDictCursor) as cursor:
            cursor.execute(query, params)
            rows = cursor.fetchall() if fetch else None
        self.connection.commit()
        return rows

    def executemany(self, query, param_list) -> None:
        """Run the statement once per params tuple, then commit."""
        with self.connection.cursor(cursor_factory=RealDictCursor) as cursor:
            cursor.executemany(query, param_list)
        self.connection.commit()

    ## User Management Methods
    def save_user(self, username, email, password_hash):
        """Insert a new user row (password is stored pre-hashed)."""
        query = """
        INSERT INTO users (username, email, password_hash)
        VALUES (%s, %s, %s)
        """
        self.execute(query, (username, email, password_hash))

    def get_user_by_username(self, username) -> dict | None:
        """Return the user row for `username`, or None if absent."""
        query = "SELECT id, username, email, password_hash FROM users WHERE username = %s"
        result = self.execute(query, (username,), fetch=True)
        return result[0] if result else None

    def get_user_by_email(self, email) -> dict | None:
        """Return the user row for `email`, or None if absent."""
        query = "SELECT id, username, email, password_hash FROM users WHERE email = %s"
        result = self.execute(query, (email,), fetch=True)
        return result[0] if result else None

    # Dataset Management Methods
    def save_dataset_info(self, user_id: int, dataset_name: str, topics: dict) -> int | None:
        """Insert dataset metadata and return the new dataset id."""
        query = """
        INSERT INTO datasets (user_id, name, topics)
        VALUES (%s, %s, %s)
        RETURNING id
        """
        result = self.execute(query, (user_id, dataset_name, Json(topics)), fetch=True)
        return result[0]["id"] if result else None

    def save_dataset_content(self, dataset_id: int, event_data: pd.DataFrame):
        """Bulk-insert one event row per DataFrame row for `dataset_id`.

        Optional analysis columns (topic, emotions, NER) are inserted as
        NULL when missing; ner_entities is serialised to JSONB.
        """
        query = """
        INSERT INTO events (
            dataset_id,
            type,
            parent_id,
            author,
            content,
            timestamp,
            date,
            dt,
            hour,
            weekday,
            reply_to,
            source,
            topic,
            topic_confidence,
            ner_entities,
            emotion_anger,
            emotion_disgust,
            emotion_fear,
            emotion_joy,
            emotion_sadness
        )
        VALUES (
            %s, %s, %s, %s, %s,
            %s, %s, %s, %s, %s,
            %s, %s, %s, %s, %s,
            %s, %s, %s, %s, %s
        )
        """

        values = []

        for _, row in event_data.iterrows():
            values.append((
                dataset_id,
                row["type"],
                row["parent_id"],
                row["author"],
                row["content"],
                row["timestamp"],
                row["date"],
                row["dt"],
                row["hour"],
                row["weekday"],
                row.get("reply_to"),
                row["source"],
                row.get("topic"),
                row.get("topic_confidence"),
                Json(row["ner_entities"]) if row.get("ner_entities") else None,
                row.get("emotion_anger"),
                row.get("emotion_disgust"),
                row.get("emotion_fear"),
                row.get("emotion_joy"),
                row.get("emotion_sadness"),
            ))

        # execute_batch groups rows into fewer round-trips than executemany.
        with self.connection.cursor(cursor_factory=RealDictCursor) as cursor:
            execute_batch(cursor, query, values)
        self.connection.commit()

    def get_dataset_content(self, dataset_id: int) -> pd.DataFrame:
        """Return all event rows of a dataset as a DataFrame (empty if none)."""
        query = "SELECT * FROM events WHERE dataset_id = %s"
        result = self.execute(query, (dataset_id,), fetch=True)
        return pd.DataFrame(result)

    def get_dataset_info(self, dataset_id: int) -> dict | None:
        """Return the dataset metadata row, or None if absent."""
        query = "SELECT * FROM datasets WHERE id = %s"
        result = self.execute(query, (dataset_id,), fetch=True)
        return result[0] if result else None

    def close(self):
        """Close the underlying connection (safe to call once)."""
        if self.connection:
            self.connection.close()
|
||||
72
docker-compose.dev.yml
Normal file
@@ -0,0 +1,72 @@
|
||||
services:
|
||||
postgres:
|
||||
image: postgres:16
|
||||
container_name: crosspost_db
|
||||
restart: unless-stopped
|
||||
env_file:
|
||||
- .env
|
||||
ports:
|
||||
- "5432:5432"
|
||||
volumes:
|
||||
- ${POSTGRES_DIR}:/var/lib/postgresql/data
|
||||
- ./server/db/schema.sql:/docker-entrypoint-initdb.d/schema.sql
|
||||
|
||||
redis:
|
||||
image: redis:7
|
||||
container_name: crosspost_redis
|
||||
restart: unless-stopped
|
||||
ports:
|
||||
- "6379:6379"
|
||||
|
||||
backend:
|
||||
build: .
|
||||
container_name: crosspost_flask
|
||||
volumes:
|
||||
- .:/app
|
||||
- model_cache:/models
|
||||
env_file:
|
||||
- .env
|
||||
ports:
|
||||
- "5000:5000"
|
||||
command: gunicorn server.app:app --bind 0.0.0.0:5000 --workers 2 --threads 4
|
||||
depends_on:
|
||||
- postgres
|
||||
- redis
|
||||
|
||||
worker:
|
||||
build: .
|
||||
volumes:
|
||||
- .:/app
|
||||
- model_cache:/models
|
||||
container_name: crosspost_worker
|
||||
env_file:
|
||||
- .env
|
||||
command: >
|
||||
celery -A server.queue.celery_app.celery worker
|
||||
--loglevel=debug
|
||||
--pool=solo
|
||||
depends_on:
|
||||
- postgres
|
||||
- redis
|
||||
deploy:
|
||||
resources:
|
||||
reservations:
|
||||
devices:
|
||||
- driver: nvidia
|
||||
count: 1
|
||||
capabilities: [gpu]
|
||||
|
||||
frontend:
|
||||
build:
|
||||
context: ./frontend
|
||||
container_name: crosspost_frontend
|
||||
volumes:
|
||||
- ./frontend:/app
|
||||
- /app/node_modules
|
||||
ports:
|
||||
- "5173:5173"
|
||||
depends_on:
|
||||
- backend
|
||||
|
||||
volumes:
|
||||
model_cache:
|
||||
@@ -1,15 +1,69 @@
|
||||
services:
|
||||
postgres:
|
||||
image: postgres:16
|
||||
container_name: postgres_db
|
||||
container_name: crosspost_db
|
||||
restart: unless-stopped
|
||||
env_file:
|
||||
- .env
|
||||
ports:
|
||||
- "5432:5432"
|
||||
volumes:
|
||||
- ./db/postgres_vol:/var/lib/postgresql/data
|
||||
- ./db/schema.sql:/docker-entrypoint-initdb.d/schema.sql
|
||||
- ${POSTGRES_DIR}:/var/lib/postgresql/data
|
||||
- ./server/db/schema.sql:/docker-entrypoint-initdb.d/schema.sql
|
||||
|
||||
redis:
|
||||
image: redis:7
|
||||
container_name: crosspost_redis
|
||||
restart: unless-stopped
|
||||
ports:
|
||||
- "6379:6379"
|
||||
|
||||
backend:
|
||||
build: .
|
||||
container_name: crosspost_flask
|
||||
volumes:
|
||||
- model_cache:/models
|
||||
env_file:
|
||||
- .env
|
||||
ports:
|
||||
- "5000:5000"
|
||||
command: flask --app server.app run --host=0.0.0.0
|
||||
depends_on:
|
||||
- postgres
|
||||
- redis
|
||||
|
||||
worker:
|
||||
build: .
|
||||
volumes:
|
||||
- model_cache:/models
|
||||
container_name: crosspost_worker
|
||||
env_file:
|
||||
- .env
|
||||
command: >
|
||||
celery -A server.queue.celery_app.celery worker
|
||||
--loglevel=warning
|
||||
--pool=solo
|
||||
depends_on:
|
||||
- postgres
|
||||
- redis
|
||||
deploy:
|
||||
resources:
|
||||
reservations:
|
||||
devices:
|
||||
- driver: nvidia
|
||||
count: 1
|
||||
capabilities: [gpu]
|
||||
|
||||
frontend:
|
||||
build:
|
||||
context: ./frontend
|
||||
container_name: crosspost_frontend
|
||||
volumes:
|
||||
- /app/node_modules
|
||||
ports:
|
||||
- "5173:5173"
|
||||
depends_on:
|
||||
- backend
|
||||
|
||||
volumes:
|
||||
postgres_data:
|
||||
model_cache:
|
||||
@@ -1,8 +0,0 @@
|
||||
# Generic User Data Transfer Object for social media platforms
|
||||
class User:
    """Platform-agnostic user DTO shared by the social-media connectors."""

    def __init__(self, username: str, created_utc: int):
        self.username = username        # handle on the source platform
        self.created_utc = created_utc  # account creation time (unix epoch)

        # Optional, platform-specific extras (populated by connectors).
        self.karma = None
|
||||
30
example.env
Normal file
@@ -0,0 +1,30 @@
|
||||
# API Keys
|
||||
YOUTUBE_API_KEY=
|
||||
REDDIT_CLIENT_ID=
|
||||
REDDIT_CLIENT_SECRET=
|
||||
|
||||
# Database
|
||||
POSTGRES_USER=postgres
|
||||
POSTGRES_PASSWORD=postgres
|
||||
POSTGRES_DB=mydatabase
|
||||
POSTGRES_HOST=postgres
|
||||
POSTGRES_PORT=5432
|
||||
POSTGRES_DIR=./db
|
||||
|
||||
# JWT
|
||||
JWT_SECRET_KEY=
|
||||
JWT_ACCESS_TOKEN_EXPIRES=28800
|
||||
|
||||
# Models
|
||||
HF_HOME=/models/huggingface
|
||||
TRANSFORMERS_CACHE=/models/huggingface
|
||||
TORCH_HOME=/models/torch
|
||||
|
||||
# URLs
|
||||
FRONTEND_URL=http://localhost:5173
|
||||
BACKEND_URL=http://backend:5000
|
||||
REDIS_URL=redis://redis:6379/0
|
||||
|
||||
# API & Scraping
|
||||
MAX_FETCH_LIMIT=1000
|
||||
13
frontend/Dockerfile
Normal file
@@ -0,0 +1,13 @@
|
||||
FROM node:20-alpine
|
||||
|
||||
WORKDIR /app
|
||||
|
||||
COPY package.json package-lock.json* ./
|
||||
RUN npm install
|
||||
|
||||
# Copy rest of the app
|
||||
COPY . .
|
||||
|
||||
EXPOSE 5173
|
||||
|
||||
CMD ["npm", "run", "dev", "--", "--host", "0.0.0.0"]
|
||||
@@ -2,7 +2,7 @@
|
||||
<html lang="en">
|
||||
<head>
|
||||
<meta charset="UTF-8" />
|
||||
<link rel="icon" type="image/svg+xml" href="/vite.svg" />
|
||||
<link rel="icon" type="image/png" href="/icon.png" />
|
||||
<meta name="viewport" content="width=device-width, initial-scale=1.0" />
|
||||
<title>frontend</title>
|
||||
</head>
|
||||
|
||||
BIN
frontend/public/icon.png
Normal file
|
After Width: | Height: | Size: 19 KiB |
@@ -1,12 +1,34 @@
|
||||
import { Routes, Route } from "react-router-dom";
|
||||
import { useEffect } from "react";
|
||||
import { Navigate, Route, Routes, useLocation } from "react-router-dom";
|
||||
import AppLayout from "./components/AppLayout";
|
||||
import DatasetsPage from "./pages/Datasets";
|
||||
import DatasetStatusPage from "./pages/DatasetStatus";
|
||||
import LoginPage from "./pages/Login";
|
||||
import UploadPage from "./pages/Upload";
|
||||
import AutoFetchPage from "./pages/AutoFetch";
|
||||
import StatPage from "./pages/Stats";
|
||||
import { getDocumentTitle } from "./utils/documentTitle";
|
||||
import DatasetEditPage from "./pages/DatasetEdit";
|
||||
|
||||
function App() {
|
||||
const location = useLocation();
|
||||
|
||||
useEffect(() => {
|
||||
document.title = getDocumentTitle(location.pathname);
|
||||
}, [location.pathname]);
|
||||
|
||||
return (
|
||||
<Routes>
|
||||
<Route path="/upload" element={<UploadPage />} />
|
||||
<Route path="/stats" element={<StatPage />} />
|
||||
<Route element={<AppLayout />}>
|
||||
<Route path="/" element={<Navigate to="/login" replace />} />
|
||||
<Route path="/login" element={<LoginPage />} />
|
||||
<Route path="/upload" element={<UploadPage />} />
|
||||
<Route path="/auto-fetch" element={<AutoFetchPage />} />
|
||||
<Route path="/datasets" element={<DatasetsPage />} />
|
||||
<Route path="/dataset/:datasetId/status" element={<DatasetStatusPage />} />
|
||||
<Route path="/dataset/:datasetId/stats" element={<StatPage />} />
|
||||
<Route path="/dataset/:datasetId/edit" element={<DatasetEditPage />} />
|
||||
</Route>
|
||||
</Routes>
|
||||
);
|
||||
}
|
||||
|
||||
135
frontend/src/components/AppLayout.tsx
Normal file
@@ -0,0 +1,135 @@
|
||||
import { useCallback, useEffect, useState } from "react";
|
||||
import axios from "axios";
|
||||
import { Outlet, useLocation, useNavigate } from "react-router-dom";
|
||||
import StatsStyling from "../styles/stats_styling";
|
||||
|
||||
const API_BASE_URL = import.meta.env.VITE_BACKEND_URL;
|
||||
|
||||
type ProfileResponse = {
|
||||
user?: Record<string, unknown>;
|
||||
};
|
||||
|
||||
const styles = StatsStyling;
|
||||
|
||||
const getUserLabel = (user: Record<string, unknown> | null) => {
|
||||
if (!user) {
|
||||
return "Signed in";
|
||||
}
|
||||
|
||||
const username = user.username;
|
||||
if (typeof username === "string" && username.length > 0) {
|
||||
return username;
|
||||
}
|
||||
|
||||
const email = user.email;
|
||||
if (typeof email === "string" && email.length > 0) {
|
||||
return email;
|
||||
}
|
||||
|
||||
return "Signed in";
|
||||
};
|
||||
|
||||
// Shared page chrome: brand header, auth status badge, and nav buttons,
// with an <Outlet /> slot for the routed page. Also owns client-side auth
// state derived from the "access_token" entry in localStorage.
const AppLayout = () => {
  const location = useLocation();
  const navigate = useNavigate();
  // Whether the stored token was accepted by the backend on the last check.
  const [isSignedIn, setIsSignedIn] = useState(false);
  // Profile payload returned by /profile, or null when signed out/unknown.
  const [currentUser, setCurrentUser] = useState<Record<
    string,
    unknown
  > | null>(null);

  // Validate the stored token against `${API_BASE_URL}/profile` and mirror
  // the result into component state and the axios default Authorization header.
  const syncAuthState = useCallback(async () => {
    const token = localStorage.getItem("access_token");

    if (!token) {
      // No token: ensure no stale Authorization header leaks into requests.
      setIsSignedIn(false);
      setCurrentUser(null);
      delete axios.defaults.headers.common.Authorization;
      return;
    }

    // Set the header before the probe so /profile itself is authenticated.
    axios.defaults.headers.common.Authorization = `Bearer ${token}`;

    try {
      const response = await axios.get<ProfileResponse>(
        `${API_BASE_URL}/profile`,
      );
      setIsSignedIn(true);
      setCurrentUser(response.data.user ?? null);
    } catch {
      // Token rejected (expired/invalid) or request failed: drop the token
      // and reset to the signed-out state.
      localStorage.removeItem("access_token");
      delete axios.defaults.headers.common.Authorization;
      setIsSignedIn(false);
      setCurrentUser(null);
    }
  }, []);

  // Re-validate auth on every route change (and on mount).
  useEffect(() => {
    void syncAuthState();
  }, [location.pathname, syncAuthState]);

  // Single auth button: signs out (clear token + state) when signed in,
  // otherwise navigates to the login page.
  const onAuthButtonClick = () => {
    if (isSignedIn) {
      localStorage.removeItem("access_token");
      delete axios.defaults.headers.common.Authorization;
      setIsSignedIn(false);
      setCurrentUser(null);
      navigate("/login", { replace: true });
      return;
    }

    navigate("/login");
  };

  return (
    <div style={styles.appShell}>
      <div style={{ ...styles.container, ...styles.appHeaderWrap }}>
        <div style={{ ...styles.card, ...styles.headerBar }}>
          <div style={styles.appHeaderBrandRow}>
            <span style={styles.appTitle}>CrossPost Analysis Engine</span>
            {/* Auth badge: green/red styling chosen by sign-in state. */}
            <span
              style={{
                ...styles.authStatusBadge,
                ...(isSignedIn
                  ? styles.authStatusSignedIn
                  : styles.authStatusSignedOut),
              }}
            >
              {isSignedIn
                ? `Signed in: ${getUserLabel(currentUser)}`
                : "Not signed in"}
            </span>
          </div>

          <div style={styles.controlsWrapped}>
            {/* Dataset navigation is only offered to signed-in users; the
                button is highlighted while the datasets page is active. */}
            {isSignedIn && (
              <button
                type="button"
                style={
                  location.pathname === "/datasets"
                    ? styles.buttonPrimary
                    : styles.buttonSecondary
                }
                onClick={() => navigate("/datasets")}
              >
                My datasets
              </button>
            )}

            <button
              type="button"
              style={isSignedIn ? styles.buttonSecondary : styles.buttonPrimary}
              onClick={onAuthButtonClick}
            >
              {isSignedIn ? "Sign out" : "Sign in"}
            </button>
          </div>
        </div>
      </div>

      {/* Routed page content renders below the header. */}
      <Outlet />
    </div>
  );
};

export default AppLayout;
|
||||
@@ -1,52 +1,27 @@
|
||||
import type { CSSProperties } from "react";
|
||||
import StatsStyling from "../styles/stats_styling";
|
||||
|
||||
const styles = StatsStyling;
|
||||
|
||||
const Card = (props: {
|
||||
label: string;
|
||||
value: string | number;
|
||||
sublabel?: string;
|
||||
rightSlot?: React.ReactNode;
|
||||
style?: CSSProperties
|
||||
style?: CSSProperties;
|
||||
}) => {
|
||||
return (
|
||||
<div style={{
|
||||
background: "rgba(255,255,255,0.85)",
|
||||
border: "1px solid rgba(15,23,42,0.08)",
|
||||
borderRadius: 16,
|
||||
padding: 14,
|
||||
boxShadow: "0 12px 30px rgba(15,23,42,0.06)",
|
||||
minHeight: 88,
|
||||
...props.style
|
||||
}}>
|
||||
<div style={ {
|
||||
display: "flex",
|
||||
justifyContent: "space-between",
|
||||
alignItems: "center",
|
||||
gap: 10,
|
||||
}}>
|
||||
<div style={{
|
||||
fontSize: 12,
|
||||
fontWeight: 700,
|
||||
color: "rgba(15, 23, 42, 0.65)",
|
||||
letterSpacing: "0.02em",
|
||||
textTransform: "uppercase"
|
||||
}}>
|
||||
{props.label}
|
||||
</div>
|
||||
<div style={{ ...styles.cardBase, ...props.style }}>
|
||||
<div style={styles.cardTopRow}>
|
||||
<div style={styles.cardLabel}>{props.label}</div>
|
||||
{props.rightSlot ? <div>{props.rightSlot}</div> : null}
|
||||
</div>
|
||||
<div style={{
|
||||
fontSize: 22,
|
||||
fontWeight: 850,
|
||||
marginTop: 6,
|
||||
letterSpacing: "-0.02em",
|
||||
}}>{props.value}</div>
|
||||
{props.sublabel ? <div style={{
|
||||
marginTop: 6,
|
||||
fontSize: 12,
|
||||
color: "rgba(15, 23, 42, 0.55)",
|
||||
}}>{props.sublabel}</div> : null}
|
||||
<div style={styles.cardValue}>{props.value}</div>
|
||||
{props.sublabel ? (
|
||||
<div style={styles.cardSubLabel}>{props.sublabel}</div>
|
||||
) : null}
|
||||
</div>
|
||||
);
|
||||
}
|
||||
};
|
||||
|
||||
export default Card;
|
||||
export default Card;
|
||||
|
||||
58
frontend/src/components/ConfirmationModal.tsx
Normal file
@@ -0,0 +1,58 @@
|
||||
import { Dialog, DialogPanel, DialogTitle } from "@headlessui/react";
|
||||
import StatsStyling from "../styles/stats_styling";
|
||||
|
||||
type Props = {
|
||||
open: boolean;
|
||||
title: string;
|
||||
message: string;
|
||||
confirmLabel?: string;
|
||||
cancelLabel?: string;
|
||||
loading?: boolean;
|
||||
onConfirm: () => void;
|
||||
onCancel: () => void;
|
||||
};
|
||||
|
||||
const styles = StatsStyling;
|
||||
|
||||
// Generic confirm/cancel dialog built on Headless UI's Dialog.
// Both buttons are disabled while `loading` is true.
// NOTE(review): the loading label is hardcoded to "Deleting..." — this modal
// appears tailored to delete flows; confirm before reusing it elsewhere.
export default function ConfirmationModal({
  open,
  title,
  message,
  confirmLabel = "Confirm",
  cancelLabel = "Cancel",
  loading = false,
  onConfirm,
  onCancel,
}: Props) {
  return (
    // Closing via backdrop/escape routes through onCancel.
    <Dialog open={open} onClose={onCancel} style={styles.modalRoot}>
      <div style={styles.modalBackdrop} />

      <div style={styles.modalContainer}>
        <DialogPanel style={{ ...styles.card, ...styles.modalPanel }}>
          <DialogTitle style={styles.sectionTitle}>{title}</DialogTitle>
          <p style={styles.sectionSubtitle}>{message}</p>

          <div style={{ display: "flex", justifyContent: "flex-end", gap: 8 }}>
            <button
              type="button"
              onClick={onCancel}
              style={styles.buttonSecondary}
              disabled={loading}
            >
              {cancelLabel}
            </button>
            <button
              type="button"
              onClick={onConfirm}
              style={styles.buttonDanger}
              disabled={loading}
            >
              {loading ? "Deleting..." : confirmLabel}
            </button>
          </div>
        </DialogPanel>
      </div>
    </Dialog>
  );
}
|
||||
247
frontend/src/components/CorpusExplorer.tsx
Normal file
@@ -0,0 +1,247 @@
|
||||
import { useEffect, useState } from "react";
|
||||
import { Dialog, DialogPanel, DialogTitle } from "@headlessui/react";
|
||||
|
||||
import StatsStyling from "../styles/stats_styling";
|
||||
import type { DatasetRecord } from "../utils/corpusExplorer";
|
||||
|
||||
const styles = StatsStyling;
|
||||
const INITIAL_RECORD_COUNT = 60;
|
||||
const RECORD_BATCH_SIZE = 60;
|
||||
const EXCERPT_LENGTH = 320;
|
||||
|
||||
const cleanText = (value: unknown) => {
|
||||
if (typeof value !== "string") {
|
||||
return "";
|
||||
}
|
||||
|
||||
const trimmed = value.trim();
|
||||
if (!trimmed) {
|
||||
return "";
|
||||
}
|
||||
|
||||
const lowered = trimmed.toLowerCase();
|
||||
if (lowered === "nan" || lowered === "null" || lowered === "undefined") {
|
||||
return "";
|
||||
}
|
||||
|
||||
return trimmed;
|
||||
};
|
||||
|
||||
const displayText = (value: unknown, fallback: string) => {
|
||||
const cleaned = cleanText(value);
|
||||
return cleaned || fallback;
|
||||
};
|
||||
|
||||
type CorpusExplorerProps = {
|
||||
open: boolean;
|
||||
onClose: () => void;
|
||||
title: string;
|
||||
description: string;
|
||||
records: DatasetRecord[];
|
||||
loading: boolean;
|
||||
error: string;
|
||||
emptyMessage: string;
|
||||
};
|
||||
|
||||
const formatRecordDate = (record: DatasetRecord) => {
|
||||
if (typeof record.dt === "string" && record.dt) {
|
||||
const date = new Date(record.dt);
|
||||
if (!Number.isNaN(date.getTime())) {
|
||||
return date.toLocaleString();
|
||||
}
|
||||
}
|
||||
|
||||
if (typeof record.date === "string" && record.date) {
|
||||
return record.date;
|
||||
}
|
||||
|
||||
if (typeof record.timestamp === "number") {
|
||||
return new Date(record.timestamp * 1000).toLocaleString();
|
||||
}
|
||||
|
||||
return "Unknown time";
|
||||
};
|
||||
|
||||
const getRecordKey = (record: DatasetRecord, index: number) =>
|
||||
String(record.id ?? record.post_id ?? `${record.author ?? "record"}-${index}`);
|
||||
|
||||
const getRecordTitle = (record: DatasetRecord) => {
|
||||
if (record.type === "comment") {
|
||||
return "";
|
||||
}
|
||||
|
||||
const title = cleanText(record.title);
|
||||
if (title) {
|
||||
return title;
|
||||
}
|
||||
|
||||
const content = cleanText(record.content);
|
||||
if (!content) {
|
||||
return "Untitled record";
|
||||
}
|
||||
|
||||
return content.length > 120 ? `${content.slice(0, 117)}...` : content;
|
||||
};
|
||||
|
||||
// Modal that lets the user browse a slice of dataset records with
// incremental "Show More" pagination and per-record expand/collapse of
// long content.
const CorpusExplorer = ({
  open,
  onClose,
  title,
  description,
  records,
  loading,
  error,
  emptyMessage,
}: CorpusExplorerProps) => {
  // Number of records currently rendered; grows by RECORD_BATCH_SIZE.
  const [visibleCount, setVisibleCount] = useState(INITIAL_RECORD_COUNT);
  // Keys of records whose full content is expanded.
  const [expandedKeys, setExpandedKeys] = useState<Record<string, boolean>>({});

  // Reset pagination and expansion whenever the dialog (re)opens or the
  // underlying slice changes (different title or record count).
  useEffect(() => {
    if (open) {
      setVisibleCount(INITIAL_RECORD_COUNT);
      setExpandedKeys({});
    }
  }, [open, title, records.length]);

  const hasMoreRecords = visibleCount < records.length;

  return (
    <Dialog open={open} onClose={onClose} style={styles.modalRoot}>
      <div style={styles.modalBackdrop} />

      <div style={styles.modalContainer}>
        <DialogPanel
          style={{
            ...styles.card,
            ...styles.modalPanel,
            width: "min(960px, 96vw)",
            maxHeight: "88vh",
            display: "flex",
            flexDirection: "column",
            gap: 12,
            overflow: "hidden",
          }}
        >
          <div style={styles.headerBar}>
            <div style={{ minWidth: 0 }}>
              <DialogTitle style={styles.sectionTitle}>{title}</DialogTitle>
              <p style={styles.sectionSubtitle}>
                {description} {loading ? "Loading records..." : `${records.length.toLocaleString()} records.`}
              </p>
            </div>

            <button onClick={onClose} style={styles.buttonSecondary}>
              Close
            </button>
          </div>

          {error ? <p style={styles.sectionSubtitle}>{error}</p> : null}

          {/* Empty-state message only once loading finished without error. */}
          {!loading && !error && !records.length ? (
            <p style={styles.sectionSubtitle}>{emptyMessage}</p>
          ) : null}

          {loading ? <div style={styles.topUserMeta}>Preparing corpus slice...</div> : null}

          {!loading && !error && records.length ? (
            <>
              <div
                style={{
                  ...styles.topUsersList,
                  overflowY: "auto",
                  overflowX: "hidden",
                  paddingRight: 4,
                }}
              >
                {records.slice(0, visibleCount).map((record, index) => {
                  const recordKey = getRecordKey(record, index);
                  const titleText = getRecordTitle(record);
                  const content = cleanText(record.content);
                  const isExpanded = !!expandedKeys[recordKey];
                  // Only offer expand/collapse when content exceeds the excerpt cap.
                  const canExpand = content.length > EXCERPT_LENGTH;
                  const excerpt =
                    canExpand && !isExpanded
                      ? `${content.slice(0, EXCERPT_LENGTH - 3)}...`
                      : content || "No content available.";

                  return (
                    <div key={recordKey} style={styles.topUserItem}>
                      <div style={{ ...styles.headerBar, alignItems: "flex-start" }}>
                        <div style={{ minWidth: 0, flex: 1 }}>
                          {titleText ? <div style={styles.topUserName}>{titleText}</div> : null}
                          {/* Meta line: author • source • type • time. */}
                          <div
                            style={{
                              ...styles.topUserMeta,
                              overflowWrap: "anywhere",
                              wordBreak: "break-word",
                            }}
                          >
                            {displayText(record.author, "Unknown author")} • {displayText(record.source, "Unknown source")} • {displayText(record.type, "record")} • {formatRecordDate(record)}
                          </div>
                        </div>
                        <div
                          style={{
                            ...styles.topUserMeta,
                            marginLeft: 12,
                            textAlign: "right",
                            overflowWrap: "anywhere",
                            wordBreak: "break-word",
                          }}
                        >
                          {cleanText(record.topic) ? `Topic: ${cleanText(record.topic)}` : ""}
                        </div>
                      </div>

                      <div
                        style={{
                          ...styles.topUserMeta,
                          marginTop: 8,
                          whiteSpace: "pre-wrap",
                          overflowWrap: "anywhere",
                          wordBreak: "break-word",
                        }}
                      >
                        {excerpt}
                      </div>

                      {canExpand ? (
                        <div style={{ marginTop: 10 }}>
                          <button
                            onClick={() =>
                              setExpandedKeys((current) => ({
                                ...current,
                                [recordKey]: !current[recordKey],
                              }))
                            }
                            style={styles.buttonSecondary}
                          >
                            {isExpanded ? "Show Less" : "Show More"}
                          </button>
                        </div>
                      ) : null}
                    </div>
                  );
                })}
              </div>

              {hasMoreRecords ? (
                <div style={{ display: "flex", justifyContent: "center" }}>
                  <button
                    onClick={() =>
                      setVisibleCount((current) => current + RECORD_BATCH_SIZE)
                    }
                    style={styles.buttonSecondary}
                  >
                    Show More Records
                  </button>
                </div>
              ) : null}
            </>
          ) : null}
        </DialogPanel>
      </div>
    </Dialog>
  );
};

export default CorpusExplorer;
|
||||
249
frontend/src/components/CulturalStats.tsx
Normal file
@@ -0,0 +1,249 @@
|
||||
import Card from "./Card";
|
||||
import StatsStyling from "../styles/stats_styling";
|
||||
import type { CulturalAnalysisResponse } from "../types/ApiTypes";
|
||||
import {
|
||||
buildCertaintySpec,
|
||||
buildDeonticSpec,
|
||||
buildEntitySpec,
|
||||
buildHedgeSpec,
|
||||
buildIdentityBucketSpec,
|
||||
buildPermissionSpec,
|
||||
type CorpusExplorerSpec,
|
||||
} from "../utils/corpusExplorer";
|
||||
|
||||
const styles = StatsStyling;
|
||||
const exploreButtonStyle = { padding: "4px 8px", fontSize: 12 };
|
||||
|
||||
type CulturalStatsProps = {
|
||||
data: CulturalAnalysisResponse;
|
||||
onExplore: (spec: CorpusExplorerSpec) => void;
|
||||
};
|
||||
|
||||
const renderExploreButton = (onClick: () => void) => (
|
||||
<button
|
||||
onClick={onClick}
|
||||
style={{ ...styles.buttonSecondary, ...exploreButtonStyle }}
|
||||
>
|
||||
Explore
|
||||
</button>
|
||||
);
|
||||
|
||||
// Cultural-analysis dashboard: in-group vs out-group language usage,
// stance-marker frequencies, and per-entity mood. Clicking "Explore" or a
// list row forwards a CorpusExplorerSpec to the parent via onExplore.
const CulturalStats = ({ data, onExplore }: CulturalStatsProps) => {
  const identity = data.identity_markers;
  const stance = data.stance_markers;
  const inGroupWords = identity?.in_group_usage ?? 0;
  const outGroupWords = identity?.out_group_usage ?? 0;
  const totalGroupWords = inGroupWords + outGroupWords;
  // Ratios arrive as fractions; convert to percentages, null when missing.
  const inGroupWordRate =
    typeof identity?.in_group_ratio === "number"
      ? identity.in_group_ratio * 100
      : null;
  const outGroupWordRate =
    typeof identity?.out_group_ratio === "number"
      ? identity.out_group_ratio * 100
      : null;
  const rawEntities = data.avg_emotion_per_entity?.entity_emotion_avg ?? {};
  // Top 20 entities by post count.
  const entities = Object.entries(rawEntities)
    .sort((a, b) => b[1].post_count - a[1].post_count)
    .slice(0, 20);

  // "label (xx.x%)" for the highest-scoring emotion, "-" when none.
  const topEmotion = (emotionAvg: Record<string, number> | undefined) => {
    const entries = Object.entries(emotionAvg ?? {});
    if (!entries.length) {
      return "-";
    }

    entries.sort((a, b) => b[1] - a[1]);
    const dominant = entries[0] ?? ["emotion_unknown", 0];
    // Keys are prefixed "emotion_"; strip it for display.
    const dominantLabel = dominant[0].replace("emotion_", "");
    return `${dominantLabel} (${(dominant[1] * 100).toFixed(1)}%)`;
  };

  return (
    <div style={styles.page}>
      <div style={{ ...styles.container, ...styles.grid }}>
        <div style={{ ...styles.card, gridColumn: "span 12" }}>
          <h2 style={styles.sectionTitle}>Community Framing Overview</h2>
          <p style={styles.sectionSubtitle}>
            Simple view of how often people use "us" words vs "them" words, and
            the tone around that language.
          </p>
        </div>

        <Card
          label="In-Group Words"
          value={inGroupWords.toLocaleString()}
          sublabel="Times we/us/our appears"
          style={{ gridColumn: "span 3" }}
        />
        <Card
          label="Out-Group Words"
          value={outGroupWords.toLocaleString()}
          sublabel="Times they/them/their appears"
          style={{ gridColumn: "span 3" }}
        />
        <Card
          label="In-Group Posts"
          value={identity?.in_group_posts?.toLocaleString() ?? "-"}
          sublabel='Posts leaning toward "us" language'
          rightSlot={renderExploreButton(() =>
            onExplore(buildIdentityBucketSpec("in")),
          )}
          style={{ gridColumn: "span 3" }}
        />
        <Card
          label="Out-Group Posts"
          value={identity?.out_group_posts?.toLocaleString() ?? "-"}
          sublabel='Posts leaning toward "them" language'
          rightSlot={renderExploreButton(() =>
            onExplore(buildIdentityBucketSpec("out")),
          )}
          style={{ gridColumn: "span 3" }}
        />

        <Card
          label="Balanced Posts"
          value={identity?.tie_posts?.toLocaleString() ?? "-"}
          sublabel="Posts with equal us/them signals"
          rightSlot={renderExploreButton(() =>
            onExplore(buildIdentityBucketSpec("tie")),
          )}
          style={{ gridColumn: "span 3" }}
        />
        <Card
          label="Total Group Words"
          value={totalGroupWords.toLocaleString()}
          sublabel="In-group + out-group words"
          style={{ gridColumn: "span 3" }}
        />
        <Card
          label="In-Group Share"
          value={
            inGroupWordRate === null ? "-" : `${inGroupWordRate.toFixed(2)}%`
          }
          sublabel="Share of all words"
          style={{ gridColumn: "span 3" }}
        />
        <Card
          label="Out-Group Share"
          value={
            outGroupWordRate === null ? "-" : `${outGroupWordRate.toFixed(2)}%`
          }
          sublabel="Share of all words"
          style={{ gridColumn: "span 3" }}
        />

        {/* Stance-marker frequency cards: raw total plus per-1k-words rate
            when available. */}
        <Card
          label="Hedging Words"
          value={stance?.hedge_total?.toLocaleString() ?? "-"}
          sublabel={
            typeof stance?.hedge_per_1k_tokens === "number"
              ? `${stance.hedge_per_1k_tokens.toFixed(1)} per 1k words`
              : "Word frequency"
          }
          rightSlot={renderExploreButton(() => onExplore(buildHedgeSpec()))}
          style={{ gridColumn: "span 3" }}
        />
        <Card
          label="Certainty Words"
          value={stance?.certainty_total?.toLocaleString() ?? "-"}
          sublabel={
            typeof stance?.certainty_per_1k_tokens === "number"
              ? `${stance.certainty_per_1k_tokens.toFixed(1)} per 1k words`
              : "Word frequency"
          }
          rightSlot={renderExploreButton(() => onExplore(buildCertaintySpec()))}
          style={{ gridColumn: "span 3" }}
        />
        <Card
          label="Need/Should Words"
          value={stance?.deontic_total?.toLocaleString() ?? "-"}
          sublabel={
            typeof stance?.deontic_per_1k_tokens === "number"
              ? `${stance.deontic_per_1k_tokens.toFixed(1)} per 1k words`
              : "Word frequency"
          }
          rightSlot={renderExploreButton(() => onExplore(buildDeonticSpec()))}
          style={{ gridColumn: "span 3" }}
        />
        <Card
          label="Permission Words"
          value={stance?.permission_total?.toLocaleString() ?? "-"}
          sublabel={
            typeof stance?.permission_per_1k_tokens === "number"
              ? `${stance.permission_per_1k_tokens.toFixed(1)} per 1k words`
              : "Word frequency"
          }
          rightSlot={renderExploreButton(() => onExplore(buildPermissionSpec()))}
          style={{ gridColumn: "span 3" }}
        />

        <div style={{ ...styles.card, gridColumn: "span 6" }}>
          <h2 style={styles.sectionTitle}>Mood in "Us" Posts</h2>
          <p style={styles.sectionSubtitle}>
            Most likely emotion when in-group wording is stronger.
          </p>
          <div style={styles.topUserName}>{topEmotion(identity?.in_group_emotion_avg)}</div>
          <div style={{ marginTop: 12 }}>
            <button
              onClick={() => onExplore(buildIdentityBucketSpec("in"))}
              style={styles.buttonSecondary}
            >
              Explore records
            </button>
          </div>
        </div>

        <div style={{ ...styles.card, gridColumn: "span 6" }}>
          <h2 style={styles.sectionTitle}>Mood in "Them" Posts</h2>
          <p style={styles.sectionSubtitle}>
            Most likely emotion when out-group wording is stronger.
          </p>
          <div style={styles.topUserName}>{topEmotion(identity?.out_group_emotion_avg)}</div>
          <div style={{ marginTop: 12 }}>
            <button
              onClick={() => onExplore(buildIdentityBucketSpec("out"))}
              style={styles.buttonSecondary}
            >
              Explore records
            </button>
          </div>
        </div>

        <div style={{ ...styles.card, gridColumn: "span 12" }}>
          <h2 style={styles.sectionTitle}>Entity Mood Snapshot</h2>
          <p style={styles.sectionSubtitle}>
            Most mentioned entities and the mood that appears most with each.
          </p>
          {!entities.length ? (
            <div style={styles.topUserMeta}>No entity-level cultural data available.</div>
          ) : (
            <div
              style={{
                ...styles.topUsersList,
                maxHeight: 420,
                overflowY: "auto",
              }}
            >
              {/* Each row opens the explorer filtered to that entity. */}
              {entities.map(([entity, aggregate]) => (
                <div
                  key={entity}
                  style={{ ...styles.topUserItem, cursor: "pointer" }}
                  onClick={() => onExplore(buildEntitySpec(entity))}
                >
                  <div style={styles.topUserName}>{entity}</div>
                  <div style={styles.topUserMeta}>
                    {aggregate.post_count.toLocaleString()} posts • Likely mood:{" "}
                    {topEmotion(aggregate.emotion_avg)}
                  </div>
                </div>
              ))}
            </div>
          )}
        </div>
      </div>
    </div>
  );
};

export default CulturalStats;
|
||||
@@ -1,14 +1,25 @@
|
||||
import type { ContentAnalysisResponse } from "../types/ApiTypes"
|
||||
import type { EmotionalAnalysisResponse } from "../types/ApiTypes";
|
||||
import StatsStyling from "../styles/stats_styling";
|
||||
import {
|
||||
buildDominantEmotionSpec,
|
||||
buildSourceSpec,
|
||||
buildTopicSpec,
|
||||
type CorpusExplorerSpec,
|
||||
} from "../utils/corpusExplorer";
|
||||
|
||||
const styles = StatsStyling;
|
||||
|
||||
type EmotionalStatsProps = {
|
||||
contentData: ContentAnalysisResponse;
|
||||
}
|
||||
emotionalData: EmotionalAnalysisResponse;
|
||||
onExplore: (spec: CorpusExplorerSpec) => void;
|
||||
};
|
||||
|
||||
const EmotionalStats = ({contentData}: EmotionalStatsProps) => {
|
||||
const rows = contentData.average_emotion_by_topic ?? [];
|
||||
const EmotionalStats = ({ emotionalData, onExplore }: EmotionalStatsProps) => {
|
||||
const rows = emotionalData.average_emotion_by_topic ?? [];
|
||||
const overallEmotionAverage = emotionalData.overall_emotion_average ?? [];
|
||||
const dominantEmotionDistribution =
|
||||
emotionalData.dominant_emotion_distribution ?? [];
|
||||
const emotionBySource = emotionalData.emotion_by_source ?? [];
|
||||
const lowSampleThreshold = 20;
|
||||
const stableSampleThreshold = 50;
|
||||
const emotionKeys = rows.length
|
||||
@@ -31,7 +42,7 @@ const EmotionalStats = ({contentData}: EmotionalStatsProps) => {
|
||||
topic: String(row.topic),
|
||||
count: Number(row.n ?? 0),
|
||||
emotion: maxKey.replace("emotion_", "") || "unknown",
|
||||
value: maxValue > Number.NEGATIVE_INFINITY ? maxValue : 0
|
||||
value: maxValue > Number.NEGATIVE_INFINITY ? maxValue : 0,
|
||||
};
|
||||
});
|
||||
|
||||
@@ -45,8 +56,12 @@ const EmotionalStats = ({contentData}: EmotionalStatsProps) => {
|
||||
.filter((count) => Number.isFinite(count) && count > 0)
|
||||
.sort((a, b) => a - b);
|
||||
|
||||
const lowSampleTopics = strongestPerTopic.filter((topic) => topic.count < lowSampleThreshold).length;
|
||||
const stableSampleTopics = strongestPerTopic.filter((topic) => topic.count >= stableSampleThreshold).length;
|
||||
const lowSampleTopics = strongestPerTopic.filter(
|
||||
(topic) => topic.count < lowSampleThreshold,
|
||||
).length;
|
||||
const stableSampleTopics = strongestPerTopic.filter(
|
||||
(topic) => topic.count >= stableSampleThreshold,
|
||||
).length;
|
||||
|
||||
const medianSampleSize = sampleSizes.length
|
||||
? sampleSizes[Math.floor(sampleSizes.length / 2)]
|
||||
@@ -64,42 +79,184 @@ const EmotionalStats = ({contentData}: EmotionalStatsProps) => {
|
||||
return (
|
||||
<div style={styles.page}>
|
||||
<div style={{ ...styles.container, ...styles.card, marginTop: 16 }}>
|
||||
<h2 style={styles.sectionTitle}>Average Emotion by Topic</h2>
|
||||
<p style={styles.sectionSubtitle}>Read confidence together with sample size. Topics with fewer than {lowSampleThreshold} events are usually noisy and less reliable.</p>
|
||||
<div style={{ display: "flex", flexWrap: "wrap", gap: 10, fontSize: 13, color: "#4b5563", marginTop: 6 }}>
|
||||
<span><strong style={{ color: "#111827" }}>Topics:</strong> {strongestPerTopic.length}</span>
|
||||
<span><strong style={{ color: "#111827" }}>Median Sample:</strong> {medianSampleSize} events</span>
|
||||
<span><strong style={{ color: "#111827" }}>Low Sample (<{lowSampleThreshold}):</strong> {lowSampleTopics}</span>
|
||||
<span><strong style={{ color: "#111827" }}>Stable Sample ({stableSampleThreshold}+):</strong> {stableSampleTopics}</span>
|
||||
<h2 style={styles.sectionTitle}>Topic Mood Overview</h2>
|
||||
<p style={styles.sectionSubtitle}>
|
||||
Use the strength score together with post count. Topics with fewer
|
||||
than {lowSampleThreshold} events are often noisy.
|
||||
</p>
|
||||
<div style={styles.emotionalSummaryRow}>
|
||||
<span>
|
||||
<strong style={{ color: "#24292f" }}>Topics:</strong>{" "}
|
||||
{strongestPerTopic.length}
|
||||
</span>
|
||||
<span>
|
||||
<strong style={{ color: "#24292f" }}>Median Posts:</strong>{" "}
|
||||
{medianSampleSize}
|
||||
</span>
|
||||
<span>
|
||||
<strong style={{ color: "#24292f" }}>
|
||||
Small Topics (<{lowSampleThreshold}):
|
||||
</strong>{" "}
|
||||
{lowSampleTopics}
|
||||
</span>
|
||||
<span>
|
||||
<strong style={{ color: "#24292f" }}>
|
||||
Stable Topics ({stableSampleThreshold}+):
|
||||
</strong>{" "}
|
||||
{stableSampleTopics}
|
||||
</span>
|
||||
</div>
|
||||
<p style={{ ...styles.sectionSubtitle, marginTop: 10, marginBottom: 0 }}>
|
||||
Confidence reflects how strongly one emotion leads within a topic, not model accuracy. Use larger samples for stronger conclusions.
|
||||
<p
|
||||
style={{ ...styles.sectionSubtitle, marginTop: 10, marginBottom: 0 }}
|
||||
>
|
||||
Strength means how far the top emotion is ahead in that topic. It does
|
||||
not mean model accuracy.
|
||||
</p>
|
||||
</div>
|
||||
|
||||
<div style={{ ...styles.container, ...styles.grid }}>
|
||||
{strongestPerTopic.map((topic) => (
|
||||
<div key={topic.topic} style={{ ...styles.card, gridColumn: "span 4" }}>
|
||||
<h3 style={{ ...styles.sectionTitle, marginBottom: 6 }}>{topic.topic}</h3>
|
||||
<div style={{ fontSize: 12, fontWeight: 700, color: "#6b7280", letterSpacing: "0.02em", textTransform: "uppercase" }}>
|
||||
Top Emotion
|
||||
<div style={{ ...styles.card, gridColumn: "span 4" }}>
|
||||
<h2 style={styles.sectionTitle}>Mood Averages</h2>
|
||||
<p style={styles.sectionSubtitle}>Average score for each emotion.</p>
|
||||
{!overallEmotionAverage.length ? (
|
||||
<div style={styles.topUserMeta}>
|
||||
No overall emotion averages available.
|
||||
</div>
|
||||
<div style={{ fontSize: 24, fontWeight: 800, marginTop: 4, lineHeight: 1.2 }}>
|
||||
{formatEmotion(topic.emotion)}
|
||||
) : (
|
||||
<div
|
||||
style={{
|
||||
...styles.topUsersList,
|
||||
maxHeight: 260,
|
||||
overflowY: "auto",
|
||||
}}
|
||||
>
|
||||
{[...overallEmotionAverage]
|
||||
.sort((a, b) => b.score - a.score)
|
||||
.map((row) => (
|
||||
<div
|
||||
key={row.emotion}
|
||||
style={{ ...styles.topUserItem, cursor: "pointer" }}
|
||||
onClick={() => onExplore(buildDominantEmotionSpec(row.emotion))}
|
||||
>
|
||||
<div style={styles.topUserName}>
|
||||
{formatEmotion(row.emotion)}
|
||||
</div>
|
||||
<div style={styles.topUserMeta}>{row.score.toFixed(3)}</div>
|
||||
</div>
|
||||
))}
|
||||
</div>
|
||||
<div style={{ display: "flex", justifyContent: "space-between", alignItems: "center", marginTop: 10, fontSize: 13, color: "#6b7280" }}>
|
||||
<span>Confidence</span>
|
||||
<span style={{ fontWeight: 700, color: "#111827" }}>{topic.value.toFixed(3)}</span>
|
||||
)}
|
||||
</div>
|
||||
|
||||
<div style={{ ...styles.card, gridColumn: "span 4" }}>
|
||||
<h2 style={styles.sectionTitle}>Mood Split</h2>
|
||||
<p style={styles.sectionSubtitle}>
|
||||
How often each emotion is dominant.
|
||||
</p>
|
||||
{!dominantEmotionDistribution.length ? (
|
||||
<div style={styles.topUserMeta}>
|
||||
No dominant-emotion split available.
|
||||
</div>
|
||||
<div style={{ display: "flex", justifyContent: "space-between", alignItems: "center", marginTop: 4, fontSize: 13, color: "#6b7280" }}>
|
||||
<span>Sample Size</span>
|
||||
<span style={{ fontWeight: 700, color: "#111827" }}>{topic.count} events</span>
|
||||
) : (
|
||||
<div
|
||||
style={{
|
||||
...styles.topUsersList,
|
||||
maxHeight: 260,
|
||||
overflowY: "auto",
|
||||
}}
|
||||
>
|
||||
{[...dominantEmotionDistribution]
|
||||
.sort((a, b) => b.ratio - a.ratio)
|
||||
.map((row) => (
|
||||
<div
|
||||
key={row.emotion}
|
||||
style={{ ...styles.topUserItem, cursor: "pointer" }}
|
||||
onClick={() => onExplore(buildDominantEmotionSpec(row.emotion))}
|
||||
>
|
||||
<div style={styles.topUserName}>
|
||||
{formatEmotion(row.emotion)}
|
||||
</div>
|
||||
<div style={styles.topUserMeta}>
|
||||
{(row.ratio * 100).toFixed(1)}% •{" "}
|
||||
{row.count.toLocaleString()} events
|
||||
</div>
|
||||
</div>
|
||||
))}
|
||||
</div>
|
||||
)}
|
||||
</div>
|
||||
|
||||
<div style={{ ...styles.card, gridColumn: "span 4" }}>
|
||||
<h2 style={styles.sectionTitle}>Mood by Source</h2>
|
||||
<p style={styles.sectionSubtitle}>Leading emotion in each source.</p>
|
||||
{!emotionBySource.length ? (
|
||||
<div style={styles.topUserMeta}>
|
||||
No source emotion profile available.
|
||||
</div>
|
||||
) : (
|
||||
<div
|
||||
style={{
|
||||
...styles.topUsersList,
|
||||
maxHeight: 260,
|
||||
overflowY: "auto",
|
||||
}}
|
||||
>
|
||||
{[...emotionBySource]
|
||||
.sort((a, b) => b.event_count - a.event_count)
|
||||
.map((row) => (
|
||||
<div
|
||||
key={row.source}
|
||||
style={{ ...styles.topUserItem, cursor: "pointer" }}
|
||||
onClick={() => onExplore(buildSourceSpec(row.source))}
|
||||
>
|
||||
<div style={styles.topUserName}>{row.source}</div>
|
||||
<div style={styles.topUserMeta}>
|
||||
{formatEmotion(row.dominant_emotion)} •{" "}
|
||||
{row.dominant_score.toFixed(3)} •{" "}
|
||||
{row.event_count.toLocaleString()} events
|
||||
</div>
|
||||
</div>
|
||||
))}
|
||||
</div>
|
||||
)}
|
||||
</div>
|
||||
|
||||
<div style={{ ...styles.card, gridColumn: "span 12" }}>
|
||||
<h2 style={styles.sectionTitle}>Topic Snapshots</h2>
|
||||
<p style={styles.sectionSubtitle}>
|
||||
Per-topic mood with strength and post count.
|
||||
</p>
|
||||
<div style={{ ...styles.grid, marginTop: 10 }}>
|
||||
{strongestPerTopic.map((topic) => (
|
||||
<div
|
||||
key={topic.topic}
|
||||
style={{ ...styles.cardBase, gridColumn: "span 4", cursor: "pointer" }}
|
||||
onClick={() => onExplore(buildTopicSpec(topic.topic))}
|
||||
>
|
||||
<h3 style={{ ...styles.sectionTitle, marginBottom: 6 }}>
|
||||
{topic.topic}
|
||||
</h3>
|
||||
<div style={styles.emotionalTopicLabel}>Likely Mood</div>
|
||||
<div style={styles.emotionalTopicValue}>
|
||||
{formatEmotion(topic.emotion)}
|
||||
</div>
|
||||
<div style={styles.emotionalMetricRow}>
|
||||
<span>Strength</span>
|
||||
<span style={styles.emotionalMetricValue}>
|
||||
{topic.value.toFixed(3)}
|
||||
</span>
|
||||
</div>
|
||||
<div style={styles.emotionalMetricRowCompact}>
|
||||
<span>Posts in Topic</span>
|
||||
<span style={styles.emotionalMetricValue}>{topic.count}</span>
|
||||
</div>
|
||||
</div>
|
||||
))}
|
||||
</div>
|
||||
))}
|
||||
</div>
|
||||
</div>
|
||||
</div>
|
||||
);
|
||||
}
|
||||
};
|
||||
|
||||
export default EmotionalStats;
|
||||
|
||||
262
frontend/src/components/InteractionalStats.tsx
Normal file
@@ -0,0 +1,262 @@
|
||||
import Card from "./Card";
|
||||
import StatsStyling from "../styles/stats_styling";
|
||||
import type { InteractionAnalysisResponse } from "../types/ApiTypes";
|
||||
import {
|
||||
ResponsiveContainer,
|
||||
BarChart,
|
||||
Bar,
|
||||
XAxis,
|
||||
YAxis,
|
||||
CartesianGrid,
|
||||
Tooltip,
|
||||
PieChart,
|
||||
Pie,
|
||||
Cell,
|
||||
Legend,
|
||||
} from "recharts";
|
||||
|
||||
// Shared dashboard styling tokens (cards, grids, typography) used by all
// stats views in this folder.
const styles = StatsStyling;

// Props for the interaction dashboard: the full interaction-analysis
// payload returned by the backend.
type InteractionalStatsProps = {
  data: InteractionAnalysisResponse;
};
|
||||
|
||||
const InteractionalStats = ({ data }: InteractionalStatsProps) => {
|
||||
const graph = data.interaction_graph ?? {};
|
||||
const userCount = Object.keys(graph).length;
|
||||
let edgeCount = 0;
|
||||
let interactionVolume = 0;
|
||||
for (const targets of Object.values(graph)) {
|
||||
for (const value of Object.values(targets)) {
|
||||
edgeCount += 1;
|
||||
interactionVolume += value;
|
||||
}
|
||||
}
|
||||
const concentration = data.conversation_concentration;
|
||||
const topTenCommentShare =
|
||||
typeof concentration?.top_10pct_comment_share === "number"
|
||||
? concentration?.top_10pct_comment_share
|
||||
: null;
|
||||
const topTenAuthorCount =
|
||||
typeof concentration?.top_10pct_author_count === "number"
|
||||
? concentration.top_10pct_author_count
|
||||
: null;
|
||||
const totalCommentingAuthors =
|
||||
typeof concentration?.total_commenting_authors === "number"
|
||||
? concentration.total_commenting_authors
|
||||
: null;
|
||||
const singleCommentAuthorRatio =
|
||||
typeof concentration?.single_comment_author_ratio === "number"
|
||||
? concentration.single_comment_author_ratio
|
||||
: null;
|
||||
const singleCommentAuthors =
|
||||
typeof concentration?.single_comment_authors === "number"
|
||||
? concentration.single_comment_authors
|
||||
: null;
|
||||
|
||||
const topPairs = (data.top_interaction_pairs ?? [])
|
||||
.filter((item): item is [[string, string], number] => {
|
||||
if (!Array.isArray(item) || item.length !== 2) {
|
||||
return false;
|
||||
}
|
||||
|
||||
const pair = item[0];
|
||||
const count = item[1];
|
||||
|
||||
return (
|
||||
Array.isArray(pair) &&
|
||||
pair.length === 2 &&
|
||||
typeof pair[0] === "string" &&
|
||||
typeof pair[1] === "string" &&
|
||||
typeof count === "number"
|
||||
);
|
||||
})
|
||||
.slice(0, 20);
|
||||
|
||||
const topPairChartData = topPairs
|
||||
.slice(0, 8)
|
||||
.map(([[source, target], value], index) => ({
|
||||
pair: `${source} -> ${target}`,
|
||||
replies: value,
|
||||
rank: index + 1,
|
||||
}));
|
||||
|
||||
const topTenSharePercent =
|
||||
topTenCommentShare === null ? null : topTenCommentShare * 100;
|
||||
const nonTopTenSharePercent =
|
||||
topTenSharePercent === null ? null : Math.max(0, 100 - topTenSharePercent);
|
||||
|
||||
let concentrationPieData: { name: string; value: number }[] = [];
|
||||
if (topTenSharePercent !== null && nonTopTenSharePercent !== null) {
|
||||
concentrationPieData = [
|
||||
{ name: "Top 10% authors", value: topTenSharePercent },
|
||||
{ name: "Other authors", value: nonTopTenSharePercent },
|
||||
];
|
||||
}
|
||||
|
||||
const PIE_COLORS = ["#2b6777", "#c8d8e4"];
|
||||
|
||||
return (
|
||||
<div style={styles.page}>
|
||||
<div style={{ ...styles.container, ...styles.grid }}>
|
||||
<div style={{ ...styles.card, gridColumn: "span 12" }}>
|
||||
<h2 style={styles.sectionTitle}>Conversation Overview</h2>
|
||||
<p style={styles.sectionSubtitle}>
|
||||
Who talks to who, how much they interact, and how concentrated the replies are.
|
||||
</p>
|
||||
</div>
|
||||
|
||||
<Card
|
||||
label="Users in Network"
|
||||
value={userCount.toLocaleString()}
|
||||
sublabel="Users in the reply graph"
|
||||
style={{ gridColumn: "span 4" }}
|
||||
/>
|
||||
<Card
|
||||
label="User-to-User Links"
|
||||
value={edgeCount.toLocaleString()}
|
||||
sublabel="Unique reply directions"
|
||||
style={{ gridColumn: "span 4" }}
|
||||
/>
|
||||
<Card
|
||||
label="Total Replies"
|
||||
value={interactionVolume.toLocaleString()}
|
||||
sublabel="All reply links combined"
|
||||
style={{ gridColumn: "span 4" }}
|
||||
/>
|
||||
<Card
|
||||
label="Concentrated Replies"
|
||||
value={
|
||||
topTenSharePercent === null
|
||||
? "-"
|
||||
: `${topTenSharePercent.toFixed(1)}%`
|
||||
}
|
||||
sublabel={
|
||||
topTenAuthorCount === null || totalCommentingAuthors === null
|
||||
? "Reply share from the top 10% commenters"
|
||||
: `${topTenAuthorCount.toLocaleString()} of ${totalCommentingAuthors.toLocaleString()} authors`
|
||||
}
|
||||
style={{ gridColumn: "span 6" }}
|
||||
/>
|
||||
<Card
|
||||
label="Single-Comment Authors"
|
||||
value={
|
||||
singleCommentAuthorRatio === null
|
||||
? "-"
|
||||
: `${(singleCommentAuthorRatio * 100).toFixed(1)}%`
|
||||
}
|
||||
sublabel={
|
||||
singleCommentAuthors === null
|
||||
? "Authors who commented exactly once"
|
||||
: `${singleCommentAuthors.toLocaleString()} authors commented exactly once`
|
||||
}
|
||||
style={{ gridColumn: "span 6" }}
|
||||
/>
|
||||
|
||||
<div style={{ ...styles.card, gridColumn: "span 12" }}>
|
||||
<h2 style={styles.sectionTitle}>Conversation Visuals</h2>
|
||||
<p style={styles.sectionSubtitle}>
|
||||
Main reply links and concentration split.
|
||||
</p>
|
||||
|
||||
<div style={{ ...styles.grid, marginTop: 12 }}>
|
||||
<div style={{ ...styles.cardBase, gridColumn: "span 6" }}>
|
||||
<h3 style={{ ...styles.sectionTitle, fontSize: "1rem" }}>
|
||||
Top Interaction Pairs
|
||||
</h3>
|
||||
<div style={{ width: "100%", height: 300 }}>
|
||||
<ResponsiveContainer>
|
||||
<BarChart
|
||||
data={topPairChartData}
|
||||
layout="vertical"
|
||||
margin={{ top: 8, right: 16, left: 16, bottom: 8 }}
|
||||
>
|
||||
<CartesianGrid strokeDasharray="3 3" stroke="#d9e2ec" />
|
||||
<XAxis type="number" allowDecimals={false} />
|
||||
<YAxis
|
||||
type="category"
|
||||
dataKey="rank"
|
||||
tickFormatter={(value) => `#${value}`}
|
||||
width={36}
|
||||
/>
|
||||
<Tooltip />
|
||||
<Bar
|
||||
dataKey="replies"
|
||||
fill="#2b6777"
|
||||
radius={[0, 6, 6, 0]}
|
||||
/>
|
||||
</BarChart>
|
||||
</ResponsiveContainer>
|
||||
</div>
|
||||
</div>
|
||||
|
||||
<div style={{ ...styles.cardBase, gridColumn: "span 6" }}>
|
||||
<h3 style={{ ...styles.sectionTitle, fontSize: "1rem" }}>
|
||||
Top 10% vs Other Comment Share
|
||||
</h3>
|
||||
<div style={{ width: "100%", height: 300 }}>
|
||||
<ResponsiveContainer>
|
||||
<PieChart>
|
||||
<Pie
|
||||
data={concentrationPieData}
|
||||
dataKey="value"
|
||||
nameKey="name"
|
||||
innerRadius={56}
|
||||
outerRadius={88}
|
||||
paddingAngle={2}
|
||||
>
|
||||
{concentrationPieData.map((entry, index) => (
|
||||
<Cell
|
||||
key={`${entry.name}-${index}`}
|
||||
fill={PIE_COLORS[index % PIE_COLORS.length]}
|
||||
/>
|
||||
))}
|
||||
</Pie>
|
||||
<Tooltip />
|
||||
<Legend verticalAlign="bottom" height={36} />
|
||||
</PieChart>
|
||||
</ResponsiveContainer>
|
||||
</div>
|
||||
</div>
|
||||
</div>
|
||||
</div>
|
||||
|
||||
<div style={{ ...styles.card, gridColumn: "span 12" }}>
|
||||
<h2 style={styles.sectionTitle}>Frequent Reply Paths</h2>
|
||||
<p style={styles.sectionSubtitle}>
|
||||
Most common user-to-user reply paths.
|
||||
</p>
|
||||
{!topPairs.length ? (
|
||||
<div style={styles.topUserMeta}>
|
||||
No interaction pair data available.
|
||||
</div>
|
||||
) : (
|
||||
<div
|
||||
style={{
|
||||
...styles.topUsersList,
|
||||
maxHeight: 420,
|
||||
overflowY: "auto",
|
||||
}}
|
||||
>
|
||||
{topPairs.map(([[source, target], value], index) => (
|
||||
<div
|
||||
key={`${source}->${target}-${index}`}
|
||||
style={styles.topUserItem}
|
||||
>
|
||||
<div style={styles.topUserName}>
|
||||
{source} -> {target}
|
||||
</div>
|
||||
<div style={styles.topUserMeta}>
|
||||
{value.toLocaleString()} replies
|
||||
</div>
|
||||
</div>
|
||||
))}
|
||||
</div>
|
||||
)}
|
||||
</div>
|
||||
</div>
|
||||
</div>
|
||||
);
|
||||
};
|
||||
|
||||
export default InteractionalStats;
|
||||
137
frontend/src/components/LinguisticStats.tsx
Normal file
@@ -0,0 +1,137 @@
|
||||
import Card from "./Card";
|
||||
import StatsStyling from "../styles/stats_styling";
|
||||
import type { LinguisticAnalysisResponse } from "../types/ApiTypes";
|
||||
import {
|
||||
buildNgramSpec,
|
||||
buildWordSpec,
|
||||
type CorpusExplorerSpec,
|
||||
} from "../utils/corpusExplorer";
|
||||
|
||||
// Shared dashboard styling tokens (cards, grids, typography) used by all
// stats views in this folder.
const styles = StatsStyling;

// Props: the linguistic-analysis payload plus a callback that opens the
// corpus explorer with a prebuilt filter spec for a clicked word/phrase.
type LinguisticStatsProps = {
  data: LinguisticAnalysisResponse;
  onExplore: (spec: CorpusExplorerSpec) => void;
};
|
||||
|
||||
const LinguisticStats = ({ data, onExplore }: LinguisticStatsProps) => {
|
||||
const lexical = data.lexical_diversity;
|
||||
const words = data.word_frequencies ?? [];
|
||||
const bigrams = data.common_two_phrases ?? [];
|
||||
const trigrams = data.common_three_phrases ?? [];
|
||||
|
||||
const topWords = words.slice(0, 20);
|
||||
const topBigrams = bigrams.slice(0, 10);
|
||||
const topTrigrams = trigrams.slice(0, 10);
|
||||
|
||||
return (
|
||||
<div style={styles.page}>
|
||||
<div style={{ ...styles.container, ...styles.grid }}>
|
||||
<div style={{ ...styles.card, gridColumn: "span 12" }}>
|
||||
<h2 style={styles.sectionTitle}>Language Overview</h2>
|
||||
<p style={styles.sectionSubtitle}>
|
||||
Quick read on how broad and repetitive the wording is.
|
||||
</p>
|
||||
</div>
|
||||
|
||||
<Card
|
||||
label="Total Words"
|
||||
value={lexical?.total_tokens?.toLocaleString() ?? "—"}
|
||||
sublabel="Words after basic filtering"
|
||||
style={{ gridColumn: "span 4" }}
|
||||
/>
|
||||
<Card
|
||||
label="Unique Words"
|
||||
value={lexical?.unique_tokens?.toLocaleString() ?? "—"}
|
||||
sublabel="Different words used"
|
||||
style={{ gridColumn: "span 4" }}
|
||||
/>
|
||||
<Card
|
||||
label="Vocabulary Variety"
|
||||
value={
|
||||
typeof lexical?.ttr === "number" ? lexical.ttr.toFixed(4) : "—"
|
||||
}
|
||||
sublabel="Higher means less repetition"
|
||||
style={{ gridColumn: "span 4" }}
|
||||
/>
|
||||
|
||||
<div style={{ ...styles.card, gridColumn: "span 4" }}>
|
||||
<h2 style={styles.sectionTitle}>Top Words</h2>
|
||||
<p style={styles.sectionSubtitle}>Most used single words.</p>
|
||||
<div
|
||||
style={{
|
||||
...styles.topUsersList,
|
||||
maxHeight: 360,
|
||||
overflowY: "auto",
|
||||
}}
|
||||
>
|
||||
{topWords.map((item) => (
|
||||
<div
|
||||
key={item.word}
|
||||
style={{ ...styles.topUserItem, cursor: "pointer" }}
|
||||
onClick={() => onExplore(buildWordSpec(item.word))}
|
||||
>
|
||||
<div style={styles.topUserName}>{item.word}</div>
|
||||
<div style={styles.topUserMeta}>
|
||||
{item.count.toLocaleString()} uses
|
||||
</div>
|
||||
</div>
|
||||
))}
|
||||
</div>
|
||||
</div>
|
||||
|
||||
<div style={{ ...styles.card, gridColumn: "span 4" }}>
|
||||
<h2 style={styles.sectionTitle}>Top Bigrams</h2>
|
||||
<p style={styles.sectionSubtitle}>Most used 2-word phrases.</p>
|
||||
<div
|
||||
style={{
|
||||
...styles.topUsersList,
|
||||
maxHeight: 360,
|
||||
overflowY: "auto",
|
||||
}}
|
||||
>
|
||||
{topBigrams.map((item) => (
|
||||
<div
|
||||
key={item.ngram}
|
||||
style={{ ...styles.topUserItem, cursor: "pointer" }}
|
||||
onClick={() => onExplore(buildNgramSpec(item.ngram))}
|
||||
>
|
||||
<div style={styles.topUserName}>{item.ngram}</div>
|
||||
<div style={styles.topUserMeta}>
|
||||
{item.count.toLocaleString()} uses
|
||||
</div>
|
||||
</div>
|
||||
))}
|
||||
</div>
|
||||
</div>
|
||||
|
||||
<div style={{ ...styles.card, gridColumn: "span 4" }}>
|
||||
<h2 style={styles.sectionTitle}>Top Trigrams</h2>
|
||||
<p style={styles.sectionSubtitle}>Most used 3-word phrases.</p>
|
||||
<div
|
||||
style={{
|
||||
...styles.topUsersList,
|
||||
maxHeight: 360,
|
||||
overflowY: "auto",
|
||||
}}
|
||||
>
|
||||
{topTrigrams.map((item) => (
|
||||
<div
|
||||
key={item.ngram}
|
||||
style={{ ...styles.topUserItem, cursor: "pointer" }}
|
||||
onClick={() => onExplore(buildNgramSpec(item.ngram))}
|
||||
>
|
||||
<div style={styles.topUserName}>{item.ngram}</div>
|
||||
<div style={styles.topUserMeta}>
|
||||
{item.count.toLocaleString()} uses
|
||||
</div>
|
||||
</div>
|
||||
))}
|
||||
</div>
|
||||
</div>
|
||||
</div>
|
||||
</div>
|
||||
);
|
||||
};
|
||||
|
||||
export default LinguisticStats;
|
||||
@@ -1,4 +1,4 @@
|
||||
import { useState } from "react";
|
||||
import { memo, useMemo } from "react";
|
||||
import {
|
||||
LineChart,
|
||||
Line,
|
||||
@@ -6,32 +6,55 @@ import {
|
||||
YAxis,
|
||||
Tooltip,
|
||||
CartesianGrid,
|
||||
ResponsiveContainer
|
||||
ResponsiveContainer,
|
||||
} from "recharts";
|
||||
|
||||
import ActivityHeatmap from "../stats/ActivityHeatmap";
|
||||
import { ReactWordcloud } from '@cp949/react-wordcloud';
|
||||
import { ReactWordcloud } from "@cp949/react-wordcloud";
|
||||
import StatsStyling from "../styles/stats_styling";
|
||||
import Card from "../components/Card";
|
||||
import UserModal from "../components/UserModal";
|
||||
|
||||
import {
|
||||
type SummaryResponse,
|
||||
type FrequencyWord,
|
||||
type UserAnalysisResponse,
|
||||
import {
|
||||
type SummaryResponse,
|
||||
type FrequencyWord,
|
||||
type UserEndpointResponse,
|
||||
type TimeAnalysisResponse,
|
||||
type ContentAnalysisResponse,
|
||||
type User
|
||||
} from '../types/ApiTypes'
|
||||
type LinguisticAnalysisResponse,
|
||||
} from "../types/ApiTypes";
|
||||
import {
|
||||
buildAllRecordsSpec,
|
||||
buildDateBucketSpec,
|
||||
buildOneTimeUsersSpec,
|
||||
buildUserSpec,
|
||||
type CorpusExplorerSpec,
|
||||
} from "../utils/corpusExplorer";
|
||||
|
||||
const styles = StatsStyling;
|
||||
const MAX_WORDCLOUD_WORDS = 250;
|
||||
const exploreButtonStyle = { padding: "4px 8px", fontSize: 12 };
|
||||
|
||||
const WORDCLOUD_OPTIONS = {
|
||||
rotations: 2,
|
||||
rotationAngles: [0, 90] as [number, number],
|
||||
fontSizes: [14, 60] as [number, number],
|
||||
enableTooltip: true,
|
||||
};
|
||||
|
||||
type SummaryStatsProps = {
|
||||
userData: UserAnalysisResponse | null;
|
||||
timeData: TimeAnalysisResponse | null;
|
||||
contentData: ContentAnalysisResponse | null;
|
||||
summary: SummaryResponse | null;
|
||||
}
|
||||
userData: UserEndpointResponse | null;
|
||||
timeData: TimeAnalysisResponse | null;
|
||||
linguisticData: LinguisticAnalysisResponse | null;
|
||||
summary: SummaryResponse | null;
|
||||
onExplore: (spec: CorpusExplorerSpec) => void;
|
||||
};
|
||||
|
||||
type WordCloudPanelProps = {
|
||||
words: { text: string; value: number }[];
|
||||
};
|
||||
|
||||
const WordCloudPanel = memo(({ words }: WordCloudPanelProps) => (
|
||||
<ReactWordcloud words={words} options={WORDCLOUD_OPTIONS} />
|
||||
));
|
||||
|
||||
function formatDateRange(startUnix: number, endUnix: number) {
|
||||
const start = new Date(startUnix * 1000);
|
||||
@@ -44,174 +67,188 @@ function formatDateRange(startUnix: number, endUnix: number) {
|
||||
day: "2-digit",
|
||||
});
|
||||
|
||||
return `${fmt(start)} → ${fmt(end)}`;
|
||||
return `${fmt(start)} -> ${fmt(end)}`;
|
||||
}
|
||||
|
||||
function convertFrequencyData(data: FrequencyWord[]) {
|
||||
return data.map((d: FrequencyWord) => ({
|
||||
text: d.word,
|
||||
value: d.count,
|
||||
}))
|
||||
return data.map((d: FrequencyWord) => ({
|
||||
text: d.word,
|
||||
value: d.count,
|
||||
}));
|
||||
}
|
||||
|
||||
const SummaryStats = ({userData, timeData, contentData, summary}: SummaryStatsProps) => {
|
||||
const [selectedUser, setSelectedUser] = useState<string | null>(null);
|
||||
const selectedUserData: User | null = userData?.users.find((u) => u.author === selectedUser) ?? null;
|
||||
const renderExploreButton = (onClick: () => void) => (
|
||||
<button
|
||||
onClick={onClick}
|
||||
style={{ ...styles.buttonSecondary, ...exploreButtonStyle }}
|
||||
>
|
||||
Explore
|
||||
</button>
|
||||
);
|
||||
|
||||
console.log(summary)
|
||||
const SummaryStats = ({
|
||||
userData,
|
||||
timeData,
|
||||
linguisticData,
|
||||
summary,
|
||||
onExplore,
|
||||
}: SummaryStatsProps) => {
|
||||
const wordCloudWords = useMemo(
|
||||
() =>
|
||||
convertFrequencyData(
|
||||
(linguisticData?.word_frequencies ?? []).slice(0, MAX_WORDCLOUD_WORDS),
|
||||
),
|
||||
[linguisticData?.word_frequencies],
|
||||
);
|
||||
|
||||
return (
|
||||
const topUsersPreview = useMemo(
|
||||
() => (userData?.top_users ?? []).slice(0, 100),
|
||||
[userData?.top_users],
|
||||
);
|
||||
|
||||
return (
|
||||
<div style={styles.page}>
|
||||
<div style={{ ...styles.container, ...styles.grid }}>
|
||||
<Card
|
||||
label="Total Activity"
|
||||
value={summary?.total_events ?? "-"}
|
||||
sublabel="Posts + comments"
|
||||
rightSlot={renderExploreButton(() => onExplore(buildAllRecordsSpec()))}
|
||||
style={{ gridColumn: "span 4" }}
|
||||
/>
|
||||
<Card
|
||||
label="Active People"
|
||||
value={summary?.unique_users ?? "-"}
|
||||
sublabel="Distinct users"
|
||||
rightSlot={renderExploreButton(() => onExplore(buildAllRecordsSpec()))}
|
||||
style={{ gridColumn: "span 4" }}
|
||||
/>
|
||||
<Card
|
||||
label="Posts vs Comments"
|
||||
value={
|
||||
summary ? `${summary.total_posts} / ${summary.total_comments}` : "-"
|
||||
}
|
||||
sublabel={`Comments per post: ${summary?.comments_per_post ?? "-"}`}
|
||||
rightSlot={renderExploreButton(() => onExplore(buildAllRecordsSpec()))}
|
||||
style={{ gridColumn: "span 4" }}
|
||||
/>
|
||||
|
||||
{/* main grid*/}
|
||||
<div style={{ ...styles.container, ...styles.grid}}>
|
||||
<Card
|
||||
label="Total Events"
|
||||
value={summary?.total_events ?? "—"}
|
||||
sublabel="Posts + comments"
|
||||
style={{
|
||||
gridColumn: "span 4"
|
||||
}}
|
||||
/>
|
||||
<Card
|
||||
label="Unique Users"
|
||||
value={summary?.unique_users ?? "—"}
|
||||
sublabel="Distinct authors"
|
||||
style={{
|
||||
gridColumn: "span 4"
|
||||
}}
|
||||
/>
|
||||
<Card
|
||||
label="Posts / Comments"
|
||||
value={
|
||||
summary
|
||||
? `${summary.total_posts} / ${summary.total_comments}`
|
||||
: "—"
|
||||
}
|
||||
sublabel={`Comments per post: ${summary?.comments_per_post ?? "—"}`}
|
||||
style={{
|
||||
gridColumn: "span 4"
|
||||
}}
|
||||
/>
|
||||
<Card
|
||||
label="Time Range"
|
||||
value={
|
||||
summary?.time_range
|
||||
? formatDateRange(summary.time_range.start, summary.time_range.end)
|
||||
: "-"
|
||||
}
|
||||
sublabel="Based on dataset timestamps"
|
||||
rightSlot={renderExploreButton(() => onExplore(buildAllRecordsSpec()))}
|
||||
style={{ gridColumn: "span 4" }}
|
||||
/>
|
||||
|
||||
<Card
|
||||
label="Time Range"
|
||||
value={
|
||||
summary?.time_range
|
||||
? formatDateRange(summary.time_range.start, summary.time_range.end)
|
||||
: "—"
|
||||
}
|
||||
sublabel="Based on dataset timestamps"
|
||||
style={{
|
||||
gridColumn: "span 4"
|
||||
}}
|
||||
/>
|
||||
<Card
|
||||
label="One-Time Users"
|
||||
value={
|
||||
typeof summary?.lurker_ratio === "number"
|
||||
? `${Math.round(summary.lurker_ratio * 100)}%`
|
||||
: "-"
|
||||
}
|
||||
sublabel="Users with only one event"
|
||||
rightSlot={renderExploreButton(() => onExplore(buildOneTimeUsersSpec()))}
|
||||
style={{ gridColumn: "span 4" }}
|
||||
/>
|
||||
|
||||
<Card
|
||||
label="Lurker Ratio"
|
||||
value={
|
||||
typeof summary?.lurker_ratio === "number"
|
||||
? `${Math.round(summary.lurker_ratio * 100)}%`
|
||||
: "—"
|
||||
}
|
||||
sublabel="Users with only 1 event"
|
||||
style={{
|
||||
gridColumn: "span 4"
|
||||
}}
|
||||
/>
|
||||
<Card
|
||||
label="Sources"
|
||||
value={summary?.sources?.length ?? "-"}
|
||||
sublabel={
|
||||
summary?.sources?.length
|
||||
? summary.sources.slice(0, 3).join(", ") +
|
||||
(summary.sources.length > 3 ? "..." : "")
|
||||
: "-"
|
||||
}
|
||||
rightSlot={renderExploreButton(() => onExplore(buildAllRecordsSpec()))}
|
||||
style={{ gridColumn: "span 4" }}
|
||||
/>
|
||||
|
||||
<Card
|
||||
label="Sources"
|
||||
value={summary?.sources?.length ?? "—"}
|
||||
sublabel={
|
||||
summary?.sources?.length
|
||||
? summary.sources.slice(0, 3).join(", ") +
|
||||
(summary.sources.length > 3 ? "…" : "")
|
||||
: "—"
|
||||
}
|
||||
style={{
|
||||
gridColumn: "span 4"
|
||||
}}
|
||||
/>
|
||||
|
||||
{/* events per day */}
|
||||
<div style={{ ...styles.card, gridColumn: "span 5" }}>
|
||||
<h2 style={styles.sectionTitle}>Events per Day</h2>
|
||||
<p style={styles.sectionSubtitle}>Trend of activity over time</p>
|
||||
<h2 style={styles.sectionTitle}>Activity Over Time</h2>
|
||||
<p style={styles.sectionSubtitle}>How much posting happened each day.</p>
|
||||
|
||||
<div style={styles.chartWrapper}>
|
||||
<div style={styles.chartWrapper}>
|
||||
<ResponsiveContainer width="100%" height="100%">
|
||||
<LineChart data={timeData?.events_per_day.filter((d) => new Date(d.date) >= new Date('2026-01-10'))}>
|
||||
<LineChart
|
||||
data={timeData?.events_per_day ?? []}
|
||||
onClick={(state: unknown) => {
|
||||
const payload = (state as { activePayload?: Array<{ payload?: { date?: string } }> })
|
||||
?.activePayload?.[0]?.payload as
|
||||
| { date?: string }
|
||||
| undefined;
|
||||
if (payload?.date) {
|
||||
onExplore(buildDateBucketSpec(String(payload.date)));
|
||||
}
|
||||
}}
|
||||
>
|
||||
<CartesianGrid strokeDasharray="3 3" />
|
||||
<XAxis dataKey="date" />
|
||||
<YAxis />
|
||||
<Tooltip />
|
||||
<Line type="monotone" dataKey="count" name="Events" />
|
||||
</LineChart>
|
||||
<Line
|
||||
type="monotone"
|
||||
dataKey="count"
|
||||
name="Events"
|
||||
isAnimationActive={false}
|
||||
/>
|
||||
</LineChart>
|
||||
</ResponsiveContainer>
|
||||
</div>
|
||||
</div>
|
||||
</div>
|
||||
|
||||
{/* Word Cloud */}
|
||||
<div style={{ ...styles.card, gridColumn: "span 4" }}>
|
||||
<h2 style={styles.sectionTitle}>Word Cloud</h2>
|
||||
<p style={styles.sectionSubtitle}>Most common terms across events</p>
|
||||
<h2 style={styles.sectionTitle}>Common Words</h2>
|
||||
<p style={styles.sectionSubtitle}>
|
||||
Frequently used words across the dataset.
|
||||
</p>
|
||||
|
||||
<div style={styles.chartWrapper}>
|
||||
<ReactWordcloud
|
||||
words={convertFrequencyData(contentData?.word_frequencies ?? [])}
|
||||
options={{
|
||||
rotations: 2,
|
||||
rotationAngles: [0, 90],
|
||||
fontSizes: [14, 60],
|
||||
enableTooltip: true,
|
||||
}}
|
||||
/>
|
||||
</div>
|
||||
<div style={styles.chartWrapper}>
|
||||
<WordCloudPanel words={wordCloudWords} />
|
||||
</div>
|
||||
</div>
|
||||
|
||||
{/* Top Users */}
|
||||
<div style={{...styles.card, ...styles.scrollArea, gridColumn: "span 3",
|
||||
}}
|
||||
<div
|
||||
style={{ ...styles.card, ...styles.scrollArea, gridColumn: "span 3" }}
|
||||
>
|
||||
<h2 style={styles.sectionTitle}>Top Users</h2>
|
||||
<p style={styles.sectionSubtitle}>Most active authors</p>
|
||||
<h2 style={styles.sectionTitle}>Most Active Users</h2>
|
||||
<p style={styles.sectionSubtitle}>Who posted the most events.</p>
|
||||
|
||||
<div style={styles.topUsersList}>
|
||||
{userData?.top_users.slice(0, 100).map((item) => (
|
||||
<div
|
||||
<div style={styles.topUsersList}>
|
||||
{topUsersPreview.map((item) => (
|
||||
<div
|
||||
key={`${item.author}-${item.source}`}
|
||||
style={{ ...styles.topUserItem, cursor: "pointer" }}
|
||||
onClick={() => setSelectedUser(item.author)}
|
||||
>
|
||||
onClick={() => onExplore(buildUserSpec(item.author))}
|
||||
>
|
||||
<div style={styles.topUserName}>{item.author}</div>
|
||||
<div style={styles.topUserMeta}>
|
||||
{item.source} • {item.count} events
|
||||
</div>
|
||||
{item.source} • {item.count} events
|
||||
</div>
|
||||
</div>
|
||||
))}
|
||||
</div>
|
||||
</div>
|
||||
</div>
|
||||
|
||||
{/* Heatmap */}
|
||||
<div style={{ ...styles.card, gridColumn: "span 12" }}>
|
||||
<h2 style={styles.sectionTitle}>Heatmap</h2>
|
||||
<p style={styles.sectionSubtitle}>Activity density across time</p>
|
||||
<h2 style={styles.sectionTitle}>Weekly Activity Pattern</h2>
|
||||
<p style={styles.sectionSubtitle}>
|
||||
When activity tends to happen by weekday and hour.
|
||||
</p>
|
||||
|
||||
<div style={styles.heatmapWrapper}>
|
||||
<div style={styles.heatmapWrapper}>
|
||||
<ActivityHeatmap data={timeData?.weekday_hour_heatmap ?? []} />
|
||||
</div>
|
||||
</div>
|
||||
</div>
|
||||
</div>
|
||||
|
||||
<UserModal
|
||||
open={!!selectedUser}
|
||||
onClose={() => setSelectedUser(null)}
|
||||
username={selectedUser ?? ""}
|
||||
userData={selectedUserData}
|
||||
/>
|
||||
</div>
|
||||
</div>
|
||||
);
|
||||
}
|
||||
);
|
||||
};
|
||||
|
||||
export default SummaryStats;
|
||||
export default SummaryStats;
|
||||
|
||||
@@ -11,28 +11,22 @@ type Props = {
|
||||
username: string;
|
||||
};
|
||||
|
||||
export default function UserModal({ open, onClose, userData, username }: Props) {
|
||||
return (
|
||||
<Dialog open={open} onClose={onClose} style={{ position: "relative", zIndex: 50 }}>
|
||||
<div
|
||||
style={{
|
||||
position: "fixed",
|
||||
inset: 0,
|
||||
background: "rgba(0,0,0,0.45)",
|
||||
}}
|
||||
/>
|
||||
export default function UserModal({
|
||||
open,
|
||||
onClose,
|
||||
userData,
|
||||
username,
|
||||
}: Props) {
|
||||
const dominantEmotionEntry = Object.entries(
|
||||
userData?.avg_emotions ?? {},
|
||||
).sort((a, b) => b[1] - a[1])[0];
|
||||
|
||||
<div
|
||||
style={{
|
||||
position: "fixed",
|
||||
inset: 0,
|
||||
display: "flex",
|
||||
alignItems: "center",
|
||||
justifyContent: "center",
|
||||
padding: 16,
|
||||
}}
|
||||
>
|
||||
<DialogPanel style={{ ...styles.card, width: "min(520px, 95vw)" }}>
|
||||
return (
|
||||
<Dialog open={open} onClose={onClose} style={styles.modalRoot}>
|
||||
<div style={styles.modalBackdrop} />
|
||||
|
||||
<div style={styles.modalContainer}>
|
||||
<DialogPanel style={{ ...styles.card, ...styles.modalPanel }}>
|
||||
<div style={styles.headerBar}>
|
||||
<div>
|
||||
<DialogTitle style={styles.sectionTitle}>{username}</DialogTitle>
|
||||
@@ -48,7 +42,9 @@ export default function UserModal({ open, onClose, userData, username }: Props)
|
||||
<p style={styles.sectionSubtitle}>No data for this user.</p>
|
||||
) : (
|
||||
<div style={styles.topUsersList}>
|
||||
<div style={{...styles.topUserName, fontSize: 20}}>{userData.author}</div>
|
||||
<div style={{ ...styles.topUserName, fontSize: 20 }}>
|
||||
{userData.author}
|
||||
</div>
|
||||
<div style={styles.topUserItem}>
|
||||
<div style={styles.topUserName}>Posts</div>
|
||||
<div style={styles.topUserMeta}>{userData.post}</div>
|
||||
@@ -77,7 +73,27 @@ export default function UserModal({ open, onClose, userData, username }: Props)
|
||||
<div style={styles.topUserItem}>
|
||||
<div style={styles.topUserName}>Vocab Richness</div>
|
||||
<div style={styles.topUserMeta}>
|
||||
{userData.vocab.vocab_richness} (avg {userData.vocab.avg_words_per_event} words/event)
|
||||
{userData.vocab.vocab_richness} (avg{" "}
|
||||
{userData.vocab.avg_words_per_event} words/event)
|
||||
</div>
|
||||
</div>
|
||||
) : null}
|
||||
|
||||
{dominantEmotionEntry ? (
|
||||
<div style={styles.topUserItem}>
|
||||
<div style={styles.topUserName}>Dominant Avg Emotion</div>
|
||||
<div style={styles.topUserMeta}>
|
||||
{dominantEmotionEntry[0].replace("emotion_", "")} (
|
||||
{dominantEmotionEntry[1].toFixed(3)})
|
||||
</div>
|
||||
</div>
|
||||
) : null}
|
||||
|
||||
{userData.dominant_topic ? (
|
||||
<div style={styles.topUserItem}>
|
||||
<div style={styles.topUserName}>Most Common Topic</div>
|
||||
<div style={styles.topUserMeta}>
|
||||
{userData.dominant_topic.topic} ({userData.dominant_topic.count} events)
|
||||
</div>
|
||||
</div>
|
||||
) : null}
|
||||
|
||||
@@ -1,61 +1,230 @@
|
||||
import { useEffect, useMemo, useRef, useState } from "react";
|
||||
import ForceGraph3D from "react-force-graph-3d";
|
||||
|
||||
import {
|
||||
type UserAnalysisResponse,
|
||||
type InteractionGraph
|
||||
} from '../types/ApiTypes';
|
||||
import { type TopUser, type InteractionGraph } from "../types/ApiTypes";
|
||||
|
||||
import StatsStyling from "../styles/stats_styling";
|
||||
import Card from "./Card";
|
||||
import {
|
||||
buildReplyPairSpec,
|
||||
toText,
|
||||
buildUserSpec,
|
||||
type CorpusExplorerSpec,
|
||||
} from "../utils/corpusExplorer";
|
||||
|
||||
const styles = StatsStyling;
|
||||
|
||||
function ApiToGraphData(apiData: InteractionGraph) {
|
||||
const nodes = Object.keys(apiData).map(username => ({ id: username }));
|
||||
const links = [];
|
||||
|
||||
for (const [source, targets] of Object.entries(apiData)) {
|
||||
for (const [target, count] of Object.entries(targets)) {
|
||||
links.push({ source, target, value: count });
|
||||
}
|
||||
type GraphLink = {
|
||||
source: string;
|
||||
target: string;
|
||||
value: number;
|
||||
};
|
||||
|
||||
function toGraphData(apiData: InteractionGraph) {
|
||||
const links: GraphLink[] = [];
|
||||
const connectedNodeIds = new Set<string>();
|
||||
|
||||
for (const [source, targets] of Object.entries(apiData)) {
|
||||
for (const [target, count] of Object.entries(targets)) {
|
||||
if (count < 2 || source === "[deleted]" || target === "[deleted]") {
|
||||
continue;
|
||||
}
|
||||
links.push({ source, target, value: count });
|
||||
connectedNodeIds.add(source);
|
||||
connectedNodeIds.add(target);
|
||||
}
|
||||
|
||||
// drop low-value and deleted interactions to reduce clutter
|
||||
const filteredLinks = links.filter(link =>
|
||||
link.value >= 2 &&
|
||||
link.source !== "[deleted]" &&
|
||||
link.target !== "[deleted]"
|
||||
);
|
||||
}
|
||||
|
||||
// also filter out nodes that are no longer connected after link filtering
|
||||
const connectedNodeIds = new Set(filteredLinks.flatMap(link => [link.source, link.target]));
|
||||
const filteredNodes = nodes.filter(node => connectedNodeIds.has(node.id));
|
||||
const filteredNodes = Array.from(connectedNodeIds, (id) => ({ id }));
|
||||
|
||||
return { nodes: filteredNodes, links: filteredLinks};
|
||||
return { nodes: filteredNodes, links };
|
||||
}
|
||||
|
||||
type UserStatsProps = {
|
||||
topUsers: TopUser[];
|
||||
interactionGraph: InteractionGraph;
|
||||
totalUsers: number;
|
||||
mostCommentHeavyUser: { author: string; commentShare: number } | null;
|
||||
onExplore: (spec: CorpusExplorerSpec) => void;
|
||||
};
|
||||
|
||||
const UserStats = (props: { data: UserAnalysisResponse }) => {
|
||||
const graphData = ApiToGraphData(props.data.interaction_graph);
|
||||
const UserStats = ({
|
||||
topUsers,
|
||||
interactionGraph,
|
||||
totalUsers,
|
||||
mostCommentHeavyUser,
|
||||
onExplore,
|
||||
}: UserStatsProps) => {
|
||||
const graphData = useMemo(
|
||||
() => toGraphData(interactionGraph),
|
||||
[interactionGraph],
|
||||
);
|
||||
const graphContainerRef = useRef<HTMLDivElement | null>(null);
|
||||
const [graphSize, setGraphSize] = useState({ width: 720, height: 540 });
|
||||
|
||||
useEffect(() => {
|
||||
const updateGraphSize = () => {
|
||||
const containerWidth = graphContainerRef.current?.clientWidth ?? 720;
|
||||
const nextWidth = Math.max(320, Math.floor(containerWidth));
|
||||
const nextHeight = nextWidth < 700 ? 300 : 540;
|
||||
setGraphSize({ width: nextWidth, height: nextHeight });
|
||||
};
|
||||
|
||||
updateGraphSize();
|
||||
window.addEventListener("resize", updateGraphSize);
|
||||
|
||||
return () => window.removeEventListener("resize", updateGraphSize);
|
||||
}, []);
|
||||
|
||||
const connectedUsers = graphData.nodes.length;
|
||||
const totalInteractions = graphData.links.reduce(
|
||||
(sum, link) => sum + link.value,
|
||||
0,
|
||||
);
|
||||
const avgInteractionsPerConnectedUser = connectedUsers
|
||||
? totalInteractions / connectedUsers
|
||||
: 0;
|
||||
|
||||
const strongestLink = graphData.links.reduce<GraphLink | null>(
|
||||
(best, current) => {
|
||||
if (!best || current.value > best.value) {
|
||||
return current;
|
||||
}
|
||||
return best;
|
||||
},
|
||||
null,
|
||||
);
|
||||
|
||||
const mostActiveUser = topUsers.find((u) => u.author !== "[deleted]");
|
||||
const strongestLinkSource = strongestLink ? toText(strongestLink.source) : "";
|
||||
const strongestLinkTarget = strongestLink ? toText(strongestLink.target) : "";
|
||||
|
||||
return (
|
||||
<div style={styles.page}>
|
||||
<h2 style={styles.sectionTitle}>User Interaction Graph</h2>
|
||||
<p style={styles.sectionSubtitle}>
|
||||
This graph visualizes interactions between users based on comments and replies.
|
||||
Nodes represent users, and edges represent interactions (e.g., comments or replies) between them.
|
||||
</p>
|
||||
<div>
|
||||
<div style={{ ...styles.container, ...styles.grid }}>
|
||||
<Card
|
||||
label="Users"
|
||||
value={totalUsers.toLocaleString()}
|
||||
sublabel={`${connectedUsers.toLocaleString()} users in filtered graph`}
|
||||
style={{ gridColumn: "span 3" }}
|
||||
/>
|
||||
<Card
|
||||
label="Replies"
|
||||
value={totalInteractions.toLocaleString()}
|
||||
sublabel="Links with at least 2 replies"
|
||||
style={{ gridColumn: "span 3" }}
|
||||
/>
|
||||
<Card
|
||||
label="Replies per Connected User"
|
||||
value={avgInteractionsPerConnectedUser.toFixed(1)}
|
||||
sublabel="Average from visible graph links"
|
||||
style={{ gridColumn: "span 3" }}
|
||||
/>
|
||||
<Card
|
||||
label="Most Active User"
|
||||
value={mostActiveUser?.author ?? "-"}
|
||||
sublabel={
|
||||
mostActiveUser
|
||||
? `${mostActiveUser.count.toLocaleString()} events`
|
||||
: "No user activity found"
|
||||
}
|
||||
rightSlot={
|
||||
mostActiveUser ? (
|
||||
<button
|
||||
onClick={() => onExplore(buildUserSpec(mostActiveUser.author))}
|
||||
style={styles.buttonSecondary}
|
||||
>
|
||||
Explore
|
||||
</button>
|
||||
) : null
|
||||
}
|
||||
style={{ gridColumn: "span 3" }}
|
||||
/>
|
||||
|
||||
<Card
|
||||
label="Strongest User Link"
|
||||
value={
|
||||
strongestLinkSource && strongestLinkTarget
|
||||
? `${strongestLinkSource} -> ${strongestLinkTarget}`
|
||||
: "-"
|
||||
}
|
||||
sublabel={
|
||||
strongestLink
|
||||
? `${strongestLink.value.toLocaleString()} replies`
|
||||
: "No graph links after filtering"
|
||||
}
|
||||
rightSlot={
|
||||
strongestLinkSource && strongestLinkTarget ? (
|
||||
<button
|
||||
onClick={() =>
|
||||
onExplore(buildReplyPairSpec(strongestLinkSource, strongestLinkTarget))
|
||||
}
|
||||
style={styles.buttonSecondary}
|
||||
>
|
||||
Explore
|
||||
</button>
|
||||
) : null
|
||||
}
|
||||
style={{ gridColumn: "span 6" }}
|
||||
/>
|
||||
<Card
|
||||
label="Most Comment-Heavy User"
|
||||
value={mostCommentHeavyUser?.author ?? "-"}
|
||||
sublabel={
|
||||
mostCommentHeavyUser
|
||||
? `${Math.round(mostCommentHeavyUser.commentShare * 100)}% comments`
|
||||
: "No user distribution available"
|
||||
}
|
||||
rightSlot={
|
||||
mostCommentHeavyUser ? (
|
||||
<button
|
||||
onClick={() => onExplore(buildUserSpec(mostCommentHeavyUser.author))}
|
||||
style={styles.buttonSecondary}
|
||||
>
|
||||
Explore
|
||||
</button>
|
||||
) : null
|
||||
}
|
||||
style={{ gridColumn: "span 6" }}
|
||||
/>
|
||||
|
||||
<div style={{ ...styles.card, gridColumn: "span 12" }}>
|
||||
<h2 style={styles.sectionTitle}>User Interaction Graph</h2>
|
||||
<p style={styles.sectionSubtitle}>
|
||||
Each node is a user, and each link shows replies between them.
|
||||
</p>
|
||||
<div
|
||||
ref={graphContainerRef}
|
||||
style={{ width: "100%", height: graphSize.height }}
|
||||
>
|
||||
<ForceGraph3D
|
||||
graphData={graphData}
|
||||
nodeAutoColorBy="id"
|
||||
linkDirectionalParticles={2}
|
||||
linkDirectionalParticleSpeed={0.005}
|
||||
linkWidth={(link) => Math.sqrt(link.value)}
|
||||
nodeLabel={(node) => `${node.id}`}
|
||||
width={graphSize.width}
|
||||
height={graphSize.height}
|
||||
graphData={graphData}
|
||||
nodeAutoColorBy="id"
|
||||
linkDirectionalParticles={1}
|
||||
linkDirectionalParticleSpeed={0.004}
|
||||
linkWidth={(link) => Math.sqrt(Number(link.value))}
|
||||
nodeLabel={(node) => `${node.id}`}
|
||||
onNodeClick={(node) => {
|
||||
const userId = toText(node.id);
|
||||
if (userId) {
|
||||
onExplore(buildUserSpec(userId));
|
||||
}
|
||||
}}
|
||||
onLinkClick={(link) => {
|
||||
const source = toText(link.source);
|
||||
const target = toText(link.target);
|
||||
if (source && target) {
|
||||
onExplore(buildReplyPairSpec(source, target));
|
||||
}
|
||||
}}
|
||||
/>
|
||||
</div>
|
||||
</div>
|
||||
</div>
|
||||
</div>
|
||||
);
|
||||
}
|
||||
};
|
||||
|
||||
export default UserStats;
|
||||
export default UserStats;
|
||||
|
||||
@@ -1,68 +1,65 @@
|
||||
:root {
|
||||
font-family: system-ui, Avenir, Helvetica, Arial, sans-serif;
|
||||
line-height: 1.5;
|
||||
font-weight: 400;
|
||||
|
||||
color-scheme: light dark;
|
||||
color: rgba(255, 255, 255, 0.87);
|
||||
background-color: #242424;
|
||||
|
||||
--bg-default: #f6f8fa;
|
||||
--text-default: #24292f;
|
||||
--border-default: #d0d7de;
|
||||
--focus-ring: rgba(9, 105, 218, 0.22);
|
||||
font-synthesis: none;
|
||||
text-rendering: optimizeLegibility;
|
||||
-webkit-font-smoothing: antialiased;
|
||||
-moz-osx-font-smoothing: grayscale;
|
||||
}
|
||||
|
||||
a {
|
||||
font-weight: 500;
|
||||
color: #646cff;
|
||||
text-decoration: inherit;
|
||||
}
|
||||
a:hover {
|
||||
color: #535bf2;
|
||||
html,
|
||||
body,
|
||||
#root {
|
||||
width: 100%;
|
||||
height: 100%;
|
||||
}
|
||||
|
||||
body {
|
||||
margin: 0;
|
||||
display: flex;
|
||||
place-items: center;
|
||||
min-width: 320px;
|
||||
min-height: 100vh;
|
||||
background: var(--bg-default);
|
||||
color: var(--text-default);
|
||||
font-family: "IBM Plex Sans", "Noto Sans", "Liberation Sans", "Segoe UI", sans-serif;
|
||||
}
|
||||
|
||||
h1 {
|
||||
font-size: 3.2em;
|
||||
line-height: 1.1;
|
||||
* {
|
||||
box-sizing: border-box;
|
||||
}
|
||||
|
||||
button {
|
||||
border-radius: 8px;
|
||||
border: 1px solid transparent;
|
||||
padding: 0.6em 1.2em;
|
||||
font-size: 1em;
|
||||
font-weight: 500;
|
||||
font-family: inherit;
|
||||
background-color: #1a1a1a;
|
||||
cursor: pointer;
|
||||
transition: border-color 0.25s;
|
||||
}
|
||||
button:hover {
|
||||
border-color: #646cff;
|
||||
}
|
||||
button:focus,
|
||||
button:focus-visible {
|
||||
outline: 4px auto -webkit-focus-ring-color;
|
||||
button,
|
||||
input,
|
||||
select,
|
||||
textarea {
|
||||
font: inherit;
|
||||
}
|
||||
|
||||
@media (prefers-color-scheme: light) {
|
||||
:root {
|
||||
color: #213547;
|
||||
background-color: #ffffff;
|
||||
input:focus,
|
||||
button:focus-visible,
|
||||
select:focus,
|
||||
textarea:focus {
|
||||
border-color: #0969da;
|
||||
box-shadow: 0 0 0 3px var(--focus-ring);
|
||||
outline: none;
|
||||
}
|
||||
|
||||
@keyframes stats-spin {
|
||||
from {
|
||||
transform: rotate(0deg);
|
||||
}
|
||||
a:hover {
|
||||
color: #747bff;
|
||||
}
|
||||
button {
|
||||
background-color: #f9f9f9;
|
||||
|
||||
to {
|
||||
transform: rotate(360deg);
|
||||
}
|
||||
}
|
||||
|
||||
@keyframes stats-pulse {
|
||||
0%,
|
||||
100% {
|
||||
opacity: 0.5;
|
||||
}
|
||||
|
||||
50% {
|
||||
opacity: 1;
|
||||
}
|
||||
}
|
||||
|
||||
530
frontend/src/pages/AutoFetch.tsx
Normal file
@@ -0,0 +1,530 @@
|
||||
import axios from "axios";
|
||||
import { useEffect, useState } from "react";
|
||||
import { useNavigate } from "react-router-dom";
|
||||
import StatsStyling from "../styles/stats_styling";
|
||||
|
||||
const styles = StatsStyling;
|
||||
const API_BASE_URL = import.meta.env.VITE_BACKEND_URL;
|
||||
|
||||
type SourceOption = {
|
||||
id: string;
|
||||
label: string;
|
||||
search_enabled?: boolean;
|
||||
categories_enabled?: boolean;
|
||||
searchEnabled?: boolean;
|
||||
categoriesEnabled?: boolean;
|
||||
};
|
||||
|
||||
type SourceConfig = {
|
||||
sourceName: string;
|
||||
limit: string;
|
||||
search: string;
|
||||
category: string;
|
||||
};
|
||||
|
||||
type TopicMap = Record<string, string>;
|
||||
|
||||
const buildEmptySourceConfig = (sourceName = ""): SourceConfig => ({
|
||||
sourceName,
|
||||
limit: "100",
|
||||
search: "",
|
||||
category: "",
|
||||
});
|
||||
|
||||
const supportsSearch = (source?: SourceOption): boolean =>
|
||||
Boolean(source?.search_enabled ?? source?.searchEnabled);
|
||||
|
||||
const supportsCategories = (source?: SourceOption): boolean =>
|
||||
Boolean(source?.categories_enabled ?? source?.categoriesEnabled);
|
||||
|
||||
const AutoFetchPage = () => {
|
||||
const navigate = useNavigate();
|
||||
const [datasetName, setDatasetName] = useState("");
|
||||
const [sourceOptions, setSourceOptions] = useState<SourceOption[]>([]);
|
||||
const [sourceConfigs, setSourceConfigs] = useState<SourceConfig[]>([]);
|
||||
const [returnMessage, setReturnMessage] = useState("");
|
||||
const [isLoadingSources, setIsLoadingSources] = useState(true);
|
||||
const [isSubmitting, setIsSubmitting] = useState(false);
|
||||
const [hasError, setHasError] = useState(false);
|
||||
const [useCustomTopics, setUseCustomTopics] = useState(false);
|
||||
const [customTopicsText, setCustomTopicsText] = useState("");
|
||||
|
||||
useEffect(() => {
|
||||
axios
|
||||
.get<SourceOption[]>(`${API_BASE_URL}/datasets/sources`)
|
||||
.then((response) => {
|
||||
const options = response.data || [];
|
||||
setSourceOptions(options);
|
||||
setSourceConfigs([buildEmptySourceConfig(options[0]?.id || "")]);
|
||||
})
|
||||
.catch((requestError: unknown) => {
|
||||
setHasError(true);
|
||||
if (axios.isAxiosError(requestError)) {
|
||||
setReturnMessage(
|
||||
`Failed to load available sources: ${String(
|
||||
requestError.response?.data?.error || requestError.message,
|
||||
)}`,
|
||||
);
|
||||
} else {
|
||||
setReturnMessage("Failed to load available sources.");
|
||||
}
|
||||
})
|
||||
.finally(() => {
|
||||
setIsLoadingSources(false);
|
||||
});
|
||||
}, []);
|
||||
|
||||
const updateSourceConfig = (
|
||||
index: number,
|
||||
field: keyof SourceConfig,
|
||||
value: string,
|
||||
) => {
|
||||
setSourceConfigs((previous) =>
|
||||
previous.map((config, configIndex) =>
|
||||
configIndex === index
|
||||
? field === "sourceName"
|
||||
? { ...config, sourceName: value, search: "", category: "" }
|
||||
: { ...config, [field]: value }
|
||||
: config,
|
||||
),
|
||||
);
|
||||
};
|
||||
|
||||
const getSourceOption = (sourceName: string) =>
|
||||
sourceOptions.find((option) => option.id === sourceName);
|
||||
|
||||
const addSourceConfig = () => {
|
||||
setSourceConfigs((previous) => [
|
||||
...previous,
|
||||
buildEmptySourceConfig(sourceOptions[0]?.id || ""),
|
||||
]);
|
||||
};
|
||||
|
||||
const removeSourceConfig = (index: number) => {
|
||||
setSourceConfigs((previous) =>
|
||||
previous.filter((_, configIndex) => configIndex !== index),
|
||||
);
|
||||
};
|
||||
|
||||
const autoFetch = async () => {
|
||||
const token = localStorage.getItem("access_token");
|
||||
if (!token) {
|
||||
setHasError(true);
|
||||
setReturnMessage("You must be signed in to auto fetch a dataset.");
|
||||
return;
|
||||
}
|
||||
|
||||
const normalizedDatasetName = datasetName.trim();
|
||||
if (!normalizedDatasetName) {
|
||||
setHasError(true);
|
||||
setReturnMessage("Please add a dataset name before continuing.");
|
||||
return;
|
||||
}
|
||||
|
||||
if (sourceConfigs.length === 0) {
|
||||
setHasError(true);
|
||||
setReturnMessage("Please add at least one source.");
|
||||
return;
|
||||
}
|
||||
|
||||
const normalizedSources = sourceConfigs.map((source) => {
|
||||
const sourceOption = getSourceOption(source.sourceName);
|
||||
|
||||
return {
|
||||
name: source.sourceName,
|
||||
limit: Number(source.limit || 100),
|
||||
search: supportsSearch(sourceOption)
|
||||
? source.search.trim() || undefined
|
||||
: undefined,
|
||||
category: supportsCategories(sourceOption)
|
||||
? source.category.trim() || undefined
|
||||
: undefined,
|
||||
};
|
||||
});
|
||||
|
||||
const invalidSource = normalizedSources.find(
|
||||
(source) =>
|
||||
!source.name || !Number.isFinite(source.limit) || source.limit <= 0,
|
||||
);
|
||||
|
||||
if (invalidSource) {
|
||||
setHasError(true);
|
||||
setReturnMessage(
|
||||
"Every source needs a name and a limit greater than zero.",
|
||||
);
|
||||
return;
|
||||
}
|
||||
|
||||
let normalizedTopics: TopicMap | undefined;
|
||||
|
||||
if (useCustomTopics) {
|
||||
const customTopicsJson = customTopicsText.trim();
|
||||
|
||||
if (!customTopicsJson) {
|
||||
setHasError(true);
|
||||
setReturnMessage(
|
||||
"Custom topics are enabled, so please provide a JSON topic map.",
|
||||
);
|
||||
return;
|
||||
}
|
||||
|
||||
let parsedTopics: unknown;
|
||||
try {
|
||||
parsedTopics = JSON.parse(customTopicsJson);
|
||||
} catch {
|
||||
setHasError(true);
|
||||
setReturnMessage("Custom topic list must be valid JSON.");
|
||||
return;
|
||||
}
|
||||
|
||||
if (
|
||||
!parsedTopics ||
|
||||
Array.isArray(parsedTopics) ||
|
||||
typeof parsedTopics !== "object"
|
||||
) {
|
||||
setHasError(true);
|
||||
setReturnMessage(
|
||||
"Custom topic list must be a JSON object: {\"Topic\": \"keywords\"}.",
|
||||
);
|
||||
return;
|
||||
}
|
||||
|
||||
const entries = Object.entries(parsedTopics);
|
||||
if (entries.length === 0) {
|
||||
setHasError(true);
|
||||
setReturnMessage("Custom topic list cannot be empty.");
|
||||
return;
|
||||
}
|
||||
|
||||
const hasInvalidTopic = entries.some(
|
||||
([topicName, keywords]) =>
|
||||
!topicName.trim() ||
|
||||
typeof keywords !== "string" ||
|
||||
!keywords.trim(),
|
||||
);
|
||||
|
||||
if (hasInvalidTopic) {
|
||||
setHasError(true);
|
||||
setReturnMessage(
|
||||
"Every custom topic must have a non-empty name and keyword string.",
|
||||
);
|
||||
return;
|
||||
}
|
||||
|
||||
normalizedTopics = Object.fromEntries(
|
||||
entries.map(([topicName, keywords]) => [
|
||||
topicName.trim(),
|
||||
String(keywords).trim(),
|
||||
]),
|
||||
);
|
||||
}
|
||||
|
||||
const requestBody: {
|
||||
name: string;
|
||||
sources: Array<{
|
||||
name: string;
|
||||
limit: number;
|
||||
search?: string;
|
||||
category?: string;
|
||||
}>;
|
||||
topics?: TopicMap;
|
||||
} = {
|
||||
name: normalizedDatasetName,
|
||||
sources: normalizedSources,
|
||||
};
|
||||
|
||||
if (normalizedTopics) {
|
||||
requestBody.topics = normalizedTopics;
|
||||
}
|
||||
|
||||
try {
|
||||
setIsSubmitting(true);
|
||||
setHasError(false);
|
||||
setReturnMessage("");
|
||||
|
||||
const response = await axios.post(
|
||||
`${API_BASE_URL}/datasets/fetch`,
|
||||
requestBody,
|
||||
{
|
||||
headers: {
|
||||
Authorization: `Bearer ${token}`,
|
||||
},
|
||||
},
|
||||
);
|
||||
|
||||
const datasetId = Number(response.data.dataset_id);
|
||||
|
||||
setReturnMessage(
|
||||
`Auto fetch queued successfully (dataset #${datasetId}). Redirecting to processing status...`,
|
||||
);
|
||||
|
||||
setTimeout(() => {
|
||||
navigate(`/dataset/${datasetId}/status`);
|
||||
}, 400);
|
||||
} catch (requestError: unknown) {
|
||||
setHasError(true);
|
||||
if (axios.isAxiosError(requestError)) {
|
||||
const message = String(
|
||||
requestError.response?.data?.error ||
|
||||
requestError.message ||
|
||||
"Auto fetch failed.",
|
||||
);
|
||||
setReturnMessage(`Auto fetch failed: ${message}`);
|
||||
} else {
|
||||
setReturnMessage("Auto fetch failed due to an unexpected error.");
|
||||
}
|
||||
} finally {
|
||||
setIsSubmitting(false);
|
||||
}
|
||||
};
|
||||
|
||||
return (
|
||||
<div style={styles.page}>
|
||||
<div style={styles.containerWide}>
|
||||
<div style={{ ...styles.card, ...styles.headerBar }}>
|
||||
<div>
|
||||
<h1 style={styles.sectionHeaderTitle}>Auto Fetch Dataset</h1>
|
||||
<p style={styles.sectionHeaderSubtitle}>
|
||||
Select sources and fetch settings, then queue processing
|
||||
automatically.
|
||||
</p>
|
||||
<p
|
||||
style={{
|
||||
...styles.subtleBodyText,
|
||||
marginTop: 6,
|
||||
color: "#9a6700",
|
||||
}}
|
||||
>
|
||||
Warning: Fetching more than 250 posts from any single site can
|
||||
take hours due to rate limits.
|
||||
</p>
|
||||
</div>
|
||||
<button
|
||||
type="button"
|
||||
style={{
|
||||
...styles.buttonPrimary,
|
||||
opacity: isSubmitting || isLoadingSources ? 0.75 : 1,
|
||||
}}
|
||||
onClick={autoFetch}
|
||||
disabled={isSubmitting || isLoadingSources}
|
||||
>
|
||||
{isSubmitting ? "Queueing..." : "Auto Fetch and Analyze"}
|
||||
</button>
|
||||
</div>
|
||||
|
||||
<div
|
||||
style={{
|
||||
...styles.grid,
|
||||
marginTop: 14,
|
||||
gridTemplateColumns: "repeat(auto-fit, minmax(280px, 1fr))",
|
||||
}}
|
||||
>
|
||||
<div style={{ ...styles.card, gridColumn: "auto" }}>
|
||||
<h2 style={{ ...styles.sectionTitle, color: "#24292f" }}>
|
||||
Dataset Name
|
||||
</h2>
|
||||
<p style={styles.sectionSubtitle}>
|
||||
Use a clear label so you can identify this run later.
|
||||
</p>
|
||||
<input
|
||||
style={{ ...styles.input, ...styles.inputFullWidth }}
|
||||
type="text"
|
||||
placeholder="Example: r/cork subreddit - Jan 2026"
|
||||
value={datasetName}
|
||||
onChange={(event) => setDatasetName(event.target.value)}
|
||||
/>
|
||||
</div>
|
||||
|
||||
<div style={{ ...styles.card, gridColumn: "auto" }}>
|
||||
<h2 style={{ ...styles.sectionTitle, color: "#24292f" }}>
|
||||
Sources
|
||||
</h2>
|
||||
<p style={styles.sectionSubtitle}>
|
||||
Configure source, limit, optional search, and optional category.
|
||||
</p>
|
||||
|
||||
{isLoadingSources && (
|
||||
<p style={styles.subtleBodyText}>Loading sources...</p>
|
||||
)}
|
||||
|
||||
{!isLoadingSources && sourceOptions.length === 0 && (
|
||||
<p style={styles.subtleBodyText}>
|
||||
No source connectors are currently available.
|
||||
</p>
|
||||
)}
|
||||
|
||||
{!isLoadingSources && sourceOptions.length > 0 && (
|
||||
<div
|
||||
style={{ display: "flex", flexDirection: "column", gap: 10 }}
|
||||
>
|
||||
{sourceConfigs.map((source, index) => {
|
||||
const sourceOption = getSourceOption(source.sourceName);
|
||||
const searchEnabled = supportsSearch(sourceOption);
|
||||
const categoriesEnabled = supportsCategories(sourceOption);
|
||||
|
||||
return (
|
||||
<div
|
||||
key={`source-${index}`}
|
||||
style={{
|
||||
border: "1px solid #d0d7de",
|
||||
borderRadius: 8,
|
||||
padding: 12,
|
||||
background: "#f6f8fa",
|
||||
display: "grid",
|
||||
gap: 8,
|
||||
}}
|
||||
>
|
||||
<select
|
||||
value={source.sourceName}
|
||||
style={{ ...styles.input, ...styles.inputFullWidth }}
|
||||
onChange={(event) =>
|
||||
updateSourceConfig(
|
||||
index,
|
||||
"sourceName",
|
||||
event.target.value,
|
||||
)
|
||||
}
|
||||
>
|
||||
{sourceOptions.map((option) => (
|
||||
<option key={option.id} value={option.id}>
|
||||
{option.label}
|
||||
</option>
|
||||
))}
|
||||
</select>
|
||||
|
||||
<input
|
||||
type="number"
|
||||
min={1}
|
||||
value={source.limit}
|
||||
placeholder="Limit"
|
||||
style={{ ...styles.input, ...styles.inputFullWidth }}
|
||||
onChange={(event) =>
|
||||
updateSourceConfig(index, "limit", event.target.value)
|
||||
}
|
||||
/>
|
||||
|
||||
<input
|
||||
type="text"
|
||||
value={source.search}
|
||||
placeholder={
|
||||
searchEnabled
|
||||
? "Search term (optional)"
|
||||
: "Search not supported for this source"
|
||||
}
|
||||
style={{ ...styles.input, ...styles.inputFullWidth }}
|
||||
disabled={!searchEnabled}
|
||||
onChange={(event) =>
|
||||
updateSourceConfig(
|
||||
index,
|
||||
"search",
|
||||
event.target.value,
|
||||
)
|
||||
}
|
||||
/>
|
||||
|
||||
<input
|
||||
type="text"
|
||||
value={source.category}
|
||||
placeholder={
|
||||
categoriesEnabled
|
||||
? "Category (optional)"
|
||||
: "Categories not supported for this source"
|
||||
}
|
||||
style={{ ...styles.input, ...styles.inputFullWidth }}
|
||||
disabled={!categoriesEnabled}
|
||||
onChange={(event) =>
|
||||
updateSourceConfig(
|
||||
index,
|
||||
"category",
|
||||
event.target.value,
|
||||
)
|
||||
}
|
||||
/>
|
||||
|
||||
{sourceConfigs.length > 1 && (
|
||||
<button
|
||||
type="button"
|
||||
style={styles.buttonSecondary}
|
||||
onClick={() => removeSourceConfig(index)}
|
||||
>
|
||||
Remove source
|
||||
</button>
|
||||
)}
|
||||
</div>
|
||||
);
|
||||
})}
|
||||
|
||||
<button
|
||||
type="button"
|
||||
style={styles.buttonSecondary}
|
||||
onClick={addSourceConfig}
|
||||
>
|
||||
Add another source
|
||||
</button>
|
||||
</div>
|
||||
)}
|
||||
</div>
|
||||
|
||||
<div style={{ ...styles.card, gridColumn: "auto" }}>
|
||||
<h2 style={{ ...styles.sectionTitle, color: "#24292f" }}>
|
||||
Topic List
|
||||
</h2>
|
||||
<p style={styles.sectionSubtitle}>
|
||||
Use the default topic list, or provide your own JSON topic map.
|
||||
</p>
|
||||
|
||||
<label
|
||||
style={{
|
||||
display: "flex",
|
||||
alignItems: "center",
|
||||
gap: 8,
|
||||
fontSize: 14,
|
||||
color: "#24292f",
|
||||
marginBottom: 10,
|
||||
}}
|
||||
>
|
||||
<input
|
||||
type="checkbox"
|
||||
checked={useCustomTopics}
|
||||
onChange={(event) => setUseCustomTopics(event.target.checked)}
|
||||
/>
|
||||
Use custom topic list
|
||||
</label>
|
||||
|
||||
<textarea
|
||||
value={customTopicsText}
|
||||
onChange={(event) => setCustomTopicsText(event.target.value)}
|
||||
disabled={!useCustomTopics}
|
||||
placeholder='{"Politics": "election, policy, government", "Housing": "rent, landlords, tenancy"}'
|
||||
style={{
|
||||
...styles.input,
|
||||
...styles.inputFullWidth,
|
||||
minHeight: 170,
|
||||
resize: "vertical",
|
||||
fontFamily:
|
||||
'"IBM Plex Mono", "Fira Code", "JetBrains Mono", monospace',
|
||||
}}
|
||||
/>
|
||||
<p style={styles.subtleBodyText}>
|
||||
Format: JSON object where each key is a topic and each value is a
|
||||
keyword string.
|
||||
</p>
|
||||
</div>
|
||||
</div>
|
||||
|
||||
<div
|
||||
style={{
|
||||
...styles.card,
|
||||
marginTop: 14,
|
||||
...(hasError ? styles.alertCardError : styles.alertCardInfo),
|
||||
}}
|
||||
>
|
||||
{returnMessage ||
|
||||
"After queueing, your dataset is fetched and processed in the background automatically."}
|
||||
</div>
|
||||
</div>
|
||||
</div>
|
||||
);
|
||||
};
|
||||
|
||||
export default AutoFetchPage;
|
||||
217
frontend/src/pages/DatasetEdit.tsx
Normal file
@@ -0,0 +1,217 @@
|
||||
import StatsStyling from "../styles/stats_styling";
|
||||
import { useNavigate, useParams } from "react-router-dom";
|
||||
import { useEffect, useMemo, useState, type FormEvent } from "react";
|
||||
import axios from "axios";
|
||||
import ConfirmationModal from "../components/ConfirmationModal";
|
||||
|
||||
const API_BASE_URL = import.meta.env.VITE_BACKEND_URL;
|
||||
const styles = StatsStyling;
|
||||
|
||||
type DatasetInfoResponse = {
|
||||
id: number;
|
||||
name: string;
|
||||
created_at: string;
|
||||
};
|
||||
|
||||
const DatasetEditPage = () => {
|
||||
const navigate = useNavigate();
|
||||
const { datasetId } = useParams<{ datasetId: string }>();
|
||||
const parsedDatasetId = useMemo(() => Number(datasetId), [datasetId]);
|
||||
const [statusMessage, setStatusMessage] = useState("");
|
||||
const [loading, setLoading] = useState(true);
|
||||
const [isSaving, setIsSaving] = useState(false);
|
||||
const [isDeleting, setIsDeleting] = useState(false);
|
||||
const [isDeleteModalOpen, setIsDeleteModalOpen] = useState(false);
|
||||
|
||||
const [datasetName, setDatasetName] = useState("");
|
||||
useEffect(() => {
|
||||
if (!Number.isInteger(parsedDatasetId) || parsedDatasetId <= 0) {
|
||||
setStatusMessage("Invalid dataset id.");
|
||||
setLoading(false);
|
||||
return;
|
||||
}
|
||||
|
||||
const token = localStorage.getItem("access_token");
|
||||
if (!token) {
|
||||
setStatusMessage("You must be signed in to edit datasets.");
|
||||
setLoading(false);
|
||||
return;
|
||||
}
|
||||
|
||||
axios
|
||||
.get<DatasetInfoResponse>(`${API_BASE_URL}/dataset/${parsedDatasetId}`, {
|
||||
headers: { Authorization: `Bearer ${token}` },
|
||||
})
|
||||
.then((response) => {
|
||||
setDatasetName(response.data.name || "");
|
||||
})
|
||||
.catch((error: unknown) => {
|
||||
if (axios.isAxiosError(error)) {
|
||||
setStatusMessage(
|
||||
String(error.response?.data?.error || error.message),
|
||||
);
|
||||
} else {
|
||||
setStatusMessage("Could not get dataset info.");
|
||||
}
|
||||
})
|
||||
.finally(() => {
|
||||
setLoading(false);
|
||||
});
|
||||
}, [parsedDatasetId]);
|
||||
|
||||
const saveDatasetName = async (event: FormEvent<HTMLFormElement>) => {
|
||||
event.preventDefault();
|
||||
|
||||
const trimmedName = datasetName.trim();
|
||||
if (!trimmedName) {
|
||||
setStatusMessage("Please enter a valid dataset name.");
|
||||
return;
|
||||
}
|
||||
|
||||
const token = localStorage.getItem("access_token");
|
||||
if (!token) {
|
||||
setStatusMessage("You must be signed in to save changes.");
|
||||
return;
|
||||
}
|
||||
|
||||
try {
|
||||
setIsSaving(true);
|
||||
setStatusMessage("");
|
||||
|
||||
await axios.patch(
|
||||
`${API_BASE_URL}/dataset/${parsedDatasetId}`,
|
||||
{ name: trimmedName },
|
||||
{ headers: { Authorization: `Bearer ${token}` } },
|
||||
);
|
||||
|
||||
navigate("/datasets", { replace: true });
|
||||
} catch (error: unknown) {
|
||||
if (axios.isAxiosError(error)) {
|
||||
setStatusMessage(
|
||||
String(
|
||||
error.response?.data?.error || error.message || "Save failed.",
|
||||
),
|
||||
);
|
||||
} else {
|
||||
setStatusMessage("Save failed due to an unexpected error.");
|
||||
}
|
||||
} finally {
|
||||
setIsSaving(false);
|
||||
}
|
||||
};
|
||||
|
||||
const deleteDataset = async () => {
|
||||
const deleteToken = localStorage.getItem("access_token");
|
||||
if (!deleteToken) {
|
||||
setStatusMessage("You must be signed in to delete datasets.");
|
||||
setIsDeleteModalOpen(false);
|
||||
return;
|
||||
}
|
||||
|
||||
try {
|
||||
setIsDeleting(true);
|
||||
setStatusMessage("");
|
||||
|
||||
await axios.delete(`${API_BASE_URL}/dataset/${parsedDatasetId}`, {
|
||||
headers: { Authorization: `Bearer ${deleteToken}` },
|
||||
});
|
||||
|
||||
setIsDeleteModalOpen(false);
|
||||
navigate("/datasets", { replace: true });
|
||||
} catch (error: unknown) {
|
||||
if (axios.isAxiosError(error)) {
|
||||
setStatusMessage(
|
||||
String(
|
||||
error.response?.data?.error || error.message || "Delete failed.",
|
||||
),
|
||||
);
|
||||
} else {
|
||||
setStatusMessage("Delete failed due to an unexpected error.");
|
||||
}
|
||||
} finally {
|
||||
setIsDeleting(false);
|
||||
}
|
||||
};
|
||||
|
||||
return (
|
||||
<div style={styles.page}>
|
||||
<div style={styles.containerNarrow}>
|
||||
<div style={{ ...styles.card, ...styles.headerBar }}>
|
||||
<div>
|
||||
<h1 style={styles.sectionHeaderTitle}>Edit Dataset</h1>
|
||||
<p style={styles.sectionHeaderSubtitle}>
|
||||
Update the dataset name shown in your datasets list.
|
||||
</p>
|
||||
</div>
|
||||
</div>
|
||||
|
||||
<form
|
||||
onSubmit={saveDatasetName}
|
||||
style={{ ...styles.card, marginTop: 14, display: "grid", gap: 12 }}
|
||||
>
|
||||
<label
|
||||
htmlFor="dataset-name"
|
||||
style={{ fontSize: 13, color: "#374151", fontWeight: 600 }}
|
||||
>
|
||||
Dataset name
|
||||
</label>
|
||||
|
||||
<input
|
||||
id="dataset-name"
|
||||
style={{ ...styles.input, ...styles.inputFullWidth }}
|
||||
type="text"
|
||||
placeholder="Example: Cork Discussions - Jan 2026"
|
||||
value={datasetName}
|
||||
onChange={(event) => setDatasetName(event.target.value)}
|
||||
disabled={loading || isSaving}
|
||||
/>
|
||||
|
||||
<div style={{ display: "flex", gap: 8, justifyContent: "flex-end" }}>
|
||||
<button
|
||||
type="button"
|
||||
style={styles.buttonDanger}
|
||||
onClick={() => setIsDeleteModalOpen(true)}
|
||||
disabled={isSaving || isDeleting}
|
||||
>
|
||||
Delete Dataset
|
||||
</button>
|
||||
|
||||
<button
|
||||
type="button"
|
||||
style={styles.buttonSecondary}
|
||||
onClick={() => navigate("/datasets")}
|
||||
disabled={isSaving || isDeleting}
|
||||
>
|
||||
Cancel
|
||||
</button>
|
||||
<button
|
||||
type="submit"
|
||||
style={{
|
||||
...styles.buttonPrimary,
|
||||
opacity: loading || isSaving ? 0.75 : 1,
|
||||
}}
|
||||
disabled={loading || isSaving || isDeleting}
|
||||
>
|
||||
{isSaving ? "Saving..." : "Save"}
|
||||
</button>
|
||||
|
||||
{loading ? "Loading dataset details..." : statusMessage}
|
||||
</div>
|
||||
</form>
|
||||
|
||||
<ConfirmationModal
|
||||
open={isDeleteModalOpen}
|
||||
title="Delete Dataset"
|
||||
message={`Are you sure you want to delete "${datasetName || "this dataset"}"? This action cannot be undone.`}
|
||||
confirmLabel="Delete"
|
||||
cancelLabel="Keep Dataset"
|
||||
loading={isDeleting}
|
||||
onCancel={() => setIsDeleteModalOpen(false)}
|
||||
onConfirm={deleteDataset}
|
||||
/>
|
||||
</div>
|
||||
</div>
|
||||
);
|
||||
};
|
||||
|
||||
export default DatasetEditPage;
|
||||
126
frontend/src/pages/DatasetStatus.tsx
Normal file
@@ -0,0 +1,126 @@
|
||||
import { useEffect, useMemo, useState } from "react";
|
||||
import axios from "axios";
|
||||
import { useNavigate, useParams } from "react-router-dom";
|
||||
import StatsStyling from "../styles/stats_styling";
|
||||
|
||||
const API_BASE_URL = import.meta.env.VITE_BACKEND_URL;
|
||||
|
||||
type DatasetStatusResponse = {
|
||||
status?: "fetching" | "processing" | "complete" | "error";
|
||||
status_message?: string | null;
|
||||
completed_at?: string | null;
|
||||
};
|
||||
|
||||
const styles = StatsStyling;
|
||||
|
||||
const DatasetStatusPage = () => {
|
||||
const navigate = useNavigate();
|
||||
const { datasetId } = useParams<{ datasetId: string }>();
|
||||
const [loading, setLoading] = useState(true);
|
||||
const [status, setStatus] =
|
||||
useState<DatasetStatusResponse["status"]>("processing");
|
||||
const [statusMessage, setStatusMessage] = useState("");
|
||||
const parsedDatasetId = useMemo(() => Number(datasetId), [datasetId]);
|
||||
|
||||
useEffect(() => {
|
||||
if (!Number.isInteger(parsedDatasetId) || parsedDatasetId <= 0) {
|
||||
setLoading(false);
|
||||
setStatus("error");
|
||||
setStatusMessage("Invalid dataset id.");
|
||||
return;
|
||||
}
|
||||
|
||||
let pollTimer: number | undefined;
|
||||
|
||||
const pollStatus = async () => {
|
||||
try {
|
||||
const response = await axios.get<DatasetStatusResponse>(
|
||||
`${API_BASE_URL}/dataset/${parsedDatasetId}/status`,
|
||||
);
|
||||
|
||||
const nextStatus = response.data.status ?? "processing";
|
||||
setStatus(nextStatus);
|
||||
setStatusMessage(String(response.data.status_message ?? ""));
|
||||
setLoading(false);
|
||||
|
||||
if (nextStatus === "complete") {
|
||||
window.setTimeout(() => {
|
||||
navigate(`/dataset/${parsedDatasetId}/stats`, { replace: true });
|
||||
}, 800);
|
||||
}
|
||||
} catch (error: unknown) {
|
||||
setLoading(false);
|
||||
setStatus("error");
|
||||
if (axios.isAxiosError(error)) {
|
||||
const message = String(
|
||||
error.response?.data?.error || error.message || "Request failed",
|
||||
);
|
||||
setStatusMessage(message);
|
||||
} else {
|
||||
setStatusMessage("Unable to fetch dataset status.");
|
||||
}
|
||||
}
|
||||
};
|
||||
|
||||
void pollStatus();
|
||||
pollTimer = window.setInterval(() => {
|
||||
if (status !== "complete" && status !== "error") {
|
||||
void pollStatus();
|
||||
}
|
||||
}, 2000);
|
||||
|
||||
return () => {
|
||||
if (pollTimer) {
|
||||
window.clearInterval(pollTimer);
|
||||
}
|
||||
};
|
||||
}, [navigate, parsedDatasetId, status]);
|
||||
|
||||
const isProcessing =
|
||||
loading || status === "fetching" || status === "processing";
|
||||
const isError = status === "error";
|
||||
|
||||
return (
|
||||
<div style={styles.page}>
|
||||
<div style={styles.containerNarrow}>
|
||||
<div style={{ ...styles.card, marginTop: 28 }}>
|
||||
<h1 style={styles.sectionHeaderTitle}>
|
||||
{isProcessing
|
||||
? "Processing dataset..."
|
||||
: isError
|
||||
? "Dataset processing failed"
|
||||
: "Dataset ready"}
|
||||
</h1>
|
||||
|
||||
<p style={{ ...styles.sectionSubtitle, marginTop: 10 }}>
|
||||
{isProcessing &&
|
||||
"Your dataset is being analyzed. This page will redirect to stats automatically once complete."}
|
||||
{isError &&
|
||||
"There was an issue while processing your dataset. Please review the error details."}
|
||||
{status === "complete" &&
|
||||
"Processing complete. Redirecting to your stats now..."}
|
||||
</p>
|
||||
|
||||
<div
|
||||
style={{
|
||||
...styles.card,
|
||||
...styles.statusMessageCard,
|
||||
borderColor: isError
|
||||
? "rgba(185, 28, 28, 0.28)"
|
||||
: "rgba(0,0,0,0.06)",
|
||||
background: isError ? "#fff5f5" : "#ffffff",
|
||||
color: isError ? "#991b1b" : "#374151",
|
||||
}}
|
||||
>
|
||||
{statusMessage ||
|
||||
(isProcessing
|
||||
? "Waiting for updates from the worker queue..."
|
||||
: "No details provided.")}
|
||||
</div>
|
||||
</div>
|
||||
</div>
|
||||
</div>
|
||||
);
|
||||
};
|
||||
|
||||
export default DatasetStatusPage;
|
||||
207
frontend/src/pages/Datasets.tsx
Normal file
@@ -0,0 +1,207 @@
|
||||
import { useEffect, useState } from "react";
|
||||
import axios from "axios";
|
||||
import { useNavigate } from "react-router-dom";
|
||||
import StatsStyling from "../styles/stats_styling";
|
||||
|
||||
// Shared inline-style object reused from the stats pages.
const styles = StatsStyling;
// Backend root URL, injected at build time by Vite.
const API_BASE_URL = import.meta.env.VITE_BACKEND_URL;

// Shape of one dataset row as returned by GET /user/datasets.
// Optional fields hedge against older rows missing columns.
type DatasetItem = {
  id: number;
  name?: string;
  // Known values plus an open string for forward compatibility with
  // statuses the backend may add later.
  status?: "processing" | "complete" | "error" | "fetching" | string;
  status_message?: string | null;
  completed_at?: string | null;
  created_at?: string | null;
};
|
||||
|
||||
// Lists the signed-in user's datasets (newest first) and links each one to
// either its stats page or its processing-status page.
const DatasetsPage = () => {
  const navigate = useNavigate();
  const [datasets, setDatasets] = useState<DatasetItem[]>([]);
  const [loading, setLoading] = useState(true);
  const [error, setError] = useState("");

  // Fetch the dataset list once on mount; requires a stored JWT.
  useEffect(() => {
    const token = localStorage.getItem("access_token");
    if (!token) {
      setLoading(false);
      setError("You must be signed in to view datasets.");
      return;
    }

    axios
      .get<DatasetItem[]>(`${API_BASE_URL}/user/datasets`, {
        headers: { Authorization: `Bearer ${token}` },
      })
      .then((response) => {
        // Sort a copy descending by id so the newest upload appears first.
        const sorted = [...(response.data || [])].sort((a, b) => b.id - a.id);
        setDatasets(sorted);
      })
      .catch((requestError: unknown) => {
        if (axios.isAxiosError(requestError)) {
          setError(
            String(requestError.response?.data?.error || requestError.message),
          );
        } else {
          setError("Failed to load datasets.");
        }
      })
      .finally(() => {
        setLoading(false);
      });
  }, []);

  // Skeleton placeholder while the list request is in flight.
  if (loading) {
    return (
      <div style={styles.loadingPage}>
        <div style={{ ...styles.loadingCard, transform: "translateY(-100px)" }}>
          <div style={styles.loadingHeader}>
            <div style={styles.loadingSpinner} />
            <div>
              <h2 style={styles.loadingTitle}>Loading datasets</h2>
            </div>
          </div>

          <div style={styles.loadingSkeleton}>
            <div
              style={{
                ...styles.loadingSkeletonLine,
                ...styles.loadingSkeletonLineLong,
              }}
            />
            <div
              style={{
                ...styles.loadingSkeletonLine,
                ...styles.loadingSkeletonLineMed,
              }}
            />
            <div
              style={{
                ...styles.loadingSkeletonLine,
                ...styles.loadingSkeletonLineShort,
              }}
            />
          </div>
        </div>
      </div>
    );
  }

  return (
    <div style={styles.page}>
      <div style={styles.containerWide}>
        <div style={{ ...styles.card, ...styles.headerBar }}>
          <div>
            <h1 style={styles.sectionHeaderTitle}>My Datasets</h1>
            <p style={styles.sectionHeaderSubtitle}>
              View and reopen datasets you previously uploaded.
            </p>
          </div>
          <div style={styles.controlsWrapped}>
            <button
              type="button"
              style={styles.buttonPrimary}
              onClick={() => navigate("/upload")}
            >
              Upload New Dataset
            </button>
            <button
              type="button"
              style={styles.buttonSecondary}
              onClick={() => navigate("/auto-fetch")}
            >
              Auto Fetch Dataset
            </button>
          </div>
        </div>

        {error && (
          <div
            style={{
              ...styles.card,
              marginTop: 14,
              borderColor: "rgba(185, 28, 28, 0.28)",
              background: "#fff5f5",
              color: "#991b1b",
              fontSize: 14,
            }}
          >
            {error}
          </div>
        )}

        {!error && datasets.length === 0 && (
          <div style={{ ...styles.card, marginTop: 14, color: "#374151" }}>
            No datasets yet. Upload one to get started.
          </div>
        )}

        {!error && datasets.length > 0 && (
          <div
            style={{
              ...styles.card,
              marginTop: 14,
              padding: 0,
              overflow: "hidden",
            }}
          >
            <ul style={styles.listNoBullets}>
              {datasets.map((dataset) => {
                // NOTE: "complete" here means the processing attempt finished
                // — an errored dataset also counts, so its stats can be opened.
                const isComplete =
                  dataset.status === "complete" || dataset.status === "error";
                const editPath = `/dataset/${dataset.id}/edit`;
                // Finished datasets open their stats; in-flight ones open the
                // live status page instead.
                const targetPath = isComplete
                  ? `/dataset/${dataset.id}/stats`
                  : `/dataset/${dataset.id}/status`;

                return (
                  <li key={dataset.id} style={styles.datasetListItem}>
                    <div style={{ minWidth: 0 }}>
                      <div style={styles.datasetName}>
                        {dataset.name || `Dataset #${dataset.id}`}
                      </div>
                      <div style={styles.datasetMeta}>
                        ID #{dataset.id} • Status: {dataset.status || "unknown"}
                      </div>
                      {dataset.status_message && (
                        <div style={styles.datasetMetaSecondary}>
                          {dataset.status_message}
                        </div>
                      )}
                    </div>

                    <div>
                      {isComplete && (
                        <button
                          type="button"
                          style={{ ...styles.buttonSecondary, margin: "5px" }}
                          onClick={() => navigate(editPath)}
                        >
                          Edit Dataset
                        </button>
                      )}

                      <button
                        type="button"
                        style={
                          isComplete
                            ? styles.buttonPrimary
                            : styles.buttonSecondary
                        }
                        onClick={() => navigate(targetPath)}
                      >
                        {isComplete ? "Open stats" : "View status"}
                      </button>
                    </div>
                  </li>
                );
              })}
            </ul>
          </div>
        )}
      </div>
    </div>
  );
};

export default DatasetsPage;
|
||||
168
frontend/src/pages/Login.tsx
Normal file
@@ -0,0 +1,168 @@
|
||||
import { useEffect, useState } from "react";
|
||||
import axios from "axios";
|
||||
import { useNavigate } from "react-router-dom";
|
||||
import StatsStyling from "../styles/stats_styling";
|
||||
|
||||
// Backend root URL, injected at build time by Vite.
const API_BASE_URL = import.meta.env.VITE_BACKEND_URL;

// Shared inline-style object reused from the stats pages.
const styles = StatsStyling;
|
||||
|
||||
// Combined sign-in / registration page. On mount an existing token is
// validated against /profile and, if still valid, the user is sent straight
// to /upload.
const LoginPage = () => {
  const navigate = useNavigate();

  const [isRegisterMode, setIsRegisterMode] = useState(false);
  const [username, setUsername] = useState("");
  const [email, setEmail] = useState("");
  const [password, setPassword] = useState("");
  const [loading, setLoading] = useState(false);
  const [error, setError] = useState("");
  const [info, setInfo] = useState("");

  // Auto-login with a previously stored token; drop the token (and the
  // axios default header) if the profile check fails.
  useEffect(() => {
    const token = localStorage.getItem("access_token");
    if (!token) {
      return;
    }

    axios.defaults.headers.common.Authorization = `Bearer ${token}`;
    axios
      .get(`${API_BASE_URL}/profile`)
      .then(() => {
        navigate("/upload", { replace: true });
      })
      .catch(() => {
        localStorage.removeItem("access_token");
        delete axios.defaults.headers.common.Authorization;
      });
  }, [navigate]);

  // Handles both modes: register (then flip back to sign-in) or login
  // (persist the JWT, set the axios default header, go to /upload).
  const handleSubmit = async (event: React.FormEvent<HTMLFormElement>) => {
    event.preventDefault();
    setError("");
    setInfo("");
    setLoading(true);

    try {
      if (isRegisterMode) {
        await axios.post(`${API_BASE_URL}/register`, {
          username,
          email,
          password,
        });
        setInfo("Account created. You can now sign in.");
        setIsRegisterMode(false);
      } else {
        const response = await axios.post<{ access_token: string }>(
          `${API_BASE_URL}/login`,
          { username, password },
        );

        const token = response.data.access_token;
        localStorage.setItem("access_token", token);
        axios.defaults.headers.common.Authorization = `Bearer ${token}`;
        navigate("/upload");
      }
    } catch (requestError: unknown) {
      if (axios.isAxiosError(requestError)) {
        setError(
          String(
            requestError.response?.data?.error ||
              requestError.message ||
              "Request failed",
          ),
        );
      } else {
        setError("Unexpected error occurred.");
      }
    } finally {
      setLoading(false);
    }
  };

  return (
    <div style={styles.containerAuth}>
      <div style={{ ...styles.card, ...styles.authCard }}>
        <div style={styles.headingBlock}>
          <h1 style={styles.headingXl}>
            {isRegisterMode ? "Create your account" : "Welcome back"}
          </h1>
          <p style={styles.mutedText}>
            {isRegisterMode
              ? "Register to start uploading and exploring your dataset insights."
              : "Sign in to continue to your analytics workspace."}
          </p>
        </div>

        <form onSubmit={handleSubmit} style={styles.authForm}>
          <input
            type="text"
            placeholder="Username"
            style={{ ...styles.input, ...styles.authControl }}
            value={username}
            onChange={(event) => setUsername(event.target.value)}
            required
          />

          {/* Email is only collected when registering. */}
          {isRegisterMode && (
            <input
              type="email"
              placeholder="Email"
              style={{ ...styles.input, ...styles.authControl }}
              value={email}
              onChange={(event) => setEmail(event.target.value)}
              required
            />
          )}

          <input
            type="password"
            placeholder="Password"
            style={{ ...styles.input, ...styles.authControl }}
            value={password}
            onChange={(event) => setPassword(event.target.value)}
            required
          />

          <button
            type="submit"
            style={{
              ...styles.buttonPrimary,
              ...styles.authControl,
              marginTop: 2,
            }}
            disabled={loading}
          >
            {loading
              ? "Please wait..."
              : isRegisterMode
                ? "Create account"
                : "Sign in"}
          </button>
        </form>

        {error && <p style={styles.authErrorText}>{error}</p>}

        {info && <p style={styles.authInfoText}>{info}</p>}

        <div style={styles.authSwitchRow}>
          <span style={styles.authSwitchLabel}>
            {isRegisterMode ? "Already have an account?" : "New here?"}
          </span>
          <button
            type="button"
            style={styles.authSwitchButton}
            onClick={() => {
              // Clear transient messages when toggling modes.
              setError("");
              setInfo("");
              setIsRegisterMode((value) => !value);
            }}
          >
            {isRegisterMode ? "Switch to sign in" : "Create account"}
          </button>
        </div>
      </div>
    </div>
  );
};

export default LoginPage;
|
||||
@@ -1,173 +1,772 @@
|
||||
import { useEffect, useState, useRef } from "react";
|
||||
import { useEffect, useRef, useState } from "react";
|
||||
import axios from "axios";
|
||||
import { useParams } from "react-router-dom";
|
||||
import StatsStyling from "../styles/stats_styling";
|
||||
import SummaryStats from "../components/SummaryStats";
|
||||
import EmotionalStats from "../components/EmotionalStats";
|
||||
import InteractionStats from "../components/UserStats";
|
||||
import UserStats from "../components/UserStats";
|
||||
import LinguisticStats from "../components/LinguisticStats";
|
||||
import InteractionalStats from "../components/InteractionalStats";
|
||||
import CulturalStats from "../components/CulturalStats";
|
||||
import CorpusExplorer from "../components/CorpusExplorer";
|
||||
|
||||
import {
|
||||
type SummaryResponse,
|
||||
type UserAnalysisResponse,
|
||||
import {
|
||||
type SummaryResponse,
|
||||
type TimeAnalysisResponse,
|
||||
type ContentAnalysisResponse
|
||||
} from '../types/ApiTypes'
|
||||
type User,
|
||||
type UserEndpointResponse,
|
||||
type LinguisticAnalysisResponse,
|
||||
type EmotionalAnalysisResponse,
|
||||
type InteractionAnalysisResponse,
|
||||
type CulturalAnalysisResponse,
|
||||
} from "../types/ApiTypes";
|
||||
import {
|
||||
buildExplorerContext,
|
||||
type CorpusExplorerSpec,
|
||||
type DatasetRecord,
|
||||
} from "../utils/corpusExplorer";
|
||||
|
||||
// Backend root URL, injected at build time by Vite.
const API_BASE_URL = import.meta.env.VITE_BACKEND_URL;
// Shared inline-style object used across the stats pages.
const styles = StatsStyling;
|
||||
const DELETED_USERS = ["[deleted]", "automoderator"];
|
||||
|
||||
const isDeletedUser = (value: string | null | undefined) =>
|
||||
DELETED_USERS.includes((value ?? "").trim().toLowerCase());
|
||||
|
||||
// Tabs available in the stats page navigation.
type ActiveView =
  | "summary"
  | "emotional"
  | "user"
  | "linguistic"
  | "interactional"
  | "cultural";

// Aggregates derived client-side from the user endpoint response.
type UserStatsMeta = {
  totalUsers: number;
  // Highest comment_share among non-deleted users; null when no users.
  mostCommentHeavyUser: { author: string; commentShare: number } | null;
};

// UI state for the corpus-explorer panel.
type ExplorerState = {
  open: boolean;
  title: string;
  description: string;
  emptyMessage: string;
  records: DatasetRecord[];
  loading: boolean;
  error: string;
};

// Closed/default explorer state used on reset and initial render.
const EMPTY_EXPLORER_STATE: ExplorerState = {
  open: false,
  title: "Corpus Explorer",
  description: "",
  emptyMessage: "No records found.",
  records: [],
  loading: false,
  error: "",
};
|
||||
|
||||
const createExplorerState = (
|
||||
spec: CorpusExplorerSpec,
|
||||
patch: Partial<ExplorerState> = {},
|
||||
): ExplorerState => ({
|
||||
open: true,
|
||||
title: spec.title,
|
||||
description: spec.description,
|
||||
emptyMessage: spec.emptyMessage ?? "No matching records found.",
|
||||
records: [],
|
||||
loading: false,
|
||||
error: "",
|
||||
...patch,
|
||||
});
|
||||
|
||||
const compareRecordsByNewest = (a: DatasetRecord, b: DatasetRecord) => {
|
||||
const aValue = String(a.dt ?? a.date ?? a.timestamp ?? "");
|
||||
const bValue = String(b.dt ?? b.date ?? b.timestamp ?? "");
|
||||
return bValue.localeCompare(aValue);
|
||||
};
|
||||
|
||||
const parseJsonLikePayload = (value: string): unknown => {
|
||||
const normalized = value
|
||||
.replace(/\uFEFF/g, "")
|
||||
.replace(/,\s*([}\]])/g, "$1")
|
||||
.replace(/(:\s*)(NaN|Infinity|-Infinity)\b/g, "$1null")
|
||||
.replace(/(\[\s*)(NaN|Infinity|-Infinity)\b/g, "$1null")
|
||||
.replace(/(,\s*)(NaN|Infinity|-Infinity)\b/g, "$1null")
|
||||
.replace(/(:\s*)None\b/g, "$1null")
|
||||
.replace(/(:\s*)True\b/g, "$1true")
|
||||
.replace(/(:\s*)False\b/g, "$1false")
|
||||
.replace(/(\[\s*)None\b/g, "$1null")
|
||||
.replace(/(\[\s*)True\b/g, "$1true")
|
||||
.replace(/(\[\s*)False\b/g, "$1false")
|
||||
.replace(/(,\s*)None\b/g, "$1null")
|
||||
.replace(/(,\s*)True\b/g, "$1true")
|
||||
.replace(/(,\s*)False\b/g, "$1false");
|
||||
|
||||
return JSON.parse(normalized);
|
||||
};
|
||||
|
||||
const tryParseRecords = (value: string) => {
|
||||
try {
|
||||
return normalizeRecordPayload(parseJsonLikePayload(value));
|
||||
} catch {
|
||||
return null;
|
||||
}
|
||||
};
|
||||
|
||||
// Best-effort extraction of an array of records from a raw string payload.
// Tries, in order: the whole string as lenient JSON, NDJSON (one object per
// line), the outermost [...] slice, then the outermost {...} slice.
// Returns [] for an empty payload and null when nothing was parseable.
const parseRecordStringPayload = (payload: string): DatasetRecord[] | null => {
  const trimmed = payload.trim();
  if (!trimmed) {
    return [];
  }

  // 1) Whole payload parses as lenient JSON.
  const direct = tryParseRecords(trimmed);
  if (direct) {
    return direct;
  }

  // 2) NDJSON: every non-empty line must parse on its own.
  const ndjsonLines = trimmed
    .split(/\r?\n/)
    .map((line) => line.trim())
    .filter(Boolean);
  if (ndjsonLines.length > 0) {
    try {
      return ndjsonLines.map((line) => parseJsonLikePayload(line)) as DatasetRecord[];
    } catch {
      // Deliberate fall-through to the slice-based attempts below.
    }
  }

  // 3) Outermost JSON array embedded in surrounding noise.
  const bracketStart = trimmed.indexOf("[");
  const bracketEnd = trimmed.lastIndexOf("]");
  if (bracketStart !== -1 && bracketEnd > bracketStart) {
    const parsed = tryParseRecords(trimmed.slice(bracketStart, bracketEnd + 1));
    if (parsed) {
      return parsed;
    }
  }

  // 4) Outermost JSON object embedded in surrounding noise.
  const braceStart = trimmed.indexOf("{");
  const braceEnd = trimmed.lastIndexOf("}");
  if (braceStart !== -1 && braceEnd > braceStart) {
    const parsed = tryParseRecords(trimmed.slice(braceStart, braceEnd + 1));
    if (parsed) {
      return parsed;
    }
  }

  return null;
};
|
||||
|
||||
const normalizeRecordPayload = (payload: unknown): DatasetRecord[] => {
|
||||
if (typeof payload === "string") {
|
||||
const parsed = parseRecordStringPayload(payload);
|
||||
if (parsed) {
|
||||
return parsed;
|
||||
}
|
||||
|
||||
const preview = payload.trim().slice(0, 120).replace(/\s+/g, " ");
|
||||
throw new Error(
|
||||
`Corpus endpoint returned a non-JSON string payload.${
|
||||
preview ? ` Response preview: ${preview}` : ""
|
||||
}`,
|
||||
);
|
||||
}
|
||||
|
||||
if (
|
||||
payload &&
|
||||
typeof payload === "object" &&
|
||||
"error" in payload &&
|
||||
typeof (payload as { error?: unknown }).error === "string"
|
||||
) {
|
||||
throw new Error((payload as { error: string }).error);
|
||||
}
|
||||
|
||||
if (Array.isArray(payload)) {
|
||||
return payload as DatasetRecord[];
|
||||
}
|
||||
|
||||
if (
|
||||
payload &&
|
||||
typeof payload === "object" &&
|
||||
"data" in payload &&
|
||||
Array.isArray((payload as { data?: unknown }).data)
|
||||
) {
|
||||
return (payload as { data: DatasetRecord[] }).data;
|
||||
}
|
||||
|
||||
if (
|
||||
payload &&
|
||||
typeof payload === "object" &&
|
||||
"records" in payload &&
|
||||
Array.isArray((payload as { records?: unknown }).records)
|
||||
) {
|
||||
return (payload as { records: DatasetRecord[] }).records;
|
||||
}
|
||||
|
||||
if (
|
||||
payload &&
|
||||
typeof payload === "object" &&
|
||||
"rows" in payload &&
|
||||
Array.isArray((payload as { rows?: unknown }).rows)
|
||||
) {
|
||||
return (payload as { rows: DatasetRecord[] }).rows;
|
||||
}
|
||||
|
||||
if (
|
||||
payload &&
|
||||
typeof payload === "object" &&
|
||||
"result" in payload &&
|
||||
Array.isArray((payload as { result?: unknown }).result)
|
||||
) {
|
||||
return (payload as { result: DatasetRecord[] }).result;
|
||||
}
|
||||
|
||||
if (payload && typeof payload === "object") {
|
||||
const values = Object.values(payload);
|
||||
if (values.length === 1 && Array.isArray(values[0])) {
|
||||
return values[0] as DatasetRecord[];
|
||||
}
|
||||
if (values.every((value) => value && typeof value === "object")) {
|
||||
return values as DatasetRecord[];
|
||||
}
|
||||
}
|
||||
|
||||
throw new Error("Corpus endpoint returned an unexpected payload.");
|
||||
};
|
||||
|
||||
const StatPage = () => {
|
||||
const [error, setError] = useState('');
|
||||
const { datasetId: routeDatasetId } = useParams<{ datasetId: string }>();
|
||||
const [error, setError] = useState("");
|
||||
const [loading, setLoading] = useState(false);
|
||||
const [activeView, setActiveView] = useState<"summary" | "emotional" | "interaction">("summary");
|
||||
const [activeView, setActiveView] = useState<ActiveView>("summary");
|
||||
|
||||
const [userData, setUserData] = useState<UserAnalysisResponse | null>(null);
|
||||
const [userData, setUserData] = useState<UserEndpointResponse | null>(null);
|
||||
const [timeData, setTimeData] = useState<TimeAnalysisResponse | null>(null);
|
||||
const [contentData, setContentData] = useState<ContentAnalysisResponse | null>(null);
|
||||
const [linguisticData, setLinguisticData] =
|
||||
useState<LinguisticAnalysisResponse | null>(null);
|
||||
const [emotionalData, setEmotionalData] =
|
||||
useState<EmotionalAnalysisResponse | null>(null);
|
||||
const [interactionData, setInteractionData] =
|
||||
useState<InteractionAnalysisResponse | null>(null);
|
||||
const [culturalData, setCulturalData] =
|
||||
useState<CulturalAnalysisResponse | null>(null);
|
||||
const [summary, setSummary] = useState<SummaryResponse | null>(null);
|
||||
|
||||
const [userStatsMeta, setUserStatsMeta] = useState<UserStatsMeta>({
|
||||
totalUsers: 0,
|
||||
mostCommentHeavyUser: null,
|
||||
});
|
||||
const [appliedFilters, setAppliedFilters] = useState<Record<string, string>>({});
|
||||
const [allRecords, setAllRecords] = useState<DatasetRecord[] | null>(null);
|
||||
const [allRecordsKey, setAllRecordsKey] = useState("");
|
||||
const [explorerState, setExplorerState] = useState<ExplorerState>(
|
||||
EMPTY_EXPLORER_STATE,
|
||||
);
|
||||
|
||||
const searchInputRef = useRef<HTMLInputElement>(null);
|
||||
const beforeDateRef = useRef<HTMLInputElement>(null);
|
||||
const afterDateRef = useRef<HTMLInputElement>(null);
|
||||
|
||||
const getStats = () => {
|
||||
const parsedDatasetId = Number(routeDatasetId ?? "");
|
||||
const datasetId =
|
||||
Number.isInteger(parsedDatasetId) && parsedDatasetId > 0
|
||||
? parsedDatasetId
|
||||
: null;
|
||||
|
||||
const getFilterParams = () => {
|
||||
const params: Record<string, string> = {};
|
||||
const query = (searchInputRef.current?.value ?? "").trim();
|
||||
const start = (afterDateRef.current?.value ?? "").trim();
|
||||
const end = (beforeDateRef.current?.value ?? "").trim();
|
||||
|
||||
if (query) {
|
||||
params.search_query = query;
|
||||
}
|
||||
|
||||
if (start) {
|
||||
params.start_date = start;
|
||||
}
|
||||
|
||||
if (end) {
|
||||
params.end_date = end;
|
||||
}
|
||||
|
||||
return params;
|
||||
};
|
||||
|
||||
const getAuthHeaders = () => {
|
||||
const token = localStorage.getItem("access_token");
|
||||
if (!token) {
|
||||
return null;
|
||||
}
|
||||
|
||||
return {
|
||||
Authorization: `Bearer ${token}`,
|
||||
};
|
||||
};
|
||||
|
||||
const getFilterKey = (params: Record<string, string>) =>
|
||||
JSON.stringify(Object.entries(params).sort(([a], [b]) => a.localeCompare(b)));
|
||||
|
||||
const ensureFilteredRecords = async () => {
|
||||
if (!datasetId) {
|
||||
throw new Error("Missing dataset id.");
|
||||
}
|
||||
|
||||
const authHeaders = getAuthHeaders();
|
||||
if (!authHeaders) {
|
||||
throw new Error("You must be signed in to load corpus records.");
|
||||
}
|
||||
|
||||
const filterKey = getFilterKey(appliedFilters);
|
||||
if (allRecords && allRecordsKey === filterKey) {
|
||||
return allRecords;
|
||||
}
|
||||
|
||||
const response = await axios.get<unknown>(
|
||||
`${API_BASE_URL}/dataset/${datasetId}/all`,
|
||||
{
|
||||
params: appliedFilters,
|
||||
headers: authHeaders,
|
||||
},
|
||||
);
|
||||
|
||||
const normalizedRecords = normalizeRecordPayload(response.data);
|
||||
|
||||
setAllRecords(normalizedRecords);
|
||||
setAllRecordsKey(filterKey);
|
||||
return normalizedRecords;
|
||||
};
|
||||
|
||||
const openExplorer = async (spec: CorpusExplorerSpec) => {
|
||||
setExplorerState(createExplorerState(spec, { loading: true }));
|
||||
|
||||
try {
|
||||
const records = await ensureFilteredRecords();
|
||||
const context = buildExplorerContext(records);
|
||||
const matched = records
|
||||
.filter((record) => spec.matcher(record, context))
|
||||
.sort(compareRecordsByNewest);
|
||||
|
||||
setExplorerState(createExplorerState(spec, { records: matched }));
|
||||
} catch (e) {
|
||||
setExplorerState(
|
||||
createExplorerState(spec, {
|
||||
error: `Failed to load corpus records: ${String(e)}`,
|
||||
}),
|
||||
);
|
||||
}
|
||||
};
|
||||
|
||||
const getStats = (params: Record<string, string> = {}) => {
|
||||
if (!datasetId) {
|
||||
setError("Missing dataset id. Open /dataset/<id>/stats.");
|
||||
return;
|
||||
}
|
||||
|
||||
const authHeaders = getAuthHeaders();
|
||||
if (!authHeaders) {
|
||||
setError("You must be signed in to load stats.");
|
||||
return;
|
||||
}
|
||||
|
||||
setError("");
|
||||
setLoading(true);
|
||||
setAppliedFilters(params);
|
||||
setAllRecords(null);
|
||||
setAllRecordsKey("");
|
||||
setExplorerState((current) => ({ ...current, open: false }));
|
||||
|
||||
Promise.all([
|
||||
axios.get<TimeAnalysisResponse>("http://localhost:5000/stats/time"),
|
||||
axios.get<UserAnalysisResponse>("http://localhost:5000/stats/user"),
|
||||
axios.get<ContentAnalysisResponse>("http://localhost:5000/stats/content"),
|
||||
axios.get<SummaryResponse>(`http://localhost:5000/stats/summary`),
|
||||
])
|
||||
.then(([timeRes, userRes, contentRes, summaryRes]) => {
|
||||
setUserData(userRes.data || null);
|
||||
setTimeData(timeRes.data || null);
|
||||
setContentData(contentRes.data || null);
|
||||
setSummary(summaryRes.data || null);
|
||||
})
|
||||
.catch((e) => setError("Failed to load statistics: " + String(e)))
|
||||
axios.get<TimeAnalysisResponse>(`${API_BASE_URL}/dataset/${datasetId}/temporal`, {
|
||||
params,
|
||||
headers: authHeaders,
|
||||
}),
|
||||
axios.get<UserEndpointResponse>(`${API_BASE_URL}/dataset/${datasetId}/user`, {
|
||||
params,
|
||||
headers: authHeaders,
|
||||
}),
|
||||
axios.get<LinguisticAnalysisResponse>(
|
||||
`${API_BASE_URL}/dataset/${datasetId}/linguistic`,
|
||||
{
|
||||
params,
|
||||
headers: authHeaders,
|
||||
},
|
||||
),
|
||||
axios.get<EmotionalAnalysisResponse>(`${API_BASE_URL}/dataset/${datasetId}/emotional`, {
|
||||
params,
|
||||
headers: authHeaders,
|
||||
}),
|
||||
axios.get<InteractionAnalysisResponse>(
|
||||
`${API_BASE_URL}/dataset/${datasetId}/interactional`,
|
||||
{
|
||||
params,
|
||||
headers: authHeaders,
|
||||
},
|
||||
),
|
||||
axios.get<SummaryResponse>(`${API_BASE_URL}/dataset/${datasetId}/summary`, {
|
||||
params,
|
||||
headers: authHeaders,
|
||||
}),
|
||||
axios.get<CulturalAnalysisResponse>(`${API_BASE_URL}/dataset/${datasetId}/cultural`, {
|
||||
params,
|
||||
headers: authHeaders,
|
||||
}),
|
||||
])
|
||||
.then(
|
||||
([
|
||||
timeRes,
|
||||
userRes,
|
||||
linguisticRes,
|
||||
emotionalRes,
|
||||
interactionRes,
|
||||
summaryRes,
|
||||
culturalRes,
|
||||
]) => {
|
||||
const usersList = userRes.data.users ?? [];
|
||||
const topUsersList = userRes.data.top_users ?? [];
|
||||
const interactionGraphRaw = interactionRes.data?.interaction_graph ?? {};
|
||||
const topPairsRaw = interactionRes.data?.top_interaction_pairs ?? [];
|
||||
|
||||
const filteredUsers: typeof usersList = [];
|
||||
for (const user of usersList) {
|
||||
if (isDeletedUser(user.author)) continue;
|
||||
filteredUsers.push(user);
|
||||
}
|
||||
|
||||
const filteredTopUsers: typeof topUsersList = [];
|
||||
for (const user of topUsersList) {
|
||||
if (isDeletedUser(user.author)) continue;
|
||||
filteredTopUsers.push(user);
|
||||
}
|
||||
|
||||
let mostCommentHeavyUser: UserStatsMeta["mostCommentHeavyUser"] = null;
|
||||
for (const user of filteredUsers) {
|
||||
const currentShare = user.comment_share ?? 0;
|
||||
if (!mostCommentHeavyUser || currentShare > mostCommentHeavyUser.commentShare) {
|
||||
mostCommentHeavyUser = {
|
||||
author: user.author,
|
||||
commentShare: currentShare,
|
||||
};
|
||||
}
|
||||
}
|
||||
|
||||
const topAuthors = new Set(filteredTopUsers.map((entry) => entry.author));
|
||||
const summaryUsers: User[] = [];
|
||||
for (const user of filteredUsers) {
|
||||
if (topAuthors.has(user.author)) {
|
||||
summaryUsers.push(user);
|
||||
}
|
||||
}
|
||||
|
||||
const filteredInteractionGraph: Record<string, Record<string, number>> = {};
|
||||
for (const [source, targets] of Object.entries(interactionGraphRaw)) {
|
||||
if (isDeletedUser(source)) {
|
||||
continue;
|
||||
}
|
||||
|
||||
const nextTargets: Record<string, number> = {};
|
||||
for (const [target, count] of Object.entries(targets)) {
|
||||
if (isDeletedUser(target)) {
|
||||
continue;
|
||||
}
|
||||
nextTargets[target] = count;
|
||||
}
|
||||
|
||||
filteredInteractionGraph[source] = nextTargets;
|
||||
}
|
||||
|
||||
const filteredTopInteractionPairs: typeof topPairsRaw = [];
|
||||
for (const pairEntry of topPairsRaw) {
|
||||
const pair = pairEntry[0];
|
||||
const source = pair[0];
|
||||
const target = pair[1];
|
||||
if (isDeletedUser(source) || isDeletedUser(target)) {
|
||||
continue;
|
||||
}
|
||||
filteredTopInteractionPairs.push(pairEntry);
|
||||
}
|
||||
|
||||
const filteredUserData: UserEndpointResponse = {
|
||||
users: summaryUsers,
|
||||
top_users: filteredTopUsers,
|
||||
};
|
||||
|
||||
const filteredInteractionData: InteractionAnalysisResponse = {
|
||||
...interactionRes.data,
|
||||
interaction_graph: filteredInteractionGraph,
|
||||
top_interaction_pairs: filteredTopInteractionPairs,
|
||||
};
|
||||
|
||||
const filteredSummary: SummaryResponse = {
|
||||
...summaryRes.data,
|
||||
unique_users: filteredUsers.length,
|
||||
};
|
||||
|
||||
setUserData(filteredUserData);
|
||||
setUserStatsMeta({
|
||||
totalUsers: filteredUsers.length,
|
||||
mostCommentHeavyUser,
|
||||
});
|
||||
setTimeData(timeRes.data || null);
|
||||
setLinguisticData(linguisticRes.data || null);
|
||||
setEmotionalData(emotionalRes.data || null);
|
||||
setInteractionData(filteredInteractionData || null);
|
||||
setCulturalData(culturalRes.data || null);
|
||||
setSummary(filteredSummary || null);
|
||||
},
|
||||
)
|
||||
.catch((e) => setError(`Failed to load statistics: ${String(e)}`))
|
||||
.finally(() => setLoading(false));
|
||||
};
|
||||
|
||||
const onSubmitFilters = () => {
|
||||
const query = searchInputRef.current?.value ?? "";
|
||||
|
||||
Promise.all([
|
||||
axios.post("http://localhost:5000/filter/search", {
|
||||
query: query
|
||||
}),
|
||||
])
|
||||
.then(() => {
|
||||
getStats();
|
||||
})
|
||||
.catch(e => {
|
||||
setError("Failed to load filters: " + e.response);
|
||||
})
|
||||
getStats(getFilterParams());
|
||||
};
|
||||
|
||||
const resetFilters = () => {
|
||||
axios.get("http://localhost:5000/filter/reset")
|
||||
.then(() => {
|
||||
getStats();
|
||||
})
|
||||
.catch(e => {
|
||||
setError(e);
|
||||
})
|
||||
if (searchInputRef.current) {
|
||||
searchInputRef.current.value = "";
|
||||
}
|
||||
if (beforeDateRef.current) {
|
||||
beforeDateRef.current.value = "";
|
||||
}
|
||||
if (afterDateRef.current) {
|
||||
afterDateRef.current.value = "";
|
||||
}
|
||||
getStats();
|
||||
};
|
||||
|
||||
useEffect(() => {
|
||||
setError("");
|
||||
setAllRecords(null);
|
||||
setAllRecordsKey("");
|
||||
setExplorerState(EMPTY_EXPLORER_STATE);
|
||||
if (!datasetId) {
|
||||
setError("Missing dataset id. Open /dataset/<id>/stats.");
|
||||
return;
|
||||
}
|
||||
getStats();
|
||||
}, [])
|
||||
}, [datasetId]);
|
||||
|
||||
if (loading) return <p style={{...styles.page, minWidth: "100vh", minHeight: "100vh"}}>Loading insights…</p>;
|
||||
if (error) return <p style={{...styles.page}}>{error}</p>;
|
||||
if (loading) {
|
||||
return (
|
||||
<div style={styles.loadingPage}>
|
||||
<div style={{ ...styles.loadingCard, transform: "translateY(-100px)" }}>
|
||||
<div style={styles.loadingHeader}>
|
||||
<div style={styles.loadingSpinner} />
|
||||
<div>
|
||||
<h2 style={styles.loadingTitle}>Loading analytics</h2>
|
||||
<p style={styles.loadingSubtitle}>
|
||||
Fetching summary, timeline, user, and content insights.
|
||||
</p>
|
||||
</div>
|
||||
</div>
|
||||
|
||||
return (
|
||||
<div style={styles.page}>
|
||||
<div style={{ ...styles.container, ...styles.card, ...styles.headerBar }}>
|
||||
<div style={styles.controls}>
|
||||
<input
|
||||
type="text"
|
||||
id="query"
|
||||
ref={searchInputRef}
|
||||
placeholder="Search events..."
|
||||
style={styles.input}
|
||||
/>
|
||||
<div style={styles.loadingSkeleton}>
|
||||
<div
|
||||
style={{
|
||||
...styles.loadingSkeletonLine,
|
||||
...styles.loadingSkeletonLineLong,
|
||||
}}
|
||||
/>
|
||||
<div
|
||||
style={{
|
||||
...styles.loadingSkeletonLine,
|
||||
...styles.loadingSkeletonLineMed,
|
||||
}}
|
||||
/>
|
||||
<div
|
||||
style={{
|
||||
...styles.loadingSkeletonLine,
|
||||
...styles.loadingSkeletonLineShort,
|
||||
}}
|
||||
/>
|
||||
</div>
|
||||
</div>
|
||||
</div>
|
||||
);
|
||||
}
|
||||
if (error) return <p style={{ ...styles.page }}>{error}</p>;
|
||||
|
||||
<input
|
||||
type="date"
|
||||
ref={beforeDateRef}
|
||||
placeholder="Search before date"
|
||||
style={styles.input}
|
||||
/>
|
||||
return (
|
||||
<div style={styles.page}>
|
||||
<div style={{ ...styles.container, ...styles.card, ...styles.headerBar }}>
|
||||
<div style={styles.controls}>
|
||||
<input
|
||||
type="text"
|
||||
id="query"
|
||||
ref={searchInputRef}
|
||||
placeholder="Search events..."
|
||||
style={styles.input}
|
||||
/>
|
||||
|
||||
<input
|
||||
<input
|
||||
type="date"
|
||||
ref={beforeDateRef}
|
||||
placeholder="Search before date"
|
||||
style={styles.input}
|
||||
/>
|
||||
|
||||
<input
|
||||
type="date"
|
||||
ref={afterDateRef}
|
||||
placeholder="Search before date"
|
||||
style={styles.input}
|
||||
/>
|
||||
|
||||
<button onClick={onSubmitFilters} style={styles.buttonPrimary}>
|
||||
Search
|
||||
</button>
|
||||
|
||||
<button onClick={resetFilters} style={styles.buttonSecondary}>
|
||||
Reset
|
||||
</button>
|
||||
</div>
|
||||
|
||||
<div style={styles.dashboardMeta}>Analytics Dashboard</div>
|
||||
<div style={styles.dashboardMeta}>Dataset #{datasetId ?? "-"}</div>
|
||||
</div>
|
||||
|
||||
<div
|
||||
style={{
|
||||
...styles.container,
|
||||
...styles.tabsRow,
|
||||
justifyContent: "center",
|
||||
}}
|
||||
>
|
||||
<button
|
||||
onClick={() => setActiveView("summary")}
|
||||
style={
|
||||
activeView === "summary" ? styles.buttonPrimary : styles.buttonSecondary
|
||||
}
|
||||
>
|
||||
Summary
|
||||
</button>
|
||||
<button
|
||||
onClick={() => setActiveView("emotional")}
|
||||
style={
|
||||
activeView === "emotional"
|
||||
? styles.buttonPrimary
|
||||
: styles.buttonSecondary
|
||||
}
|
||||
>
|
||||
Emotional
|
||||
</button>
|
||||
|
||||
<button
|
||||
onClick={() => setActiveView("user")}
|
||||
style={activeView === "user" ? styles.buttonPrimary : styles.buttonSecondary}
|
||||
>
|
||||
Users
|
||||
</button>
|
||||
<button
|
||||
onClick={() => setActiveView("linguistic")}
|
||||
style={
|
||||
activeView === "linguistic"
|
||||
? styles.buttonPrimary
|
||||
: styles.buttonSecondary
|
||||
}
|
||||
>
|
||||
Linguistic
|
||||
</button>
|
||||
<button
|
||||
onClick={() => setActiveView("interactional")}
|
||||
style={
|
||||
activeView === "interactional"
|
||||
? styles.buttonPrimary
|
||||
: styles.buttonSecondary
|
||||
}
|
||||
>
|
||||
Interactional
|
||||
</button>
|
||||
<button
|
||||
onClick={() => setActiveView("cultural")}
|
||||
style={
|
||||
activeView === "cultural" ? styles.buttonPrimary : styles.buttonSecondary
|
||||
}
|
||||
>
|
||||
Cultural
|
||||
</button>
|
||||
</div>
|
||||
|
||||
{activeView === "summary" && (
|
||||
<SummaryStats
|
||||
userData={userData}
|
||||
timeData={timeData}
|
||||
linguisticData={linguisticData}
|
||||
summary={summary}
|
||||
onExplore={openExplorer}
|
||||
/>
|
||||
)}
|
||||
|
||||
<button onClick={onSubmitFilters} style={styles.buttonPrimary}>
|
||||
Search
|
||||
</button>
|
||||
{activeView === "emotional" && emotionalData && (
|
||||
<EmotionalStats emotionalData={emotionalData} onExplore={openExplorer} />
|
||||
)}
|
||||
|
||||
<button onClick={resetFilters} style={styles.buttonSecondary}>
|
||||
Reset
|
||||
</button>
|
||||
</div>
|
||||
{activeView === "emotional" && !emotionalData && (
|
||||
<div style={{ ...styles.container, ...styles.card, marginTop: 16 }}>
|
||||
No emotional data available.
|
||||
</div>
|
||||
)}
|
||||
|
||||
<div style={{ fontSize: 13, color: "#6b7280" }}>Analytics Dashboard</div>
|
||||
</div>
|
||||
{activeView === "user" && userData && interactionData && (
|
||||
<UserStats
|
||||
topUsers={userData.top_users}
|
||||
interactionGraph={interactionData.interaction_graph}
|
||||
totalUsers={userStatsMeta.totalUsers}
|
||||
mostCommentHeavyUser={userStatsMeta.mostCommentHeavyUser}
|
||||
onExplore={openExplorer}
|
||||
/>
|
||||
)}
|
||||
|
||||
<div style={{ ...styles.container, display: "flex", gap: 8, marginTop: 12 }}>
|
||||
<button
|
||||
onClick={() => setActiveView("summary")}
|
||||
style={activeView === "summary" ? styles.buttonPrimary : styles.buttonSecondary}
|
||||
>
|
||||
Summary
|
||||
</button>
|
||||
<button
|
||||
onClick={() => setActiveView("emotional")}
|
||||
style={activeView === "emotional" ? styles.buttonPrimary : styles.buttonSecondary}
|
||||
>
|
||||
Emotional
|
||||
</button>
|
||||
{activeView === "user" && (!userData || !interactionData) && (
|
||||
<div style={{ ...styles.container, ...styles.card, marginTop: 16 }}>
|
||||
No user network data available.
|
||||
</div>
|
||||
)}
|
||||
|
||||
<button
|
||||
onClick={() => setActiveView("interaction")}
|
||||
style={activeView === "interaction" ? styles.buttonPrimary : styles.buttonSecondary}
|
||||
>
|
||||
Interaction
|
||||
</button>
|
||||
</div>
|
||||
{activeView === "linguistic" && linguisticData && (
|
||||
<LinguisticStats data={linguisticData} onExplore={openExplorer} />
|
||||
)}
|
||||
|
||||
{activeView === "summary" && (
|
||||
<SummaryStats
|
||||
userData={userData}
|
||||
timeData={timeData}
|
||||
contentData={contentData}
|
||||
summary={summary}
|
||||
{activeView === "linguistic" && !linguisticData && (
|
||||
<div style={{ ...styles.container, ...styles.card, marginTop: 16 }}>
|
||||
No linguistic data available.
|
||||
</div>
|
||||
)}
|
||||
|
||||
{activeView === "interactional" && interactionData && (
|
||||
<InteractionalStats data={interactionData} />
|
||||
)}
|
||||
|
||||
{activeView === "interactional" && !interactionData && (
|
||||
<div style={{ ...styles.container, ...styles.card, marginTop: 16 }}>
|
||||
No interactional data available.
|
||||
</div>
|
||||
)}
|
||||
|
||||
{activeView === "cultural" && culturalData && (
|
||||
<CulturalStats data={culturalData} onExplore={openExplorer} />
|
||||
)}
|
||||
|
||||
{activeView === "cultural" && !culturalData && (
|
||||
<div style={{ ...styles.container, ...styles.card, marginTop: 16 }}>
|
||||
No cultural data available.
|
||||
</div>
|
||||
)}
|
||||
|
||||
<CorpusExplorer
|
||||
open={explorerState.open}
|
||||
onClose={() => setExplorerState((current) => ({ ...current, open: false }))}
|
||||
title={explorerState.title}
|
||||
description={explorerState.description}
|
||||
records={explorerState.records}
|
||||
loading={explorerState.loading}
|
||||
error={explorerState.error}
|
||||
emptyMessage={explorerState.emptyMessage}
|
||||
/>
|
||||
)}
|
||||
|
||||
{activeView === "emotional" && contentData && (
|
||||
<EmotionalStats contentData={contentData} />
|
||||
)}
|
||||
|
||||
{activeView === "emotional" && !contentData && (
|
||||
<div style={{ ...styles.container, ...styles.card, marginTop: 16 }}>
|
||||
No emotional data available.
|
||||
</div>
|
||||
)}
|
||||
|
||||
{activeView === "interaction" && userData && (
|
||||
<InteractionStats data={userData} />
|
||||
)}
|
||||
|
||||
</div>
|
||||
);
|
||||
}
|
||||
</div>
|
||||
);
|
||||
};
|
||||
|
||||
export default StatPage;
|
||||
|
||||
@@ -1,56 +1,180 @@
|
||||
import axios from 'axios'
|
||||
import './../App.css'
|
||||
import { useState } from 'react'
|
||||
import { useNavigate } from 'react-router-dom'
|
||||
import axios from "axios";
|
||||
import { useState } from "react";
|
||||
import { useNavigate } from "react-router-dom";
|
||||
import StatsStyling from "../styles/stats_styling";
|
||||
|
||||
const styles = StatsStyling;
|
||||
const API_BASE_URL = import.meta.env.VITE_BACKEND_URL;
|
||||
|
||||
const UploadPage = () => {
|
||||
let postFile: File | undefined;
|
||||
let topicBucketFile: File | undefined;
|
||||
const [returnMessage, setReturnMessage] = useState('')
|
||||
const navigate = useNavigate()
|
||||
const [datasetName, setDatasetName] = useState("");
|
||||
const [postFile, setPostFile] = useState<File | null>(null);
|
||||
const [topicBucketFile, setTopicBucketFile] = useState<File | null>(null);
|
||||
const [returnMessage, setReturnMessage] = useState("");
|
||||
const [isSubmitting, setIsSubmitting] = useState(false);
|
||||
const [hasError, setHasError] = useState(false);
|
||||
const navigate = useNavigate();
|
||||
|
||||
const uploadFiles = async () => {
|
||||
if (!postFile || !topicBucketFile) {
|
||||
alert('Please upload all files before uploading.')
|
||||
return
|
||||
const normalizedDatasetName = datasetName.trim();
|
||||
|
||||
if (!normalizedDatasetName) {
|
||||
setHasError(true);
|
||||
setReturnMessage("Please add a dataset name before continuing.");
|
||||
return;
|
||||
}
|
||||
|
||||
const formData = new FormData()
|
||||
formData.append('posts', postFile)
|
||||
formData.append('topics', topicBucketFile)
|
||||
if (!postFile || !topicBucketFile) {
|
||||
setHasError(true);
|
||||
setReturnMessage("Please upload both files before continuing.");
|
||||
return;
|
||||
}
|
||||
|
||||
const formData = new FormData();
|
||||
formData.append("name", normalizedDatasetName);
|
||||
formData.append("posts", postFile);
|
||||
formData.append("topics", topicBucketFile);
|
||||
|
||||
try {
|
||||
const response = await axios.post('http://localhost:5000/upload', formData, {
|
||||
headers: {
|
||||
'Content-Type': 'multipart/form-data',
|
||||
},
|
||||
})
|
||||
console.log('Files uploaded successfully:', response.data)
|
||||
setReturnMessage(`Upload successful! Posts: ${response.data.posts_count}, Comments: ${response.data.comments_count}`)
|
||||
navigate('/stats')
|
||||
} catch (error) {
|
||||
console.error('Error uploading files:', error)
|
||||
setReturnMessage('Error uploading files. Error details: ' + error)
|
||||
}
|
||||
}
|
||||
return (
|
||||
<div style={{...styles.container, ...styles.grid, margin: "0"}}>
|
||||
<div style={{ ...styles.card }}>
|
||||
<h2 style={{color: "black" }}>Posts File</h2>
|
||||
<input style={{color: "black" }} type="file" onChange={(e) => postFile = e.target.files?.[0]}></input>
|
||||
</div>
|
||||
<div style={{ ...styles.card }}>
|
||||
<h2 style={{color: "black" }}>Topic Buckets File</h2>
|
||||
<input style={{color: "black" }} type="file" onChange={(e) => topicBucketFile = e.target.files?.[0]}></input>
|
||||
</div>
|
||||
<button onClick={uploadFiles}>Upload</button>
|
||||
setIsSubmitting(true);
|
||||
setHasError(false);
|
||||
setReturnMessage("");
|
||||
|
||||
<p>{returnMessage}</p>
|
||||
const response = await axios.post(
|
||||
`${API_BASE_URL}/datasets/upload`,
|
||||
formData,
|
||||
{
|
||||
headers: {
|
||||
"Content-Type": "multipart/form-data",
|
||||
},
|
||||
},
|
||||
);
|
||||
|
||||
const datasetId = Number(response.data.dataset_id);
|
||||
|
||||
setReturnMessage(
|
||||
`Upload queued successfully (dataset #${datasetId}). Redirecting to processing status...`,
|
||||
);
|
||||
|
||||
setTimeout(() => {
|
||||
navigate(`/dataset/${datasetId}/status`);
|
||||
}, 400);
|
||||
} catch (error: unknown) {
|
||||
setHasError(true);
|
||||
if (axios.isAxiosError(error)) {
|
||||
const message = String(
|
||||
error.response?.data?.error || error.message || "Upload failed.",
|
||||
);
|
||||
setReturnMessage(`Upload failed: ${message}`);
|
||||
} else {
|
||||
setReturnMessage("Upload failed due to an unexpected error.");
|
||||
}
|
||||
} finally {
|
||||
setIsSubmitting(false);
|
||||
}
|
||||
};
|
||||
|
||||
return (
|
||||
<div style={styles.page}>
|
||||
<div style={styles.containerWide}>
|
||||
<div style={{ ...styles.card, ...styles.headerBar }}>
|
||||
<div>
|
||||
<h1 style={styles.sectionHeaderTitle}>Upload Dataset</h1>
|
||||
<p style={styles.sectionHeaderSubtitle}>
|
||||
Name your dataset, then upload posts and topic map files to
|
||||
generate analytics.
|
||||
</p>
|
||||
</div>
|
||||
<button
|
||||
type="button"
|
||||
style={{
|
||||
...styles.buttonPrimary,
|
||||
opacity: isSubmitting ? 0.75 : 1,
|
||||
}}
|
||||
onClick={uploadFiles}
|
||||
disabled={isSubmitting}
|
||||
>
|
||||
{isSubmitting ? "Uploading..." : "Upload and Analyze"}
|
||||
</button>
|
||||
</div>
|
||||
|
||||
<div
|
||||
style={{
|
||||
...styles.grid,
|
||||
marginTop: 14,
|
||||
gridTemplateColumns: "repeat(auto-fit, minmax(280px, 1fr))",
|
||||
}}
|
||||
>
|
||||
<div style={{ ...styles.card, gridColumn: "auto" }}>
|
||||
<h2 style={{ ...styles.sectionTitle, color: "#24292f" }}>
|
||||
Dataset Name
|
||||
</h2>
|
||||
<p style={styles.sectionSubtitle}>
|
||||
Use a clear label so you can identify this upload later.
|
||||
</p>
|
||||
<input
|
||||
style={{ ...styles.input, ...styles.inputFullWidth }}
|
||||
type="text"
|
||||
placeholder="Example: Cork Discussions - Jan 2026"
|
||||
value={datasetName}
|
||||
onChange={(event) => setDatasetName(event.target.value)}
|
||||
/>
|
||||
</div>
|
||||
|
||||
<div style={{ ...styles.card, gridColumn: "auto" }}>
|
||||
<h2 style={{ ...styles.sectionTitle, color: "#24292f" }}>
|
||||
Posts File (.jsonl)
|
||||
</h2>
|
||||
<p style={styles.sectionSubtitle}>
|
||||
Upload the raw post records export.
|
||||
</p>
|
||||
<input
|
||||
style={{ ...styles.input, ...styles.inputFullWidth }}
|
||||
type="file"
|
||||
accept=".jsonl"
|
||||
onChange={(event) => setPostFile(event.target.files?.[0] ?? null)}
|
||||
/>
|
||||
<p style={styles.subtleBodyText}>
|
||||
{postFile ? `Selected: ${postFile.name}` : "No file selected"}
|
||||
</p>
|
||||
</div>
|
||||
|
||||
<div style={{ ...styles.card, gridColumn: "auto" }}>
|
||||
<h2 style={{ ...styles.sectionTitle, color: "#24292f" }}>
|
||||
Topics File (.json)
|
||||
</h2>
|
||||
<p style={styles.sectionSubtitle}>
|
||||
Upload your topic bucket mapping file.
|
||||
</p>
|
||||
<input
|
||||
style={{ ...styles.input, ...styles.inputFullWidth }}
|
||||
type="file"
|
||||
accept=".json"
|
||||
onChange={(event) =>
|
||||
setTopicBucketFile(event.target.files?.[0] ?? null)
|
||||
}
|
||||
/>
|
||||
<p style={styles.subtleBodyText}>
|
||||
{topicBucketFile
|
||||
? `Selected: ${topicBucketFile.name}`
|
||||
: "No file selected"}
|
||||
</p>
|
||||
</div>
|
||||
</div>
|
||||
|
||||
<div
|
||||
style={{
|
||||
...styles.card,
|
||||
marginTop: 14,
|
||||
...(hasError ? styles.alertCardError : styles.alertCardInfo),
|
||||
}}
|
||||
>
|
||||
{returnMessage ||
|
||||
"After upload, your dataset is queued for processing and you'll land on stats."}
|
||||
</div>
|
||||
</div>
|
||||
</div>
|
||||
)
|
||||
}
|
||||
);
|
||||
};
|
||||
|
||||
export default UploadPage;
|
||||
|
||||
@@ -1,4 +1,5 @@
|
||||
import { ResponsiveHeatMap } from "@nivo/heatmap";
|
||||
import { memo, useMemo } from "react";
|
||||
|
||||
type ApiRow = Record<number, number>;
|
||||
type ActivityHeatmapProps = {
|
||||
@@ -25,8 +26,7 @@ const DAYS = [
|
||||
"Sunday",
|
||||
];
|
||||
|
||||
const hourLabel = (h: number) =>
|
||||
`${h.toString().padStart(2, "0")}:00`;
|
||||
const hourLabel = (h: number) => `${h.toString().padStart(2, "0")}:00`;
|
||||
|
||||
const convertWeeklyData = (dataset: ApiRow[]): ChartSeries[] => {
|
||||
return dataset.map((dayData, index) => ({
|
||||
@@ -40,32 +40,37 @@ const convertWeeklyData = (dataset: ApiRow[]): ChartSeries[] => {
|
||||
}));
|
||||
};
|
||||
|
||||
|
||||
const ActivityHeatmap = ({ data }: ActivityHeatmapProps) => {
|
||||
const convertedData = convertWeeklyData(data);
|
||||
const convertedData = useMemo(() => convertWeeklyData(data), [data]);
|
||||
|
||||
const maxValue = Math.max(
|
||||
...convertedData.flatMap(day =>
|
||||
day.data.map(point => point.y)
|
||||
)
|
||||
const maxValue = useMemo(() => {
|
||||
let max = 0;
|
||||
for (const day of convertedData) {
|
||||
for (const point of day.data) {
|
||||
if (point.y > max) {
|
||||
max = point.y;
|
||||
}
|
||||
}
|
||||
}
|
||||
return max;
|
||||
}, [convertedData]);
|
||||
|
||||
return (
|
||||
<ResponsiveHeatMap
|
||||
data={convertedData}
|
||||
valueFormat=">-.2s"
|
||||
axisTop={{ tickRotation: -90 }}
|
||||
axisRight={{ legend: "Weekday", legendOffset: 70 }}
|
||||
axisLeft={{ legend: "Weekday", legendOffset: -72 }}
|
||||
colors={{
|
||||
type: "diverging",
|
||||
scheme: "red_yellow_blue",
|
||||
divergeAt: 0.3,
|
||||
minValue: 0,
|
||||
maxValue: maxValue,
|
||||
}}
|
||||
/>
|
||||
);
|
||||
};
|
||||
|
||||
return (
|
||||
<ResponsiveHeatMap
|
||||
data={convertedData}
|
||||
valueFormat=">-.2s"
|
||||
axisTop={{ tickRotation: -90 }}
|
||||
axisRight={{ legend: 'Weekday', legendOffset: 70 }}
|
||||
axisLeft={{ legend: 'Weekday', legendOffset: -72 }}
|
||||
colors={{
|
||||
type: 'diverging',
|
||||
scheme: 'red_yellow_blue',
|
||||
divergeAt: 0.3,
|
||||
minValue: 0,
|
||||
maxValue: maxValue
|
||||
}}
|
||||
/>
|
||||
)
|
||||
}
|
||||
|
||||
export default ActivityHeatmap;
|
||||
export default memo(ActivityHeatmap);
|
||||
|
||||
42
frontend/src/styles/stats/appLayout.ts
Normal file
@@ -0,0 +1,42 @@
|
||||
import { palette } from "./palette";
|
||||
import type { StyleMap } from "./types";
|
||||
|
||||
export const appLayoutStyles: StyleMap = {
|
||||
appHeaderWrap: {
|
||||
padding: "16px 24px 0",
|
||||
},
|
||||
|
||||
appHeaderBrandRow: {
|
||||
display: "flex",
|
||||
alignItems: "center",
|
||||
gap: 10,
|
||||
flexWrap: "wrap",
|
||||
},
|
||||
|
||||
appTitle: {
|
||||
margin: 0,
|
||||
color: palette.textPrimary,
|
||||
fontSize: 18,
|
||||
fontWeight: 600,
|
||||
},
|
||||
|
||||
authStatusBadge: {
|
||||
padding: "3px 8px",
|
||||
borderRadius: 6,
|
||||
fontSize: 12,
|
||||
fontWeight: 600,
|
||||
fontFamily: '"IBM Plex Sans", "Noto Sans", "Liberation Sans", "Segoe UI", sans-serif',
|
||||
},
|
||||
|
||||
authStatusSignedIn: {
|
||||
border: `1px solid ${palette.statusPositiveBorder}`,
|
||||
background: palette.statusPositiveBg,
|
||||
color: palette.statusPositiveText,
|
||||
},
|
||||
|
||||
authStatusSignedOut: {
|
||||
border: `1px solid ${palette.statusNegativeBorder}`,
|
||||
background: palette.statusNegativeBg,
|
||||
color: palette.statusNegativeText,
|
||||
},
|
||||
};
|
||||
92
frontend/src/styles/stats/auth.ts
Normal file
@@ -0,0 +1,92 @@
|
||||
import { palette } from "./palette";
|
||||
import type { StyleMap } from "./types";
|
||||
|
||||
export const authStyles: StyleMap = {
|
||||
containerAuth: {
|
||||
maxWidth: 560,
|
||||
margin: "0 auto",
|
||||
padding: "48px 24px",
|
||||
},
|
||||
|
||||
headingXl: {
|
||||
margin: 0,
|
||||
color: palette.textPrimary,
|
||||
fontSize: 28,
|
||||
fontWeight: 600,
|
||||
lineHeight: 1.1,
|
||||
},
|
||||
|
||||
headingBlock: {
|
||||
marginBottom: 22,
|
||||
textAlign: "center",
|
||||
},
|
||||
|
||||
mutedText: {
|
||||
margin: "8px 0 0",
|
||||
color: palette.textSecondary,
|
||||
fontSize: 14,
|
||||
},
|
||||
|
||||
authCard: {
|
||||
padding: 28,
|
||||
},
|
||||
|
||||
authForm: {
|
||||
display: "grid",
|
||||
gap: 12,
|
||||
maxWidth: 380,
|
||||
margin: "0 auto",
|
||||
},
|
||||
|
||||
inputFullWidth: {
|
||||
width: "100%",
|
||||
maxWidth: "100%",
|
||||
boxSizing: "border-box",
|
||||
},
|
||||
|
||||
authControl: {
|
||||
width: "100%",
|
||||
maxWidth: "100%",
|
||||
boxSizing: "border-box",
|
||||
},
|
||||
|
||||
authErrorText: {
|
||||
color: palette.dangerText,
|
||||
margin: "12px auto 0",
|
||||
fontSize: 14,
|
||||
maxWidth: 380,
|
||||
textAlign: "center",
|
||||
},
|
||||
|
||||
authInfoText: {
|
||||
color: palette.successText,
|
||||
margin: "12px auto 0",
|
||||
fontSize: 14,
|
||||
maxWidth: 380,
|
||||
textAlign: "center",
|
||||
},
|
||||
|
||||
authSwitchRow: {
|
||||
marginTop: 16,
|
||||
display: "flex",
|
||||
alignItems: "center",
|
||||
justifyContent: "center",
|
||||
gap: 8,
|
||||
flexWrap: "wrap",
|
||||
},
|
||||
|
||||
authSwitchLabel: {
|
||||
color: palette.textSecondary,
|
||||
fontSize: 14,
|
||||
},
|
||||
|
||||
authSwitchButton: {
|
||||
border: "none",
|
||||
background: "transparent",
|
||||
color: palette.brandGreenBorder,
|
||||
fontSize: 14,
|
||||
fontWeight: 600,
|
||||
cursor: "pointer",
|
||||
padding: 0,
|
||||
},
|
||||
};
|
||||
42
frontend/src/styles/stats/cards.ts
Normal file
@@ -0,0 +1,42 @@
|
||||
import { palette } from "./palette";
|
||||
import type { StyleMap } from "./types";
|
||||
|
||||
export const cardStyles: StyleMap = {
|
||||
cardBase: {
|
||||
background: palette.surface,
|
||||
border: `1px solid ${palette.borderDefault}`,
|
||||
borderRadius: 8,
|
||||
padding: 14,
|
||||
boxShadow: `0 1px 0 ${palette.shadowSubtle}`,
|
||||
minHeight: 88,
|
||||
},
|
||||
|
||||
cardTopRow: {
|
||||
display: "flex",
|
||||
justifyContent: "space-between",
|
||||
alignItems: "center",
|
||||
gap: 10,
|
||||
},
|
||||
|
||||
cardLabel: {
|
||||
fontSize: 12,
|
||||
fontWeight: 600,
|
||||
color: palette.textSecondary,
|
||||
letterSpacing: "0.02em",
|
||||
textTransform: "uppercase",
|
||||
},
|
||||
|
||||
cardValue: {
|
||||
fontSize: 24,
|
||||
fontWeight: 700,
|
||||
marginTop: 6,
|
||||
letterSpacing: "-0.02em",
|
||||
color: palette.textPrimary,
|
||||
},
|
||||
|
||||
cardSubLabel: {
|
||||
marginTop: 6,
|
||||
fontSize: 12,
|
||||
color: palette.textSecondary,
|
||||
},
|
||||
};
|
||||
55
frontend/src/styles/stats/datasets.ts
Normal file
@@ -0,0 +1,55 @@
|
||||
import { palette } from "./palette";
|
||||
import type { StyleMap } from "./types";
|
||||
|
||||
export const datasetStyles: StyleMap = {
|
||||
sectionHeaderTitle: {
|
||||
margin: 0,
|
||||
color: palette.textPrimary,
|
||||
fontSize: 28,
|
||||
fontWeight: 600,
|
||||
},
|
||||
|
||||
sectionHeaderSubtitle: {
|
||||
margin: "8px 0 0",
|
||||
color: palette.textSecondary,
|
||||
fontSize: 14,
|
||||
},
|
||||
|
||||
listNoBullets: {
|
||||
listStyle: "none",
|
||||
margin: 0,
|
||||
padding: 0,
|
||||
},
|
||||
|
||||
datasetListItem: {
|
||||
display: "flex",
|
||||
alignItems: "center",
|
||||
justifyContent: "space-between",
|
||||
gap: 12,
|
||||
padding: "14px 16px",
|
||||
borderBottom: `1px solid ${palette.borderMuted}`,
|
||||
},
|
||||
|
||||
datasetName: {
|
||||
fontWeight: 600,
|
||||
color: palette.textPrimary,
|
||||
},
|
||||
|
||||
datasetMeta: {
|
||||
fontSize: 13,
|
||||
color: palette.textSecondary,
|
||||
marginTop: 4,
|
||||
},
|
||||
|
||||
datasetMetaSecondary: {
|
||||
fontSize: 13,
|
||||
color: palette.textSecondary,
|
||||
marginTop: 2,
|
||||
},
|
||||
|
||||
subtleBodyText: {
|
||||
margin: "10px 0 0",
|
||||
fontSize: 13,
|
||||
color: palette.textBody,
|
||||
},
|
||||
};
|
||||
51
frontend/src/styles/stats/emotional.ts
Normal file
@@ -0,0 +1,51 @@
|
||||
import { palette } from "./palette";
|
||||
import type { StyleMap } from "./types";
|
||||
|
||||
export const emotionalStyles: StyleMap = {
|
||||
emotionalSummaryRow: {
|
||||
display: "flex",
|
||||
flexWrap: "wrap",
|
||||
gap: 10,
|
||||
fontSize: 13,
|
||||
color: palette.textTertiary,
|
||||
marginTop: 6,
|
||||
},
|
||||
|
||||
emotionalTopicLabel: {
|
||||
fontSize: 12,
|
||||
fontWeight: 600,
|
||||
color: palette.textSecondary,
|
||||
letterSpacing: "0.02em",
|
||||
textTransform: "uppercase",
|
||||
},
|
||||
|
||||
emotionalTopicValue: {
|
||||
fontSize: 24,
|
||||
fontWeight: 800,
|
||||
marginTop: 4,
|
||||
lineHeight: 1.2,
|
||||
},
|
||||
|
||||
emotionalMetricRow: {
|
||||
display: "flex",
|
||||
justifyContent: "space-between",
|
||||
alignItems: "center",
|
||||
marginTop: 10,
|
||||
fontSize: 13,
|
||||
color: palette.textSecondary,
|
||||
},
|
||||
|
||||
emotionalMetricRowCompact: {
|
||||
display: "flex",
|
||||
justifyContent: "space-between",
|
||||
alignItems: "center",
|
||||
marginTop: 4,
|
||||
fontSize: 13,
|
||||
color: palette.textSecondary,
|
||||
},
|
||||
|
||||
emotionalMetricValue: {
|
||||
fontWeight: 600,
|
||||
color: palette.textPrimary,
|
||||
},
|
||||
};
|
||||
106
frontend/src/styles/stats/feedback.ts
Normal file
@@ -0,0 +1,106 @@
|
||||
import { palette } from "./palette";
|
||||
import type { StyleMap } from "./types";
|
||||
|
||||
export const feedbackStyles: StyleMap = {
|
||||
loadingPage: {
|
||||
width: "100%",
|
||||
minHeight: "100vh",
|
||||
padding: 20,
|
||||
display: "flex",
|
||||
alignItems: "center",
|
||||
justifyContent: "center",
|
||||
},
|
||||
|
||||
loadingCard: {
|
||||
width: "min(560px, 92vw)",
|
||||
background: palette.surface,
|
||||
border: `1px solid ${palette.borderDefault}`,
|
||||
borderRadius: 8,
|
||||
boxShadow: `0 1px 0 ${palette.shadowSubtle}`,
|
||||
padding: 20,
|
||||
},
|
||||
|
||||
loadingHeader: {
|
||||
display: "flex",
|
||||
alignItems: "center",
|
||||
gap: 12,
|
||||
},
|
||||
|
||||
loadingSpinner: {
|
||||
width: 18,
|
||||
height: 18,
|
||||
borderRadius: "50%",
|
||||
border: `2px solid ${palette.borderDefault}`,
|
||||
borderTopColor: palette.brandGreen,
|
||||
animation: "stats-spin 0.9s linear infinite",
|
||||
flexShrink: 0,
|
||||
},
|
||||
|
||||
loadingTitle: {
|
||||
margin: 0,
|
||||
fontSize: 16,
|
||||
fontWeight: 600,
|
||||
color: palette.textPrimary,
|
||||
},
|
||||
|
||||
loadingSubtitle: {
|
||||
margin: "6px 0 0",
|
||||
fontSize: 13,
|
||||
color: palette.textSecondary,
|
||||
},
|
||||
|
||||
loadingSkeleton: {
|
||||
marginTop: 16,
|
||||
display: "grid",
|
||||
gap: 8,
|
||||
},
|
||||
|
||||
loadingSkeletonLine: {
|
||||
height: 9,
|
||||
borderRadius: 999,
|
||||
background: palette.canvas,
|
||||
animation: "stats-pulse 1.25s ease-in-out infinite",
|
||||
},
|
||||
|
||||
loadingSkeletonLineLong: {
|
||||
width: "100%",
|
||||
},
|
||||
|
||||
loadingSkeletonLineMed: {
|
||||
width: "78%",
|
||||
},
|
||||
|
||||
loadingSkeletonLineShort: {
|
||||
width: "62%",
|
||||
},
|
||||
|
||||
alertCardError: {
|
||||
borderColor: palette.alertErrorBorder,
|
||||
background: palette.alertErrorBg,
|
||||
color: palette.alertErrorText,
|
||||
fontSize: 14,
|
||||
},
|
||||
|
||||
alertCardInfo: {
|
||||
borderColor: palette.alertInfoBorder,
|
||||
background: palette.surface,
|
||||
color: palette.textBody,
|
||||
fontSize: 14,
|
||||
},
|
||||
|
||||
statusMessageCard: {
|
||||
marginTop: 12,
|
||||
boxShadow: "none",
|
||||
},
|
||||
|
||||
dashboardMeta: {
|
||||
fontSize: 13,
|
||||
color: palette.textSecondary,
|
||||
},
|
||||
|
||||
tabsRow: {
|
||||
display: "flex",
|
||||
gap: 8,
|
||||
marginTop: 12,
|
||||
},
|
||||
};
|
||||
167
frontend/src/styles/stats/foundations.ts
Normal file
@@ -0,0 +1,167 @@
|
||||
import { palette } from "./palette";
|
||||
import type { StyleMap } from "./types";
|
||||
|
||||
export const foundationStyles: StyleMap = {
|
||||
appShell: {
|
||||
minHeight: "100vh",
|
||||
background: palette.canvas,
|
||||
fontFamily: '"IBM Plex Sans", "Noto Sans", "Liberation Sans", "Segoe UI", sans-serif',
|
||||
color: palette.textPrimary,
|
||||
},
|
||||
|
||||
page: {
|
||||
width: "100%",
|
||||
minHeight: "100vh",
|
||||
padding: 20,
|
||||
background: palette.canvas,
|
||||
fontFamily: '"IBM Plex Sans", "Noto Sans", "Liberation Sans", "Segoe UI", sans-serif',
|
||||
color: palette.textPrimary,
|
||||
overflowX: "hidden",
|
||||
boxSizing: "border-box",
|
||||
},
|
||||
|
||||
container: {
|
||||
maxWidth: 1240,
|
||||
margin: "0 auto",
|
||||
},
|
||||
|
||||
containerWide: {
|
||||
maxWidth: 1100,
|
||||
margin: "0 auto",
|
||||
},
|
||||
|
||||
containerNarrow: {
|
||||
maxWidth: 720,
|
||||
margin: "0 auto",
|
||||
},
|
||||
|
||||
card: {
|
||||
background: palette.surface,
|
||||
borderRadius: 8,
|
||||
padding: 16,
|
||||
border: `1px solid ${palette.borderDefault}`,
|
||||
boxShadow: `0 1px 0 ${palette.shadowSubtle}`,
|
||||
},
|
||||
|
||||
headerBar: {
|
||||
display: "flex",
|
||||
flexWrap: "wrap",
|
||||
alignItems: "center",
|
||||
justifyContent: "space-between",
|
||||
gap: 10,
|
||||
},
|
||||
|
||||
controls: {
|
||||
display: "flex",
|
||||
gap: 8,
|
||||
alignItems: "center",
|
||||
},
|
||||
|
||||
controlsWrapped: {
|
||||
display: "flex",
|
||||
gap: 8,
|
||||
alignItems: "center",
|
||||
flexWrap: "wrap",
|
||||
},
|
||||
|
||||
input: {
|
||||
width: 280,
|
||||
maxWidth: "70vw",
|
||||
padding: "8px 10px",
|
||||
borderRadius: 6,
|
||||
border: `1px solid ${palette.borderDefault}`,
|
||||
outline: "none",
|
||||
fontSize: 14,
|
||||
background: palette.surface,
|
||||
color: palette.textPrimary,
|
||||
},
|
||||
|
||||
buttonPrimary: {
|
||||
padding: "8px 12px",
|
||||
borderRadius: 6,
|
||||
border: `1px solid ${palette.brandGreenBorder}`,
|
||||
background: palette.brandGreen,
|
||||
color: palette.surface,
|
||||
fontWeight: 600,
|
||||
cursor: "pointer",
|
||||
boxShadow: "none",
|
||||
},
|
||||
|
||||
buttonSecondary: {
|
||||
padding: "8px 12px",
|
||||
borderRadius: 6,
|
||||
border: `1px solid ${palette.borderDefault}`,
|
||||
background: palette.canvas,
|
||||
color: palette.textPrimary,
|
||||
fontWeight: 600,
|
||||
cursor: "pointer",
|
||||
},
|
||||
|
||||
buttonDanger: {
|
||||
padding: "8px 12px",
|
||||
borderRadius: 6,
|
||||
border: `1px solid ${palette.borderDefault}`,
|
||||
background: palette.dangerText,
|
||||
color: palette.textPrimary,
|
||||
fontWeight: 600,
|
||||
cursor: "pointer",
|
||||
},
|
||||
|
||||
grid: {
|
||||
marginTop: 12,
|
||||
display: "grid",
|
||||
gridTemplateColumns: "repeat(12, 1fr)",
|
||||
gap: 12,
|
||||
},
|
||||
|
||||
sectionTitle: {
|
||||
margin: 0,
|
||||
fontSize: 17,
|
||||
fontWeight: 600,
|
||||
},
|
||||
|
||||
sectionSubtitle: {
|
||||
margin: "6px 0 14px",
|
||||
fontSize: 13,
|
||||
color: palette.textSecondary,
|
||||
},
|
||||
|
||||
chartWrapper: {
|
||||
width: "100%",
|
||||
height: 350,
|
||||
},
|
||||
|
||||
heatmapWrapper: {
|
||||
width: "100%",
|
||||
height: 320,
|
||||
},
|
||||
|
||||
topUsersList: {
|
||||
display: "flex",
|
||||
flexDirection: "column",
|
||||
gap: 10,
|
||||
},
|
||||
|
||||
topUserItem: {
|
||||
padding: "10px 12px",
|
||||
borderRadius: 8,
|
||||
background: palette.canvas,
|
||||
border: `1px solid ${palette.borderMuted}`,
|
||||
},
|
||||
|
||||
topUserName: {
|
||||
fontWeight: 600,
|
||||
fontSize: 14,
|
||||
color: palette.textPrimary,
|
||||
},
|
||||
|
||||
topUserMeta: {
|
||||
fontSize: 13,
|
||||
color: palette.textSecondary,
|
||||
},
|
||||
|
||||
scrollArea: {
|
||||
maxHeight: 420,
|
||||
overflowY: "auto",
|
||||
},
|
||||
};
|
||||
28
frontend/src/styles/stats/modal.ts
Normal file
@@ -0,0 +1,28 @@
|
||||
import { palette } from "./palette";
|
||||
import type { StyleMap } from "./types";
|
||||
|
||||
export const modalStyles: StyleMap = {
|
||||
modalRoot: {
|
||||
position: "relative",
|
||||
zIndex: 50,
|
||||
},
|
||||
|
||||
modalBackdrop: {
|
||||
position: "fixed",
|
||||
inset: 0,
|
||||
background: palette.modalBackdrop,
|
||||
},
|
||||
|
||||
modalContainer: {
|
||||
position: "fixed",
|
||||
inset: 0,
|
||||
display: "flex",
|
||||
alignItems: "center",
|
||||
justifyContent: "center",
|
||||
padding: 16,
|
||||
},
|
||||
|
||||
modalPanel: {
|
||||
width: "min(520px, 95vw)",
|
||||
},
|
||||
};
|
||||
26
frontend/src/styles/stats/palette.ts
Normal file
@@ -0,0 +1,26 @@
|
||||
export const palette = {
|
||||
canvas: "#f6f8fa",
|
||||
surface: "#ffffff",
|
||||
textPrimary: "#24292f",
|
||||
textSecondary: "#57606a",
|
||||
textTertiary: "#4b5563",
|
||||
textBody: "#374151",
|
||||
borderDefault: "#d0d7de",
|
||||
borderMuted: "#d8dee4",
|
||||
shadowSubtle: "rgba(27, 31, 36, 0.04)",
|
||||
brandGreen: "#2da44e",
|
||||
brandGreenBorder: "#1f883d",
|
||||
statusPositiveBorder: "#b7dfc8",
|
||||
statusPositiveBg: "#edf9f1",
|
||||
statusPositiveText: "#1f6f43",
|
||||
statusNegativeBorder: "#f3c1c1",
|
||||
statusNegativeBg: "#fff2f2",
|
||||
statusNegativeText: "#9a2929",
|
||||
dangerText: "#b91c1c",
|
||||
successText: "#166534",
|
||||
alertErrorBorder: "rgba(185, 28, 28, 0.28)",
|
||||
alertErrorBg: "#fff5f5",
|
||||
alertErrorText: "#991b1b",
|
||||
alertInfoBorder: "rgba(0,0,0,0.06)",
|
||||
modalBackdrop: "rgba(0,0,0,0.45)",
|
||||
} as const;
|
||||
3
frontend/src/styles/stats/types.ts
Normal file
@@ -0,0 +1,3 @@
|
||||
import type { CSSProperties } from "react";
|
||||
|
||||
export type StyleMap = Record<string, CSSProperties>;
|
||||
@@ -1,136 +1,22 @@
|
||||
import type { CSSProperties } from "react";
|
||||
import { appLayoutStyles } from "./stats/appLayout";
|
||||
import { authStyles } from "./stats/auth";
|
||||
import { cardStyles } from "./stats/cards";
|
||||
import { datasetStyles } from "./stats/datasets";
|
||||
import { emotionalStyles } from "./stats/emotional";
|
||||
import { feedbackStyles } from "./stats/feedback";
|
||||
import { foundationStyles } from "./stats/foundations";
|
||||
import { modalStyles } from "./stats/modal";
|
||||
|
||||
const StatsStyling: Record<string, CSSProperties> = {
|
||||
page: {
|
||||
width: "100%",
|
||||
minHeight: "100vh",
|
||||
padding: 24,
|
||||
background: "#f6f7fb",
|
||||
fontFamily:
|
||||
'-apple-system, BlinkMacSystemFont, "Segoe UI", Roboto, Inter, Arial, sans-serif',
|
||||
color: "#111827",
|
||||
overflowX: "hidden",
|
||||
boxSizing: "border-box"
|
||||
},
|
||||
|
||||
|
||||
container: {
|
||||
maxWidth: 1400,
|
||||
margin: "0 auto",
|
||||
},
|
||||
|
||||
card: {
|
||||
background: "white",
|
||||
borderRadius: 16,
|
||||
padding: 16,
|
||||
border: "1px solid rgba(0,0,0,0.06)",
|
||||
boxShadow: "0 6px 20px rgba(0,0,0,0.06)",
|
||||
},
|
||||
|
||||
headerBar: {
|
||||
display: "flex",
|
||||
flexWrap: "wrap",
|
||||
alignItems: "center",
|
||||
justifyContent: "space-between",
|
||||
gap: 12,
|
||||
},
|
||||
|
||||
controls: {
|
||||
display: "flex",
|
||||
gap: 10,
|
||||
alignItems: "center",
|
||||
},
|
||||
|
||||
input: {
|
||||
width: 320,
|
||||
maxWidth: "70vw",
|
||||
padding: "10px 12px",
|
||||
borderRadius: 12,
|
||||
border: "1px solid rgba(0,0,0,0.12)",
|
||||
outline: "none",
|
||||
fontSize: 14,
|
||||
background: "#fff",
|
||||
color: "black"
|
||||
},
|
||||
|
||||
buttonPrimary: {
|
||||
padding: "10px 14px",
|
||||
borderRadius: 12,
|
||||
border: "1px solid rgba(0,0,0,0.08)",
|
||||
background: "#2563eb",
|
||||
color: "white",
|
||||
fontWeight: 600,
|
||||
cursor: "pointer",
|
||||
boxShadow: "0 6px 16px rgba(37,99,235,0.25)",
|
||||
},
|
||||
|
||||
buttonSecondary: {
|
||||
padding: "10px 14px",
|
||||
borderRadius: 12,
|
||||
border: "1px solid rgba(0,0,0,0.12)",
|
||||
background: "#fff",
|
||||
color: "#111827",
|
||||
fontWeight: 600,
|
||||
cursor: "pointer",
|
||||
},
|
||||
|
||||
grid: {
|
||||
marginTop: 18,
|
||||
display: "grid",
|
||||
gridTemplateColumns: "repeat(12, 1fr)",
|
||||
gap: 16,
|
||||
},
|
||||
|
||||
sectionTitle: {
|
||||
margin: 0,
|
||||
fontSize: 16,
|
||||
fontWeight: 700,
|
||||
},
|
||||
|
||||
sectionSubtitle: {
|
||||
margin: "6px 0 14px",
|
||||
fontSize: 13,
|
||||
color: "#6b7280",
|
||||
},
|
||||
|
||||
chartWrapper: {
|
||||
width: "100%",
|
||||
height: 350,
|
||||
},
|
||||
|
||||
heatmapWrapper: {
|
||||
width: "100%",
|
||||
height: 320,
|
||||
},
|
||||
|
||||
topUsersList: {
|
||||
display: "flex",
|
||||
flexDirection: "column",
|
||||
gap: 10,
|
||||
},
|
||||
|
||||
topUserItem: {
|
||||
padding: "10px 12px",
|
||||
borderRadius: 12,
|
||||
background: "#f9fafb",
|
||||
border: "1px solid rgba(0,0,0,0.06)",
|
||||
},
|
||||
|
||||
topUserName: {
|
||||
fontWeight: 700,
|
||||
fontSize: 14,
|
||||
color: "black"
|
||||
},
|
||||
|
||||
topUserMeta: {
|
||||
fontSize: 13,
|
||||
color: "#6b7280",
|
||||
},
|
||||
|
||||
scrollArea: {
|
||||
maxHeight: 450,
|
||||
overflowY: "auto",
|
||||
},
|
||||
...foundationStyles,
|
||||
...appLayoutStyles,
|
||||
...authStyles,
|
||||
...datasetStyles,
|
||||
...feedbackStyles,
|
||||
...cardStyles,
|
||||
...emotionalStyles,
|
||||
...modalStyles,
|
||||
};
|
||||
|
||||
export default StatsStyling;
|
||||
export default StatsStyling;
|
||||
|
||||
@@ -1,20 +1,28 @@
|
||||
// User Responses
|
||||
type TopUser = {
|
||||
author: string;
|
||||
source: string;
|
||||
count: number
|
||||
// Shared types
|
||||
type FrequencyWord = {
|
||||
word: string;
|
||||
count: number;
|
||||
};
|
||||
|
||||
type FrequencyWord = {
|
||||
word: string;
|
||||
count: number;
|
||||
}
|
||||
type NGram = {
|
||||
count: number;
|
||||
ngram: string;
|
||||
};
|
||||
|
||||
type AverageEmotionByTopic = {
|
||||
topic: string;
|
||||
n: number;
|
||||
[emotion: string]: string | number;
|
||||
}
|
||||
type Emotion = {
|
||||
emotion_anger: number;
|
||||
emotion_disgust: number;
|
||||
emotion_fear: number;
|
||||
emotion_joy: number;
|
||||
emotion_sadness: number;
|
||||
};
|
||||
|
||||
// User
|
||||
type TopUser = {
|
||||
author: string;
|
||||
source: string;
|
||||
count: number;
|
||||
};
|
||||
|
||||
type Vocab = {
|
||||
author: string;
|
||||
@@ -26,46 +34,160 @@ type Vocab = {
|
||||
top_words: FrequencyWord[];
|
||||
};
|
||||
|
||||
type DominantTopic = {
|
||||
topic: string;
|
||||
count: number;
|
||||
};
|
||||
|
||||
type User = {
|
||||
author: string;
|
||||
post: number;
|
||||
comment: number;
|
||||
comment_post_ratio: number;
|
||||
comment_share: number;
|
||||
avg_emotions?: Record<string, number>;
|
||||
dominant_topic?: DominantTopic | null;
|
||||
vocab?: Vocab | null;
|
||||
};
|
||||
|
||||
type InteractionGraph = Record<string, Record<string, number>>;
|
||||
|
||||
type UserEndpointResponse = {
|
||||
top_users: TopUser[];
|
||||
users: User[];
|
||||
};
|
||||
|
||||
type UserAnalysisResponse = {
|
||||
top_users: TopUser[];
|
||||
users: User[];
|
||||
interaction_graph: InteractionGraph;
|
||||
};
|
||||
|
||||
// Time Analysis
|
||||
// Time
|
||||
type EventsPerDay = {
|
||||
date: Date;
|
||||
count: number;
|
||||
}
|
||||
date: Date;
|
||||
count: number;
|
||||
};
|
||||
|
||||
type HeatmapCell = {
|
||||
date: Date;
|
||||
hour: number;
|
||||
count: number;
|
||||
}
|
||||
date: Date;
|
||||
hour: number;
|
||||
count: number;
|
||||
};
|
||||
|
||||
type TimeAnalysisResponse = {
|
||||
events_per_day: EventsPerDay[];
|
||||
weekday_hour_heatmap: HeatmapCell[];
|
||||
burstiness: number;
|
||||
}
|
||||
events_per_day: EventsPerDay[];
|
||||
weekday_hour_heatmap: HeatmapCell[];
|
||||
};
|
||||
|
||||
// Content (combines emotional and linguistic)
|
||||
type AverageEmotionByTopic = Emotion & {
|
||||
n: number;
|
||||
topic: string;
|
||||
[key: string]: string | number;
|
||||
};
|
||||
|
||||
type OverallEmotionAverage = {
|
||||
emotion: string;
|
||||
score: number;
|
||||
};
|
||||
|
||||
type DominantEmotionDistribution = {
|
||||
emotion: string;
|
||||
count: number;
|
||||
ratio: number;
|
||||
};
|
||||
|
||||
type EmotionBySource = {
|
||||
source: string;
|
||||
dominant_emotion: string;
|
||||
dominant_score: number;
|
||||
event_count: number;
|
||||
};
|
||||
|
||||
// Content Analysis
|
||||
type ContentAnalysisResponse = {
|
||||
word_frequencies: FrequencyWord[];
|
||||
average_emotion_by_topic: AverageEmotionByTopic[];
|
||||
}
|
||||
word_frequencies: FrequencyWord[];
|
||||
average_emotion_by_topic: AverageEmotionByTopic[];
|
||||
common_three_phrases: NGram[];
|
||||
common_two_phrases: NGram[];
|
||||
overall_emotion_average?: OverallEmotionAverage[];
|
||||
dominant_emotion_distribution?: DominantEmotionDistribution[];
|
||||
emotion_by_source?: EmotionBySource[];
|
||||
};
|
||||
|
||||
// Linguistic
|
||||
type LinguisticAnalysisResponse = {
|
||||
word_frequencies: FrequencyWord[];
|
||||
common_two_phrases: NGram[];
|
||||
common_three_phrases: NGram[];
|
||||
lexical_diversity?: Record<string, number>;
|
||||
};
|
||||
|
||||
// Emotional
|
||||
type EmotionalAnalysisResponse = {
|
||||
average_emotion_by_topic: AverageEmotionByTopic[];
|
||||
overall_emotion_average?: OverallEmotionAverage[];
|
||||
dominant_emotion_distribution?: DominantEmotionDistribution[];
|
||||
emotion_by_source?: EmotionBySource[];
|
||||
};
|
||||
|
||||
// Interactional
|
||||
type ConversationConcentration = {
|
||||
total_commenting_authors: number;
|
||||
top_10pct_author_count: number;
|
||||
top_10pct_comment_share: number;
|
||||
single_comment_authors: number;
|
||||
single_comment_author_ratio: number;
|
||||
};
|
||||
|
||||
type InteractionAnalysisResponse = {
|
||||
top_interaction_pairs?: [[string, string], number][];
|
||||
conversation_concentration?: ConversationConcentration;
|
||||
interaction_graph: InteractionGraph;
|
||||
};
|
||||
|
||||
// Cultural
|
||||
type IdentityMarkers = {
|
||||
in_group_usage: number;
|
||||
out_group_usage: number;
|
||||
in_group_ratio: number;
|
||||
out_group_ratio: number;
|
||||
in_group_posts: number;
|
||||
out_group_posts: number;
|
||||
tie_posts: number;
|
||||
in_group_emotion_avg?: Record<string, number>;
|
||||
out_group_emotion_avg?: Record<string, number>;
|
||||
};
|
||||
|
||||
type StanceMarkers = {
|
||||
hedge_total: number;
|
||||
certainty_total: number;
|
||||
deontic_total: number;
|
||||
permission_total: number;
|
||||
hedge_per_1k_tokens: number;
|
||||
certainty_per_1k_tokens: number;
|
||||
deontic_per_1k_tokens: number;
|
||||
permission_per_1k_tokens: number;
|
||||
hedge_emotion_avg?: Record<string, number>;
|
||||
certainty_emotion_avg?: Record<string, number>;
|
||||
deontic_emotion_avg?: Record<string, number>;
|
||||
permission_emotion_avg?: Record<string, number>;
|
||||
};
|
||||
|
||||
type EntityEmotionAggregate = {
|
||||
post_count: number;
|
||||
emotion_avg: Record<string, number>;
|
||||
};
|
||||
|
||||
type AverageEmotionPerEntity = {
|
||||
entity_emotion_avg: Record<string, EntityEmotionAggregate>;
|
||||
};
|
||||
|
||||
type CulturalAnalysisResponse = {
|
||||
identity_markers?: IdentityMarkers;
|
||||
stance_markers?: StanceMarkers;
|
||||
avg_emotion_per_entity?: AverageEmotionPerEntity;
|
||||
};
|
||||
|
||||
// Summary
|
||||
type SummaryResponse = {
|
||||
@@ -82,22 +204,36 @@ type SummaryResponse = {
|
||||
sources: string[];
|
||||
};
|
||||
|
||||
// Filtering Response
|
||||
// Filter
|
||||
type FilterResponse = {
|
||||
rows: number
|
||||
data: any;
|
||||
}
|
||||
rows: number;
|
||||
data: any;
|
||||
};
|
||||
|
||||
export type {
|
||||
TopUser,
|
||||
Vocab,
|
||||
User,
|
||||
InteractionGraph,
|
||||
UserAnalysisResponse,
|
||||
FrequencyWord,
|
||||
AverageEmotionByTopic,
|
||||
SummaryResponse,
|
||||
TimeAnalysisResponse,
|
||||
ContentAnalysisResponse,
|
||||
FilterResponse
|
||||
}
|
||||
TopUser,
|
||||
DominantTopic,
|
||||
Vocab,
|
||||
User,
|
||||
InteractionGraph,
|
||||
ConversationConcentration,
|
||||
UserAnalysisResponse,
|
||||
UserEndpointResponse,
|
||||
FrequencyWord,
|
||||
AverageEmotionByTopic,
|
||||
OverallEmotionAverage,
|
||||
DominantEmotionDistribution,
|
||||
EmotionBySource,
|
||||
SummaryResponse,
|
||||
TimeAnalysisResponse,
|
||||
ContentAnalysisResponse,
|
||||
LinguisticAnalysisResponse,
|
||||
EmotionalAnalysisResponse,
|
||||
InteractionAnalysisResponse,
|
||||
IdentityMarkers,
|
||||
StanceMarkers,
|
||||
EntityEmotionAggregate,
|
||||
AverageEmotionPerEntity,
|
||||
CulturalAnalysisResponse,
|
||||
FilterResponse,
|
||||
};
|
||||
|
||||
371
frontend/src/utils/corpusExplorer.ts
Normal file
@@ -0,0 +1,371 @@
|
||||
type EntityRecord = {
|
||||
text?: string;
|
||||
[key: string]: unknown;
|
||||
};
|
||||
|
||||
type DatasetRecord = {
|
||||
id?: string | number;
|
||||
post_id?: string | number | null;
|
||||
parent_id?: string | number | null;
|
||||
author?: string | null;
|
||||
title?: string | null;
|
||||
content?: string | null;
|
||||
timestamp?: string | number | null;
|
||||
date?: string | null;
|
||||
dt?: string | null;
|
||||
hour?: number | null;
|
||||
weekday?: string | null;
|
||||
reply_to?: string | number | null;
|
||||
source?: string | null;
|
||||
topic?: string | null;
|
||||
topic_confidence?: number | null;
|
||||
type?: string | null;
|
||||
ner_entities?: EntityRecord[] | null;
|
||||
emotion_anger?: number | null;
|
||||
emotion_disgust?: number | null;
|
||||
emotion_fear?: number | null;
|
||||
emotion_joy?: number | null;
|
||||
emotion_sadness?: number | null;
|
||||
[key: string]: unknown;
|
||||
};
|
||||
|
||||
type CorpusExplorerContext = {
|
||||
authorByPostId: Map<string, string>;
|
||||
authorEventCounts: Map<string, number>;
|
||||
authorCommentCounts: Map<string, number>;
|
||||
};
|
||||
|
||||
type CorpusExplorerSpec = {
|
||||
title: string;
|
||||
description: string;
|
||||
emptyMessage?: string;
|
||||
matcher: (record: DatasetRecord, context: CorpusExplorerContext) => boolean;
|
||||
};
|
||||
|
||||
const IN_GROUP_PATTERN = /\b(we|us|our|ourselves)\b/gi;
|
||||
const OUT_GROUP_PATTERN = /\b(they|them|their|themselves)\b/gi;
|
||||
const HEDGE_PATTERN = /\b(maybe|perhaps|possibly|probably|likely|seems|seem|i think|i feel|i guess|kind of|sort of|somewhat)\b/i;
|
||||
const CERTAINTY_PATTERN = /\b(definitely|certainly|clearly|obviously|undeniably|always|never)\b/i;
|
||||
const DEONTIC_PATTERN = /\b(must|should|need|needs|have to|has to|ought|required|require)\b/i;
|
||||
const PERMISSION_PATTERN = /\b(can|allowed|okay|ok|permitted)\b/i;
|
||||
const EMOTION_KEYS = [
|
||||
"emotion_anger",
|
||||
"emotion_disgust",
|
||||
"emotion_fear",
|
||||
"emotion_joy",
|
||||
"emotion_sadness",
|
||||
] as const;
|
||||
|
||||
const toText = (value: unknown) => {
|
||||
if (typeof value === "string") {
|
||||
return value;
|
||||
}
|
||||
|
||||
if (typeof value === "number" || typeof value === "boolean") {
|
||||
return String(value);
|
||||
}
|
||||
|
||||
if (value && typeof value === "object" && "id" in value) {
|
||||
const id = (value as { id?: unknown }).id;
|
||||
if (typeof id === "string" || typeof id === "number") {
|
||||
return String(id);
|
||||
}
|
||||
}
|
||||
|
||||
return "";
|
||||
};
|
||||
|
||||
const normalize = (value: unknown) => toText(value).trim().toLowerCase();
|
||||
const getAuthor = (record: DatasetRecord) => toText(record.author).trim();
|
||||
|
||||
const getRecordText = (record: DatasetRecord) =>
|
||||
`${record.title ?? ""} ${record.content ?? ""}`.trim();
|
||||
|
||||
const escapeRegExp = (value: string) =>
|
||||
value.replace(/[.*+?^${}()|[\]\\]/g, "\\$&");
|
||||
|
||||
const buildPhrasePattern = (phrase: string) => {
|
||||
const tokens = phrase
|
||||
.toLowerCase()
|
||||
.trim()
|
||||
.split(/\s+/)
|
||||
.filter(Boolean)
|
||||
.map(escapeRegExp);
|
||||
|
||||
if (!tokens.length) {
|
||||
return null;
|
||||
}
|
||||
|
||||
return new RegExp(`\\b${tokens.join("\\s+")}\\b`, "i");
|
||||
};
|
||||
|
||||
const countMatches = (pattern: RegExp, text: string) =>
|
||||
Array.from(text.matchAll(new RegExp(pattern.source, "gi"))).length;
|
||||
|
||||
const getDateBucket = (record: DatasetRecord) => {
|
||||
if (typeof record.date === "string" && record.date) {
|
||||
return record.date.slice(0, 10);
|
||||
}
|
||||
|
||||
if (typeof record.dt === "string" && record.dt) {
|
||||
return record.dt.slice(0, 10);
|
||||
}
|
||||
|
||||
if (typeof record.timestamp === "number") {
|
||||
return new Date(record.timestamp * 1000).toISOString().slice(0, 10);
|
||||
}
|
||||
|
||||
if (typeof record.timestamp === "string" && record.timestamp) {
|
||||
const numeric = Number(record.timestamp);
|
||||
if (Number.isFinite(numeric)) {
|
||||
return new Date(numeric * 1000).toISOString().slice(0, 10);
|
||||
}
|
||||
}
|
||||
|
||||
return "";
|
||||
};
|
||||
|
||||
const getDominantEmotion = (record: DatasetRecord) => {
|
||||
let bestKey = "";
|
||||
let bestValue = Number.NEGATIVE_INFINITY;
|
||||
|
||||
for (const key of EMOTION_KEYS) {
|
||||
const value = Number(record[key] ?? Number.NEGATIVE_INFINITY);
|
||||
if (value > bestValue) {
|
||||
bestValue = value;
|
||||
bestKey = key;
|
||||
}
|
||||
}
|
||||
|
||||
return bestKey.replace("emotion_", "");
|
||||
};
|
||||
|
||||
const matchesPhrase = (record: DatasetRecord, phrase: string) => {
|
||||
const pattern = buildPhrasePattern(phrase);
|
||||
if (!pattern) {
|
||||
return false;
|
||||
}
|
||||
|
||||
return pattern.test(getRecordText(record));
|
||||
};
|
||||
|
||||
const recordIdentityBucket = (record: DatasetRecord) => {
|
||||
const text = getRecordText(record);
|
||||
const inHits = countMatches(IN_GROUP_PATTERN, text);
|
||||
const outHits = countMatches(OUT_GROUP_PATTERN, text);
|
||||
|
||||
if (inHits > outHits) {
|
||||
return "in";
|
||||
}
|
||||
|
||||
if (outHits > inHits) {
|
||||
return "out";
|
||||
}
|
||||
|
||||
return "tie";
|
||||
};
|
||||
|
||||
const buildExplorerContext = (records: DatasetRecord[]): CorpusExplorerContext => {
|
||||
const authorByPostId = new Map<string, string>();
|
||||
const authorEventCounts = new Map<string, number>();
|
||||
const authorCommentCounts = new Map<string, number>();
|
||||
|
||||
for (const record of records) {
|
||||
const author = getAuthor(record);
|
||||
if (!author) {
|
||||
continue;
|
||||
}
|
||||
|
||||
authorEventCounts.set(author, (authorEventCounts.get(author) ?? 0) + 1);
|
||||
|
||||
if (record.type === "comment") {
|
||||
authorCommentCounts.set(author, (authorCommentCounts.get(author) ?? 0) + 1);
|
||||
}
|
||||
|
||||
if (record.post_id !== null && record.post_id !== undefined) {
|
||||
authorByPostId.set(String(record.post_id), author);
|
||||
}
|
||||
}
|
||||
|
||||
return { authorByPostId, authorEventCounts, authorCommentCounts };
|
||||
};
|
||||
|
||||
const buildAllRecordsSpec = (): CorpusExplorerSpec => ({
|
||||
title: "Corpus Explorer",
|
||||
description: "All records in the current filtered dataset.",
|
||||
emptyMessage: "No records match the current filters.",
|
||||
matcher: () => true,
|
||||
});
|
||||
|
||||
const buildUserSpec = (author: string): CorpusExplorerSpec => {
|
||||
const target = normalize(author);
|
||||
|
||||
return {
|
||||
title: `User: ${author}`,
|
||||
description: `All records authored by ${author}.`,
|
||||
emptyMessage: `No records found for ${author}.`,
|
||||
matcher: (record) => normalize(record.author) === target,
|
||||
};
|
||||
};
|
||||
|
||||
const buildTopicSpec = (topic: string): CorpusExplorerSpec => {
|
||||
const target = normalize(topic);
|
||||
|
||||
return {
|
||||
title: `Topic: ${topic}`,
|
||||
description: `Records assigned to the ${topic} topic bucket.`,
|
||||
emptyMessage: `No records found in the ${topic} topic bucket.`,
|
||||
matcher: (record) => normalize(record.topic) === target,
|
||||
};
|
||||
};
|
||||
|
||||
const buildDateBucketSpec = (date: string): CorpusExplorerSpec => ({
|
||||
title: `Date Bucket: ${date}`,
|
||||
description: `Records from the ${date} activity bucket.`,
|
||||
emptyMessage: `No records found on ${date}.`,
|
||||
matcher: (record) => getDateBucket(record) === date,
|
||||
});
|
||||
|
||||
const buildWordSpec = (word: string): CorpusExplorerSpec => ({
|
||||
title: `Word: ${word}`,
|
||||
description: `Records containing the word ${word}.`,
|
||||
emptyMessage: `No records mention ${word}.`,
|
||||
matcher: (record) => matchesPhrase(record, word),
|
||||
});
|
||||
|
||||
const buildNgramSpec = (ngram: string): CorpusExplorerSpec => ({
|
||||
title: `N-gram: ${ngram}`,
|
||||
description: `Records containing the phrase ${ngram}.`,
|
||||
emptyMessage: `No records contain the phrase ${ngram}.`,
|
||||
matcher: (record) => matchesPhrase(record, ngram),
|
||||
});
|
||||
|
||||
const buildEntitySpec = (entity: string): CorpusExplorerSpec => {
|
||||
const target = normalize(entity);
|
||||
|
||||
return {
|
||||
title: `Entity: ${entity}`,
|
||||
description: `Records mentioning the ${entity} entity.`,
|
||||
emptyMessage: `No records found for the ${entity} entity.`,
|
||||
matcher: (record) => {
|
||||
const entities = Array.isArray(record.ner_entities) ? record.ner_entities : [];
|
||||
return entities.some((item) => normalize(item?.text) === target) || matchesPhrase(record, entity);
|
||||
},
|
||||
};
|
||||
};
|
||||
|
||||
const buildSourceSpec = (source: string): CorpusExplorerSpec => {
|
||||
const target = normalize(source);
|
||||
|
||||
return {
|
||||
title: `Source: ${source}`,
|
||||
description: `Records from the ${source} source.`,
|
||||
emptyMessage: `No records found for ${source}.`,
|
||||
matcher: (record) => normalize(record.source) === target,
|
||||
};
|
||||
};
|
||||
|
||||
const buildDominantEmotionSpec = (emotion: string): CorpusExplorerSpec => {
|
||||
const target = normalize(emotion);
|
||||
|
||||
return {
|
||||
title: `Dominant Emotion: ${emotion}`,
|
||||
description: `Records where ${emotion} is the strongest emotion score.`,
|
||||
emptyMessage: `No records found with dominant emotion ${emotion}.`,
|
||||
matcher: (record) => getDominantEmotion(record) === target,
|
||||
};
|
||||
};
|
||||
|
||||
const buildReplyPairSpec = (source: string, target: string): CorpusExplorerSpec => {
|
||||
const sourceName = normalize(source);
|
||||
const targetName = normalize(target);
|
||||
|
||||
return {
|
||||
title: `Reply Path: ${source} -> ${target}`,
|
||||
description: `Reply records authored by ${source} in response to ${target}.`,
|
||||
emptyMessage: `No reply records found for ${source} -> ${target}.`,
|
||||
matcher: (record, context) => {
|
||||
if (normalize(record.author) !== sourceName) {
|
||||
return false;
|
||||
}
|
||||
|
||||
const replyTo = record.reply_to;
|
||||
if (replyTo === null || replyTo === undefined || replyTo === "") {
|
||||
return false;
|
||||
}
|
||||
|
||||
return normalize(context.authorByPostId.get(String(replyTo))) === targetName;
|
||||
},
|
||||
};
|
||||
};
|
||||
|
||||
const buildOneTimeUsersSpec = (): CorpusExplorerSpec => ({
|
||||
title: "One-Time Users",
|
||||
description: "Records written by authors who appear exactly once in the filtered corpus.",
|
||||
emptyMessage: "No one-time-user records found.",
|
||||
matcher: (record, context) => {
|
||||
const author = getAuthor(record);
|
||||
return !!author && context.authorEventCounts.get(author) === 1;
|
||||
},
|
||||
});
|
||||
|
||||
const buildIdentityBucketSpec = (bucket: "in" | "out" | "tie"): CorpusExplorerSpec => {
|
||||
const labels = {
|
||||
in: "In-Group Posts",
|
||||
out: "Out-Group Posts",
|
||||
tie: "Balanced Posts",
|
||||
} as const;
|
||||
|
||||
return {
|
||||
title: labels[bucket],
|
||||
description: `Records in the ${labels[bucket].toLowerCase()} cultural bucket.`,
|
||||
emptyMessage: `No records found for ${labels[bucket].toLowerCase()}.`,
|
||||
matcher: (record) => recordIdentityBucket(record) === bucket,
|
||||
};
|
||||
};
|
||||
|
||||
const buildPatternSpec = (
|
||||
title: string,
|
||||
description: string,
|
||||
pattern: RegExp,
|
||||
): CorpusExplorerSpec => ({
|
||||
title,
|
||||
description,
|
||||
emptyMessage: `No records found for ${title.toLowerCase()}.`,
|
||||
matcher: (record) => pattern.test(getRecordText(record)),
|
||||
});
|
||||
|
||||
const buildHedgeSpec = () =>
|
||||
buildPatternSpec("Hedging Words", "Records containing hedging language.", HEDGE_PATTERN);
|
||||
|
||||
const buildCertaintySpec = () =>
|
||||
buildPatternSpec("Certainty Words", "Records containing certainty language.", CERTAINTY_PATTERN);
|
||||
|
||||
const buildDeonticSpec = () =>
|
||||
buildPatternSpec("Need/Should Words", "Records containing deontic language.", DEONTIC_PATTERN);
|
||||
|
||||
const buildPermissionSpec = () =>
|
||||
buildPatternSpec("Permission Words", "Records containing permission language.", PERMISSION_PATTERN);
|
||||
|
||||
export type { DatasetRecord, CorpusExplorerSpec };
|
||||
export {
|
||||
buildAllRecordsSpec,
|
||||
buildCertaintySpec,
|
||||
buildDateBucketSpec,
|
||||
buildDeonticSpec,
|
||||
buildDominantEmotionSpec,
|
||||
buildEntitySpec,
|
||||
buildExplorerContext,
|
||||
buildHedgeSpec,
|
||||
buildIdentityBucketSpec,
|
||||
buildNgramSpec,
|
||||
buildOneTimeUsersSpec,
|
||||
buildPermissionSpec,
|
||||
buildReplyPairSpec,
|
||||
buildSourceSpec,
|
||||
buildTopicSpec,
|
||||
buildUserSpec,
|
||||
buildWordSpec,
|
||||
getDateBucket,
|
||||
toText,
|
||||
};
|
||||
20
frontend/src/utils/documentTitle.ts
Normal file
@@ -0,0 +1,20 @@
|
||||
const DEFAULT_TITLE = "Ethnograph View";
|
||||
|
||||
const STATIC_TITLES: Record<string, string> = {
|
||||
"/login": "Sign In",
|
||||
"/upload": "Upload Dataset",
|
||||
"/auto-fetch": "Auto Fetch Dataset",
|
||||
"/datasets": "My Datasets",
|
||||
};
|
||||
|
||||
export const getDocumentTitle = (pathname: string) => {
|
||||
if (pathname.includes("status")) {
|
||||
return "Processing Dataset";
|
||||
}
|
||||
|
||||
if (pathname.includes("stats")) {
|
||||
return "Ethnography Analysis";
|
||||
}
|
||||
|
||||
return STATIC_TITLES[pathname] ?? DEFAULT_TITLE;
|
||||
};
|
||||
4
main.py
@@ -1,4 +0,0 @@
|
||||
import server.app
|
||||
|
||||
if __name__ == "__main__":
|
||||
server.app.app.run(debug=True)
|
||||
BIN
report/img/analysis_bar.png
Normal file
|
After Width: | Height: | Size: 26 KiB |
BIN
report/img/architecture.png
Normal file
|
After Width: | Height: | Size: 70 KiB |
BIN
report/img/cork_temporal.png
Normal file
|
After Width: | Height: | Size: 274 KiB |
BIN
report/img/flooding_posts.png
Normal file
|
After Width: | Height: | Size: 90 KiB |
BIN
report/img/frontend.png
Normal file
|
After Width: | Height: | Size: 302 KiB |
BIN
report/img/gantt.png
Normal file
|
After Width: | Height: | Size: 50 KiB |
BIN
report/img/heatmap.png
Normal file
|
After Width: | Height: | Size: 86 KiB |
BIN
report/img/interaction_graph.png
Normal file
|
After Width: | Height: | Size: 114 KiB |
BIN
report/img/kpi_card.png
Normal file
|
After Width: | Height: | Size: 8.7 KiB |
BIN
report/img/moods.png
Normal file
|
After Width: | Height: | Size: 16 KiB |
BIN
report/img/navbar.png
Normal file
|
After Width: | Height: | Size: 14 KiB |
BIN
report/img/ngrams.png
Normal file
|
After Width: | Height: | Size: 38 KiB |
BIN
report/img/nlp_backoff.png
Normal file
|
After Width: | Height: | Size: 143 KiB |
BIN
report/img/pipeline.png
Normal file
|
After Width: | Height: | Size: 26 KiB |
BIN
report/img/reddit_bot.png
Normal file
|
After Width: | Height: | Size: 232 KiB |
BIN
report/img/schema.png
Normal file
|
After Width: | Height: | Size: 64 KiB |
BIN
report/img/signature.jpg
Normal file
|
After Width: | Height: | Size: 152 KiB |
BIN
report/img/stance_markers.png
Normal file
|
After Width: | Height: | Size: 111 KiB |
BIN
report/img/topic_emotions.png
Normal file
|
After Width: | Height: | Size: 17 KiB |
BIN
report/img/ucc_crest.png
Normal file
|
After Width: | Height: | Size: 27 KiB |
1401
report/main.tex
Normal file
149
report/references.bib
Normal file
@@ -0,0 +1,149 @@
|
||||
@online{reddit_api,
|
||||
author = {{Reddit Inc.}},
|
||||
title = {Reddit API Documentation},
|
||||
year = {2025},
|
||||
url = {https://www.reddit.com/dev/api/},
|
||||
urldate = {2026-04-08}
|
||||
}
|
||||
|
||||
@misc{hartmann2022emotionenglish,
|
||||
author={Hartmann, Jochen},
|
||||
title={Emotion English DistilRoBERTa-base},
|
||||
year={2022},
|
||||
howpublished = {\url{https://huggingface.co/j-hartmann/emotion-english-distilroberta-base/}},
|
||||
}
|
||||
|
||||
@misc{all_mpnet_base_v2,
|
||||
author={Microsoft Research},
|
||||
title={All-MPNet-Base-V2},
|
||||
year={2021},
|
||||
howpublished = {\url{https://huggingface.co/sentence-transformers/all-mpnet-base-v2}},
|
||||
}
|
||||
|
||||
@misc{minilm_l6_v2,
|
||||
author={Microsoft Research},
|
||||
title={MiniLM-L6-V2},
|
||||
year={2021},
|
||||
howpublished = {\url{https://huggingface.co/sentence-transformers/all-MiniLM-L6-v2}},
|
||||
}
|
||||
|
||||
@misc{dslim_bert_base_ner,
|
||||
author={deepset},
|
||||
title={dslim/bert-base-NER},
|
||||
year={2018},
|
||||
howpublished = {\url{https://huggingface.co/dslim/bert-base-NER}},
|
||||
}
|
||||
|
||||
@inproceedings{demszky2020goemotions,
|
||||
author = {Demszky, Dorottya and Movshovitz-Attias, Dana and Ko, Jeongwoo and Cowen, Alan and Nemade, Gaurav and Ravi, Sujith},
|
||||
booktitle = {58th Annual Meeting of the Association for Computational Linguistics (ACL)},
|
||||
title = {{GoEmotions: A Dataset of Fine-Grained Emotions}},
|
||||
year = {2020}
|
||||
}
|
||||
|
||||
@article{dominguez2007virtual,
|
||||
author = {Domínguez, Daniel and Beaulieu, Anne and Estalella, Adolfo and Gómez, Edgar and Schnettler, Bernt and Read, Rosie},
|
||||
title = {Virtual Ethnography},
|
||||
journal = {Forum Qualitative Sozialforschung / Forum: Qualitative Social Research},
|
||||
year = {2007},
|
||||
volume = {8},
|
||||
number = {3},
|
||||
url = {http://nbn-resolving.de/urn:nbn:de:0114-fqs0703E19}
|
||||
}
|
||||
|
||||
@article{sun2014lurkers,
|
||||
author = {Sun, Na and Rau, Pei-Luen Patrick and Ma, Liang},
|
||||
title = {Understanding Lurkers in Online Communities: A Literature Review},
|
||||
journal = {Computers in Human Behavior},
|
||||
year = {2014},
|
||||
volume = {38},
|
||||
pages = {110--117},
|
||||
doi = {10.1016/j.chb.2014.05.022}
|
||||
}
|
||||
|
||||
@article{ahmad2024sentiment,
|
||||
author = {Ahmad, Waqar and others},
|
||||
title = {Recent Advancements and Challenges of NLP-based Sentiment Analysis: A State-of-the-art Review},
|
||||
journal = {Natural Language Processing Journal},
|
||||
year = {2024},
|
||||
doi = {10.1016/j.nlp.2024.100059}
|
||||
}
|
||||
|
||||
@article{coleman2010ethnographic,
|
||||
ISSN = {00846570},
|
||||
URL = {http://www.jstor.org/stable/25735124},
|
||||
abstract = {This review surveys and divides the ethnographic corpus on digital media into three broad but overlapping categories: the cultural politics of digital media, the vernacular cultures of digital media, and the prosaics of digital media. Engaging these three categories of scholarship on digital media, I consider how ethnographers are exploring the complex relationships between the local practices and global implications of digital media, their materiality and politics, and thier banal, as well as profound, presence in cultural life and modes of communication. I consider the way these media have become central to the articulation of cherished beliefs, ritual practices, and modes of being in the world; the fact that digital media culturally matters is undeniable but showing how, where, and why it matters is necessary to push against peculiarly narrow presumptions about the universality of digital experience.},
|
||||
author = {E. Gabriella Coleman},
|
||||
journal = {Annual Review of Anthropology},
|
||||
pages = {487--505},
|
||||
publisher = {Annual Reviews},
|
||||
title = {Ethnographic Approaches to Digital Media},
|
||||
urldate = {2026-04-15},
|
||||
volume = {39},
|
||||
year = {2010}
|
||||
}
|
||||
|
||||
@article{shen2021stance,
|
||||
author = {Shen, Qian and Tao, Yating},
|
||||
title = {Stance Markers in {English} Medical Research Articles and Newspaper Opinion Columns: A Comparative Corpus-Based Study},
|
||||
journal = {PLOS ONE},
|
||||
volume = {16},
|
||||
number = {3},
|
||||
pages = {e0247981},
|
||||
year = {2021},
|
||||
doi = {10.1371/journal.pone.0247981}
|
||||
}
|
||||
|
||||
@incollection{medvedev2019anatomy,
|
||||
author = {Medvedev, Alexey N. and Lambiotte, Renaud and Delvenne, Jean-Charles},
|
||||
title = {The Anatomy of Reddit: An Overview of Academic Research},
|
||||
booktitle = {Dynamics On and Of Complex Networks III},
|
||||
series = {Springer Proceedings in Complexity},
|
||||
publisher = {Springer},
|
||||
year = {2019},
|
||||
pages = {183--204}
|
||||
}
|
||||
|
||||
@misc{cook2023ethnography,
|
||||
author = {Cook, Chloe},
|
||||
title = {What is the Difference Between Ethnography and Digital Ethnography?},
|
||||
year = {2023},
|
||||
month = jan,
|
||||
day = {19},
|
||||
howpublished = {\url{https://ethosapp.com/blog/what-is-the-difference-between-ethnography-and-digital-ethnography/}},
|
||||
note = {Accessed: 2026-04-16},
|
||||
organization = {EthOS}
|
||||
}
|
||||
|
||||
@misc{giuffre2026sentiment,
|
||||
author = {Giuffre, Steven},
|
||||
title = {What is Sentiment Analysis?},
|
||||
year = {2026},
|
||||
month = mar,
|
||||
howpublished = {\url{https://www.vonage.com/resources/articles/sentiment-analysis/}},
|
||||
note = {Accessed: 2026-04-16},
|
||||
organization = {Vonage}
|
||||
}
|
||||
|
||||
@misc{mungalpara2022stemming,
|
||||
author = {Mungalpara, Jaimin},
|
||||
title = {Stemming Lemmatization Stopwords and {N}-Grams in {NLP}},
|
||||
year = {2022},
|
||||
month = jul,
|
||||
day = {26},
|
||||
howpublished = {\url{https://jaimin-ml2001.medium.com/stemming-lemmatization-stopwords-and-n-grams-in-nlp-96f8e8b6aa6f}},
|
||||
note = {Accessed: 2026-04-16},
|
||||
organization = {Medium}
|
||||
}
|
||||
|
||||
@misc{chugani2025ethicalscraping,
|
||||
author = {Chugani, Vinod},
|
||||
title = {Ethical Web Scraping: Principles and Practices},
|
||||
year = {2025},
|
||||
month = apr,
|
||||
day = {21},
|
||||
howpublished = {\url{https://www.datacamp.com/blog/ethical-web-scraping}},
|
||||
note = {Accessed: 2026-04-16},
|
||||
organization = {DataCamp}
|
||||
}
|
||||
|
||||
@@ -1,14 +1,19 @@
|
||||
beautifulsoup4==4.14.3
|
||||
celery==5.6.2
|
||||
redis==7.2.1
|
||||
Flask==3.1.3
|
||||
Flask_Bcrypt==1.0.1
|
||||
flask_cors==6.0.2
|
||||
Flask_JWT_Extended==4.7.1
|
||||
google_api_python_client==2.188.0
|
||||
nltk==3.9.2
|
||||
numpy==2.4.2
|
||||
pandas==3.0.1
|
||||
psycopg2==2.9.11
|
||||
psycopg2_binary==2.9.11
|
||||
python-dotenv==1.2.1
|
||||
python-dotenv==1.2.2
|
||||
Requests==2.32.5
|
||||
sentence_transformers==5.2.2
|
||||
torch==2.10.0
|
||||
transformers==5.1.0
|
||||
gunicorn==25.3.0
|
||||
|
||||
@@ -1,7 +1,6 @@
|
||||
import pandas as pd
|
||||
import re
|
||||
|
||||
from collections import Counter
|
||||
from typing import Any
|
||||
|
||||
|
||||
@@ -14,21 +13,21 @@ class CulturalAnalysis:
|
||||
df = original_df.copy()
|
||||
s = df[self.content_col].fillna("").astype(str).str.lower()
|
||||
|
||||
in_group_words = {"we", "us", "our", "ourselves"}
|
||||
out_group_words = {"they", "them", "their", "themselves"}
|
||||
|
||||
emotion_exclusions = {"emotion_neutral", "emotion_surprise"}
|
||||
emotion_cols = [
|
||||
c for c in df.columns
|
||||
c
|
||||
for c in df.columns
|
||||
if c.startswith("emotion_") and c not in emotion_exclusions
|
||||
]
|
||||
|
||||
# Tokenize per row
|
||||
tokens_per_row = s.apply(lambda txt: re.findall(r"\b[a-z]{2,}\b", txt))
|
||||
in_pattern = re.compile(r"\b(we|us|our|ourselves)\b")
|
||||
out_pattern = re.compile(r"\b(they|them|their|themselves)\b")
|
||||
token_pattern = re.compile(r"\b[a-z]{2,}\b")
|
||||
|
||||
total_tokens = int(tokens_per_row.map(len).sum())
|
||||
in_hits = tokens_per_row.map(lambda toks: sum(t in in_group_words for t in toks)).astype(int)
|
||||
out_hits = tokens_per_row.map(lambda toks: sum(t in out_group_words for t in toks)).astype(int)
|
||||
in_hits = s.str.count(in_pattern)
|
||||
out_hits = s.str.count(out_pattern)
|
||||
total_tokens = s.str.count(token_pattern).sum()
|
||||
|
||||
in_count = int(in_hits.sum())
|
||||
out_count = int(out_hits.sum())
|
||||
@@ -42,7 +41,6 @@ class CulturalAnalysis:
|
||||
"out_group_usage": out_count,
|
||||
"in_group_ratio": round(in_count / max(total_tokens, 1), 5),
|
||||
"out_group_ratio": round(out_count / max(total_tokens, 1), 5),
|
||||
|
||||
"in_group_posts": int(in_mask.sum()),
|
||||
"out_group_posts": int(out_mask.sum()),
|
||||
"tie_posts": int(tie_mask.sum()),
|
||||
@@ -51,101 +49,131 @@ class CulturalAnalysis:
|
||||
if emotion_cols:
|
||||
emo = df[emotion_cols].apply(pd.to_numeric, errors="coerce").fillna(0.0)
|
||||
|
||||
in_avg = emo.loc[in_mask].mean() if in_mask.any() else pd.Series(0.0, index=emotion_cols)
|
||||
out_avg = emo.loc[out_mask].mean() if out_mask.any() else pd.Series(0.0, index=emotion_cols)
|
||||
in_avg = (
|
||||
emo.loc[in_mask].mean()
|
||||
if in_mask.any()
|
||||
else pd.Series(0.0, index=emotion_cols)
|
||||
)
|
||||
out_avg = (
|
||||
emo.loc[out_mask].mean()
|
||||
if out_mask.any()
|
||||
else pd.Series(0.0, index=emotion_cols)
|
||||
)
|
||||
|
||||
result["in_group_emotion_avg"] = in_avg.to_dict()
|
||||
result["out_group_emotion_avg"] = out_avg.to_dict()
|
||||
|
||||
return result
|
||||
|
||||
|
||||
def get_stance_markers(self, df: pd.DataFrame) -> dict[str, Any]:
|
||||
s = df[self.content_col].fillna("").astype(str)
|
||||
emotion_exclusions = {"emotion_neutral", "emotion_surprise"}
|
||||
emotion_cols = [
|
||||
c
|
||||
for c in df.columns
|
||||
if c.startswith("emotion_") and c not in emotion_exclusions
|
||||
]
|
||||
|
||||
hedges = {
|
||||
"maybe", "perhaps", "possibly", "probably", "likely", "seems", "seem",
|
||||
"i think", "i feel", "i guess", "kind of", "sort of", "somewhat"
|
||||
}
|
||||
certainty = {
|
||||
"definitely", "certainly", "clearly", "obviously", "undeniably", "always", "never"
|
||||
}
|
||||
hedge_pattern = re.compile(
|
||||
r"\b(maybe|perhaps|possibly|probably|likely|seems|seem|i think|i feel|i guess|kind of|sort of|somewhat)\b"
|
||||
)
|
||||
certainty_pattern = re.compile(
|
||||
r"\b(definitely|certainly|clearly|obviously|undeniably|always|never)\b"
|
||||
)
|
||||
deontic_pattern = re.compile(
|
||||
r"\b(must|should|need|needs|have to|has to|ought|required|require)\b"
|
||||
)
|
||||
permission_pattern = re.compile(r"\b(can|allowed|okay|ok|permitted)\b")
|
||||
|
||||
deontic = {
|
||||
"must", "should", "need", "needs", "have to", "has to", "ought", "required", "require"
|
||||
}
|
||||
hedge_counts = s.str.count(hedge_pattern)
|
||||
certainty_counts = s.str.count(certainty_pattern)
|
||||
deontic_counts = s.str.count(deontic_pattern)
|
||||
perm_counts = s.str.count(permission_pattern)
|
||||
|
||||
permission = {"can", "allowed", "okay", "ok", "permitted"}
|
||||
token_counts = s.apply(lambda t: len(re.findall(r"\b[a-z]{2,}\b", t))).replace(
|
||||
0, 1
|
||||
)
|
||||
|
||||
def count_phrases(text: str, phrases: set[str]) -> int:
|
||||
c = 0
|
||||
for p in phrases:
|
||||
if " " in p:
|
||||
c += len(re.findall(r"\b" + re.escape(p) + r"\b", text))
|
||||
else:
|
||||
c += len(re.findall(r"\b" + re.escape(p) + r"\b", text))
|
||||
return c
|
||||
|
||||
hedge_counts = s.apply(lambda t: count_phrases(t, hedges))
|
||||
certainty_counts = s.apply(lambda t: count_phrases(t, certainty))
|
||||
deontic_counts = s.apply(lambda t: count_phrases(t, deontic))
|
||||
perm_counts = s.apply(lambda t: count_phrases(t, permission))
|
||||
|
||||
token_counts = s.apply(lambda t: len(re.findall(r"\b[a-z]{2,}\b", t))).replace(0, 1)
|
||||
|
||||
return {
|
||||
result = {
|
||||
"hedge_total": int(hedge_counts.sum()),
|
||||
"certainty_total": int(certainty_counts.sum()),
|
||||
"deontic_total": int(deontic_counts.sum()),
|
||||
"permission_total": int(perm_counts.sum()),
|
||||
"hedge_per_1k_tokens": round(1000 * hedge_counts.sum() / token_counts.sum(), 3),
|
||||
"certainty_per_1k_tokens": round(1000 * certainty_counts.sum() / token_counts.sum(), 3),
|
||||
"deontic_per_1k_tokens": round(1000 * deontic_counts.sum() / token_counts.sum(), 3),
|
||||
"permission_per_1k_tokens": round(1000 * perm_counts.sum() / token_counts.sum(), 3),
|
||||
"hedge_per_1k_tokens": round(
|
||||
1000 * hedge_counts.sum() / token_counts.sum(), 3
|
||||
),
|
||||
"certainty_per_1k_tokens": round(
|
||||
1000 * certainty_counts.sum() / token_counts.sum(), 3
|
||||
),
|
||||
"deontic_per_1k_tokens": round(
|
||||
1000 * deontic_counts.sum() / token_counts.sum(), 3
|
||||
),
|
||||
"permission_per_1k_tokens": round(
|
||||
1000 * perm_counts.sum() / token_counts.sum(), 3
|
||||
),
|
||||
}
|
||||
|
||||
def get_avg_emotions_per_entity(self, df: pd.DataFrame, top_n: int = 25, min_posts: int = 10) -> dict[str, Any]:
|
||||
if "entities" not in df.columns:
|
||||
|
||||
if emotion_cols:
|
||||
emo = df[emotion_cols].apply(pd.to_numeric, errors="coerce").fillna(0.0)
|
||||
|
||||
result["hedge_emotion_avg"] = (
|
||||
emo.loc[hedge_counts > 0].mean()
|
||||
if (hedge_counts > 0).any()
|
||||
else pd.Series(0.0, index=emotion_cols)
|
||||
).to_dict()
|
||||
result["certainty_emotion_avg"] = (
|
||||
emo.loc[certainty_counts > 0].mean()
|
||||
if (certainty_counts > 0).any()
|
||||
else pd.Series(0.0, index=emotion_cols)
|
||||
).to_dict()
|
||||
result["deontic_emotion_avg"] = (
|
||||
emo.loc[deontic_counts > 0].mean()
|
||||
if (deontic_counts > 0).any()
|
||||
else pd.Series(0.0, index=emotion_cols)
|
||||
).to_dict()
|
||||
result["permission_emotion_avg"] = (
|
||||
emo.loc[perm_counts > 0].mean()
|
||||
if (perm_counts > 0).any()
|
||||
else pd.Series(0.0, index=emotion_cols)
|
||||
).to_dict()
|
||||
|
||||
return result
|
||||
|
||||
def get_avg_emotions_per_entity(
|
||||
self, df: pd.DataFrame, top_n: int = 25, min_posts: int = 10
|
||||
) -> dict[str, Any]:
|
||||
if "ner_entities" not in df.columns:
|
||||
return {"entity_emotion_avg": {}}
|
||||
|
||||
emotion_cols = [c for c in df.columns if c.startswith("emotion_")]
|
||||
entity_counter = Counter()
|
||||
|
||||
for row in df["entities"].dropna():
|
||||
if isinstance(row, list):
|
||||
for ent in row:
|
||||
if isinstance(ent, dict):
|
||||
text = ent.get("text")
|
||||
if isinstance(text, str):
|
||||
text = text.strip()
|
||||
if len(text) >= 3: # filter short junk
|
||||
entity_counter[text] += 1
|
||||
entity_df = df[["ner_entities"] + emotion_cols].explode("ner_entities")
|
||||
|
||||
top_entities = entity_counter.most_common(top_n)
|
||||
entity_df["entity_text"] = entity_df["ner_entities"].apply(
|
||||
lambda e: (
|
||||
e.get("text").strip()
|
||||
if isinstance(e, dict)
|
||||
and isinstance(e.get("text"), str)
|
||||
and len(e.get("text")) >= 3
|
||||
else None
|
||||
)
|
||||
)
|
||||
|
||||
entity_df = entity_df.dropna(subset=["entity_text"])
|
||||
entity_counts = entity_df["entity_text"].value_counts().head(top_n)
|
||||
entity_emotion_avg = {}
|
||||
|
||||
for entity_text, _ in top_entities:
|
||||
mask = df["entities"].apply(
|
||||
lambda ents: isinstance(ents, list) and
|
||||
any(isinstance(e, dict) and e.get("text") == entity_text for e in ents)
|
||||
)
|
||||
|
||||
post_count = int(mask.sum())
|
||||
|
||||
if post_count >= min_posts:
|
||||
for entity_text, count in entity_counts.items():
|
||||
if count >= min_posts:
|
||||
emo_means = (
|
||||
df.loc[mask, emotion_cols]
|
||||
.apply(pd.to_numeric, errors="coerce")
|
||||
.fillna(0.0)
|
||||
entity_df[entity_df["entity_text"] == entity_text][emotion_cols]
|
||||
.mean()
|
||||
.to_dict()
|
||||
)
|
||||
|
||||
entity_emotion_avg[entity_text] = {
|
||||
"post_count": post_count,
|
||||
"emotion_avg": emo_means
|
||||
"post_count": int(count),
|
||||
"emotion_avg": emo_means,
|
||||
}
|
||||
|
||||
return {
|
||||
"entity_emotion_avg": entity_emotion_avg
|
||||
}
|
||||
return {"entity_emotion_avg": entity_emotion_avg}
|
||||
|
||||
@@ -1,33 +1,86 @@
|
||||
import pandas as pd
|
||||
|
||||
|
||||
class EmotionalAnalysis:
|
||||
def avg_emotion_by_topic(self, df: pd.DataFrame) -> dict:
|
||||
emotion_cols = [
|
||||
col for col in df.columns
|
||||
if col.startswith("emotion_")
|
||||
]
|
||||
def _emotion_cols(self, df: pd.DataFrame) -> list[str]:
|
||||
return [col for col in df.columns if col.startswith("emotion_")]
|
||||
|
||||
def avg_emotion_by_topic(self, df: pd.DataFrame) -> list[dict]:
|
||||
emotion_cols = self._emotion_cols(df)
|
||||
|
||||
if not emotion_cols:
|
||||
return []
|
||||
|
||||
counts = (
|
||||
df[
|
||||
(df["topic"] != "Misc")
|
||||
]
|
||||
.groupby("topic")
|
||||
.size()
|
||||
.rename("n")
|
||||
df[(df["topic"] != "Misc")].groupby("topic").size().reset_index(name="n")
|
||||
)
|
||||
|
||||
avg_emotion_by_topic = (
|
||||
df[
|
||||
(df["topic"] != "Misc")
|
||||
]
|
||||
df[(df["topic"] != "Misc")]
|
||||
.groupby("topic")[emotion_cols]
|
||||
.mean()
|
||||
.reset_index()
|
||||
)
|
||||
|
||||
avg_emotion_by_topic = avg_emotion_by_topic.merge(
|
||||
counts,
|
||||
on="topic"
|
||||
)
|
||||
avg_emotion_by_topic = avg_emotion_by_topic.merge(counts, on="topic")
|
||||
|
||||
return avg_emotion_by_topic.to_dict(orient='records')
|
||||
return avg_emotion_by_topic.to_dict(orient="records")
|
||||
|
||||
def overall_emotion_average(self, df: pd.DataFrame) -> list[dict]:
|
||||
emotion_cols = self._emotion_cols(df)
|
||||
|
||||
if not emotion_cols:
|
||||
return []
|
||||
|
||||
means = df[emotion_cols].mean()
|
||||
return [
|
||||
{
|
||||
"emotion": col.replace("emotion_", ""),
|
||||
"score": float(means[col]),
|
||||
}
|
||||
for col in emotion_cols
|
||||
]
|
||||
|
||||
def dominant_emotion_distribution(self, df: pd.DataFrame) -> list[dict]:
|
||||
emotion_cols = self._emotion_cols(df)
|
||||
|
||||
if not emotion_cols or df.empty:
|
||||
return []
|
||||
|
||||
dominant_per_row = df[emotion_cols].idxmax(axis=1)
|
||||
counts = dominant_per_row.value_counts()
|
||||
total = max(len(dominant_per_row), 1)
|
||||
|
||||
return [
|
||||
{
|
||||
"emotion": col.replace("emotion_", ""),
|
||||
"count": int(count),
|
||||
"ratio": round(float(count / total), 4),
|
||||
}
|
||||
for col, count in counts.items()
|
||||
]
|
||||
|
||||
def emotion_by_source(self, df: pd.DataFrame) -> list[dict]:
|
||||
emotion_cols = self._emotion_cols(df)
|
||||
|
||||
if not emotion_cols or "source" not in df.columns or df.empty:
|
||||
return []
|
||||
|
||||
source_counts = df.groupby("source").size()
|
||||
source_means = df.groupby("source")[emotion_cols].mean().reset_index()
|
||||
rows = source_means.to_dict(orient="records")
|
||||
output = []
|
||||
|
||||
for row in rows:
|
||||
source = row["source"]
|
||||
dominant_col = max(emotion_cols, key=lambda col: float(row.get(col, 0)))
|
||||
output.append(
|
||||
{
|
||||
"source": str(source),
|
||||
"dominant_emotion": dominant_col.replace("emotion_", ""),
|
||||
"dominant_score": round(float(row.get(dominant_col, 0)), 4),
|
||||
"event_count": int(source_counts.get(source, 0)),
|
||||
}
|
||||
)
|
||||
|
||||
return output
|
||||
|
||||
@@ -2,15 +2,18 @@ import pandas as pd
|
||||
|
||||
from server.analysis.nlp import NLP
|
||||
|
||||
class DatasetProcessor:
|
||||
def __init__(self, df, topics):
|
||||
|
||||
class DatasetEnrichment:
|
||||
def __init__(self, df: pd.DataFrame, topics: dict):
|
||||
self.df = self._explode_comments(df)
|
||||
self.topics = topics
|
||||
self.nlp = NLP(self.df, "title", "content", self.topics)
|
||||
|
||||
def _explode_comments(self, df) -> pd.DataFrame:
|
||||
comments_df = df[["id", "comments"]].explode("comments")
|
||||
comments_df = comments_df[comments_df["comments"].apply(lambda x: isinstance(x, dict))]
|
||||
comments_df = comments_df[
|
||||
comments_df["comments"].apply(lambda x: isinstance(x, dict))
|
||||
]
|
||||
comments_df = pd.json_normalize(comments_df["comments"])
|
||||
|
||||
posts_df = df.drop(columns=["comments"])
|
||||
@@ -24,16 +27,16 @@ class DatasetProcessor:
|
||||
df.drop(columns=["post_id"], inplace=True, errors="ignore")
|
||||
|
||||
return df
|
||||
|
||||
|
||||
def enrich(self) -> pd.DataFrame:
|
||||
self.df['timestamp'] = pd.to_numeric(self.df['timestamp'], errors='raise')
|
||||
self.df['date'] = pd.to_datetime(self.df['timestamp'], unit='s').dt.date
|
||||
self.df["timestamp"] = pd.to_numeric(self.df["timestamp"], errors="raise")
|
||||
self.df["date"] = pd.to_datetime(self.df["timestamp"], unit="s").dt.date
|
||||
self.df["dt"] = pd.to_datetime(self.df["timestamp"], unit="s", utc=True)
|
||||
self.df["hour"] = self.df["dt"].dt.hour
|
||||
self.df["weekday"] = self.df["dt"].dt.day_name()
|
||||
|
||||
|
||||
self.nlp.add_emotion_cols()
|
||||
self.nlp.add_topic_col()
|
||||
self.nlp.add_ner_cols()
|
||||
|
||||
return self.df
|
||||
return self.df
|
||||
@@ -1,8 +1,6 @@
|
||||
import pandas as pd
|
||||
import re
|
||||
|
||||
from collections import Counter
|
||||
|
||||
|
||||
class InteractionAnalysis:
|
||||
def __init__(self, word_exclusions: set[str]):
|
||||
@@ -12,123 +10,11 @@ class InteractionAnalysis:
|
||||
tokens = re.findall(r"\b[a-z]{3,}\b", text)
|
||||
return [t for t in tokens if t not in self.word_exclusions]
|
||||
|
||||
def _vocab_richness_per_user(
|
||||
self, df: pd.DataFrame, min_words: int = 20, top_most_used_words: int = 100
|
||||
) -> list:
|
||||
df = df.copy()
|
||||
df["content"] = df["content"].fillna("").astype(str).str.lower()
|
||||
df["tokens"] = df["content"].apply(self._tokenize)
|
||||
|
||||
rows = []
|
||||
for author, group in df.groupby("author"):
|
||||
all_tokens = [t for tokens in group["tokens"] for t in tokens]
|
||||
|
||||
total_words = len(all_tokens)
|
||||
unique_words = len(set(all_tokens))
|
||||
events = len(group)
|
||||
|
||||
# Min amount of words for a user, any less than this might give weird results
|
||||
if total_words < min_words:
|
||||
continue
|
||||
|
||||
# 100% = they never reused a word (excluding stop words)
|
||||
vocab_richness = unique_words / total_words
|
||||
avg_words = total_words / max(events, 1)
|
||||
|
||||
counts = Counter(all_tokens)
|
||||
top_words = [
|
||||
{"word": w, "count": int(c)}
|
||||
for w, c in counts.most_common(top_most_used_words)
|
||||
]
|
||||
|
||||
rows.append(
|
||||
{
|
||||
"author": author,
|
||||
"events": int(events),
|
||||
"total_words": int(total_words),
|
||||
"unique_words": int(unique_words),
|
||||
"vocab_richness": round(vocab_richness, 3),
|
||||
"avg_words_per_event": round(avg_words, 2),
|
||||
"top_words": top_words,
|
||||
}
|
||||
)
|
||||
|
||||
rows = sorted(rows, key=lambda x: x["vocab_richness"], reverse=True)
|
||||
|
||||
return rows
|
||||
|
||||
def top_users(self, df: pd.DataFrame) -> list:
|
||||
counts = df.groupby(["author", "source"]).size().sort_values(ascending=False)
|
||||
|
||||
top_users = [
|
||||
{"author": author, "source": source, "count": int(count)}
|
||||
for (author, source), count in counts.items()
|
||||
]
|
||||
|
||||
return top_users
|
||||
|
||||
def per_user_analysis(self, df: pd.DataFrame) -> dict:
|
||||
per_user = df.groupby(["author", "type"]).size().unstack(fill_value=0)
|
||||
|
||||
emotion_cols = [col for col in df.columns if col.startswith("emotion_")]
|
||||
|
||||
avg_emotions_by_author = {}
|
||||
if emotion_cols:
|
||||
avg_emotions = df.groupby("author")[emotion_cols].mean().fillna(0.0)
|
||||
avg_emotions_by_author = {
|
||||
author: {emotion: float(score) for emotion, score in row.items()}
|
||||
for author, row in avg_emotions.iterrows()
|
||||
}
|
||||
|
||||
# ensure columns always exist
|
||||
for col in ("post", "comment"):
|
||||
if col not in per_user.columns:
|
||||
per_user[col] = 0
|
||||
|
||||
per_user["comment_post_ratio"] = per_user["comment"] / per_user["post"].replace(
|
||||
0, 1
|
||||
)
|
||||
per_user["comment_share"] = per_user["comment"] / (
|
||||
per_user["post"] + per_user["comment"]
|
||||
).replace(0, 1)
|
||||
per_user = per_user.sort_values("comment_post_ratio", ascending=True)
|
||||
per_user_records = per_user.reset_index().to_dict(orient="records")
|
||||
|
||||
vocab_rows = self._vocab_richness_per_user(df)
|
||||
vocab_by_author = {row["author"]: row for row in vocab_rows}
|
||||
|
||||
# merge vocab richness + per_user information
|
||||
merged_users = []
|
||||
for row in per_user_records:
|
||||
author = row["author"]
|
||||
merged_users.append(
|
||||
{
|
||||
"author": author,
|
||||
"post": int(row.get("post", 0)),
|
||||
"comment": int(row.get("comment", 0)),
|
||||
"comment_post_ratio": float(row.get("comment_post_ratio", 0)),
|
||||
"comment_share": float(row.get("comment_share", 0)),
|
||||
"avg_emotions": avg_emotions_by_author.get(author, {}),
|
||||
"vocab": vocab_by_author.get(
|
||||
author,
|
||||
{
|
||||
"vocab_richness": 0,
|
||||
"avg_words_per_event": 0,
|
||||
"top_words": [],
|
||||
},
|
||||
),
|
||||
}
|
||||
)
|
||||
|
||||
merged_users.sort(key=lambda u: u["comment_post_ratio"])
|
||||
|
||||
return merged_users
|
||||
|
||||
def interaction_graph(self, df: pd.DataFrame):
|
||||
interactions = {a: {} for a in df["author"].dropna().unique()}
|
||||
|
||||
# reply_to refers to the comment id, this allows us to map comment ids to usernames
|
||||
id_to_author = df.set_index("id")["author"].to_dict()
|
||||
# reply_to refers to the comment id, this allows us to map comment/post ids to usernames
|
||||
id_to_author = df.set_index("post_id")["author"].to_dict()
|
||||
|
||||
for _, row in df.iterrows():
|
||||
a = row["author"]
|
||||
@@ -145,89 +31,40 @@ class InteractionAnalysis:
|
||||
|
||||
return interactions
|
||||
|
||||
def average_thread_depth(self, df: pd.DataFrame):
|
||||
depths = []
|
||||
id_to_reply = df.set_index("id")["reply_to"].to_dict()
|
||||
for _, row in df.iterrows():
|
||||
depth = 0
|
||||
current_id = row["id"]
|
||||
def top_interaction_pairs(self, df: pd.DataFrame, top_n=10):
|
||||
graph = self.interaction_graph(df)
|
||||
pairs = []
|
||||
|
||||
while True:
|
||||
reply_to = id_to_reply.get(current_id)
|
||||
if pd.isna(reply_to) or reply_to == "":
|
||||
break
|
||||
for a, targets in graph.items():
|
||||
for b, count in targets.items():
|
||||
pairs.append(((a, b), count))
|
||||
|
||||
depth += 1
|
||||
current_id = reply_to
|
||||
pairs.sort(key=lambda x: x[1], reverse=True)
|
||||
return pairs[:top_n]
|
||||
|
||||
depths.append(depth)
|
||||
def conversation_concentration(self, df: pd.DataFrame) -> dict:
|
||||
if "type" not in df.columns:
|
||||
return {}
|
||||
|
||||
if not depths:
|
||||
return 0
|
||||
comments = df[df["type"] == "comment"]
|
||||
if comments.empty:
|
||||
return {}
|
||||
|
||||
return round(sum(depths) / len(depths), 2)
|
||||
author_counts = comments["author"].value_counts()
|
||||
total_comments = len(comments)
|
||||
total_authors = len(author_counts)
|
||||
|
||||
def average_thread_length_by_emotion(self, df: pd.DataFrame):
|
||||
emotion_exclusions = {"emotion_neutral", "emotion_surprise"}
|
||||
|
||||
emotion_cols = [
|
||||
c
|
||||
for c in df.columns
|
||||
if c.startswith("emotion_") and c not in emotion_exclusions
|
||||
]
|
||||
|
||||
id_to_reply = df.set_index("id")["reply_to"].to_dict()
|
||||
length_cache = {}
|
||||
|
||||
def thread_length_from(start_id):
|
||||
if start_id in length_cache:
|
||||
return length_cache[start_id]
|
||||
|
||||
seen = set()
|
||||
length = 1
|
||||
current = start_id
|
||||
|
||||
while True:
|
||||
if current in seen:
|
||||
# infinite loop shouldn't happen, but just in case
|
||||
break
|
||||
seen.add(current)
|
||||
|
||||
reply_to = id_to_reply.get(current)
|
||||
|
||||
if (
|
||||
reply_to is None
|
||||
or (isinstance(reply_to, float) and pd.isna(reply_to))
|
||||
or reply_to == ""
|
||||
):
|
||||
break
|
||||
|
||||
length += 1
|
||||
current = reply_to
|
||||
|
||||
if current in length_cache:
|
||||
length += length_cache[current] - 1
|
||||
break
|
||||
|
||||
length_cache[start_id] = length
|
||||
return length
|
||||
|
||||
emotion_to_lengths = {}
|
||||
|
||||
# Fill NaNs in emotion cols to avoid max() issues
|
||||
emo_df = df[["id"] + emotion_cols].copy()
|
||||
emo_df[emotion_cols] = emo_df[emotion_cols].fillna(0)
|
||||
|
||||
for _, row in emo_df.iterrows():
|
||||
msg_id = row["id"]
|
||||
length = thread_length_from(msg_id)
|
||||
|
||||
emotions = {c: row[c] for c in emotion_cols}
|
||||
dominant = max(emotions, key=emotions.get)
|
||||
|
||||
emotion_to_lengths.setdefault(dominant, []).append(length)
|
||||
top_10_pct_n = max(1, int(total_authors * 0.1))
|
||||
top_10_pct_share = round(
|
||||
author_counts.head(top_10_pct_n).sum() / total_comments, 4
|
||||
)
|
||||
|
||||
return {
|
||||
emotion: round(sum(lengths) / len(lengths), 2)
|
||||
for emotion, lengths in emotion_to_lengths.items()
|
||||
"total_commenting_authors": total_authors,
|
||||
"top_10pct_author_count": top_10_pct_n,
|
||||
"top_10pct_comment_share": float(top_10_pct_share),
|
||||
"single_comment_authors": int((author_counts == 1).sum()),
|
||||
"single_comment_author_ratio": float(
|
||||
round((author_counts == 1).sum() / total_authors, 4)
|
||||
),
|
||||
}
|
||||
|
||||
@@ -1,17 +1,30 @@
|
||||
import pandas as pd
|
||||
import re
|
||||
|
||||
from collections import Counter
|
||||
from itertools import islice
|
||||
from dataclasses import dataclass
|
||||
|
||||
import pandas as pd
|
||||
|
||||
|
||||
@dataclass(frozen=True)
|
||||
class NGramConfig:
|
||||
min_token_length: int = 3
|
||||
min_count: int = 2
|
||||
max_results: int = 100
|
||||
|
||||
|
||||
class LinguisticAnalysis:
|
||||
def __init__(self, word_exclusions: set[str]):
|
||||
self.word_exclusions = word_exclusions
|
||||
self.ngram_config = NGramConfig()
|
||||
|
||||
def _tokenize(self, text: str):
|
||||
tokens = re.findall(r"\b[a-z]{3,}\b", text)
|
||||
return [t for t in tokens if t not in self.word_exclusions]
|
||||
def _tokenize(self, text: str, *, include_exclusions: bool = False) -> list[str]:
|
||||
pattern = rf"\b[a-z]{{{self.ngram_config.min_token_length},}}\b"
|
||||
tokens = re.findall(pattern, text)
|
||||
|
||||
if include_exclusions:
|
||||
return tokens
|
||||
|
||||
return [token for token in tokens if token not in self.word_exclusions]
|
||||
|
||||
def _clean_text(self, text: str) -> str:
|
||||
text = re.sub(r"http\S+", "", text) # remove URLs
|
||||
@@ -21,13 +34,24 @@ class LinguisticAnalysis:
|
||||
text = re.sub(r"\S+\.(jpg|jpeg|png|webp|gif)", "", text)
|
||||
return text
|
||||
|
||||
def _content_texts(self, df: pd.DataFrame) -> pd.Series:
|
||||
return df["content"].dropna().astype(str).apply(self._clean_text).str.lower()
|
||||
|
||||
def _valid_ngram(self, tokens: tuple[str, ...]) -> bool:
|
||||
if any(token in self.word_exclusions for token in tokens):
|
||||
return False
|
||||
|
||||
if len(set(tokens)) == 1:
|
||||
return False
|
||||
|
||||
return True
|
||||
|
||||
def word_frequencies(self, df: pd.DataFrame, limit: int = 100) -> list[dict]:
|
||||
texts = df["content"].dropna().astype(str).str.lower()
|
||||
texts = self._content_texts(df)
|
||||
|
||||
words = []
|
||||
for text in texts:
|
||||
tokens = re.findall(r"\b[a-z]{3,}\b", text)
|
||||
words.extend(w for w in tokens if w not in self.word_exclusions)
|
||||
words.extend(self._tokenize(text))
|
||||
|
||||
counts = Counter(words)
|
||||
|
||||
@@ -40,24 +64,57 @@ class LinguisticAnalysis:
|
||||
|
||||
return word_frequencies.to_dict(orient="records")
|
||||
|
||||
def ngrams(self, df: pd.DataFrame, n=2, limit=100):
|
||||
texts = df["content"].dropna().astype(str).apply(self._clean_text).str.lower()
|
||||
def ngrams(self, df: pd.DataFrame, n: int = 2, limit: int | None = None) -> list[dict]:
|
||||
if n < 2:
|
||||
raise ValueError("n must be at least 2")
|
||||
|
||||
texts = self._content_texts(df)
|
||||
all_ngrams = []
|
||||
result_limit = limit or self.ngram_config.max_results
|
||||
|
||||
for text in texts:
|
||||
tokens = re.findall(r"\b[a-z]{3,}\b", text)
|
||||
tokens = self._tokenize(text, include_exclusions=True)
|
||||
|
||||
# stop word removal causes strange behaviors in ngrams
|
||||
# tokens = [w for w in tokens if w not in self.word_exclusions]
|
||||
if len(tokens) < n:
|
||||
continue
|
||||
|
||||
ngrams = zip(*(islice(tokens, i, None) for i in range(n)))
|
||||
all_ngrams.extend([" ".join(ng) for ng in ngrams])
|
||||
for index in range(len(tokens) - n + 1):
|
||||
ngram_tokens = tuple(tokens[index : index + n])
|
||||
if self._valid_ngram(ngram_tokens):
|
||||
all_ngrams.append(" ".join(ngram_tokens))
|
||||
|
||||
counts = Counter(all_ngrams)
|
||||
filtered_counts = [
|
||||
(ngram, count)
|
||||
for ngram, count in counts.items()
|
||||
if count >= self.ngram_config.min_count
|
||||
]
|
||||
|
||||
if not filtered_counts:
|
||||
return []
|
||||
|
||||
return (
|
||||
pd.DataFrame(counts.items(), columns=["ngram", "count"])
|
||||
.sort_values("count", ascending=False)
|
||||
.head(limit)
|
||||
pd.DataFrame(filtered_counts, columns=["ngram", "count"])
|
||||
.sort_values(["count", "ngram"], ascending=[False, True])
|
||||
.head(result_limit)
|
||||
.to_dict(orient="records")
|
||||
)
|
||||
|
||||
def lexical_diversity(self, df: pd.DataFrame) -> dict:
|
||||
tokens = (
|
||||
df["content"]
|
||||
.fillna("")
|
||||
.astype(str)
|
||||
.str.lower()
|
||||
.str.findall(r"\b[a-z]{2,}\b")
|
||||
.explode()
|
||||
)
|
||||
tokens = tokens[~tokens.isin(self.word_exclusions)]
|
||||
total = max(len(tokens), 1)
|
||||
unique = int(tokens.nunique())
|
||||
|
||||
return {
|
||||
"total_tokens": total,
|
||||
"unique_tokens": unique,
|
||||
"ttr": round(unique / total, 4),
|
||||
}
|
||||
|
||||
@@ -6,6 +6,7 @@ from typing import Any
|
||||
from transformers import pipeline
|
||||
from sentence_transformers import SentenceTransformer
|
||||
|
||||
|
||||
class NLP:
|
||||
_topic_models: dict[str, SentenceTransformer] = {}
|
||||
_emotion_classifiers: dict[str, Any] = {}
|
||||
@@ -32,7 +33,7 @@ class NLP:
|
||||
)
|
||||
self.entity_recognizer = self._get_entity_recognizer(
|
||||
self.device_str, self.pipeline_device
|
||||
)
|
||||
)
|
||||
except RuntimeError as exc:
|
||||
if self.use_cuda and "out of memory" in str(exc).lower():
|
||||
torch.cuda.empty_cache()
|
||||
@@ -90,7 +91,7 @@ class NLP:
|
||||
)
|
||||
cls._emotion_classifiers[device_str] = classifier
|
||||
return classifier
|
||||
|
||||
|
||||
@classmethod
|
||||
def _get_entity_recognizer(cls, device_str: str, pipeline_device: int) -> Any:
|
||||
recognizer = cls._entity_recognizers.get(device_str)
|
||||
@@ -207,8 +208,7 @@ class NLP:
|
||||
self.df.drop(columns=existing_drop, inplace=True)
|
||||
|
||||
remaining_emotion_cols = [
|
||||
c for c in self.df.columns
|
||||
if c.startswith("emotion_")
|
||||
c for c in self.df.columns if c.startswith("emotion_")
|
||||
]
|
||||
|
||||
if remaining_emotion_cols:
|
||||
@@ -227,8 +227,6 @@ class NLP:
|
||||
|
||||
self.df[remaining_emotion_cols] = normalized.values
|
||||
|
||||
|
||||
|
||||
def add_topic_col(self, confidence_threshold: float = 0.3) -> None:
|
||||
titles = self.df[self.title_col].fillna("").astype(str)
|
||||
contents = self.df[self.content_col].fillna("").astype(str)
|
||||
@@ -257,7 +255,7 @@ class NLP:
|
||||
self.df.loc[self.df["topic_confidence"] < confidence_threshold, "topic"] = (
|
||||
"Misc"
|
||||
)
|
||||
|
||||
|
||||
def add_ner_cols(self, max_chars: int = 512) -> None:
|
||||
texts = (
|
||||
self.df[self.content_col]
|
||||
@@ -302,8 +300,4 @@ class NLP:
|
||||
|
||||
for label in all_labels:
|
||||
col_name = f"entity_{label}"
|
||||
self.df[col_name] = [
|
||||
d.get(label, 0) for d in entity_count_dicts
|
||||
]
|
||||
|
||||
|
||||
self.df[col_name] = [d.get(label, 0) for d in entity_count_dicts]
|
||||
|
||||
189
server/analysis/stat_gen.py
Normal file
@@ -0,0 +1,189 @@
|
||||
import nltk
|
||||
import json
|
||||
import pandas as pd
|
||||
from nltk.corpus import stopwords
|
||||
|
||||
from server.analysis.cultural import CulturalAnalysis
|
||||
from server.analysis.emotional import EmotionalAnalysis
|
||||
from server.analysis.interactional import InteractionAnalysis
|
||||
from server.analysis.linguistic import LinguisticAnalysis
|
||||
from server.analysis.summary import SummaryAnalysis
|
||||
from server.analysis.temporal import TemporalAnalysis
|
||||
from server.analysis.user import UserAnalysis
|
||||
|
||||
DOMAIN_STOPWORDS = {
|
||||
"www",
|
||||
"https",
|
||||
"http",
|
||||
"boards",
|
||||
"boardsie",
|
||||
"comment",
|
||||
"comments",
|
||||
"discussion",
|
||||
"thread",
|
||||
"post",
|
||||
"posts",
|
||||
"would",
|
||||
"get",
|
||||
"one",
|
||||
}
|
||||
|
||||
EXCLUDED_AUTHORS = {"[deleted]", "automoderator"}
|
||||
|
||||
nltk.download("stopwords")
|
||||
EXCLUDE_WORDS = set(stopwords.words("english")) | DOMAIN_STOPWORDS
|
||||
|
||||
|
||||
class StatGen:
|
||||
def __init__(self) -> None:
|
||||
self.temporal_analysis = TemporalAnalysis()
|
||||
self.emotional_analysis = EmotionalAnalysis()
|
||||
self.interaction_analysis = InteractionAnalysis(EXCLUDE_WORDS)
|
||||
self.linguistic_analysis = LinguisticAnalysis(EXCLUDE_WORDS)
|
||||
self.cultural_analysis = CulturalAnalysis()
|
||||
self.summary_analysis = SummaryAnalysis()
|
||||
self.user_analysis = UserAnalysis(EXCLUDE_WORDS)
|
||||
|
||||
## Private Methods
|
||||
def _prepare_filtered_df(self, df: pd.DataFrame, filters: dict | None = None) -> pd.DataFrame:
|
||||
filters = filters or {}
|
||||
filtered_df = df.copy()
|
||||
|
||||
if "author" in filtered_df.columns:
|
||||
normalized_authors = (
|
||||
filtered_df["author"].fillna("").astype(str).str.strip().str.lower()
|
||||
)
|
||||
filtered_df = filtered_df[~normalized_authors.isin(EXCLUDED_AUTHORS)]
|
||||
|
||||
search_query = filters.get("search_query", None)
|
||||
start_date_filter = filters.get("start_date", None)
|
||||
end_date_filter = filters.get("end_date", None)
|
||||
data_source_filter = filters.get("data_sources", None)
|
||||
|
||||
if search_query:
|
||||
mask = filtered_df["content"].str.contains(
|
||||
search_query, case=False, na=False
|
||||
) | filtered_df["author"].str.contains(search_query, case=False, na=False)
|
||||
|
||||
# Only include title if the column exists
|
||||
if "title" in filtered_df.columns:
|
||||
mask = mask | filtered_df["title"].str.contains(
|
||||
search_query, case=False, na=False, regex=False
|
||||
)
|
||||
|
||||
filtered_df = filtered_df[mask]
|
||||
|
||||
if start_date_filter:
|
||||
filtered_df = filtered_df[(filtered_df["dt"] >= start_date_filter)]
|
||||
|
||||
if end_date_filter:
|
||||
filtered_df = filtered_df[(filtered_df["dt"] <= end_date_filter)]
|
||||
|
||||
if data_source_filter:
|
||||
filtered_df = filtered_df[filtered_df["source"].isin(data_source_filter)]
|
||||
|
||||
return filtered_df
|
||||
|
||||
def _json_ready_records(self, df: pd.DataFrame) -> list[dict]:
|
||||
return json.loads(
|
||||
df.to_json(orient="records", date_format="iso", date_unit="s")
|
||||
)
|
||||
|
||||
## Public Methods
|
||||
def filter_dataset(self, df: pd.DataFrame, filters: dict | None = None) -> list[dict]:
|
||||
filtered_df = self._prepare_filtered_df(df, filters)
|
||||
return self._json_ready_records(filtered_df)
|
||||
|
||||
def temporal(
|
||||
self,
|
||||
df: pd.DataFrame,
|
||||
filters: dict | None = None,
|
||||
dataset_id: int | None = None,
|
||||
) -> dict:
|
||||
filtered_df = self._prepare_filtered_df(df, filters)
|
||||
|
||||
return {
|
||||
"events_per_day": self.temporal_analysis.posts_per_day(filtered_df),
|
||||
"weekday_hour_heatmap": self.temporal_analysis.heatmap(filtered_df),
|
||||
}
|
||||
|
||||
def linguistic(
|
||||
self,
|
||||
df: pd.DataFrame,
|
||||
filters: dict | None = None,
|
||||
dataset_id: int | None = None,
|
||||
) -> dict:
|
||||
filtered_df = self._prepare_filtered_df(df, filters)
|
||||
|
||||
return {
|
||||
"word_frequencies": self.linguistic_analysis.word_frequencies(filtered_df),
|
||||
"common_two_phrases": self.linguistic_analysis.ngrams(filtered_df),
|
||||
"common_three_phrases": self.linguistic_analysis.ngrams(filtered_df, n=3),
|
||||
"lexical_diversity": self.linguistic_analysis.lexical_diversity(filtered_df)
|
||||
}
|
||||
|
||||
def emotional(
|
||||
self,
|
||||
df: pd.DataFrame,
|
||||
filters: dict | None = None,
|
||||
dataset_id: int | None = None,
|
||||
) -> dict:
|
||||
filtered_df = self._prepare_filtered_df(df, filters)
|
||||
|
||||
return {
|
||||
"average_emotion_by_topic": self.emotional_analysis.avg_emotion_by_topic(filtered_df),
|
||||
"overall_emotion_average": self.emotional_analysis.overall_emotion_average(filtered_df),
|
||||
"dominant_emotion_distribution": self.emotional_analysis.dominant_emotion_distribution(filtered_df),
|
||||
"emotion_by_source": self.emotional_analysis.emotion_by_source(filtered_df)
|
||||
}
|
||||
|
||||
def user(
|
||||
self,
|
||||
df: pd.DataFrame,
|
||||
filters: dict | None = None,
|
||||
dataset_id: int | None = None,
|
||||
) -> dict:
|
||||
filtered_df = self._prepare_filtered_df(df, filters)
|
||||
|
||||
return {
|
||||
"top_users": self.user_analysis.top_users(filtered_df),
|
||||
"users": self.user_analysis.per_user_analysis(filtered_df)
|
||||
}
|
||||
|
||||
def interactional(
|
||||
self,
|
||||
df: pd.DataFrame,
|
||||
filters: dict | None = None,
|
||||
dataset_id: int | None = None,
|
||||
) -> dict:
|
||||
filtered_df = self._prepare_filtered_df(df, filters)
|
||||
|
||||
return {
|
||||
"top_interaction_pairs": self.interaction_analysis.top_interaction_pairs(filtered_df, top_n=100),
|
||||
"interaction_graph": self.interaction_analysis.interaction_graph(filtered_df),
|
||||
"conversation_concentration": self.interaction_analysis.conversation_concentration(filtered_df)
|
||||
}
|
||||
|
||||
def cultural(
|
||||
self,
|
||||
df: pd.DataFrame,
|
||||
filters: dict | None = None,
|
||||
dataset_id: int | None = None,
|
||||
) -> dict:
|
||||
filtered_df = self._prepare_filtered_df(df, filters)
|
||||
|
||||
return {
|
||||
"identity_markers": self.cultural_analysis.get_identity_markers(filtered_df),
|
||||
"stance_markers": self.cultural_analysis.get_stance_markers(filtered_df),
|
||||
"avg_emotion_per_entity": self.cultural_analysis.get_avg_emotions_per_entity(filtered_df)
|
||||
}
|
||||
|
||||
def summary(
|
||||
self,
|
||||
df: pd.DataFrame,
|
||||
filters: dict | None = None,
|
||||
dataset_id: int | None = None,
|
||||
) -> dict:
|
||||
filtered_df = self._prepare_filtered_df(df, filters)
|
||||
|
||||
return self.summary_analysis.summary(filtered_df)
|
||||
64
server/analysis/summary.py
Normal file
@@ -0,0 +1,64 @@
|
||||
import pandas as pd
|
||||
|
||||
|
||||
class SummaryAnalysis:
|
||||
def total_events(self, df: pd.DataFrame) -> int:
|
||||
return int(len(df))
|
||||
|
||||
def total_posts(self, df: pd.DataFrame) -> int:
|
||||
return int(len(df[df["type"] == "post"]))
|
||||
|
||||
def total_comments(self, df: pd.DataFrame) -> int:
|
||||
return int(len(df[df["type"] == "comment"]))
|
||||
|
||||
def unique_users(self, df: pd.DataFrame) -> int:
|
||||
return int(len(df["author"].dropna().unique()))
|
||||
|
||||
def comments_per_post(self, total_comments: int, total_posts: int) -> float:
|
||||
return round(total_comments / max(total_posts, 1), 2)
|
||||
|
||||
def lurker_ratio(self, df: pd.DataFrame) -> float:
|
||||
events_per_user = df.groupby("author").size()
|
||||
return round((events_per_user == 1).mean(), 2)
|
||||
|
||||
def time_range(self, df: pd.DataFrame) -> dict:
|
||||
return {
|
||||
"start": int(df["dt"].min().timestamp()),
|
||||
"end": int(df["dt"].max().timestamp()),
|
||||
}
|
||||
|
||||
def sources(self, df: pd.DataFrame) -> list:
|
||||
return df["source"].dropna().unique().tolist()
|
||||
|
||||
def empty_summary(self) -> dict:
|
||||
return {
|
||||
"total_events": 0,
|
||||
"total_posts": 0,
|
||||
"total_comments": 0,
|
||||
"unique_users": 0,
|
||||
"comments_per_post": 0,
|
||||
"lurker_ratio": 0,
|
||||
"time_range": {
|
||||
"start": None,
|
||||
"end": None,
|
||||
},
|
||||
"sources": [],
|
||||
}
|
||||
|
||||
def summary(self, df: pd.DataFrame) -> dict:
|
||||
if df.empty:
|
||||
return self.empty_summary()
|
||||
|
||||
total_posts = self.total_posts(df)
|
||||
total_comments = self.total_comments(df)
|
||||
|
||||
return {
|
||||
"total_events": self.total_events(df),
|
||||
"total_posts": total_posts,
|
||||
"total_comments": total_comments,
|
||||
"unique_users": self.unique_users(df),
|
||||
"comments_per_post": self.comments_per_post(total_comments, total_posts),
|
||||
"lurker_ratio": self.lurker_ratio(df),
|
||||
"time_range": self.time_range(df),
|
||||
"sources": self.sources(df),
|
||||
}
|
||||
152
server/analysis/user.py
Normal file
@@ -0,0 +1,152 @@
|
||||
import pandas as pd
|
||||
import re
|
||||
|
||||
from collections import Counter
|
||||
|
||||
|
||||
class UserAnalysis:
|
||||
def __init__(self, word_exclusions: set[str]):
|
||||
self.word_exclusions = word_exclusions
|
||||
|
||||
def _tokenize(self, text: str):
|
||||
tokens = re.findall(r"\b[a-z]{3,}\b", text)
|
||||
return [t for t in tokens if t not in self.word_exclusions]
|
||||
|
||||
def _vocab_richness_per_user(
|
||||
self, df: pd.DataFrame, min_words: int = 20, top_most_used_words: int = 100
|
||||
) -> list:
|
||||
df = df.copy()
|
||||
df["content"] = df["content"].fillna("").astype(str).str.lower()
|
||||
df["tokens"] = df["content"].apply(self._tokenize)
|
||||
|
||||
rows = []
|
||||
for author, group in df.groupby("author"):
|
||||
all_tokens = [t for tokens in group["tokens"] for t in tokens]
|
||||
|
||||
total_words = len(all_tokens)
|
||||
unique_words = len(set(all_tokens))
|
||||
events = len(group)
|
||||
|
||||
# Min amount of words for a user, any less than this might give weird results
|
||||
if total_words < min_words:
|
||||
continue
|
||||
|
||||
# 100% = they never reused a word (excluding stop words)
|
||||
vocab_richness = unique_words / total_words
|
||||
avg_words = total_words / max(events, 1)
|
||||
|
||||
counts = Counter(all_tokens)
|
||||
top_words = [
|
||||
{"word": w, "count": int(c)}
|
||||
for w, c in counts.most_common(top_most_used_words)
|
||||
]
|
||||
|
||||
rows.append(
|
||||
{
|
||||
"author": author,
|
||||
"events": int(events),
|
||||
"total_words": int(total_words),
|
||||
"unique_words": int(unique_words),
|
||||
"vocab_richness": round(vocab_richness, 3),
|
||||
"avg_words_per_event": round(avg_words, 2),
|
||||
"top_words": top_words,
|
||||
}
|
||||
)
|
||||
|
||||
rows = sorted(rows, key=lambda x: x["vocab_richness"], reverse=True)
|
||||
|
||||
return rows
|
||||
|
||||
def top_users(self, df: pd.DataFrame) -> list:
|
||||
counts = df.groupby(["author", "source"]).size().sort_values(ascending=False)
|
||||
|
||||
top_users = [
|
||||
{"author": author, "source": source, "count": int(count)}
|
||||
for (author, source), count in counts.items()
|
||||
]
|
||||
|
||||
return top_users
|
||||
|
||||
def per_user_analysis(self, df: pd.DataFrame) -> dict:
|
||||
per_user = df.groupby(["author", "type"]).size().unstack(fill_value=0)
|
||||
|
||||
emotion_cols = [col for col in df.columns if col.startswith("emotion_")]
|
||||
dominant_topic_by_author = {}
|
||||
|
||||
avg_emotions_by_author = {}
|
||||
if emotion_cols:
|
||||
avg_emotions = df.groupby("author")[emotion_cols].mean().fillna(0.0)
|
||||
avg_emotions_by_author = {
|
||||
author: {emotion: float(score) for emotion, score in row.items()}
|
||||
for author, row in avg_emotions.iterrows()
|
||||
}
|
||||
|
||||
if "topic" in df.columns:
|
||||
topic_df = df[
|
||||
df["topic"].notna()
|
||||
& (df["topic"] != "")
|
||||
& (df["topic"] != "Misc")
|
||||
]
|
||||
if not topic_df.empty:
|
||||
topic_counts = (
|
||||
topic_df.groupby(["author", "topic"])
|
||||
.size()
|
||||
.reset_index(name="count")
|
||||
.sort_values(
|
||||
["author", "count", "topic"],
|
||||
ascending=[True, False, True],
|
||||
)
|
||||
.drop_duplicates(subset=["author"])
|
||||
)
|
||||
dominant_topic_by_author = {
|
||||
row["author"]: {
|
||||
"topic": row["topic"],
|
||||
"count": int(row["count"]),
|
||||
}
|
||||
for _, row in topic_counts.iterrows()
|
||||
}
|
||||
|
||||
# ensure columns always exist
|
||||
for col in ("post", "comment"):
|
||||
if col not in per_user.columns:
|
||||
per_user[col] = 0
|
||||
|
||||
per_user["comment_post_ratio"] = per_user["comment"] / per_user["post"].replace(
|
||||
0, 1
|
||||
)
|
||||
per_user["comment_share"] = per_user["comment"] / (
|
||||
per_user["post"] + per_user["comment"]
|
||||
).replace(0, 1)
|
||||
per_user = per_user.sort_values("comment_post_ratio", ascending=True)
|
||||
per_user_records = per_user.reset_index().to_dict(orient="records")
|
||||
|
||||
vocab_rows = self._vocab_richness_per_user(df)
|
||||
vocab_by_author = {row["author"]: row for row in vocab_rows}
|
||||
|
||||
# merge vocab richness + per_user information
|
||||
merged_users = []
|
||||
for row in per_user_records:
|
||||
author = row["author"]
|
||||
merged_users.append(
|
||||
{
|
||||
"author": author,
|
||||
"post": int(row.get("post", 0)),
|
||||
"comment": int(row.get("comment", 0)),
|
||||
"comment_post_ratio": float(row.get("comment_post_ratio", 0)),
|
||||
"comment_share": float(row.get("comment_share", 0)),
|
||||
"avg_emotions": avg_emotions_by_author.get(author, {}),
|
||||
"dominant_topic": dominant_topic_by_author.get(author),
|
||||
"vocab": vocab_by_author.get(
|
||||
author,
|
||||
{
|
||||
"vocab_richness": 0,
|
||||
"avg_words_per_event": 0,
|
||||
"top_words": [],
|
||||
},
|
||||
),
|
||||
}
|
||||
)
|
||||
|
||||
merged_users.sort(key=lambda u: u["comment_post_ratio"])
|
||||
|
||||
return merged_users
|
||||
627
server/app.py
@@ -1,4 +1,7 @@
|
||||
import os
|
||||
import pandas as pd
|
||||
import traceback
|
||||
import json
|
||||
|
||||
from dotenv import load_dotenv
|
||||
from flask import Flask, jsonify, request
|
||||
@@ -11,22 +14,22 @@ from flask_jwt_extended import (
|
||||
get_jwt_identity,
|
||||
)
|
||||
|
||||
from server.stat_gen import StatGen
|
||||
from server.dataset_processor import DatasetProcessor
|
||||
from db.database import PostgresConnector
|
||||
from server.auth import AuthManager
|
||||
|
||||
import pandas as pd
|
||||
import traceback
|
||||
import json
|
||||
from server.analysis.stat_gen import StatGen
|
||||
from server.exceptions import NotAuthorisedException, NonExistentDatasetException
|
||||
from server.db.database import PostgresConnector
|
||||
from server.core.auth import AuthManager
|
||||
from server.core.datasets import DatasetManager
|
||||
from server.utils import get_request_filters, get_env
|
||||
from server.queue.tasks import process_dataset, fetch_and_process_dataset
|
||||
from server.connectors.registry import get_available_connectors, get_connector_metadata
|
||||
|
||||
app = Flask(__name__)
|
||||
db = PostgresConnector()
|
||||
|
||||
# Env Variables
|
||||
load_dotenv()
|
||||
frontend_url = os.getenv("FRONTEND_URL", "http://localhost:5173")
|
||||
jwt_secret_key = os.getenv("JWT_SECRET_KEY", "super-secret-change-this")
|
||||
max_fetch_limit = int(get_env("MAX_FETCH_LIMIT"))
|
||||
frontend_url = get_env("FRONTEND_URL")
|
||||
jwt_secret_key = get_env("JWT_SECRET_KEY")
|
||||
jwt_access_token_expires = int(
|
||||
os.getenv("JWT_ACCESS_TOKEN_EXPIRES", 1200)
|
||||
) # Default to 20 minutes
|
||||
@@ -36,11 +39,41 @@ CORS(app, resources={r"/*": {"origins": frontend_url}})
|
||||
app.config["JWT_SECRET_KEY"] = jwt_secret_key
|
||||
app.config["JWT_ACCESS_TOKEN_EXPIRES"] = jwt_access_token_expires
|
||||
|
||||
# Security
|
||||
bcrypt = Bcrypt(app)
|
||||
jwt = JWTManager(app)
|
||||
auth_manager = AuthManager(db, bcrypt)
|
||||
|
||||
# Helper Objects
|
||||
db = PostgresConnector()
|
||||
auth_manager = AuthManager(db, bcrypt)
|
||||
dataset_manager = DatasetManager(db)
|
||||
stat_gen = StatGen()
|
||||
connectors = get_available_connectors()
|
||||
|
||||
# Default Files
|
||||
with open("server/topics.json") as f:
|
||||
default_topic_list = json.load(f)
|
||||
|
||||
|
||||
def normalize_topics(topics):
|
||||
if not isinstance(topics, dict) or len(topics) == 0:
|
||||
return None
|
||||
|
||||
normalized = {}
|
||||
|
||||
for topic_name, topic_keywords in topics.items():
|
||||
if not isinstance(topic_name, str) or not isinstance(topic_keywords, str):
|
||||
return None
|
||||
|
||||
clean_name = topic_name.strip()
|
||||
clean_keywords = topic_keywords.strip()
|
||||
|
||||
if not clean_name or not clean_keywords:
|
||||
return None
|
||||
|
||||
normalized[clean_name] = clean_keywords
|
||||
|
||||
return normalized
|
||||
|
||||
|
||||
@app.route("/register", methods=["POST"])
|
||||
@@ -65,7 +98,7 @@ def register_user():
|
||||
return jsonify({"error": str(e)}), 400
|
||||
except Exception as e:
|
||||
print(traceback.format_exc())
|
||||
return jsonify({"error": f"An unexpected error occurred: {str(e)}"}), 500
|
||||
return jsonify({"error": f"An unexpected error occurred"}), 500
|
||||
|
||||
print(f"Registered new user: {username}")
|
||||
return jsonify({"message": f"User '{username}' registered successfully"}), 200
|
||||
@@ -90,7 +123,7 @@ def login_user():
|
||||
return jsonify({"error": "Invalid username or password"}), 401
|
||||
except Exception as e:
|
||||
print(traceback.format_exc())
|
||||
return jsonify({"error": f"An unexpected error occurred: {str(e)}"}), 500
|
||||
return jsonify({"error": f"An unexpected error occurred"}), 500
|
||||
|
||||
|
||||
@app.route("/profile", methods=["GET"])
|
||||
@@ -98,12 +131,126 @@ def login_user():
|
||||
def profile():
|
||||
current_user = get_jwt_identity()
|
||||
|
||||
return jsonify(
|
||||
message="Access granted", user=auth_manager.get_user_by_id(current_user)
|
||||
), 200
|
||||
return (
|
||||
jsonify(
|
||||
message="Access granted", user=auth_manager.get_user_by_id(current_user)
|
||||
),
|
||||
200,
|
||||
)
|
||||
|
||||
|
||||
@app.route("/upload", methods=["POST"])
|
||||
@app.route("/user/datasets")
|
||||
@jwt_required()
|
||||
def get_user_datasets():
|
||||
current_user = int(get_jwt_identity())
|
||||
return jsonify(dataset_manager.get_user_datasets(current_user)), 200
|
||||
|
||||
|
||||
@app.route("/datasets/sources", methods=["GET"])
|
||||
def get_dataset_sources():
|
||||
list_metadata = list(get_connector_metadata().values())
|
||||
return jsonify(list_metadata)
|
||||
|
||||
|
||||
@app.route("/datasets/fetch", methods=["POST"])
|
||||
@jwt_required()
|
||||
def fetch_data():
|
||||
data = request.get_json()
|
||||
connector_metadata = get_connector_metadata()
|
||||
|
||||
# Strong validation needed, otherwise data goes to Celery and crashes silently
|
||||
if not data or "sources" not in data:
|
||||
return jsonify({"error": "Sources must be provided"}), 400
|
||||
|
||||
if "name" not in data or not str(data["name"]).strip():
|
||||
return jsonify({"error": "Dataset name is required"}), 400
|
||||
|
||||
dataset_name = data["name"].strip()
|
||||
user_id = int(get_jwt_identity())
|
||||
custom_topics = data.get("topics")
|
||||
topics_for_processing = default_topic_list
|
||||
|
||||
source_configs = data["sources"]
|
||||
|
||||
if not isinstance(source_configs, list) or len(source_configs) == 0:
|
||||
return jsonify({"error": "Sources must be a non-empty list"}), 400
|
||||
|
||||
for source in source_configs:
|
||||
if not isinstance(source, dict):
|
||||
return jsonify({"error": "Each source must be an object"}), 400
|
||||
|
||||
if "name" not in source:
|
||||
return jsonify({"error": "Each source must contain a name"}), 400
|
||||
|
||||
name = source["name"]
|
||||
limit = source.get("limit", 1000)
|
||||
category = source.get("category")
|
||||
search = source.get("search")
|
||||
|
||||
if limit:
|
||||
try:
|
||||
limit = int(limit)
|
||||
except (ValueError, TypeError):
|
||||
return jsonify({"error": "Limit must be an integer"}), 400
|
||||
|
||||
if limit > 1000:
|
||||
limit = 1000
|
||||
|
||||
if name not in connector_metadata:
|
||||
return jsonify({"error": "Source not supported"}), 400
|
||||
|
||||
if search and not connector_metadata[name]["search_enabled"]:
|
||||
return jsonify({"error": f"Source {name} does not support search"}), 400
|
||||
|
||||
if category and not connector_metadata[name]["categories_enabled"]:
|
||||
return jsonify({"error": f"Source {name} does not support categories"}), 400
|
||||
|
||||
# if category and not connectors[name]().category_exists(category):
|
||||
# return jsonify({"error": f"Category does not exist for {name}"}), 400
|
||||
|
||||
if custom_topics is not None:
|
||||
normalized_topics = normalize_topics(custom_topics)
|
||||
if not normalized_topics:
|
||||
return (
|
||||
jsonify(
|
||||
{
|
||||
"error": "Topics must be a non-empty JSON object with non-empty string keys and values"
|
||||
}
|
||||
),
|
||||
400,
|
||||
)
|
||||
|
||||
topics_for_processing = normalized_topics
|
||||
|
||||
try:
|
||||
dataset_id = dataset_manager.save_dataset_info(
|
||||
user_id, dataset_name, topics_for_processing
|
||||
)
|
||||
|
||||
dataset_manager.set_dataset_status(
|
||||
dataset_id,
|
||||
"fetching",
|
||||
f"Data is being fetched from {', '.join(source['name'] for source in source_configs)}",
|
||||
)
|
||||
|
||||
fetch_and_process_dataset.delay(dataset_id, source_configs, topics_for_processing)
|
||||
except Exception:
|
||||
print(traceback.format_exc())
|
||||
return jsonify({"error": "Failed to queue dataset processing"}), 500
|
||||
|
||||
return (
|
||||
jsonify(
|
||||
{
|
||||
"message": "Dataset queued for processing",
|
||||
"dataset_id": dataset_id,
|
||||
"status": "processing",
|
||||
}
|
||||
),
|
||||
202,
|
||||
)
|
||||
|
||||
|
||||
@app.route("/datasets/upload", methods=["POST"])
|
||||
@jwt_required()
|
||||
def upload_data():
|
||||
if "posts" not in request.files or "topics" not in request.files:
|
||||
@@ -111,244 +258,350 @@ def upload_data():
|
||||
|
||||
post_file = request.files["posts"]
|
||||
topic_file = request.files["topics"]
|
||||
dataset_name = (request.form.get("name") or "").strip()
|
||||
|
||||
if post_file.filename == "" or topic_file == "":
|
||||
if not dataset_name:
|
||||
return jsonify({"error": "Missing required dataset name"}), 400
|
||||
|
||||
if post_file.filename == "" or topic_file.filename == "":
|
||||
return jsonify({"error": "Empty filename"}), 400
|
||||
|
||||
if not post_file.filename.endswith(".jsonl") or not topic_file.filename.endswith(
|
||||
".json"
|
||||
):
|
||||
return jsonify(
|
||||
{"error": "Invalid file type. Only .jsonl and .json files are allowed."}
|
||||
), 400
|
||||
return (
|
||||
jsonify(
|
||||
{"error": "Invalid file type. Only .jsonl and .json files are allowed."}
|
||||
),
|
||||
400,
|
||||
)
|
||||
|
||||
try:
|
||||
current_user = get_jwt_identity()
|
||||
current_user = int(get_jwt_identity())
|
||||
|
||||
posts_df = pd.read_json(post_file, lines=True, convert_dates=False)
|
||||
topics = json.load(topic_file)
|
||||
|
||||
processor = DatasetProcessor(posts_df, topics)
|
||||
enriched_df = processor.enrich()
|
||||
dataset_id = db.save_dataset_info(
|
||||
current_user, f"dataset_{current_user}", topics
|
||||
dataset_id = dataset_manager.save_dataset_info(
|
||||
current_user, dataset_name, topics
|
||||
)
|
||||
db.save_dataset_content(dataset_id, enriched_df)
|
||||
|
||||
return jsonify(
|
||||
{"message": "File uploaded successfully", "event_count": len(enriched_df), "dataset_id": dataset_id}
|
||||
), 200
|
||||
process_dataset.delay(dataset_id, posts_df.to_dict(orient="records"), topics)
|
||||
|
||||
return (
|
||||
jsonify(
|
||||
{
|
||||
"message": "Dataset queued for processing",
|
||||
"dataset_id": dataset_id,
|
||||
"status": "processing",
|
||||
}
|
||||
),
|
||||
202,
|
||||
)
|
||||
except ValueError as e:
|
||||
return jsonify({"error": f"Failed to read JSONL file: {str(e)}"}), 400
|
||||
return jsonify({"error": f"Failed to read JSONL file"}), 400
|
||||
except Exception as e:
|
||||
return jsonify({"error": f"An unexpected error occurred: {str(e)}"}), 500
|
||||
return jsonify({"error": f"An unexpected error occurred"}), 500
|
||||
|
||||
|
||||
@app.route("/dataset/<int:dataset_id>", methods=["GET"])
|
||||
@jwt_required()
|
||||
def get_dataset(dataset_id):
|
||||
current_user = get_jwt_identity()
|
||||
dataset = db.get_dataset_info(dataset_id)
|
||||
|
||||
if dataset.get("user_id") != int(current_user):
|
||||
return jsonify({"error": "Unauthorized access to dataset"}), 403
|
||||
|
||||
dataset_content = db.get_dataset_content(dataset_id)
|
||||
|
||||
if dataset_content.empty:
|
||||
return jsonify({"error": "Dataset content not found"}), 404
|
||||
|
||||
return jsonify(dataset_content.to_dict(orient="records")), 200
|
||||
|
||||
|
||||
@app.route("/dataset/<int:dataset_id>/content", methods=["GET"])
|
||||
@jwt_required()
|
||||
def content_endpoint(dataset_id):
|
||||
current_user = get_jwt_identity()
|
||||
dataset = db.get_dataset_info(dataset_id)
|
||||
|
||||
if dataset.get("user_id") != int(current_user):
|
||||
return jsonify({"error": "Unauthorized access to dataset"}), 403
|
||||
|
||||
dataset_content = db.get_dataset_content(dataset_id)
|
||||
try:
|
||||
return jsonify(stat_gen.get_content_analysis(dataset_content)), 200
|
||||
user_id = int(get_jwt_identity())
|
||||
|
||||
if not dataset_manager.authorize_user_dataset(dataset_id, user_id):
|
||||
raise NotAuthorisedException(
|
||||
"This user is not authorised to access this dataset"
|
||||
)
|
||||
|
||||
dataset_info = dataset_manager.get_dataset_info(dataset_id)
|
||||
included_cols = {"id", "name", "created_at"}
|
||||
|
||||
return jsonify({k: dataset_info[k] for k in included_cols}), 200
|
||||
except NotAuthorisedException:
|
||||
return jsonify({"error": "User is not authorised to access this content"}), 403
|
||||
except NonExistentDatasetException:
|
||||
return jsonify({"error": "Dataset does not exist"}), 404
|
||||
except Exception:
|
||||
print(traceback.format_exc())
|
||||
return jsonify({"error": "An unexpected error occured"}), 500
|
||||
|
||||
|
||||
@app.route("/dataset/<int:dataset_id>", methods=["PATCH"])
|
||||
@jwt_required()
|
||||
def update_dataset(dataset_id):
|
||||
try:
|
||||
user_id = int(get_jwt_identity())
|
||||
|
||||
if not dataset_manager.authorize_user_dataset(dataset_id, user_id):
|
||||
raise NotAuthorisedException(
|
||||
"This user is not authorised to access this dataset"
|
||||
)
|
||||
|
||||
body = request.get_json()
|
||||
new_name = body.get("name")
|
||||
|
||||
if not new_name or not new_name.strip():
|
||||
return jsonify({"error": "A valid name must be provided"}), 400
|
||||
|
||||
dataset_manager.update_dataset_name(dataset_id, new_name.strip())
|
||||
return (
|
||||
jsonify(
|
||||
{"message": f"Dataset {dataset_id} renamed to '{new_name.strip()}'"}
|
||||
),
|
||||
200,
|
||||
)
|
||||
except NotAuthorisedException:
|
||||
return jsonify({"error": "User is not authorised to access this content"}), 403
|
||||
except NonExistentDatasetException:
|
||||
return jsonify({"error": "Dataset does not exist"}), 404
|
||||
except Exception:
|
||||
print(traceback.format_exc())
|
||||
return jsonify({"error": "An unexpected error occurred"}), 500
|
||||
|
||||
|
||||
@app.route("/dataset/<int:dataset_id>", methods=["DELETE"])
|
||||
@jwt_required()
|
||||
def delete_dataset(dataset_id):
|
||||
try:
|
||||
user_id = int(get_jwt_identity())
|
||||
|
||||
if not dataset_manager.authorize_user_dataset(dataset_id, user_id):
|
||||
raise NotAuthorisedException(
|
||||
"This user is not authorised to access this dataset"
|
||||
)
|
||||
|
||||
dataset_manager.delete_dataset_info(dataset_id)
|
||||
dataset_manager.delete_dataset_content(dataset_id)
|
||||
return (
|
||||
jsonify(
|
||||
{
|
||||
"message": f"Dataset {dataset_id} metadata and content successfully deleted"
|
||||
}
|
||||
),
|
||||
200,
|
||||
)
|
||||
except NotAuthorisedException:
|
||||
return jsonify({"error": "User is not authorised to access this content"}), 403
|
||||
except NonExistentDatasetException:
|
||||
return jsonify({"error": "Dataset does not exist"}), 404
|
||||
except Exception:
|
||||
print(traceback.format_exc())
|
||||
return jsonify({"error": "An unexpected error occured"}), 500
|
||||
|
||||
|
||||
@app.route("/dataset/<int:dataset_id>/status", methods=["GET"])
|
||||
@jwt_required()
|
||||
def get_dataset_status(dataset_id):
|
||||
try:
|
||||
user_id = int(get_jwt_identity())
|
||||
|
||||
if not dataset_manager.authorize_user_dataset(dataset_id, user_id):
|
||||
raise NotAuthorisedException(
|
||||
"This user is not authorised to access this dataset"
|
||||
)
|
||||
|
||||
dataset_status = dataset_manager.get_dataset_status(dataset_id)
|
||||
return jsonify(dataset_status), 200
|
||||
except NotAuthorisedException:
|
||||
return jsonify({"error": "User is not authorised to access this content"}), 403
|
||||
except NonExistentDatasetException:
|
||||
return jsonify({"error": "Dataset does not exist"}), 404
|
||||
except Exception:
|
||||
print(traceback.format_exc())
|
||||
return jsonify({"error": "An unexpected error occured"}), 500
|
||||
|
||||
|
||||
@app.route("/dataset/<int:dataset_id>/linguistic", methods=["GET"])
|
||||
@jwt_required()
|
||||
def get_linguistic_analysis(dataset_id):
|
||||
try:
|
||||
user_id = int(get_jwt_identity())
|
||||
if not dataset_manager.authorize_user_dataset(dataset_id, user_id):
|
||||
raise NotAuthorisedException(
|
||||
"This user is not authorised to access this dataset"
|
||||
)
|
||||
|
||||
dataset_content = dataset_manager.get_dataset_content(dataset_id)
|
||||
filters = get_request_filters()
|
||||
return jsonify(stat_gen.linguistic(dataset_content, filters, dataset_id=dataset_id)), 200
|
||||
except NotAuthorisedException:
|
||||
return jsonify({"error": "User is not authorised to access this content"}), 403
|
||||
except NonExistentDatasetException:
|
||||
return jsonify({"error": "Dataset does not exist"}), 404
|
||||
except ValueError as e:
|
||||
return jsonify({"error": f"Malformed or missing data: {str(e)}"}), 400
|
||||
return jsonify({"error": f"Malformed or missing data"}), 400
|
||||
except Exception as e:
|
||||
print(traceback.format_exc())
|
||||
return jsonify({"error": f"An unexpected error occurred: {str(e)}"}), 500
|
||||
return jsonify({"error": f"An unexpected error occurred"}), 500
|
||||
|
||||
|
||||
@app.route("/dataset/<int:dataset_id>/emotional", methods=["GET"])
|
||||
@jwt_required()
|
||||
def get_emotional_analysis(dataset_id):
|
||||
try:
|
||||
user_id = int(get_jwt_identity())
|
||||
if not dataset_manager.authorize_user_dataset(dataset_id, user_id):
|
||||
raise NotAuthorisedException(
|
||||
"This user is not authorised to access this dataset"
|
||||
)
|
||||
|
||||
dataset_content = dataset_manager.get_dataset_content(dataset_id)
|
||||
filters = get_request_filters()
|
||||
return jsonify(stat_gen.emotional(dataset_content, filters, dataset_id=dataset_id)), 200
|
||||
except NotAuthorisedException:
|
||||
return jsonify({"error": "User is not authorised to access this content"}), 403
|
||||
except NonExistentDatasetException:
|
||||
return jsonify({"error": "Dataset does not exist"}), 404
|
||||
except ValueError as e:
|
||||
return jsonify({"error": f"Malformed or missing data"}), 400
|
||||
except Exception as e:
|
||||
print(traceback.format_exc())
|
||||
return jsonify({"error": f"An unexpected error occurred"}), 500
|
||||
|
||||
|
||||
@app.route("/dataset/<int:dataset_id>/summary", methods=["GET"])
|
||||
@jwt_required()
|
||||
def get_summary(dataset_id):
|
||||
current_user = get_jwt_identity()
|
||||
dataset = db.get_dataset_info(dataset_id)
|
||||
|
||||
if dataset.get("user_id") != int(current_user):
|
||||
return jsonify({"error": "Unauthorized access to dataset"}), 403
|
||||
|
||||
dataset_content = db.get_dataset_content(dataset_id)
|
||||
|
||||
try:
|
||||
return jsonify(stat_gen.summary(dataset_content)), 200
|
||||
user_id = int(get_jwt_identity())
|
||||
if not dataset_manager.authorize_user_dataset(dataset_id, user_id):
|
||||
raise NotAuthorisedException(
|
||||
"This user is not authorised to access this dataset"
|
||||
)
|
||||
|
||||
dataset_content = dataset_manager.get_dataset_content(dataset_id)
|
||||
filters = get_request_filters()
|
||||
return jsonify(stat_gen.summary(dataset_content, filters, dataset_id=dataset_id)), 200
|
||||
except NotAuthorisedException:
|
||||
return jsonify({"error": "User is not authorised to access this content"}), 403
|
||||
except NonExistentDatasetException:
|
||||
return jsonify({"error": "Dataset does not exist"}), 404
|
||||
except ValueError as e:
|
||||
return jsonify({"error": f"Malformed or missing data: {str(e)}"}), 400
|
||||
return jsonify({"error": f"Malformed or missing data"}), 400
|
||||
except Exception as e:
|
||||
print(traceback.format_exc())
|
||||
return jsonify({"error": f"An unexpected error occurred: {str(e)}"}), 500
|
||||
return jsonify({"error": f"An unexpected error occurred"}), 500
|
||||
|
||||
|
||||
@app.route("/dataset/<int:dataset_id>/time", methods=["GET"])
|
||||
@app.route("/dataset/<int:dataset_id>/temporal", methods=["GET"])
|
||||
@jwt_required()
|
||||
def get_time_analysis(dataset_id):
|
||||
current_user = get_jwt_identity()
|
||||
dataset = db.get_dataset_info(dataset_id)
|
||||
|
||||
if dataset.get("user_id") != int(current_user):
|
||||
return jsonify({"error": "Unauthorized access to dataset"}), 403
|
||||
|
||||
dataset_content = db.get_dataset_content(dataset_id)
|
||||
|
||||
def get_temporal_analysis(dataset_id):
|
||||
try:
|
||||
return jsonify(stat_gen.get_time_analysis(dataset_content)), 200
|
||||
user_id = int(get_jwt_identity())
|
||||
if not dataset_manager.authorize_user_dataset(dataset_id, user_id):
|
||||
raise NotAuthorisedException(
|
||||
"This user is not authorised to access this dataset"
|
||||
)
|
||||
|
||||
dataset_content = dataset_manager.get_dataset_content(dataset_id)
|
||||
filters = get_request_filters()
|
||||
return jsonify(stat_gen.temporal(dataset_content, filters, dataset_id=dataset_id)), 200
|
||||
except NotAuthorisedException:
|
||||
return jsonify({"error": "User is not authorised to access this content"}), 403
|
||||
except NonExistentDatasetException:
|
||||
return jsonify({"error": "Dataset does not exist"}), 404
|
||||
except ValueError as e:
|
||||
return jsonify({"error": f"Malformed or missing data: {str(e)}"}), 400
|
||||
return jsonify({"error": f"Malformed or missing data"}), 400
|
||||
except Exception as e:
|
||||
print(traceback.format_exc())
|
||||
return jsonify({"error": f"An unexpected error occurred: {str(e)}"}), 500
|
||||
return jsonify({"error": f"An unexpected error occurred"}), 500
|
||||
|
||||
|
||||
@app.route("/dataset/<int:dataset_id>/user", methods=["GET"])
|
||||
@jwt_required()
|
||||
def get_user_analysis(dataset_id):
|
||||
current_user = get_jwt_identity()
|
||||
dataset = db.get_dataset_info(dataset_id)
|
||||
|
||||
if dataset.get("user_id") != int(current_user):
|
||||
return jsonify({"error": "Unauthorized access to dataset"}), 403
|
||||
|
||||
dataset_content = db.get_dataset_content(dataset_id)
|
||||
|
||||
try:
|
||||
return jsonify(stat_gen.get_user_analysis(dataset_content)), 200
|
||||
user_id = int(get_jwt_identity())
|
||||
if not dataset_manager.authorize_user_dataset(dataset_id, user_id):
|
||||
raise NotAuthorisedException(
|
||||
"This user is not authorised to access this dataset"
|
||||
)
|
||||
|
||||
dataset_content = dataset_manager.get_dataset_content(dataset_id)
|
||||
filters = get_request_filters()
|
||||
return jsonify(stat_gen.user(dataset_content, filters, dataset_id=dataset_id)), 200
|
||||
except NotAuthorisedException:
|
||||
return jsonify({"error": "User is not authorised to access this content"}), 403
|
||||
except NonExistentDatasetException:
|
||||
return jsonify({"error": "Dataset does not exist"}), 404
|
||||
except ValueError as e:
|
||||
return jsonify({"error": f"Malformed or missing data: {str(e)}"}), 400
|
||||
return jsonify({"error": f"Malformed or missing data"}), 400
|
||||
except Exception as e:
|
||||
print(traceback.format_exc())
|
||||
return jsonify({"error": f"An unexpected error occurred: {str(e)}"}), 500
|
||||
return jsonify({"error": f"An unexpected error occurred"}), 500
|
||||
|
||||
|
||||
@app.route("/dataset/<int:dataset_id>/cultural", methods=["GET"])
|
||||
@jwt_required()
|
||||
def get_cultural_analysis(dataset_id):
|
||||
current_user = get_jwt_identity()
|
||||
dataset = db.get_dataset_info(dataset_id)
|
||||
|
||||
if dataset.get("user_id") != int(current_user):
|
||||
return jsonify({"error": "Unauthorized access to dataset"}), 403
|
||||
|
||||
dataset_content = db.get_dataset_content(dataset_id)
|
||||
|
||||
try:
|
||||
return jsonify(stat_gen.get_cultural_analysis(dataset_content)), 200
|
||||
user_id = int(get_jwt_identity())
|
||||
if not dataset_manager.authorize_user_dataset(dataset_id, user_id):
|
||||
raise NotAuthorisedException(
|
||||
"This user is not authorised to access this dataset"
|
||||
)
|
||||
|
||||
dataset_content = dataset_manager.get_dataset_content(dataset_id)
|
||||
filters = get_request_filters()
|
||||
return jsonify(stat_gen.cultural(dataset_content, filters, dataset_id=dataset_id)), 200
|
||||
except NotAuthorisedException:
|
||||
return jsonify({"error": "User is not authorised to access this content"}), 403
|
||||
except NonExistentDatasetException:
|
||||
return jsonify({"error": "Dataset does not exist"}), 404
|
||||
except ValueError as e:
|
||||
return jsonify({"error": f"Malformed or missing data: {str(e)}"}), 400
|
||||
return jsonify({"error": f"Malformed or missing data"}), 400
|
||||
except Exception as e:
|
||||
print(traceback.format_exc())
|
||||
return jsonify({"error": f"An unexpected error occurred: {str(e)}"}), 500
|
||||
return jsonify({"error": f"An unexpected error occurred"}), 500
|
||||
|
||||
|
||||
@app.route("/dataset/<int:dataset_id>/interaction", methods=["GET"])
|
||||
@app.route("/dataset/<int:dataset_id>/interactional", methods=["GET"])
|
||||
@jwt_required()
|
||||
def get_interaction_analysis(dataset_id):
|
||||
current_user = get_jwt_identity()
|
||||
dataset = db.get_dataset_info(dataset_id)
|
||||
|
||||
if dataset.get("user_id") != int(current_user):
|
||||
return jsonify({"error": "Unauthorized access to dataset"}), 403
|
||||
|
||||
dataset_content = db.get_dataset_content(dataset_id)
|
||||
|
||||
try:
|
||||
return jsonify(stat_gen.get_interactional_analysis(dataset_content)), 200
|
||||
user_id = int(get_jwt_identity())
|
||||
if not dataset_manager.authorize_user_dataset(dataset_id, user_id):
|
||||
raise NotAuthorisedException(
|
||||
"This user is not authorised to access this dataset"
|
||||
)
|
||||
|
||||
dataset_content = dataset_manager.get_dataset_content(dataset_id)
|
||||
filters = get_request_filters()
|
||||
return jsonify(stat_gen.interactional(dataset_content, filters, dataset_id=dataset_id)), 200
|
||||
except NotAuthorisedException:
|
||||
return jsonify({"error": "User is not authorised to access this content"}), 403
|
||||
except NonExistentDatasetException:
|
||||
return jsonify({"error": "Dataset does not exist"}), 404
|
||||
except ValueError as e:
|
||||
return jsonify({"error": f"Malformed or missing data: {str(e)}"}), 400
|
||||
return jsonify({"error": f"Malformed or missing data"}), 400
|
||||
except Exception as e:
|
||||
print(traceback.format_exc())
|
||||
return jsonify({"error": f"An unexpected error occurred: {str(e)}"}), 500
|
||||
return jsonify({"error": f"An unexpected error occurred"}), 500
|
||||
|
||||
|
||||
# @app.route("/filter/query", methods=["POST"])
|
||||
# def filter_query():
|
||||
# if stat_obj is None:
|
||||
# return jsonify({"error": "No data uploaded"}), 400
|
||||
@app.route("/dataset/<int:dataset_id>/all", methods=["GET"])
|
||||
@jwt_required()
|
||||
def get_full_dataset(dataset_id: int):
|
||||
try:
|
||||
user_id = int(get_jwt_identity())
|
||||
if not dataset_manager.authorize_user_dataset(dataset_id, user_id):
|
||||
raise NotAuthorisedException(
|
||||
"This user is not authorised to access this dataset"
|
||||
)
|
||||
|
||||
# data = request.get_json(silent=True) or {}
|
||||
|
||||
# if "query" not in data:
|
||||
# return jsonify(stat_obj.df.to_dict(orient="records")), 200
|
||||
|
||||
# query = data["query"]
|
||||
# filtered_df = stat_obj.filter_by_query(query)
|
||||
|
||||
# return jsonify(filtered_df), 200
|
||||
|
||||
|
||||
# @app.route("/filter/time", methods=["POST"])
|
||||
# def filter_time():
|
||||
# if stat_obj is None:
|
||||
# return jsonify({"error": "No data uploaded"}), 400
|
||||
|
||||
# data = request.get_json(silent=True)
|
||||
# if not data:
|
||||
# return jsonify({"error": "Invalid or missing JSON body"}), 400
|
||||
|
||||
# if "start" not in data or "end" not in data:
|
||||
# return jsonify({"error": "Please include both start and end dates"}), 400
|
||||
|
||||
# try:
|
||||
# start = pd.to_datetime(data["start"], utc=True)
|
||||
# end = pd.to_datetime(data["end"], utc=True)
|
||||
# filtered_df = stat_obj.set_time_range(start, end)
|
||||
# return jsonify(filtered_df), 200
|
||||
# except Exception:
|
||||
# return jsonify({"error": "Invalid datetime format"}), 400
|
||||
|
||||
|
||||
# @app.route("/filter/sources", methods=["POST"])
|
||||
# def filter_sources():
|
||||
# if stat_obj is None:
|
||||
# return jsonify({"error": "No data uploaded"}), 400
|
||||
|
||||
# data = request.get_json(silent=True)
|
||||
# if not data:
|
||||
# return jsonify({"error": "Invalid or missing JSON body"}), 400
|
||||
|
||||
# if "sources" not in data:
|
||||
# return jsonify({"error": "Ensure sources hash map is in 'sources' key"}), 400
|
||||
|
||||
# try:
|
||||
# filtered_df = stat_obj.filter_data_sources(data["sources"])
|
||||
# return jsonify(filtered_df), 200
|
||||
# except ValueError:
|
||||
# return jsonify({"error": "Please enable at least one data source"}), 400
|
||||
# except Exception as e:
|
||||
# return jsonify({"error": "An unexpected server error occured: " + str(e)}), 500
|
||||
|
||||
|
||||
# @app.route("/filter/reset", methods=["GET"])
|
||||
# def reset_dataset():
|
||||
# if stat_obj is None:
|
||||
# return jsonify({"error": "No data uploaded"}), 400
|
||||
|
||||
# try:
|
||||
# stat_obj.reset_dataset()
|
||||
# return jsonify({"success": "Dataset successfully reset"})
|
||||
# except Exception as e:
|
||||
# print(traceback.format_exc())
|
||||
# return jsonify({"error": f"An unexpected error occurred: {str(e)}"}), 500
|
||||
dataset_content = dataset_manager.get_dataset_content(dataset_id)
|
||||
filters = get_request_filters()
|
||||
return jsonify(stat_gen.filter_dataset(dataset_content, filters)), 200
|
||||
except NotAuthorisedException:
|
||||
return jsonify({"error": "User is not authorised to access this content"}), 403
|
||||
except NonExistentDatasetException:
|
||||
return jsonify({"error": "Dataset does not exist"}), 404
|
||||
except ValueError as e:
|
||||
return jsonify({"error": f"Malformed or missing data"}), 400
|
||||
except Exception as e:
|
||||
print(traceback.format_exc())
|
||||
return jsonify({"error": f"An unexpected error occurred"}), 500
|
||||
|
||||
|
||||
if __name__ == "__main__":
|
||||
|
||||
@@ -1,29 +0,0 @@
|
||||
from db.database import PostgresConnector
|
||||
from flask_bcrypt import Bcrypt
|
||||
|
||||
class AuthManager:
|
||||
def __init__(self, db: PostgresConnector, bcrypt: Bcrypt):
|
||||
self.db = db
|
||||
self.bcrypt = bcrypt
|
||||
|
||||
def register_user(self, username, email, password):
|
||||
hashed_password = self.bcrypt.generate_password_hash(password).decode("utf-8")
|
||||
|
||||
if self.db.get_user_by_email(email):
|
||||
raise ValueError("Email already registered")
|
||||
|
||||
if self.db.get_user_by_username(username):
|
||||
raise ValueError("Username already taken")
|
||||
|
||||
self.db.save_user(username, email, hashed_password)
|
||||
|
||||
def authenticate_user(self, username, password):
|
||||
user = self.db.get_user_by_username(username)
|
||||
if user and self.bcrypt.check_password_hash(user['password_hash'], password):
|
||||
return user
|
||||
return None
|
||||
|
||||
def get_user_by_id(self, user_id):
|
||||
query = "SELECT id, username, email FROM users WHERE id = %s"
|
||||
result = self.db.execute(query, (user_id,), fetch=True)
|
||||
return result[0] if result else None
|
||||
24
server/connectors/base.py
Normal file
@@ -0,0 +1,24 @@
|
||||
from abc import ABC, abstractmethod
|
||||
from dto.post import Post
|
||||
import os
|
||||
|
||||
|
||||
class BaseConnector(ABC):
|
||||
source_name: str # machine readable
|
||||
display_name: str # human readablee
|
||||
required_env: list[str] = []
|
||||
|
||||
search_enabled: bool
|
||||
categories_enabled: bool
|
||||
|
||||
@classmethod
|
||||
def is_available(cls) -> bool:
|
||||
return all(os.getenv(var) for var in cls.required_env)
|
||||
|
||||
@abstractmethod
|
||||
def get_new_posts_by_search(
|
||||
self, search: str = None, category: str = None, post_limit: int = 10
|
||||
) -> list[Post]: ...
|
||||
|
||||
@abstractmethod
|
||||
def category_exists(self, category: str) -> bool: ...
|
||||
@@ -7,56 +7,94 @@ from dto.post import Post
|
||||
from dto.comment import Comment
|
||||
from bs4 import BeautifulSoup
|
||||
from concurrent.futures import ThreadPoolExecutor, as_completed
|
||||
from server.connectors.base import BaseConnector
|
||||
|
||||
logger = logging.getLogger(__name__)
|
||||
|
||||
HEADERS = {
|
||||
"User-Agent": "Mozilla/5.0 (compatible; ForumScraper/1.0)"
|
||||
}
|
||||
HEADERS = {"User-Agent": "Mozilla/5.0 (compatible; Digital-Ethnography-Aid/1.0)"}
|
||||
|
||||
class BoardsAPI(BaseConnector):
|
||||
source_name: str = "boards.ie"
|
||||
display_name: str = "Boards.ie"
|
||||
|
||||
categories_enabled: bool = True
|
||||
search_enabled: bool = False
|
||||
|
||||
class BoardsAPI:
|
||||
def __init__(self):
|
||||
self.url = "https://www.boards.ie"
|
||||
self.source_name = "Boards.ie"
|
||||
self.base_url = "https://www.boards.ie"
|
||||
|
||||
def get_new_category_posts(self, category: str, post_limit: int, comment_limit: int) -> list[Post]:
|
||||
def get_new_posts_by_search(
|
||||
self, search: str, category: str, post_limit: int
|
||||
) -> list[Post]:
|
||||
if search:
|
||||
raise NotImplementedError("Search not compatible with boards.ie")
|
||||
|
||||
if category:
|
||||
return self._get_posts(f"{self.base_url}/categories/{category}", post_limit)
|
||||
else:
|
||||
return self._get_posts(f"{self.base_url}/discussions", post_limit)
|
||||
|
||||
def category_exists(self, category: str) -> bool:
|
||||
if not category:
|
||||
return False
|
||||
|
||||
url = f"{self.base_url}/categories/{category}"
|
||||
|
||||
try:
|
||||
response = requests.head(url, headers=HEADERS, allow_redirects=True)
|
||||
|
||||
if response.status_code == 200:
|
||||
return True
|
||||
if response.status_code == 404:
|
||||
return False
|
||||
|
||||
# fallback if HEAD not supported
|
||||
response = requests.get(url, headers=HEADERS)
|
||||
return response.status_code == 200
|
||||
|
||||
except requests.RequestException as e:
|
||||
logger.error(f"Error checking category '{category}': {e}")
|
||||
return False
|
||||
|
||||
## Private
|
||||
def _get_posts(self, url, limit) -> list[Post]:
|
||||
urls = []
|
||||
current_page = 1
|
||||
|
||||
logger.info(f"Fetching posts from category: {category}")
|
||||
|
||||
while len(urls) < post_limit:
|
||||
url = f"{self.url}/categories/{category}/p{current_page}"
|
||||
while len(urls) < limit:
|
||||
url = f"{url}/p{current_page}"
|
||||
html = self._fetch_page(url)
|
||||
soup = BeautifulSoup(html, "html.parser")
|
||||
|
||||
logger.debug(f"Processing page {current_page} for category {category}")
|
||||
logger.debug(f"Processing page {current_page} for link: {url}")
|
||||
for a in soup.select("a.threadbit-threadlink"):
|
||||
if len(urls) >= post_limit:
|
||||
if len(urls) >= limit:
|
||||
break
|
||||
|
||||
href = a.get("href")
|
||||
if href:
|
||||
urls.append(href)
|
||||
|
||||
|
||||
current_page += 1
|
||||
|
||||
logger.debug(f"Fetched {len(urls)} post URLs from category {category}")
|
||||
logger.debug(f"Fetched {len(urls)} post URLs")
|
||||
|
||||
# Fetch post details for each URL and create Post objects
|
||||
posts = []
|
||||
|
||||
def fetch_and_parse(post_url):
|
||||
html = self._fetch_page(post_url)
|
||||
post = self._parse_thread(html, post_url, comment_limit)
|
||||
post = self._parse_thread(html, post_url)
|
||||
return post
|
||||
|
||||
with ThreadPoolExecutor(max_workers=30) as executor:
|
||||
with ThreadPoolExecutor(max_workers=5) as executor:
|
||||
futures = {executor.submit(fetch_and_parse, url): url for url in urls}
|
||||
|
||||
for i, future in enumerate(as_completed(futures)):
|
||||
post_url = futures[future]
|
||||
logger.debug(f"Fetching Post {i + 1} / {len(urls)} details from URL: {post_url}")
|
||||
logger.debug(
|
||||
f"Fetching Post {i + 1} / {len(urls)} details from URL: {post_url}"
|
||||
)
|
||||
try:
|
||||
post = future.result()
|
||||
posts.append(post)
|
||||
@@ -65,15 +103,14 @@ class BoardsAPI:
|
||||
|
||||
return posts
|
||||
|
||||
|
||||
def _fetch_page(self, url: str) -> str:
|
||||
response = requests.get(url, headers=HEADERS)
|
||||
response.raise_for_status()
|
||||
return response.text
|
||||
|
||||
def _parse_thread(self, html: str, post_url: str, comment_limit: int) -> Post:
|
||||
def _parse_thread(self, html: str, post_url: str) -> Post:
|
||||
soup = BeautifulSoup(html, "html.parser")
|
||||
|
||||
|
||||
# Author
|
||||
author_tag = soup.select_one(".userinfo-username-title")
|
||||
author = author_tag.text.strip() if author_tag else None
|
||||
@@ -82,10 +119,16 @@ class BoardsAPI:
|
||||
timestamp_tag = soup.select_one(".postbit-header")
|
||||
timestamp = None
|
||||
if timestamp_tag:
|
||||
match = re.search(r"\d{2}-\d{2}-\d{4}\s+\d{2}:\d{2}[AP]M", timestamp_tag.get_text())
|
||||
match = re.search(
|
||||
r"\d{2}-\d{2}-\d{4}\s+\d{2}:\d{2}[AP]M", timestamp_tag.get_text()
|
||||
)
|
||||
timestamp = match.group(0) if match else None
|
||||
# convert to unix epoch
|
||||
timestamp = datetime.datetime.strptime(timestamp, "%d-%m-%Y %I:%M%p").timestamp() if timestamp else None
|
||||
timestamp = (
|
||||
datetime.datetime.strptime(timestamp, "%d-%m-%Y %I:%M%p").timestamp()
|
||||
if timestamp
|
||||
else None
|
||||
)
|
||||
|
||||
# Post ID
|
||||
post_num = re.search(r"discussion/(\d+)", post_url)
|
||||
@@ -93,14 +136,16 @@ class BoardsAPI:
|
||||
|
||||
# Content
|
||||
content_tag = soup.select_one(".Message.userContent")
|
||||
content = content_tag.get_text(separator="\n", strip=True) if content_tag else None
|
||||
content = (
|
||||
content_tag.get_text(separator="\n", strip=True) if content_tag else None
|
||||
)
|
||||
|
||||
# Title
|
||||
title_tag = soup.select_one(".PageTitle h1")
|
||||
title = title_tag.text.strip() if title_tag else None
|
||||
|
||||
# Comments
|
||||
comments = self._parse_comments(post_url, post_num, comment_limit)
|
||||
comments = self._parse_comments(post_url, post_num)
|
||||
|
||||
post = Post(
|
||||
id=post_num,
|
||||
@@ -110,16 +155,16 @@ class BoardsAPI:
|
||||
url=post_url,
|
||||
timestamp=timestamp,
|
||||
source=self.source_name,
|
||||
comments=comments
|
||||
comments=comments,
|
||||
)
|
||||
|
||||
return post
|
||||
|
||||
def _parse_comments(self, url: str, post_id: str, comment_limit: int) -> list[Comment]:
|
||||
def _parse_comments(self, url: str, post_id: str) -> list[Comment]:
|
||||
comments = []
|
||||
current_url = url
|
||||
|
||||
while current_url and len(comments) < comment_limit:
|
||||
while current_url:
|
||||
html = self._fetch_page(current_url)
|
||||
page_comments = self._parse_page_comments(html, post_id)
|
||||
comments.extend(page_comments)
|
||||
@@ -128,9 +173,9 @@ class BoardsAPI:
|
||||
soup = BeautifulSoup(html, "html.parser")
|
||||
next_link = soup.find("a", class_="Next")
|
||||
|
||||
if next_link and next_link.get('href'):
|
||||
href = next_link.get('href')
|
||||
current_url = href if href.startswith('http') else self.url + href
|
||||
if next_link and next_link.get("href"):
|
||||
href = next_link.get("href")
|
||||
current_url = href if href.startswith("http") else url + href
|
||||
else:
|
||||
current_url = None
|
||||
|
||||
@@ -146,21 +191,29 @@ class BoardsAPI:
|
||||
comment_id = tag.get("id")
|
||||
|
||||
# Author
|
||||
user_elem = tag.find('span', class_='userinfo-username-title')
|
||||
user_elem = tag.find("span", class_="userinfo-username-title")
|
||||
username = user_elem.get_text(strip=True) if user_elem else None
|
||||
|
||||
# Timestamp
|
||||
date_elem = tag.find('span', class_='DateCreated')
|
||||
date_elem = tag.find("span", class_="DateCreated")
|
||||
timestamp = date_elem.get_text(strip=True) if date_elem else None
|
||||
timestamp = datetime.datetime.strptime(timestamp, "%d-%m-%Y %I:%M%p").timestamp() if timestamp else None
|
||||
timestamp = (
|
||||
datetime.datetime.strptime(timestamp, "%d-%m-%Y %I:%M%p").timestamp()
|
||||
if timestamp
|
||||
else None
|
||||
)
|
||||
|
||||
# Content
|
||||
message_div = tag.find('div', class_='Message userContent')
|
||||
message_div = tag.find("div", class_="Message userContent")
|
||||
|
||||
if message_div.blockquote:
|
||||
message_div.blockquote.decompose()
|
||||
|
||||
content = message_div.get_text(separator="\n", strip=True) if message_div else None
|
||||
content = (
|
||||
message_div.get_text(separator="\n", strip=True)
|
||||
if message_div
|
||||
else None
|
||||
)
|
||||
|
||||
comment = Comment(
|
||||
id=comment_id,
|
||||
@@ -169,10 +222,8 @@ class BoardsAPI:
|
||||
content=content,
|
||||
timestamp=timestamp,
|
||||
reply_to=None,
|
||||
source=self.source_name
|
||||
source=self.source_name,
|
||||
)
|
||||
comments.append(comment)
|
||||
|
||||
return comments
|
||||
|
||||
|
||||
259
server/connectors/reddit_api.py
Normal file
@@ -0,0 +1,259 @@
|
||||
import requests
|
||||
import logging
|
||||
import time
|
||||
import os
|
||||
|
||||
from dotenv import load_dotenv
|
||||
from requests.auth import HTTPBasicAuth
|
||||
|
||||
from dto.post import Post
|
||||
from dto.user import User
|
||||
from dto.comment import Comment
|
||||
from server.connectors.base import BaseConnector
|
||||
|
||||
logger = logging.getLogger(__name__)
|
||||
|
||||
CLIENT_ID = os.getenv("REDDIT_CLIENT_ID")
|
||||
CLIENT_SECRET = os.getenv("REDDIT_CLIENT_SECRET")
|
||||
|
||||
class RedditAPI(BaseConnector):
|
||||
source_name: str = "reddit"
|
||||
display_name: str = "Reddit"
|
||||
search_enabled: bool = True
|
||||
categories_enabled: bool = True
|
||||
|
||||
def __init__(self):
|
||||
self.url = "https://www.reddit.com/"
|
||||
self.token = None
|
||||
self.token_expiry = 0
|
||||
|
||||
# Public Methods #
|
||||
def get_new_posts_by_search(
|
||||
self, search: str, category: str, post_limit: int
|
||||
) -> list[Post]:
|
||||
|
||||
prefix = f"r/{category}/" if category else ""
|
||||
params = {"limit": post_limit}
|
||||
|
||||
if search:
|
||||
endpoint = f"{prefix}search.json"
|
||||
params.update(
|
||||
{"q": search, "sort": "new", "restrict_sr": "on" if category else "off"}
|
||||
)
|
||||
else:
|
||||
endpoint = f"{prefix}new.json"
|
||||
|
||||
posts = []
|
||||
after = None
|
||||
|
||||
while len(posts) < post_limit:
|
||||
batch_limit = min(100, post_limit - len(posts))
|
||||
params["limit"] = batch_limit
|
||||
if after:
|
||||
params["after"] = after
|
||||
|
||||
data = self._fetch_post_overviews(endpoint, params)
|
||||
|
||||
if not data or "data" not in data or not data["data"].get("children"):
|
||||
break
|
||||
|
||||
batch_posts = self._parse_posts(data)
|
||||
posts.extend(batch_posts)
|
||||
|
||||
after = data["data"].get("after")
|
||||
if not after:
|
||||
break
|
||||
|
||||
return posts[:post_limit]
|
||||
|
||||
def _get_new_subreddit_posts(self, subreddit: str, limit: int = 10) -> list[Post]:
|
||||
posts = []
|
||||
after = None
|
||||
url = f"r/{subreddit}/new.json"
|
||||
|
||||
logger.info(f"Fetching new posts from subreddit: {subreddit}")
|
||||
|
||||
while len(posts) < limit:
|
||||
batch_limit = min(100, limit - len(posts))
|
||||
params = {"limit": batch_limit, "after": after}
|
||||
|
||||
data = self._fetch_post_overviews(url, params)
|
||||
batch_posts = self._parse_posts(data)
|
||||
|
||||
logger.debug(
|
||||
f"Fetched {len(batch_posts)} new posts from subreddit {subreddit}"
|
||||
)
|
||||
|
||||
if not batch_posts:
|
||||
break
|
||||
|
||||
posts.extend(batch_posts)
|
||||
after = data["data"].get("after")
|
||||
if not after:
|
||||
break
|
||||
|
||||
return posts
|
||||
|
||||
def get_user(self, username: str) -> User:
|
||||
data = self._fetch_post_overviews(f"user/{username}/about.json", {})
|
||||
return self._parse_user(data)
|
||||
|
||||
def category_exists(self, category: str) -> bool:
|
||||
try:
|
||||
data = self._fetch_post_overviews(f"r/{category}/about.json", {})
|
||||
return (
|
||||
data is not None
|
||||
and "data" in data
|
||||
and data["data"].get("id") is not None
|
||||
)
|
||||
except Exception:
|
||||
return False
|
||||
|
||||
## Private Methods ##
|
||||
def _parse_posts(self, data) -> list[Post]:
|
||||
posts = []
|
||||
|
||||
total_num_posts = len(data["data"]["children"])
|
||||
current_index = 0
|
||||
|
||||
for item in data["data"]["children"]:
|
||||
current_index += 1
|
||||
logger.debug(f"Parsing post {current_index} of {total_num_posts}")
|
||||
|
||||
post_data = item["data"]
|
||||
post = Post(
|
||||
id=post_data["id"],
|
||||
author=post_data["author"],
|
||||
title=post_data["title"],
|
||||
content=post_data.get("selftext", ""),
|
||||
url=post_data["url"],
|
||||
timestamp=post_data["created_utc"],
|
||||
source=self.source_name,
|
||||
comments=self._get_post_comments(post_data["id"]),
|
||||
)
|
||||
post.subreddit = post_data["subreddit"]
|
||||
post.upvotes = post_data["ups"]
|
||||
|
||||
posts.append(post)
|
||||
return posts
|
||||
|
||||
def _get_post_comments(self, post_id: str) -> list[Comment]:
|
||||
comments: list[Comment] = []
|
||||
url = f"comments/{post_id}.json"
|
||||
|
||||
data = self._fetch_post_overviews(url, {})
|
||||
if len(data) < 2:
|
||||
return comments
|
||||
|
||||
comment_data = data[1]["data"]["children"]
|
||||
|
||||
def _parse_comment_tree(items, parent_id=None):
|
||||
for item in items:
|
||||
if item["kind"] != "t1":
|
||||
continue
|
||||
|
||||
comment_info = item["data"]
|
||||
comment = Comment(
|
||||
id=comment_info["id"],
|
||||
post_id=post_id,
|
||||
author=comment_info["author"],
|
||||
content=comment_info.get("body", ""),
|
||||
timestamp=comment_info["created_utc"],
|
||||
reply_to=parent_id or comment_info.get("parent_id", None),
|
||||
source=self.source_name,
|
||||
)
|
||||
|
||||
comments.append(comment)
|
||||
|
||||
# Process replies recursively
|
||||
replies = comment_info.get("replies")
|
||||
if replies and isinstance(replies, dict):
|
||||
reply_items = replies.get("data", {}).get("children", [])
|
||||
_parse_comment_tree(reply_items, parent_id=comment.id)
|
||||
|
||||
_parse_comment_tree(comment_data)
|
||||
return comments
|
||||
|
||||
def _parse_user(self, data) -> User:
|
||||
user_data = data["data"]
|
||||
user = User(username=user_data["name"], created_utc=user_data["created_utc"])
|
||||
user.karma = user_data["total_karma"]
|
||||
return user
|
||||
|
||||
def _get_token(self):
|
||||
if self.token and time.time() < self.token_expiry:
|
||||
return self.token
|
||||
|
||||
logger.info("Fetching new Reddit access token...")
|
||||
|
||||
auth = HTTPBasicAuth(CLIENT_ID, CLIENT_SECRET)
|
||||
|
||||
data = {
|
||||
"grant_type": "client_credentials"
|
||||
}
|
||||
|
||||
headers = {
|
||||
"User-Agent": "python:ethnography-college-project:0.1 (by /u/ThisBirchWood)"
|
||||
}
|
||||
|
||||
response = requests.post(
|
||||
"https://www.reddit.com/api/v1/access_token",
|
||||
auth=auth,
|
||||
data=data,
|
||||
headers=headers,
|
||||
)
|
||||
|
||||
response.raise_for_status()
|
||||
token_json = response.json()
|
||||
|
||||
self.token = token_json["access_token"]
|
||||
self.token_expiry = time.time() + token_json["expires_in"] - 60
|
||||
|
||||
logger.info(
|
||||
f"Obtained new Reddit access token (expires in {token_json['expires_in']}s)"
|
||||
)
|
||||
|
||||
return self.token
|
||||
|
||||
def _fetch_post_overviews(self, endpoint: str, params: dict) -> dict:
|
||||
url = f"https://oauth.reddit.com/{endpoint.lstrip('/')}"
|
||||
max_retries = 15
|
||||
backoff = 1 # seconds
|
||||
|
||||
for attempt in range(max_retries):
|
||||
try:
|
||||
response = requests.get(
|
||||
url,
|
||||
headers={
|
||||
"User-agent": "python:ethnography-college-project:0.1 (by /u/ThisBirchWood)",
|
||||
"Authorization": f"Bearer {self._get_token()}",
|
||||
},
|
||||
params=params,
|
||||
)
|
||||
|
||||
if response.status_code == 429:
|
||||
try:
|
||||
wait_time = int(response.headers.get("X-Ratelimit-Reset", backoff))
|
||||
wait_time += 1 # Add a small buffer to ensure the rate limit has reset
|
||||
except ValueError:
|
||||
wait_time = backoff
|
||||
|
||||
logger.warning(
|
||||
f"Rate limited by Reddit API. Retrying in {wait_time} seconds..."
|
||||
)
|
||||
|
||||
time.sleep(wait_time)
|
||||
backoff *= 2
|
||||
continue
|
||||
|
||||
if response.status_code == 500:
|
||||
logger.warning("Server error from Reddit API. Retrying...")
|
||||
time.sleep(backoff)
|
||||
backoff *= 2
|
||||
continue
|
||||
|
||||
response.raise_for_status()
|
||||
return response.json()
|
||||
except requests.RequestException as e:
|
||||
print(f"Error fetching data from Reddit API: {e}")
|
||||
return {}
|
||||
35
server/connectors/registry.py
Normal file
@@ -0,0 +1,35 @@
|
||||
import pkgutil
|
||||
import importlib
|
||||
import server.connectors
|
||||
from server.connectors.base import BaseConnector
|
||||
|
||||
|
||||
def _discover_connectors() -> list[type[BaseConnector]]:
|
||||
"""Walk the connectors package and collect all BaseConnector subclasses."""
|
||||
for _, module_name, _ in pkgutil.iter_modules(server.connectors.__path__):
|
||||
if module_name in ("base", "registry"):
|
||||
continue
|
||||
importlib.import_module(f"server.connectors.{module_name}")
|
||||
|
||||
return [
|
||||
cls
|
||||
for cls in BaseConnector.__subclasses__()
|
||||
if cls.source_name # guard against abstract intermediaries
|
||||
]
|
||||
|
||||
|
||||
def get_available_connectors() -> dict[str, type[BaseConnector]]:
|
||||
return {c.source_name: c for c in _discover_connectors() if c.is_available()}
|
||||
|
||||
|
||||
def get_connector_metadata() -> dict[str, dict]:
|
||||
res = {}
|
||||
for id, obj in get_available_connectors().items():
|
||||
res[id] = {
|
||||
"id": id,
|
||||
"label": obj.display_name,
|
||||
"search_enabled": obj.search_enabled,
|
||||
"categories_enabled": obj.categories_enabled,
|
||||
}
|
||||
|
||||
return res
|
||||
118
server/connectors/youtube_api.py
Normal file
@@ -0,0 +1,118 @@
|
||||
import os
|
||||
import datetime
|
||||
import logging
|
||||
|
||||
from dotenv import load_dotenv
|
||||
from googleapiclient.discovery import build
|
||||
from googleapiclient.errors import HttpError
|
||||
from dto.post import Post
|
||||
from dto.comment import Comment
|
||||
from server.connectors.base import BaseConnector
|
||||
|
||||
load_dotenv()
|
||||
API_KEY = os.getenv("YOUTUBE_API_KEY")
|
||||
|
||||
logger = logging.getLogger(__name__)
|
||||
logger.setLevel(logging.INFO)
|
||||
|
||||
|
||||
class YouTubeAPI(BaseConnector):
    """Connector that fetches YouTube videos (as posts) and their top-level
    comments via the YouTube Data API v3.

    Requires the YOUTUBE_API_KEY environment variable (read at module import).
    """

    source_name: str = "youtube"
    display_name: str = "YouTube"
    search_enabled: bool = True
    categories_enabled: bool = False

    # YouTube API timestamps use this fixed UTC format, e.g. "2024-01-02T03:04:05Z".
    _TIME_FORMAT = "%Y-%m-%dT%H:%M:%SZ"

    def __init__(self):
        self.youtube = build("youtube", "v3", developerKey=API_KEY)

    @staticmethod
    def _parse_timestamp(value: str) -> float:
        """Convert a YouTube API timestamp string to a Unix epoch float."""
        return datetime.datetime.strptime(value, YouTubeAPI._TIME_FORMAT).timestamp()

    def get_new_posts_by_search(
        self, search: str, category: str, post_limit: int
    ) -> list[Post]:
        """Search YouTube for `search` and return up to `post_limit` Posts.

        `category` is accepted for interface compatibility but is unused by
        this connector. Each Post's content is "<title>\n\n<description>" and
        carries the video's top-level comments.
        """
        posts = []
        for video in self._search_videos(search, post_limit):
            video_id = video["id"]["videoId"]
            snippet = video["snippet"]
            title = snippet["title"]
            description = snippet["description"]
            published_at = self._parse_timestamp(snippet["publishedAt"])
            channel_title = snippet["channelTitle"]

            comments = []
            for comment_thread in self._get_video_comments(video_id):
                comment_snippet = comment_thread["snippet"]["topLevelComment"][
                    "snippet"
                ]
                comments.append(
                    Comment(
                        id=comment_thread["id"],
                        post_id=video_id,
                        content=comment_snippet["textDisplay"],
                        author=comment_snippet["authorDisplayName"],
                        timestamp=self._parse_timestamp(comment_snippet["publishedAt"]),
                        reply_to=None,
                        source=self.source_name,
                    )
                )

            posts.append(
                Post(
                    id=video_id,
                    content=f"{title}\n\n{description}",
                    author=channel_title,
                    timestamp=published_at,
                    url=f"https://www.youtube.com/watch?v={video_id}",
                    title=title,
                    source=self.source_name,
                    comments=comments,
                )
            )

        return posts

    def category_exists(self, category):
        # This connector has no category concept; accept anything.
        return True

    def _search_videos(self, query, limit):
        """Page through search results until `limit` videos are collected or
        the result set is exhausted."""
        results = []
        next_page_token = None

        while len(results) < limit:
            # The search endpoint caps maxResults at 50 per page.
            batch_size = min(50, limit - len(results))

            request = self.youtube.search().list(
                q=query,
                part="snippet",
                type="video",
                maxResults=batch_size,
                pageToken=next_page_token,
            )
            response = request.execute()
            results.extend(response.get("items", []))
            # Fixed: log through the module logger (was the root logger via
            # logging.info), with lazy %-formatting.
            logger.info(
                "Fetched %d out of %d videos for query '%s'", len(results), limit, query
            )

            next_page_token = response.get("nextPageToken")
            if not next_page_token:
                logger.warning("No more pages of results available for query '%s'", query)
                break

        return results[:limit]

    def _get_video_comments(self, video_id):
        """Return top-level comment threads for `video_id`, or [] when the
        API call fails (e.g. comments disabled on the video)."""
        request = self.youtube.commentThreads().list(
            part="snippet", videoId=video_id, textFormat="plainText"
        )

        try:
            response = request.execute()
        except HttpError:
            # Best-effort: a video without accessible comments should not fail
            # the whole fetch. Log (was print) and continue with no comments.
            logger.exception("Error fetching comments for video %s", video_id)
            return []
        return response.get("items", [])
|
||||
61
server/core/auth.py
Normal file
@@ -0,0 +1,61 @@
|
||||
import re
|
||||
|
||||
from server.db.database import PostgresConnector
|
||||
from flask_bcrypt import Bcrypt
|
||||
|
||||
EMAIL_REGEX = re.compile(r"[^@]+@[^@]+\.[^@]+")
|
||||
|
||||
|
||||
class AuthManager:
    """User registration and authentication backed by Postgres and bcrypt."""

    def __init__(self, db: PostgresConnector, bcrypt: Bcrypt):
        self.db = db
        self.bcrypt = bcrypt

    # private
    def _save_user(self, username, email, password_hash):
        """Insert a new user row; assumes the caller already validated inputs."""
        query = """
            INSERT INTO users (username, email, password_hash)
            VALUES (%s, %s, %s)
        """
        self.db.execute(query, (username, email, password_hash))

    # public
    def register_user(self, username, email, password):
        """Validate and create a new user account.

        Raises:
            ValueError: if the username is shorter than 3 characters, the
                email is malformed, or the email/username is already taken.
        """
        # Validate first: bcrypt hashing is deliberately expensive, so don't
        # pay for it when the input is going to be rejected anyway.
        if len(username) < 3:
            # Message now matches the check (len < 3 rejects only 1-2 chars).
            raise ValueError("Username must be at least 3 characters long")

        # NOTE: intentionally permissive pattern — just "something@something.tld".
        if not EMAIL_REGEX.match(email):
            raise ValueError("Please enter a valid email address")

        if self.get_user_by_email(email):
            raise ValueError("Email already registered")

        if self.get_user_by_username(username):
            raise ValueError("Username already taken")

        hashed_password = self.bcrypt.generate_password_hash(password).decode("utf-8")
        self._save_user(username, email, hashed_password)

    def authenticate_user(self, username, password):
        """Return the user row if username/password are valid, else None."""
        user = self.get_user_by_username(username)
        if user and self.bcrypt.check_password_hash(user["password_hash"], password):
            return user
        return None

    def get_user_by_id(self, user_id):
        """Fetch a user by primary key (no password hash), or None."""
        query = "SELECT id, username, email FROM users WHERE id = %s"
        result = self.db.execute(query, (user_id,), fetch=True)
        return result[0] if result else None

    def get_user_by_username(self, username) -> dict | None:
        """Fetch a user (including password hash) by username, or None."""
        query = (
            "SELECT id, username, email, password_hash FROM users WHERE username = %s"
        )
        result = self.db.execute(query, (username,), fetch=True)
        return result[0] if result else None

    def get_user_by_email(self, email) -> dict | None:
        """Fetch a user (including password hash) by email, or None."""
        query = "SELECT id, username, email, password_hash FROM users WHERE email = %s"
        result = self.db.execute(query, (email,), fetch=True)
        return result[0] if result else None
|
||||
202
server/core/datasets.py
Normal file
@@ -0,0 +1,202 @@
|
||||
import pandas as pd
|
||||
from server.db.database import PostgresConnector
|
||||
from psycopg2.extras import Json
|
||||
from server.exceptions import NonExistentDatasetException
|
||||
|
||||
|
||||
class DatasetManager:
|
||||
def __init__(self, db: PostgresConnector):
|
||||
self.db = db
|
||||
|
||||
def authorize_user_dataset(self, dataset_id: int, user_id: int) -> bool:
|
||||
dataset_info = self.get_dataset_info(dataset_id)
|
||||
|
||||
if dataset_info.get("user_id", None) == None:
|
||||
return False
|
||||
|
||||
if dataset_info.get("user_id") != user_id:
|
||||
return False
|
||||
|
||||
return True
|
||||
|
||||
def get_user_datasets(self, user_id: int) -> list[dict]:
|
||||
query = "SELECT * FROM datasets WHERE user_id = %s"
|
||||
return self.db.execute(query, (user_id,), fetch=True)
|
||||
|
||||
def get_dataset_content(self, dataset_id: int) -> pd.DataFrame:
|
||||
query = "SELECT * FROM events WHERE dataset_id = %s"
|
||||
result = self.db.execute(query, (dataset_id,), fetch=True)
|
||||
df = pd.DataFrame(result)
|
||||
if df.empty:
|
||||
return df
|
||||
|
||||
dedupe_columns = [
|
||||
column
|
||||
for column in [
|
||||
"post_id",
|
||||
"parent_id",
|
||||
"reply_to",
|
||||
"author",
|
||||
"type",
|
||||
"timestamp",
|
||||
"dt",
|
||||
"title",
|
||||
"content",
|
||||
"source",
|
||||
"topic",
|
||||
]
|
||||
if column in df.columns
|
||||
]
|
||||
|
||||
if dedupe_columns:
|
||||
df = df.drop_duplicates(subset=dedupe_columns, keep="first")
|
||||
else:
|
||||
df = df.drop_duplicates(keep="first")
|
||||
|
||||
return df.reset_index(drop=True)
|
||||
|
||||
def get_dataset_info(self, dataset_id: int) -> dict:
|
||||
query = "SELECT * FROM datasets WHERE id = %s"
|
||||
result = self.db.execute(query, (dataset_id,), fetch=True)
|
||||
|
||||
if not result:
|
||||
raise NonExistentDatasetException(f"Dataset {dataset_id} does not exist")
|
||||
|
||||
return result[0]
|
||||
|
||||
def save_dataset_info(self, user_id: int, dataset_name: str, topics: dict) -> int:
|
||||
query = """
|
||||
INSERT INTO datasets (user_id, name, topics)
|
||||
VALUES (%s, %s, %s)
|
||||
RETURNING id
|
||||
"""
|
||||
result = self.db.execute(
|
||||
query, (user_id, dataset_name, Json(topics)), fetch=True
|
||||
)
|
||||
return result[0]["id"] if result else None
|
||||
|
||||
def save_dataset_content(self, dataset_id: int, event_data: pd.DataFrame):
|
||||
if event_data.empty:
|
||||
return
|
||||
|
||||
dedupe_columns = [
|
||||
column for column in ["id", "type", "source"] if column in event_data.columns
|
||||
]
|
||||
if dedupe_columns:
|
||||
event_data = event_data.drop_duplicates(subset=dedupe_columns, keep="first")
|
||||
else:
|
||||
event_data = event_data.drop_duplicates(keep="first")
|
||||
|
||||
self.delete_dataset_content(dataset_id)
|
||||
|
||||
query = """
|
||||
INSERT INTO events (
|
||||
dataset_id,
|
||||
post_id,
|
||||
type,
|
||||
parent_id,
|
||||
author,
|
||||
title,
|
||||
content,
|
||||
timestamp,
|
||||
date,
|
||||
dt,
|
||||
hour,
|
||||
weekday,
|
||||
reply_to,
|
||||
source,
|
||||
topic,
|
||||
topic_confidence,
|
||||
ner_entities,
|
||||
emotion_anger,
|
||||
emotion_disgust,
|
||||
emotion_fear,
|
||||
emotion_joy,
|
||||
emotion_sadness
|
||||
)
|
||||
VALUES (
|
||||
%s, %s, %s, %s, %s,
|
||||
%s, %s, %s, %s, %s,
|
||||
%s, %s, %s, %s, %s,
|
||||
%s, %s, %s, %s, %s,
|
||||
%s, %s
|
||||
)
|
||||
"""
|
||||
|
||||
values = [
|
||||
(
|
||||
dataset_id,
|
||||
row["id"],
|
||||
row["type"],
|
||||
row["parent_id"],
|
||||
row["author"],
|
||||
row.get("title"),
|
||||
row["content"],
|
||||
row["timestamp"],
|
||||
row["date"],
|
||||
row["dt"],
|
||||
row["hour"],
|
||||
row["weekday"],
|
||||
row.get("reply_to"),
|
||||
row["source"],
|
||||
row.get("topic"),
|
||||
row.get("topic_confidence"),
|
||||
Json(row["entities"]) if row.get("entities") is not None else None,
|
||||
row.get("emotion_anger"),
|
||||
row.get("emotion_disgust"),
|
||||
row.get("emotion_fear"),
|
||||
row.get("emotion_joy"),
|
||||
row.get("emotion_sadness"),
|
||||
)
|
||||
for _, row in event_data.iterrows()
|
||||
]
|
||||
|
||||
self.db.execute_batch(query, values)
|
||||
|
||||
def set_dataset_status(
|
||||
self, dataset_id: int, status: str, status_message: str | None = None
|
||||
):
|
||||
if status not in ["fetching", "processing", "complete", "error"]:
|
||||
raise ValueError("Invalid status")
|
||||
|
||||
query = """
|
||||
UPDATE datasets
|
||||
SET status = %s,
|
||||
status_message = %s,
|
||||
completed_at = CASE
|
||||
WHEN %s = 'complete' THEN NOW()
|
||||
ELSE NULL
|
||||
END
|
||||
WHERE id = %s
|
||||
"""
|
||||
|
||||
self.db.execute(query, (status, status_message, status, dataset_id))
|
||||
|
||||
def get_dataset_status(self, dataset_id: int):
|
||||
query = """
|
||||
SELECT status, status_message, completed_at
|
||||
FROM datasets
|
||||
WHERE id = %s
|
||||
"""
|
||||
|
||||
result = self.db.execute(query, (dataset_id,), fetch=True)
|
||||
|
||||
if not result:
|
||||
print(result)
|
||||
raise NonExistentDatasetException(f"Dataset {dataset_id} does not exist")
|
||||
|
||||
return result[0]
|
||||
|
||||
def update_dataset_name(self, dataset_id: int, new_name: str):
|
||||
query = "UPDATE datasets SET name = %s WHERE id = %s"
|
||||
self.db.execute(query, (new_name, dataset_id))
|
||||
|
||||
def delete_dataset_info(self, dataset_id: int):
|
||||
query = "DELETE FROM datasets WHERE id = %s"
|
||||
|
||||
self.db.execute(query, (dataset_id,))
|
||||
|
||||
def delete_dataset_content(self, dataset_id: int):
|
||||
query = "DELETE FROM events WHERE dataset_id = %s"
|
||||
|
||||
self.db.execute(query, (dataset_id,))
|
||||
62
server/db/database.py
Normal file
@@ -0,0 +1,62 @@
|
||||
import os
|
||||
import psycopg2
|
||||
import os
|
||||
from dotenv import load_dotenv
|
||||
from psycopg2.extras import RealDictCursor
|
||||
from psycopg2.extras import execute_batch
|
||||
|
||||
load_dotenv()
|
||||
postgres_host = os.getenv("POSTGRES_HOST", "localhost")
|
||||
postgres_port = os.getenv("POSTGRES_PORT", 5432)
|
||||
postgres_user = os.getenv("POSTGRES_USER", "postgres")
|
||||
postgres_password = os.getenv("POSTGRES_PASSWORD", "postgres")
|
||||
postgres_db = os.getenv("POSTGRES_DB", "postgres")
|
||||
|
||||
from server.exceptions import DatabaseNotConfiguredException
|
||||
|
||||
|
||||
class PostgresConnector:
    """
    Simple PostgreSQL connector (single connection).

    Connection parameters come from the POSTGRES_* environment variables read
    at module import. Holds exactly one connection and commits/rolls back per
    call; there is no pooling, so do not share an instance across threads.
    """

    def __init__(self):
        """Open the connection.

        Raises:
            DatabaseNotConfiguredException: if the server is unreachable or
                credentials are wrong (wraps psycopg2.OperationalError).
        """
        try:
            self.connection = psycopg2.connect(
                host=postgres_host,
                port=postgres_port,
                user=postgres_user,
                password=postgres_password,
                database=postgres_db,
            )
        except psycopg2.OperationalError as e:
            # Chain the original error so the root cause stays visible.
            raise DatabaseNotConfiguredException(
                f"Ensure database is up and running: {e}"
            ) from e

        # Explicit transactions: execute()/execute_batch() commit on success
        # and roll back on failure.
        self.connection.autocommit = False

    def execute(self, query, params=None, fetch=False) -> list | None:
        """Run one statement and commit.

        Returns:
            The fetched rows as dicts (RealDictCursor) when fetch=True,
            otherwise None. (Annotation fixed: was `-> list` despite the
            None return on the non-fetch path.)

        Raises:
            Any database error, after rolling the transaction back.
        """
        try:
            with self.connection.cursor(cursor_factory=RealDictCursor) as cursor:
                cursor.execute(query, params)
                result = cursor.fetchall() if fetch else None
                self.connection.commit()
                return result
        except Exception:
            self.connection.rollback()
            raise

    def execute_batch(self, query, values):
        """Run one statement for every tuple in `values` and commit.

        Rolls back and re-raises on any error.
        """
        try:
            with self.connection.cursor(cursor_factory=RealDictCursor) as cursor:
                execute_batch(cursor, query, values)
                self.connection.commit()
        except Exception:
            self.connection.rollback()
            raise

    def close(self):
        """Close the underlying connection (safe to call once at shutdown)."""
        if self.connection:
            self.connection.close()
|
||||
@@ -11,15 +11,27 @@ CREATE TABLE datasets (
|
||||
user_id INTEGER NOT NULL,
|
||||
name VARCHAR(255) NOT NULL,
|
||||
description TEXT,
|
||||
|
||||
-- Job state machine
|
||||
status TEXT NOT NULL DEFAULT 'processing',
|
||||
status_message TEXT,
|
||||
completed_at TIMESTAMP,
|
||||
|
||||
created_at TIMESTAMP DEFAULT CURRENT_TIMESTAMP,
|
||||
topics JSONB,
|
||||
FOREIGN KEY (user_id) REFERENCES users(id) ON DELETE CASCADE
|
||||
FOREIGN KEY (user_id) REFERENCES users(id) ON DELETE CASCADE,
|
||||
|
||||
-- Enforce valid states
|
||||
CONSTRAINT datasets_status_check
|
||||
CHECK (status IN ('fetching', 'processing', 'complete', 'error'))
|
||||
);
|
||||
|
||||
CREATE TABLE events (
|
||||
/* Required Fields */
|
||||
id SERIAL PRIMARY KEY,
|
||||
dataset_id INTEGER NOT NULL,
|
||||
|
||||
post_id VARCHAR(255) NOT NULL,
|
||||
type VARCHAR(255) NOT NULL,
|
||||
|
||||
author VARCHAR(255) NOT NULL,
|
||||
@@ -30,7 +42,10 @@ CREATE TABLE events (
|
||||
hour INTEGER NOT NULL,
|
||||
weekday VARCHAR(255) NOT NULL,
|
||||
|
||||
/* Comments and Replies */
|
||||
/* Posts Only */
|
||||
title TEXT,
|
||||
|
||||
/* Comments Only*/
|
||||
parent_id VARCHAR(255),
|
||||
reply_to VARCHAR(255),
|
||||
source VARCHAR(255) NOT NULL,
|
||||
8
server/exceptions.py
Normal file
@@ -0,0 +1,8 @@
|
||||
class NotAuthorisedException(Exception):
    """Raised when an operation is attempted without the required authorisation."""

    pass
|
||||
|
||||
class NonExistentDatasetException(Exception):
    """Raised when a requested dataset id does not exist in the database."""

    pass
|
||||
|
||||
class DatabaseNotConfiguredException(Exception):
    """Raised when a connection to the Postgres database cannot be established."""

    pass
|
||||
23
server/queue/celery_app.py
Normal file
@@ -0,0 +1,23 @@
|
||||
from celery import Celery
|
||||
from dotenv import load_dotenv
|
||||
from server.utils import get_env
|
||||
|
||||
load_dotenv()
|
||||
REDIS_URL = get_env("REDIS_URL")
|
||||
|
||||
|
||||
def create_celery():
    """Build the Celery app: Redis as both broker and result backend,
    JSON-only task/result serialization."""
    app = Celery(
        "ethnograph",
        broker=REDIS_URL,
        backend=REDIS_URL,
    )
    app.conf.update(
        task_serializer="json",
        result_serializer="json",
        accept_content=["json"],
    )
    return app
|
||||
|
||||
|
||||
celery = create_celery()
|
||||
|
||||
from server.queue import tasks
|
||||
84
server/queue/tasks.py
Normal file
@@ -0,0 +1,84 @@
|
||||
from time import time
|
||||
|
||||
import pandas as pd
|
||||
import logging
|
||||
|
||||
from server.queue.celery_app import celery
|
||||
from server.analysis.enrichment import DatasetEnrichment
|
||||
from server.db.database import PostgresConnector
|
||||
from server.core.datasets import DatasetManager
|
||||
from server.connectors.registry import get_available_connectors
|
||||
|
||||
logger = logging.getLogger(__name__)
|
||||
|
||||
|
||||
@celery.task(bind=True, max_retries=3)
def process_dataset(self, dataset_id: int, posts: list, topics: dict):
    """Run NLP enrichment over already-fetched posts and persist the result.

    On failure the dataset status is set to 'error' with the exception message
    (the task does not re-raise, so the status row is the failure record).
    """
    db = PostgresConnector()
    dataset_manager = DatasetManager(db)

    try:
        df = pd.DataFrame(posts)

        dataset_manager.set_dataset_status(
            dataset_id, "processing", "NLP Processing Started"
        )

        processor = DatasetEnrichment(df, topics)
        enriched_df = processor.enrich()

        dataset_manager.save_dataset_content(dataset_id, enriched_df)
        dataset_manager.set_dataset_status(
            dataset_id, "complete", "NLP Processing Completed Successfully"
        )
    except Exception as e:
        # Previously the exception was swallowed silently; keep the dataset
        # out of a stuck 'processing' state AND log the traceback.
        logger.exception("process_dataset failed for dataset %s", dataset_id)
        dataset_manager.set_dataset_status(
            dataset_id, "error", f"An error occurred: {e}"
        )
    finally:
        # The connector holds a single open connection; close it per task run.
        db.close()
|
||||
|
||||
|
||||
@celery.task(bind=True, max_retries=3)
def fetch_and_process_dataset(
    self, dataset_id: int, source_info: list[dict], topics: dict
):
    """Fetch posts from each configured source, enrich them, and persist.

    `source_info` entries: {"name": <connector id>, "search": ..., "category": ...,
    "limit": ...}. On failure the dataset status is set to 'error' with the
    exception message.
    """
    connectors = get_available_connectors()
    db = PostgresConnector()
    dataset_manager = DatasetManager(db)
    posts = []
    # Total fetch seconds across all sources. Previously this was overwritten
    # per iteration (reporting only the last source) and was undefined when
    # source_info was empty, which made the final status line raise NameError.
    fetch_time = 0.0

    try:
        for metadata in source_info:
            fetch_start = time()
            name = metadata["name"]
            search = metadata.get("search")
            category = metadata.get("category")
            limit = metadata.get("limit", 100)

            connector = connectors[name]()
            raw_posts = connector.get_new_posts_by_search(
                search=search, category=category, post_limit=limit
            )
            posts.extend(post.to_dict() for post in raw_posts)
            fetch_time += time() - fetch_start

        df = pd.DataFrame(posts)

        nlp_start = time()

        dataset_manager.set_dataset_status(
            dataset_id, "processing", "NLP Processing Started"
        )

        processor = DatasetEnrichment(df, topics)
        enriched_df = processor.enrich()

        nlp_time = time() - nlp_start

        dataset_manager.save_dataset_content(dataset_id, enriched_df)
        dataset_manager.set_dataset_status(
            dataset_id,
            "complete",
            f"Completed Successfully. Fetch time: {fetch_time:.2f}s, "
            f"NLP time: {nlp_time:.2f}s",
        )
    except Exception as e:
        logger.exception(
            "fetch_and_process_dataset failed for dataset %s", dataset_id
        )
        dataset_manager.set_dataset_status(
            dataset_id, "error", f"An error occurred: {e}"
        )
    finally:
        # Single-connection connector: release it per task run.
        db.close()
|
||||
@@ -1,135 +0,0 @@
|
||||
import datetime
|
||||
|
||||
import nltk
|
||||
import pandas as pd
|
||||
from nltk.corpus import stopwords
|
||||
|
||||
from server.analysis.cultural import CulturalAnalysis
|
||||
from server.analysis.emotional import EmotionalAnalysis
|
||||
from server.analysis.interactional import InteractionAnalysis
|
||||
from server.analysis.linguistic import LinguisticAnalysis
|
||||
from server.analysis.temporal import TemporalAnalysis
|
||||
|
||||
DOMAIN_STOPWORDS = {
|
||||
"www",
|
||||
"https",
|
||||
"http",
|
||||
"boards",
|
||||
"boardsie",
|
||||
"comment",
|
||||
"comments",
|
||||
"discussion",
|
||||
"thread",
|
||||
"post",
|
||||
"posts",
|
||||
"would",
|
||||
"get",
|
||||
"one",
|
||||
}
|
||||
|
||||
nltk.download("stopwords")
|
||||
EXCLUDE_WORDS = set(stopwords.words("english")) | DOMAIN_STOPWORDS
|
||||
|
||||
|
||||
class StatGen:
    """Facade that bundles the analysis modules into ready-to-serve stat dicts."""

    def __init__(self) -> None:
        self.temporal_analysis = TemporalAnalysis()
        self.emotional_analysis = EmotionalAnalysis()
        self.interaction_analysis = InteractionAnalysis(EXCLUDE_WORDS)
        self.linguistic_analysis = LinguisticAnalysis(EXCLUDE_WORDS)
        self.cultural_analysis = CulturalAnalysis()

    def get_time_analysis(self, df: pd.DataFrame) -> dict:
        """Time-based aggregates: daily volume and weekday/hour heatmap."""
        temporal = self.temporal_analysis
        return {
            "events_per_day": temporal.posts_per_day(df),
            "weekday_hour_heatmap": temporal.heatmap(df),
        }

    def get_content_analysis(self, df: pd.DataFrame) -> dict:
        """Linguistic and emotional content stats (word/ngram frequencies, emotion splits)."""
        linguistic = self.linguistic_analysis
        return {
            "word_frequencies": linguistic.word_frequencies(df),
            "common_two_phrases": linguistic.ngrams(df),
            "common_three_phrases": linguistic.ngrams(df, n=3),
            "average_emotion_by_topic": self.emotional_analysis.avg_emotion_by_topic(df),
            "reply_time_by_emotion": self.temporal_analysis.avg_reply_time_per_emotion(df),
        }

    def get_user_analysis(self, df: pd.DataFrame) -> dict:
        """Per-user activity stats and the user interaction graph."""
        interaction = self.interaction_analysis
        return {
            "top_users": interaction.top_users(df),
            "users": interaction.per_user_analysis(df),
            "interaction_graph": interaction.interaction_graph(df),
        }

    def get_interactional_analysis(self, df: pd.DataFrame) -> dict:
        """Thread-structure stats: depth and emotion-conditioned length."""
        interaction = self.interaction_analysis
        return {
            "average_thread_depth": interaction.average_thread_depth(df),
            "average_thread_length_by_emotion": interaction.average_thread_length_by_emotion(df),
        }

    def get_cultural_analysis(self, df: pd.DataFrame) -> dict:
        """Cultural markers: identity, stance, and per-entity emotion salience."""
        cultural = self.cultural_analysis
        return {
            "identity_markers": cultural.get_identity_markers(df),
            "stance_markers": cultural.get_stance_markers(df),
            "entity_salience": cultural.get_avg_emotions_per_entity(df),
        }

    def summary(self, df: pd.DataFrame) -> dict:
        """High-level overview: counts, per-user ratios, time span, and sources."""
        kind = df["type"]
        total_posts = (kind == "post").sum()
        total_comments = (kind == "comment").sum()
        events_per_user = df.groupby("author").size()

        # Guard max(..., 1) avoids division by zero when there are no posts.
        return {
            "total_events": int(len(df)),
            "total_posts": int(total_posts),
            "total_comments": int(total_comments),
            "unique_users": int(events_per_user.count()),
            "comments_per_post": round(total_comments / max(total_posts, 1), 2),
            "lurker_ratio": round((events_per_user == 1).mean(), 2),
            "time_range": {
                "start": int(df["dt"].min().timestamp()),
                "end": int(df["dt"].max().timestamp()),
            },
            "sources": df["source"].dropna().unique().tolist(),
        }
|
||||
|
||||
# def filter_by_query(self, df: pd.DataFrame, search_query: str) -> dict:
|
||||
# filtered_df = df[df["content"].str.contains(search_query, na=False)]
|
||||
|
||||
# return {
|
||||
# "rows": len(filtered_df),
|
||||
# "data": filtered_df.to_dict(orient="records"),
|
||||
# }
|
||||
|
||||
# def set_time_range(
|
||||
# self,
|
||||
# original_df: pd.DataFrame,
|
||||
# start: datetime.datetime,
|
||||
# end: datetime.datetime,
|
||||
# ) -> dict:
|
||||
# df = self._prepare_df(original_df)
|
||||
# filtered_df = df[(df["dt"] >= start) & (df["dt"] <= end)]
|
||||
|
||||
# return {
|
||||
# "rows": len(filtered_df),
|
||||
# "data": filtered_df.to_dict(orient="records"),
|
||||
# }
|
||||
|
||||
# def filter_data_sources(
|
||||
# self, original_df: pd.DataFrame, data_sources: dict
|
||||
# ) -> dict:
|
||||
# df = self._prepare_df(original_df)
|
||||
# enabled_sources = [src for src, enabled in data_sources.items() if enabled]
|
||||
|
||||
# if not enabled_sources:
|
||||
# raise ValueError("Please choose at least one data source")
|
||||
|
||||
# filtered_df = df[df["source"].isin(enabled_sources)]
|
||||
|
||||
# return {
|
||||
# "rows": len(filtered_df),
|
||||
# "data": filtered_df.to_dict(orient="records"),
|
||||
# }
|
||||
|
||||
# def reset_dataset(self, original_df: pd.DataFrame) -> pd.DataFrame:
|
||||
# return self._prepare_df(original_df)
|
||||
67
server/topics.json
Normal file
@@ -0,0 +1,67 @@
|
||||
{
|
||||
"Personal Life": "daily life, life updates, what happened today, personal stories, life events, reflections",
|
||||
|
||||
"Relationships": "dating, relationships, breakups, friendships, family relationships, marriage, relationship advice",
|
||||
|
||||
"Family & Parenting": "parents, parenting, children, raising kids, family dynamics, family stories",
|
||||
|
||||
"Work & Careers": "jobs, workplaces, office life, promotions, quitting jobs, career advice, workplace drama",
|
||||
|
||||
"Education": "school, studying, exams, university, homework, academic pressure, learning experiences",
|
||||
|
||||
"Money & Finance": "saving money, debt, budgeting, cost of living, financial advice, personal finance",
|
||||
|
||||
"Health & Fitness": "exercise, gym, workouts, running, diet, fitness routines, weight loss",
|
||||
|
||||
"Mental Health": "stress, anxiety, depression, burnout, therapy, emotional wellbeing",
|
||||
|
||||
"Food & Cooking": "meals, cooking, recipes, restaurants, snacks, food opinions",
|
||||
|
||||
"Travel": "holidays, trips, tourism, travel experiences, airports, flights, travel tips",
|
||||
|
||||
"Entertainment": "movies, TV shows, streaming services, celebrities, pop culture",
|
||||
|
||||
"Music": "songs, albums, artists, concerts, music opinions",
|
||||
|
||||
"Gaming": "video games, gaming culture, consoles, PC gaming, esports",
|
||||
|
||||
"Sports": "sports matches, teams, players, competitions, sports opinions",
|
||||
|
||||
"Technology": "phones, gadgets, apps, AI, software, tech trends",
|
||||
|
||||
"Internet Culture": "memes, viral trends, online jokes, internet drama, trending topics",
|
||||
|
||||
"Social Media": "platforms, influencers, content creators, algorithms, online communities",
|
||||
|
||||
"News & Current Events": "breaking news, world events, major incidents, public discussions",
|
||||
|
||||
"Politics": "political debates, elections, government policies, ideology",
|
||||
|
||||
"Culture & Society": "social issues, cultural trends, generational debates, societal changes",
|
||||
|
||||
"Identity & Lifestyle": "personal identity, lifestyle choices, values, self-expression",
|
||||
|
||||
"Hobbies & Interests": "art, photography, crafts, collecting, hobbies",
|
||||
|
||||
"Fashion & Beauty": "clothing, style, makeup, skincare, fashion trends",
|
||||
|
||||
"Animals & Pets": "pets, animal videos, pet care, wildlife",
|
||||
|
||||
"Humour": "jokes, funny stories, sarcasm, memes",
|
||||
|
||||
"Opinions & Debates": "hot takes, controversial opinions, arguments, discussions",
|
||||
|
||||
"Advice & Tips": "life advice, tutorials, how-to tips, recommendations",
|
||||
|
||||
"Product Reviews": "reviews, recommendations, experiences with products",
|
||||
|
||||
"Complaints & Rants": "frustrations, complaining, venting about things",
|
||||
|
||||
"Motivation & Inspiration": "motivational quotes, success stories, encouragement",
|
||||
|
||||
"Questions & Curiosity": "asking questions, seeking opinions, curiosity posts",
|
||||
|
||||
"Celebrations & Achievements": "birthdays, milestones, achievements, good news",
|
||||
|
||||
"Random Thoughts": "shower thoughts, observations, random ideas"
|
||||
}
|
||||
57
server/utils.py
Normal file
@@ -0,0 +1,57 @@
|
||||
import datetime
|
||||
import os
|
||||
from flask import request
|
||||
|
||||
def parse_datetime_filter(value):
    """Parse a date filter given as an ISO-8601 string or a Unix timestamp.

    Args:
        value: raw query-string value; falsy values (None, "") yield None.

    Returns:
        datetime.datetime, or None when no value was supplied.

    Raises:
        ValueError: if the value is neither ISO-8601 nor a usable numeric
            timestamp.
    """
    if not value:
        return None

    try:
        return datetime.datetime.fromisoformat(value)
    except ValueError:
        pass

    try:
        return datetime.datetime.fromtimestamp(float(value))
    except (ValueError, OverflowError, OSError) as err:
        # OverflowError/OSError: numeric but outside the platform's supported
        # timestamp range — previously these escaped as-is instead of the
        # documented ValueError.
        raise ValueError(
            "Date filters must be ISO-8601 strings or Unix timestamps"
        ) from err
|
||||
|
||||
|
||||
def get_request_filters() -> dict:
    """Collect optional filters (search text, date range, data sources) from
    the current request's query string. Each filter key is only present when
    the corresponding parameter was supplied."""
    args = request.args
    filters = {}

    query_text = args.get("search_query") or args.get("query")
    if query_text:
        filters["search_query"] = query_text

    # Both date filters accept a primary name and a short alias.
    for filter_key, primary, alias in (
        ("start_date", "start_date", "start"),
        ("end_date", "end_date", "end"),
    ):
        parsed = parse_datetime_filter(args.get(primary) or args.get(alias))
        if parsed:
            filters[filter_key] = parsed

    sources = args.getlist("data_sources")
    if not sources:
        sources = args.getlist("sources")

    # Accept a single comma-separated value as well as repeated parameters.
    if len(sources) == 1 and "," in sources[0]:
        sources = [part.strip() for part in sources[0].split(",") if part.strip()]

    if sources:
        filters["data_sources"] = sources

    return filters
|
||||
|
||||
def get_env(name: str) -> str:
    """Return the value of environment variable `name`.

    Raises:
        RuntimeError: if the variable is unset or empty.
    """
    if value := os.getenv(name):
        return value
    raise RuntimeError(f"Missing required environment variable: {name}")
|
||||