docs(readme): update readme

docs(report): add Declaration of Originality and Acknowledgements sections
refactor(connector): clean up comments
2026-04-19 13:54:09 +01:00 · 2026-04-18 22:10:16 +01:00 · 2026-04-18 22:10:03 +01:00 · 2026-04-18 16:09:22 +01:00 · 2026-04-18 15:44:04 +01:00 · 2026-04-17 20:31:39 +01:00
81 changed files with 6787 additions and 1473 deletions
--- a/.gitignore
+++ b/.gitignore
@@ -10,4 +10,7 @@ __pycache__/
 node_modules/
 dist/
-*.sh
+helper
 db
 report/build
 .DS_Store
--- a/README.md
+++ b/README.md
@@ -1,29 +1,49 @@
 # crosspost
-**crosspost** is a browser-based tool designed to support *digital ethnography*, the study of how people interact, communicate, and form culture in online spaces such as forums, social media platforms, and comment-driven communities.
+A web-based analytics platform for exploring online communities. Built as a final year CS project at UCC, crosspost ingests data from Reddit, YouTube, and Boards.ie, runs NLP analysis on it (emotion detection, topic classification, named entity recognition, stance markers), and surfaces the results through an interactive dashboard.
 The motivating use case is digital ethnography — studying how people talk, what they talk about, and how culture forms in online spaces. The included dataset is centred on Cork, Ireland.
-The project aims to make it easier for students, researchers, and journalists to collect, organise, and explore online discourse in a structured and ethical way, without requiring deep technical expertise.
+## What it does
 - Fetch posts and comments from Reddit, YouTube, and Boards.ie (or upload your own .jsonl file)
 - Normalise everything into a unified schema regardless of source
 - Run NLP analysis asynchronously in the background via Celery workers
 - Explore results through a tabbed dashboard: temporal patterns, word clouds, emotion breakdowns, user activity, interaction graphs, topic clusters, and more
 - Multi-user support — each user has their own datasets, isolated from everyone else
-By combining data ingestion, analysis, and visualisation in a single system, crosspost turns raw online interactions into meaningful insights about how conversations emerge, evolve, and spread across platforms.
+# Prerequisites
 - Docker & Docker Compose
 - A Reddit App (client id & secret)
 - YouTube Data API v3 key
-## Goals for this project
+# Setup
- Collect data ethically: enable users to link/upload text, images, and interaction data (messages etc.) from specified online communities. Potentially an automated method for importing (using APIs or scraping techniques) could be included as well.
+1) **Clone the Repo**
- Organise content: Store gathered material in a structured database with tagging for themes, dates, and sources.
+```
-Analyse patterns: Use natural language processing (NLP) to detect frequent keywords, sentiment, and interaction networks.
+git clone https://github.com/your-username/crosspost.git
- Visualise insights: Present findings as charts, timelines, and network diagrams to reveal how conversations and topics evolve.
+cd crosspost
- Have clearly stated and explained ethical and privacy guidelines for users. The student will design the architecture, implement data pipelines, integrate basic NLP models, and create an interactive dashboard. 
+```
-Beyond programming, the project involves applying ethical research principles, handling data responsibly, and designing for non-technical users. By the end, the project will demonstrate how computer science can bridge technology and social research — turning raw online interactions into meaningful cultural insights.
+2) **Configure Environment Vars**
 ```
 cp example.env .env
 ```
 Fill in each required empty variable. Some are already filled in; these are sensible defaults that usually don't need to be changed.
-## Scope
+3) **Start everything**
 ```
 docker compose up -d
 ```
-This project focuses on:
+This starts:
- Designing a modular data ingestion pipeline
+- `crosspost_db` — PostgreSQL on port 5432
- Implementing backend data processing and storage
+- `crosspost_redis` — Redis on port 6379
- Integrating lightweight NLP-based analysis
+- `crosspost_flask` — Flask API on port 5000
- Building a simple, accessible frontend for exploration and visualisation
+- `crosspost_worker` — Celery worker for background NLP/fetching tasks
 - `crosspost_frontend` — Vite dev server on port 5173
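
Once the containers are up, a plain TCP connect is a quick way to confirm that the published ports are reachable. The following is a sketch only: it assumes the services are published on localhost at the ports listed above, and `port_is_open` and the `SERVICES` mapping are illustrative names that are not part of crosspost.

```python
import socket

# Ports as listed in the README; the host (local Docker) is an assumption.
SERVICES = {
    "crosspost_db": 5432,
    "crosspost_redis": 6379,
    "crosspost_flask": 5000,
    "crosspost_frontend": 5173,
}


def port_is_open(host: str, port: int, timeout: float = 1.0) -> bool:
    """Return True if a TCP connection to host:port succeeds."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts, and unreachable hosts.
        return False


if __name__ == "__main__":
    for name, port in SERVICES.items():
        state = "up" if port_is_open("127.0.0.1", port) else "unreachable"
        print(f"{name:20} {port:5} {state}")
```

A successful connect only shows that something is listening on the port; it does not prove the service behind it is healthy.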
-# Requirements
+# Data Format for Manual Uploads
 If you want to upload your own data rather than fetch it via the connectors, the expected format is newline-delimited JSON (.jsonl) where each line is a post object:
 ```json
 {"id": "abc123", "author": "username", "title": "Post title", "content": "Post body", "url": "https://...", "timestamp": 1700000000.0, "source": "reddit", "comments": []}
 ```
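
As a minimal sketch, a file in this format could be written and sanity-checked with the standard library alone. Treating every field in the example line as required is an assumption (the README does not say which are optional), and the helper names `validate_line` and `write_posts` are hypothetical rather than part of crosspost.

```python
import json

# Field names taken from the README example; treating all of them as
# required is an assumption made for this sketch.
REQUIRED_FIELDS = {"id", "author", "title", "content", "url",
                   "timestamp", "source", "comments"}


def validate_line(line: str) -> dict:
    """Parse one .jsonl line and check it has the expected keys."""
    post = json.loads(line)
    if not isinstance(post, dict):
        raise ValueError("each line must be a JSON object")
    missing = REQUIRED_FIELDS - post.keys()
    if missing:
        raise ValueError(f"missing fields: {sorted(missing)}")
    return post


def write_posts(path: str, posts: list[dict]) -> None:
    """Write one JSON object per line (newline-delimited JSON)."""
    with open(path, "w", encoding="utf-8") as f:
        for post in posts:
            f.write(json.dumps(post) + "\n")
```

Running `validate_line` over each line of a file before uploading catches malformed rows early, without relying on the server's error reporting.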
- **Python** ≥ 3.9
+# Notes
- **Python packages** listed in `requirements.txt`
+- **GPU support**: The Celery worker is configured with `--pool=solo` to avoid memory conflicts when multiple NLP models are loaded. If you have an NVIDIA GPU, uncomment the deploy.resources block in docker-compose.yml and make sure the NVIDIA Container Toolkit is installed.
 - npm ≥ version 11 
--- a/connectors/reddit_api.py
+++ b/connectors/reddit_api.py
@@ -1,178 +0,0 @@
 import requests
 import logging
 import time
 from dto.post import Post
 from dto.user import User
 from dto.comment import Comment
 logger = logging.getLogger(__name__)
 class RedditAPI:
    def __init__(self):
        self.url = "https://www.reddit.com/"
        self.source_name = "Reddit"
    # Public Methods #
    def search_new_subreddit_posts(self, search: str, subreddit: str, limit: int) -> list[Post]:
        params = {
            'q': search,
            'limit': limit,
            'restrict_sr': 'on',
            'sort': 'new'
        }
        logger.info(f"Searching subreddit '{subreddit}' for '{search}' with limit {limit}")
        url = f"r/{subreddit}/search.json"
        posts = []
        after = None
        while len(posts) < limit:
            batch_limit = min(100, limit - len(posts))
            params['limit'] = batch_limit
            # Pass the pagination cursor so each request returns the next page
            params['after'] = after
            data = self._fetch_post_overviews(url, params)
            batch_posts = self._parse_posts(data)
            logger.debug(f"Fetched {len(batch_posts)} posts from search in subreddit {subreddit}")
            if not batch_posts:
                break
            posts.extend(batch_posts)
            after = data['data'].get('after')
            if not after:
                break
        return posts
    def get_new_subreddit_posts(self, subreddit: str, limit: int = 10) -> list[Post]:
        posts = []
        after = None
        url = f"r/{subreddit}/new.json"
        logger.info(f"Fetching new posts from subreddit: {subreddit}")
        while len(posts) < limit:
            batch_limit = min(100, limit - len(posts))
            params = {
                'limit': batch_limit,
                'after': after
            }
            data = self._fetch_post_overviews(url, params)
            batch_posts = self._parse_posts(data)
            logger.debug(f"Fetched {len(batch_posts)} new posts from subreddit {subreddit}")
            if not batch_posts:
                break
            posts.extend(batch_posts)
            after = data['data'].get('after')
            if not after:
                break
        return posts
    def get_user(self, username: str) -> User:
        data = self._fetch_post_overviews(f"user/{username}/about.json", {})
        return self._parse_user(data)
    ## Private Methods ##
    def _parse_posts(self, data) -> list[Post]:
        posts = []
        total_num_posts = len(data['data']['children'])
        current_index = 0
        for item in data['data']['children']:
            current_index += 1
            logger.debug(f"Parsing post {current_index} of {total_num_posts}")
            post_data = item['data']
            post = Post(
                id=post_data['id'],
                author=post_data['author'],
                title=post_data['title'],
                content=post_data.get('selftext', ''),
                url=post_data['url'],
                timestamp=post_data['created_utc'],
                source=self.source_name,
                comments=self._get_post_comments(post_data['id']))
            post.subreddit = post_data['subreddit']
            post.upvotes = post_data['ups']
            posts.append(post)
        return posts
    def _get_post_comments(self, post_id: str) -> list[Comment]:
        comments: list[Comment] = []
        url = f"comments/{post_id}.json"
        data = self._fetch_post_overviews(url, {})
        if len(data) < 2:
            return comments
        comment_data = data[1]['data']['children']
        def _parse_comment_tree(items, parent_id=None):
            for item in items:
                if item['kind'] != 't1':
                    continue
                comment_info = item['data']
                comment = Comment(
                    id=comment_info['id'],
                    post_id=post_id,
                    author=comment_info['author'],
                    content=comment_info.get('body', ''),
                    timestamp=comment_info['created_utc'],
                    reply_to=parent_id or comment_info.get('parent_id', None),
                    source=self.source_name
                )
                comments.append(comment)
                # Process replies recursively
                replies = comment_info.get('replies')
                if replies and isinstance(replies, dict):
                    reply_items = replies.get('data', {}).get('children', [])
                    _parse_comment_tree(reply_items, parent_id=comment.id)
        _parse_comment_tree(comment_data)
        return comments
    def _parse_user(self, data) -> User:
        user_data = data['data']
        user = User(
            username=user_data['name'],
            created_utc=user_data['created_utc'])
        user.karma = user_data['total_karma']
        return user
    def _fetch_post_overviews(self, endpoint: str, params: dict) -> dict:
        url = f"{self.url}{endpoint}"
        max_retries = 15
        backoff = 1 # seconds
        for attempt in range(max_retries):
            try:
                response = requests.get(url, headers={'User-agent': 'python:ethnography-college-project:0.1 (by /u/ThisBirchWood)'}, params=params)
                if response.status_code == 429:
                    # Header values are strings; time.sleep() needs a number
                    wait_time = float(response.headers.get("Retry-After", backoff))
                    logger.warning(f"Rate limited by Reddit API. Retrying in {wait_time} seconds...")
                    time.sleep(wait_time)
                    backoff *= 2
                    continue
                if response.status_code == 500:
                    logger.warning("Server error from Reddit API. Retrying...")
                    time.sleep(backoff)
                    backoff *= 2
                    continue
                response.raise_for_status()
                return response.json()
            except requests.RequestException as e:
                logger.error(f"Error fetching data from Reddit API: {e}")
                return {}
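
The retry loop in `_fetch_post_overviews` can be sketched in isolation. This version is an illustration, not the project's code: `fetch_with_backoff`, the `send` callable, and the `(status, headers, body)` tuple are hypothetical stand-ins so the logic runs without network access. It assumes `Retry-After` uses the delay-seconds form, converts it to a number before sleeping, and raises once retries are exhausted rather than falling through and returning `None`.

```python
import time
from typing import Callable, Mapping, Tuple

# (status code, response headers, parsed JSON body); a hypothetical shape
Response = Tuple[int, Mapping[str, str], dict]


def fetch_with_backoff(
    send: Callable[[], Response],
    max_retries: int = 5,
    initial_backoff: float = 1.0,
    sleep: Callable[[float], None] = time.sleep,
) -> dict:
    """Call send() until it succeeds, doubling the wait after each failure."""
    backoff = initial_backoff
    for _ in range(max_retries):
        status, headers, body = send()
        if status == 429:
            # Prefer the server's hint; fall back to the current backoff.
            wait = float(headers.get("Retry-After", backoff))
            sleep(wait)
            backoff *= 2
            continue
        if status >= 500:
            # The original retries only on 500; any 5xx is retried here.
            sleep(backoff)
            backoff *= 2
            continue
        return body
    raise RuntimeError(f"gave up after {max_retries} attempts")
```

Injecting `sleep` keeps the function testable: a test can pass a recorder in place of `time.sleep` and assert on the waits without actually pausing.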
--- a/connectors/youtube_api.py
+++ b/connectors/youtube_api.py
@@ -1,84 +0,0 @@
 import os
 import datetime
 from dotenv import load_dotenv
 from googleapiclient.discovery import build
 from googleapiclient.errors import HttpError
 from dto.post import Post
 from dto.comment import Comment
 load_dotenv()
 API_KEY = os.getenv("YOUTUBE_API_KEY")
 class YouTubeAPI:
    def __init__(self):
        self.youtube = build('youtube', 'v3', developerKey=API_KEY)
    def search_videos(self, query, limit):
        request = self.youtube.search().list(
            q=query,
            part='snippet',
            type='video',
            maxResults=limit
        )
        response = request.execute()
        return response.get('items', [])
    def get_video_comments(self, video_id, limit):
        request = self.youtube.commentThreads().list(
            part='snippet',
            videoId=video_id,
            maxResults=limit,
            textFormat='plainText'
        )
        try:
            response = request.execute()
        except HttpError as e:
            print(f"Error fetching comments for video {video_id}: {e}")
            return []
        return response.get('items', [])
    def fetch_videos(self, query, video_limit, comment_limit) -> list[Post]:
        videos = self.search_videos(query, video_limit)
        posts = []
        for video in videos:
            video_id = video['id']['videoId']
            snippet = video['snippet']
            title = snippet['title']
            description = snippet['description']
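            # publishedAt is UTC ("Z"); attach tzinfo so timestamp() does not assume local time
            published_at = datetime.datetime.strptime(snippet['publishedAt'], "%Y-%m-%dT%H:%M:%SZ").replace(tzinfo=datetime.timezone.utc).timestamp()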
            channel_title = snippet['channelTitle']
            comments = []
            comments_data = self.get_video_comments(video_id, comment_limit)
            for comment_thread in comments_data:
                comment_snippet = comment_thread['snippet']['topLevelComment']['snippet']
                comment = Comment(
                    id=comment_thread['id'],
                    post_id=video_id,
                    content=comment_snippet['textDisplay'],
                    author=comment_snippet['authorDisplayName'],
                    timestamp=datetime.datetime.strptime(comment_snippet['publishedAt'], "%Y-%m-%dT%H:%M:%SZ").replace(tzinfo=datetime.timezone.utc).timestamp(),
                    reply_to=None,
                    source="YouTube"
                )
                comments.append(comment)
            post = Post(
                id=video_id,
                content=f"{title}\n\n{description}",
                author=channel_title,
                timestamp=published_at,
                url=f"https://www.youtube.com/watch?v={video_id}",
                title=title,
                source="YouTube",
                comments=comments
            )
            posts.append(post)
        return posts
--- a/create_dataset.py
+++ b/create_dataset.py
@@ -1,43 +0,0 @@
 import json
 import logging
 from connectors.reddit_api import RedditAPI
 from connectors.boards_api import BoardsAPI
 from connectors.youtube_api import YouTubeAPI
 posts_file = 'posts_test.jsonl'
 reddit_connector = RedditAPI()
 boards_connector = BoardsAPI()
 youtube_connector = YouTubeAPI()
 logging.basicConfig(level=logging.DEBUG)
 logging.getLogger("urllib3").setLevel(logging.WARNING)
 def remove_empty_posts(posts):
    return [post for post in posts if post.content.strip() != ""]
 def save_to_jsonl(filename, posts):
    with open(filename, 'a', encoding='utf-8') as f:
        for post in posts:
            # Convert post object to dict if it's a dataclass
            data = post.to_dict()
            f.write(json.dumps(data) + '\n')
 def main():
    boards_posts = boards_connector.get_new_category_posts('cork-city', 1200, 1200)
    save_to_jsonl(posts_file, boards_posts)
    reddit_posts = reddit_connector.get_new_subreddit_posts('cork', 1200)
    reddit_posts = remove_empty_posts(reddit_posts)
    save_to_jsonl(posts_file, reddit_posts)
    ireland_posts = reddit_connector.search_new_subreddit_posts('cork', 'ireland', 1200)
    ireland_posts = remove_empty_posts(ireland_posts)
    save_to_jsonl(posts_file, ireland_posts)
    youtube_videos = youtube_connector.fetch_videos('cork city', 1200, 1200)
    save_to_jsonl(posts_file, youtube_videos)
 if __name__ == "__main__":
    main()
--- a/docker-compose.dev.yml
+++ b/docker-compose.dev.yml
@@ -28,7 +28,7 @@ services:
      - .env
    ports:
      - "5000:5000"
-    command: flask --app server.app run --host=0.0.0.0 --debug
+    command: gunicorn server.app:app --bind 0.0.0.0:5000 --workers 2 --threads 4
    depends_on:
      - postgres
      - redis
@@ -43,7 +43,7 @@ services:
      - .env
    command: >
      celery -A server.queue.celery_app.celery worker
-      --loglevel=info
+      --loglevel=debug
      --pool=solo
    depends_on:
      - postgres
--- a/dto/user.py
+++ b/dto/user.py
@@ -1,8 +0,0 @@
 # Generic User Data Transfer Object for social media platforms
 class User:
    def __init__(self, username: str, created_utc: int, ):
        self.username = username
        self.created_utc = created_utc
        # Optionals
        self.karma = None
--- a/example.env
+++ b/example.env
@@ -1,13 +1,16 @@
 # API Keys
 YOUTUBE_API_KEY=
 REDDIT_CLIENT_ID=
 REDDIT_CLIENT_SECRET=
 # Database
-POSTGRES_USER=
+# Database
-POSTGRES_PASSWORD=
+POSTGRES_USER=postgres
-POSTGRES_DB=
+POSTGRES_PASSWORD=postgres
-POSTGRES_HOST=
+POSTGRES_DB=mydatabase
 POSTGRES_HOST=postgres
 POSTGRES_PORT=5432
-POSTGRES_DIR=
+POSTGRES_DIR=./db
 # JWT
 JWT_SECRET_KEY=
@@ -18,5 +21,10 @@ HF_HOME=/models/huggingface
 TRANSFORMERS_CACHE=/models/huggingface
 TORCH_HOME=/models/torch
-# Frontend
+# URLs
 FRONTEND_URL=http://localhost:5173
 BACKEND_URL=http://backend:5000
 REDIS_URL=redis://redis:6379/0
 # API & Scraping
 MAX_FETCH_LIMIT=1000
--- a/frontend/Dockerfile
+++ b/frontend/Dockerfile
@@ -10,4 +10,4 @@ COPY . .
 EXPOSE 5173
-CMD ["npm", "run", "dev", "--", "--host"]
+CMD ["npm", "run", "dev", "--", "--host", "0.0.0.0"]
--- a/frontend/index.html
+++ b/frontend/index.html
@@ -2,7 +2,7 @@
 <html lang="en">
  <head>
    <meta charset="UTF-8" />
-    <link rel="icon" type="image/svg+xml" href="/vite.svg" />
+    <link rel="icon" type="image/png" href="/icon.png" />
    <meta name="viewport" content="width=device-width, initial-scale=1.0" />
    <title>frontend</title>
  </head>
--- a/frontend/public/icon.png
+++ b/frontend/public/icon.png
--- a/frontend/src/App.tsx
+++ b/frontend/src/App.tsx
@@ -5,6 +5,7 @@ import DatasetsPage from "./pages/Datasets";
 import DatasetStatusPage from "./pages/DatasetStatus";
 import LoginPage from "./pages/Login";
 import UploadPage from "./pages/Upload";
 import AutoFetchPage from "./pages/AutoFetch";
 import StatPage from "./pages/Stats";
 import { getDocumentTitle } from "./utils/documentTitle";
 import DatasetEditPage from "./pages/DatasetEdit";
@@ -22,6 +23,7 @@ function App() {
        <Route path="/" element={<Navigate to="/login" replace />} />
        <Route path="/login" element={<LoginPage />} />
        <Route path="/upload" element={<UploadPage />} />
        <Route path="/auto-fetch" element={<AutoFetchPage />} />
        <Route path="/datasets" element={<DatasetsPage />} />
        <Route path="/dataset/:datasetId/status" element={<DatasetStatusPage />} />
        <Route path="/dataset/:datasetId/stats" element={<StatPage />} />
--- a/frontend/src/components/AppLayout.tsx
+++ b/frontend/src/components/AppLayout.tsx
@@ -3,7 +3,7 @@ import axios from "axios";
 import { Outlet, useLocation, useNavigate } from "react-router-dom";
 import StatsStyling from "../styles/stats_styling";
-const API_BASE_URL = import.meta.env.VITE_BACKEND_URL
+const API_BASE_URL = import.meta.env.VITE_BACKEND_URL;
 type ProfileResponse = {
  user?: Record<string, unknown>;
@@ -33,7 +33,10 @@ const AppLayout = () => {
  const location = useLocation();
  const navigate = useNavigate();
  const [isSignedIn, setIsSignedIn] = useState(false);
-  const [currentUser, setCurrentUser] = useState<Record<string, unknown> | null>(null);
+  const [currentUser, setCurrentUser] = useState<Record<
    string,
    unknown
  > | null>(null);
  const syncAuthState = useCallback(async () => {
    const token = localStorage.getItem("access_token");
@@ -48,7 +51,9 @@ const AppLayout = () => {
    axios.defaults.headers.common.Authorization = `Bearer ${token}`;
    try {
-      const response = await axios.get<ProfileResponse>(`${API_BASE_URL}/profile`);
+      const response = await axios.get<ProfileResponse>(
        `${API_BASE_URL}/profile`,
      );
      setIsSignedIn(true);
      setCurrentUser(response.data.user ?? null);
    } catch {
@@ -81,27 +86,35 @@ const AppLayout = () => {
      <div style={{ ...styles.container, ...styles.appHeaderWrap }}>
        <div style={{ ...styles.card, ...styles.headerBar }}>
          <div style={styles.appHeaderBrandRow}>
-            <span style={styles.appTitle}>
+            <span style={styles.appTitle}>CrossPost Analysis Engine</span>
              CrossPost Analysis Engine
            </span>
            <span
              style={{
                ...styles.authStatusBadge,
-                ...(isSignedIn ? styles.authStatusSignedIn : styles.authStatusSignedOut),
+                ...(isSignedIn
                  ? styles.authStatusSignedIn
                  : styles.authStatusSignedOut),
              }}
            >
-              {isSignedIn ? `Signed in: ${getUserLabel(currentUser)}` : "Not signed in"}
+              {isSignedIn
                ? `Signed in: ${getUserLabel(currentUser)}`
                : "Not signed in"}
            </span>
          </div>
          <div style={styles.controlsWrapped}>
-            {isSignedIn && <button
+            {isSignedIn && (
-              type="button"
+              <button
-              style={location.pathname === "/datasets" ? styles.buttonPrimary : styles.buttonSecondary}
+                type="button"
-              onClick={() => navigate("/datasets")}
+                style={
-            >
+                  location.pathname === "/datasets"
-              My datasets
+                    ? styles.buttonPrimary
-            </button>}
+                    : styles.buttonSecondary
                }
                onClick={() => navigate("/datasets")}
              >
                My datasets
              </button>
            )}
            <button
              type="button"
--- a/frontend/src/components/Card.tsx
+++ b/frontend/src/components/Card.tsx
@@ -8,20 +8,20 @@ const Card = (props: {
  value: string | number;
  sublabel?: string;
  rightSlot?: React.ReactNode;
-  style?: CSSProperties
+  style?: CSSProperties;
 }) => {
  return (
    <div style={{ ...styles.cardBase, ...props.style }}>
      <div style={styles.cardTopRow}>
-        <div style={styles.cardLabel}>
+        <div style={styles.cardLabel}>{props.label}</div>
                {props.label}
        </div>
        {props.rightSlot ? <div>{props.rightSlot}</div> : null}
      </div>
      <div style={styles.cardValue}>{props.value}</div>
-      {props.sublabel ? <div style={styles.cardSubLabel}>{props.sublabel}</div> : null}
+      {props.sublabel ? (
        <div style={styles.cardSubLabel}>{props.sublabel}</div>
      ) : null}
    </div>
  );
-}
+};
 export default Card;
--- a/frontend/src/components/ConfirmationModal.tsx
+++ b/frontend/src/components/ConfirmationModal.tsx
@@ -34,10 +34,20 @@ export default function ConfirmationModal({
          <p style={styles.sectionSubtitle}>{message}</p>
          <div style={{ display: "flex", justifyContent: "flex-end", gap: 8 }}>
-            <button type="button" onClick={onCancel} style={styles.buttonSecondary} disabled={loading}>
+            <button
              type="button"
              onClick={onCancel}
              style={styles.buttonSecondary}
              disabled={loading}
            >
              {cancelLabel}
            </button>
-            <button type="button" onClick={onConfirm} style={styles.buttonDanger} disabled={loading}>
+            <button
              type="button"
              onClick={onConfirm}
              style={styles.buttonDanger}
              disabled={loading}
            >
              {loading ? "Deleting..." : confirmLabel}
            </button>
          </div>
--- a/frontend/src/components/CorpusExplorer.tsx
+++ b/frontend/src/components/CorpusExplorer.tsx
@@ -0,0 +1,247 @@
 import { useEffect, useState } from "react";
 import { Dialog, DialogPanel, DialogTitle } from "@headlessui/react";
 import StatsStyling from "../styles/stats_styling";
 import type { DatasetRecord } from "../utils/corpusExplorer";
 const styles = StatsStyling;
 const INITIAL_RECORD_COUNT = 60;
 const RECORD_BATCH_SIZE = 60;
 const EXCERPT_LENGTH = 320;
 const cleanText = (value: unknown) => {
  if (typeof value !== "string") {
    return "";
  }
  const trimmed = value.trim();
  if (!trimmed) {
    return "";
  }
  const lowered = trimmed.toLowerCase();
  if (lowered === "nan" || lowered === "null" || lowered === "undefined") {
    return "";
  }
  return trimmed;
 };
 const displayText = (value: unknown, fallback: string) => {
  const cleaned = cleanText(value);
  return cleaned || fallback;
 };
 type CorpusExplorerProps = {
  open: boolean;
  onClose: () => void;
  title: string;
  description: string;
  records: DatasetRecord[];
  loading: boolean;
  error: string;
  emptyMessage: string;
 };
 const formatRecordDate = (record: DatasetRecord) => {
  if (typeof record.dt === "string" && record.dt) {
    const date = new Date(record.dt);
    if (!Number.isNaN(date.getTime())) {
      return date.toLocaleString();
    }
  }
  if (typeof record.date === "string" && record.date) {
    return record.date;
  }
  if (typeof record.timestamp === "number") {
    return new Date(record.timestamp * 1000).toLocaleString();
  }
  return "Unknown time";
 };
 const getRecordKey = (record: DatasetRecord, index: number) =>
  String(record.id ?? record.post_id ?? `${record.author ?? "record"}-${index}`);
 const getRecordTitle = (record: DatasetRecord) => {
  if (record.type === "comment") {
    return "";
  }
  const title = cleanText(record.title);
  if (title) {
    return title;
  }
  const content = cleanText(record.content);
  if (!content) {
    return "Untitled record";
  }
  return content.length > 120 ? `${content.slice(0, 117)}...` : content;
 };
 const CorpusExplorer = ({
  open,
  onClose,
  title,
  description,
  records,
  loading,
  error,
  emptyMessage,
 }: CorpusExplorerProps) => {
  const [visibleCount, setVisibleCount] = useState(INITIAL_RECORD_COUNT);
  const [expandedKeys, setExpandedKeys] = useState<Record<string, boolean>>({});
  useEffect(() => {
    if (open) {
      setVisibleCount(INITIAL_RECORD_COUNT);
      setExpandedKeys({});
    }
  }, [open, title, records.length]);
  const hasMoreRecords = visibleCount < records.length;
  return (
    <Dialog open={open} onClose={onClose} style={styles.modalRoot}>
      <div style={styles.modalBackdrop} />
      <div style={styles.modalContainer}>
        <DialogPanel
          style={{
            ...styles.card,
            ...styles.modalPanel,
            width: "min(960px, 96vw)",
            maxHeight: "88vh",
            display: "flex",
            flexDirection: "column",
            gap: 12,
            overflow: "hidden",
          }}
        >
          <div style={styles.headerBar}>
            <div style={{ minWidth: 0 }}>
              <DialogTitle style={styles.sectionTitle}>{title}</DialogTitle>
              <p style={styles.sectionSubtitle}>
                {description} {loading ? "Loading records..." : `${records.length.toLocaleString()} records.`}
              </p>
            </div>
            <button onClick={onClose} style={styles.buttonSecondary}>
              Close
            </button>
          </div>
          {error ? <p style={styles.sectionSubtitle}>{error}</p> : null}
          {!loading && !error && !records.length ? (
            <p style={styles.sectionSubtitle}>{emptyMessage}</p>
          ) : null}
          {loading ? <div style={styles.topUserMeta}>Preparing corpus slice...</div> : null}
          {!loading && !error && records.length ? (
            <>
              <div
                style={{
                  ...styles.topUsersList,
                  overflowY: "auto",
                  overflowX: "hidden",
                  paddingRight: 4,
                }}
              >
                {records.slice(0, visibleCount).map((record, index) => {
                  const recordKey = getRecordKey(record, index);
                  const titleText = getRecordTitle(record);
                  const content = cleanText(record.content);
                  const isExpanded = !!expandedKeys[recordKey];
                  const canExpand = content.length > EXCERPT_LENGTH;
                  const excerpt =
                    canExpand && !isExpanded
                      ? `${content.slice(0, EXCERPT_LENGTH - 3)}...`
                      : content || "No content available.";
                  return (
                    <div key={recordKey} style={styles.topUserItem}>
                      <div style={{ ...styles.headerBar, alignItems: "flex-start" }}>
                        <div style={{ minWidth: 0, flex: 1 }}>
                          {titleText ? <div style={styles.topUserName}>{titleText}</div> : null}
                          <div
                            style={{
                              ...styles.topUserMeta,
                              overflowWrap: "anywhere",
                              wordBreak: "break-word",
                            }}
                          >
                            {displayText(record.author, "Unknown author")} • {displayText(record.source, "Unknown source")} • {displayText(record.type, "record")} • {formatRecordDate(record)}
                          </div>
                        </div>
                        <div
                          style={{
                            ...styles.topUserMeta,
                            marginLeft: 12,
                            textAlign: "right",
                            overflowWrap: "anywhere",
                            wordBreak: "break-word",
                          }}
                        >
                          {cleanText(record.topic) ? `Topic: ${cleanText(record.topic)}` : ""}
                        </div>
                      </div>
                      <div
                        style={{
                          ...styles.topUserMeta,
                          marginTop: 8,
                          whiteSpace: "pre-wrap",
                          overflowWrap: "anywhere",
                          wordBreak: "break-word",
                        }}
                      >
                        {excerpt}
                      </div>
                      {canExpand ? (
                        <div style={{ marginTop: 10 }}>
                          <button
                            onClick={() =>
                              setExpandedKeys((current) => ({
                                ...current,
                                [recordKey]: !current[recordKey],
                              }))
                            }
                            style={styles.buttonSecondary}
                          >
                            {isExpanded ? "Show Less" : "Show More"}
                          </button>
                        </div>
                      ) : null}
                    </div>
                  );
                })}
              </div>
              {hasMoreRecords ? (
                <div style={{ display: "flex", justifyContent: "center" }}>
                  <button
                    onClick={() =>
                      setVisibleCount((current) => current + RECORD_BATCH_SIZE)
                    }
                    style={styles.buttonSecondary}
                  >
                    Show More Records
                  </button>
                </div>
              ) : null}
            </>
          ) : null}
        </DialogPanel>
      </div>
    </Dialog>
  );
 };
 export default CorpusExplorer;
--- a/frontend/src/components/CulturalStats.tsx
+++ b/frontend/src/components/CulturalStats.tsx
@@ -0,0 +1,249 @@
 import Card from "./Card";
 import StatsStyling from "../styles/stats_styling";
 import type { CulturalAnalysisResponse } from "../types/ApiTypes";
 import {
  buildCertaintySpec,
  buildDeonticSpec,
  buildEntitySpec,
  buildHedgeSpec,
  buildIdentityBucketSpec,
  buildPermissionSpec,
  type CorpusExplorerSpec,
 } from "../utils/corpusExplorer";
 const styles = StatsStyling;
 const exploreButtonStyle = { padding: "4px 8px", fontSize: 12 };
 type CulturalStatsProps = {
  data: CulturalAnalysisResponse;
  onExplore: (spec: CorpusExplorerSpec) => void;
 };
 const renderExploreButton = (onClick: () => void) => (
  <button
    onClick={onClick}
    style={{ ...styles.buttonSecondary, ...exploreButtonStyle }}
  >
    Explore
  </button>
 );
 const CulturalStats = ({ data, onExplore }: CulturalStatsProps) => {
  const identity = data.identity_markers;
  const stance = data.stance_markers;
  const inGroupWords = identity?.in_group_usage ?? 0;
  const outGroupWords = identity?.out_group_usage ?? 0;
  const totalGroupWords = inGroupWords + outGroupWords;
  const inGroupWordRate =
    typeof identity?.in_group_ratio === "number"
      ? identity.in_group_ratio * 100
      : null;
  const outGroupWordRate =
    typeof identity?.out_group_ratio === "number"
      ? identity.out_group_ratio * 100
      : null;
  const rawEntities = data.avg_emotion_per_entity?.entity_emotion_avg ?? {};
  const entities = Object.entries(rawEntities)
    .sort((a, b) => b[1].post_count - a[1].post_count)
    .slice(0, 20);
  const topEmotion = (emotionAvg: Record<string, number> | undefined) => {
    const entries = Object.entries(emotionAvg ?? {});
    if (!entries.length) {
      return "-";
    }
    entries.sort((a, b) => b[1] - a[1]);
    const dominant = entries[0] ?? ["emotion_unknown", 0];
    const dominantLabel = dominant[0].replace("emotion_", "");
    return `${dominantLabel} (${(dominant[1] * 100).toFixed(1)}%)`;
  };
  return (
    <div style={styles.page}>
      <div style={{ ...styles.container, ...styles.grid }}>
        <div style={{ ...styles.card, gridColumn: "span 12" }}>
          <h2 style={styles.sectionTitle}>Community Framing Overview</h2>
          <p style={styles.sectionSubtitle}>
            Simple view of how often people use "us" words vs "them" words, and
            the tone around that language.
          </p>
        </div>
        <Card
          label="In-Group Words"
          value={inGroupWords.toLocaleString()}
          sublabel="Times we/us/our appears"
          style={{ gridColumn: "span 3" }}
        />
        <Card
          label="Out-Group Words"
          value={outGroupWords.toLocaleString()}
          sublabel="Times they/them/their appears"
          style={{ gridColumn: "span 3" }}
        />
        <Card
          label="In-Group Posts"
          value={identity?.in_group_posts?.toLocaleString() ?? "-"}
          sublabel='Posts leaning toward "us" language'
          rightSlot={renderExploreButton(() =>
            onExplore(buildIdentityBucketSpec("in")),
          )}
          style={{ gridColumn: "span 3" }}
        />
        <Card
          label="Out-Group Posts"
          value={identity?.out_group_posts?.toLocaleString() ?? "-"}
          sublabel='Posts leaning toward "them" language'
          rightSlot={renderExploreButton(() =>
            onExplore(buildIdentityBucketSpec("out")),
          )}
          style={{ gridColumn: "span 3" }}
        />
        <Card
          label="Balanced Posts"
          value={identity?.tie_posts?.toLocaleString() ?? "-"}
          sublabel="Posts with equal us/them signals"
          rightSlot={renderExploreButton(() =>
            onExplore(buildIdentityBucketSpec("tie")),
          )}
          style={{ gridColumn: "span 3" }}
        />
        <Card
          label="Total Group Words"
          value={totalGroupWords.toLocaleString()}
          sublabel="In-group + out-group words"
          style={{ gridColumn: "span 3" }}
        />
        <Card
          label="In-Group Share"
          value={
            inGroupWordRate === null ? "-" : `${inGroupWordRate.toFixed(2)}%`
          }
          sublabel="Share of all words"
          style={{ gridColumn: "span 3" }}
        />
        <Card
          label="Out-Group Share"
          value={
            outGroupWordRate === null ? "-" : `${outGroupWordRate.toFixed(2)}%`
          }
          sublabel="Share of all words"
          style={{ gridColumn: "span 3" }}
        />
        <Card
          label="Hedging Words"
          value={stance?.hedge_total?.toLocaleString() ?? "-"}
          sublabel={
            typeof stance?.hedge_per_1k_tokens === "number"
              ? `${stance.hedge_per_1k_tokens.toFixed(1)} per 1k words`
              : "Word frequency"
          }
          rightSlot={renderExploreButton(() => onExplore(buildHedgeSpec()))}
          style={{ gridColumn: "span 3" }}
        />
        <Card
          label="Certainty Words"
          value={stance?.certainty_total?.toLocaleString() ?? "-"}
          sublabel={
            typeof stance?.certainty_per_1k_tokens === "number"
              ? `${stance.certainty_per_1k_tokens.toFixed(1)} per 1k words`
              : "Word frequency"
          }
          rightSlot={renderExploreButton(() => onExplore(buildCertaintySpec()))}
          style={{ gridColumn: "span 3" }}
        />
        <Card
          label="Need/Should Words"
          value={stance?.deontic_total?.toLocaleString() ?? "-"}
          sublabel={
            typeof stance?.deontic_per_1k_tokens === "number"
              ? `${stance.deontic_per_1k_tokens.toFixed(1)} per 1k words`
              : "Word frequency"
          }
          rightSlot={renderExploreButton(() => onExplore(buildDeonticSpec()))}
          style={{ gridColumn: "span 3" }}
        />
        <Card
          label="Permission Words"
          value={stance?.permission_total?.toLocaleString() ?? "-"}
          sublabel={
            typeof stance?.permission_per_1k_tokens === "number"
              ? `${stance.permission_per_1k_tokens.toFixed(1)} per 1k words`
              : "Word frequency"
          }
          rightSlot={renderExploreButton(() => onExplore(buildPermissionSpec()))}
          style={{ gridColumn: "span 3" }}
        />
        <div style={{ ...styles.card, gridColumn: "span 6" }}>
          <h2 style={styles.sectionTitle}>Mood in "Us" Posts</h2>
          <p style={styles.sectionSubtitle}>
            Most likely emotion when in-group wording is stronger.
          </p>
          <div style={styles.topUserName}>{topEmotion(identity?.in_group_emotion_avg)}</div>
          <div style={{ marginTop: 12 }}>
            <button
              onClick={() => onExplore(buildIdentityBucketSpec("in"))}
              style={styles.buttonSecondary}
            >
              Explore records
            </button>
          </div>
        </div>
        <div style={{ ...styles.card, gridColumn: "span 6" }}>
          <h2 style={styles.sectionTitle}>Mood in "Them" Posts</h2>
          <p style={styles.sectionSubtitle}>
            Most likely emotion when out-group wording is stronger.
          </p>
          <div style={styles.topUserName}>{topEmotion(identity?.out_group_emotion_avg)}</div>
          <div style={{ marginTop: 12 }}>
            <button
              onClick={() => onExplore(buildIdentityBucketSpec("out"))}
              style={styles.buttonSecondary}
            >
              Explore records
            </button>
          </div>
        </div>
        <div style={{ ...styles.card, gridColumn: "span 12" }}>
          <h2 style={styles.sectionTitle}>Entity Mood Snapshot</h2>
          <p style={styles.sectionSubtitle}>
            Most mentioned entities and the mood that appears most with each.
          </p>
          {!entities.length ? (
            <div style={styles.topUserMeta}>No entity-level cultural data available.</div>
          ) : (
            <div
              style={{
                ...styles.topUsersList,
                maxHeight: 420,
                overflowY: "auto",
              }}
            >
              {entities.map(([entity, aggregate]) => (
                <div
                  key={entity}
                  style={{ ...styles.topUserItem, cursor: "pointer" }}
                  onClick={() => onExplore(buildEntitySpec(entity))}
                >
                  <div style={styles.topUserName}>{entity}</div>
                  <div style={styles.topUserMeta}>
                    {aggregate.post_count.toLocaleString()} posts • Likely mood:{" "}
                    {topEmotion(aggregate.emotion_avg)}
                  </div>
                </div>
              ))}
            </div>
          )}
        </div>
      </div>
    </div>
  );
 };
 export default CulturalStats;
--- a/frontend/src/components/EmotionalStats.tsx
+++ b/frontend/src/components/EmotionalStats.tsx
@@ -1,14 +1,25 @@
-import type { ContentAnalysisResponse } from "../types/ApiTypes"
+import type { EmotionalAnalysisResponse } from "../types/ApiTypes";
 import StatsStyling from "../styles/stats_styling";
 import {
  buildDominantEmotionSpec,
  buildSourceSpec,
  buildTopicSpec,
  type CorpusExplorerSpec,
 } from "../utils/corpusExplorer";
 const styles = StatsStyling;
 type EmotionalStatsProps = {
-  contentData: ContentAnalysisResponse;
+  emotionalData: EmotionalAnalysisResponse;
-}
+  onExplore: (spec: CorpusExplorerSpec) => void;
 };
-const EmotionalStats = ({contentData}: EmotionalStatsProps) => {
+const EmotionalStats = ({ emotionalData, onExplore }: EmotionalStatsProps) => {
-  const rows = contentData.average_emotion_by_topic ?? [];
+  const rows = emotionalData.average_emotion_by_topic ?? [];
  const overallEmotionAverage = emotionalData.overall_emotion_average ?? [];
  const dominantEmotionDistribution =
    emotionalData.dominant_emotion_distribution ?? [];
  const emotionBySource = emotionalData.emotion_by_source ?? [];
  const lowSampleThreshold = 20;
  const stableSampleThreshold = 50;
  const emotionKeys = rows.length
@@ -31,7 +42,7 @@ const EmotionalStats = ({contentData}: EmotionalStatsProps) => {
      topic: String(row.topic),
      count: Number(row.n ?? 0),
      emotion: maxKey.replace("emotion_", "") || "unknown",
-      value: maxValue > Number.NEGATIVE_INFINITY ? maxValue : 0
+      value: maxValue > Number.NEGATIVE_INFINITY ? maxValue : 0,
    };
  });
@@ -45,8 +56,12 @@ const EmotionalStats = ({contentData}: EmotionalStatsProps) => {
    .filter((count) => Number.isFinite(count) && count > 0)
    .sort((a, b) => a - b);
-  const lowSampleTopics = strongestPerTopic.filter((topic) => topic.count < lowSampleThreshold).length;
+  const lowSampleTopics = strongestPerTopic.filter(
-  const stableSampleTopics = strongestPerTopic.filter((topic) => topic.count >= stableSampleThreshold).length;
+    (topic) => topic.count < lowSampleThreshold,
  ).length;
  const stableSampleTopics = strongestPerTopic.filter(
    (topic) => topic.count >= stableSampleThreshold,
  ).length;
  const medianSampleSize = sampleSizes.length
    ? sampleSizes[Math.floor(sampleSizes.length / 2)]
@@ -64,42 +79,184 @@ const EmotionalStats = ({contentData}: EmotionalStatsProps) => {
  return (
    <div style={styles.page}>
      <div style={{ ...styles.container, ...styles.card, marginTop: 16 }}>
-        <h2 style={styles.sectionTitle}>Average Emotion by Topic</h2>
+        <h2 style={styles.sectionTitle}>Topic Mood Overview</h2>
-        <p style={styles.sectionSubtitle}>Read confidence together with sample size. Topics with fewer than {lowSampleThreshold} events are usually noisy and less reliable.</p>
+        <p style={styles.sectionSubtitle}>
          Use the strength score together with post count. Topics with fewer
          than {lowSampleThreshold} events are often noisy.
        </p>
        <div style={styles.emotionalSummaryRow}>
-          <span><strong style={{ color: "#24292f" }}>Topics:</strong> {strongestPerTopic.length}</span>
+          <span>
-          <span><strong style={{ color: "#24292f" }}>Median Sample:</strong> {medianSampleSize} events</span>
+            <strong style={{ color: "#24292f" }}>Topics:</strong>{" "}
-          <span><strong style={{ color: "#24292f" }}>Low Sample (&lt;{lowSampleThreshold}):</strong> {lowSampleTopics}</span>
+            {strongestPerTopic.length}
-          <span><strong style={{ color: "#24292f" }}>Stable Sample ({stableSampleThreshold}+):</strong> {stableSampleTopics}</span>
+          </span>
          <span>
            <strong style={{ color: "#24292f" }}>Median Posts:</strong>{" "}
            {medianSampleSize}
          </span>
          <span>
            <strong style={{ color: "#24292f" }}>
              Small Topics (&lt;{lowSampleThreshold}):
            </strong>{" "}
            {lowSampleTopics}
          </span>
          <span>
            <strong style={{ color: "#24292f" }}>
              Stable Topics ({stableSampleThreshold}+):
            </strong>{" "}
            {stableSampleTopics}
          </span>
        </div>
-        <p style={{ ...styles.sectionSubtitle, marginTop: 10, marginBottom: 0 }}>
+        <p
-          Confidence reflects how strongly one emotion leads within a topic, not model accuracy. Use larger samples for stronger conclusions.
+          style={{ ...styles.sectionSubtitle, marginTop: 10, marginBottom: 0 }}
        >
          Strength means how far the top emotion is ahead in that topic. It does
          not mean model accuracy.
        </p>
      </div>
      <div style={{ ...styles.container, ...styles.grid }}>
-        {strongestPerTopic.map((topic) => (
+        <div style={{ ...styles.card, gridColumn: "span 4" }}>
-          <div key={topic.topic} style={{ ...styles.card, gridColumn: "span 4" }}>
+          <h2 style={styles.sectionTitle}>Mood Averages</h2>
-            <h3 style={{ ...styles.sectionTitle, marginBottom: 6 }}>{topic.topic}</h3>
+          <p style={styles.sectionSubtitle}>Average score for each emotion.</p>
-            <div style={styles.emotionalTopicLabel}>
+          {!overallEmotionAverage.length ? (
-              Top Emotion
+            <div style={styles.topUserMeta}>
              No overall emotion averages available.
            </div>
-            <div style={styles.emotionalTopicValue}>
+          ) : (
-              {formatEmotion(topic.emotion)}
+            <div
              style={{
                ...styles.topUsersList,
                maxHeight: 260,
                overflowY: "auto",
              }}
            >
              {[...overallEmotionAverage]
                .sort((a, b) => b.score - a.score)
                .map((row) => (
                  <div
                    key={row.emotion}
                    style={{ ...styles.topUserItem, cursor: "pointer" }}
                    onClick={() => onExplore(buildDominantEmotionSpec(row.emotion))}
                  >
                    <div style={styles.topUserName}>
                      {formatEmotion(row.emotion)}
                    </div>
                    <div style={styles.topUserMeta}>{row.score.toFixed(3)}</div>
                  </div>
                ))}
            </div>
-            <div style={styles.emotionalMetricRow}>
+          )}
-              <span>Confidence</span>
+        </div>
-              <span style={styles.emotionalMetricValue}>{topic.value.toFixed(3)}</span>
+
        <div style={{ ...styles.card, gridColumn: "span 4" }}>
          <h2 style={styles.sectionTitle}>Mood Split</h2>
          <p style={styles.sectionSubtitle}>
            How often each emotion is dominant.
          </p>
          {!dominantEmotionDistribution.length ? (
            <div style={styles.topUserMeta}>
              No dominant-emotion split available.
            </div>
-            <div style={styles.emotionalMetricRowCompact}>
+          ) : (
-              <span>Sample Size</span>
+            <div
-              <span style={styles.emotionalMetricValue}>{topic.count} events</span>
+              style={{
                ...styles.topUsersList,
                maxHeight: 260,
                overflowY: "auto",
              }}
            >
              {[...dominantEmotionDistribution]
                .sort((a, b) => b.ratio - a.ratio)
                .map((row) => (
                  <div
                    key={row.emotion}
                    style={{ ...styles.topUserItem, cursor: "pointer" }}
                    onClick={() => onExplore(buildDominantEmotionSpec(row.emotion))}
                  >
                    <div style={styles.topUserName}>
                      {formatEmotion(row.emotion)}
                    </div>
                    <div style={styles.topUserMeta}>
                      {(row.ratio * 100).toFixed(1)}% •{" "}
                      {row.count.toLocaleString()} events
                    </div>
                  </div>
                ))}
            </div>
          )}
        </div>
        <div style={{ ...styles.card, gridColumn: "span 4" }}>
          <h2 style={styles.sectionTitle}>Mood by Source</h2>
          <p style={styles.sectionSubtitle}>Leading emotion in each source.</p>
          {!emotionBySource.length ? (
            <div style={styles.topUserMeta}>
              No source emotion profile available.
            </div>
          ) : (
            <div
              style={{
                ...styles.topUsersList,
                maxHeight: 260,
                overflowY: "auto",
              }}
            >
              {[...emotionBySource]
                .sort((a, b) => b.event_count - a.event_count)
                .map((row) => (
                  <div
                    key={row.source}
                    style={{ ...styles.topUserItem, cursor: "pointer" }}
                    onClick={() => onExplore(buildSourceSpec(row.source))}
                  >
                    <div style={styles.topUserName}>{row.source}</div>
                    <div style={styles.topUserMeta}>
                      {formatEmotion(row.dominant_emotion)} •{" "}
                      {row.dominant_score.toFixed(3)} •{" "}
                      {row.event_count.toLocaleString()} events
                    </div>
                  </div>
                ))}
            </div>
          )}
        </div>
        <div style={{ ...styles.card, gridColumn: "span 12" }}>
          <h2 style={styles.sectionTitle}>Topic Snapshots</h2>
          <p style={styles.sectionSubtitle}>
            Per-topic mood with strength and post count.
          </p>
          <div style={{ ...styles.grid, marginTop: 10 }}>
            {strongestPerTopic.map((topic) => (
              <div
                key={topic.topic}
                style={{ ...styles.cardBase, gridColumn: "span 4", cursor: "pointer" }}
                onClick={() => onExplore(buildTopicSpec(topic.topic))}
              >
                <h3 style={{ ...styles.sectionTitle, marginBottom: 6 }}>
                  {topic.topic}
                </h3>
                <div style={styles.emotionalTopicLabel}>Likely Mood</div>
                <div style={styles.emotionalTopicValue}>
                  {formatEmotion(topic.emotion)}
                </div>
                <div style={styles.emotionalMetricRow}>
                  <span>Strength</span>
                  <span style={styles.emotionalMetricValue}>
                    {topic.value.toFixed(3)}
                  </span>
                </div>
                <div style={styles.emotionalMetricRowCompact}>
                  <span>Posts in Topic</span>
                  <span style={styles.emotionalMetricValue}>{topic.count}</span>
                </div>
              </div>
            ))}
          </div>
-        ))}
+        </div>
      </div>
    </div>
  );
-}
+};
 export default EmotionalStats;
--- a/frontend/src/components/InteractionalStats.tsx
+++ b/frontend/src/components/InteractionalStats.tsx
@@ -0,0 +1,262 @@
 import Card from "./Card";
 import StatsStyling from "../styles/stats_styling";
 import type { InteractionAnalysisResponse } from "../types/ApiTypes";
 import {
  ResponsiveContainer,
  BarChart,
  Bar,
  XAxis,
  YAxis,
  CartesianGrid,
  Tooltip,
  PieChart,
  Pie,
  Cell,
  Legend,
 } from "recharts";
 const styles = StatsStyling;
 type InteractionalStatsProps = {
  data: InteractionAnalysisResponse;
 };
 const InteractionalStats = ({ data }: InteractionalStatsProps) => {
  const graph = data.interaction_graph ?? {};
  const userCount = Object.keys(graph).length;
  let edgeCount = 0;
  let interactionVolume = 0;
  for (const targets of Object.values(graph)) {
    for (const value of Object.values(targets)) {
      edgeCount += 1;
      interactionVolume += value;
    }
  }
  const concentration = data.conversation_concentration;
  const topTenCommentShare =
    typeof concentration?.top_10pct_comment_share === "number"
      ? concentration.top_10pct_comment_share
      : null;
  const topTenAuthorCount =
    typeof concentration?.top_10pct_author_count === "number"
      ? concentration.top_10pct_author_count
      : null;
  const totalCommentingAuthors =
    typeof concentration?.total_commenting_authors === "number"
      ? concentration.total_commenting_authors
      : null;
  const singleCommentAuthorRatio =
    typeof concentration?.single_comment_author_ratio === "number"
      ? concentration.single_comment_author_ratio
      : null;
  const singleCommentAuthors =
    typeof concentration?.single_comment_authors === "number"
      ? concentration.single_comment_authors
      : null;
  const topPairs = (data.top_interaction_pairs ?? [])
    .filter((item): item is [[string, string], number] => {
      if (!Array.isArray(item) || item.length !== 2) {
        return false;
      }
      const pair = item[0];
      const count = item[1];
      return (
        Array.isArray(pair) &&
        pair.length === 2 &&
        typeof pair[0] === "string" &&
        typeof pair[1] === "string" &&
        typeof count === "number"
      );
    })
    .slice(0, 20);
  const topPairChartData = topPairs
    .slice(0, 8)
    .map(([[source, target], value], index) => ({
      pair: `${source} -> ${target}`,
      replies: value,
      rank: index + 1,
    }));
  const topTenSharePercent =
    topTenCommentShare === null ? null : topTenCommentShare * 100;
  const nonTopTenSharePercent =
    topTenSharePercent === null ? null : Math.max(0, 100 - topTenSharePercent);
  let concentrationPieData: { name: string; value: number }[] = [];
  if (topTenSharePercent !== null && nonTopTenSharePercent !== null) {
    concentrationPieData = [
      { name: "Top 10% authors", value: topTenSharePercent },
      { name: "Other authors", value: nonTopTenSharePercent },
    ];
  }
  const PIE_COLORS = ["#2b6777", "#c8d8e4"];
  return (
    <div style={styles.page}>
      <div style={{ ...styles.container, ...styles.grid }}>
        <div style={{ ...styles.card, gridColumn: "span 12" }}>
          <h2 style={styles.sectionTitle}>Conversation Overview</h2>
          <p style={styles.sectionSubtitle}>
            Who replies to whom, how often they interact, and how concentrated the replies are.
          </p>
        </div>
        <Card
          label="Users in Network"
          value={userCount.toLocaleString()}
          sublabel="Users in the reply graph"
          style={{ gridColumn: "span 4" }}
        />
        <Card
          label="User-to-User Links"
          value={edgeCount.toLocaleString()}
          sublabel="Unique reply directions"
          style={{ gridColumn: "span 4" }}
        />
        <Card
          label="Total Replies"
          value={interactionVolume.toLocaleString()}
          sublabel="All reply links combined"
          style={{ gridColumn: "span 4" }}
        />
        <Card
          label="Concentrated Replies"
          value={
            topTenSharePercent === null
              ? "-"
              : `${topTenSharePercent.toFixed(1)}%`
          }
          sublabel={
            topTenAuthorCount === null || totalCommentingAuthors === null
              ? "Reply share from the top 10% of commenters"
              : `${topTenAuthorCount.toLocaleString()} of ${totalCommentingAuthors.toLocaleString()} authors`
          }
          style={{ gridColumn: "span 6" }}
        />
        <Card
          label="Single-Comment Authors"
          value={
            singleCommentAuthorRatio === null
              ? "-"
              : `${(singleCommentAuthorRatio * 100).toFixed(1)}%`
          }
          sublabel={
            singleCommentAuthors === null
              ? "Authors who commented exactly once"
              : `${singleCommentAuthors.toLocaleString()} authors commented exactly once`
          }
          style={{ gridColumn: "span 6" }}
        />
        <div style={{ ...styles.card, gridColumn: "span 12" }}>
          <h2 style={styles.sectionTitle}>Conversation Visuals</h2>
          <p style={styles.sectionSubtitle}>
            Main reply links and concentration split.
          </p>
          <div style={{ ...styles.grid, marginTop: 12 }}>
            <div style={{ ...styles.cardBase, gridColumn: "span 6" }}>
              <h3 style={{ ...styles.sectionTitle, fontSize: "1rem" }}>
                Top Interaction Pairs
              </h3>
              <div style={{ width: "100%", height: 300 }}>
                <ResponsiveContainer>
                  <BarChart
                    data={topPairChartData}
                    layout="vertical"
                    margin={{ top: 8, right: 16, left: 16, bottom: 8 }}
                  >
                    <CartesianGrid strokeDasharray="3 3" stroke="#d9e2ec" />
                    <XAxis type="number" allowDecimals={false} />
                    <YAxis
                      type="category"
                      dataKey="rank"
                      tickFormatter={(value) => `#${value}`}
                      width={36}
                    />
                    <Tooltip />
                    <Bar
                      dataKey="replies"
                      fill="#2b6777"
                      radius={[0, 6, 6, 0]}
                    />
                  </BarChart>
                </ResponsiveContainer>
              </div>
            </div>
            <div style={{ ...styles.cardBase, gridColumn: "span 6" }}>
              <h3 style={{ ...styles.sectionTitle, fontSize: "1rem" }}>
                Top 10% vs Other Comment Share
              </h3>
              <div style={{ width: "100%", height: 300 }}>
                <ResponsiveContainer>
                  <PieChart>
                    <Pie
                      data={concentrationPieData}
                      dataKey="value"
                      nameKey="name"
                      innerRadius={56}
                      outerRadius={88}
                      paddingAngle={2}
                    >
                      {concentrationPieData.map((entry, index) => (
                        <Cell
                          key={`${entry.name}-${index}`}
                          fill={PIE_COLORS[index % PIE_COLORS.length]}
                        />
                      ))}
                    </Pie>
                    <Tooltip />
                    <Legend verticalAlign="bottom" height={36} />
                  </PieChart>
                </ResponsiveContainer>
              </div>
            </div>
          </div>
        </div>
        <div style={{ ...styles.card, gridColumn: "span 12" }}>
          <h2 style={styles.sectionTitle}>Frequent Reply Paths</h2>
          <p style={styles.sectionSubtitle}>
            Most common user-to-user reply paths.
          </p>
          {!topPairs.length ? (
            <div style={styles.topUserMeta}>
              No interaction pair data available.
            </div>
          ) : (
            <div
              style={{
                ...styles.topUsersList,
                maxHeight: 420,
                overflowY: "auto",
              }}
            >
              {topPairs.map(([[source, target], value], index) => (
                <div
                  key={`${source}->${target}-${index}`}
                  style={styles.topUserItem}
                >
                  <div style={styles.topUserName}>
                    {source} -&gt; {target}
                  </div>
                  <div style={styles.topUserMeta}>
                    {value.toLocaleString()} replies
                  </div>
                </div>
              ))}
            </div>
          )}
        </div>
      </div>
    </div>
  );
 };
 export default InteractionalStats;
--- a/frontend/src/components/LinguisticStats.tsx
+++ b/frontend/src/components/LinguisticStats.tsx
@@ -0,0 +1,137 @@
 import Card from "./Card";
 import StatsStyling from "../styles/stats_styling";
 import type { LinguisticAnalysisResponse } from "../types/ApiTypes";
 import {
  buildNgramSpec,
  buildWordSpec,
  type CorpusExplorerSpec,
 } from "../utils/corpusExplorer";
 const styles = StatsStyling;
 type LinguisticStatsProps = {
  data: LinguisticAnalysisResponse;
  onExplore: (spec: CorpusExplorerSpec) => void;
 };
 const LinguisticStats = ({ data, onExplore }: LinguisticStatsProps) => {
  const lexical = data.lexical_diversity;
  const words = data.word_frequencies ?? [];
  const bigrams = data.common_two_phrases ?? [];
  const trigrams = data.common_three_phrases ?? [];
  const topWords = words.slice(0, 20);
  const topBigrams = bigrams.slice(0, 10);
  const topTrigrams = trigrams.slice(0, 10);
  return (
    <div style={styles.page}>
      <div style={{ ...styles.container, ...styles.grid }}>
        <div style={{ ...styles.card, gridColumn: "span 12" }}>
          <h2 style={styles.sectionTitle}>Language Overview</h2>
          <p style={styles.sectionSubtitle}>
            Quick read on how broad and repetitive the wording is.
          </p>
        </div>
        <Card
          label="Total Words"
          value={lexical?.total_tokens?.toLocaleString() ?? "—"}
          sublabel="Words after basic filtering"
          style={{ gridColumn: "span 4" }}
        />
        <Card
          label="Unique Words"
          value={lexical?.unique_tokens?.toLocaleString() ?? "—"}
          sublabel="Different words used"
          style={{ gridColumn: "span 4" }}
        />
        <Card
          label="Vocabulary Variety"
          value={
            typeof lexical?.ttr === "number" ? lexical.ttr.toFixed(4) : "—"
          }
          sublabel="Higher means less repetition"
          style={{ gridColumn: "span 4" }}
        />
        <div style={{ ...styles.card, gridColumn: "span 4" }}>
          <h2 style={styles.sectionTitle}>Top Words</h2>
          <p style={styles.sectionSubtitle}>Most used single words.</p>
          <div
            style={{
              ...styles.topUsersList,
              maxHeight: 360,
              overflowY: "auto",
            }}
          >
            {topWords.map((item) => (
              <div
                key={item.word}
                style={{ ...styles.topUserItem, cursor: "pointer" }}
                onClick={() => onExplore(buildWordSpec(item.word))}
              >
                <div style={styles.topUserName}>{item.word}</div>
                <div style={styles.topUserMeta}>
                  {item.count.toLocaleString()} uses
                </div>
              </div>
            ))}
          </div>
        </div>
        <div style={{ ...styles.card, gridColumn: "span 4" }}>
          <h2 style={styles.sectionTitle}>Top Bigrams</h2>
          <p style={styles.sectionSubtitle}>Most used 2-word phrases.</p>
          <div
            style={{
              ...styles.topUsersList,
              maxHeight: 360,
              overflowY: "auto",
            }}
          >
            {topBigrams.map((item) => (
              <div
                key={item.ngram}
                style={{ ...styles.topUserItem, cursor: "pointer" }}
                onClick={() => onExplore(buildNgramSpec(item.ngram))}
              >
                <div style={styles.topUserName}>{item.ngram}</div>
                <div style={styles.topUserMeta}>
                  {item.count.toLocaleString()} uses
                </div>
              </div>
            ))}
          </div>
        </div>
        <div style={{ ...styles.card, gridColumn: "span 4" }}>
          <h2 style={styles.sectionTitle}>Top Trigrams</h2>
          <p style={styles.sectionSubtitle}>Most used 3-word phrases.</p>
          <div
            style={{
              ...styles.topUsersList,
              maxHeight: 360,
              overflowY: "auto",
            }}
          >
            {topTrigrams.map((item) => (
              <div
                key={item.ngram}
                style={{ ...styles.topUserItem, cursor: "pointer" }}
                onClick={() => onExplore(buildNgramSpec(item.ngram))}
              >
                <div style={styles.topUserName}>{item.ngram}</div>
                <div style={styles.topUserMeta}>
                  {item.count.toLocaleString()} uses
                </div>
              </div>
            ))}
          </div>
        </div>
      </div>
    </div>
  );
 };
 export default LinguisticStats;
--- a/frontend/src/components/SummaryStats.tsx
+++ b/frontend/src/components/SummaryStats.tsx
@@ -1,4 +1,4 @@
-import { useState } from "react";
+import { memo, useMemo } from "react";
 import {
  LineChart,
  Line,
@@ -6,32 +6,55 @@ import {
  YAxis,
  Tooltip,
  CartesianGrid,
-  ResponsiveContainer
+  ResponsiveContainer,
 } from "recharts";
 import ActivityHeatmap from "../stats/ActivityHeatmap";
-import { ReactWordcloud } from '@cp949/react-wordcloud';
+import { ReactWordcloud } from "@cp949/react-wordcloud";
 import StatsStyling from "../styles/stats_styling";
 import Card from "../components/Card";
 import UserModal from "../components/UserModal";
 import {
  type SummaryResponse,
  type FrequencyWord,
-  type UserAnalysisResponse, 
+  type UserEndpointResponse,
  type TimeAnalysisResponse,
-  type ContentAnalysisResponse,
+  type LinguisticAnalysisResponse,
-  type User
+} from "../types/ApiTypes";
-} from '../types/ApiTypes'
+import {
  buildAllRecordsSpec,
  buildDateBucketSpec,
  buildOneTimeUsersSpec,
  buildUserSpec,
  type CorpusExplorerSpec,
 } from "../utils/corpusExplorer";
 const styles = StatsStyling;
 const MAX_WORDCLOUD_WORDS = 250;
 const exploreButtonStyle = { padding: "4px 8px", fontSize: 12 };
 const WORDCLOUD_OPTIONS = {
  rotations: 2,
  rotationAngles: [0, 90] as [number, number],
  fontSizes: [14, 60] as [number, number],
  enableTooltip: true,
 };
 type SummaryStatsProps = {
-    userData: UserAnalysisResponse | null;
+  userData: UserEndpointResponse | null;
-    timeData: TimeAnalysisResponse | null;
+  timeData: TimeAnalysisResponse | null;
-    contentData: ContentAnalysisResponse | null;
+  linguisticData: LinguisticAnalysisResponse | null;
-    summary: SummaryResponse | null;
+  summary: SummaryResponse | null;
-}
+  onExplore: (spec: CorpusExplorerSpec) => void;
 };
 type WordCloudPanelProps = {
  words: { text: string; value: number }[];
 };
 const WordCloudPanel = memo(({ words }: WordCloudPanelProps) => (
  <ReactWordcloud words={words} options={WORDCLOUD_OPTIONS} />
 ));
 function formatDateRange(startUnix: number, endUnix: number) {
  const start = new Date(startUnix * 1000);
@@ -44,174 +67,188 @@ function formatDateRange(startUnix: number, endUnix: number) {
      day: "2-digit",
    });
-  return `${fmt(start)} → ${fmt(end)}`;
+  return `${fmt(start)} -> ${fmt(end)}`;
 }
 function convertFrequencyData(data: FrequencyWord[]) {
-    return data.map((d: FrequencyWord) => ({
+  return data.map((d: FrequencyWord) => ({
-        text: d.word,
+    text: d.word,
-        value: d.count,
+    value: d.count,
-      }))
+  }));
 }
-const SummaryStats = ({userData, timeData, contentData, summary}: SummaryStatsProps) => {
+const renderExploreButton = (onClick: () => void) => (
-    const [selectedUser, setSelectedUser] = useState<string | null>(null);
+  <button
-    const selectedUserData: User | null = userData?.users.find((u) => u.author === selectedUser) ?? null;
+    onClick={onClick}
    style={{ ...styles.buttonSecondary, ...exploreButtonStyle }}
  >
    Explore
  </button>
 );
-    console.log(summary)
+const SummaryStats = ({
  userData,
  timeData,
  linguisticData,
  summary,
  onExplore,
 }: SummaryStatsProps) => {
  const wordCloudWords = useMemo(
    () =>
      convertFrequencyData(
        (linguisticData?.word_frequencies ?? []).slice(0, MAX_WORDCLOUD_WORDS),
      ),
    [linguisticData?.word_frequencies],
  );
-    return (
+  const topUsersPreview = useMemo(
    () => (userData?.top_users ?? []).slice(0, 100),
    [userData?.top_users],
  );
  return (
    <div style={styles.page}>
      <div style={{ ...styles.container, ...styles.grid }}>
        <Card
          label="Total Activity"
          value={summary?.total_events ?? "-"}
          sublabel="Posts + comments"
          rightSlot={renderExploreButton(() => onExplore(buildAllRecordsSpec()))}
          style={{ gridColumn: "span 4" }}
        />
        <Card
          label="Active People"
          value={summary?.unique_users ?? "-"}
          sublabel="Distinct users"
          rightSlot={renderExploreButton(() => onExplore(buildAllRecordsSpec()))}
          style={{ gridColumn: "span 4" }}
        />
        <Card
          label="Posts vs Comments"
          value={
            summary ? `${summary.total_posts} / ${summary.total_comments}` : "-"
          }
          sublabel={`Comments per post: ${summary?.comments_per_post ?? "-"}`}
          rightSlot={renderExploreButton(() => onExplore(buildAllRecordsSpec()))}
          style={{ gridColumn: "span 4" }}
        />
-        {/* main grid*/}
+        <Card
-        <div style={{ ...styles.container, ...styles.grid}}>
+          label="Time Range"
-            <Card
+          value={
-            label="Total Events"
+            summary?.time_range
-            value={summary?.total_events ?? "—"}
+              ? formatDateRange(summary.time_range.start, summary.time_range.end)
-            sublabel="Posts + comments"
+              : "-"
-            style={{
+          }
-                gridColumn: "span 4"
+          sublabel="Based on dataset timestamps"
-            }}
+          rightSlot={renderExploreButton(() => onExplore(buildAllRecordsSpec()))}
-            />
+          style={{ gridColumn: "span 4" }}
-            <Card
+        />
-            label="Unique Users"
-            value={summary?.unique_users ?? "—"}
-            sublabel="Distinct authors"
-            style={{
-                gridColumn: "span 4"
-            }}
-            />
-            <Card
-            label="Posts / Comments"
-            value={
-                summary
-                ? `${summary.total_posts} / ${summary.total_comments}`
-                : "—"
-            }
-            sublabel={`Comments per post: ${summary?.comments_per_post ?? "—"}`}
-            style={{
-                gridColumn: "span 4"
-            }}
-            />
-            <Card
+        <Card
-            label="Time Range"
+          label="One-Time Users"
-            value={
+          value={
-                summary?.time_range
+            typeof summary?.lurker_ratio === "number"
-                ? formatDateRange(summary.time_range.start, summary.time_range.end)
+              ? `${Math.round(summary.lurker_ratio * 100)}%`
-                : "—"
+              : "-"
-            }
+          }
-            sublabel="Based on dataset timestamps"
+          sublabel="Users with only one event"
-            style={{
+          rightSlot={renderExploreButton(() => onExplore(buildOneTimeUsersSpec()))}
-                gridColumn: "span 4"
+          style={{ gridColumn: "span 4" }}
-            }}
+        />
-            />
-            <Card
+        <Card
-            label="Lurker Ratio"
+          label="Sources"
-            value={
+          value={summary?.sources?.length ?? "-"}
-                typeof summary?.lurker_ratio === "number"
+          sublabel={
-                ? `${Math.round(summary.lurker_ratio * 100)}%`
+            summary?.sources?.length
-                : "—"
+              ? summary.sources.slice(0, 3).join(", ") +
-            }
+                (summary.sources.length > 3 ? "..." : "")
-            sublabel="Users with only 1 event"
+              : "-"
-            style={{
+          }
-                gridColumn: "span 4"
+          rightSlot={renderExploreButton(() => onExplore(buildAllRecordsSpec()))}
-            }}
+          style={{ gridColumn: "span 4" }}
-            />
+        />
-            <Card
-            label="Sources"
-            value={summary?.sources?.length ?? "—"}
-            sublabel={
-                summary?.sources?.length
-                ? summary.sources.slice(0, 3).join(", ") +
-                    (summary.sources.length > 3 ? "…" : "")
-                : "—"
-            }
-            style={{
-                gridColumn: "span 4"
-            }}
-            />
        {/* events per day */}
        <div style={{ ...styles.card, gridColumn: "span 5" }}>
-            <h2 style={styles.sectionTitle}>Events per Day</h2>
+          <h2 style={styles.sectionTitle}>Activity Over Time</h2>
-            <p style={styles.sectionSubtitle}>Trend of activity over time</p>
+          <p style={styles.sectionSubtitle}>How much posting happened each day.</p>
-            <div style={styles.chartWrapper}>
+          <div style={styles.chartWrapper}>
            <ResponsiveContainer width="100%" height="100%">
-                <LineChart data={timeData?.events_per_day.filter((d) => new Date(d.date) >= new Date('2026-01-10'))}>
+              <LineChart
                data={timeData?.events_per_day ?? []}
                onClick={(state: unknown) => {
                  const payload = (state as { activePayload?: Array<{ payload?: { date?: string } }> })
                    ?.activePayload?.[0]?.payload as
                    | { date?: string }
                    | undefined;
                  if (payload?.date) {
                    onExplore(buildDateBucketSpec(String(payload.date)));
                  }
                }}
              >
                <CartesianGrid strokeDasharray="3 3" />
                <XAxis dataKey="date" />
                <YAxis />
                <Tooltip />
-                <Line type="monotone" dataKey="count" name="Events" />
+                <Line
-                </LineChart>
+                  type="monotone"
                  dataKey="count"
                  name="Events"
                  isAnimationActive={false}
                />
              </LineChart>
            </ResponsiveContainer>
-            </div>
+          </div>
        </div>
        {/* Word Cloud */}
        <div style={{ ...styles.card, gridColumn: "span 4" }}>
-            <h2 style={styles.sectionTitle}>Word Cloud</h2>
+          <h2 style={styles.sectionTitle}>Common Words</h2>
-            <p style={styles.sectionSubtitle}>Most common terms across events</p>
+          <p style={styles.sectionSubtitle}>
            Frequently used words across the dataset.
          </p>
-            <div style={styles.chartWrapper}>
+          <div style={styles.chartWrapper}>
-            <ReactWordcloud
+            <WordCloudPanel words={wordCloudWords} />
-                words={convertFrequencyData(contentData?.word_frequencies ?? [])}
+          </div>
-                options={{
-                rotations: 2,
-                rotationAngles: [0, 90],
-                fontSizes: [14, 60],
-                enableTooltip: true,
-                }}
-            />
-            </div>
        </div>
-        {/* Top Users */}
+        <div
-        <div style={{...styles.card, ...styles.scrollArea, gridColumn: "span 3",
+          style={{ ...styles.card, ...styles.scrollArea, gridColumn: "span 3" }}
-        }}
        >
-            <h2 style={styles.sectionTitle}>Top Users</h2>
+          <h2 style={styles.sectionTitle}>Most Active Users</h2>
-            <p style={styles.sectionSubtitle}>Most active authors</p>
+          <p style={styles.sectionSubtitle}>Who posted the most events.</p>
-            <div style={styles.topUsersList}>
+          <div style={styles.topUsersList}>
-            {userData?.top_users.slice(0, 100).map((item) => (
+            {topUsersPreview.map((item) => (
-                <div
+              <div
                key={`${item.author}-${item.source}`}
                style={{ ...styles.topUserItem, cursor: "pointer" }}
-                onClick={() => setSelectedUser(item.author)}
+                onClick={() => onExplore(buildUserSpec(item.author))}
-                >
+              >
                <div style={styles.topUserName}>{item.author}</div>
                <div style={styles.topUserMeta}>
-                    {item.source} • {item.count} events
+                  {item.source} • {item.count} events
                </div>
-                </div>
              </div>
            ))}
-            </div>
+          </div>
        </div>
        {/* Heatmap */}
        <div style={{ ...styles.card, gridColumn: "span 12" }}>
-            <h2 style={styles.sectionTitle}>Heatmap</h2>
+          <h2 style={styles.sectionTitle}>Weekly Activity Pattern</h2>
-            <p style={styles.sectionSubtitle}>Activity density across time</p>
+          <p style={styles.sectionSubtitle}>
            When activity tends to happen by weekday and hour.
          </p>
-            <div style={styles.heatmapWrapper}>
+          <div style={styles.heatmapWrapper}>
            <ActivityHeatmap data={timeData?.weekday_hour_heatmap ?? []} />
-            </div>
+          </div>
        </div>
-        </div>
+      </div>
-        <UserModal
-        open={!!selectedUser}
-        onClose={() => setSelectedUser(null)}
-        username={selectedUser ?? ""}
-        userData={selectedUserData}
-        />
    </div>
-    );
+  );
-}
+};
 export default SummaryStats;
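Note on the word cloud change above: the new code caps the input at `MAX_WORDCLOUD_WORDS` before mapping it to the `{ text, value }` shape the word cloud expects, and memoises the result. A minimal standalone sketch of that preparation step follows. The `FrequencyWord` shape is assumed from how the diff uses it, `toWordCloudWords` is a hypothetical helper name, and the sketch assumes the backend already returns `word_frequencies` sorted by descending count, which the diff does not confirm.

```typescript
// Assumed shape of FrequencyWord, inferred from `d.word` / `d.count` in the diff.
type FrequencyWord = { word: string; count: number };

// Same cap as the constant introduced in SummaryStats.tsx.
const MAX_WORDCLOUD_WORDS = 250;

// Hypothetical helper: keep the first `limit` entries, then rename the
// fields to the { text, value } pairs consumed by the word cloud.
// Taking a prefix only yields the "top" words if the input is pre-sorted.
function toWordCloudWords(
  data: FrequencyWord[],
  limit: number = MAX_WORDCLOUD_WORDS,
): { text: string; value: number }[] {
  return data.slice(0, limit).map((d) => ({ text: d.word, value: d.count }));
}

const sampleWords: FrequencyWord[] = [
  { word: "cork", count: 120 },
  { word: "city", count: 80 },
  { word: "bus", count: 45 },
];

const cloudWords = toWordCloudWords(sampleWords, 2);
```

Slicing before mapping keeps the work proportional to the cap, not to the full frequency list, which is presumably why the component slices inside `useMemo`.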
--- a/frontend/src/components/UserModal.tsx
+++ b/frontend/src/components/UserModal.tsx
@@ -11,7 +11,16 @@ type Props = {
  username: string;
 };
-export default function UserModal({ open, onClose, userData, username }: Props) {
+export default function UserModal({
  open,
  onClose,
  userData,
  username,
 }: Props) {
  const dominantEmotionEntry = Object.entries(
    userData?.avg_emotions ?? {},
  ).sort((a, b) => b[1] - a[1])[0];
  return (
    <Dialog open={open} onClose={onClose} style={styles.modalRoot}>
      <div style={styles.modalBackdrop} />
@@ -33,7 +42,9 @@ export default function UserModal({ open, onClose, userData, username }: Props)
            <p style={styles.sectionSubtitle}>No data for this user.</p>
          ) : (
            <div style={styles.topUsersList}>
-              <div style={{...styles.topUserName, fontSize: 20}}>{userData.author}</div>
+              <div style={{ ...styles.topUserName, fontSize: 20 }}>
                {userData.author}
              </div>
              <div style={styles.topUserItem}>
                <div style={styles.topUserName}>Posts</div>
                <div style={styles.topUserMeta}>{userData.post}</div>
@@ -62,7 +73,27 @@ export default function UserModal({ open, onClose, userData, username }: Props)
                <div style={styles.topUserItem}>
                  <div style={styles.topUserName}>Vocab Richness</div>
                  <div style={styles.topUserMeta}>
-                    {userData.vocab.vocab_richness} (avg {userData.vocab.avg_words_per_event} words/event)
+                    {userData.vocab.vocab_richness} (avg{" "}
                    {userData.vocab.avg_words_per_event} words/event)
                  </div>
                </div>
              ) : null}
              {dominantEmotionEntry ? (
                <div style={styles.topUserItem}>
                  <div style={styles.topUserName}>Dominant Avg Emotion</div>
                  <div style={styles.topUserMeta}>
                    {dominantEmotionEntry[0].replace("emotion_", "")} (
                    {dominantEmotionEntry[1].toFixed(3)})
                  </div>
                </div>
              ) : null}
              {userData.dominant_topic ? (
                <div style={styles.topUserItem}>
                  <div style={styles.topUserName}>Most Common Topic</div>
                  <div style={styles.topUserMeta}>
                    {userData.dominant_topic.topic} ({userData.dominant_topic.count} events)
                  </div>
                </div>
              ) : null}
--- a/frontend/src/components/UserStats.tsx
+++ b/frontend/src/components/UserStats.tsx
@@ -1,49 +1,64 @@
 import { useEffect, useMemo, useRef, useState } from "react";
 import ForceGraph3D from "react-force-graph-3d";
-import {
+import { type TopUser, type InteractionGraph } from "../types/ApiTypes";
-    type UserAnalysisResponse,
-    type InteractionGraph
-} from '../types/ApiTypes';
 import StatsStyling from "../styles/stats_styling";
 import Card from "./Card";
 import {
  buildReplyPairSpec,
  toText,
  buildUserSpec,
  type CorpusExplorerSpec,
 } from "../utils/corpusExplorer";
 const styles = StatsStyling;
 type GraphLink = {
-    source: string;
+  source: string;
-    target: string;
+  target: string;
-    value: number;
+  value: number;
 };
-function ApiToGraphData(apiData: InteractionGraph) {
+function toGraphData(apiData: InteractionGraph) {
-    const nodes = Object.keys(apiData).map(username => ({ id: username }));
+  const links: GraphLink[] = [];
-    const links: GraphLink[] = [];
+  const connectedNodeIds = new Set<string>();
-    for (const [source, targets] of Object.entries(apiData)) {
+  for (const [source, targets] of Object.entries(apiData)) {
-        for (const [target, count] of Object.entries(targets)) {
+    for (const [target, count] of Object.entries(targets)) {
-            links.push({ source, target, value: count });
+      if (count < 2 || source === "[deleted]" || target === "[deleted]") {
-        }
+        continue;
      }
      links.push({ source, target, value: count });
      connectedNodeIds.add(source);
      connectedNodeIds.add(target);
    }
  }
-    // drop low-value and deleted interactions to reduce clutter
+  const filteredNodes = Array.from(connectedNodeIds, (id) => ({ id }));
-    const filteredLinks = links.filter(link =>
-        link.value >= 2 &&
-        link.source !== "[deleted]" &&
-        link.target !== "[deleted]"
-    );
-    // also filter out nodes that are no longer connected after link filtering
+  return { nodes: filteredNodes, links };
-    const connectedNodeIds = new Set(filteredLinks.flatMap(link => [link.source, link.target]));
-    const filteredNodes = nodes.filter(node => connectedNodeIds.has(node.id));
-    return { nodes: filteredNodes, links: filteredLinks};
 }
 type UserStatsProps = {
  topUsers: TopUser[];
  interactionGraph: InteractionGraph;
  totalUsers: number;
  mostCommentHeavyUser: { author: string; commentShare: number } | null;
  onExplore: (spec: CorpusExplorerSpec) => void;
 };
-const UserStats = (props: { data: UserAnalysisResponse }) => {
+const UserStats = ({
-  const graphData = useMemo(() => ApiToGraphData(props.data.interaction_graph), [props.data.interaction_graph]);
+  topUsers,
  interactionGraph,
  totalUsers,
  mostCommentHeavyUser,
  onExplore,
 }: UserStatsProps) => {
  const graphData = useMemo(
    () => toGraphData(interactionGraph),
    [interactionGraph],
  );
  const graphContainerRef = useRef<HTMLDivElement | null>(null);
  const [graphSize, setGraphSize] = useState({ width: 720, height: 540 });
@@ -61,88 +76,155 @@ const UserStats = (props: { data: UserAnalysisResponse }) => {
    return () => window.removeEventListener("resize", updateGraphSize);
  }, []);
-  const totalUsers = props.data.users.length;
  const connectedUsers = graphData.nodes.length;
-  const totalInteractions = graphData.links.reduce((sum, link) => sum + link.value, 0);
+  const totalInteractions = graphData.links.reduce(
-  const avgInteractionsPerConnectedUser = connectedUsers ? totalInteractions / connectedUsers : 0;
+    (sum, link) => sum + link.value,
    0,
  );
  const avgInteractionsPerConnectedUser = connectedUsers
    ? totalInteractions / connectedUsers
    : 0;
-  const strongestLink = graphData.links.reduce<GraphLink | null>((best, current) => {
+  const strongestLink = graphData.links.reduce<GraphLink | null>(
-    if (!best || current.value > best.value) {
+    (best, current) => {
-      return current;
+      if (!best || current.value > best.value) {
-    }
+        return current;
-    return best;
+      }
-  }, null);
+      return best;
    },
    null,
  );
-  const highlyInteractiveUser = [...props.data.users].sort((a, b) => b.comment_share - a.comment_share)[0];
+  const mostActiveUser = topUsers.find((u) => u.author !== "[deleted]");
-
+  const strongestLinkSource = strongestLink ? toText(strongestLink.source) : "";
-  const mostActiveUser = props.data.top_users.find(u => u.author !== "[deleted]");
+  const strongestLinkTarget = strongestLink ? toText(strongestLink.target) : "";
  return (
    <div style={styles.page}>
-        <div style={{ ...styles.container, ...styles.grid }}>
+      <div style={{ ...styles.container, ...styles.grid }}>
-          <Card
+        <Card
-            label="Users"
+          label="Users"
-            value={totalUsers.toLocaleString()}
+          value={totalUsers.toLocaleString()}
-            sublabel={`${connectedUsers.toLocaleString()} users in filtered graph`}
+          sublabel={`${connectedUsers.toLocaleString()} users in filtered graph`}
-            style={{ gridColumn: "span 3" }}
+          style={{ gridColumn: "span 3" }}
-          />
+        />
-          <Card
+        <Card
-            label="Interactions"
+          label="Replies"
-            value={totalInteractions.toLocaleString()}
+          value={totalInteractions.toLocaleString()}
-            sublabel="Filtered links (2+ interactions)"
+          sublabel="Links with at least 2 replies"
-            style={{ gridColumn: "span 3" }}
+          style={{ gridColumn: "span 3" }}
-          />
+        />
-          <Card
+        <Card
-            label="Average Intensity"
+          label="Replies per Connected User"
-            value={avgInteractionsPerConnectedUser.toFixed(1)}
+          value={avgInteractionsPerConnectedUser.toFixed(1)}
-            sublabel="Interactions per connected user"
+          sublabel="Average from visible graph links"
-            style={{ gridColumn: "span 3" }}
+          style={{ gridColumn: "span 3" }}
-          />
+        />
-          <Card
+        <Card
-            label="Most Active User"
+          label="Most Active User"
-            value={mostActiveUser?.author ?? "—"}
+          value={mostActiveUser?.author ?? "-"}
-            sublabel={mostActiveUser ? `${mostActiveUser.count.toLocaleString()} events` : "No user activity found"}
+          sublabel={
-            style={{ gridColumn: "span 3" }}
+            mostActiveUser
-          />
+              ? `${mostActiveUser.count.toLocaleString()} events`
              : "No user activity found"
          }
          rightSlot={
            mostActiveUser ? (
              <button
                onClick={() => onExplore(buildUserSpec(mostActiveUser.author))}
                style={styles.buttonSecondary}
              >
                Explore
              </button>
            ) : null
          }
          style={{ gridColumn: "span 3" }}
        />
-          <Card
+        <Card
-            label="Strongest Connection"
+          label="Strongest User Link"
-            value={strongestLink ? `${strongestLink.source} -> ${strongestLink.target}` : "—"}
+          value={
-            sublabel={strongestLink ? `${strongestLink.value.toLocaleString()} interactions` : "No graph edges after filtering"}
+            strongestLinkSource && strongestLinkTarget
-            style={{ gridColumn: "span 6" }}
+              ? `${strongestLinkSource} -> ${strongestLinkTarget}`
-          />
+              : "-"
-          <Card
+          }
-            label="Most Reply-Driven User"
+          sublabel={
-            value={highlyInteractiveUser?.author ?? "—"}
+            strongestLink
-            sublabel={
+              ? `${strongestLink.value.toLocaleString()} replies`
-              highlyInteractiveUser
+              : "No graph links after filtering"
-                ? `${Math.round(highlyInteractiveUser.comment_share * 100)}% comments`
+          }
-                : "No user distribution available"
+          rightSlot={
-            }
+            strongestLinkSource && strongestLinkTarget ? (
-            style={{ gridColumn: "span 6" }}
+              <button
-          />
+                onClick={() =>
                  onExplore(buildReplyPairSpec(strongestLinkSource, strongestLinkTarget))
                }
                style={styles.buttonSecondary}
              >
                Explore
              </button>
            ) : null
          }
          style={{ gridColumn: "span 6" }}
        />
        <Card
          label="Most Comment-Heavy User"
          value={mostCommentHeavyUser?.author ?? "-"}
          sublabel={
            mostCommentHeavyUser
              ? `${Math.round(mostCommentHeavyUser.commentShare * 100)}% comments`
              : "No user distribution available"
          }
          rightSlot={
            mostCommentHeavyUser ? (
              <button
                onClick={() => onExplore(buildUserSpec(mostCommentHeavyUser.author))}
                style={styles.buttonSecondary}
              >
                Explore
              </button>
            ) : null
          }
          style={{ gridColumn: "span 6" }}
        />
-          <div style={{ ...styles.card, gridColumn: "span 12" }}>
+        <div style={{ ...styles.card, gridColumn: "span 12" }}>
-            <h2 style={styles.sectionTitle}>User Interaction Graph</h2>
+          <h2 style={styles.sectionTitle}>User Interaction Graph</h2>
-            <p style={styles.sectionSubtitle}>
+          <p style={styles.sectionSubtitle}>
-              Nodes represent users and links represent conversation interactions.
+            Each node is a user, and each link shows replies between them.
-            </p>
+          </p>
-            <div ref={graphContainerRef} style={{ width: "100%", height: graphSize.height }}>
+          <div
-              <ForceGraph3D
+            ref={graphContainerRef}
-                width={graphSize.width}
+            style={{ width: "100%", height: graphSize.height }}
-                height={graphSize.height}
+          >
-                graphData={graphData}
+            <ForceGraph3D
-                nodeAutoColorBy="id"
+              width={graphSize.width}
-                linkDirectionalParticles={1}
+              height={graphSize.height}
-                linkDirectionalParticleSpeed={0.004}
+              graphData={graphData}
-                linkWidth={(link) => Math.sqrt(Number(link.value))}
+              nodeAutoColorBy="id"
-                nodeLabel={(node) => `${node.id}`}
+              linkDirectionalParticles={1}
-              />
+              linkDirectionalParticleSpeed={0.004}
-            </div>
+              linkWidth={(link) => Math.sqrt(Number(link.value))}
              nodeLabel={(node) => `${node.id}`}
              onNodeClick={(node) => {
                const userId = toText(node.id);
                if (userId) {
                  onExplore(buildUserSpec(userId));
                }
              }}
              onLinkClick={(link) => {
                const source = toText(link.source);
                const target = toText(link.target);
                if (source && target) {
                  onExplore(buildReplyPairSpec(source, target));
                }
              }}
            />
          </div>
        </div>
      </div>
    </div>
  );
-}
+};
 export default UserStats;
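Note on the `toGraphData` rewrite above: the old version built every node and link and then filtered twice, while the new one filters in a single pass and derives nodes only from the links it keeps. The sketch below reproduces that single-pass logic outside React so it can be checked in isolation. The `InteractionGraph` alias is an assumed stand-in for the type imported from `../types/ApiTypes` (author to reply target to reply count), and the sample data is invented for illustration.

```typescript
// Assumed stand-in for InteractionGraph from ../types/ApiTypes:
// author -> (reply target -> number of replies).
type InteractionGraph = Record<string, Record<string, number>>;

type GraphLink = { source: string; target: string; value: number };

function toGraphData(apiData: InteractionGraph) {
  const links: GraphLink[] = [];
  const connectedNodeIds = new Set<string>();

  for (const [source, targets] of Object.entries(apiData)) {
    for (const [target, count] of Object.entries(targets)) {
      // Skip weak links (fewer than 2 replies) and deleted accounts.
      if (count < 2 || source === "[deleted]" || target === "[deleted]") {
        continue;
      }
      links.push({ source, target, value: count });
      connectedNodeIds.add(source);
      connectedNodeIds.add(target);
    }
  }

  // Only users that still have at least one surviving link become nodes.
  const nodes = Array.from(connectedNodeIds, (id) => ({ id }));
  return { nodes, links };
}

// Invented sample data, not taken from the project dataset.
const sampleGraph: InteractionGraph = {
  alice: { bob: 3, carol: 1 },
  bob: { alice: 2, "[deleted]": 5 },
  "[deleted]": { alice: 4 },
};

const graph = toGraphData(sampleGraph);
```

With this input, `carol` drops out because her only link has a single reply, and every link touching `[deleted]` is discarded, so the graph keeps just the two links between `alice` and `bob`.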
--- a/frontend/src/pages/AutoFetch.tsx
+++ b/frontend/src/pages/AutoFetch.tsx
@@ -0,0 +1,530 @@
 import axios from "axios";
 import { useEffect, useState } from "react";
 import { useNavigate } from "react-router-dom";
 import StatsStyling from "../styles/stats_styling";
 const styles = StatsStyling;
 const API_BASE_URL = import.meta.env.VITE_BACKEND_URL;
 type SourceOption = {
  id: string;
  label: string;
  search_enabled?: boolean;
  categories_enabled?: boolean;
  searchEnabled?: boolean;
  categoriesEnabled?: boolean;
 };
 type SourceConfig = {
  sourceName: string;
  limit: string;
  search: string;
  category: string;
 };
 type TopicMap = Record<string, string>;
 const buildEmptySourceConfig = (sourceName = ""): SourceConfig => ({
  sourceName,
  limit: "100",
  search: "",
  category: "",
 });
 const supportsSearch = (source?: SourceOption): boolean =>
  Boolean(source?.search_enabled ?? source?.searchEnabled);
 const supportsCategories = (source?: SourceOption): boolean =>
  Boolean(source?.categories_enabled ?? source?.categoriesEnabled);
 const AutoFetchPage = () => {
  const navigate = useNavigate();
  const [datasetName, setDatasetName] = useState("");
  const [sourceOptions, setSourceOptions] = useState<SourceOption[]>([]);
  const [sourceConfigs, setSourceConfigs] = useState<SourceConfig[]>([]);
  const [returnMessage, setReturnMessage] = useState("");
  const [isLoadingSources, setIsLoadingSources] = useState(true);
  const [isSubmitting, setIsSubmitting] = useState(false);
  const [hasError, setHasError] = useState(false);
  const [useCustomTopics, setUseCustomTopics] = useState(false);
  const [customTopicsText, setCustomTopicsText] = useState("");
  useEffect(() => {
    axios
      .get<SourceOption[]>(`${API_BASE_URL}/datasets/sources`)
      .then((response) => {
        const options = response.data || [];
        setSourceOptions(options);
        setSourceConfigs([buildEmptySourceConfig(options[0]?.id || "")]);
      })
      .catch((requestError: unknown) => {
        setHasError(true);
        if (axios.isAxiosError(requestError)) {
          setReturnMessage(
            `Failed to load available sources: ${String(
              requestError.response?.data?.error || requestError.message,
            )}`,
          );
        } else {
          setReturnMessage("Failed to load available sources.");
        }
      })
      .finally(() => {
        setIsLoadingSources(false);
      });
  }, []);
  const updateSourceConfig = (
    index: number,
    field: keyof SourceConfig,
    value: string,
  ) => {
    setSourceConfigs((previous) =>
      previous.map((config, configIndex) =>
        configIndex === index
          ? field === "sourceName"
            ? { ...config, sourceName: value, search: "", category: "" }
            : { ...config, [field]: value }
          : config,
      ),
    );
  };
  const getSourceOption = (sourceName: string) =>
    sourceOptions.find((option) => option.id === sourceName);
  const addSourceConfig = () => {
    setSourceConfigs((previous) => [
      ...previous,
      buildEmptySourceConfig(sourceOptions[0]?.id || ""),
    ]);
  };
  const removeSourceConfig = (index: number) => {
    setSourceConfigs((previous) =>
      previous.filter((_, configIndex) => configIndex !== index),
    );
  };
  const autoFetch = async () => {
    const token = localStorage.getItem("access_token");
    if (!token) {
      setHasError(true);
      setReturnMessage("You must be signed in to auto fetch a dataset.");
      return;
    }
    const normalizedDatasetName = datasetName.trim();
    if (!normalizedDatasetName) {
      setHasError(true);
      setReturnMessage("Please add a dataset name before continuing.");
      return;
    }
    if (sourceConfigs.length === 0) {
      setHasError(true);
      setReturnMessage("Please add at least one source.");
      return;
    }
    const normalizedSources = sourceConfigs.map((source) => {
      const sourceOption = getSourceOption(source.sourceName);
      return {
        name: source.sourceName,
        limit: Number(source.limit || 100),
        search: supportsSearch(sourceOption)
          ? source.search.trim() || undefined
          : undefined,
        category: supportsCategories(sourceOption)
          ? source.category.trim() || undefined
          : undefined,
      };
    });
    const invalidSource = normalizedSources.find(
      (source) =>
        !source.name || !Number.isFinite(source.limit) || source.limit <= 0,
    );
    if (invalidSource) {
      setHasError(true);
      setReturnMessage(
        "Every source needs a name and a limit greater than zero.",
      );
      return;
    }
    let normalizedTopics: TopicMap | undefined;
    if (useCustomTopics) {
      const customTopicsJson = customTopicsText.trim();
      if (!customTopicsJson) {
        setHasError(true);
        setReturnMessage(
          "Custom topics are enabled, so please provide a JSON topic map.",
        );
        return;
      }
      let parsedTopics: unknown;
      try {
        parsedTopics = JSON.parse(customTopicsJson);
      } catch {
        setHasError(true);
        setReturnMessage("Custom topic list must be valid JSON.");
        return;
      }
      if (
        !parsedTopics ||
        Array.isArray(parsedTopics) ||
        typeof parsedTopics !== "object"
      ) {
        setHasError(true);
        setReturnMessage(
          "Custom topic list must be a JSON object: {\"Topic\": \"keywords\"}.",
        );
        return;
      }
      const entries = Object.entries(parsedTopics);
      if (entries.length === 0) {
        setHasError(true);
        setReturnMessage("Custom topic list cannot be empty.");
        return;
      }
      const hasInvalidTopic = entries.some(
        ([topicName, keywords]) =>
          !topicName.trim() ||
          typeof keywords !== "string" ||
          !keywords.trim(),
      );
      if (hasInvalidTopic) {
        setHasError(true);
        setReturnMessage(
          "Every custom topic must have a non-empty name and keyword string.",
        );
        return;
      }
      normalizedTopics = Object.fromEntries(
        entries.map(([topicName, keywords]) => [
          topicName.trim(),
          String(keywords).trim(),
        ]),
      );
    }
    const requestBody: {
      name: string;
      sources: Array<{
        name: string;
        limit: number;
        search?: string;
        category?: string;
      }>;
      topics?: TopicMap;
    } = {
      name: normalizedDatasetName,
      sources: normalizedSources,
    };
    if (normalizedTopics) {
      requestBody.topics = normalizedTopics;
    }
    try {
      setIsSubmitting(true);
      setHasError(false);
      setReturnMessage("");
      const response = await axios.post(
        `${API_BASE_URL}/datasets/fetch`,
        requestBody,
        {
          headers: {
            Authorization: `Bearer ${token}`,
          },
        },
      );
      const datasetId = Number(response.data.dataset_id);
      setReturnMessage(
        `Auto fetch queued successfully (dataset #${datasetId}). Redirecting to processing status...`,
      );
      setTimeout(() => {
        navigate(`/dataset/${datasetId}/status`);
      }, 400);
    } catch (requestError: unknown) {
      setHasError(true);
      if (axios.isAxiosError(requestError)) {
        const message = String(
          requestError.response?.data?.error ||
            requestError.message ||
            "Auto fetch failed.",
        );
        setReturnMessage(`Auto fetch failed: ${message}`);
      } else {
        setReturnMessage("Auto fetch failed due to an unexpected error.");
      }
    } finally {
      setIsSubmitting(false);
    }
  };
  return (
    <div style={styles.page}>
      <div style={styles.containerWide}>
        <div style={{ ...styles.card, ...styles.headerBar }}>
          <div>
            <h1 style={styles.sectionHeaderTitle}>Auto Fetch Dataset</h1>
            <p style={styles.sectionHeaderSubtitle}>
              Select sources and fetch settings, then queue processing
              automatically.
            </p>
            <p
              style={{
                ...styles.subtleBodyText,
                marginTop: 6,
                color: "#9a6700",
              }}
            >
              Warning: Fetching more than 250 posts from any single site can
              take hours due to rate limits.
            </p>
          </div>
          <button
            type="button"
            style={{
              ...styles.buttonPrimary,
              opacity: isSubmitting || isLoadingSources ? 0.75 : 1,
            }}
            onClick={autoFetch}
            disabled={isSubmitting || isLoadingSources}
          >
            {isSubmitting ? "Queueing..." : "Auto Fetch and Analyze"}
          </button>
        </div>
        <div
          style={{
            ...styles.grid,
            marginTop: 14,
            gridTemplateColumns: "repeat(auto-fit, minmax(280px, 1fr))",
          }}
        >
          <div style={{ ...styles.card, gridColumn: "auto" }}>
            <h2 style={{ ...styles.sectionTitle, color: "#24292f" }}>
              Dataset Name
            </h2>
            <p style={styles.sectionSubtitle}>
              Use a clear label so you can identify this run later.
            </p>
            <input
              style={{ ...styles.input, ...styles.inputFullWidth }}
              type="text"
              placeholder="Example: r/cork subreddit - Jan 2026"
              value={datasetName}
              onChange={(event) => setDatasetName(event.target.value)}
            />
          </div>
          <div style={{ ...styles.card, gridColumn: "auto" }}>
            <h2 style={{ ...styles.sectionTitle, color: "#24292f" }}>
              Sources
            </h2>
            <p style={styles.sectionSubtitle}>
              Configure source, limit, optional search, and optional category.
            </p>
            {isLoadingSources && (
              <p style={styles.subtleBodyText}>Loading sources...</p>
            )}
            {!isLoadingSources && sourceOptions.length === 0 && (
              <p style={styles.subtleBodyText}>
                No source connectors are currently available.
              </p>
            )}
            {!isLoadingSources && sourceOptions.length > 0 && (
              <div
                style={{ display: "flex", flexDirection: "column", gap: 10 }}
              >
                {sourceConfigs.map((source, index) => {
                  const sourceOption = getSourceOption(source.sourceName);
                  const searchEnabled = supportsSearch(sourceOption);
                  const categoriesEnabled = supportsCategories(sourceOption);
                  return (
                    <div
                      key={`source-${index}`}
                      style={{
                        border: "1px solid #d0d7de",
                        borderRadius: 8,
                        padding: 12,
                        background: "#f6f8fa",
                        display: "grid",
                        gap: 8,
                      }}
                    >
                      <select
                        value={source.sourceName}
                        style={{ ...styles.input, ...styles.inputFullWidth }}
                        onChange={(event) =>
                          updateSourceConfig(
                            index,
                            "sourceName",
                            event.target.value,
                          )
                        }
                      >
                        {sourceOptions.map((option) => (
                          <option key={option.id} value={option.id}>
                            {option.label}
                          </option>
                        ))}
                      </select>
                      <input
                        type="number"
                        min={1}
                        value={source.limit}
                        placeholder="Limit"
                        style={{ ...styles.input, ...styles.inputFullWidth }}
                        onChange={(event) =>
                          updateSourceConfig(index, "limit", event.target.value)
                        }
                      />
                      <input
                        type="text"
                        value={source.search}
                        placeholder={
                          searchEnabled
                            ? "Search term (optional)"
                            : "Search not supported for this source"
                        }
                        style={{ ...styles.input, ...styles.inputFullWidth }}
                        disabled={!searchEnabled}
                        onChange={(event) =>
                          updateSourceConfig(
                            index,
                            "search",
                            event.target.value,
                          )
                        }
                      />
                      <input
                        type="text"
                        value={source.category}
                        placeholder={
                          categoriesEnabled
                            ? "Category (optional)"
                            : "Categories not supported for this source"
                        }
                        style={{ ...styles.input, ...styles.inputFullWidth }}
                        disabled={!categoriesEnabled}
                        onChange={(event) =>
                          updateSourceConfig(
                            index,
                            "category",
                            event.target.value,
                          )
                        }
                      />
                      {sourceConfigs.length > 1 && (
                        <button
                          type="button"
                          style={styles.buttonSecondary}
                          onClick={() => removeSourceConfig(index)}
                        >
                          Remove source
                        </button>
                      )}
                    </div>
                  );
                })}
                <button
                  type="button"
                  style={styles.buttonSecondary}
                  onClick={addSourceConfig}
                >
                  Add another source
                </button>
              </div>
            )}
          </div>
          <div style={{ ...styles.card, gridColumn: "auto" }}>
            <h2 style={{ ...styles.sectionTitle, color: "#24292f" }}>
              Topic List
            </h2>
            <p style={styles.sectionSubtitle}>
              Use the default topic list, or provide your own JSON topic map.
            </p>
            <label
              style={{
                display: "flex",
                alignItems: "center",
                gap: 8,
                fontSize: 14,
                color: "#24292f",
                marginBottom: 10,
              }}
            >
              <input
                type="checkbox"
                checked={useCustomTopics}
                onChange={(event) => setUseCustomTopics(event.target.checked)}
              />
              Use custom topic list
            </label>
            <textarea
              value={customTopicsText}
              onChange={(event) => setCustomTopicsText(event.target.value)}
              disabled={!useCustomTopics}
              placeholder='{"Politics": "election, policy, government", "Housing": "rent, landlords, tenancy"}'
              style={{
                ...styles.input,
                ...styles.inputFullWidth,
                minHeight: 170,
                resize: "vertical",
                fontFamily:
                  '"IBM Plex Mono", "Fira Code", "JetBrains Mono", monospace',
              }}
            />
            <p style={styles.subtleBodyText}>
              Format: JSON object where each key is a topic and each value is a
              keyword string.
            </p>
          </div>
        </div>
        <div
          style={{
            ...styles.card,
            marginTop: 14,
            ...(hasError ? styles.alertCardError : styles.alertCardInfo),
          }}
        >
          {returnMessage ||
            "After queueing, your dataset is fetched and processed in the background automatically."}
        </div>
      </div>
    </div>
  );
 };
 export default AutoFetchPage;
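Review note: the custom-topic checks inside `autoFetch` could be pulled out into a pure function so they can be unit tested without rendering the page. The sketch below is one possible shape, not the code in this commit. `parseTopicMap` and `TopicParseResult` are hypothetical names, and `TopicMap` is assumed to be a string-to-string record, which the diff does not show.

```typescript
// Assumption: TopicMap maps a topic name to a keyword string.
type TopicMap = Record<string, string>;

// Hypothetical result type: either the cleaned map or a user-facing error.
type TopicParseResult =
  | { ok: true; topics: TopicMap }
  | { ok: false; error: string };

// Mirrors the order of checks in autoFetch: empty text, invalid JSON,
// non-object value, empty object, then per-entry validation.
function parseTopicMap(text: string): TopicParseResult {
  const trimmed = text.trim();
  if (!trimmed) {
    return { ok: false, error: "Please provide a JSON topic map." };
  }

  let parsed: unknown;
  try {
    parsed = JSON.parse(trimmed);
  } catch {
    return { ok: false, error: "Custom topic list must be valid JSON." };
  }

  // Reject null, arrays, and primitives; only plain JSON objects pass.
  if (!parsed || Array.isArray(parsed) || typeof parsed !== "object") {
    return { ok: false, error: "Custom topic list must be a JSON object." };
  }

  const entries = Object.entries(parsed as Record<string, unknown>);
  if (entries.length === 0) {
    return { ok: false, error: "Custom topic list cannot be empty." };
  }

  const topics: TopicMap = {};
  for (const [name, keywords] of entries) {
    if (!name.trim() || typeof keywords !== "string" || !keywords.trim()) {
      return {
        ok: false,
        error:
          "Every custom topic must have a non-empty name and keyword string.",
      };
    }
    // Store trimmed names and keywords, as the page does before submitting.
    topics[name.trim()] = keywords.trim();
  }
  return { ok: true, topics };
}
```

With a helper like this, `autoFetch` would only need to call it once and pass any `error` string to `setReturnMessage`.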
--- a/frontend/src/pages/DatasetEdit.tsx
+++ b/frontend/src/pages/DatasetEdit.tsx
@@ -22,12 +22,10 @@ const DatasetEditPage = () => {
  const [isSaving, setIsSaving] = useState(false);
  const [isDeleting, setIsDeleting] = useState(false);
  const [isDeleteModalOpen, setIsDeleteModalOpen] = useState(false);
  const [hasError, setHasError] = useState(false);
  const [datasetName, setDatasetName] = useState("");
  useEffect(() => {
    if (!Number.isInteger(parsedDatasetId) || parsedDatasetId <= 0) {
      setHasError(true);
      setStatusMessage("Invalid dataset id.");
      setLoading(false);
      return;
@@ -35,7 +33,6 @@ const DatasetEditPage = () => {
    const token = localStorage.getItem("access_token");
    if (!token) {
      setHasError(true);
      setStatusMessage("You must be signed in to edit datasets.");
      setLoading(false);
      return;
@@ -49,9 +46,10 @@ const DatasetEditPage = () => {
        setDatasetName(response.data.name || "");
      })
      .catch((error: unknown) => {
        setHasError(true);
        if (axios.isAxiosError(error)) {
-          setStatusMessage(String(error.response?.data?.error || error.message));
+          setStatusMessage(
+            String(error.response?.data?.error || error.message),
+          );
        } else {
          setStatusMessage("Could not get dataset info.");
        }
@@ -61,40 +59,39 @@ const DatasetEditPage = () => {
      });
  }, [parsedDatasetId]);
  const saveDatasetName = async (event: FormEvent<HTMLFormElement>) => {
    event.preventDefault();
    const trimmedName = datasetName.trim();
    if (!trimmedName) {
      setHasError(true);
      setStatusMessage("Please enter a valid dataset name.");
      return;
    }
    const token = localStorage.getItem("access_token");
    if (!token) {
      setHasError(true);
      setStatusMessage("You must be signed in to save changes.");
      return;
    }
    try {
      setIsSaving(true);
      setHasError(false);
      setStatusMessage("");
      await axios.patch(
        `${API_BASE_URL}/dataset/${parsedDatasetId}`,
        { name: trimmedName },
-        { headers: { Authorization: `Bearer ${token}` } }
+        { headers: { Authorization: `Bearer ${token}` } },
      );
      navigate("/datasets", { replace: true });
    } catch (error: unknown) {
      setHasError(true);
      if (axios.isAxiosError(error)) {
-        setStatusMessage(String(error.response?.data?.error || error.message || "Save failed."));
+        setStatusMessage(
+          String(
+            error.response?.data?.error || error.message || "Save failed.",
+          ),
+        );
      } else {
        setStatusMessage("Save failed due to an unexpected error.");
      }
@@ -106,7 +103,6 @@ const DatasetEditPage = () => {
  const deleteDataset = async () => {
    const deleteToken = localStorage.getItem("access_token");
    if (!deleteToken) {
      setHasError(true);
      setStatusMessage("You must be signed in to delete datasets.");
      setIsDeleteModalOpen(false);
      return;
@@ -114,20 +110,21 @@ const DatasetEditPage = () => {
    try {
      setIsDeleting(true);
      setHasError(false);
      setStatusMessage("");
-      await axios.delete(
-        `${API_BASE_URL}/dataset/${parsedDatasetId}`,
-        { headers: { Authorization: `Bearer ${deleteToken}` } }
-      );
+      await axios.delete(`${API_BASE_URL}/dataset/${parsedDatasetId}`, {
+        headers: { Authorization: `Bearer ${deleteToken}` },
+      });
      setIsDeleteModalOpen(false);
      navigate("/datasets", { replace: true });
    } catch (error: unknown) {
      setHasError(true);
      if (axios.isAxiosError(error)) {
-        setStatusMessage(String(error.response?.data?.error || error.message || "Delete failed."));
+        setStatusMessage(
+          String(
+            error.response?.data?.error || error.message || "Delete failed.",
+          ),
+        );
      } else {
        setStatusMessage("Delete failed due to an unexpected error.");
      }
@@ -142,7 +139,9 @@ const DatasetEditPage = () => {
        <div style={{ ...styles.card, ...styles.headerBar }}>
          <div>
            <h1 style={styles.sectionHeaderTitle}>Edit Dataset</h1>
-            <p style={styles.sectionHeaderSubtitle}>Update the dataset name shown in your datasets list.</p>
+            <p style={styles.sectionHeaderSubtitle}>
+              Update the dataset name shown in your datasets list.
+            </p>
          </div>
        </div>
@@ -173,8 +172,8 @@ const DatasetEditPage = () => {
              style={styles.buttonDanger}
              onClick={() => setIsDeleteModalOpen(true)}
              disabled={isSaving || isDeleting}
-              >
+            >
-                Delete Dataset
+              Delete Dataset
            </button>
            <button
@@ -187,15 +186,16 @@ const DatasetEditPage = () => {
            </button>
            <button
              type="submit"
-              style={{ ...styles.buttonPrimary, opacity: loading || isSaving ? 0.75 : 1 }}
+              style={{
+                ...styles.buttonPrimary,
+                opacity: loading || isSaving ? 0.75 : 1,
+              }}
              disabled={loading || isSaving || isDeleting}
            >
              {isSaving ? "Saving..." : "Save"}
            </button>
-            {loading
-            ? "Loading dataset details..."
-            : statusMessage}
+            {loading ? "Loading dataset details..." : statusMessage}
          </div>
        </form>
--- a/frontend/src/pages/DatasetStatus.tsx
+++ b/frontend/src/pages/DatasetStatus.tsx
@@ -3,10 +3,10 @@ import axios from "axios";
 import { useNavigate, useParams } from "react-router-dom";
 import StatsStyling from "../styles/stats_styling";
-const API_BASE_URL = import.meta.env.VITE_BACKEND_URL
+const API_BASE_URL = import.meta.env.VITE_BACKEND_URL;
 type DatasetStatusResponse = {
-  status?: "processing" | "complete" | "error";
+  status?: "fetching" | "processing" | "complete" | "error";
  status_message?: string | null;
  completed_at?: string | null;
 };
@@ -17,7 +17,8 @@ const DatasetStatusPage = () => {
  const navigate = useNavigate();
  const { datasetId } = useParams<{ datasetId: string }>();
  const [loading, setLoading] = useState(true);
-  const [status, setStatus] = useState<DatasetStatusResponse["status"]>("processing");
+  const [status, setStatus] =
+    useState<DatasetStatusResponse["status"]>("processing");
  const [statusMessage, setStatusMessage] = useState("");
  const parsedDatasetId = useMemo(() => Number(datasetId), [datasetId]);
@@ -34,7 +35,7 @@ const DatasetStatusPage = () => {
    const pollStatus = async () => {
      try {
        const response = await axios.get<DatasetStatusResponse>(
-          `${API_BASE_URL}/dataset/${parsedDatasetId}/status`
+          `${API_BASE_URL}/dataset/${parsedDatasetId}/status`,
        );
        const nextStatus = response.data.status ?? "processing";
@@ -51,7 +52,9 @@ const DatasetStatusPage = () => {
        setLoading(false);
        setStatus("error");
        if (axios.isAxiosError(error)) {
-          const message = String(error.response?.data?.error || error.message || "Request failed");
+          const message = String(
+            error.response?.data?.error || error.message || "Request failed",
+          );
          setStatusMessage(message);
        } else {
          setStatusMessage("Unable to fetch dataset status.");
@@ -73,7 +76,8 @@ const DatasetStatusPage = () => {
    };
  }, [navigate, parsedDatasetId, status]);
-  const isProcessing = loading || status === "processing";
+  const isProcessing =
+    loading || status === "fetching" || status === "processing";
  const isError = status === "error";
  return (
@@ -81,26 +85,37 @@ const DatasetStatusPage = () => {
      <div style={styles.containerNarrow}>
        <div style={{ ...styles.card, marginTop: 28 }}>
          <h1 style={styles.sectionHeaderTitle}>
-            {isProcessing ? "Processing dataset..." : isError ? "Dataset processing failed" : "Dataset ready"}
+            {isProcessing
+              ? "Processing dataset..."
+              : isError
+                ? "Dataset processing failed"
+                : "Dataset ready"}
          </h1>
          <p style={{ ...styles.sectionSubtitle, marginTop: 10 }}>
            {isProcessing &&
              "Your dataset is being analyzed. This page will redirect to stats automatically once complete."}
-            {isError && "There was an issue while processing your dataset. Please review the error details."}
-            {status === "complete" && "Processing complete. Redirecting to your stats now..."}
+            {isError &&
+              "There was an issue while processing your dataset. Please review the error details."}
+            {status === "complete" &&
+              "Processing complete. Redirecting to your stats now..."}
          </p>
          <div
            style={{
              ...styles.card,
              ...styles.statusMessageCard,
-              borderColor: isError ? "rgba(185, 28, 28, 0.28)" : "rgba(0,0,0,0.06)",
+              borderColor: isError
+                ? "rgba(185, 28, 28, 0.28)"
+                : "rgba(0,0,0,0.06)",
              background: isError ? "#fff5f5" : "#ffffff",
              color: isError ? "#991b1b" : "#374151",
            }}
          >
-            {statusMessage || (isProcessing ? "Waiting for updates from the worker queue..." : "No details provided.")}
+            {statusMessage ||
+              (isProcessing
+                ? "Waiting for updates from the worker queue..."
+                : "No details provided.")}
          </div>
        </div>
      </div>
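Review note: now that `"fetching"` joins `"processing"` as an in-progress state, the nested ternary that picks the page heading could be replaced by a small helper. The sketch below is only an illustration. `DatasetStatus`, `isInProgress`, and `statusHeading` are hypothetical names that do not appear in this commit; the heading strings are the ones used in `DatasetStatus.tsx`.

```typescript
// Assumption: these four values are the full set of backend statuses.
type DatasetStatus = "fetching" | "processing" | "complete" | "error";

// A dataset is in progress while the first poll is loading, or while the
// backend reports either of the two working states.
function isInProgress(
  loading: boolean,
  status: DatasetStatus | undefined,
): boolean {
  return loading || status === "fetching" || status === "processing";
}

// Picks the page heading with the same precedence as the nested ternary:
// in-progress first, then error, then ready.
function statusHeading(
  loading: boolean,
  status: DatasetStatus | undefined,
): string {
  if (isInProgress(loading, status)) {
    return "Processing dataset...";
  }
  if (status === "error") {
    return "Dataset processing failed";
  }
  return "Dataset ready";
}
```

Keeping the precedence in one function means a future status value only has to be handled in one place.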
--- a/frontend/src/pages/Datasets.tsx
+++ b/frontend/src/pages/Datasets.tsx
@@ -9,7 +9,7 @@ const API_BASE_URL = import.meta.env.VITE_BACKEND_URL;
 type DatasetItem = {
  id: number;
  name?: string;
-  status?: "processing" | "complete" | "error" | string;
+  status?: "processing" | "complete" | "error" | "fetching" | string;
  status_message?: string | null;
  completed_at?: string | null;
  created_at?: string | null;
@@ -39,7 +39,9 @@ const DatasetsPage = () => {
      })
      .catch((requestError: unknown) => {
        if (axios.isAxiosError(requestError)) {
-          setError(String(requestError.response?.data?.error || requestError.message));
+          setError(
+            String(requestError.response?.data?.error || requestError.message),
+          );
        } else {
          setError("Failed to load datasets.");
        }
@@ -50,7 +52,39 @@ const DatasetsPage = () => {
  }, []);
  if (loading) {
-    return <p style={{ ...styles.page, minHeight: "100vh" }}>Loading datasets...</p>;
+    return (
+      <div style={styles.loadingPage}>
+        <div style={{ ...styles.loadingCard, transform: "translateY(-100px)" }}>
+          <div style={styles.loadingHeader}>
+            <div style={styles.loadingSpinner} />
+            <div>
+              <h2 style={styles.loadingTitle}>Loading datasets</h2>
+            </div>
+          </div>
+          <div style={styles.loadingSkeleton}>
+            <div
+              style={{
+                ...styles.loadingSkeletonLine,
+                ...styles.loadingSkeletonLineLong,
+              }}
+            />
+            <div
+              style={{
+                ...styles.loadingSkeletonLine,
+                ...styles.loadingSkeletonLineMed,
+              }}
+            />
+            <div
+              style={{
+                ...styles.loadingSkeletonLine,
+                ...styles.loadingSkeletonLineShort,
+              }}
+            />
+          </div>
+        </div>
+      </div>
+    );
  }
  return (
@@ -63,9 +97,22 @@ const DatasetsPage = () => {
              View and reopen datasets you previously uploaded.
            </p>
          </div>
-          <button type="button" style={styles.buttonPrimary} onClick={() => navigate("/upload")}>
-            Upload New Dataset
-          </button>
+          <div style={styles.controlsWrapped}>
+            <button
+              type="button"
+              style={styles.buttonPrimary}
+              onClick={() => navigate("/upload")}
+            >
+              Upload New Dataset
+            </button>
+            <button
+              type="button"
+              style={styles.buttonSecondary}
+              onClick={() => navigate("/auto-fetch")}
+            >
+              Auto Fetch Dataset
+            </button>
+          </div>
        </div>
        {error && (
@@ -90,20 +137,25 @@ const DatasetsPage = () => {
        )}
        {!error && datasets.length > 0 && (
-          <div style={{ ...styles.card, marginTop: 14, padding: 0, overflow: "hidden" }}>
+          <div
+            style={{
+              ...styles.card,
+              marginTop: 14,
+              padding: 0,
+              overflow: "hidden",
+            }}
+          >
            <ul style={styles.listNoBullets}>
              {datasets.map((dataset) => {
-                const isComplete = dataset.status === "complete";
+                const isComplete =
+                  dataset.status === "complete" || dataset.status === "error";
                const editPath = `/dataset/${dataset.id}/edit`;
                const targetPath = isComplete
                  ? `/dataset/${dataset.id}/stats`
                  : `/dataset/${dataset.id}/status`;
                return (
-                  <li
-                    key={dataset.id}
-                    style={styles.datasetListItem}
-                  >
+                  <li key={dataset.id} style={styles.datasetListItem}>
                    <div style={{ minWidth: 0 }}>
                      <div style={styles.datasetName}>
                        {dataset.name || `Dataset #${dataset.id}`}
@@ -119,19 +171,23 @@ const DatasetsPage = () => {
                    </div>
                    <div>
-                      { isComplete &&
+                      {isComplete && (
                        <button
                          type="button"
-                          style={{...styles.buttonSecondary, "margin": "5px"}}
+                          style={{ ...styles.buttonSecondary, margin: "5px" }}
                          onClick={() => navigate(editPath)}
                        >
                          Edit Dataset
                        </button>
-                      }
+                      )}
                      <button
                        type="button"
-                        style={isComplete ? styles.buttonPrimary : styles.buttonSecondary}
+                        style={
+                          isComplete
+                            ? styles.buttonPrimary
+                            : styles.buttonSecondary
+                        }
                        onClick={() => navigate(targetPath)}
                      >
                        {isComplete ? "Open stats" : "View status"}
--- a/frontend/src/pages/Login.tsx
+++ b/frontend/src/pages/Login.tsx
@@ -3,7 +3,7 @@ import axios from "axios";
 import { useNavigate } from "react-router-dom";
 import StatsStyling from "../styles/stats_styling";
-const API_BASE_URL = import.meta.env.VITE_BACKEND_URL
+const API_BASE_URL = import.meta.env.VITE_BACKEND_URL;
 const styles = StatsStyling;
@@ -44,13 +44,17 @@ const LoginPage = () => {
    try {
      if (isRegisterMode) {
-        await axios.post(`${API_BASE_URL}/register`, { username, email, password });
+        await axios.post(`${API_BASE_URL}/register`, {
+          username,
+          email,
+          password,
+        });
        setInfo("Account created. You can now sign in.");
        setIsRegisterMode(false);
      } else {
        const response = await axios.post<{ access_token: string }>(
          `${API_BASE_URL}/login`,
-          { username, password }
+          { username, password },
        );
        const token = response.data.access_token;
@@ -61,7 +65,11 @@ const LoginPage = () => {
    } catch (requestError: unknown) {
      if (axios.isAxiosError(requestError)) {
        setError(
-          String(requestError.response?.data?.error || requestError.message || "Request failed")
+          String(
+            requestError.response?.data?.error ||
+              requestError.message ||
+              "Request failed",
+          ),
        );
      } else {
        setError("Unexpected error occurred.");
@@ -73,90 +81,86 @@ const LoginPage = () => {
  return (
    <div style={styles.containerAuth}>
-        <div style={{ ...styles.card, ...styles.authCard }}>
+      <div style={{ ...styles.card, ...styles.authCard }}>
-          <div style={styles.headingBlock}>
+        <div style={styles.headingBlock}>
-            <h1 style={styles.headingXl}>
+          <h1 style={styles.headingXl}>
-              {isRegisterMode ? "Create your account" : "Welcome back"}
+            {isRegisterMode ? "Create your account" : "Welcome back"}
-            </h1>
+          </h1>
-            <p style={styles.mutedText}>
+          <p style={styles.mutedText}>
-              {isRegisterMode
+            {isRegisterMode
-                ? "Register to start uploading and exploring your dataset insights."
+              ? "Register to start uploading and exploring your dataset insights."
-                : "Sign in to continue to your analytics workspace."}
+              : "Sign in to continue to your analytics workspace."}
-            </p>
+          </p>
-          </div>
-          <form onSubmit={handleSubmit} style={styles.authForm}>
-            <input
-              type="text"
-              placeholder="Username"
-              style={{ ...styles.input, ...styles.authControl }}
-              value={username}
-              onChange={(event) => setUsername(event.target.value)}
-              required
-            />
-            {isRegisterMode && (
-                <input
-                  type="email"
-                  placeholder="Email"
-                  style={{ ...styles.input, ...styles.authControl }}
-                  value={email}
-                  onChange={(event) => setEmail(event.target.value)}
-                  required
-              />
-            )}
-            <input
-              type="password"
-              placeholder="Password"
-              style={{ ...styles.input, ...styles.authControl }}
-              value={password}
-              onChange={(event) => setPassword(event.target.value)}
-              required
-            />
-            <button
-              type="submit"
-              style={{ ...styles.buttonPrimary, ...styles.authControl, marginTop: 2 }}
-              disabled={loading}
-            >
-              {loading
-                ? "Please wait..."
-                : isRegisterMode
-                  ? "Create account"
-                  : "Sign in"}
-            </button>
-          </form>
-          {error && (
-            <p style={styles.authErrorText}>
-              {error}
-            </p>
-          )}
-          {info && (
-            <p style={styles.authInfoText}>
-              {info}
-            </p>
-          )}
-          <div style={styles.authSwitchRow}>
-            <span style={styles.authSwitchLabel}>
-              {isRegisterMode ? "Already have an account?" : "New here?"}
-            </span>
-            <button
-              type="button"
-                style={styles.authSwitchButton}
-              onClick={() => {
-                setError("");
-                setInfo("");
-                setIsRegisterMode((value) => !value);
-              }}
-            >
-              {isRegisterMode ? "Switch to sign in" : "Create account"}
-            </button>
-          </div>
        </div>
        <form onSubmit={handleSubmit} style={styles.authForm}>
          <input
            type="text"
            placeholder="Username"
            style={{ ...styles.input, ...styles.authControl }}
            value={username}
            onChange={(event) => setUsername(event.target.value)}
            required
          />
          {isRegisterMode && (
            <input
              type="email"
              placeholder="Email"
              style={{ ...styles.input, ...styles.authControl }}
              value={email}
              onChange={(event) => setEmail(event.target.value)}
              required
            />
          )}
          <input
            type="password"
            placeholder="Password"
            style={{ ...styles.input, ...styles.authControl }}
            value={password}
            onChange={(event) => setPassword(event.target.value)}
            required
          />
          <button
            type="submit"
            style={{
              ...styles.buttonPrimary,
              ...styles.authControl,
              marginTop: 2,
            }}
            disabled={loading}
          >
            {loading
              ? "Please wait..."
              : isRegisterMode
                ? "Create account"
                : "Sign in"}
          </button>
        </form>
        {error && <p style={styles.authErrorText}>{error}</p>}
        {info && <p style={styles.authInfoText}>{info}</p>}
        <div style={styles.authSwitchRow}>
          <span style={styles.authSwitchLabel}>
            {isRegisterMode ? "Already have an account?" : "New here?"}
          </span>
          <button
            type="button"
            style={styles.authSwitchButton}
            onClick={() => {
              setError("");
              setInfo("");
              setIsRegisterMode((value) => !value);
            }}
          >
            {isRegisterMode ? "Switch to sign in" : "Create account"}
          </button>
        </div>
      </div>
    </div>
  );
 };
--- a/frontend/src/pages/Stats.tsx
+++ b/frontend/src/pages/Stats.tsx
@@ -1,39 +1,276 @@
-import { useEffect, useState, useRef } from "react";
+import { useEffect, useRef, useState } from "react";
 import axios from "axios";
 import { useParams } from "react-router-dom";
 import StatsStyling from "../styles/stats_styling";
 import SummaryStats from "../components/SummaryStats";
 import EmotionalStats from "../components/EmotionalStats";
 import UserStats from "../components/UserStats";
 import LinguisticStats from "../components/LinguisticStats";
 import InteractionalStats from "../components/InteractionalStats";
 import CulturalStats from "../components/CulturalStats";
 import CorpusExplorer from "../components/CorpusExplorer";
 import {
  type SummaryResponse,
  type UserAnalysisResponse, 
  type TimeAnalysisResponse,
-  type ContentAnalysisResponse
-} from '../types/ApiTypes'
+  type User,
+  type UserEndpointResponse,
  type LinguisticAnalysisResponse,
  type EmotionalAnalysisResponse,
  type InteractionAnalysisResponse,
  type CulturalAnalysisResponse,
 } from "../types/ApiTypes";
 import {
  buildExplorerContext,
  type CorpusExplorerSpec,
  type DatasetRecord,
 } from "../utils/corpusExplorer";
-const API_BASE_URL = import.meta.env.VITE_BACKEND_URL
+const API_BASE_URL = import.meta.env.VITE_BACKEND_URL;
 const styles = StatsStyling;
 const DELETED_USERS = ["[deleted]", "automoderator"];
 const isDeletedUser = (value: string | null | undefined) =>
  DELETED_USERS.includes((value ?? "").trim().toLowerCase());
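
Review note: a minimal sketch of how the `isDeletedUser` predicate above behaves once it is applied to an interaction graph, roughly as the `.then` handler later in this file does. `DELETED_USERS` and `isDeletedUser` are copied from this diff; `InteractionGraph`, `filterInteractionGraph`, and the sample data are hypothetical and exist only for illustration.

```typescript
const DELETED_USERS = ["[deleted]", "automoderator"];

// Trim and lower-case before comparing, so " AutoModerator " also matches.
const isDeletedUser = (value: string | null | undefined): boolean =>
  DELETED_USERS.includes((value ?? "").trim().toLowerCase());

// Hypothetical alias for the shape used by interaction_graph in this diff.
type InteractionGraph = Record<string, Record<string, number>>;

// Hypothetical helper: drops placeholder accounts both as sources and as targets.
const filterInteractionGraph = (graph: InteractionGraph): InteractionGraph => {
  const result: InteractionGraph = {};
  for (const [source, targets] of Object.entries(graph)) {
    if (isDeletedUser(source)) continue;
    const nextTargets: Record<string, number> = {};
    for (const [target, count] of Object.entries(targets)) {
      if (!isDeletedUser(target)) nextTargets[target] = count;
    }
    result[source] = nextTargets;
  }
  return result;
};

// Made-up sample data.
const sampleGraph: InteractionGraph = {
  alice: { "[deleted]": 2, bob: 1 },
  AutoModerator: { alice: 5 },
};
console.log(JSON.stringify(filterInteractionGraph(sampleGraph)));
```

A source whose targets are all filtered out is kept with an empty target map, which matches the loop in the diff; whether such nodes should be dropped entirely is a design choice this sketch does not settle.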
 type ActiveView =
  | "summary"
  | "emotional"
  | "user"
  | "linguistic"
  | "interactional"
  | "cultural";
 type UserStatsMeta = {
  totalUsers: number;
  mostCommentHeavyUser: { author: string; commentShare: number } | null;
 };
 type ExplorerState = {
  open: boolean;
  title: string;
  description: string;
  emptyMessage: string;
  records: DatasetRecord[];
  loading: boolean;
  error: string;
 };
 const EMPTY_EXPLORER_STATE: ExplorerState = {
  open: false,
  title: "Corpus Explorer",
  description: "",
  emptyMessage: "No records found.",
  records: [],
  loading: false,
  error: "",
 };
 const createExplorerState = (
  spec: CorpusExplorerSpec,
  patch: Partial<ExplorerState> = {},
 ): ExplorerState => ({
  open: true,
  title: spec.title,
  description: spec.description,
  emptyMessage: spec.emptyMessage ?? "No matching records found.",
  records: [],
  loading: false,
  error: "",
  ...patch,
 });
 const compareRecordsByNewest = (a: DatasetRecord, b: DatasetRecord) => {
  const aValue = String(a.dt ?? a.date ?? a.timestamp ?? "");
  const bValue = String(b.dt ?? b.date ?? b.timestamp ?? "");
  return bValue.localeCompare(aValue);
 };
 const parseJsonLikePayload = (value: string): unknown => {
  const normalized = value
    .replace(/\uFEFF/g, "")
    .replace(/,\s*([}\]])/g, "$1")
    .replace(/(:\s*)(NaN|Infinity|-Infinity)\b/g, "$1null")
    .replace(/(\[\s*)(NaN|Infinity|-Infinity)\b/g, "$1null")
    .replace(/(,\s*)(NaN|Infinity|-Infinity)\b/g, "$1null")
    .replace(/(:\s*)None\b/g, "$1null")
    .replace(/(:\s*)True\b/g, "$1true")
    .replace(/(:\s*)False\b/g, "$1false")
    .replace(/(\[\s*)None\b/g, "$1null")
    .replace(/(\[\s*)True\b/g, "$1true")
    .replace(/(\[\s*)False\b/g, "$1false")
    .replace(/(,\s*)None\b/g, "$1null")
    .replace(/(,\s*)True\b/g, "$1true")
    .replace(/(,\s*)False\b/g, "$1false");
  return JSON.parse(normalized);
 };
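
Review note: a condensed sketch of the repair step in `parseJsonLikePayload`, assuming the goal is to accept Python-style serializer output (`NaN`, `None`, `True`, `False`, trailing commas). `repairJsonLike` is a hypothetical name, and the combined character class `[:[,]` is a shortening of the separate `:`, `[`, and `,` patterns in the diff, not the author's exact code.

```typescript
// Hypothetical condensed variant of parseJsonLikePayload from this diff.
const repairJsonLike = (value: string): unknown => {
  const normalized = value
    .replace(/\uFEFF/g, "") // strip byte-order marks
    .replace(/,\s*([}\]])/g, "$1") // drop trailing commas before } or ]
    .replace(/([:[,]\s*)(NaN|-?Infinity|None)\b/g, "$1null")
    .replace(/([:[,]\s*)True\b/g, "$1true")
    .replace(/([:[,]\s*)False\b/g, "$1false");
  return JSON.parse(normalized);
};

const repaired = repairJsonLike(
  '{"score": NaN, "ok": True, "tags": [None, False,],}',
);
console.log(JSON.stringify(repaired));
// prints {"score":null,"ok":true,"tags":[null,false]}

// Caveat: the replacements are purely textual, so a literal that follows a
// comma or colon inside a string value is rewritten as well.
const caveat = repairJsonLike('{"t": "yes, True"}') as { t: string };
console.log(caveat.t); // prints yes, true
```

Because the rewriting is not string-aware, it seems safest as a fallback for malformed responses rather than as the primary parser; emitting strict JSON on the server side would make it unnecessary.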
 const tryParseRecords = (value: string) => {
  try {
    return normalizeRecordPayload(parseJsonLikePayload(value));
  } catch {
    return null;
  }
 };
 const parseRecordStringPayload = (payload: string): DatasetRecord[] | null => {
  const trimmed = payload.trim();
  if (!trimmed) {
    return [];
  }
  const direct = tryParseRecords(trimmed);
  if (direct) {
    return direct;
  }
  const ndjsonLines = trimmed
    .split(/\r?\n/)
    .map((line) => line.trim())
    .filter(Boolean);
  if (ndjsonLines.length > 0) {
    try {
      return ndjsonLines.map((line) => parseJsonLikePayload(line)) as DatasetRecord[];
    } catch {
      // Not valid NDJSON; fall through to the bracket/brace extraction below.
    }
  }
  const bracketStart = trimmed.indexOf("[");
  const bracketEnd = trimmed.lastIndexOf("]");
  if (bracketStart !== -1 && bracketEnd > bracketStart) {
    const parsed = tryParseRecords(trimmed.slice(bracketStart, bracketEnd + 1));
    if (parsed) {
      return parsed;
    }
  }
  const braceStart = trimmed.indexOf("{");
  const braceEnd = trimmed.lastIndexOf("}");
  if (braceStart !== -1 && braceEnd > braceStart) {
    const parsed = tryParseRecords(trimmed.slice(braceStart, braceEnd + 1));
    if (parsed) {
      return parsed;
    }
  }
  return null;
 };
 const normalizeRecordPayload = (payload: unknown): DatasetRecord[] => {
  if (typeof payload === "string") {
    const parsed = parseRecordStringPayload(payload);
    if (parsed) {
      return parsed;
    }
    const preview = payload.trim().slice(0, 120).replace(/\s+/g, " ");
    throw new Error(
      `Corpus endpoint returned a non-JSON string payload.${
        preview ? ` Response preview: ${preview}` : ""
      }`,
    );
  }
  if (
    payload &&
    typeof payload === "object" &&
    "error" in payload &&
    typeof (payload as { error?: unknown }).error === "string"
  ) {
    throw new Error((payload as { error: string }).error);
  }
  if (Array.isArray(payload)) {
    return payload as DatasetRecord[];
  }
  if (
    payload &&
    typeof payload === "object" &&
    "data" in payload &&
    Array.isArray((payload as { data?: unknown }).data)
  ) {
    return (payload as { data: DatasetRecord[] }).data;
  }
  if (
    payload &&
    typeof payload === "object" &&
    "records" in payload &&
    Array.isArray((payload as { records?: unknown }).records)
  ) {
    return (payload as { records: DatasetRecord[] }).records;
  }
  if (
    payload &&
    typeof payload === "object" &&
    "rows" in payload &&
    Array.isArray((payload as { rows?: unknown }).rows)
  ) {
    return (payload as { rows: DatasetRecord[] }).rows;
  }
  if (
    payload &&
    typeof payload === "object" &&
    "result" in payload &&
    Array.isArray((payload as { result?: unknown }).result)
  ) {
    return (payload as { result: DatasetRecord[] }).result;
  }
  if (payload && typeof payload === "object") {
    const values = Object.values(payload);
    if (values.length === 1 && Array.isArray(values[0])) {
      return values[0] as DatasetRecord[];
    }
    if (values.every((value) => value && typeof value === "object")) {
      return values as DatasetRecord[];
    }
  }
  throw new Error("Corpus endpoint returned an unexpected payload.");
 };
 const StatPage = () => {
  const { datasetId: routeDatasetId } = useParams<{ datasetId: string }>();
-  const [error, setError] = useState('');
+  const [error, setError] = useState("");
  const [loading, setLoading] = useState(false);
-  const [activeView, setActiveView] = useState<"summary" | "emotional" | "user">("summary");
+  const [activeView, setActiveView] = useState<ActiveView>("summary");
-  const [userData, setUserData] = useState<UserAnalysisResponse | null>(null);
+  const [userData, setUserData] = useState<UserEndpointResponse | null>(null);
  const [timeData, setTimeData] = useState<TimeAnalysisResponse | null>(null);
-  const [contentData, setContentData] = useState<ContentAnalysisResponse | null>(null);
+  const [linguisticData, setLinguisticData] =
    useState<LinguisticAnalysisResponse | null>(null);
  const [emotionalData, setEmotionalData] =
    useState<EmotionalAnalysisResponse | null>(null);
  const [interactionData, setInteractionData] =
    useState<InteractionAnalysisResponse | null>(null);
  const [culturalData, setCulturalData] =
    useState<CulturalAnalysisResponse | null>(null);
  const [summary, setSummary] = useState<SummaryResponse | null>(null);
-
+  const [userStatsMeta, setUserStatsMeta] = useState<UserStatsMeta>({
    totalUsers: 0,
    mostCommentHeavyUser: null,
  });
  const [appliedFilters, setAppliedFilters] = useState<Record<string, string>>({});
  const [allRecords, setAllRecords] = useState<DatasetRecord[] | null>(null);
  const [allRecordsKey, setAllRecordsKey] = useState("");
  const [explorerState, setExplorerState] = useState<ExplorerState>(
    EMPTY_EXPLORER_STATE,
  );
  const searchInputRef = useRef<HTMLInputElement>(null);
  const beforeDateRef = useRef<HTMLInputElement>(null);
  const afterDateRef = useRef<HTMLInputElement>(null);
  const parsedDatasetId = Number(routeDatasetId ?? "");
-  const datasetId = Number.isInteger(parsedDatasetId) && parsedDatasetId > 0 ? parsedDatasetId : null;
+  const datasetId =
    Number.isInteger(parsedDatasetId) && parsedDatasetId > 0
      ? parsedDatasetId
      : null;
  const getFilterParams = () => {
    const params: Record<string, string> = {};
@@ -67,6 +304,59 @@ const StatPage = () => {
    };
  };
  const getFilterKey = (params: Record<string, string>) =>
    JSON.stringify(Object.entries(params).sort(([a], [b]) => a.localeCompare(b)));
  const ensureFilteredRecords = async () => {
    if (!datasetId) {
      throw new Error("Missing dataset id.");
    }
    const authHeaders = getAuthHeaders();
    if (!authHeaders) {
      throw new Error("You must be signed in to load corpus records.");
    }
    const filterKey = getFilterKey(appliedFilters);
    if (allRecords && allRecordsKey === filterKey) {
      return allRecords;
    }
    const response = await axios.get<unknown>(
      `${API_BASE_URL}/dataset/${datasetId}/all`,
      {
        params: appliedFilters,
        headers: authHeaders,
      },
    );
    const normalizedRecords = normalizeRecordPayload(response.data);
    setAllRecords(normalizedRecords);
    setAllRecordsKey(filterKey);
    return normalizedRecords;
  };
  const openExplorer = async (spec: CorpusExplorerSpec) => {
    setExplorerState(createExplorerState(spec, { loading: true }));
    try {
      const records = await ensureFilteredRecords();
      const context = buildExplorerContext(records);
      const matched = records
        .filter((record) => spec.matcher(record, context))
        .sort(compareRecordsByNewest);
      setExplorerState(createExplorerState(spec, { records: matched }));
    } catch (e) {
      setExplorerState(
        createExplorerState(spec, {
          error: `Failed to load corpus records: ${String(e)}`,
        }),
      );
    }
  };
  const getStats = (params: Record<string, string> = {}) => {
    if (!datasetId) {
      setError("Missing dataset id. Open /dataset/<id>/stats.");
@@ -81,32 +371,151 @@ const StatPage = () => {
    setError("");
    setLoading(true);
    setAppliedFilters(params);
    setAllRecords(null);
    setAllRecordsKey("");
    setExplorerState((current) => ({ ...current, open: false }));
    Promise.all([
-      axios.get<TimeAnalysisResponse>(`${API_BASE_URL}/dataset/${datasetId}/time`, {
+      axios.get<TimeAnalysisResponse>(`${API_BASE_URL}/dataset/${datasetId}/temporal`, {
        params,
        headers: authHeaders,
      }),
-      axios.get<UserAnalysisResponse>(`${API_BASE_URL}/dataset/${datasetId}/user`, {
+      axios.get<UserEndpointResponse>(`${API_BASE_URL}/dataset/${datasetId}/user`, {
        params,
        headers: authHeaders,
      }),
-      axios.get<ContentAnalysisResponse>(`${API_BASE_URL}/dataset/${datasetId}/content`, {
+      axios.get<LinguisticAnalysisResponse>(
        `${API_BASE_URL}/dataset/${datasetId}/linguistic`,
        {
          params,
          headers: authHeaders,
        },
      ),
      axios.get<EmotionalAnalysisResponse>(`${API_BASE_URL}/dataset/${datasetId}/emotional`, {
        params,
        headers: authHeaders,
      }),
      axios.get<InteractionAnalysisResponse>(
        `${API_BASE_URL}/dataset/${datasetId}/interactional`,
        {
          params,
          headers: authHeaders,
        },
      ),
      axios.get<SummaryResponse>(`${API_BASE_URL}/dataset/${datasetId}/summary`, {
        params,
        headers: authHeaders,
      }),
      axios.get<CulturalAnalysisResponse>(`${API_BASE_URL}/dataset/${datasetId}/cultural`, {
        params,
        headers: authHeaders,
      }),
    ])
-      .then(([timeRes, userRes, contentRes, summaryRes]) => {
-        setUserData(userRes.data || null);
-        setTimeData(timeRes.data || null);
-        setContentData(contentRes.data || null);
-        setSummary(summaryRes.data || null);
-      })
-      .catch((e) => setError("Failed to load statistics: " + String(e)))
+      .then(
+        ([
+          timeRes,
+          userRes,
+          linguisticRes,
+          emotionalRes,
+          interactionRes,
          summaryRes,
          culturalRes,
        ]) => {
          const usersList = userRes.data.users ?? [];
          const topUsersList = userRes.data.top_users ?? [];
          const interactionGraphRaw = interactionRes.data?.interaction_graph ?? {};
          const topPairsRaw = interactionRes.data?.top_interaction_pairs ?? [];
          const filteredUsers: typeof usersList = [];
          for (const user of usersList) {
            if (isDeletedUser(user.author)) continue;
            filteredUsers.push(user);
          }
          const filteredTopUsers: typeof topUsersList = [];
          for (const user of topUsersList) {
            if (isDeletedUser(user.author)) continue;
            filteredTopUsers.push(user);
          }
          let mostCommentHeavyUser: UserStatsMeta["mostCommentHeavyUser"] = null;
          for (const user of filteredUsers) {
            const currentShare = user.comment_share ?? 0;
            if (!mostCommentHeavyUser || currentShare > mostCommentHeavyUser.commentShare) {
              mostCommentHeavyUser = {
                author: user.author,
                commentShare: currentShare,
              };
            }
          }
          const topAuthors = new Set(filteredTopUsers.map((entry) => entry.author));
          const summaryUsers: User[] = [];
          for (const user of filteredUsers) {
            if (topAuthors.has(user.author)) {
              summaryUsers.push(user);
            }
          }
          const filteredInteractionGraph: Record<string, Record<string, number>> = {};
          for (const [source, targets] of Object.entries(interactionGraphRaw)) {
            if (isDeletedUser(source)) {
              continue;
            }
            const nextTargets: Record<string, number> = {};
            for (const [target, count] of Object.entries(targets)) {
              if (isDeletedUser(target)) {
                continue;
              }
              nextTargets[target] = count;
            }
            filteredInteractionGraph[source] = nextTargets;
          }
          const filteredTopInteractionPairs: typeof topPairsRaw = [];
          for (const pairEntry of topPairsRaw) {
            const pair = pairEntry[0];
            const source = pair[0];
            const target = pair[1];
            if (isDeletedUser(source) || isDeletedUser(target)) {
              continue;
            }
            filteredTopInteractionPairs.push(pairEntry);
          }
          const filteredUserData: UserEndpointResponse = {
            users: summaryUsers,
            top_users: filteredTopUsers,
          };
          const filteredInteractionData: InteractionAnalysisResponse = {
            ...interactionRes.data,
            interaction_graph: filteredInteractionGraph,
            top_interaction_pairs: filteredTopInteractionPairs,
          };
          const filteredSummary: SummaryResponse = {
            ...summaryRes.data,
            unique_users: filteredUsers.length,
          };
          setUserData(filteredUserData);
          setUserStatsMeta({
            totalUsers: filteredUsers.length,
            mostCommentHeavyUser,
          });
          setTimeData(timeRes.data || null);
          setLinguisticData(linguisticRes.data || null);
          setEmotionalData(emotionalRes.data || null);
          setInteractionData(filteredInteractionData || null);
          setCulturalData(culturalRes.data || null);
          setSummary(filteredSummary || null);
        },
      )
      .catch((e) => setError(`Failed to load statistics: ${String(e)}`))
      .finally(() => setLoading(false));
  };
@@ -129,12 +538,15 @@ const StatPage = () => {
  useEffect(() => {
    setError("");
    setAllRecords(null);
    setAllRecordsKey("");
    setExplorerState(EMPTY_EXPLORER_STATE);
    if (!datasetId) {
      setError("Missing dataset id. Open /dataset/<id>/stats.");
      return;
    }
    getStats();
-  }, [datasetId])
+  }, [datasetId]);
  if (loading) {
    return (
@@ -144,107 +556,217 @@ const StatPage = () => {
            <div style={styles.loadingSpinner} />
            <div>
              <h2 style={styles.loadingTitle}>Loading analytics</h2>
-              <p style={styles.loadingSubtitle}>Fetching summary, timeline, user, and content insights.</p>
+              <p style={styles.loadingSubtitle}>
                Fetching summary, timeline, user, linguistic, emotional, interactional, and cultural insights.
              </p>
            </div>
          </div>
          <div style={styles.loadingSkeleton}>
-            <div style={{ ...styles.loadingSkeletonLine, ...styles.loadingSkeletonLineLong }} />
-            <div style={{ ...styles.loadingSkeletonLine, ...styles.loadingSkeletonLineMed }} />
-            <div style={{ ...styles.loadingSkeletonLine, ...styles.loadingSkeletonLineShort }} />
+            <div
+              style={{
+                ...styles.loadingSkeletonLine,
                ...styles.loadingSkeletonLineLong,
              }}
            />
            <div
              style={{
                ...styles.loadingSkeletonLine,
                ...styles.loadingSkeletonLineMed,
              }}
            />
            <div
              style={{
                ...styles.loadingSkeletonLine,
                ...styles.loadingSkeletonLineShort,
              }}
            />
          </div>
        </div>
      </div>
    );
  }
-  if (error) return <p style={{...styles.page}}>{error}</p>;
+  if (error) return <p style={{ ...styles.page }}>{error}</p>;
-return (
+  return (
-  <div style={styles.page}>
+    <div style={styles.page}>
-    <div style={{ ...styles.container, ...styles.card, ...styles.headerBar }}>
+      <div style={{ ...styles.container, ...styles.card, ...styles.headerBar }}>
-      <div style={styles.controls}>
+        <div style={styles.controls}>
-        <input
+          <input
-          type="text"
+            type="text"
-          id="query"
+            id="query"
-          ref={searchInputRef}
+            ref={searchInputRef}
-          placeholder="Search events..."
+            placeholder="Search events..."
-          style={styles.input}
+            style={styles.input}
-        />
+          />
-        <input 
+          <input
-          type="date"
+            type="date"
-          ref={beforeDateRef}
+            ref={beforeDateRef}
-          placeholder="Search before date"
+            placeholder="Search before date"
-          style={styles.input}
+            style={styles.input}
-        />
+          />
-        <input
+          <input
            type="date"
            ref={afterDateRef}
            placeholder="Search after date"
            style={styles.input}
-        />
+          />
-        <button onClick={onSubmitFilters} style={styles.buttonPrimary}>
+          <button onClick={onSubmitFilters} style={styles.buttonPrimary}>
-          Search
+            Search
-        </button>
+          </button>
-        <button onClick={resetFilters} style={styles.buttonSecondary}>
+          <button onClick={resetFilters} style={styles.buttonSecondary}>
-          Reset
+            Reset
-        </button>
+          </button>
      </div>
          <div style={styles.dashboardMeta}>Analytics Dashboard</div>
          <div style={styles.dashboardMeta}>Dataset #{datasetId ?? "-"}</div>
        </div>
-    <div style={{ ...styles.container, ...styles.tabsRow }}>
+        <div style={styles.dashboardMeta}>Analytics Dashboard</div>
-      <button
+        <div style={styles.dashboardMeta}>Dataset #{datasetId ?? "-"}</div>
        onClick={() => setActiveView("summary")}
        style={activeView === "summary" ? styles.buttonPrimary : styles.buttonSecondary}
      >
        Summary
      </button>
      <button
        onClick={() => setActiveView("emotional")}
        style={activeView === "emotional" ? styles.buttonPrimary : styles.buttonSecondary}
      >
        Emotional
      </button>
      <button
        onClick={() => setActiveView("user")}
        style={activeView === "user" ? styles.buttonPrimary : styles.buttonSecondary}
      >
        Users
      </button>
    </div>
    {activeView === "summary" && (
      <SummaryStats
        userData={userData}
        timeData={timeData}
        contentData={contentData}
        summary={summary}
      />
    )}
    {activeView === "emotional" && contentData && (
      <EmotionalStats contentData={contentData} />
    )}
    {activeView === "emotional" && !contentData && (
      <div style={{ ...styles.container, ...styles.card, marginTop: 16 }}>
        No emotional data available.
      </div>
    )}
-    {activeView === "user" && userData && (
-      <UserStats data={userData} />
-    )}
+      <div
+        style={{
+          ...styles.container,
          ...styles.tabsRow,
          justifyContent: "center",
        }}
      >
        <button
          onClick={() => setActiveView("summary")}
          style={
            activeView === "summary" ? styles.buttonPrimary : styles.buttonSecondary
          }
        >
          Summary
        </button>
        <button
          onClick={() => setActiveView("emotional")}
          style={
            activeView === "emotional"
              ? styles.buttonPrimary
              : styles.buttonSecondary
          }
        >
          Emotional
        </button>
-  </div>
-);
-}
+        <button
+          onClick={() => setActiveView("user")}
+          style={activeView === "user" ? styles.buttonPrimary : styles.buttonSecondary}
        >
          Users
        </button>
        <button
          onClick={() => setActiveView("linguistic")}
          style={
            activeView === "linguistic"
              ? styles.buttonPrimary
              : styles.buttonSecondary
          }
        >
          Linguistic
        </button>
        <button
          onClick={() => setActiveView("interactional")}
          style={
            activeView === "interactional"
              ? styles.buttonPrimary
              : styles.buttonSecondary
          }
        >
          Interactional
        </button>
        <button
          onClick={() => setActiveView("cultural")}
          style={
            activeView === "cultural" ? styles.buttonPrimary : styles.buttonSecondary
          }
        >
          Cultural
        </button>
      </div>
      {activeView === "summary" && (
        <SummaryStats
          userData={userData}
          timeData={timeData}
          linguisticData={linguisticData}
          summary={summary}
          onExplore={openExplorer}
        />
      )}
      {activeView === "emotional" && emotionalData && (
        <EmotionalStats emotionalData={emotionalData} onExplore={openExplorer} />
      )}
      {activeView === "emotional" && !emotionalData && (
        <div style={{ ...styles.container, ...styles.card, marginTop: 16 }}>
          No emotional data available.
        </div>
      )}
      {activeView === "user" && userData && interactionData && (
        <UserStats
          topUsers={userData.top_users}
          interactionGraph={interactionData.interaction_graph}
          totalUsers={userStatsMeta.totalUsers}
          mostCommentHeavyUser={userStatsMeta.mostCommentHeavyUser}
          onExplore={openExplorer}
        />
      )}
      {activeView === "user" && (!userData || !interactionData) && (
        <div style={{ ...styles.container, ...styles.card, marginTop: 16 }}>
          No user network data available.
        </div>
      )}
      {activeView === "linguistic" && linguisticData && (
        <LinguisticStats data={linguisticData} onExplore={openExplorer} />
      )}
      {activeView === "linguistic" && !linguisticData && (
        <div style={{ ...styles.container, ...styles.card, marginTop: 16 }}>
          No linguistic data available.
        </div>
      )}
      {activeView === "interactional" && interactionData && (
        <InteractionalStats data={interactionData} />
      )}
      {activeView === "interactional" && !interactionData && (
        <div style={{ ...styles.container, ...styles.card, marginTop: 16 }}>
          No interactional data available.
        </div>
      )}
      {activeView === "cultural" && culturalData && (
        <CulturalStats data={culturalData} onExplore={openExplorer} />
      )}
      {activeView === "cultural" && !culturalData && (
        <div style={{ ...styles.container, ...styles.card, marginTop: 16 }}>
          No cultural data available.
        </div>
      )}
      <CorpusExplorer
        open={explorerState.open}
        onClose={() => setExplorerState((current) => ({ ...current, open: false }))}
        title={explorerState.title}
        description={explorerState.description}
        records={explorerState.records}
        loading={explorerState.loading}
        error={explorerState.error}
        emptyMessage={explorerState.emptyMessage}
      />
    </div>
  );
 };
 export default StatPage;
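
Review note: a small sketch of the caching idea behind `getFilterKey` and `ensureFilteredRecords` in the file above, assuming the intent is to reuse the last `/all` response whenever the applied filters are unchanged. `filterKey` mirrors `getFilterKey` from the diff; `createSingleEntryCache` and the synchronous `load` callback are hypothetical simplifications (the real code awaits an axios request and stores the result in React state).

```typescript
type Filters = Record<string, string>;

// Same idea as getFilterKey in the diff: sort the entries so that the key
// does not depend on property insertion order.
const filterKey = (params: Filters): string =>
  JSON.stringify(Object.entries(params).sort(([a], [b]) => a.localeCompare(b)));

// Hypothetical helper: remembers only the most recent result and its key.
const createSingleEntryCache = <T>(load: (params: Filters) => T) => {
  let cached: { key: string; value: T } | null = null;
  return (params: Filters): T => {
    const key = filterKey(params);
    if (cached && cached.key === key) {
      return cached.value; // same filters as last time: reuse the result
    }
    const value = load(params);
    cached = { key, value };
    return value;
  };
};

let loadCount = 0;
const loadRecords = createSingleEntryCache((params) => {
  loadCount += 1; // stands in for the network request
  return Object.keys(params).length;
});

loadRecords({ query: "cork", start_date: "2024-01-01" });
loadRecords({ start_date: "2024-01-01", query: "cork" }); // same key, no reload
console.log(loadCount); // prints 1
```

A single-entry cache fits a dashboard where only one filter set is active at a time; keeping several filter sets around would need a map keyed by `filterKey` and some eviction policy.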
--- a/frontend/src/pages/Upload.tsx
+++ b/frontend/src/pages/Upload.tsx
@@ -4,7 +4,7 @@ import { useNavigate } from "react-router-dom";
 import StatsStyling from "../styles/stats_styling";
 const styles = StatsStyling;
-const API_BASE_URL = import.meta.env.VITE_BACKEND_URL
+const API_BASE_URL = import.meta.env.VITE_BACKEND_URL;
 const UploadPage = () => {
  const [datasetName, setDatasetName] = useState("");
@@ -40,16 +40,20 @@ const UploadPage = () => {
      setHasError(false);
      setReturnMessage("");
-      const response = await axios.post(`${API_BASE_URL}/upload`, formData, {
-        headers: {
-          "Content-Type": "multipart/form-data",
+      const response = await axios.post(
+        `${API_BASE_URL}/datasets/upload`,
+        formData,
        {
          headers: {
            "Content-Type": "multipart/form-data",
          },
        },
-      });
+      );
      const datasetId = Number(response.data.dataset_id);
      setReturnMessage(
-        `Upload queued successfully (dataset #${datasetId}). Redirecting to processing status...`
+        `Upload queued successfully (dataset #${datasetId}). Redirecting to processing status...`,
      );
      setTimeout(() => {
@@ -58,7 +62,9 @@ const UploadPage = () => {
    } catch (error: unknown) {
      setHasError(true);
      if (axios.isAxiosError(error)) {
-        const message = String(error.response?.data?.error || error.message || "Upload failed.");
+        const message = String(
          error.response?.data?.error || error.message || "Upload failed.",
        );
        setReturnMessage(`Upload failed: ${message}`);
      } else {
        setReturnMessage("Upload failed due to an unexpected error.");
@@ -75,12 +81,16 @@ const UploadPage = () => {
          <div>
            <h1 style={styles.sectionHeaderTitle}>Upload Dataset</h1>
            <p style={styles.sectionHeaderSubtitle}>
-              Name your dataset, then upload posts and topic map files to generate analytics.
+              Name your dataset, then upload posts and topic map files to
              generate analytics.
            </p>
          </div>
          <button
            type="button"
-            style={{ ...styles.buttonPrimary, opacity: isSubmitting ? 0.75 : 1 }}
+            style={{
              ...styles.buttonPrimary,
              opacity: isSubmitting ? 0.75 : 1,
            }}
            onClick={uploadFiles}
            disabled={isSubmitting}
          >
@@ -96,8 +106,12 @@ const UploadPage = () => {
          }}
        >
          <div style={{ ...styles.card, gridColumn: "auto" }}>
-            <h2 style={{ ...styles.sectionTitle, color: "#24292f" }}>Dataset Name</h2>
-            <p style={styles.sectionSubtitle}>Use a clear label so you can identify this upload later.</p>
+            <h2 style={{ ...styles.sectionTitle, color: "#24292f" }}>
+              Dataset Name
            </h2>
            <p style={styles.sectionSubtitle}>
              Use a clear label so you can identify this upload later.
            </p>
            <input
              style={{ ...styles.input, ...styles.inputFullWidth }}
              type="text"
@@ -108,8 +122,12 @@ const UploadPage = () => {
          </div>
          <div style={{ ...styles.card, gridColumn: "auto" }}>
-            <h2 style={{ ...styles.sectionTitle, color: "#24292f" }}>Posts File (.jsonl)</h2>
-            <p style={styles.sectionSubtitle}>Upload the raw post records export.</p>
+            <h2 style={{ ...styles.sectionTitle, color: "#24292f" }}>
+              Posts File (.jsonl)
            </h2>
            <p style={styles.sectionSubtitle}>
              Upload the raw post records export.
            </p>
            <input
              style={{ ...styles.input, ...styles.inputFullWidth }}
              type="file"
@@ -122,16 +140,24 @@ const UploadPage = () => {
          </div>
          <div style={{ ...styles.card, gridColumn: "auto" }}>
-            <h2 style={{ ...styles.sectionTitle, color: "#24292f" }}>Topics File (.json)</h2>
-            <p style={styles.sectionSubtitle}>Upload your topic bucket mapping file.</p>
+            <h2 style={{ ...styles.sectionTitle, color: "#24292f" }}>
+              Topics File (.json)
            </h2>
            <p style={styles.sectionSubtitle}>
              Upload your topic bucket mapping file.
            </p>
            <input
              style={{ ...styles.input, ...styles.inputFullWidth }}
              type="file"
              accept=".json"
-              onChange={(event) => setTopicBucketFile(event.target.files?.[0] ?? null)}
+              onChange={(event) =>
+                setTopicBucketFile(event.target.files?.[0] ?? null)
+              }
            />
            <p style={styles.subtleBodyText}>
-              {topicBucketFile ? `Selected: ${topicBucketFile.name}` : "No file selected"}
+              {topicBucketFile
+                ? `Selected: ${topicBucketFile.name}`
+                : "No file selected"}
            </p>
          </div>
        </div>
@@ -143,7 +169,8 @@ const UploadPage = () => {
            ...(hasError ? styles.alertCardError : styles.alertCardInfo),
          }}
        >
-          {returnMessage || "After upload, your dataset is queued for processing and you'll land on stats."}
+          {returnMessage ||
+            "After upload, your dataset is queued for processing and you'll land on stats."}
        </div>
      </div>
    </div>
--- a/frontend/src/stats/ActivityHeatmap.tsx
+++ b/frontend/src/stats/ActivityHeatmap.tsx
@@ -1,4 +1,5 @@
 import { ResponsiveHeatMap } from "@nivo/heatmap";
 import { memo, useMemo } from "react";
 type ApiRow = Record<number, number>;
 type ActivityHeatmapProps = {
@@ -25,8 +26,7 @@ const DAYS = [
  "Sunday",
 ];
-const hourLabel = (h: number) =>
-  `${h.toString().padStart(2, "0")}:00`;
+const hourLabel = (h: number) => `${h.toString().padStart(2, "0")}:00`;
 const convertWeeklyData = (dataset: ApiRow[]): ChartSeries[] => {
  return dataset.map((dayData, index) => ({
@@ -40,32 +40,37 @@ const convertWeeklyData = (dataset: ApiRow[]): ChartSeries[] => {
  }));
 };
 const ActivityHeatmap = ({ data }: ActivityHeatmapProps) => {
-    const convertedData = convertWeeklyData(data);
-    const maxValue = Math.max(
-    ...convertedData.flatMap(day =>
-      day.data.map(point => point.y)
-    )
-    return (
-            <ResponsiveHeatMap
-                data={convertedData}
-                valueFormat=">-.2s"
-                axisTop={{ tickRotation: -90 }}
-                axisRight={{ legend: 'Weekday', legendOffset: 70 }}
-                axisLeft={{ legend: 'Weekday', legendOffset: -72 }}
-                colors={{
-                    type: 'diverging',
-                    scheme: 'red_yellow_blue',
-                    divergeAt: 0.3,
-                    minValue: 0,
-                    maxValue: maxValue
-                }}
-        />
-    )
-}
-export default ActivityHeatmap;
+  const convertedData = useMemo(() => convertWeeklyData(data), [data]);
+  const maxValue = useMemo(() => {
+    let max = 0;
+    for (const day of convertedData) {
+      for (const point of day.data) {
+        if (point.y > max) {
+          max = point.y;
+        }
+      }
+    }
+    return max;
+  }, [convertedData]);
+  return (
+    <ResponsiveHeatMap
+      data={convertedData}
+      valueFormat=">-.2s"
+      axisTop={{ tickRotation: -90 }}
+      axisRight={{ legend: "Weekday", legendOffset: 70 }}
+      axisLeft={{ legend: "Weekday", legendOffset: -72 }}
+      colors={{
+        type: "diverging",
+        scheme: "red_yellow_blue",
+        divergeAt: 0.3,
+        minValue: 0,
+        maxValue: maxValue,
+      }}
+    />
+  );
+};
+export default memo(ActivityHeatmap);
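Review note: the ActivityHeatmap change swaps `Math.max(...)` over a flattened array for an explicit loop. The sketch below isolates that loop in a standalone helper to show one behavioural difference on empty input. The `ChartPoint` and `ChartSeries` shapes and the `maxCellValue` name are assumptions for illustration; the component's real `ChartSeries` definition is outside the shown hunks.

```typescript
// Hypothetical shapes; the real ChartSeries type is defined outside the shown hunks.
type ChartPoint = { x: string; y: number };
type ChartSeries = { id: string; data: ChartPoint[] };

// Loop-based maximum, mirroring the memoised computation in the diff.
// Starting from 0 means an empty dataset yields 0 rather than -Infinity.
const maxCellValue = (series: ChartSeries[]): number => {
  let max = 0;
  for (const day of series) {
    for (const point of day.data) {
      if (point.y > max) {
        max = point.y;
      }
    }
  }
  return max;
};

// Made-up sample data for illustration only.
const sample: ChartSeries[] = [
  { id: "Monday", data: [{ x: "00:00", y: 3 }, { x: "01:00", y: 7 }] },
  { id: "Tuesday", data: [{ x: "00:00", y: 5 }] },
];

const sampleMax = maxCellValue(sample); // 7
const emptyMax = maxCellValue([]); // 0
const spreadEmptyMax = Math.max(...([] as number[])); // -Infinity

console.log(sampleMax, emptyMax, spreadEmptyMax);
```

With no data the loop returns 0, so the colour scale receives a finite `maxValue`. The loop also avoids passing every cell value as a separate argument to `Math.max`.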
--- a/frontend/src/types/ApiTypes.ts
+++ b/frontend/src/types/ApiTypes.ts
@@ -1,14 +1,28 @@
-// User Responses
-type TopUser = {
-    author: string;
-    source: string;
-    count: number
+// Shared types
+type FrequencyWord = {
+  word: string;
+  count: number;
 };
-type FrequencyWord = {
-    word: string;
-    count: number;
-}
+type NGram = {
+  count: number;
+  ngram: string;
+};
+type Emotion = {
+  emotion_anger: number;
+  emotion_disgust: number;
+  emotion_fear: number;
+  emotion_joy: number;
+  emotion_sadness: number;
+};
+// User
+type TopUser = {
+  author: string;
+  source: string;
+  count: number;
+};
 type Vocab = {
  author: string;
@@ -20,66 +34,160 @@ type Vocab = {
  top_words: FrequencyWord[];
 };
 type DominantTopic = {
  topic: string;
  count: number;
 };
 type User = {
  author: string;
  post: number;
  comment: number;
  comment_post_ratio: number;
  comment_share: number;
  avg_emotions?: Record<string, number>;
  dominant_topic?: DominantTopic | null;
  vocab?: Vocab | null;
 };
 type InteractionGraph = Record<string, Record<string, number>>;
 type UserEndpointResponse = {
  top_users: TopUser[];
  users: User[];
 };
 type UserAnalysisResponse = {
  top_users: TopUser[];
  users: User[];
  interaction_graph: InteractionGraph;
 };
-// Time Analysis
+// Time
 type EventsPerDay = {
-    date: Date;
-    count: number;
-}
-type HeatmapCell = {
-    date: Date;
-    hour: number;
-    count: number;
-}
-type TimeAnalysisResponse = {
-    events_per_day: EventsPerDay[];
-    weekday_hour_heatmap: HeatmapCell[];
-}
-// Content Analysis
-type Emotion = {
-  emotion_anger: number;
-  emotion_disgust: number;
-  emotion_fear: number;
-  emotion_joy: number;
-  emotion_sadness: number;
+  date: Date;
+  count: number;
 };
-type NGram = {
-    count: number;
-    ngram: string;
-}
+type HeatmapCell = {
+  date: Date;
+  hour: number;
+  count: number;
+};
+type TimeAnalysisResponse = {
+  events_per_day: EventsPerDay[];
+  weekday_hour_heatmap: HeatmapCell[];
+};
+// Content (combines emotional and linguistic)
 type AverageEmotionByTopic = Emotion & {
  n: number;
  topic: string;
  [key: string]: string | number;
 };
 type OverallEmotionAverage = {
  emotion: string;
  score: number;
 };
 type DominantEmotionDistribution = {
  emotion: string;
  count: number;
  ratio: number;
 };
 type EmotionBySource = {
  source: string;
  dominant_emotion: string;
  dominant_score: number;
  event_count: number;
 };
 type ContentAnalysisResponse = {
-    word_frequencies: FrequencyWord[];
-    average_emotion_by_topic: AverageEmotionByTopic[];
-    common_three_phrases: NGram[];
-    common_two_phrases: NGram[];
-}
+  word_frequencies: FrequencyWord[];
+  average_emotion_by_topic: AverageEmotionByTopic[];
+  common_three_phrases: NGram[];
+  common_two_phrases: NGram[];
+  overall_emotion_average?: OverallEmotionAverage[];
+  dominant_emotion_distribution?: DominantEmotionDistribution[];
+  emotion_by_source?: EmotionBySource[];
+};
 // Linguistic
 type LinguisticAnalysisResponse = {
  word_frequencies: FrequencyWord[];
  common_two_phrases: NGram[];
  common_three_phrases: NGram[];
  lexical_diversity?: Record<string, number>;
 };
 // Emotional
 type EmotionalAnalysisResponse = {
  average_emotion_by_topic: AverageEmotionByTopic[];
  overall_emotion_average?: OverallEmotionAverage[];
  dominant_emotion_distribution?: DominantEmotionDistribution[];
  emotion_by_source?: EmotionBySource[];
 };
 // Interactional
 type ConversationConcentration = {
  total_commenting_authors: number;
  top_10pct_author_count: number;
  top_10pct_comment_share: number;
  single_comment_authors: number;
  single_comment_author_ratio: number;
 };
 type InteractionAnalysisResponse = {
  top_interaction_pairs?: [[string, string], number][];
  conversation_concentration?: ConversationConcentration;
  interaction_graph: InteractionGraph;
 };
 // Cultural
 type IdentityMarkers = {
  in_group_usage: number;
  out_group_usage: number;
  in_group_ratio: number;
  out_group_ratio: number;
  in_group_posts: number;
  out_group_posts: number;
  tie_posts: number;
  in_group_emotion_avg?: Record<string, number>;
  out_group_emotion_avg?: Record<string, number>;
 };
 type StanceMarkers = {
  hedge_total: number;
  certainty_total: number;
  deontic_total: number;
  permission_total: number;
  hedge_per_1k_tokens: number;
  certainty_per_1k_tokens: number;
  deontic_per_1k_tokens: number;
  permission_per_1k_tokens: number;
  hedge_emotion_avg?: Record<string, number>;
  certainty_emotion_avg?: Record<string, number>;
  deontic_emotion_avg?: Record<string, number>;
  permission_emotion_avg?: Record<string, number>;
 };
 type EntityEmotionAggregate = {
  post_count: number;
  emotion_avg: Record<string, number>;
 };
 type AverageEmotionPerEntity = {
  entity_emotion_avg: Record<string, EntityEmotionAggregate>;
 };
 type CulturalAnalysisResponse = {
  identity_markers?: IdentityMarkers;
  stance_markers?: StanceMarkers;
  avg_emotion_per_entity?: AverageEmotionPerEntity;
 };
 // Summary
 type SummaryResponse = {
@@ -96,22 +204,36 @@ type SummaryResponse = {
  sources: string[];
 };
-// Filtering Response
+// Filter
 type FilterResponse = {
-    rows: number
-    data: any;
-}
+  rows: number;
+  data: any;
+};
 export type {
-    TopUser,
-    Vocab,
-    User,
-    InteractionGraph,
-    UserAnalysisResponse,
-    FrequencyWord,
-    AverageEmotionByTopic,
-    SummaryResponse,
-    TimeAnalysisResponse,
-    ContentAnalysisResponse,
-    FilterResponse
-}
+  TopUser,
+  DominantTopic,
+  Vocab,
+  User,
+  InteractionGraph,
+  ConversationConcentration,
+  UserAnalysisResponse,
+  UserEndpointResponse,
+  FrequencyWord,
+  AverageEmotionByTopic,
+  OverallEmotionAverage,
+  DominantEmotionDistribution,
+  EmotionBySource,
+  SummaryResponse,
+  TimeAnalysisResponse,
+  ContentAnalysisResponse,
+  LinguisticAnalysisResponse,
+  EmotionalAnalysisResponse,
+  InteractionAnalysisResponse,
+  IdentityMarkers,
+  StanceMarkers,
+  EntityEmotionAggregate,
+  AverageEmotionPerEntity,
+  CulturalAnalysisResponse,
+  FilterResponse,
+};
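Review note: `top_interaction_pairs` in `InteractionAnalysisResponse` is typed as an optional list of `[[string, string], number]` tuples. A minimal sketch of reading that shape follows. `InteractionPair` and `describePairs` are hypothetical names, the sample authors and numbers are made up, and reading the number as an interaction count is an assumption.

```typescript
// Same tuple shape as the `top_interaction_pairs` field in the diff.
type InteractionPair = [[string, string], number];

// Render each pair as "from -> to: weight"; a missing field becomes an empty list.
const describePairs = (pairs: InteractionPair[] = []): string[] =>
  pairs.map(([[from, to], weight]) => `${from} -> ${to}: ${weight}`);

// Made-up sample data for illustration only.
const samplePairs: InteractionPair[] = [
  [["alice", "bob"], 3],
  [["bob", "carol"], 1],
];

const described = describePairs(samplePairs);
const describedMissing = describePairs(undefined);

console.log(described, describedMissing);
```

Defaulting the parameter to an empty array keeps callers from having to guard the optional field themselves.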
--- a/frontend/src/utils/corpusExplorer.ts
+++ b/frontend/src/utils/corpusExplorer.ts
@@ -0,0 +1,371 @@
 type EntityRecord = {
  text?: string;
  [key: string]: unknown;
 };
 type DatasetRecord = {
  id?: string | number;
  post_id?: string | number | null;
  parent_id?: string | number | null;
  author?: string | null;
  title?: string | null;
  content?: string | null;
  timestamp?: string | number | null;
  date?: string | null;
  dt?: string | null;
  hour?: number | null;
  weekday?: string | null;
  reply_to?: string | number | null;
  source?: string | null;
  topic?: string | null;
  topic_confidence?: number | null;
  type?: string | null;
  ner_entities?: EntityRecord[] | null;
  emotion_anger?: number | null;
  emotion_disgust?: number | null;
  emotion_fear?: number | null;
  emotion_joy?: number | null;
  emotion_sadness?: number | null;
  [key: string]: unknown;
 };
 type CorpusExplorerContext = {
  authorByPostId: Map<string, string>;
  authorEventCounts: Map<string, number>;
  authorCommentCounts: Map<string, number>;
 };
 type CorpusExplorerSpec = {
  title: string;
  description: string;
  emptyMessage?: string;
  matcher: (record: DatasetRecord, context: CorpusExplorerContext) => boolean;
 };
 const IN_GROUP_PATTERN = /\b(we|us|our|ourselves)\b/gi;
 const OUT_GROUP_PATTERN = /\b(they|them|their|themselves)\b/gi;
 const HEDGE_PATTERN = /\b(maybe|perhaps|possibly|probably|likely|seems|seem|i think|i feel|i guess|kind of|sort of|somewhat)\b/i;
 const CERTAINTY_PATTERN = /\b(definitely|certainly|clearly|obviously|undeniably|always|never)\b/i;
 const DEONTIC_PATTERN = /\b(must|should|need|needs|have to|has to|ought|required|require)\b/i;
 const PERMISSION_PATTERN = /\b(can|allowed|okay|ok|permitted)\b/i;
 const EMOTION_KEYS = [
  "emotion_anger",
  "emotion_disgust",
  "emotion_fear",
  "emotion_joy",
  "emotion_sadness",
 ] as const;
 const toText = (value: unknown) => {
  if (typeof value === "string") {
    return value;
  }
  if (typeof value === "number" || typeof value === "boolean") {
    return String(value);
  }
  if (value && typeof value === "object" && "id" in value) {
    const id = (value as { id?: unknown }).id;
    if (typeof id === "string" || typeof id === "number") {
      return String(id);
    }
  }
  return "";
 };
 const normalize = (value: unknown) => toText(value).trim().toLowerCase();
 const getAuthor = (record: DatasetRecord) => toText(record.author).trim();
 const getRecordText = (record: DatasetRecord) =>
  `${record.title ?? ""} ${record.content ?? ""}`.trim();
 const escapeRegExp = (value: string) =>
  value.replace(/[.*+?^${}()|[\]\\]/g, "\\$&");
 const buildPhrasePattern = (phrase: string) => {
  const tokens = phrase
    .toLowerCase()
    .trim()
    .split(/\s+/)
    .filter(Boolean)
    .map(escapeRegExp);
  if (!tokens.length) {
    return null;
  }
  return new RegExp(`\\b${tokens.join("\\s+")}\\b`, "i");
 };
 const countMatches = (pattern: RegExp, text: string) =>
  Array.from(text.matchAll(new RegExp(pattern.source, "gi"))).length;
 const getDateBucket = (record: DatasetRecord) => {
  if (typeof record.date === "string" && record.date) {
    return record.date.slice(0, 10);
  }
  if (typeof record.dt === "string" && record.dt) {
    return record.dt.slice(0, 10);
  }
  if (typeof record.timestamp === "number") {
    return new Date(record.timestamp * 1000).toISOString().slice(0, 10);
  }
  if (typeof record.timestamp === "string" && record.timestamp) {
    const numeric = Number(record.timestamp);
    if (Number.isFinite(numeric)) {
      return new Date(numeric * 1000).toISOString().slice(0, 10);
    }
  }
  return "";
 };
 const getDominantEmotion = (record: DatasetRecord) => {
  let bestKey = "";
  let bestValue = Number.NEGATIVE_INFINITY;
  for (const key of EMOTION_KEYS) {
    const value = Number(record[key] ?? Number.NEGATIVE_INFINITY);
    if (value > bestValue) {
      bestValue = value;
      bestKey = key;
    }
  }
  return bestKey.replace("emotion_", "");
 };
 const matchesPhrase = (record: DatasetRecord, phrase: string) => {
  const pattern = buildPhrasePattern(phrase);
  if (!pattern) {
    return false;
  }
  return pattern.test(getRecordText(record));
 };
 const recordIdentityBucket = (record: DatasetRecord) => {
  const text = getRecordText(record);
  const inHits = countMatches(IN_GROUP_PATTERN, text);
  const outHits = countMatches(OUT_GROUP_PATTERN, text);
  if (inHits > outHits) {
    return "in";
  }
  if (outHits > inHits) {
    return "out";
  }
  return "tie";
 };
 const buildExplorerContext = (records: DatasetRecord[]): CorpusExplorerContext => {
  const authorByPostId = new Map<string, string>();
  const authorEventCounts = new Map<string, number>();
  const authorCommentCounts = new Map<string, number>();
  for (const record of records) {
    const author = getAuthor(record);
    if (!author) {
      continue;
    }
    authorEventCounts.set(author, (authorEventCounts.get(author) ?? 0) + 1);
    if (record.type === "comment") {
      authorCommentCounts.set(author, (authorCommentCounts.get(author) ?? 0) + 1);
    }
    if (record.post_id !== null && record.post_id !== undefined) {
      authorByPostId.set(String(record.post_id), author);
    }
  }
  return { authorByPostId, authorEventCounts, authorCommentCounts };
 };
 const buildAllRecordsSpec = (): CorpusExplorerSpec => ({
  title: "Corpus Explorer",
  description: "All records in the current filtered dataset.",
  emptyMessage: "No records match the current filters.",
  matcher: () => true,
 });
 const buildUserSpec = (author: string): CorpusExplorerSpec => {
  const target = normalize(author);
  return {
    title: `User: ${author}`,
    description: `All records authored by ${author}.`,
    emptyMessage: `No records found for ${author}.`,
    matcher: (record) => normalize(record.author) === target,
  };
 };
 const buildTopicSpec = (topic: string): CorpusExplorerSpec => {
  const target = normalize(topic);
  return {
    title: `Topic: ${topic}`,
    description: `Records assigned to the ${topic} topic bucket.`,
    emptyMessage: `No records found in the ${topic} topic bucket.`,
    matcher: (record) => normalize(record.topic) === target,
  };
 };
 const buildDateBucketSpec = (date: string): CorpusExplorerSpec => ({
  title: `Date Bucket: ${date}`,
  description: `Records from the ${date} activity bucket.`,
  emptyMessage: `No records found on ${date}.`,
  matcher: (record) => getDateBucket(record) === date,
 });
 const buildWordSpec = (word: string): CorpusExplorerSpec => ({
  title: `Word: ${word}`,
  description: `Records containing the word ${word}.`,
  emptyMessage: `No records mention ${word}.`,
  matcher: (record) => matchesPhrase(record, word),
 });
 const buildNgramSpec = (ngram: string): CorpusExplorerSpec => ({
  title: `N-gram: ${ngram}`,
  description: `Records containing the phrase ${ngram}.`,
  emptyMessage: `No records contain the phrase ${ngram}.`,
  matcher: (record) => matchesPhrase(record, ngram),
 });
 const buildEntitySpec = (entity: string): CorpusExplorerSpec => {
  const target = normalize(entity);
  return {
    title: `Entity: ${entity}`,
    description: `Records mentioning the ${entity} entity.`,
    emptyMessage: `No records found for the ${entity} entity.`,
    matcher: (record) => {
      const entities = Array.isArray(record.ner_entities) ? record.ner_entities : [];
      return entities.some((item) => normalize(item?.text) === target) || matchesPhrase(record, entity);
    },
  };
 };
 const buildSourceSpec = (source: string): CorpusExplorerSpec => {
  const target = normalize(source);
  return {
    title: `Source: ${source}`,
    description: `Records from the ${source} source.`,
    emptyMessage: `No records found for ${source}.`,
    matcher: (record) => normalize(record.source) === target,
  };
 };
 const buildDominantEmotionSpec = (emotion: string): CorpusExplorerSpec => {
  const target = normalize(emotion);
  return {
    title: `Dominant Emotion: ${emotion}`,
    description: `Records where ${emotion} is the strongest emotion score.`,
    emptyMessage: `No records found with dominant emotion ${emotion}.`,
    matcher: (record) => getDominantEmotion(record) === target,
  };
 };
 const buildReplyPairSpec = (source: string, target: string): CorpusExplorerSpec => {
  const sourceName = normalize(source);
  const targetName = normalize(target);
  return {
    title: `Reply Path: ${source} -> ${target}`,
    description: `Reply records authored by ${source} in response to ${target}.`,
    emptyMessage: `No reply records found for ${source} -> ${target}.`,
    matcher: (record, context) => {
      if (normalize(record.author) !== sourceName) {
        return false;
      }
      const replyTo = record.reply_to;
      if (replyTo === null || replyTo === undefined || replyTo === "") {
        return false;
      }
      return normalize(context.authorByPostId.get(String(replyTo))) === targetName;
    },
  };
 };
 const buildOneTimeUsersSpec = (): CorpusExplorerSpec => ({
  title: "One-Time Users",
  description: "Records written by authors who appear exactly once in the filtered corpus.",
  emptyMessage: "No one-time-user records found.",
  matcher: (record, context) => {
    const author = getAuthor(record);
    return !!author && context.authorEventCounts.get(author) === 1;
  },
 });
 const buildIdentityBucketSpec = (bucket: "in" | "out" | "tie"): CorpusExplorerSpec => {
  const labels = {
    in: "In-Group Posts",
    out: "Out-Group Posts",
    tie: "Balanced Posts",
  } as const;
  return {
    title: labels[bucket],
    description: `Records in the ${labels[bucket].toLowerCase()} cultural bucket.`,
    emptyMessage: `No records found for ${labels[bucket].toLowerCase()}.`,
    matcher: (record) => recordIdentityBucket(record) === bucket,
  };
 };
 const buildPatternSpec = (
  title: string,
  description: string,
  pattern: RegExp,
 ): CorpusExplorerSpec => ({
  title,
  description,
  emptyMessage: `No records found for ${title.toLowerCase()}.`,
  matcher: (record) => pattern.test(getRecordText(record)),
 });
 const buildHedgeSpec = () =>
  buildPatternSpec("Hedging Words", "Records containing hedging language.", HEDGE_PATTERN);
 const buildCertaintySpec = () =>
  buildPatternSpec("Certainty Words", "Records containing certainty language.", CERTAINTY_PATTERN);
 const buildDeonticSpec = () =>
  buildPatternSpec("Need/Should Words", "Records containing deontic language.", DEONTIC_PATTERN);
 const buildPermissionSpec = () =>
  buildPatternSpec("Permission Words", "Records containing permission language.", PERMISSION_PATTERN);
 export type { DatasetRecord, CorpusExplorerSpec };
 export {
  buildAllRecordsSpec,
  buildCertaintySpec,
  buildDateBucketSpec,
  buildDeonticSpec,
  buildDominantEmotionSpec,
  buildEntitySpec,
  buildExplorerContext,
  buildHedgeSpec,
  buildIdentityBucketSpec,
  buildNgramSpec,
  buildOneTimeUsersSpec,
  buildPermissionSpec,
  buildReplyPairSpec,
  buildSourceSpec,
  buildTopicSpec,
  buildUserSpec,
  buildWordSpec,
  getDateBucket,
  toText,
 };
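Review note: `buildWordSpec`, `buildNgramSpec` and `buildEntitySpec` all go through `buildPhrasePattern`, so its matching rules decide which records the explorer shows. The sketch below copies `escapeRegExp` and `buildPhrasePattern` from the diff and probes them with a few made-up inputs. The sample strings and the result variable names are assumptions for illustration.

```typescript
// Copied from corpusExplorer.ts in the diff above.
const escapeRegExp = (value: string) =>
  value.replace(/[.*+?^${}()|[\]\\]/g, "\\$&");

const buildPhrasePattern = (phrase: string) => {
  const tokens = phrase
    .toLowerCase()
    .trim()
    .split(/\s+/)
    .filter(Boolean)
    .map(escapeRegExp);
  if (!tokens.length) {
    return null;
  }
  return new RegExp(`\\b${tokens.join("\\s+")}\\b`, "i");
};

// Blank input produces no pattern at all.
const blankPattern = buildPhrasePattern("   ");

// Case and the amount of whitespace between tokens are ignored.
const kindOf = buildPhrasePattern("kind of");
const matchesSpaced = kindOf?.test("It was Kind   of odd") ?? false;

// Word boundaries stop the phrase from matching inside a longer word.
const matchesInsideWord = kindOf?.test("mankind of old") ?? false;

// A phrase ending in a non-word character cannot satisfy the trailing \b
// when whitespace follows it.
const cpp = buildPhrasePattern("c++");
const matchesCpp = cpp?.test("i like c++ a lot") ?? false;

console.log(blankPattern, matchesSpaced, matchesInsideWord, matchesCpp);
```

The last probe suggests that words or entities ending in punctuation (for example `c++`) may return no records from these specs, which may be worth checking against the word and entity lists the backend produces.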
--- a/frontend/src/utils/documentTitle.ts
+++ b/frontend/src/utils/documentTitle.ts
@@ -3,6 +3,7 @@ const DEFAULT_TITLE = "Ethnograph View";
 const STATIC_TITLES: Record<string, string> = {
  "/login": "Sign In",
  "/upload": "Upload Dataset",
  "/auto-fetch": "Auto Fetch Dataset",
  "/datasets": "My Datasets",
 };
@@ -12,7 +13,7 @@ export const getDocumentTitle = (pathname: string) => {
  }
  if (pathname.includes("stats")) {
-    return "Ethnography Analysis"
+    return "Ethnography Analysis";
  }
  return STATIC_TITLES[pathname] ?? DEFAULT_TITLE;
--- a/main.py
+++ b/main.py
@@ -1,4 +0,0 @@
 import server.app
 if __name__ == "__main__":
    server.app.app.run(debug=True)
--- a/report/img/analysis_bar.png
+++ b/report/img/analysis_bar.png
--- a/report/img/architecture.png
+++ b/report/img/architecture.png
--- a/report/img/cork_temporal.png
+++ b/report/img/cork_temporal.png
--- a/report/img/flooding_posts.png
+++ b/report/img/flooding_posts.png
--- a/report/img/frontend.png
+++ b/report/img/frontend.png
--- a/report/img/gantt.png
+++ b/report/img/gantt.png
--- a/report/img/heatmap.png
+++ b/report/img/heatmap.png
--- a/report/img/interaction_graph.png
+++ b/report/img/interaction_graph.png
--- a/report/img/kpi_card.png
+++ b/report/img/kpi_card.png
--- a/report/img/moods.png
+++ b/report/img/moods.png
--- a/report/img/navbar.png
+++ b/report/img/navbar.png
--- a/report/img/ngrams.png
+++ b/report/img/ngrams.png
--- a/report/img/nlp_backoff.png
+++ b/report/img/nlp_backoff.png
--- a/report/img/pipeline.png
+++ b/report/img/pipeline.png
--- a/report/img/reddit_bot.png
+++ b/report/img/reddit_bot.png
--- a/report/img/schema.png
+++ b/report/img/schema.png
--- a/report/img/signature.jpg
+++ b/report/img/signature.jpg
--- a/report/img/stance_markers.png
+++ b/report/img/stance_markers.png
--- a/report/img/topic_emotions.png
+++ b/report/img/topic_emotions.png
--- a/report/img/ucc_crest.png
+++ b/report/img/ucc_crest.png
--- a/report/main.tex
+++ b/report/main.tex
--- a/report/references.bib
+++ b/report/references.bib
@@ -0,0 +1,149 @@
@online{reddit_api,
  author  = {{Reddit Inc.}},
  title   = {Reddit API Documentation},
  year    = {2025},
  url     = {https://www.reddit.com/dev/api/},
  urldate = {2026-04-08}
 }
@misc{hartmann2022emotionenglish,
  author={Hartmann, Jochen},
  title={Emotion English DistilRoBERTa-base},
  year={2022},
  howpublished = {\url{https://huggingface.co/j-hartmann/emotion-english-distilroberta-base/}},
 }
@misc{all_mpnet_base_v2,
  author={Microsoft Research},
  title={All-MPNet-Base-V2},
  year={2021},
  howpublished = {\url{https://huggingface.co/sentence-transformers/all-mpnet-base-v2}},
 }
@misc{minilm_l6_v2,
  author={Microsoft Research},
  title={MiniLM-L6-V2},
  year={2021},
  howpublished = {\url{https://huggingface.co/sentence-transformers/all-MiniLM-L6-v2}},
 }
@misc{dslim_bert_base_ner,
  author={deepset},
  title={dslim/bert-base-NER},
  year={2018},
  howpublished = {\url{https://huggingface.co/dslim/bert-base-NER}},
 }
@inproceedings{demszky2020goemotions,
 author = {Demszky, Dorottya and Movshovitz-Attias, Dana and Ko, Jeongwoo and Cowen, Alan and Nemade, Gaurav and Ravi, Sujith},
 booktitle = {58th Annual Meeting of the Association for Computational Linguistics (ACL)},
 title = {{GoEmotions: A Dataset of Fine-Grained Emotions}},
 year = {2020}
 }
@article{dominguez2007virtual,
  author    = {Domínguez, Daniel and Beaulieu, Anne and Estalella, Adolfo and Gómez, Edgar and Schnettler, Bernt and Read, Rosie},
  title     = {Virtual Ethnography},
  journal   = {Forum Qualitative Sozialforschung / Forum: Qualitative Social Research},
  year      = {2007},
  volume    = {8},
  number    = {3},
  url       = {http://nbn-resolving.de/urn:nbn:de:0114-fqs0703E19}
 }
@article{sun2014lurkers,
  author  = {Sun, Na and Rau, Pei-Luen Patrick and Ma, Liang},
  title   = {Understanding Lurkers in Online Communities: A Literature Review},
  journal = {Computers in Human Behavior},
  year    = {2014},
  volume  = {38},
  pages   = {110--117},
  doi     = {10.1016/j.chb.2014.05.022}
 }
@article{ahmad2024sentiment,
  author  = {Ahmad, Waqar and others},
  title   = {Recent Advancements and Challenges of NLP-based Sentiment Analysis: A State-of-the-art Review},
  journal = {Natural Language Processing Journal},
  year    = {2024},
  doi     = {10.1016/j.nlp.2024.100059}
 }
@article{coleman2010ethnographic,
  ISSN = {00846570},
  URL = {http://www.jstor.org/stable/25735124},
  abstract = {This review surveys and divides the ethnographic corpus on digital media into three broad but overlapping categories: the cultural politics of digital media, the vernacular cultures of digital media, and the prosaics of digital media. Engaging these three categories of scholarship on digital media, I consider how ethnographers are exploring the complex relationships between the local practices and global implications of digital media, their materiality and politics, and their banal, as well as profound, presence in cultural life and modes of communication. I consider the way these media have become central to the articulation of cherished beliefs, ritual practices, and modes of being in the world; the fact that digital media culturally matters is undeniable but showing how, where, and why it matters is necessary to push against peculiarly narrow presumptions about the universality of digital experience.},
  author = {E. Gabriella Coleman},
  journal = {Annual Review of Anthropology},
  pages = {487--505},
  publisher = {Annual Reviews},
  title = {Ethnographic Approaches to Digital Media},
  urldate = {2026-04-15},
  volume = {39},
  year = {2010}
 }
@article{shen2021stance,
  author  = {Shen, Qian and Tao, Yating},
  title   = {Stance Markers in {English} Medical Research Articles and Newspaper Opinion Columns: A Comparative Corpus-Based Study},
  journal = {PLOS ONE},
  volume  = {16},
  number  = {3},
  pages   = {e0247981},
  year    = {2021},
  doi     = {10.1371/journal.pone.0247981}
 }
@incollection{medvedev2019anatomy,
  author    = {Medvedev, Alexey N. and Lambiotte, Renaud and Delvenne, Jean-Charles},
  title     = {The Anatomy of Reddit: An Overview of Academic Research},
  booktitle = {Dynamics On and Of Complex Networks III},
  series    = {Springer Proceedings in Complexity},
  publisher = {Springer},
  year      = {2019},
  pages     = {183--204}
 }
@misc{cook2023ethnography,
  author       = {Cook, Chloe},
  title        = {What is the Difference Between Ethnography and Digital Ethnography?},
  year         = {2023},
  month        = jan,
  day          = {19},
  howpublished = {\url{https://ethosapp.com/blog/what-is-the-difference-between-ethnography-and-digital-ethnography/}},
  note         = {Accessed: 2026-04-16},
  organization = {EthOS}
 }
@misc{giuffre2026sentiment,
  author       = {Giuffre, Steven},
  title        = {What is Sentiment Analysis?},
  year         = {2026},
  month        = mar,
  howpublished = {\url{https://www.vonage.com/resources/articles/sentiment-analysis/}},
  note         = {Accessed: 2026-04-16},
  organization = {Vonage}
 }
@misc{mungalpara2022stemming,
  author       = {Mungalpara, Jaimin},
  title        = {Stemming Lemmatization Stopwords and {N}-Grams in {NLP}},
  year         = {2022},
  month        = jul,
  day          = {26},
  howpublished = {\url{https://jaimin-ml2001.medium.com/stemming-lemmatization-stopwords-and-n-grams-in-nlp-96f8e8b6aa6f}},
  note         = {Accessed: 2026-04-16},
  organization = {Medium}
 }
@misc{chugani2025ethicalscraping,
  author       = {Chugani, Vinod},
  title        = {Ethical Web Scraping: Principles and Practices},
  year         = {2025},
  month        = apr,
  day          = {21},
  howpublished = {\url{https://www.datacamp.com/blog/ethical-web-scraping}},
  note         = {Accessed: 2026-04-16},
  organization = {DataCamp}
 }
--- a/requirements.txt
+++ b/requirements.txt
@@ -16,3 +16,4 @@ Requests==2.32.5
 sentence_transformers==5.2.2
 torch==2.10.0
 transformers==5.1.0
 gunicorn==25.3.0
--- a/server/analysis/cultural.py
+++ b/server/analysis/cultural.py
@@ -15,7 +15,8 @@ class CulturalAnalysis:
        emotion_exclusions = {"emotion_neutral", "emotion_surprise"}
        emotion_cols = [
-            c for c in df.columns
+            c
+            for c in df.columns
            if c.startswith("emotion_") and c not in emotion_exclusions
        ]
@@ -40,7 +41,6 @@ class CulturalAnalysis:
            "out_group_usage": out_count,
            "in_group_ratio": round(in_count / max(total_tokens, 1), 5),
            "out_group_ratio": round(out_count / max(total_tokens, 1), 5),
            "in_group_posts": int(in_mask.sum()),
            "out_group_posts": int(out_mask.sum()),
            "tie_posts": int(tie_mask.sum()),
@@ -49,8 +49,16 @@ class CulturalAnalysis:
        if emotion_cols:
            emo = df[emotion_cols].apply(pd.to_numeric, errors="coerce").fillna(0.0)
-            in_avg = emo.loc[in_mask].mean() if in_mask.any() else pd.Series(0.0, index=emotion_cols)
-            out_avg = emo.loc[out_mask].mean() if out_mask.any() else pd.Series(0.0, index=emotion_cols)
+            in_avg = (
+                emo.loc[in_mask].mean()
+                if in_mask.any()
+                else pd.Series(0.0, index=emotion_cols)
+            )
+            out_avg = (
+                emo.loc[out_mask].mean()
+                if out_mask.any()
+                else pd.Series(0.0, index=emotion_cols)
+            )
            result["in_group_emotion_avg"] = in_avg.to_dict()
            result["out_group_emotion_avg"] = out_avg.to_dict()
@@ -59,10 +67,22 @@ class CulturalAnalysis:
    def get_stance_markers(self, df: pd.DataFrame) -> dict[str, Any]:
        s = df[self.content_col].fillna("").astype(str)
+        emotion_exclusions = {"emotion_neutral", "emotion_surprise"}
+        emotion_cols = [
+            c
+            for c in df.columns
+            if c.startswith("emotion_") and c not in emotion_exclusions
+        ]
-        hedge_pattern = re.compile(r"\b(maybe|perhaps|possibly|probably|likely|seems|seem|i think|i feel|i guess|kind of|sort of|somewhat)\b")
-        certainty_pattern = re.compile(r"\b(definitely|certainly|clearly|obviously|undeniably|always|never)\b")
-        deontic_pattern = re.compile(r"\b(must|should|need|needs|have to|has to|ought|required|require)\b")
+        hedge_pattern = re.compile(
+            r"\b(maybe|perhaps|possibly|probably|likely|seems|seem|i think|i feel|i guess|kind of|sort of|somewhat)\b"
+        )
+        certainty_pattern = re.compile(
+            r"\b(definitely|certainly|clearly|obviously|undeniably|always|never)\b"
+        )
+        deontic_pattern = re.compile(
+            r"\b(must|should|need|needs|have to|has to|ought|required|require)\b"
+        )
        permission_pattern = re.compile(r"\b(can|allowed|okay|ok|permitted)\b")
        hedge_counts = s.str.count(hedge_pattern)
@@ -70,31 +90,73 @@ class CulturalAnalysis:
        deontic_counts = s.str.count(deontic_pattern)
        perm_counts = s.str.count(permission_pattern)
-        token_counts = s.apply(lambda t: len(re.findall(r"\b[a-z]{2,}\b", t))).replace(0, 1)
+        token_counts = s.apply(lambda t: len(re.findall(r"\b[a-z]{2,}\b", t))).replace(
+            0, 1
+        )
-        return {
+        result = {
            "hedge_total": int(hedge_counts.sum()),
            "certainty_total": int(certainty_counts.sum()),
            "deontic_total": int(deontic_counts.sum()),
            "permission_total": int(perm_counts.sum()),
-            "hedge_per_1k_tokens": round(1000 * hedge_counts.sum() / token_counts.sum(), 3),
-            "certainty_per_1k_tokens": round(1000 * certainty_counts.sum() / token_counts.sum(), 3),
-            "deontic_per_1k_tokens": round(1000 * deontic_counts.sum() / token_counts.sum(), 3),
-            "permission_per_1k_tokens": round(1000 * perm_counts.sum() / token_counts.sum(), 3),
+            "hedge_per_1k_tokens": round(
+                1000 * hedge_counts.sum() / token_counts.sum(), 3
+            ),
+            "certainty_per_1k_tokens": round(
                1000 * certainty_counts.sum() / token_counts.sum(), 3
            ),
            "deontic_per_1k_tokens": round(
                1000 * deontic_counts.sum() / token_counts.sum(), 3
            ),
            "permission_per_1k_tokens": round(
                1000 * perm_counts.sum() / token_counts.sum(), 3
            ),
        }
-    def get_avg_emotions_per_entity(self, df: pd.DataFrame, top_n: int = 25, min_posts: int = 10) -> dict[str, Any]:
-        if "entities" not in df.columns:
+        if emotion_cols:
+            emo = df[emotion_cols].apply(pd.to_numeric, errors="coerce").fillna(0.0)
            result["hedge_emotion_avg"] = (
                emo.loc[hedge_counts > 0].mean()
                if (hedge_counts > 0).any()
                else pd.Series(0.0, index=emotion_cols)
            ).to_dict()
            result["certainty_emotion_avg"] = (
                emo.loc[certainty_counts > 0].mean()
                if (certainty_counts > 0).any()
                else pd.Series(0.0, index=emotion_cols)
            ).to_dict()
            result["deontic_emotion_avg"] = (
                emo.loc[deontic_counts > 0].mean()
                if (deontic_counts > 0).any()
                else pd.Series(0.0, index=emotion_cols)
            ).to_dict()
            result["permission_emotion_avg"] = (
                emo.loc[perm_counts > 0].mean()
                if (perm_counts > 0).any()
                else pd.Series(0.0, index=emotion_cols)
            ).to_dict()
        return result
    def get_avg_emotions_per_entity(
        self, df: pd.DataFrame, top_n: int = 25, min_posts: int = 10
    ) -> dict[str, Any]:
        if "ner_entities" not in df.columns:
            return {"entity_emotion_avg": {}}
        emotion_cols = [c for c in df.columns if c.startswith("emotion_")]
-        entity_df = df[["entities"] + emotion_cols].explode("entities")
-        entity_df["entity_text"] = entity_df["entities"].apply(
-            lambda e: e.get("text").strip()
-            if isinstance(e, dict) and isinstance(e.get("text"), str) and len(e.get("text")) >= 3
-            else None
+        entity_df = df[["ner_entities"] + emotion_cols].explode("ner_entities")
+        entity_df["entity_text"] = entity_df["ner_entities"].apply(
+            lambda e: (
+                e.get("text").strip()
+                if isinstance(e, dict)
                and isinstance(e.get("text"), str)
                and len(e.get("text")) >= 3
                else None
            )
        )
        entity_df = entity_df.dropna(subset=["entity_text"])
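Review note: the per-1k-token rates in `get_stance_markers` can be checked by hand with a small stdlib-only sketch. This is not the project's code. The shortened word list, the `rate_per_1k_tokens` name, and the explicit lowercasing are assumptions made for illustration (the diff does not show whether the content column is lowercased before matching).

```python
import re

# Hypothetical, shortened word list; the full patterns live in get_stance_markers.
HEDGE_PATTERN = re.compile(r"\b(maybe|perhaps|probably|i think|kind of)\b")
TOKEN_PATTERN = re.compile(r"\b[a-z]{2,}\b")


def rate_per_1k_tokens(texts: list[str], marker: re.Pattern) -> float:
    """Return marker matches per 1,000 tokens across all texts."""
    # Assumption: lowercase first so the lowercase-only patterns can match.
    lowered = [text.lower() for text in texts]
    matches = sum(len(marker.findall(text)) for text in lowered)
    # Mirror the diff's .replace(0, 1): a text with no tokens counts as one token.
    tokens = sum(max(len(TOKEN_PATTERN.findall(text)), 1) for text in lowered)
    return round(1000 * matches / tokens, 3)


sample = ["Maybe it will rain", "I think so", ""]
print(rate_per_1k_tokens(sample, HEDGE_PATTERN))
```

In the sample, two hedges are found across seven counted tokens, so the rate is 2000 / 7 rounded to three places.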
--- a/server/analysis/emotional.py
+++ b/server/analysis/emotional.py
@@ -1,33 +1,86 @@
 import pandas as pd
 class EmotionalAnalysis:
-    def avg_emotion_by_topic(self, df: pd.DataFrame) -> dict:
-        emotion_cols = [
-            col for col in df.columns
-            if col.startswith("emotion_")
-        ]
+    def _emotion_cols(self, df: pd.DataFrame) -> list[str]:
+        return [col for col in df.columns if col.startswith("emotion_")]
+
+    def avg_emotion_by_topic(self, df: pd.DataFrame) -> list[dict]:
+        emotion_cols = self._emotion_cols(df)
        if not emotion_cols:
            return []
        counts = (
-            df[
+            df[(df["topic"] != "Misc")].groupby("topic").size().reset_index(name="n")
                (df["topic"] != "Misc")
            ]
            .groupby("topic")
            .size()
            .rename("n")
        )
        avg_emotion_by_topic = (
-            df[
+            df[(df["topic"] != "Misc")]
                (df["topic"] != "Misc")
            ]
            .groupby("topic")[emotion_cols]
            .mean()
            .reset_index()
        )
-        avg_emotion_by_topic = avg_emotion_by_topic.merge(
+        avg_emotion_by_topic = avg_emotion_by_topic.merge(counts, on="topic")
            counts,
            on="topic"
        )
-        return avg_emotion_by_topic.to_dict(orient='records')
+        return avg_emotion_by_topic.to_dict(orient="records")
    def overall_emotion_average(self, df: pd.DataFrame) -> list[dict]:
        emotion_cols = self._emotion_cols(df)
        if not emotion_cols:
            return []
        means = df[emotion_cols].mean()
        return [
            {
                "emotion": col.replace("emotion_", ""),
                "score": float(means[col]),
            }
            for col in emotion_cols
        ]
    def dominant_emotion_distribution(self, df: pd.DataFrame) -> list[dict]:
        emotion_cols = self._emotion_cols(df)
        if not emotion_cols or df.empty:
            return []
        dominant_per_row = df[emotion_cols].idxmax(axis=1)
        counts = dominant_per_row.value_counts()
        total = max(len(dominant_per_row), 1)
        return [
            {
                "emotion": col.replace("emotion_", ""),
                "count": int(count),
                "ratio": round(float(count / total), 4),
            }
            for col, count in counts.items()
        ]
    def emotion_by_source(self, df: pd.DataFrame) -> list[dict]:
        emotion_cols = self._emotion_cols(df)
        if not emotion_cols or "source" not in df.columns or df.empty:
            return []
        source_counts = df.groupby("source").size()
        source_means = df.groupby("source")[emotion_cols].mean().reset_index()
        rows = source_means.to_dict(orient="records")
        output = []
        for row in rows:
            source = row["source"]
            dominant_col = max(emotion_cols, key=lambda col: float(row.get(col, 0)))
            output.append(
                {
                    "source": str(source),
                    "dominant_emotion": dominant_col.replace("emotion_", ""),
                    "dominant_score": round(float(row.get(dominant_col, 0)), 4),
                    "event_count": int(source_counts.get(source, 0)),
                }
            )
        return output
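Review note: `dominant_emotion_distribution` picks the highest-scoring emotion per row and reports each emotion's share. The same idea can be sketched without pandas, assuming each row is a plain dict whose keys are all `emotion_*` scores. The input shape and the standalone function are illustrative assumptions, not the project's API.

```python
from collections import Counter


def dominant_emotion_distribution(rows: list[dict[str, float]]) -> list[dict]:
    """Count how often each emotion is the top score in a row."""
    if not rows:
        return []
    # The dominant emotion of a row is the key with the largest score.
    dominant = [max(row, key=row.get) for row in rows]
    total = len(dominant)
    return [
        {
            "emotion": col.removeprefix("emotion_"),
            "count": count,
            "ratio": round(count / total, 4),
        }
        # most_common() yields the most frequent emotion first.
        for col, count in Counter(dominant).most_common()
    ]


rows = [
    {"emotion_joy": 0.7, "emotion_anger": 0.3},
    {"emotion_joy": 0.6, "emotion_anger": 0.4},
    {"emotion_joy": 0.1, "emotion_anger": 0.9},
]
print(dominant_emotion_distribution(rows))
```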
--- a/server/analysis/enrichment.py
+++ b/server/analysis/enrichment.py
@@ -2,6 +2,7 @@ import pandas as pd
 from server.analysis.nlp import NLP
 class DatasetEnrichment:
    def __init__(self, df: pd.DataFrame, topics: dict):
        self.df = self._explode_comments(df)
@@ -10,7 +11,9 @@ class DatasetEnrichment:
    def _explode_comments(self, df) -> pd.DataFrame:
        comments_df = df[["id", "comments"]].explode("comments")
-        comments_df = comments_df[comments_df["comments"].apply(lambda x: isinstance(x, dict))]
+        comments_df = comments_df[
            comments_df["comments"].apply(lambda x: isinstance(x, dict))
        ]
        comments_df = pd.json_normalize(comments_df["comments"])
        posts_df = df.drop(columns=["comments"])
@@ -26,8 +29,8 @@ class DatasetEnrichment:
        return df
    def enrich(self) -> pd.DataFrame:
-        self.df['timestamp'] = pd.to_numeric(self.df['timestamp'], errors='raise')
+        self.df["timestamp"] = pd.to_numeric(self.df["timestamp"], errors="raise")
-        self.df['date'] = pd.to_datetime(self.df['timestamp'], unit='s').dt.date
+        self.df["date"] = pd.to_datetime(self.df["timestamp"], unit="s").dt.date
        self.df["dt"] = pd.to_datetime(self.df["timestamp"], unit="s", utc=True)
        self.df["hour"] = self.df["dt"].dt.hour
        self.df["weekday"] = self.df["dt"].dt.day_name()
--- a/server/analysis/interactional.py
+++ b/server/analysis/interactional.py
@@ -1,8 +1,6 @@
 import pandas as pd
 import re
 from collections import Counter
 class InteractionAnalysis:
    def __init__(self, word_exclusions: set[str]):
@@ -12,118 +10,6 @@ class InteractionAnalysis:
        tokens = re.findall(r"\b[a-z]{3,}\b", text)
        return [t for t in tokens if t not in self.word_exclusions]
    def _vocab_richness_per_user(
        self, df: pd.DataFrame, min_words: int = 20, top_most_used_words: int = 100
    ) -> list:
        df = df.copy()
        df["content"] = df["content"].fillna("").astype(str).str.lower()
        df["tokens"] = df["content"].apply(self._tokenize)
        rows = []
        for author, group in df.groupby("author"):
            all_tokens = [t for tokens in group["tokens"] for t in tokens]
            total_words = len(all_tokens)
            unique_words = len(set(all_tokens))
            events = len(group)
            # Min amount of words for a user, any less than this might give weird results
            if total_words < min_words:
                continue
            # 100% = they never reused a word (excluding stop words)
            vocab_richness = unique_words / total_words
            avg_words = total_words / max(events, 1)
            counts = Counter(all_tokens)
            top_words = [
                {"word": w, "count": int(c)}
                for w, c in counts.most_common(top_most_used_words)
            ]
            rows.append(
                {
                    "author": author,
                    "events": int(events),
                    "total_words": int(total_words),
                    "unique_words": int(unique_words),
                    "vocab_richness": round(vocab_richness, 3),
                    "avg_words_per_event": round(avg_words, 2),
                    "top_words": top_words,
                }
            )
        rows = sorted(rows, key=lambda x: x["vocab_richness"], reverse=True)
        return rows
    def top_users(self, df: pd.DataFrame) -> list:
        counts = df.groupby(["author", "source"]).size().sort_values(ascending=False)
        top_users = [
            {"author": author, "source": source, "count": int(count)}
            for (author, source), count in counts.items()
        ]
        return top_users
    def per_user_analysis(self, df: pd.DataFrame) -> dict:
        per_user = df.groupby(["author", "type"]).size().unstack(fill_value=0)
        emotion_cols = [col for col in df.columns if col.startswith("emotion_")]
        avg_emotions_by_author = {}
        if emotion_cols:
            avg_emotions = df.groupby("author")[emotion_cols].mean().fillna(0.0)
            avg_emotions_by_author = {
                author: {emotion: float(score) for emotion, score in row.items()}
                for author, row in avg_emotions.iterrows()
            }
        # ensure columns always exist
        for col in ("post", "comment"):
            if col not in per_user.columns:
                per_user[col] = 0
        per_user["comment_post_ratio"] = per_user["comment"] / per_user["post"].replace(
            0, 1
        )
        per_user["comment_share"] = per_user["comment"] / (
            per_user["post"] + per_user["comment"]
        ).replace(0, 1)
        per_user = per_user.sort_values("comment_post_ratio", ascending=True)
        per_user_records = per_user.reset_index().to_dict(orient="records")
        vocab_rows = self._vocab_richness_per_user(df)
        vocab_by_author = {row["author"]: row for row in vocab_rows}
        # merge vocab richness + per_user information
        merged_users = []
        for row in per_user_records:
            author = row["author"]
            merged_users.append(
                {
                    "author": author,
                    "post": int(row.get("post", 0)),
                    "comment": int(row.get("comment", 0)),
                    "comment_post_ratio": float(row.get("comment_post_ratio", 0)),
                    "comment_share": float(row.get("comment_share", 0)),
                    "avg_emotions": avg_emotions_by_author.get(author, {}),
                    "vocab": vocab_by_author.get(
                        author,
                        {
                            "vocab_richness": 0,
                            "avg_words_per_event": 0,
                            "top_words": [],
                        },
                    ),
                }
            )
        merged_users.sort(key=lambda u: u["comment_post_ratio"])
        return merged_users
    def interaction_graph(self, df: pd.DataFrame):
        interactions = {a: {} for a in df["author"].dropna().unique()}
@@ -145,89 +31,40 @@ class InteractionAnalysis:
        return interactions
-    def average_thread_depth(self, df: pd.DataFrame):
+    def top_interaction_pairs(self, df: pd.DataFrame, top_n=10):
-        depths = []
+        graph = self.interaction_graph(df)
-        id_to_reply = df.set_index("id")["reply_to"].to_dict()
+        pairs = []
        for _, row in df.iterrows():
            depth = 0
            current_id = row["id"]
-            while True:
+        for a, targets in graph.items():
-                reply_to = id_to_reply.get(current_id)
+            for b, count in targets.items():
-                if pd.isna(reply_to) or reply_to == "":
+                pairs.append(((a, b), count))
                    break
-                depth += 1
+        pairs.sort(key=lambda x: x[1], reverse=True)
-                current_id = reply_to
+        return pairs[:top_n]
-            depths.append(depth)
+    def conversation_concentration(self, df: pd.DataFrame) -> dict:
        if "type" not in df.columns:
            return {}
-        if not depths:
+        comments = df[df["type"] == "comment"]
-            return 0
+        if comments.empty:
            return {}
-        return round(sum(depths) / len(depths), 2)
+        author_counts = comments["author"].value_counts()
        total_comments = len(comments)
        total_authors = len(author_counts)
-    def average_thread_length_by_emotion(self, df: pd.DataFrame):
+        top_10_pct_n = max(1, int(total_authors * 0.1))
-        emotion_exclusions = {"emotion_neutral", "emotion_surprise"}
+        top_10_pct_share = round(
-
+            author_counts.head(top_10_pct_n).sum() / total_comments, 4
-        emotion_cols = [
+        )
            c
            for c in df.columns
            if c.startswith("emotion_") and c not in emotion_exclusions
        ]
        id_to_reply = df.set_index("id")["reply_to"].to_dict()
        length_cache = {}
        def thread_length_from(start_id):
            if start_id in length_cache:
                return length_cache[start_id]
            seen = set()
            length = 1
            current = start_id
            while True:
                if current in seen:
                    # infinite loop shouldn't happen, but just in case
                    break
                seen.add(current)
                reply_to = id_to_reply.get(current)
                if (
                    reply_to is None
                    or (isinstance(reply_to, float) and pd.isna(reply_to))
                    or reply_to == ""
                ):
                    break
                length += 1
                current = reply_to
                if current in length_cache:
                    length += length_cache[current] - 1
                    break
            length_cache[start_id] = length
            return length
        emotion_to_lengths = {}
        # Fill NaNs in emotion cols to avoid max() issues
        emo_df = df[["id"] + emotion_cols].copy()
        emo_df[emotion_cols] = emo_df[emotion_cols].fillna(0)
        for _, row in emo_df.iterrows():
            msg_id = row["id"]
            length = thread_length_from(msg_id)
            emotions = {c: row[c] for c in emotion_cols}
            dominant = max(emotions, key=emotions.get)
            emotion_to_lengths.setdefault(dominant, []).append(length)
        return {
-            emotion: round(sum(lengths) / len(lengths), 2)
+            "total_commenting_authors": total_authors,
-            for emotion, lengths in emotion_to_lengths.items()
+            "top_10pct_author_count": top_10_pct_n,
            "top_10pct_comment_share": float(top_10_pct_share),
            "single_comment_authors": int((author_counts == 1).sum()),
            "single_comment_author_ratio": float(
                round((author_counts == 1).sum() / total_authors, 4)
            ),
        }
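Review note: the arithmetic behind `conversation_concentration` (the share of comments written by the top 10% of commenting authors, plus the single-comment author ratio) can be reproduced with a stdlib sketch. It assumes the input is simply a list of comment authors; the pandas filtering on `type == "comment"` is left out.

```python
from collections import Counter


def conversation_concentration(comment_authors: list[str]) -> dict:
    """Summarise how concentrated commenting is among authors."""
    if not comment_authors:
        return {}
    # Comment counts per author, largest first (like value_counts()).
    ranked = [count for _, count in Counter(comment_authors).most_common()]
    total_comments = len(comment_authors)
    total_authors = len(ranked)
    # At least one author is always treated as the "top 10%".
    top_n = max(1, int(total_authors * 0.1))
    singles = sum(1 for count in ranked if count == 1)
    return {
        "total_commenting_authors": total_authors,
        "top_10pct_author_count": top_n,
        "top_10pct_comment_share": round(sum(ranked[:top_n]) / total_comments, 4),
        "single_comment_authors": singles,
        "single_comment_author_ratio": round(singles / total_authors, 4),
    }


authors = ["ann"] * 6 + ["bob"] * 2 + ["cat", "dan"]
print(conversation_concentration(authors))
```

With four authors, `int(4 * 0.1)` is 0, so the `max(1, ...)` floor makes the single most active author the "top 10%" group. Small datasets will therefore report a coarse share.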
--- a/server/analysis/linguistic.py
+++ b/server/analysis/linguistic.py
@@ -1,17 +1,30 @@
 import pandas as pd
 import re
 from collections import Counter
-from itertools import islice
+from dataclasses import dataclass
 import pandas as pd
@dataclass(frozen=True)
 class NGramConfig:
    min_token_length: int = 3
    min_count: int = 2
    max_results: int = 100
 class LinguisticAnalysis:
    def __init__(self, word_exclusions: set[str]):
        self.word_exclusions = word_exclusions
        self.ngram_config = NGramConfig()
-    def _tokenize(self, text: str):
-        tokens = re.findall(r"\b[a-z]{3,}\b", text)
-        return [t for t in tokens if t not in self.word_exclusions]
+    def _tokenize(self, text: str, *, include_exclusions: bool = False) -> list[str]:
+        pattern = rf"\b[a-z]{{{self.ngram_config.min_token_length},}}\b"
+        tokens = re.findall(pattern, text)
        if include_exclusions:
            return tokens
        return [token for token in tokens if token not in self.word_exclusions]
    def _clean_text(self, text: str) -> str:
        text = re.sub(r"http\S+", "", text)  # remove URLs
@@ -21,13 +34,24 @@ class LinguisticAnalysis:
        text = re.sub(r"\S+\.(jpg|jpeg|png|webp|gif)", "", text)
        return text
    def _content_texts(self, df: pd.DataFrame) -> pd.Series:
        return df["content"].dropna().astype(str).apply(self._clean_text).str.lower()
    def _valid_ngram(self, tokens: tuple[str, ...]) -> bool:
        if any(token in self.word_exclusions for token in tokens):
            return False
        if len(set(tokens)) == 1:
            return False
        return True
    def word_frequencies(self, df: pd.DataFrame, limit: int = 100) -> list[dict]:
-        texts = df["content"].dropna().astype(str).str.lower()
+        texts = self._content_texts(df)
        words = []
        for text in texts:
-            tokens = re.findall(r"\b[a-z]{3,}\b", text)
-            words.extend(w for w in tokens if w not in self.word_exclusions)
+            words.extend(self._tokenize(text))
        counts = Counter(words)
@@ -40,24 +64,57 @@ class LinguisticAnalysis:
        return word_frequencies.to_dict(orient="records")
-    def ngrams(self, df: pd.DataFrame, n=2, limit=100):
-        texts = df["content"].dropna().astype(str).apply(self._clean_text).str.lower()
+    def ngrams(self, df: pd.DataFrame, n: int = 2, limit: int | None = None) -> list[dict]:
+        if n < 2:
            raise ValueError("n must be at least 2")
        texts = self._content_texts(df)
        all_ngrams = []
        result_limit = limit or self.ngram_config.max_results
        for text in texts:
-            tokens = re.findall(r"\b[a-z]{3,}\b", text)
-            # stop word removal causes strange behaviors in ngrams
-            # tokens = [w for w in tokens if w not in self.word_exclusions]
-            ngrams = zip(*(islice(tokens, i, None) for i in range(n)))
-            all_ngrams.extend([" ".join(ng) for ng in ngrams])
+            tokens = self._tokenize(text, include_exclusions=True)
+            if len(tokens) < n:
+                continue
+            for index in range(len(tokens) - n + 1):
+                ngram_tokens = tuple(tokens[index : index + n])
                if self._valid_ngram(ngram_tokens):
                    all_ngrams.append(" ".join(ngram_tokens))
        counts = Counter(all_ngrams)
        filtered_counts = [
            (ngram, count)
            for ngram, count in counts.items()
            if count >= self.ngram_config.min_count
        ]
        if not filtered_counts:
            return []
        return (
-            pd.DataFrame(counts.items(), columns=["ngram", "count"])
-            .sort_values("count", ascending=False)
-            .head(limit)
+            pd.DataFrame(filtered_counts, columns=["ngram", "count"])
+            .sort_values(["count", "ngram"], ascending=[False, True])
+            .head(result_limit)
            .to_dict(orient="records")
        )
    def lexical_diversity(self, df: pd.DataFrame) -> dict:
        tokens = (
            df["content"]
            .fillna("")
            .astype(str)
            .str.lower()
            .str.findall(r"\b[a-z]{2,}\b")
            .explode()
        )
        tokens = tokens[~tokens.isin(self.word_exclusions)]
        total = max(len(tokens), 1)
        unique = int(tokens.nunique())
        return {
            "total_tokens": total,
            "unique_tokens": unique,
            "ttr": round(unique / total, 4),
        }
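Review note: the reworked `ngrams` method slides a window over the unfiltered tokens, rejects any window that contains an excluded word or repeats a single token, drops n-grams below a minimum count, and sorts by count and then alphabetically. A stdlib-only sketch of that flow, assuming plain strings as input and a hypothetical `count_ngrams` helper in place of the class method:

```python
import re
from collections import Counter


def count_ngrams(
    texts: list[str],
    n: int = 2,
    exclusions: frozenset[str] = frozenset(),
    min_count: int = 2,
    limit: int = 100,
) -> list[dict]:
    """Count n-grams, skipping windows with excluded or repeated-only tokens."""
    if n < 2:
        raise ValueError("n must be at least 2")
    counts: Counter[str] = Counter()
    for text in texts:
        # Keep excluded words while tokenizing so windows stay contiguous.
        tokens = re.findall(r"\b[a-z]{3,}\b", text.lower())
        for index in range(len(tokens) - n + 1):
            gram = tuple(tokens[index : index + n])
            if any(token in exclusions for token in gram) or len(set(gram)) == 1:
                continue
            counts[" ".join(gram)] += 1
    kept = [(gram, count) for gram, count in counts.items() if count >= min_count]
    # Highest count first; ties broken alphabetically for stable output.
    kept.sort(key=lambda item: (-item[1], item[0]))
    return [{"ngram": gram, "count": count} for gram, count in kept[:limit]]


texts = ["cork city centre", "cork city council", "the city centre", "very very good"]
print(count_ngrams(texts, exclusions=frozenset({"the"})))
```

Filtering at the window level rather than the token level avoids the problem noted in the removed comment: removing stop words before windowing would join words that were never adjacent in the text.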
--- a/server/analysis/nlp.py
+++ b/server/analysis/nlp.py
@@ -6,6 +6,7 @@ from typing import Any
 from transformers import pipeline
 from sentence_transformers import SentenceTransformer
 class NLP:
    _topic_models: dict[str, SentenceTransformer] = {}
    _emotion_classifiers: dict[str, Any] = {}
@@ -207,8 +208,7 @@ class NLP:
        self.df.drop(columns=existing_drop, inplace=True)
        remaining_emotion_cols = [
-            c for c in self.df.columns
+            c for c in self.df.columns if c.startswith("emotion_")
            if c.startswith("emotion_")
        ]
        if remaining_emotion_cols:
@@ -227,8 +227,6 @@ class NLP:
            self.df[remaining_emotion_cols] = normalized.values
    def add_topic_col(self, confidence_threshold: float = 0.3) -> None:
        titles = self.df[self.title_col].fillna("").astype(str)
        contents = self.df[self.content_col].fillna("").astype(str)
@@ -302,8 +300,4 @@ class NLP:
        for label in all_labels:
            col_name = f"entity_{label}"
-            self.df[col_name] = [
+            self.df[col_name] = [d.get(label, 0) for d in entity_count_dicts]
                d.get(label, 0) for d in entity_count_dicts
            ]
--- a/server/analysis/stat_gen.py
+++ b/server/analysis/stat_gen.py
@@ -1,4 +1,5 @@
 import nltk
 import json
 import pandas as pd
 from nltk.corpus import stopwords
@@ -6,7 +7,9 @@ from server.analysis.cultural import CulturalAnalysis
 from server.analysis.emotional import EmotionalAnalysis
 from server.analysis.interactional import InteractionAnalysis
 from server.analysis.linguistic import LinguisticAnalysis
 from server.analysis.summary import SummaryAnalysis
 from server.analysis.temporal import TemporalAnalysis
 from server.analysis.user import UserAnalysis
 DOMAIN_STOPWORDS = {
    "www",
@@ -25,6 +28,8 @@ DOMAIN_STOPWORDS = {
    "one",
 }
 EXCLUDED_AUTHORS = {"[deleted]", "automoderator"}
 nltk.download("stopwords")
 EXCLUDE_WORDS = set(stopwords.words("english")) | DOMAIN_STOPWORDS
@@ -36,25 +41,29 @@ class StatGen:
        self.interaction_analysis = InteractionAnalysis(EXCLUDE_WORDS)
        self.linguistic_analysis = LinguisticAnalysis(EXCLUDE_WORDS)
        self.cultural_analysis = CulturalAnalysis()
        self.summary_analysis = SummaryAnalysis()
        self.user_analysis = UserAnalysis(EXCLUDE_WORDS)
    ## Private Methods
-    def _prepare_filtered_df(self, 
+    def _prepare_filtered_df(self, df: pd.DataFrame, filters: dict | None = None) -> pd.DataFrame:
                             df: pd.DataFrame, 
                             filters: dict | None = None
                             ) -> pd.DataFrame:
        filters = filters or {}
        filtered_df = df.copy()
        if "author" in filtered_df.columns:
            normalized_authors = (
                filtered_df["author"].fillna("").astype(str).str.strip().str.lower()
            )
            filtered_df = filtered_df[~normalized_authors.isin(EXCLUDED_AUTHORS)]
        search_query = filters.get("search_query", None)
        start_date_filter = filters.get("start_date", None)
        end_date_filter = filters.get("end_date", None)
        data_source_filter = filters.get("data_sources", None)
        if search_query:
-            mask = (
-                filtered_df["content"].str.contains(search_query, case=False, na=False)
-                | filtered_df["author"].str.contains(search_query, case=False, na=False)
+            mask = filtered_df["content"].str.contains(
+                search_query, case=False, na=False
+            ) | filtered_df["author"].str.contains(search_query, case=False, na=False)
            )
            # Only include title if the column exists
            if "title" in filtered_df.columns:
@@ -75,11 +84,22 @@ class StatGen:
        return filtered_df
-    ## Public Methods
-    def filter_dataset(self, df: pd.DataFrame, filters: dict | None = None) -> dict:
-        return self._prepare_filtered_df(df, filters).to_dict(orient="records")
+    def _json_ready_records(self, df: pd.DataFrame) -> list[dict]:
+        return json.loads(
+            df.to_json(orient="records", date_format="iso", date_unit="s")
        )
-    def get_time_analysis(self, df: pd.DataFrame, filters: dict | None = None) -> dict:
+    ## Public Methods
    def filter_dataset(self, df: pd.DataFrame, filters: dict | None = None) -> list[dict]:
        filtered_df = self._prepare_filtered_df(df, filters)
        return self._json_ready_records(filtered_df)
    def temporal(
        self,
        df: pd.DataFrame,
        filters: dict | None = None,
        dataset_id: int | None = None,
    ) -> dict:
        filtered_df = self._prepare_filtered_df(df, filters)
        return {
@@ -87,84 +107,83 @@ class StatGen:
            "weekday_hour_heatmap": self.temporal_analysis.heatmap(filtered_df),
        }
-    def get_content_analysis(self, df: pd.DataFrame, filters: dict | None = None) -> dict:
+    def linguistic(
        self,
        df: pd.DataFrame,
        filters: dict | None = None,
        dataset_id: int | None = None,
    ) -> dict:
        filtered_df = self._prepare_filtered_df(df, filters)
        return {
            "word_frequencies": self.linguistic_analysis.word_frequencies(filtered_df),
            "common_two_phrases": self.linguistic_analysis.ngrams(filtered_df),
            "common_three_phrases": self.linguistic_analysis.ngrams(filtered_df, n=3),
-            "average_emotion_by_topic": self.emotional_analysis.avg_emotion_by_topic(
+            "lexical_diversity": self.linguistic_analysis.lexical_diversity(filtered_df)
                filtered_df
            )
        }
-    def get_user_analysis(self, df: pd.DataFrame, filters: dict | None = None) -> dict:
+    def emotional(
        self,
        df: pd.DataFrame,
        filters: dict | None = None,
        dataset_id: int | None = None,
    ) -> dict:
        filtered_df = self._prepare_filtered_df(df, filters)
        return {
-            "top_users": self.interaction_analysis.top_users(filtered_df),
-            "users": self.interaction_analysis.per_user_analysis(filtered_df),
-            "interaction_graph": self.interaction_analysis.interaction_graph(filtered_df)
+            "average_emotion_by_topic": self.emotional_analysis.avg_emotion_by_topic(filtered_df),
+            "overall_emotion_average": self.emotional_analysis.overall_emotion_average(filtered_df),
+            "dominant_emotion_distribution": self.emotional_analysis.dominant_emotion_distribution(filtered_df),
            "emotion_by_source": self.emotional_analysis.emotion_by_source(filtered_df)
        }
-    def get_interactional_analysis(self, df: pd.DataFrame, filters: dict | None = None) -> dict:
+    def user(
        self,
        df: pd.DataFrame,
        filters: dict | None = None,
        dataset_id: int | None = None,
    ) -> dict:
        filtered_df = self._prepare_filtered_df(df, filters)
        return {
-            "average_thread_depth": self.interaction_analysis.average_thread_depth(
+            "top_users": self.user_analysis.top_users(filtered_df),
-                filtered_df
+            "users": self.user_analysis.per_user_analysis(filtered_df)
            ),
            "average_thread_length_by_emotion": self.interaction_analysis.average_thread_length_by_emotion(
                filtered_df
            ),
        }
-    def get_cultural_analysis(self, df: pd.DataFrame, filters: dict | None = None) -> dict:
+    def interactional(
        self,
        df: pd.DataFrame,
        filters: dict | None = None,
        dataset_id: int | None = None,
    ) -> dict:
        filtered_df = self._prepare_filtered_df(df, filters)
        return {
-            "identity_markers": self.cultural_analysis.get_identity_markers(
+            "top_interaction_pairs": self.interaction_analysis.top_interaction_pairs(filtered_df, top_n=100),
-                filtered_df
+            "interaction_graph": self.interaction_analysis.interaction_graph(filtered_df),
-            ),
+            "conversation_concentration": self.interaction_analysis.conversation_concentration(filtered_df)
        }
    def cultural(
        self,
        df: pd.DataFrame,
        filters: dict | None = None,
        dataset_id: int | None = None,
    ) -> dict:
        filtered_df = self._prepare_filtered_df(df, filters)
        return {
            "identity_markers": self.cultural_analysis.get_identity_markers(filtered_df),
            "stance_markers": self.cultural_analysis.get_stance_markers(filtered_df),
-            "entity_salience": self.cultural_analysis.get_avg_emotions_per_entity(
+            "avg_emotion_per_entity": self.cultural_analysis.get_avg_emotions_per_entity(filtered_df)
                filtered_df
            ),
        }
-    def summary(self, df: pd.DataFrame, filters: dict | None = None) -> dict:
+    def summary(
        self,
        df: pd.DataFrame,
        filters: dict | None = None,
        dataset_id: int | None = None,
    ) -> dict:
        filtered_df = self._prepare_filtered_df(df, filters)
-        total_posts = (filtered_df["type"] == "post").sum()
+        return self.summary_analysis.summary(filtered_df)
        total_comments = (filtered_df["type"] == "comment").sum()
        events_per_user = filtered_df.groupby("author").size()
        if filtered_df.empty:
            return {
                "total_events": 0,
                "total_posts": 0,
                "total_comments": 0,
                "unique_users": 0,
                "comments_per_post": 0,
                "lurker_ratio": 0,
                "time_range": {
                    "start": None,
                    "end": None,
                },
                "sources": [],
            }
        return {
            "total_events": int(len(filtered_df)),
            "total_posts": int(total_posts),
            "total_comments": int(total_comments),
            "unique_users": int(events_per_user.count()),
            "comments_per_post": round(total_comments / max(total_posts, 1), 2),
            "lurker_ratio": round((events_per_user == 1).mean(), 2),
            "time_range": {
                "start": int(filtered_df["dt"].min().timestamp()),
                "end": int(filtered_df["dt"].max().timestamp()),
            },
            "sources": filtered_df["source"].dropna().unique().tolist(),
        }
--- a/server/analysis/summary.py
+++ b/server/analysis/summary.py
@@ -0,0 +1,64 @@
 import pandas as pd
 class SummaryAnalysis:
    def total_events(self, df: pd.DataFrame) -> int:
        return int(len(df))
    def total_posts(self, df: pd.DataFrame) -> int:
        return int(len(df[df["type"] == "post"]))
    def total_comments(self, df: pd.DataFrame) -> int:
        return int(len(df[df["type"] == "comment"]))
    def unique_users(self, df: pd.DataFrame) -> int:
        return int(len(df["author"].dropna().unique()))
    def comments_per_post(self, total_comments: int, total_posts: int) -> float:
        return round(total_comments / max(total_posts, 1), 2)
    def lurker_ratio(self, df: pd.DataFrame) -> float:
        events_per_user = df.groupby("author").size()
        return round((events_per_user == 1).mean(), 2)
    def time_range(self, df: pd.DataFrame) -> dict:
        return {
            "start": int(df["dt"].min().timestamp()),
            "end": int(df["dt"].max().timestamp()),
        }
    def sources(self, df: pd.DataFrame) -> list:
        return df["source"].dropna().unique().tolist()
    def empty_summary(self) -> dict:
        return {
            "total_events": 0,
            "total_posts": 0,
            "total_comments": 0,
            "unique_users": 0,
            "comments_per_post": 0,
            "lurker_ratio": 0,
            "time_range": {
                "start": None,
                "end": None,
            },
            "sources": [],
        }
    def summary(self, df: pd.DataFrame) -> dict:
        if df.empty:
            return self.empty_summary()
        total_posts = self.total_posts(df)
        total_comments = self.total_comments(df)
        return {
            "total_events": self.total_events(df),
            "total_posts": total_posts,
            "total_comments": total_comments,
            "unique_users": self.unique_users(df),
            "comments_per_post": self.comments_per_post(total_comments, total_posts),
            "lurker_ratio": self.lurker_ratio(df),
            "time_range": self.time_range(df),
            "sources": self.sources(df),
        }
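The metrics in `SummaryAnalysis` reduce to simple counting. As a rough, pandas-free sketch of the same arithmetic (the `events` records and the `summarise` helper are hypothetical, made up for illustration, and are not part of the repository):

```python
from collections import Counter

# Hypothetical sample events; the real code reads these from a DataFrame.
events = [
    {"author": "alice", "type": "post"},
    {"author": "bob", "type": "comment"},
    {"author": "bob", "type": "comment"},
    {"author": "carol", "type": "comment"},
]


def summarise(records: list[dict]) -> dict:
    """Mirror the counting done by SummaryAnalysis on plain dicts."""
    if not records:
        # Same idea as empty_summary(): zeros instead of dividing by nothing.
        return {
            "total_posts": 0,
            "total_comments": 0,
            "comments_per_post": 0,
            "lurker_ratio": 0,
        }
    total_posts = sum(1 for r in records if r["type"] == "post")
    total_comments = sum(1 for r in records if r["type"] == "comment")
    events_per_user = Counter(r["author"] for r in records)
    # A "lurker" here is an author with exactly one event.
    lurkers = sum(1 for n in events_per_user.values() if n == 1)
    return {
        "total_posts": total_posts,
        "total_comments": total_comments,
        # max(..., 1) avoids dividing by zero when there are no posts.
        "comments_per_post": round(total_comments / max(total_posts, 1), 2),
        "lurker_ratio": round(lurkers / len(events_per_user), 2),
    }


result = summarise(events)
```

With this sample, two of the three authors have a single event, so the lurker ratio is two thirds, rounded to two decimals.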
--- a/server/analysis/user.py
+++ b/server/analysis/user.py
@@ -0,0 +1,152 @@
 import pandas as pd
 import re
 from collections import Counter
 class UserAnalysis:
    def __init__(self, word_exclusions: set[str]):
        self.word_exclusions = word_exclusions
    def _tokenize(self, text: str):
        tokens = re.findall(r"\b[a-z]{3,}\b", text)
        return [t for t in tokens if t not in self.word_exclusions]
    def _vocab_richness_per_user(
        self, df: pd.DataFrame, min_words: int = 20, top_most_used_words: int = 100
    ) -> list:
        df = df.copy()
        df["content"] = df["content"].fillna("").astype(str).str.lower()
        df["tokens"] = df["content"].apply(self._tokenize)
        rows = []
        for author, group in df.groupby("author"):
            all_tokens = [t for tokens in group["tokens"] for t in tokens]
            total_words = len(all_tokens)
            unique_words = len(set(all_tokens))
            events = len(group)
            # Skip users with fewer than min_words tokens: richness is unreliable on tiny samples
            if total_words < min_words:
                continue
            # 1.0 means the user never reused a word (after exclusions are removed)
            vocab_richness = unique_words / total_words
            avg_words = total_words / max(events, 1)
            counts = Counter(all_tokens)
            top_words = [
                {"word": w, "count": int(c)}
                for w, c in counts.most_common(top_most_used_words)
            ]
            rows.append(
                {
                    "author": author,
                    "events": int(events),
                    "total_words": int(total_words),
                    "unique_words": int(unique_words),
                    "vocab_richness": round(vocab_richness, 3),
                    "avg_words_per_event": round(avg_words, 2),
                    "top_words": top_words,
                }
            )
        rows = sorted(rows, key=lambda x: x["vocab_richness"], reverse=True)
        return rows
    def top_users(self, df: pd.DataFrame) -> list:
        counts = df.groupby(["author", "source"]).size().sort_values(ascending=False)
        top_users = [
            {"author": author, "source": source, "count": int(count)}
            for (author, source), count in counts.items()
        ]
        return top_users
    def per_user_analysis(self, df: pd.DataFrame) -> dict:
        per_user = df.groupby(["author", "type"]).size().unstack(fill_value=0)
        emotion_cols = [col for col in df.columns if col.startswith("emotion_")]
        dominant_topic_by_author = {}
        avg_emotions_by_author = {}
        if emotion_cols:
            avg_emotions = df.groupby("author")[emotion_cols].mean().fillna(0.0)
            avg_emotions_by_author = {
                author: {emotion: float(score) for emotion, score in row.items()}
                for author, row in avg_emotions.iterrows()
            }
        if "topic" in df.columns:
            topic_df = df[
                df["topic"].notna()
                & (df["topic"] != "")
                & (df["topic"] != "Misc")
            ]
            if not topic_df.empty:
                topic_counts = (
                    topic_df.groupby(["author", "topic"])
                    .size()
                    .reset_index(name="count")
                    .sort_values(
                        ["author", "count", "topic"],
                        ascending=[True, False, True],
                    )
                    .drop_duplicates(subset=["author"])
                )
                dominant_topic_by_author = {
                    row["author"]: {
                        "topic": row["topic"],
                        "count": int(row["count"]),
                    }
                    for _, row in topic_counts.iterrows()
                }
        # ensure columns always exist
        for col in ("post", "comment"):
            if col not in per_user.columns:
                per_user[col] = 0
        per_user["comment_post_ratio"] = per_user["comment"] / per_user["post"].replace(
            0, 1
        )
        per_user["comment_share"] = per_user["comment"] / (
            per_user["post"] + per_user["comment"]
        ).replace(0, 1)
        per_user = per_user.sort_values("comment_post_ratio", ascending=True)
        per_user_records = per_user.reset_index().to_dict(orient="records")
        vocab_rows = self._vocab_richness_per_user(df)
        vocab_by_author = {row["author"]: row for row in vocab_rows}
        # merge vocab richness + per_user information
        merged_users = []
        for row in per_user_records:
            author = row["author"]
            merged_users.append(
                {
                    "author": author,
                    "post": int(row.get("post", 0)),
                    "comment": int(row.get("comment", 0)),
                    "comment_post_ratio": float(row.get("comment_post_ratio", 0)),
                    "comment_share": float(row.get("comment_share", 0)),
                    "avg_emotions": avg_emotions_by_author.get(author, {}),
                    "dominant_topic": dominant_topic_by_author.get(author),
                    "vocab": vocab_by_author.get(
                        author,
                        {
                            "vocab_richness": 0,
                            "avg_words_per_event": 0,
                            "top_words": [],
                        },
                    ),
                }
            )
        merged_users.sort(key=lambda u: u["comment_post_ratio"])
        return merged_users
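The vocabulary richness figure in `UserAnalysis` is the ratio of unique tokens to total tokens after exclusions. A minimal standalone sketch of that calculation, assuming a made-up exclusion set and omitting the `min_words` cut-off and per-author grouping (the names `WORD_EXCLUSIONS`, `tokenize`, and `vocab_richness` are illustrative only):

```python
import re
from collections import Counter

# Hypothetical exclusion list; the real one is passed to UserAnalysis.
WORD_EXCLUSIONS = {"the", "and"}


def tokenize(text: str) -> list[str]:
    # Lowercase first: the pattern only matches a-z, as in _tokenize.
    tokens = re.findall(r"\b[a-z]{3,}\b", text.lower())
    return [t for t in tokens if t not in WORD_EXCLUSIONS]


def vocab_richness(texts: list[str]) -> dict:
    all_tokens = [t for text in texts for t in tokenize(text)]
    total_words = len(all_tokens)
    unique_words = len(set(all_tokens))
    return {
        "total_words": total_words,
        "unique_words": unique_words,
        # 1.0 means no word was ever reused.
        "vocab_richness": round(unique_words / max(total_words, 1), 3),
        "top_words": Counter(all_tokens).most_common(2),
    }


stats = vocab_richness(["The rain and the wind", "Rain again, more rain"])
```

Here six tokens survive the exclusions and four of them are distinct, so the richness comes out at 0.667.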
--- a/server/app.py
+++ b/server/app.py
@@ -19,15 +19,17 @@ from server.exceptions import NotAuthorisedException, NonExistentDatasetExceptio
 from server.db.database import PostgresConnector
 from server.core.auth import AuthManager
 from server.core.datasets import DatasetManager
-from server.utils import get_request_filters
+from server.utils import get_request_filters, get_env
-from server.queue.tasks import process_dataset
+from server.queue.tasks import process_dataset, fetch_and_process_dataset
 from server.connectors.registry import get_available_connectors, get_connector_metadata
 app = Flask(__name__)
 # Env Variables
 load_dotenv()
-frontend_url = os.getenv("FRONTEND_URL", "http://localhost:5173")
+max_fetch_limit = int(get_env("MAX_FETCH_LIMIT"))
-jwt_secret_key = os.getenv("JWT_SECRET_KEY", "super-secret-change-this")
+frontend_url = get_env("FRONTEND_URL")
 jwt_secret_key = get_env("JWT_SECRET_KEY")
 jwt_access_token_expires = int(
    os.getenv("JWT_ACCESS_TOKEN_EXPIRES", 1200)
 )  # Default to 20 minutes
@@ -37,13 +39,41 @@ CORS(app, resources={r"/*": {"origins": frontend_url}})
 app.config["JWT_SECRET_KEY"] = jwt_secret_key
 app.config["JWT_ACCESS_TOKEN_EXPIRES"] = jwt_access_token_expires
 # Security
 bcrypt = Bcrypt(app)
 jwt = JWTManager(app)
 # Helper Objects
 db = PostgresConnector()
 auth_manager = AuthManager(db, bcrypt)
 dataset_manager = DatasetManager(db)
 stat_gen = StatGen()
 connectors = get_available_connectors()
 # Default Files
 with open("server/topics.json") as f:
    default_topic_list = json.load(f)
 def normalize_topics(topics):
    if not isinstance(topics, dict) or len(topics) == 0:
        return None
    normalized = {}
    for topic_name, topic_keywords in topics.items():
        if not isinstance(topic_name, str) or not isinstance(topic_keywords, str):
            return None
        clean_name = topic_name.strip()
        clean_keywords = topic_keywords.strip()
        if not clean_name or not clean_keywords:
            return None
        normalized[clean_name] = clean_keywords
    return normalized
@app.route("/register", methods=["POST"])
@@ -68,7 +98,7 @@ def register_user():
        return jsonify({"error": str(e)}), 400
    except Exception as e:
        print(traceback.format_exc())
-        return jsonify({"error": f"An unexpected error occurred: {str(e)}"}), 500
+        return jsonify({"error": "An unexpected error occurred"}), 500
    print(f"Registered new user: {username}")
    return jsonify({"message": f"User '{username}' registered successfully"}), 200
@@ -93,7 +123,7 @@ def login_user():
            return jsonify({"error": "Invalid username or password"}), 401
    except Exception as e:
        print(traceback.format_exc())
-        return jsonify({"error": f"An unexpected error occurred: {str(e)}"}), 500
+        return jsonify({"error": "An unexpected error occurred"}), 500
@app.route("/profile", methods=["GET"])
@@ -101,9 +131,13 @@ def login_user():
 def profile():
    current_user = get_jwt_identity()
-    return jsonify(
+    return (
-        message="Access granted", user=auth_manager.get_user_by_id(current_user)
+        jsonify(
-    ), 200
+            message="Access granted", user=auth_manager.get_user_by_id(current_user)
        ),
        200,
    )
@app.route("/user/datasets")
@jwt_required()
@@ -111,7 +145,112 @@ def get_user_datasets():
    current_user = int(get_jwt_identity())
    return jsonify(dataset_manager.get_user_datasets(current_user)), 200
-@app.route("/upload", methods=["POST"])
+
@app.route("/datasets/sources", methods=["GET"])
 def get_dataset_sources():
    list_metadata = list(get_connector_metadata().values())
    return jsonify(list_metadata)
@app.route("/datasets/fetch", methods=["POST"])
@jwt_required()
 def fetch_data():
    data = request.get_json()
    connector_metadata = get_connector_metadata()
    # Validate strictly here: invalid input would otherwise reach the Celery worker and fail silently
    if not data or "sources" not in data:
        return jsonify({"error": "Sources must be provided"}), 400
    if "name" not in data or not str(data["name"]).strip():
        return jsonify({"error": "Dataset name is required"}), 400
    dataset_name = data["name"].strip()
    user_id = int(get_jwt_identity())
    custom_topics = data.get("topics")
    topics_for_processing = default_topic_list
    source_configs = data["sources"]
    if not isinstance(source_configs, list) or len(source_configs) == 0:
        return jsonify({"error": "Sources must be a non-empty list"}), 400
    for source in source_configs:
        if not isinstance(source, dict):
            return jsonify({"error": "Each source must be an object"}), 400
        if "name" not in source:
            return jsonify({"error": "Each source must contain a name"}), 400
        name = source["name"]
        limit = source.get("limit", 1000)
        category = source.get("category")
        search = source.get("search")
        if limit:
            try:
                limit = int(limit)
            except (ValueError, TypeError):
                return jsonify({"error": "Limit must be an integer"}), 400
            if limit > 1000:
                limit = 1000
        if name not in connector_metadata:
            return jsonify({"error": "Source not supported"}), 400
        if search and not connector_metadata[name]["search_enabled"]:
            return jsonify({"error": f"Source {name} does not support search"}), 400
        if category and not connector_metadata[name]["categories_enabled"]:
            return jsonify({"error": f"Source {name} does not support categories"}), 400
        # if category and not connectors[name]().category_exists(category):
        #     return jsonify({"error": f"Category does not exist for {name}"}), 400
    if custom_topics is not None:
        normalized_topics = normalize_topics(custom_topics)
        if not normalized_topics:
            return (
                jsonify(
                    {
                        "error": "Topics must be a non-empty JSON object with non-empty string keys and values"
                    }
                ),
                400,
            )
        topics_for_processing = normalized_topics
    try:
        dataset_id = dataset_manager.save_dataset_info(
            user_id, dataset_name, topics_for_processing
        )
        dataset_manager.set_dataset_status(
            dataset_id,
            "fetching",
            f"Data is being fetched from {', '.join(source['name'] for source in source_configs)}",
        )
        fetch_and_process_dataset.delay(dataset_id, source_configs, topics_for_processing)
    except Exception:
        print(traceback.format_exc())
        return jsonify({"error": "Failed to queue dataset processing"}), 500
    return (
        jsonify(
            {
                "message": "Dataset queued for processing",
                "dataset_id": dataset_id,
                "status": "processing",
            }
        ),
        202,
    )
@app.route("/datasets/upload", methods=["POST"])
@jwt_required()
 def upload_data():
    if "posts" not in request.files or "topics" not in request.files:
@@ -130,30 +269,39 @@ def upload_data():
    if not post_file.filename.endswith(".jsonl") or not topic_file.filename.endswith(
        ".json"
    ):
-        return jsonify(
+        return (
-            {"error": "Invalid file type. Only .jsonl and .json files are allowed."}
+            jsonify(
-        ), 400
+                {"error": "Invalid file type. Only .jsonl and .json files are allowed."}
            ),
            400,
        )
    try:
        current_user = int(get_jwt_identity())
        posts_df = pd.read_json(post_file, lines=True, convert_dates=False)
        topics = json.load(topic_file)
-        dataset_id = dataset_manager.save_dataset_info(current_user, dataset_name, topics)
+        dataset_id = dataset_manager.save_dataset_info(
            current_user, dataset_name, topics
        )
        process_dataset.delay(dataset_id, posts_df.to_dict(orient="records"), topics)
-        return jsonify(
+        return (
-            {
+            jsonify(
-                "message": "Dataset queued for processing",
+                {
-                "dataset_id": dataset_id,
+                    "message": "Dataset queued for processing",
-                "status": "processing",
+                    "dataset_id": dataset_id,
-            }
+                    "status": "processing",
-        ), 202
+                }
            ),
            202,
        )
    except ValueError as e:
-        return jsonify({"error": f"Failed to read JSONL file: {str(e)}"}), 400
+        return jsonify({"error": "Failed to read JSONL file"}), 400
    except Exception as e:
-        return jsonify({"error": f"An unexpected error occurred: {str(e)}"}), 500
+        return jsonify({"error": "An unexpected error occurred"}), 500
@app.route("/dataset/<int:dataset_id>", methods=["GET"])
@jwt_required()
@@ -162,7 +310,9 @@ def get_dataset(dataset_id):
        user_id = int(get_jwt_identity())
        if not dataset_manager.authorize_user_dataset(dataset_id, user_id):
-            raise NotAuthorisedException("This user is not authorised to access this dataset")
+            raise NotAuthorisedException(
                "This user is not authorised to access this dataset"
            )
        dataset_info = dataset_manager.get_dataset_info(dataset_id)
        included_cols = {"id", "name", "created_at"}
@@ -176,6 +326,7 @@ def get_dataset(dataset_id):
        print(traceback.format_exc())
        return jsonify({"error": "An unexpected error occurred"}), 500
@app.route("/dataset/<int:dataset_id>", methods=["PATCH"])
@jwt_required()
 def update_dataset(dataset_id):
@@ -183,7 +334,9 @@ def update_dataset(dataset_id):
        user_id = int(get_jwt_identity())
        if not dataset_manager.authorize_user_dataset(dataset_id, user_id):
-            raise NotAuthorisedException("This user is not authorised to access this dataset")
+            raise NotAuthorisedException(
                "This user is not authorised to access this dataset"
            )
        body = request.get_json()
        new_name = body.get("name")
@@ -192,7 +345,12 @@ def update_dataset(dataset_id):
            return jsonify({"error": "A valid name must be provided"}), 400
        dataset_manager.update_dataset_name(dataset_id, new_name.strip())
-        return jsonify({"message": f"Dataset {dataset_id} renamed to '{new_name.strip()}'"}), 200
+        return (
            jsonify(
                {"message": f"Dataset {dataset_id} renamed to '{new_name.strip()}'"}
            ),
            200,
        )
    except NotAuthorisedException:
        return jsonify({"error": "User is not authorised to access this content"}), 403
    except NonExistentDatasetException:
@@ -201,6 +359,7 @@ def update_dataset(dataset_id):
        print(traceback.format_exc())
        return jsonify({"error": "An unexpected error occurred"}), 500
@app.route("/dataset/<int:dataset_id>", methods=["DELETE"])
@jwt_required()
 def delete_dataset(dataset_id):
@@ -208,11 +367,20 @@ def delete_dataset(dataset_id):
        user_id = int(get_jwt_identity())
        if not dataset_manager.authorize_user_dataset(dataset_id, user_id):
-            raise NotAuthorisedException("This user is not authorised to access this dataset")
+            raise NotAuthorisedException(
                "This user is not authorised to access this dataset"
            )
        dataset_manager.delete_dataset_info(dataset_id)
        dataset_manager.delete_dataset_content(dataset_id)
-        return jsonify({"message": f"Dataset {dataset_id} metadata and content successfully deleted"}), 200
+        return (
            jsonify(
                {
                    "message": f"Dataset {dataset_id} metadata and content successfully deleted"
                }
            ),
            200,
        )
    except NotAuthorisedException:
        return jsonify({"error": "User is not authorised to access this content"}), 403
    except NonExistentDatasetException:
@@ -221,6 +389,7 @@ def delete_dataset(dataset_id):
        print(traceback.format_exc())
        return jsonify({"error": "An unexpected error occurred"}), 500
@app.route("/dataset/<int:dataset_id>/status", methods=["GET"])
@jwt_required()
 def get_dataset_status(dataset_id):
@@ -228,7 +397,9 @@ def get_dataset_status(dataset_id):
        user_id = int(get_jwt_identity())
        if not dataset_manager.authorize_user_dataset(dataset_id, user_id):
-            raise NotAuthorisedException("This user is not authorised to access this dataset")
+            raise NotAuthorisedException(
                "This user is not authorised to access this dataset"
            )
        dataset_status = dataset_manager.get_dataset_status(dataset_id)
        return jsonify(dataset_status), 200
@@ -240,26 +411,53 @@ def get_dataset_status(dataset_id):
        print(traceback.format_exc())
        return jsonify({"error": "An unexpected error occurred"}), 500
-@app.route("/dataset/<int:dataset_id>/content", methods=["GET"])
+
@app.route("/dataset/<int:dataset_id>/linguistic", methods=["GET"])
@jwt_required()
-def content_endpoint(dataset_id):
+def get_linguistic_analysis(dataset_id):
    try:
        user_id = int(get_jwt_identity())
        if not dataset_manager.authorize_user_dataset(dataset_id, user_id):
-            raise NotAuthorisedException("This user is not authorised to access this dataset")
+            raise NotAuthorisedException(
                "This user is not authorised to access this dataset"
            )
        dataset_content = dataset_manager.get_dataset_content(dataset_id)
        filters = get_request_filters()
-        return jsonify(stat_gen.get_content_analysis(dataset_content, filters)), 200
+        return jsonify(stat_gen.linguistic(dataset_content, filters, dataset_id=dataset_id)), 200
    except NotAuthorisedException:
        return jsonify({"error": "User is not authorised to access this content"}), 403
    except NonExistentDatasetException:
        return jsonify({"error": "Dataset does not exist"}), 404
    except ValueError as e:
-        return jsonify({"error": f"Malformed or missing data: {str(e)}"}), 400
+        return jsonify({"error": "Malformed or missing data"}), 400
    except Exception as e:
        print(traceback.format_exc())
-        return jsonify({"error": f"An unexpected error occurred: {str(e)}"}), 500
+        return jsonify({"error": "An unexpected error occurred"}), 500
@app.route("/dataset/<int:dataset_id>/emotional", methods=["GET"])
@jwt_required()
 def get_emotional_analysis(dataset_id):
    try:
        user_id = int(get_jwt_identity())
        if not dataset_manager.authorize_user_dataset(dataset_id, user_id):
            raise NotAuthorisedException(
                "This user is not authorised to access this dataset"
            )
        dataset_content = dataset_manager.get_dataset_content(dataset_id)
        filters = get_request_filters()
        return jsonify(stat_gen.emotional(dataset_content, filters, dataset_id=dataset_id)), 200
    except NotAuthorisedException:
        return jsonify({"error": "User is not authorised to access this content"}), 403
    except NonExistentDatasetException:
        return jsonify({"error": "Dataset does not exist"}), 404
    except ValueError as e:
        return jsonify({"error": "Malformed or missing data"}), 400
    except Exception as e:
        print(traceback.format_exc())
        return jsonify({"error": "An unexpected error occurred"}), 500
@app.route("/dataset/<int:dataset_id>/summary", methods=["GET"])
@@ -268,42 +466,46 @@ def get_summary(dataset_id):
    try:
        user_id = int(get_jwt_identity())
        if not dataset_manager.authorize_user_dataset(dataset_id, user_id):
-            raise NotAuthorisedException("This user is not authorised to access this dataset")
+            raise NotAuthorisedException(
                "This user is not authorised to access this dataset"
            )
        dataset_content = dataset_manager.get_dataset_content(dataset_id)
        filters = get_request_filters()
-        return jsonify(stat_gen.summary(dataset_content, filters)), 200
+        return jsonify(stat_gen.summary(dataset_content, filters, dataset_id=dataset_id)), 200
    except NotAuthorisedException:
        return jsonify({"error": "User is not authorised to access this content"}), 403
    except NonExistentDatasetException:
        return jsonify({"error": "Dataset does not exist"}), 404
    except ValueError as e:
-        return jsonify({"error": f"Malformed or missing data: {str(e)}"}), 400
+        return jsonify({"error": "Malformed or missing data"}), 400
    except Exception as e:
        print(traceback.format_exc())
-        return jsonify({"error": f"An unexpected error occurred: {str(e)}"}), 500
+        return jsonify({"error": "An unexpected error occurred"}), 500
-@app.route("/dataset/<int:dataset_id>/time", methods=["GET"])
+@app.route("/dataset/<int:dataset_id>/temporal", methods=["GET"])
@jwt_required()
-def get_time_analysis(dataset_id):
+def get_temporal_analysis(dataset_id):
    try:
        user_id = int(get_jwt_identity())
        if not dataset_manager.authorize_user_dataset(dataset_id, user_id):
-            raise NotAuthorisedException("This user is not authorised to access this dataset")
+            raise NotAuthorisedException(
                "This user is not authorised to access this dataset"
            )
        dataset_content = dataset_manager.get_dataset_content(dataset_id)
        filters = get_request_filters()
-        return jsonify(stat_gen.get_time_analysis(dataset_content, filters)), 200
+        return jsonify(stat_gen.temporal(dataset_content, filters, dataset_id=dataset_id)), 200
    except NotAuthorisedException:
        return jsonify({"error": "User is not authorised to access this content"}), 403
    except NonExistentDatasetException:
        return jsonify({"error": "Dataset does not exist"}), 404
    except ValueError as e:
-        return jsonify({"error": f"Malformed or missing data: {str(e)}"}), 400
+        return jsonify({"error": "Malformed or missing data"}), 400
    except Exception as e:
        print(traceback.format_exc())
-        return jsonify({"error": f"An unexpected error occurred: {str(e)}"}), 500
+        return jsonify({"error": "An unexpected error occurred"}), 500
@app.route("/dataset/<int:dataset_id>/user", methods=["GET"])
@@ -312,20 +514,22 @@ def get_user_analysis(dataset_id):
    try:
        user_id = int(get_jwt_identity())
        if not dataset_manager.authorize_user_dataset(dataset_id, user_id):
-            raise NotAuthorisedException("This user is not authorised to access this dataset")
+            raise NotAuthorisedException(
                "This user is not authorised to access this dataset"
            )
        dataset_content = dataset_manager.get_dataset_content(dataset_id)
        filters = get_request_filters()
-        return jsonify(stat_gen.get_user_analysis(dataset_content, filters)), 200
+        return jsonify(stat_gen.user(dataset_content, filters, dataset_id=dataset_id)), 200
    except NotAuthorisedException:
        return jsonify({"error": "User is not authorised to access this content"}), 403
    except NonExistentDatasetException:
        return jsonify({"error": "Dataset does not exist"}), 404
    except ValueError as e:
-        return jsonify({"error": f"Malformed or missing data: {str(e)}"}), 400
+        return jsonify({"error": "Malformed or missing data"}), 400
    except Exception as e:
        print(traceback.format_exc())
-        return jsonify({"error": f"An unexpected error occurred: {str(e)}"}), 500
+        return jsonify({"error": "An unexpected error occurred"}), 500
@app.route("/dataset/<int:dataset_id>/cultural", methods=["GET"])
@@ -334,42 +538,70 @@ def get_cultural_analysis(dataset_id):
    try:
        user_id = int(get_jwt_identity())
        if not dataset_manager.authorize_user_dataset(dataset_id, user_id):
-            raise NotAuthorisedException("This user is not authorised to access this dataset")
+            raise NotAuthorisedException(
                "This user is not authorised to access this dataset"
            )
        dataset_content = dataset_manager.get_dataset_content(dataset_id)
        filters = get_request_filters()
-        return jsonify(stat_gen.get_cultural_analysis(dataset_content, filters)), 200
+        return jsonify(stat_gen.cultural(dataset_content, filters, dataset_id=dataset_id)), 200
    except NotAuthorisedException:
        return jsonify({"error": "User is not authorised to access this content"}), 403
    except NonExistentDatasetException:
        return jsonify({"error": "Dataset does not exist"}), 404
    except ValueError as e:
-        return jsonify({"error": f"Malformed or missing data: {str(e)}"}), 400
+        return jsonify({"error": "Malformed or missing data"}), 400
    except Exception as e:
        print(traceback.format_exc())
-        return jsonify({"error": f"An unexpected error occurred: {str(e)}"}), 500
+        return jsonify({"error": "An unexpected error occurred"}), 500
-@app.route("/dataset/<int:dataset_id>/interaction", methods=["GET"])
+@app.route("/dataset/<int:dataset_id>/interactional", methods=["GET"])
@jwt_required()
 def get_interaction_analysis(dataset_id):
    try:
        user_id = int(get_jwt_identity())
        if not dataset_manager.authorize_user_dataset(dataset_id, user_id):
-            raise NotAuthorisedException("This user is not authorised to access this dataset")
+            raise NotAuthorisedException(
                "This user is not authorised to access this dataset"
            )
        dataset_content = dataset_manager.get_dataset_content(dataset_id)
        filters = get_request_filters()
-        return jsonify(stat_gen.get_interactional_analysis(dataset_content, filters)), 200
+        return jsonify(stat_gen.interactional(dataset_content, filters, dataset_id=dataset_id)), 200
    except NotAuthorisedException:
        return jsonify({"error": "User is not authorised to access this content"}), 403
    except NonExistentDatasetException:
        return jsonify({"error": "Dataset does not exist"}), 404
    except ValueError as e:
-        return jsonify({"error": f"Malformed or missing data: {str(e)}"}), 400
+        return jsonify({"error": "Malformed or missing data"}), 400
    except Exception as e:
        print(traceback.format_exc())
-        return jsonify({"error": f"An unexpected error occurred: {str(e)}"}), 500
+        return jsonify({"error": "An unexpected error occurred"}), 500
@app.route("/dataset/<int:dataset_id>/all", methods=["GET"])
@jwt_required()
 def get_full_dataset(dataset_id: int):
    try:
        user_id = int(get_jwt_identity())
        if not dataset_manager.authorize_user_dataset(dataset_id, user_id):
            raise NotAuthorisedException(
                "This user is not authorised to access this dataset"
            )
        dataset_content = dataset_manager.get_dataset_content(dataset_id)
        filters = get_request_filters()
        return jsonify(stat_gen.filter_dataset(dataset_content, filters)), 200
    except NotAuthorisedException:
        return jsonify({"error": "User is not authorised to access this content"}), 403
    except NonExistentDatasetException:
        return jsonify({"error": "Dataset does not exist"}), 404
    except ValueError as e:
        return jsonify({"error": "Malformed or missing data"}), 400
    except Exception as e:
        print(traceback.format_exc())
        return jsonify({"error": "An unexpected error occurred"}), 500
 if __name__ == "__main__":
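For readers tracing the new `/datasets/fetch` endpoint in `server/app.py`, the following is a hedged sketch of a request body it appears to accept, together with the limit coercion it performs. The dataset name, the `cork-city` category, the topic entry, and the `clamp_limit` helper are all hypothetical; the endpoint itself skips falsy limits, which this sketch does not reproduce.

```python
import json

# Hypothetical request body for POST /datasets/fetch.
# "boards.ie" matches BoardsAPI.source_name; whether the connector metadata
# is keyed by that value is an assumption.
payload = {
    "name": "Cork discussions",
    "sources": [
        {"name": "boards.ie", "category": "cork-city", "limit": 5000},
    ],
    # Topics map a non-empty name to a non-empty keyword string.
    "topics": {"Transport": "buses, trains, cycling"},
}


def clamp_limit(raw, ceiling: int = 1000) -> int:
    """Coerce a limit to int and cap it at the ceiling."""
    limit = int(raw)  # raises ValueError or TypeError on bad input
    return min(limit, ceiling)


limits = [clamp_limit(s.get("limit", 1000)) for s in payload["sources"]]
body = json.dumps(payload)
```

A limit of 5000 is capped to 1000 here, and a non-numeric limit raises an exception, which the endpoint turns into a 400 response.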
--- a/server/connectors/base.py
+++ b/server/connectors/base.py
@@ -0,0 +1,24 @@
 from abc import ABC, abstractmethod
 from dto.post import Post
 import os
 class BaseConnector(ABC):
    source_name: str  # machine readable
    display_name: str  # human readable
    required_env: list[str] = []
    search_enabled: bool
    categories_enabled: bool
    @classmethod
    def is_available(cls) -> bool:
        return all(os.getenv(var) for var in cls.required_env)
    @abstractmethod
    def get_new_posts_by_search(
        self, search: str | None = None, category: str | None = None, post_limit: int = 10
    ) -> list[Post]: ...
    @abstractmethod
    def category_exists(self, category: str) -> bool: ...
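To illustrate how a connector might plug into `BaseConnector`, here is a self-contained sketch. `DemoConnector`, its environment variable name, and the stand-in `Post` dataclass are hypothetical; the real `Post` lives in `dto.post` and its fields are not shown in this diff.

```python
import os
from abc import ABC, abstractmethod
from dataclasses import dataclass


@dataclass
class Post:  # hypothetical stand-in for dto.post.Post
    title: str


class BaseConnector(ABC):
    source_name: str  # machine readable
    display_name: str  # human readable
    required_env: list[str] = []
    search_enabled: bool
    categories_enabled: bool

    @classmethod
    def is_available(cls) -> bool:
        # all() over an empty list is True, so no required_env means available.
        return all(os.getenv(var) for var in cls.required_env)

    @abstractmethod
    def get_new_posts_by_search(
        self, search=None, category=None, post_limit=10
    ) -> list[Post]: ...

    @abstractmethod
    def category_exists(self, category: str) -> bool: ...


class DemoConnector(BaseConnector):  # hypothetical connector
    source_name = "demo"
    display_name = "Demo"
    required_env = ["DEMO_API_KEY_THAT_IS_NOT_SET"]
    search_enabled = True
    categories_enabled = False

    def get_new_posts_by_search(self, search=None, category=None, post_limit=10):
        # Fabricated posts, purely to show the return shape.
        return [Post(title=f"{search} #{i}") for i in range(post_limit)]

    def category_exists(self, category: str) -> bool:
        return False


posts = DemoConnector().get_new_posts_by_search(search="cork", post_limit=2)
```

Because both abstract methods are implemented, `DemoConnector` can be instantiated, while `BaseConnector()` itself raises `TypeError`. `is_available()` stays false until every variable in `required_env` is set.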
--- a/server/connectors/boards_api.py
+++ b/server/connectors/boards_api.py
@@ -7,32 +7,68 @@ from dto.post import Post
 from dto.comment import Comment
 from bs4 import BeautifulSoup
 from concurrent.futures import ThreadPoolExecutor, as_completed
 from server.connectors.base import BaseConnector
 logger = logging.getLogger(__name__)
-HEADERS = {
-    "User-Agent": "Mozilla/5.0 (compatible; ForumScraper/1.0)"
-}
+HEADERS = {"User-Agent": "Mozilla/5.0 (compatible; Digital-Ethnography-Aid/1.0)"}
+
-class BoardsAPI:
+class BoardsAPI(BaseConnector):
+    source_name: str = "boards.ie"
+    display_name: str = "Boards.ie"
+    categories_enabled: bool = True
+    search_enabled: bool = False
    def __init__(self):
-        self.url = "https://www.boards.ie"
+        self.base_url = "https://www.boards.ie"
        self.source_name = "Boards.ie"
-    def get_new_category_posts(self, category: str, post_limit: int, comment_limit: int)  -> list[Post]:
+    def get_new_posts_by_search(
        self, search: str, category: str, post_limit: int
    ) -> list[Post]:
        if search:
            raise NotImplementedError("Search not compatible with boards.ie")
        if category:
            return self._get_posts(f"{self.base_url}/categories/{category}", post_limit)
        else:
            return self._get_posts(f"{self.base_url}/discussions", post_limit)
    def category_exists(self, category: str) -> bool:
        if not category:
            return False
        url = f"{self.base_url}/categories/{category}"
        try:
            response = requests.head(url, headers=HEADERS, allow_redirects=True)
            if response.status_code == 200:
                return True
            if response.status_code == 404:
                return False
            # fallback if HEAD not supported
            response = requests.get(url, headers=HEADERS)
            return response.status_code == 200
        except requests.RequestException as e:
            logger.error(f"Error checking category '{category}': {e}")
            return False
    ## Private
    def _get_posts(self, url, limit) -> list[Post]:
        urls = []
        current_page = 1
-        logger.info(f"Fetching posts from category: {category}")
-
-        while len(urls) < post_limit:
-            url = f"{self.url}/categories/{category}/p{current_page}"
-            html = self._fetch_page(url)
+        while len(urls) < limit:
+            # Build each page URL from the unchanged base so the /pN suffix does not accumulate
+            page_url = f"{url}/p{current_page}"
+            html = self._fetch_page(page_url)
            soup = BeautifulSoup(html, "html.parser")
-            logger.debug(f"Processing page {current_page} for category {category}")
+            logger.debug(f"Processing page {current_page} for link: {page_url}")
            for a in soup.select("a.threadbit-threadlink"):
-                if len(urls) >= post_limit:
+                if len(urls) >= limit:
                    break
                href = a.get("href")
@@ -41,22 +77,24 @@ class BoardsAPI:
            current_page += 1
-        logger.debug(f"Fetched {len(urls)} post URLs from category {category}")
+        logger.debug(f"Fetched {len(urls)} post URLs")
        # Fetch post details for each URL and create Post objects
        posts = []
        def fetch_and_parse(post_url):
            html = self._fetch_page(post_url)
-            post = self._parse_thread(html, post_url, comment_limit)
+            post = self._parse_thread(html, post_url)
            return post
-        with ThreadPoolExecutor(max_workers=30) as executor:
+        with ThreadPoolExecutor(max_workers=5) as executor:
            futures = {executor.submit(fetch_and_parse, url): url for url in urls}
            for i, future in enumerate(as_completed(futures)):
                post_url = futures[future]
-                logger.debug(f"Fetching Post {i + 1} / {len(urls)} details from URL: {post_url}")
+                logger.debug(
                    f"Fetching Post {i + 1} / {len(urls)} details from URL: {post_url}"
                )
                try:
                    post = future.result()
                    posts.append(post)
@@ -65,13 +103,12 @@ class BoardsAPI:
        return posts
    def _fetch_page(self, url: str) -> str:
        response = requests.get(url, headers=HEADERS)
        response.raise_for_status()
        return response.text
-    def _parse_thread(self, html: str, post_url: str, comment_limit: int) -> Post:
+    def _parse_thread(self, html: str, post_url: str) -> Post:
        soup = BeautifulSoup(html, "html.parser")
        # Author
@@ -82,10 +119,16 @@ class BoardsAPI:
        timestamp_tag = soup.select_one(".postbit-header")
        timestamp = None
        if timestamp_tag:
-            match = re.search(r"\d{2}-\d{2}-\d{4}\s+\d{2}:\d{2}[AP]M", timestamp_tag.get_text())
+            match = re.search(
                r"\d{2}-\d{2}-\d{4}\s+\d{2}:\d{2}[AP]M", timestamp_tag.get_text()
            )
            timestamp = match.group(0) if match else None
            # convert to unix epoch
-            timestamp = datetime.datetime.strptime(timestamp, "%d-%m-%Y %I:%M%p").timestamp() if timestamp else None
+            timestamp = (
                datetime.datetime.strptime(timestamp, "%d-%m-%Y %I:%M%p").timestamp()
                if timestamp
                else None
            )
        # Post ID
        post_num = re.search(r"discussion/(\d+)", post_url)
@@ -93,14 +136,16 @@ class BoardsAPI:
        # Content
        content_tag = soup.select_one(".Message.userContent")
-        content = content_tag.get_text(separator="\n", strip=True) if content_tag else None
+        content = (
            content_tag.get_text(separator="\n", strip=True) if content_tag else None
        )
        # Title
        title_tag = soup.select_one(".PageTitle h1")
        title = title_tag.text.strip() if title_tag else None
        # Comments
-        comments = self._parse_comments(post_url, post_num, comment_limit)
+        comments = self._parse_comments(post_url, post_num)
        post = Post(
            id=post_num,
@@ -110,16 +155,16 @@ class BoardsAPI:
            url=post_url,
            timestamp=timestamp,
            source=self.source_name,
-            comments=comments
+            comments=comments,
        )
        return post
-    def _parse_comments(self, url: str, post_id: str, comment_limit: int) -> list[Comment]:
+    def _parse_comments(self, url: str, post_id: str) -> list[Comment]:
        comments = []
        current_url = url
-        while current_url and len(comments) < comment_limit:
+        while current_url:
            html = self._fetch_page(current_url)
            page_comments = self._parse_page_comments(html, post_id)
            comments.extend(page_comments)
@@ -128,9 +173,9 @@ class BoardsAPI:
            soup = BeautifulSoup(html, "html.parser")
            next_link = soup.find("a", class_="Next")
-            if next_link and next_link.get('href'):
-                href = next_link.get('href')
-                current_url = href if href.startswith('http') else self.url + href
+            if next_link and next_link.get("href"):
+                href = next_link.get("href")
+                # Relative links are resolved against the site root, not the thread URL
+                current_url = href if href.startswith("http") else self.base_url + href
            else:
                current_url = None
@@ -146,21 +191,29 @@ class BoardsAPI:
            comment_id = tag.get("id")
            # Author
-            user_elem = tag.find('span', class_='userinfo-username-title')
+            user_elem = tag.find("span", class_="userinfo-username-title")
            username = user_elem.get_text(strip=True) if user_elem else None
            # Timestamp
-            date_elem = tag.find('span', class_='DateCreated')
+            date_elem = tag.find("span", class_="DateCreated")
            timestamp = date_elem.get_text(strip=True) if date_elem else None
-            timestamp = datetime.datetime.strptime(timestamp, "%d-%m-%Y %I:%M%p").timestamp() if timestamp else None
+            timestamp = (
                datetime.datetime.strptime(timestamp, "%d-%m-%Y %I:%M%p").timestamp()
                if timestamp
                else None
            )
            # Content
-            message_div = tag.find('div', class_='Message userContent')
+            message_div = tag.find("div", class_="Message userContent")
            if message_div and message_div.blockquote:
                message_div.blockquote.decompose()
-            content = message_div.get_text(separator="\n", strip=True) if message_div else None
+            content = (
                message_div.get_text(separator="\n", strip=True)
                if message_div
                else None
            )
            comment = Comment(
                id=comment_id,
@@ -169,10 +222,8 @@ class BoardsAPI:
                content=content,
                timestamp=timestamp,
                reply_to=None,
-                source=self.source_name
+                source=self.source_name,
            )
            comments.append(comment)
        return comments
--- a/server/connectors/reddit_api.py
+++ b/server/connectors/reddit_api.py
@@ -0,0 +1,259 @@
 import requests
 import logging
 import time
 import os
 from dotenv import load_dotenv
 from requests.auth import HTTPBasicAuth
 from dto.post import Post
 from dto.user import User
 from dto.comment import Comment
 from server.connectors.base import BaseConnector
 logger = logging.getLogger(__name__)
 CLIENT_ID = os.getenv("REDDIT_CLIENT_ID")
 CLIENT_SECRET = os.getenv("REDDIT_CLIENT_SECRET")
 class RedditAPI(BaseConnector):
    source_name: str = "reddit"
    display_name: str = "Reddit"
    search_enabled: bool = True
    categories_enabled: bool = True
    def __init__(self):
        self.url = "https://www.reddit.com/"
        self.token = None
        self.token_expiry = 0
    # Public Methods #
    def get_new_posts_by_search(
        self, search: str, category: str, post_limit: int
    ) -> list[Post]:
        prefix = f"r/{category}/" if category else ""
        params = {"limit": post_limit}
        if search:
            endpoint = f"{prefix}search.json"
            params.update(
                {"q": search, "sort": "new", "restrict_sr": "on" if category else "off"}
            )
        else:
            endpoint = f"{prefix}new.json"
        posts = []
        after = None
        while len(posts) < post_limit:
            batch_limit = min(100, post_limit - len(posts))
            params["limit"] = batch_limit
            if after:
                params["after"] = after
            data = self._fetch_post_overviews(endpoint, params)
            if not data or "data" not in data or not data["data"].get("children"):
                break
            batch_posts = self._parse_posts(data)
            posts.extend(batch_posts)
            after = data["data"].get("after")
            if not after:
                break
        return posts[:post_limit]
    def _get_new_subreddit_posts(self, subreddit: str, limit: int = 10) -> list[Post]:
        posts = []
        after = None
        url = f"r/{subreddit}/new.json"
        logger.info(f"Fetching new posts from subreddit: {subreddit}")
        while len(posts) < limit:
            batch_limit = min(100, limit - len(posts))
            params = {"limit": batch_limit, "after": after}
            data = self._fetch_post_overviews(url, params)
            batch_posts = self._parse_posts(data)
            logger.debug(
                f"Fetched {len(batch_posts)} new posts from subreddit {subreddit}"
            )
            if not batch_posts:
                break
            posts.extend(batch_posts)
            after = data["data"].get("after")
            if not after:
                break
        return posts
    def get_user(self, username: str) -> User:
        data = self._fetch_post_overviews(f"user/{username}/about.json", {})
        return self._parse_user(data)
    def category_exists(self, category: str) -> bool:
        try:
            data = self._fetch_post_overviews(f"r/{category}/about.json", {})
            return (
                data is not None
                and "data" in data
                and data["data"].get("id") is not None
            )
        except Exception:
            return False
    ## Private Methods ##
    def _parse_posts(self, data) -> list[Post]:
        posts = []
        total_num_posts = len(data["data"]["children"])
        current_index = 0
        for item in data["data"]["children"]:
            current_index += 1
            logger.debug(f"Parsing post {current_index} of {total_num_posts}")
            post_data = item["data"]
            post = Post(
                id=post_data["id"],
                author=post_data["author"],
                title=post_data["title"],
                content=post_data.get("selftext", ""),
                url=post_data["url"],
                timestamp=post_data["created_utc"],
                source=self.source_name,
                comments=self._get_post_comments(post_data["id"]),
            )
            post.subreddit = post_data["subreddit"]
            post.upvotes = post_data["ups"]
            posts.append(post)
        return posts
    def _get_post_comments(self, post_id: str) -> list[Comment]:
        comments: list[Comment] = []
        url = f"comments/{post_id}.json"
        data = self._fetch_post_overviews(url, {})
        if len(data) < 2:
            return comments
        comment_data = data[1]["data"]["children"]
        def _parse_comment_tree(items, parent_id=None):
            for item in items:
                if item["kind"] != "t1":
                    continue
                comment_info = item["data"]
                comment = Comment(
                    id=comment_info["id"],
                    post_id=post_id,
                    author=comment_info["author"],
                    content=comment_info.get("body", ""),
                    timestamp=comment_info["created_utc"],
                    reply_to=parent_id or comment_info.get("parent_id", None),
                    source=self.source_name,
                )
                comments.append(comment)
                # Process replies recursively
                replies = comment_info.get("replies")
                if replies and isinstance(replies, dict):
                    reply_items = replies.get("data", {}).get("children", [])
                    _parse_comment_tree(reply_items, parent_id=comment.id)
        _parse_comment_tree(comment_data)
        return comments
    def _parse_user(self, data) -> User:
        user_data = data["data"]
        user = User(username=user_data["name"], created_utc=user_data["created_utc"])
        user.karma = user_data["total_karma"]
        return user
    def _get_token(self):
        if self.token and time.time() < self.token_expiry:
            return self.token
        logger.info("Fetching new Reddit access token...")
        auth = HTTPBasicAuth(CLIENT_ID, CLIENT_SECRET)
        data = {
            "grant_type": "client_credentials"
        }
        headers = {
            "User-Agent": "python:ethnography-college-project:0.1 (by /u/ThisBirchWood)"
        }
        response = requests.post(
            "https://www.reddit.com/api/v1/access_token",
            auth=auth,
            data=data,
            headers=headers,
        )
        response.raise_for_status()
        token_json = response.json()
        self.token = token_json["access_token"]
        self.token_expiry = time.time() + token_json["expires_in"] - 60
        logger.info(
            f"Obtained new Reddit access token (expires in {token_json['expires_in']}s)"
        )
        return self.token
    def _fetch_post_overviews(self, endpoint: str, params: dict) -> dict:
        url = f"https://oauth.reddit.com/{endpoint.lstrip('/')}"
        max_retries = 15
        backoff = 1  # seconds
        for attempt in range(max_retries):
            try:
                response = requests.get(
                    url,
                    headers={
                        "User-agent": "python:ethnography-college-project:0.1 (by /u/ThisBirchWood)",
                        "Authorization": f"Bearer {self._get_token()}",
                    },
                    params=params,
                )
                if response.status_code == 429:
                    try:
                        wait_time = int(response.headers.get("X-Ratelimit-Reset", backoff))
                        wait_time += 1  # Add a small buffer to ensure the rate limit has reset
                    except ValueError:
                        wait_time = backoff
                    logger.warning(
                        f"Rate limited by Reddit API. Retrying in {wait_time} seconds..."
                    )
                    time.sleep(wait_time)
                    backoff *= 2
                    continue
                if response.status_code == 500:
                    logger.warning("Server error from Reddit API. Retrying...")
                    time.sleep(backoff)
                    backoff *= 2
                    continue
                response.raise_for_status()
                return response.json()
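            except requests.RequestException as e:
                logger.error(f"Error fetching data from Reddit API: {e}")
                return {}
        # Every attempt was rate limited or hit a server error; return an empty result instead of None
        logger.error(f"Giving up on {url} after {max_retries} attempts")
        return {}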
--- a/server/connectors/registry.py
+++ b/server/connectors/registry.py
@@ -0,0 +1,35 @@
 import pkgutil
 import importlib
 import server.connectors
 from server.connectors.base import BaseConnector
 def _discover_connectors() -> list[type[BaseConnector]]:
    """Walk the connectors package and collect all BaseConnector subclasses."""
    for _, module_name, _ in pkgutil.iter_modules(server.connectors.__path__):
        if module_name in ("base", "registry"):
            continue
        importlib.import_module(f"server.connectors.{module_name}")
    return [
        cls
        for cls in BaseConnector.__subclasses__()
        if getattr(cls, "source_name", None)  # guard against abstract intermediaries
    ]
 def get_available_connectors() -> dict[str, type[BaseConnector]]:
    return {c.source_name: c for c in _discover_connectors() if c.is_available()}
 def get_connector_metadata() -> dict[str, dict]:
    res = {}
    for id, obj in get_available_connectors().items():
        res[id] = {
            "id": id,
            "label": obj.display_name,
            "search_enabled": obj.search_enabled,
            "categories_enabled": obj.categories_enabled,
        }
    return res
--- a/server/connectors/youtube_api.py
+++ b/server/connectors/youtube_api.py
@@ -0,0 +1,118 @@
 import os
 import datetime
 import logging
 from dotenv import load_dotenv
 from googleapiclient.discovery import build
 from googleapiclient.errors import HttpError
 from dto.post import Post
 from dto.comment import Comment
 from server.connectors.base import BaseConnector
 load_dotenv()
 API_KEY = os.getenv("YOUTUBE_API_KEY")
 logger = logging.getLogger(__name__)
 logger.setLevel(logging.INFO)
 class YouTubeAPI(BaseConnector):
    source_name: str = "youtube"
    display_name: str = "YouTube"
    search_enabled: bool = True
    categories_enabled: bool = False
    def __init__(self):
        self.youtube = build("youtube", "v3", developerKey=API_KEY)
    def get_new_posts_by_search(
        self, search: str, category: str, post_limit: int
    ) -> list[Post]:
        videos = self._search_videos(search, post_limit)
        posts = []
        for video in videos:
            video_id = video["id"]["videoId"]
            snippet = video["snippet"]
            title = snippet["title"]
            description = snippet["description"]
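            # publishedAt is UTC ("Z" suffix), so attach the timezone before converting to epoch
            published_at = (
                datetime.datetime.strptime(snippet["publishedAt"], "%Y-%m-%dT%H:%M:%SZ")
                .replace(tzinfo=datetime.timezone.utc)
                .timestamp()
            )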
            channel_title = snippet["channelTitle"]
            comments = []
            comments_data = self._get_video_comments(video_id)
            for comment_thread in comments_data:
                comment_snippet = comment_thread["snippet"]["topLevelComment"][
                    "snippet"
                ]
                comment = Comment(
                    id=comment_thread["id"],
                    post_id=video_id,
                    content=comment_snippet["textDisplay"],
                    author=comment_snippet["authorDisplayName"],
                    timestamp=datetime.datetime.strptime(
                        comment_snippet["publishedAt"], "%Y-%m-%dT%H:%M:%SZ"
                    )
                    .replace(tzinfo=datetime.timezone.utc)
                    .timestamp(),
                    reply_to=None,
                    source=self.source_name,
                )
                comments.append(comment)
            post = Post(
                id=video_id,
                content=f"{title}\n\n{description}",
                author=channel_title,
                timestamp=published_at,
                url=f"https://www.youtube.com/watch?v={video_id}",
                title=title,
                source=self.source_name,
                comments=comments,
            )
            posts.append(post)
        return posts
    def category_exists(self, category):
        return True
    def _search_videos(self, query, limit):
        results = []
        next_page_token = None
        while len(results) < limit:
            batch_size = min(50, limit - len(results))
            request = self.youtube.search().list(
                q=query,
                part="snippet",
                type="video",
                maxResults=batch_size,
                pageToken=next_page_token,
            )
            response = request.execute()
            results.extend(response.get("items", []))
            logger.info(f"Fetched {len(results)} out of {limit} videos for query '{query}'")
            next_page_token = response.get("nextPageToken")
            if not next_page_token:
                logger.warning(f"No more pages of results available for query '{query}'")
                break
        return results[:limit]
    def _get_video_comments(self, video_id):
        request = self.youtube.commentThreads().list(
            part="snippet", videoId=video_id, textFormat="plainText"
        )
        try:
            response = request.execute()
        except HttpError as e:
            logger.error(f"Error fetching comments for video {video_id}: {e}")
            return []
        return response.get("items", [])
--- a/server/core/auth.py
+++ b/server/core/auth.py
@@ -1,6 +1,11 @@
 import re
 from server.db.database import PostgresConnector
 from flask_bcrypt import Bcrypt
 EMAIL_REGEX = re.compile(r"[^@]+@[^@]+\.[^@]+")
 class AuthManager:
    def __init__(self, db: PostgresConnector, bcrypt: Bcrypt):
        self.db = db
@@ -18,6 +23,12 @@ class AuthManager:
    def register_user(self, username, email, password):
        hashed_password = self.bcrypt.generate_password_hash(password).decode("utf-8")
        if len(username) < 3:
            raise ValueError("Username must be at least 3 characters long")
        if not EMAIL_REGEX.match(email):
            raise ValueError("Please enter a valid email address")
        if self.get_user_by_email(email):
            raise ValueError("Email already registered")
@@ -28,7 +39,7 @@ class AuthManager:
    def authenticate_user(self, username, password):
        user = self.get_user_by_username(username)
-        if user and self.bcrypt.check_password_hash(user['password_hash'], password):
+        if user and self.bcrypt.check_password_hash(user["password_hash"], password):
            return user
        return None
@@ -38,7 +49,9 @@ class AuthManager:
        return result[0] if result else None
    def get_user_by_username(self, username) -> dict:
-        query = "SELECT id, username, email, password_hash FROM users WHERE username = %s"
+        query = (
            "SELECT id, username, email, password_hash FROM users WHERE username = %s"
        )
        result = self.db.execute(query, (username,), fetch=True)
        return result[0] if result else None
--- a/server/core/datasets.py
+++ b/server/core/datasets.py
@@ -1,7 +1,8 @@
 import pandas as pd
 from server.db.database import PostgresConnector
 from psycopg2.extras import Json
-from server.exceptions import NotAuthorisedException, NonExistentDatasetException
+from server.exceptions import NonExistentDatasetException
 class DatasetManager:
    def __init__(self, db: PostgresConnector):
@@ -20,12 +21,39 @@ class DatasetManager:
    def get_user_datasets(self, user_id: int) -> list[dict]:
        query = "SELECT * FROM datasets WHERE user_id = %s"
-        return self.db.execute(query, (user_id, ), fetch=True)
+        return self.db.execute(query, (user_id,), fetch=True)
    def get_dataset_content(self, dataset_id: int) -> pd.DataFrame:
        query = "SELECT * FROM events WHERE dataset_id = %s"
        result = self.db.execute(query, (dataset_id,), fetch=True)
-        return pd.DataFrame(result)
+        df = pd.DataFrame(result)
        if df.empty:
            return df
        dedupe_columns = [
            column
            for column in [
                "post_id",
                "parent_id",
                "reply_to",
                "author",
                "type",
                "timestamp",
                "dt",
                "title",
                "content",
                "source",
                "topic",
            ]
            if column in df.columns
        ]
        if dedupe_columns:
            df = df.drop_duplicates(subset=dedupe_columns, keep="first")
        else:
            df = df.drop_duplicates(keep="first")
        return df.reset_index(drop=True)
    def get_dataset_info(self, dataset_id: int) -> dict:
        query = "SELECT * FROM datasets WHERE id = %s"
@@ -42,13 +70,25 @@ class DatasetManager:
            VALUES (%s, %s, %s)
            RETURNING id
        """
-        result = self.db.execute(query, (user_id, dataset_name, Json(topics)), fetch=True)
+        result = self.db.execute(
            query, (user_id, dataset_name, Json(topics)), fetch=True
        )
        return result[0]["id"] if result else None
    def save_dataset_content(self, dataset_id: int, event_data: pd.DataFrame):
        if event_data.empty:
            return
        dedupe_columns = [
            column for column in ["id", "type", "source"] if column in event_data.columns
        ]
        if dedupe_columns:
            event_data = event_data.drop_duplicates(subset=dedupe_columns, keep="first")
        else:
            event_data = event_data.drop_duplicates(keep="first")
        self.delete_dataset_content(dataset_id)
        query = """
            INSERT INTO events (
                dataset_id,
@@ -101,7 +141,7 @@ class DatasetManager:
                row["source"],
                row.get("topic"),
                row.get("topic_confidence"),
-                Json(row["ner_entities"]) if row.get("ner_entities") else None,
+                Json(row["entities"]) if row.get("entities") is not None else None,
                row.get("emotion_anger"),
                row.get("emotion_disgust"),
                row.get("emotion_fear"),
@@ -113,8 +153,10 @@ class DatasetManager:
        self.db.execute_batch(query, values)
-    def set_dataset_status(self, dataset_id: int, status: str, status_message: str | None = None):
-        if status not in ["processing", "complete", "error"]:
+    def set_dataset_status(
+        self, dataset_id: int, status: str, status_message: str | None = None
+    ):
+        if status not in ["fetching", "processing", "complete", "error"]:
            raise ValueError("Invalid status")
        query = """
@@ -137,7 +179,7 @@ class DatasetManager:
            WHERE id = %s
        """
-        result = self.db.execute(query, (dataset_id, ), fetch=True)
+        result = self.db.execute(query, (dataset_id,), fetch=True)
        if not result:
            print(result)
@@ -152,9 +194,9 @@ class DatasetManager:
    def delete_dataset_info(self, dataset_id: int):
        query = "DELETE FROM datasets WHERE id = %s"
-        self.db.execute(query, (dataset_id, ))
+        self.db.execute(query, (dataset_id,))
    def delete_dataset_content(self, dataset_id: int):
        query = "DELETE FROM events WHERE dataset_id = %s"
-        self.db.execute(query, (dataset_id, ))
+        self.db.execute(query, (dataset_id,))
--- a/server/db/database.py
+++ b/server/db/database.py
@@ -1,8 +1,17 @@
 import os
 import psycopg2
 from dotenv import load_dotenv
 from psycopg2.extras import RealDictCursor
 from psycopg2.extras import execute_batch
 load_dotenv()
 postgres_host = os.getenv("POSTGRES_HOST", "localhost")
 postgres_port = os.getenv("POSTGRES_PORT", 5432)
 postgres_user = os.getenv("POSTGRES_USER", "postgres")
 postgres_password = os.getenv("POSTGRES_PASSWORD", "postgres")
 postgres_db = os.getenv("POSTGRES_DB", "postgres")
 from server.exceptions import DatabaseNotConfiguredException
@@ -15,14 +24,16 @@ class PostgresConnector:
        try:
            self.connection = psycopg2.connect(
-                host=os.getenv("POSTGRES_HOST", "localhost"),
+                host=postgres_host,
-                port=os.getenv("POSTGRES_PORT", 5432),
+                port=postgres_port,
-                user=os.getenv("POSTGRES_USER", "postgres"),
+                user=postgres_user,
-                password=os.getenv("POSTGRES_PASSWORD", "postgres"),
+                password=postgres_password,
-                database=os.getenv("POSTGRES_DB", "postgres"),
+                database=postgres_db,
            )
        except psycopg2.OperationalError as e:
-            raise DatabaseNotConfiguredException(f"Ensure database is up and running: {e}")
+            raise DatabaseNotConfiguredException(
                f"Ensure database is up and running: {e}"
            )
        self.connection.autocommit = False
--- a/server/db/schema.sql
+++ b/server/db/schema.sql
@@ -23,7 +23,7 @@ CREATE TABLE datasets (
    -- Enforce valid states
    CONSTRAINT datasets_status_check
-    CHECK (status IN ('processing', 'complete', 'error'))
+    CHECK (status IN ('fetching', 'processing', 'complete', 'error'))
 );
 CREATE TABLE events (
@@ -43,7 +43,7 @@ CREATE TABLE events (
    weekday VARCHAR(255) NOT NULL,
    /* Posts Only */
-    title VARCHAR(255),
+    title TEXT,
    /* Comments Only*/
    parent_id VARCHAR(255),
--- a/server/queue/celery_app.py
+++ b/server/queue/celery_app.py
@@ -1,16 +1,23 @@
 from celery import Celery
 from dotenv import load_dotenv
 from server.utils import get_env
 load_dotenv()
 REDIS_URL = get_env("REDIS_URL")
 def create_celery():
    celery = Celery(
        "ethnograph",
-        broker="redis://redis:6379/0",
+        broker=REDIS_URL,
-        backend="redis://redis:6379/0",
+        backend=REDIS_URL,
    )
    celery.conf.task_serializer = "json"
    celery.conf.result_serializer = "json"
    celery.conf.accept_content = ["json"]
    return celery
 celery = create_celery()
 from server.queue import tasks
--- a/server/queue/tasks.py
+++ b/server/queue/tasks.py
@@ -1,9 +1,16 @@
 from time import time
 import pandas as pd
 import logging
 from server.queue.celery_app import celery
 from server.analysis.enrichment import DatasetEnrichment
 from server.db.database import PostgresConnector
 from server.core.datasets import DatasetManager
 from server.connectors.registry import get_available_connectors
 logger = logging.getLogger(__name__)
@celery.task(bind=True, max_retries=3)
 def process_dataset(self, dataset_id: int, posts: list, topics: dict):
@@ -13,10 +20,65 @@ def process_dataset(self, dataset_id: int, posts: list, topics: dict):
    try:
        df = pd.DataFrame(posts)
        dataset_manager.set_dataset_status(
            dataset_id, "processing", "NLP Processing Started"
        )
        processor = DatasetEnrichment(df, topics)
        enriched_df = processor.enrich()
        dataset_manager.save_dataset_content(dataset_id, enriched_df)
-        dataset_manager.set_dataset_status(dataset_id, "complete", "NLP Processing Completed Successfully")
+        dataset_manager.set_dataset_status(
            dataset_id, "complete", "NLP Processing Completed Successfully"
        )
    except Exception as e:
-        dataset_manager.set_dataset_status(dataset_id, "error", f"An error occurred: {e}")
+        dataset_manager.set_dataset_status(
+            dataset_id, "error", f"An error occurred: {e}"
+        )
@celery.task(bind=True, max_retries=3)
 def fetch_and_process_dataset(
    self, dataset_id: int, source_info: list[dict], topics: dict
 ):
    connectors = get_available_connectors()
    db = PostgresConnector()
    dataset_manager = DatasetManager(db)
    posts = []
    try:
        # Start the timer before the loop so fetch_time covers every source
        fetch_start = time()
        for metadata in source_info:
            name = metadata["name"]
            search = metadata.get("search")
            category = metadata.get("category")
            limit = metadata.get("limit", 100)
            connector = connectors[name]()
            raw_posts = connector.get_new_posts_by_search(
                search=search, category=category, post_limit=limit
            )
            posts.extend(post.to_dict() for post in raw_posts)
        fetch_time = time() - fetch_start
        df = pd.DataFrame(posts)
        nlp_start = time()
        dataset_manager.set_dataset_status(
            dataset_id, "processing", "NLP Processing Started"
        )
        processor = DatasetEnrichment(df, topics)
        enriched_df = processor.enrich()
        nlp_time = time() - nlp_start
        dataset_manager.save_dataset_content(dataset_id, enriched_df)
        dataset_manager.set_dataset_status(
            dataset_id, "complete", f"Completed Successfully. Fetch time: {fetch_time:.2f}s, NLP time: {nlp_time:.2f}s"
        )
    except Exception as e:
        dataset_manager.set_dataset_status(
            dataset_id, "error", f"An error occurred: {e}"
        )
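Review note: the status message reports a single fetch time for the whole task. A minimal sketch of how per-source durations could be recorded and summed instead, which also stays well defined when no sources are given. The `timed_fetch` helper and the lambda stand-ins for connector calls are hypothetical and not part of the repository.

```python
from time import perf_counter
from typing import Callable


def timed_fetch(
    sources: dict[str, Callable[[], list[dict]]],
) -> tuple[list[dict], dict[str, float]]:
    """Run each fetch callable and record its duration, keyed by source name."""
    posts: list[dict] = []
    durations: dict[str, float] = {}
    for name, fetch in sources.items():
        start = perf_counter()
        posts.extend(fetch())
        durations[name] = perf_counter() - start
    return posts, durations


# Hypothetical stand-ins for connector calls
posts, durations = timed_fetch(
    {
        "reddit": lambda: [{"id": 1}, {"id": 2}],
        "youtube": lambda: [{"id": 3}],
    }
)
total_fetch_time = sum(durations.values())
```

`perf_counter` is used here because it is monotonic, so the measured duration is not affected by system clock adjustments.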
--- a/server/topics.json
+++ b/server/topics.json
@@ -0,0 +1,67 @@
 {
  "Personal Life": "daily life, life updates, what happened today, personal stories, life events, reflections",
  "Relationships": "dating, relationships, breakups, friendships, family relationships, marriage, relationship advice",
  "Family & Parenting": "parents, parenting, children, raising kids, family dynamics, family stories",
  "Work & Careers": "jobs, workplaces, office life, promotions, quitting jobs, career advice, workplace drama",
  "Education": "school, studying, exams, university, homework, academic pressure, learning experiences",
  "Money & Finance": "saving money, debt, budgeting, cost of living, financial advice, personal finance",
  "Health & Fitness": "exercise, gym, workouts, running, diet, fitness routines, weight loss",
  "Mental Health": "stress, anxiety, depression, burnout, therapy, emotional wellbeing",
  "Food & Cooking": "meals, cooking, recipes, restaurants, snacks, food opinions",
  "Travel": "holidays, trips, tourism, travel experiences, airports, flights, travel tips",
  "Entertainment": "movies, TV shows, streaming services, celebrities, pop culture",
  "Music": "songs, albums, artists, concerts, music opinions",
  "Gaming": "video games, gaming culture, consoles, PC gaming, esports",
  "Sports": "sports matches, teams, players, competitions, sports opinions",
  "Technology": "phones, gadgets, apps, AI, software, tech trends",
  "Internet Culture": "memes, viral trends, online jokes, internet drama, trending topics",
  "Social Media": "platforms, influencers, content creators, algorithms, online communities",
  "News & Current Events": "breaking news, world events, major incidents, public discussions",
  "Politics": "political debates, elections, government policies, ideology",
  "Culture & Society": "social issues, cultural trends, generational debates, societal changes",
  "Identity & Lifestyle": "personal identity, lifestyle choices, values, self-expression",
  "Hobbies & Interests": "art, photography, crafts, collecting, hobbies",
  "Fashion & Beauty": "clothing, style, makeup, skincare, fashion trends",
  "Animals & Pets": "pets, animal videos, pet care, wildlife",
  "Humour": "jokes, funny stories, sarcasm, memes",
  "Opinions & Debates": "hot takes, controversial opinions, arguments, discussions",
  "Advice & Tips": "life advice, tutorials, how-to tips, recommendations",
  "Product Reviews": "reviews, recommendations, experiences with products",
  "Complaints & Rants": "frustrations, complaining, venting about things",
  "Motivation & Inspiration": "motivational quotes, success stories, encouragement",
  "Questions & Curiosity": "asking questions, seeking opinions, curiosity posts",
  "Celebrations & Achievements": "birthdays, milestones, achievements, good news",
  "Random Thoughts": "shower thoughts, observations, random ideas"
 }
--- a/server/utils.py
+++ b/server/utils.py
@@ -1,4 +1,5 @@
 import datetime
 import os
 from flask import request
 def parse_datetime_filter(value):
@@ -48,3 +49,9 @@ def get_request_filters() -> dict:
        filters["data_sources"] = data_sources
    return filters
 def get_env(name: str) -> str:
    value = os.getenv(name)
    if not value:
        raise RuntimeError(f"Missing required environment variable: {name}")
    return value
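Review note: a short sketch of how `get_env` behaves. The function body is copied from the diff above so the snippet runs standalone; the variable names `EXAMPLE_REDIS_URL`, `EXAMPLE_MISSING` and `EXAMPLE_EMPTY` are made up for illustration.

```python
import os


def get_env(name: str) -> str:
    # Same body as the helper added in server/utils.py
    value = os.getenv(name)
    if not value:
        raise RuntimeError(f"Missing required environment variable: {name}")
    return value


os.environ["EXAMPLE_REDIS_URL"] = "redis://localhost:6379/0"
redis_url = get_env("EXAMPLE_REDIS_URL")

# Unset and empty variables are both rejected, because the check is `if not value`.
os.environ.pop("EXAMPLE_MISSING", None)
os.environ["EXAMPLE_EMPTY"] = ""
errors = []
for name in ("EXAMPLE_MISSING", "EXAMPLE_EMPTY"):
    try:
        get_env(name)
    except RuntimeError as exc:
        errors.append(str(exc))
```

Failing at import time in `celery_app.py` means a misconfigured worker stops immediately rather than failing later on its first connection attempt.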
Author	SHA1	Message	Date
Dylan De Faoite	5970f555fa	docs(readme): update readme	2026-04-19 13:54:09 +01:00
Dylan De Faoite	9b7a51ff33	docs(report): add Declaration of Originality and Acknowledgements sections	2026-04-18 22:10:16 +01:00
Dylan De Faoite	2d39ea6e66	refactor(connector): clean up comments	2026-04-18 22:10:03 +01:00
Dylan De Faoite	c1e5482f55	docs(report): fix typos	2026-04-18 16:09:22 +01:00
Dylan De Faoite	b2d7f6edaf	docs(report): add visualizations and emotional analysis for Cork dataset	2026-04-18 15:44:04 +01:00
Dylan De Faoite	10efa664df	docs(report): fix typos and add more eval	2026-04-17 20:31:39 +01:00
Dylan De Faoite	3db7c1d3ae	docs(report): add future work section	2026-04-16 16:54:18 +01:00
Dylan De Faoite	72e17e900e	fix(report): correct typos	2026-04-16 16:41:27 +01:00
Dylan De Faoite	7b9a17f395	fix(connector): reduce ThreadPoolExecutor max_workers	2026-04-16 16:37:27 +01:00
Dylan De Faoite	0a396dd504	docs(report): add more citations	2026-04-16 16:23:36 +01:00
Dylan De Faoite	c6e8144116	docs(report): add traditionl vs digital ethnography reference	2026-04-16 16:08:59 +01:00
Dylan De Faoite	760d2daf7f	docs(report): remove redundant phrasing	2026-04-16 15:59:24 +01:00
Dylan De Faoite	ca38b992eb	build(docker): switch backend flask deployment to Gunicorn	2026-04-15 17:57:22 +01:00
Dylan De Faoite	ee9c7b4ab2	docs(report): finish evaluation & reflection	2026-04-15 17:52:54 +01:00
Dylan De Faoite	703a7c435c	fix(youtube_api): video search capped at 50	2026-04-14 17:54:43 +01:00
Dylan De Faoite	02ba727d05	chore(connector): add buffer to ratelimit reset	2026-04-14 17:41:09 +01:00
Dylan De Faoite	76591bc89e	feat(tasks): add fetch and NLP processing time logging to dataset status	2026-04-14 17:35:43 +01:00
Dylan De Faoite	e35e51d295	fix(reddit_api): handle rate limit wait time conversion error	2026-04-14 17:35:21 +01:00
Dylan De Faoite	d2fe637743	docs: update references for digital ethnography and further work on evaluation	2026-04-14 15:16:56 +01:00
Dylan De Faoite	e1831aab7d	docs(report): add researcher feedback	2026-04-13 22:00:41 +01:00
Dylan De Faoite	a3ef5a5655	chore: add more defaults to example env	2026-04-13 22:00:19 +01:00
Dylan De Faoite	5f943ce733	Merge pull request 'Corpus Explorer Feature' (#11) from feat/corpus-explorer into main Reviewed-on: #11	2026-04-13 19:02:45 +01:00
Dylan De Faoite	9964a919c3	docs(report): enhance frontend design section	2026-04-13 19:01:51 +01:00
Dylan De Faoite	c11434344a	refactor: streamline CorpusExplorer components	2026-04-13 17:06:46 +01:00
Dylan De Faoite	bc356848ef	docs(report): start frontend section	2026-04-13 16:43:20 +01:00
Dylan De Faoite	047427432f	docs(report): add summary section for dataset overview and update authentication manager details	2026-04-13 12:24:43 +01:00
Dylan De Faoite	d0d02e9ebf	docs(report): add stance markers image and update related sections	2026-04-12 16:15:18 +01:00
Dylan De Faoite	68342606e3	docs(report): add NLP backoff diagram and update references for NER model	2026-04-11 15:24:57 +01:00
Dylan De Faoite	afae7f42a1	docs(report): add data pipeline diagram and update references for embedding models	2026-04-11 15:03:24 +01:00
Dylan De Faoite	4dd2721e98	Merge remote-tracking branch 'origin/main' into feat/corpus-explorer	2026-04-10 13:19:17 +01:00
Dylan De Faoite	99afe82464	docs(report): refine emotional classification model details	2026-04-10 13:17:11 +01:00
Dylan De Faoite	8c44df94c0	docs(report): update references for emotion classification models and NLP techniques	2026-04-09 19:01:21 +01:00
Dylan De Faoite	42905cc547	docs(report): add connector implementation & design NLP docs	2026-04-08 20:39:51 +01:00
Dylan De Faoite	ec64551881	fix(connectors): update User-Agent header for BoardsAPI	2026-04-08 19:34:30 +01:00
Dylan De Faoite	e274b8295a	docs(report): add citations and start implementation section	2026-04-08 17:28:41 +01:00
Dylan De Faoite	3df6776111	docs(report): add decision tradeoff decisions	2026-04-07 18:04:25 +01:00
Dylan De Faoite	a347869353	docs(report): add more justification for ethnographic endpoints	2026-04-07 15:22:47 +01:00
Dylan De Faoite	8b4e13702e	docs(report): add ucc crest to title page	2026-04-07 12:55:01 +01:00
Dylan De Faoite	8fa4f3fbdf	refactor(report): move data pipeline above ethnographic analysis	2026-04-07 12:52:48 +01:00
Dylan De Faoite	c6cae040f0	feat(analysis): add emotional averages to stance markers	2026-04-07 12:49:18 +01:00
Dylan De Faoite	addc1d4087	docs(report): add justification at each stage	2026-04-07 12:17:02 +01:00
Dylan De Faoite	225133a074	docs(report): add ethnographic analysis section	2026-04-07 11:54:57 +01:00
Dylan De Faoite	e903e1b738	feat(user): add dominant topic information to user data	2026-04-07 11:34:03 +01:00
Dylan De Faoite	0c4dc02852	docs(report): add ethnographic analysis section	2026-04-06 19:39:09 +01:00
Dylan De Faoite	33e4291def	docs(report): add table of contents	2026-04-06 19:34:38 +01:00
Dylan De Faoite	cedbce128e	docs(report): add auto-fetch section	2026-04-06 19:32:49 +01:00
Dylan De Faoite	107dae0e95	docs(report): add data storage section	2026-04-06 19:26:10 +01:00
Dylan De Faoite	23833e2c5b	docs(report): add custom topic section	2026-04-06 18:47:29 +01:00
Dylan De Faoite	f2b6917f1f	docs(report); add data ingestion section	2026-04-06 12:44:17 +01:00
Dylan De Faoite	b57a8d3c65	docs(report): add data pipeline and connector sections Also moved requirements to the end of design, where it is more appropriately placed. Requirements can be specified after discussing potential pitfalls.	2026-04-04 14:36:52 +01:00
Dylan De Faoite	ac65e26eab	docs(report): add ethics section	2026-04-04 13:52:56 +01:00
Dylan De Faoite	6efa75dfe6	chore(connectors): reduce aggressive parallel connections to boards.ie	2026-04-04 12:33:06 +01:00
Dylan De Faoite	de61e7653f	perf(connector): add reddit API authentication to speed up fetching This aligns better with ethics and massively increases rate limits.	2026-04-04 12:26:54 +01:00
Dylan De Faoite	98aa04256b	fix(reddit_api): fix reddit ratelimit check	2026-04-04 10:20:48 +01:00
Dylan De Faoite	5f81c51979	docs(report): add scalability constraints	2026-04-03 20:06:19 +01:00
Dylan De Faoite	361b532766	docs(analysis): add feasability analysis	2026-04-03 20:02:22 +01:00
Dylan De Faoite	9ef96661fc	report(analysis): update structure & add justifications	2026-04-03 18:35:08 +01:00
Dylan De Faoite	9375abded5	docs(design): add docker & async processing sections	2026-04-03 17:59:01 +01:00
Dylan De Faoite	74ecdf238a	docs: add database schema diagram	2026-04-02 19:30:20 +01:00
Dylan De Faoite	b85987e179	docs: add system architecture diagram	2026-04-02 18:59:32 +01:00
Dylan De Faoite	37d08c63b8	chore: rename auto-scraper to auto-fetcher Improves the perception of ethics	2026-04-01 09:50:53 +01:00
Dylan De Faoite	1482e96051	feat(datasets): implement deduplication of dataset records in get_dataset_content	2026-04-01 09:06:07 +01:00
Dylan De Faoite	cd6030a760	fix(ngrams): remove stop words from ngrams	2026-04-01 08:44:47 +01:00
Dylan De Faoite	6378015726	fix(stats): remove duplicated entries in corpus explorer	2026-04-01 00:22:29 +01:00
Dylan De Faoite	430793cd09	feat(frontend): add "show more" functionality to corpus explorer	2026-04-01 00:09:20 +01:00
Dylan De Faoite	b270ed03ae	feat(frontend): implement corpus explorer This allows you to view the posts & comments associated with a specific aggregate.	2026-04-01 00:04:25 +01:00
Dylan De Faoite	1dde5f7b08	fix(nlp): fix missing `processing` dataset status update	2026-03-31 20:59:09 +01:00
Dylan De Faoite	a841c6f6a1	perf(stats): memoize derived state and reduce intermediate allocations	2026-03-31 20:15:07 +01:00
Dylan De Faoite	2045ccebb5	build(docker): update CMD to include host binding	2026-03-31 19:31:58 +01:00
Dylan De Faoite	efb4c8384d	chore(stats): remove average_thread_depth	2026-03-31 16:40:54 +01:00
Dylan De Faoite	75fd042d74	feat(api): add support for custom topic lists when autoscraping	2026-03-31 13:36:37 +01:00
Dylan De Faoite	e776ef53ac	refactor(database): configurable database source	2026-03-29 21:30:18 +01:00
Dylan De Faoite	f996b38fa5	fix(report): remove unicode char	2026-03-25 19:46:29 +00:00
Dylan De Faoite	6d8ae3e811	docs: add section on Topic Modelling in NLP	2026-03-25 19:44:14 +00:00
Dylan De Faoite	376773a0cc	style: run python linter & prettifier on backend code	2026-03-25 19:34:43 +00:00
Dylan De Faoite	aae10c4d9d	style: run prettifier plugin on entire frontend	2026-03-25 19:30:21 +00:00
Dylan De Faoite	8730af146d	chore: remove main.py Not used anymore.	2026-03-22 14:41:47 +00:00
Dylan De Faoite	7716ee0bff	build(env): extract Redis URL into env file This could allow one to connect to a remote Redis instance with a powerful GPU, allowing one to offload the NLP work.	2026-03-22 14:41:15 +00:00
Dylan De Faoite	97e897c240	fix(analysis): broken entity handling in cultural endpoint	2026-03-22 14:34:05 +00:00
Dylan De Faoite	c3762f189c	build(docker): comment out GPU deployment configuration from worker service While this works for NVIDIA GPUs, it breaks on a MacBook or any non-NVIDIA machine. I commented it out because it's still useful on these machines.	2026-03-22 13:34:51 +00:00
Dylan De Faoite	078716754c	feat(report): add main.tex for project documentation and analysis	2026-03-21 23:54:42 +00:00
Dylan De Faoite	e43eae5afd	fix(frontend): missing "fetching" status from auto-scrape When auto-scraping, the dataset status page would say "Dataset Ready" when it was still fetching.	2026-03-21 22:49:16 +00:00
Dylan De Faoite	b537b5ef16	docs: update .gitignore	2026-03-21 19:24:51 +00:00
Dylan De Faoite	acc591ff1e	Merge pull request 'Finish off the links between frontend and backend' (#10) from feat/add-frontend-pages into main Reviewed-on: #10	2026-03-18 20:30:19 +00:00
Dylan De Faoite	e054997bb1	feat(frontend): reword CulturalStats to improve understandability	2026-03-18 19:23:35 +00:00
Dylan De Faoite	e5414befa7	feat(frontend): add dominant emotion display to UserModal	2026-03-18 19:12:25 +00:00
Dylan De Faoite	86926898ce	feat(frontend): improve labels to be more understandable	2026-03-18 19:12:11 +00:00
Dylan De Faoite	b1177540a1	feat(frontend): enhance EmotionalStats component with detailed mood analysis	2026-03-18 19:11:18 +00:00
Dylan De Faoite	f604fcc531	feat(frontend): add warning message for scraping limits	2026-03-18 19:02:11 +00:00
Dylan De Faoite	b7aec2b0ea	feat(frontend): add favicon Credit goes to `srip` on flaticon for the image.	2026-03-18 19:00:31 +00:00
Dylan De Faoite	1446dd176d	feat(frontend): center page selection	2026-03-18 18:53:14 +00:00
Dylan De Faoite	c215024ef2	feat(frontend): add deleted user filter Reddit often contains "[Deleted]" when a user is banned or deletes their post/comment. Keeping the backend faithful to the original dataset is important so the filtering is being done on the frontend.	2026-03-18 18:50:51 +00:00
Dylan De Faoite	17ef42e548	feat!(frontend): add cultural, interactional and linguistic stat pages	2026-03-18 18:43:49 +00:00
Dylan De Faoite	7e4a91bb5e	style(frontend): style api types to be in order of the endpoint	2026-03-18 18:40:39 +00:00
Dylan De Faoite	436549641f	chore(frontend): add api types for new backend data	2026-03-18 18:37:39 +00:00
Dylan De Faoite	3e78a54388	feat(stat): add conversation concentration metric Remove old `initiator_ratio` metric which wasn't working due every event having a `reply_to` value. This metric was suggested by AI, and is a surprisingly interesting one that gave interesting insights.	2026-03-18 18:36:09 +00:00
Dylan De Faoite	71998c450e	fix(db): change title type to text Occasionally a Reddit post would have a long title, and would break in the schema.	2026-03-17 19:49:03 +00:00
Dylan De Faoite	2a00384a55	feat(interaction): add top interaction pairs and initiator ratio methods	2026-03-17 19:03:56 +00:00
Dylan De Faoite	8372aa7278	feat(api): add endpoint to view entire dataset	2026-03-17 13:36:41 +00:00
Dylan De Faoite	7b5a939271	fix(stats): missing private methods in User obj	2026-03-17 13:36:10 +00:00
Dylan De Faoite	2fa1dff4b7	feat(stat): add lexical diversity stat	2026-03-17 13:27:49 +00:00
Dylan De Faoite	31fb275ee3	fix(db): incorrect NER column being inserted	2026-03-17 12:53:30 +00:00
Dylan De Faoite	8a0f6e71e8	chore(api): rename cultural entity emotion endpoint	2026-03-17 12:31:53 +00:00
Dylan De Faoite	9093059d05	refactor(stats): move user stats out of interactional into users	2026-03-17 12:23:03 +00:00
Dylan De Faoite	8a13444b16	chore(frontend): add new API types	2026-03-16 16:46:07 +00:00
Dylan De Faoite	3468fdc2ea	feat(api): add new user and linguistic endpoints	2026-03-16 16:45:11 +00:00
Dylan De Faoite	09a4f9036f	refactor(stats): add summary and user stat classes for consistency	2026-03-16 16:43:24 +00:00
Dylan De Faoite	97fccd073b	feat(emotional): add average emotion & dominant emotion stats	2026-03-16 16:41:28 +00:00
Dylan De Faoite	94befb61c5	Merge pull request 'Automatic Scraping of dataset options' (#9) from feat/automatic-scraping-datasets into main Reviewed-on: #9	2026-03-14 21:58:49 +00:00
Dylan De Faoite	12f5953146	fix(api): remove error exceptions in API responses Mainly a security thing, we don't want actual code errors being given in the API response, as someone could find out how the inner workings of the code behaves.	2026-03-14 21:58:00 +00:00
Dylan De Faoite	5b0441c34b	fix(connector): unnecessary comment limits In addition, I made some methods private to better align with the BaseConnector parent class.	2026-03-14 21:53:13 +00:00
Dylan De Faoite	d2b919cd66	fix(api): enforce integer limit and cap at 1000 in scrape_data function	2026-03-14 17:35:05 +00:00
Dylan De Faoite	062937ec3c	fix(api): incorrect validation on search	2026-03-14 17:12:02 +00:00
Dylan De Faoite	2a00795cc2	chore(connectors): implement `category_exists` for Boards API	2026-03-14 17:11:49 +00:00
Dylan De Faoite	c990f29645	fix(frontend): misaligned loading page for datasets	2026-03-14 17:05:46 +00:00
Dylan De Faoite	8a423b2a29	feat(connectors): implement category validation in scraping process	2026-03-14 16:59:43 +00:00
Dylan De Faoite	d96f459104	fix(connectors): update URL references to use base_url in BoardsAPI	2026-03-13 21:59:17 +00:00
Dylan De Faoite	162a4de64e	fix(frontend): detects which sources support category or search	2026-03-12 10:07:28 +00:00
Dylan De Faoite	6684780d23	fix(connectors): add stronger validation to scrape endpoint Strong validation needed, otherwise data goes to Celery and crashes silently. In addition it checks if that specific source supports search or category.	2026-03-12 09:59:07 +00:00
Dylan De Faoite	c12f1b4371	chore(connectors): add category and search validation fields	2026-03-12 09:56:34 +00:00
Dylan De Faoite	01d6bd0164	fix(connectors): category / search fields breaking Ideally category and search are fully optional, however some sites break if one or the other is not provided. Unfortuntely `boards.ie` has a different page type for searches and I'm not bothered to implement a scraper from scratch. In addition, removed comment limit options.	2026-03-11 21:16:26 +00:00
Dylan De Faoite	12cbc24074	chore(utils): remove `split_limit` function	2026-03-11 19:47:44 +00:00
Dylan De Faoite	0658713f42	chore: remove unused dataset creation script	2026-03-11 19:44:38 +00:00
Dylan De Faoite	b2ae1a9f70	feat(frontend): add page for scraping endpoint	2026-03-11 19:41:34 +00:00
Dylan De Faoite	eff416c34e	fix(connectors): hardcoded source name in Youtube connector	2026-03-10 23:36:09 +00:00
Dylan De Faoite	524c9c50a0	fix(api): incorrect dataset status update message	2026-03-10 23:28:21 +00:00
Dylan De Faoite	2ab74d922a	feat(api): support per-source search, category and limit configuration	2026-03-10 23:15:33 +00:00
Dylan De Faoite	d520e2af98	fix(auth): missing email and username business rules	2026-03-10 22:48:04 +00:00
Dylan De Faoite	8fe84a30f6	fix: data leak when opening topics file	2026-03-10 22:45:07 +00:00
Dylan De Faoite	dc330b87b9	fix(celery): process dataset directly in fetch task Calling the original `process_dataset` function led to issues with JSON serialisation.	2026-03-10 22:17:00 +00:00
Dylan De Faoite	7ccc934f71	build: change celery to debug mode	2026-03-10 22:14:45 +00:00
Dylan De Faoite	a3dbe04a57	fix(frontend): option to delete dataset not shown after fail	2026-03-10 19:23:48 +00:00
Dylan De Faoite	a65c4a461c	fix(api): flask delegates dataset fetch to celery	2026-03-10 19:17:41 +00:00
Dylan De Faoite	15704a0782	chore(db): update db schema to include "fetching" status	2026-03-10 19:17:08 +00:00
Dylan De Faoite	6ec47256d0	feat(api): add database scraping endpoints	2026-03-10 19:04:33 +00:00
Dylan De Faoite	2572664e26	chore(utils): add env getter that fails if env not found	2026-03-10 18:50:53 +00:00
Dylan De Faoite	17bd4702b2	fix(connectors): connector detectors returning name of ID alongside connector obj	2026-03-10 18:36:40 +00:00
Dylan De Faoite	53cb5c2ea5	feat(topics): add generalised topic list This is easier and quicker compared to deriving a topics list based on the dataset that has been scraped. While using LLMs to create a personalised topic list based on the query, category or dataset itself would yield better results for most, it is beyond the scope of this project.	2026-03-10 18:36:08 +00:00
Dylan De Faoite	0866dda8b3	chore: add util to always split evenly	2026-03-10 18:25:05 +00:00
Dylan De Faoite	5ccb2e73cd	fix(connectors): incorrect registry location Registry paths were using the incorrect connector path locations.	2026-03-10 18:18:42 +00:00
Dylan De Faoite	2a8d7c7972	refactor(connectors): Youtube & Reddit connectors implement BaseConnector	2026-03-10 18:11:33 +00:00
Dylan De Faoite	e7a8c17be4	chore(connectors): add base connector inheritance	2026-03-10 18:08:01 +00:00
Dylan De Faoite	cc799f7368	feat(connectors): add base connector and registry for detection Idea is to have a "plugin-type" system, where new connectors can extend the `BaseConnector` class and implement the fetch posts method. These are automatically detected by the registry, and automatically used in new Flask endpoints that give a list of possible sources. Allows for an open-ended system where new data scrapers / API consumers can be added dynamically.	2026-03-09 21:29:03 +00:00
Dylan De Faoite	262a70dbf3	refactor(api): rename /upload endpoint Ensures consistency with the other dataset-based endpoints and follows the REST-API rules more cleanly.	2026-03-09 20:55:12 +00:00
Dylan De Faoite	ca444e9cb0	refactor: move connectors to backend dir They will now be more used in the backend.	2026-03-09 20:53:13 +00:00
Dylan De Faoite	738af5415b	Merge pull request 'Editable and removable datasets' (#8) from feat/editable-datasets into main Reviewed-on: #8	2026-03-05 16:55:48 +00:00
@@ -10,4 +10,4 @@ COPY . .
 EXPOSE 5173
-CMD ["npm", "run", "dev", "--", "--host"]
+CMD ["npm", "run", "dev", "--", "--host", "0.0.0.0"]