A web-based analytics platform for exploring online communities. Built as a final-year CS project at UCC, crosspost ingests data from Reddit, YouTube, and Boards.ie, runs NLP analysis on it (emotion detection, topic classification, named entity recognition, stance markers), and surfaces the results through an interactive dashboard. The motivating use case is digital ethnography: studying how people talk, what they talk about, and how culture forms in online spaces. The included dataset is centred on Cork, Ireland.

What it does

Fetch posts and comments from Reddit, YouTube, and Boards.ie (or upload your own .jsonl file)
Normalise everything into a unified schema regardless of source
Run NLP analysis asynchronously in the background via Celery workers
Explore results through a tabbed dashboard: temporal patterns, word clouds, emotion breakdowns, user activity, interaction graphs, topic clusters, and more
Multi-user support — each user has their own datasets, isolated from everyone else
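To give a concrete sense of what normalising into a unified schema involves, here is a minimal sketch that maps a Reddit-style post onto the same fields used in the manual-upload format further down. It is not crosspost's actual connector code: the function name `normalise_reddit_post` is hypothetical, and the input keys (`selftext`, `permalink`, `created_utc`) are assumed to follow Reddit's public JSON listing format.

```python
from typing import Any


def normalise_reddit_post(raw: dict[str, Any]) -> dict[str, Any]:
    """Map a Reddit-style post dict onto a unified post schema (hypothetical sketch)."""
    return {
        "id": str(raw["id"]),
        # Deleted accounts have no author; fall back to a placeholder.
        "author": raw.get("author") or "[deleted]",
        "title": raw.get("title", ""),
        # Reddit stores the body of text posts in "selftext".
        "content": raw.get("selftext", ""),
        # "permalink" is a site-relative path such as "/r/cork/comments/...".
        "url": "https://www.reddit.com" + raw.get("permalink", ""),
        # Unix epoch seconds as a float, matching the upload format.
        "timestamp": float(raw["created_utc"]),
        "source": "reddit",
        # Comments would be fetched and normalised separately; left empty here.
        "comments": [],
    }


example = {
    "id": "abc123",
    "author": "username",
    "title": "Post title",
    "selftext": "Post body",
    "permalink": "/r/cork/comments/abc123/post_title/",
    "created_utc": 1700000000,
}
print(normalise_reddit_post(example))
```

Each source would get its own mapping function like this one, so everything downstream only ever sees the unified field names.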

Prerequisites

Docker & Docker Compose
A Reddit app (client ID and secret)
A YouTube Data API v3 key

Setup

Clone the Repo

git clone https://github.com/your-username/crosspost.git
cd crosspost

Configure Environment Variables

cp example.env .env

Fill in every required variable that is currently empty. The ones that are already filled in are sensible defaults that usually don't need to be changed.
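A quick way to spot variables you haven't filled in yet is to scan `.env` for keys with empty values. The sketch below is a hypothetical helper (`empty_env_keys` is not part of crosspost); it assumes simple `KEY=value` lines, skips blank lines and `#` comments, and treats `""` or `''` as empty.

```python
from pathlib import Path


def empty_env_keys(text: str) -> list[str]:
    """Return keys whose value is empty in simple KEY=value .env text."""
    empty = []
    for line in text.splitlines():
        line = line.strip()
        # Skip blank lines, comments and anything without an "=".
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        # Treat an empty value, or an empty quoted value, as unset.
        if value.strip().strip('"').strip("'") == "":
            empty.append(key.strip())
    return empty


if __name__ == "__main__":
    env_path = Path(".env")
    if env_path.exists():
        missing = empty_env_keys(env_path.read_text())
        print("Empty variables:", ", ".join(missing) or "none")
```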

Start everything

docker compose up -d

This starts:

crosspost_db — PostgreSQL on port 5432
crosspost_redis — Redis on port 6379
crosspost_flask — Flask API on port 5000
crosspost_worker — Celery worker for background NLP/fetching tasks
crosspost_frontend — Vite dev server on port 5173
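Once the containers are up, a simple TCP reachability check against the ports listed above can confirm that each service is accepting connections. This is a hypothetical standard-library sketch; it assumes the default ports and that the services are published on `localhost`. The Celery worker exposes no port, so it isn't covered here; `docker compose ps` shows its state.

```python
import socket

# Default ports from the docker compose setup; adjust if you changed them.
SERVICES = {
    "crosspost_db": 5432,
    "crosspost_redis": 6379,
    "crosspost_flask": 5000,
    "crosspost_frontend": 5173,
}


def is_listening(host: str, port: int, timeout: float = 1.0) -> bool:
    """Return True if a TCP connection to host:port succeeds within the timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections and timeouts.
        return False


def check_services(host: str = "localhost", services: dict[str, int] = SERVICES) -> dict[str, bool]:
    """Map each service name to whether its port is reachable."""
    return {name: is_listening(host, port) for name, port in services.items()}


if __name__ == "__main__":
    for name, ok in check_services().items():
        print(f"{name:20} {'up' if ok else 'DOWN'}")
```

A port that answers only means something is listening there; the service logs (`docker compose logs`) are still the place to look if a component misbehaves.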

Data Format for Manual Uploads

If you want to upload your own data rather than fetch it via the connectors, the expected format is newline-delimited JSON (.jsonl) where each line is a post object:

{"id": "abc123", "author": "username", "title": "Post title", "content": "Post body", "url": "https://...", "timestamp": 1700000000.0, "source": "reddit", "comments": []}
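If your data starts out in another format, a small script can produce a file in this shape. The sketch below writes posts as one JSON object per line and checks each line for the keys shown in the example above. The helper names `write_jsonl` and `validate_jsonl` are hypothetical, and because the structure of entries inside `comments` isn't documented here, it only checks that `comments` is a list.

```python
import json
from datetime import datetime, timezone
from pathlib import Path

# Keys taken from the example post object in this README.
REQUIRED_KEYS = {"id", "author", "title", "content", "url", "timestamp", "source", "comments"}


def write_jsonl(posts: list[dict], path: Path) -> None:
    """Write one JSON object per line (newline-delimited JSON)."""
    with path.open("w", encoding="utf-8") as f:
        for post in posts:
            f.write(json.dumps(post, ensure_ascii=False) + "\n")


def validate_jsonl(path: Path) -> list[str]:
    """Return a list of problems found; an empty list means every line looked OK."""
    problems = []
    with path.open(encoding="utf-8") as f:
        for lineno, line in enumerate(f, start=1):
            if not line.strip():
                continue  # tolerate blank lines
            try:
                post = json.loads(line)
            except json.JSONDecodeError as exc:
                problems.append(f"line {lineno}: invalid JSON ({exc.msg})")
                continue
            if not isinstance(post, dict):
                problems.append(f"line {lineno}: expected a JSON object")
                continue
            missing = REQUIRED_KEYS - post.keys()
            if missing:
                problems.append(f"line {lineno}: missing {sorted(missing)}")
            if not isinstance(post.get("timestamp"), (int, float)):
                problems.append(f"line {lineno}: timestamp should be Unix epoch seconds")
            if not isinstance(post.get("comments"), list):
                problems.append(f"line {lineno}: comments should be a list")
    return problems


if __name__ == "__main__":
    post = {
        "id": "abc123", "author": "username", "title": "Post title",
        "content": "Post body", "url": "https://example.com/post",
        # A timezone-aware datetime converted to epoch seconds (1700000000.0 here).
        "timestamp": datetime(2023, 11, 14, 22, 13, 20, tzinfo=timezone.utc).timestamp(),
        "source": "reddit", "comments": [],
    }
    out = Path("posts.jsonl")
    write_jsonl([post], out)
    print(validate_jsonl(out))
```

Using a timezone-aware `datetime` before calling `.timestamp()` avoids the local-time interpretation Python applies to naive datetimes.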

Notes

GPU support: The Celery worker is configured with --pool=solo to avoid memory conflicts when multiple NLP models are loaded. If you have an NVIDIA GPU, uncomment the deploy.resources block in docker-compose.yml and make sure the NVIDIA Container Toolkit is installed.

Releases

1.0 - FYP Submission (latest), released 2026-04-19 19:53:11 +01:00
