Compare commits
88 Commits
f604fcc531
...
main
| SHA1 | Author | Date | |
|---|---|---|---|
| 5970f555fa | |||
| 9b7a51ff33 | |||
| 2d39ea6e66 | |||
| c1e5482f55 | |||
| b2d7f6edaf | |||
| 10efa664df | |||
| 3db7c1d3ae | |||
| 72e17e900e | |||
| 7b9a17f395 | |||
| 0a396dd504 | |||
| c6e8144116 | |||
| 760d2daf7f | |||
| ca38b992eb | |||
| ee9c7b4ab2 | |||
| 703a7c435c | |||
| 02ba727d05 | |||
| 76591bc89e | |||
| e35e51d295 | |||
| d2fe637743 | |||
| e1831aab7d | |||
| a3ef5a5655 | |||
| 5f943ce733 | |||
| 9964a919c3 | |||
| c11434344a | |||
| bc356848ef | |||
| 047427432f | |||
| d0d02e9ebf | |||
| 68342606e3 | |||
| afae7f42a1 | |||
| 4dd2721e98 | |||
| 99afe82464 | |||
| 8c44df94c0 | |||
| 42905cc547 | |||
| ec64551881 | |||
| e274b8295a | |||
| 3df6776111 | |||
| a347869353 | |||
| 8b4e13702e | |||
| 8fa4f3fbdf | |||
| c6cae040f0 | |||
| addc1d4087 | |||
| 225133a074 | |||
| e903e1b738 | |||
| 0c4dc02852 | |||
| 33e4291def | |||
| cedbce128e | |||
| 107dae0e95 | |||
| 23833e2c5b | |||
| f2b6917f1f | |||
| b57a8d3c65 | |||
| ac65e26eab | |||
| 6efa75dfe6 | |||
| de61e7653f | |||
| 98aa04256b | |||
| 5f81c51979 | |||
| 361b532766 | |||
| 9ef96661fc | |||
| 9375abded5 | |||
| 74ecdf238a | |||
| b85987e179 | |||
| 37d08c63b8 | |||
| 1482e96051 | |||
| cd6030a760 | |||
| 6378015726 | |||
| 430793cd09 | |||
| b270ed03ae | |||
| 1dde5f7b08 | |||
| a841c6f6a1 | |||
| 2045ccebb5 | |||
| efb4c8384d | |||
| 75fd042d74 | |||
| e776ef53ac | |||
| f996b38fa5 | |||
| 6d8ae3e811 | |||
| 376773a0cc | |||
| aae10c4d9d | |||
| 8730af146d | |||
| 7716ee0bff | |||
| 97e897c240 | |||
| c3762f189c | |||
| 078716754c | |||
| e43eae5afd | |||
| b537b5ef16 | |||
| acc591ff1e | |||
| e054997bb1 | |||
| e5414befa7 | |||
| 86926898ce | |||
| b1177540a1 |
5
.gitignore
vendored
@@ -10,4 +10,7 @@ __pycache__/
|
|||||||
node_modules/
|
node_modules/
|
||||||
dist/
|
dist/
|
||||||
|
|
||||||
*.sh
|
helper
|
||||||
|
db
|
||||||
|
report/build
|
||||||
|
.DS_Store
|
||||||
60
README.md
@@ -1,29 +1,49 @@
|
|||||||
# crosspost
|
# crosspost
|
||||||
**crosspost** is a browser-based tool designed to support *digital ethnography*, the study of how people interact, communicate, and form culture in online spaces such as forums, social media platforms, and comment-driven communities.
|
A web-based analytics platform for exploring online communities. Built as a final year CS project at UCC, crosspost ingests data from Reddit, YouTube, and Boards.ie, runs NLP analysis on it (emotion detection, topic classification, named entity recognition, stance markers), and surfaces the results through an interactive dashboard.
|
||||||
|
The motivating use case is digital ethnography — studying how people talk, what they talk about, and how culture forms in online spaces. The included dataset is centred on Cork, Ireland.
|
||||||
|
|
||||||
The project aims to make it easier for students, researchers, and journalists to collect, organise, and explore online discourse in a structured and ethical way, without requiring deep technical expertise.
|
## What it does
|
||||||
|
- Fetch posts and comments from Reddit, YouTube, and Boards.ie (or upload your own .jsonl file)
|
||||||
|
- Normalise everything into a unified schema regardless of source
|
||||||
|
- Run NLP analysis asynchronously in the background via Celery workers
|
||||||
|
- Explore results through a tabbed dashboard: temporal patterns, word clouds, emotion breakdowns, user activity, interaction graphs, topic clusters, and more
|
||||||
|
- Multi-user support — each user has their own datasets, isolated from everyone else
|
||||||
|
|
||||||
By combining data ingestion, analysis, and visualisation in a single system, crosspost turns raw online interactions into meaningful insights about how conversations emerge, evolve, and spread across platforms.
|
# Prerequisites
|
||||||
|
- Docker & Docker Compose
|
||||||
|
- A Reddit App (client id & secret)
|
||||||
|
- YouTube Data v3 API Key
|
||||||
|
|
||||||
## Goals for this project
|
# Setup
|
||||||
- Collect data ethically: enable users to link/upload text, images, and interaction data (messages etc.) from specified online communities. Potentially, an automated method for importing (using APIs or scraping techniques) could be included as well.
|
1) **Clone the Repo**
|
||||||
- Organise content: Store gathered material in a structured database with tagging for themes, dates, and sources.
|
```
|
||||||
- Analyse patterns: Use natural language processing (NLP) to detect frequent keywords, sentiment, and interaction networks.
|
git clone https://github.com/your-username/crosspost.git
|
||||||
- Visualise insights: Present findings as charts, timelines, and network diagrams to reveal how conversations and topics evolve.
|
cd crosspost
|
||||||
- Have clearly stated and explained ethical and privacy guidelines for users. The student will design the architecture, implement data pipelines, integrate basic NLP models, and create an interactive dashboard.
|
```
|
||||||
|
|
||||||
Beyond programming, the project involves applying ethical research principles, handling data responsibly, and designing for non-technical users. By the end, the project will demonstrate how computer science can bridge technology and social research — turning raw online interactions into meaningful cultural insights.
|
2) **Configure Environment Vars**
|
||||||
|
```
|
||||||
|
cp example.env .env
|
||||||
|
```
|
||||||
|
Fill in each required empty variable. Some already have values; these are sensible defaults that usually don't need to be changed.
|
||||||
|
|
||||||
## Scope
|
3) **Start everything**
|
||||||
|
```
|
||||||
|
docker compose up -d
|
||||||
|
```
|
||||||
|
|
||||||
This project focuses on:
|
This starts:
|
||||||
- Designing a modular data ingestion pipeline
|
- `crosspost_db` — PostgreSQL on port 5432
|
||||||
- Implementing backend data processing and storage
|
- `crosspost_redis` — Redis on port 6379
|
||||||
- Integrating lightweight NLP-based analysis
|
- `crosspost_flask` — Flask API on port 5000
|
||||||
- Building a simple, accessible frontend for exploration and visualisation
|
- `crosspost_worker` — Celery worker for background NLP/fetching tasks
|
||||||
|
- `crosspost_frontend` — Vite dev server on port 5173
|
||||||
|
|
||||||
# Requirements
|
# Data Format for Manual Uploads
|
||||||
|
If you want to upload your own data rather than fetch it via the connectors, the expected format is newline-delimited JSON (.jsonl) where each line is a post object:
|
||||||
|
```json
|
||||||
|
{"id": "abc123", "author": "username", "title": "Post title", "content": "Post body", "url": "https://...", "timestamp": 1700000000.0, "source": "reddit", "comments": []}
|
||||||
|
```
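As a minimal sketch of producing a file in the format above, the Python below writes post objects as one JSON document per line and reads them back. The field names mirror the example line; the helper names (`write_jsonl`, `read_jsonl`), the `REQUIRED_FIELDS` set, and the choice to treat every key as required are assumptions made for illustration and are not part of crosspost.

```python
import json
from pathlib import Path

# Keys taken from the example line above; treating all of them as
# required is an assumption made for this sketch.
REQUIRED_FIELDS = {
    "id", "author", "title", "content",
    "url", "timestamp", "source", "comments",
}


def write_jsonl(posts, path):
    """Write each post dict as one JSON object per line (hypothetical helper)."""
    with Path(path).open("w", encoding="utf-8") as handle:
        for post in posts:
            missing = REQUIRED_FIELDS - post.keys()
            if missing:
                raise ValueError(f"post {post.get('id')!r} is missing {sorted(missing)}")
            # json.dumps escapes newlines inside strings, so each post
            # always occupies exactly one line of the file.
            handle.write(json.dumps(post, ensure_ascii=False) + "\n")


def read_jsonl(path):
    """Parse a .jsonl file back into a list of dicts, skipping blank lines."""
    with Path(path).open(encoding="utf-8") as handle:
        return [json.loads(line) for line in handle if line.strip()]
```

Checking keys before writing means a malformed record fails locally rather than after upload; whether the server enforces the same set of fields is not stated in this diff.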
|
||||||
|
|
||||||
- **Python** ≥ 3.9
|
# Notes
|
||||||
- **Python packages** listed in `requirements.txt`
|
- **GPU support**: The Celery worker is configured with `--pool=solo` to avoid memory conflicts when multiple NLP models are loaded. If you have an NVIDIA GPU, uncomment the deploy.resources block in docker-compose.yml and make sure the NVIDIA Container Toolkit is installed.
|
||||||
- npm ≥ version 11
|
|
||||||
@@ -28,7 +28,7 @@ services:
|
|||||||
- .env
|
- .env
|
||||||
ports:
|
ports:
|
||||||
- "5000:5000"
|
- "5000:5000"
|
||||||
command: flask --app server.app run --host=0.0.0.0 --debug
|
command: gunicorn server.app:app --bind 0.0.0.0:5000 --workers 2 --threads 4
|
||||||
depends_on:
|
depends_on:
|
||||||
- postgres
|
- postgres
|
||||||
- redis
|
- redis
|
||||||
@@ -69,4 +69,4 @@ services:
|
|||||||
- backend
|
- backend
|
||||||
|
|
||||||
volumes:
|
volumes:
|
||||||
model_cache:
|
model_cache:
|
||||||
|
|||||||
@@ -1,8 +0,0 @@
|
|||||||
# Generic User Data Transfer Object for social media platforms
|
|
||||||
class User:
|
|
||||||
def __init__(self, username: str, created_utc: int, ):
|
|
||||||
self.username = username
|
|
||||||
self.created_utc = created_utc
|
|
||||||
|
|
||||||
# Optionals
|
|
||||||
self.karma = None
|
|
||||||
20
example.env
@@ -1,13 +1,16 @@
|
|||||||
# API Keys
|
# API Keys
|
||||||
YOUTUBE_API_KEY=
|
YOUTUBE_API_KEY=
|
||||||
|
REDDIT_CLIENT_ID=
|
||||||
|
REDDIT_CLIENT_SECRET=
|
||||||
|
|
||||||
# Database
|
# Database
|
||||||
POSTGRES_USER=
|
# Database
|
||||||
POSTGRES_PASSWORD=
|
POSTGRES_USER=postgres
|
||||||
POSTGRES_DB=
|
POSTGRES_PASSWORD=postgres
|
||||||
POSTGRES_HOST=
|
POSTGRES_DB=mydatabase
|
||||||
|
POSTGRES_HOST=postgres
|
||||||
POSTGRES_PORT=5432
|
POSTGRES_PORT=5432
|
||||||
POSTGRES_DIR=
|
POSTGRES_DIR=./db
|
||||||
|
|
||||||
# JWT
|
# JWT
|
||||||
JWT_SECRET_KEY=
|
JWT_SECRET_KEY=
|
||||||
@@ -18,5 +21,10 @@ HF_HOME=/models/huggingface
|
|||||||
TRANSFORMERS_CACHE=/models/huggingface
|
TRANSFORMERS_CACHE=/models/huggingface
|
||||||
TORCH_HOME=/models/torch
|
TORCH_HOME=/models/torch
|
||||||
|
|
||||||
# Frontend
|
# URLs
|
||||||
FRONTEND_URL=http://localhost:5173
|
FRONTEND_URL=http://localhost:5173
|
||||||
|
BACKEND_URL=http://backend:5000
|
||||||
|
REDIS_URL=redis://redis:6379/0
|
||||||
|
|
||||||
|
# API & Scraping
|
||||||
|
MAX_FETCH_LIMIT=1000
|
||||||
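How the backend consumes these variables is not shown in this diff. As a hedged sketch, a Python service could assemble its connection settings from them as follows. The variable names and fallback values come from `example.env`; the `Settings` dataclass, the `load_settings` function, and the `postgresql://` URL layout are assumptions for illustration only.

```python
import os
from dataclasses import dataclass
from urllib.parse import quote


@dataclass(frozen=True)
class Settings:
    database_url: str
    redis_url: str
    max_fetch_limit: int


def load_settings(env=None):
    """Build settings from the variables named in example.env (hypothetical helper)."""
    env = os.environ if env is None else env
    # Percent-encode credentials so characters such as "@" or ":" in a
    # password cannot break the URL structure.
    user = quote(env.get("POSTGRES_USER", "postgres"), safe="")
    password = quote(env.get("POSTGRES_PASSWORD", "postgres"), safe="")
    host = env.get("POSTGRES_HOST", "postgres")
    port = env.get("POSTGRES_PORT", "5432")
    db = env.get("POSTGRES_DB", "mydatabase")
    return Settings(
        database_url=f"postgresql://{user}:{password}@{host}:{port}/{db}",
        redis_url=env.get("REDIS_URL", "redis://redis:6379/0"),
        max_fetch_limit=int(env.get("MAX_FETCH_LIMIT", "1000")),
    )
```

Passing a plain dict instead of reading `os.environ` directly keeps the function easy to exercise without touching the real environment.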
@@ -10,4 +10,4 @@ COPY . .
|
|||||||
|
|
||||||
EXPOSE 5173
|
EXPOSE 5173
|
||||||
|
|
||||||
CMD ["npm", "run", "dev", "--", "--host"]
|
CMD ["npm", "run", "dev", "--", "--host", "0.0.0.0"]
|
||||||
@@ -5,7 +5,7 @@ import DatasetsPage from "./pages/Datasets";
|
|||||||
import DatasetStatusPage from "./pages/DatasetStatus";
|
import DatasetStatusPage from "./pages/DatasetStatus";
|
||||||
import LoginPage from "./pages/Login";
|
import LoginPage from "./pages/Login";
|
||||||
import UploadPage from "./pages/Upload";
|
import UploadPage from "./pages/Upload";
|
||||||
import AutoScrapePage from "./pages/AutoScrape";
|
import AutoFetchPage from "./pages/AutoFetch";
|
||||||
import StatPage from "./pages/Stats";
|
import StatPage from "./pages/Stats";
|
||||||
import { getDocumentTitle } from "./utils/documentTitle";
|
import { getDocumentTitle } from "./utils/documentTitle";
|
||||||
import DatasetEditPage from "./pages/DatasetEdit";
|
import DatasetEditPage from "./pages/DatasetEdit";
|
||||||
@@ -23,7 +23,7 @@ function App() {
|
|||||||
<Route path="/" element={<Navigate to="/login" replace />} />
|
<Route path="/" element={<Navigate to="/login" replace />} />
|
||||||
<Route path="/login" element={<LoginPage />} />
|
<Route path="/login" element={<LoginPage />} />
|
||||||
<Route path="/upload" element={<UploadPage />} />
|
<Route path="/upload" element={<UploadPage />} />
|
||||||
<Route path="/auto-scrape" element={<AutoScrapePage />} />
|
<Route path="/auto-fetch" element={<AutoFetchPage />} />
|
||||||
<Route path="/datasets" element={<DatasetsPage />} />
|
<Route path="/datasets" element={<DatasetsPage />} />
|
||||||
<Route path="/dataset/:datasetId/status" element={<DatasetStatusPage />} />
|
<Route path="/dataset/:datasetId/status" element={<DatasetStatusPage />} />
|
||||||
<Route path="/dataset/:datasetId/stats" element={<StatPage />} />
|
<Route path="/dataset/:datasetId/stats" element={<StatPage />} />
|
||||||
|
|||||||
@@ -3,7 +3,7 @@ import axios from "axios";
|
|||||||
import { Outlet, useLocation, useNavigate } from "react-router-dom";
|
import { Outlet, useLocation, useNavigate } from "react-router-dom";
|
||||||
import StatsStyling from "../styles/stats_styling";
|
import StatsStyling from "../styles/stats_styling";
|
||||||
|
|
||||||
const API_BASE_URL = import.meta.env.VITE_BACKEND_URL
|
const API_BASE_URL = import.meta.env.VITE_BACKEND_URL;
|
||||||
|
|
||||||
type ProfileResponse = {
|
type ProfileResponse = {
|
||||||
user?: Record<string, unknown>;
|
user?: Record<string, unknown>;
|
||||||
@@ -33,7 +33,10 @@ const AppLayout = () => {
|
|||||||
const location = useLocation();
|
const location = useLocation();
|
||||||
const navigate = useNavigate();
|
const navigate = useNavigate();
|
||||||
const [isSignedIn, setIsSignedIn] = useState(false);
|
const [isSignedIn, setIsSignedIn] = useState(false);
|
||||||
const [currentUser, setCurrentUser] = useState<Record<string, unknown> | null>(null);
|
const [currentUser, setCurrentUser] = useState<Record<
|
||||||
|
string,
|
||||||
|
unknown
|
||||||
|
> | null>(null);
|
||||||
|
|
||||||
const syncAuthState = useCallback(async () => {
|
const syncAuthState = useCallback(async () => {
|
||||||
const token = localStorage.getItem("access_token");
|
const token = localStorage.getItem("access_token");
|
||||||
@@ -48,7 +51,9 @@ const AppLayout = () => {
|
|||||||
axios.defaults.headers.common.Authorization = `Bearer ${token}`;
|
axios.defaults.headers.common.Authorization = `Bearer ${token}`;
|
||||||
|
|
||||||
try {
|
try {
|
||||||
const response = await axios.get<ProfileResponse>(`${API_BASE_URL}/profile`);
|
const response = await axios.get<ProfileResponse>(
|
||||||
|
`${API_BASE_URL}/profile`,
|
||||||
|
);
|
||||||
setIsSignedIn(true);
|
setIsSignedIn(true);
|
||||||
setCurrentUser(response.data.user ?? null);
|
setCurrentUser(response.data.user ?? null);
|
||||||
} catch {
|
} catch {
|
||||||
@@ -81,27 +86,35 @@ const AppLayout = () => {
|
|||||||
<div style={{ ...styles.container, ...styles.appHeaderWrap }}>
|
<div style={{ ...styles.container, ...styles.appHeaderWrap }}>
|
||||||
<div style={{ ...styles.card, ...styles.headerBar }}>
|
<div style={{ ...styles.card, ...styles.headerBar }}>
|
||||||
<div style={styles.appHeaderBrandRow}>
|
<div style={styles.appHeaderBrandRow}>
|
||||||
<span style={styles.appTitle}>
|
<span style={styles.appTitle}>CrossPost Analysis Engine</span>
|
||||||
CrossPost Analysis Engine
|
|
||||||
</span>
|
|
||||||
<span
|
<span
|
||||||
style={{
|
style={{
|
||||||
...styles.authStatusBadge,
|
...styles.authStatusBadge,
|
||||||
...(isSignedIn ? styles.authStatusSignedIn : styles.authStatusSignedOut),
|
...(isSignedIn
|
||||||
|
? styles.authStatusSignedIn
|
||||||
|
: styles.authStatusSignedOut),
|
||||||
}}
|
}}
|
||||||
>
|
>
|
||||||
{isSignedIn ? `Signed in: ${getUserLabel(currentUser)}` : "Not signed in"}
|
{isSignedIn
|
||||||
|
? `Signed in: ${getUserLabel(currentUser)}`
|
||||||
|
: "Not signed in"}
|
||||||
</span>
|
</span>
|
||||||
</div>
|
</div>
|
||||||
|
|
||||||
<div style={styles.controlsWrapped}>
|
<div style={styles.controlsWrapped}>
|
||||||
{isSignedIn && <button
|
{isSignedIn && (
|
||||||
type="button"
|
<button
|
||||||
style={location.pathname === "/datasets" ? styles.buttonPrimary : styles.buttonSecondary}
|
type="button"
|
||||||
onClick={() => navigate("/datasets")}
|
style={
|
||||||
>
|
location.pathname === "/datasets"
|
||||||
My datasets
|
? styles.buttonPrimary
|
||||||
</button>}
|
: styles.buttonSecondary
|
||||||
|
}
|
||||||
|
onClick={() => navigate("/datasets")}
|
||||||
|
>
|
||||||
|
My datasets
|
||||||
|
</button>
|
||||||
|
)}
|
||||||
|
|
||||||
<button
|
<button
|
||||||
type="button"
|
type="button"
|
||||||
|
|||||||
@@ -8,20 +8,20 @@ const Card = (props: {
|
|||||||
value: string | number;
|
value: string | number;
|
||||||
sublabel?: string;
|
sublabel?: string;
|
||||||
rightSlot?: React.ReactNode;
|
rightSlot?: React.ReactNode;
|
||||||
style?: CSSProperties
|
style?: CSSProperties;
|
||||||
}) => {
|
}) => {
|
||||||
return (
|
return (
|
||||||
<div style={{ ...styles.cardBase, ...props.style }}>
|
<div style={{ ...styles.cardBase, ...props.style }}>
|
||||||
<div style={styles.cardTopRow}>
|
<div style={styles.cardTopRow}>
|
||||||
<div style={styles.cardLabel}>
|
<div style={styles.cardLabel}>{props.label}</div>
|
||||||
{props.label}
|
|
||||||
</div>
|
|
||||||
{props.rightSlot ? <div>{props.rightSlot}</div> : null}
|
{props.rightSlot ? <div>{props.rightSlot}</div> : null}
|
||||||
</div>
|
</div>
|
||||||
<div style={styles.cardValue}>{props.value}</div>
|
<div style={styles.cardValue}>{props.value}</div>
|
||||||
{props.sublabel ? <div style={styles.cardSubLabel}>{props.sublabel}</div> : null}
|
{props.sublabel ? (
|
||||||
|
<div style={styles.cardSubLabel}>{props.sublabel}</div>
|
||||||
|
) : null}
|
||||||
</div>
|
</div>
|
||||||
);
|
);
|
||||||
}
|
};
|
||||||
|
|
||||||
export default Card;
|
export default Card;
|
||||||
|
|||||||
@@ -34,10 +34,20 @@ export default function ConfirmationModal({
|
|||||||
<p style={styles.sectionSubtitle}>{message}</p>
|
<p style={styles.sectionSubtitle}>{message}</p>
|
||||||
|
|
||||||
<div style={{ display: "flex", justifyContent: "flex-end", gap: 8 }}>
|
<div style={{ display: "flex", justifyContent: "flex-end", gap: 8 }}>
|
||||||
<button type="button" onClick={onCancel} style={styles.buttonSecondary} disabled={loading}>
|
<button
|
||||||
|
type="button"
|
||||||
|
onClick={onCancel}
|
||||||
|
style={styles.buttonSecondary}
|
||||||
|
disabled={loading}
|
||||||
|
>
|
||||||
{cancelLabel}
|
{cancelLabel}
|
||||||
</button>
|
</button>
|
||||||
<button type="button" onClick={onConfirm} style={styles.buttonDanger} disabled={loading}>
|
<button
|
||||||
|
type="button"
|
||||||
|
onClick={onConfirm}
|
||||||
|
style={styles.buttonDanger}
|
||||||
|
disabled={loading}
|
||||||
|
>
|
||||||
{loading ? "Deleting..." : confirmLabel}
|
{loading ? "Deleting..." : confirmLabel}
|
||||||
</button>
|
</button>
|
||||||
</div>
|
</div>
|
||||||
|
|||||||
247
frontend/src/components/CorpusExplorer.tsx
Normal file
@@ -0,0 +1,247 @@
|
|||||||
|
import { useEffect, useState } from "react";
|
||||||
|
import { Dialog, DialogPanel, DialogTitle } from "@headlessui/react";
|
||||||
|
|
||||||
|
import StatsStyling from "../styles/stats_styling";
|
||||||
|
import type { DatasetRecord } from "../utils/corpusExplorer";
|
||||||
|
|
||||||
|
const styles = StatsStyling;
|
||||||
|
const INITIAL_RECORD_COUNT = 60;
|
||||||
|
const RECORD_BATCH_SIZE = 60;
|
||||||
|
const EXCERPT_LENGTH = 320;
|
||||||
|
|
||||||
|
const cleanText = (value: unknown) => {
|
||||||
|
if (typeof value !== "string") {
|
||||||
|
return "";
|
||||||
|
}
|
||||||
|
|
||||||
|
const trimmed = value.trim();
|
||||||
|
if (!trimmed) {
|
||||||
|
return "";
|
||||||
|
}
|
||||||
|
|
||||||
|
const lowered = trimmed.toLowerCase();
|
||||||
|
if (lowered === "nan" || lowered === "null" || lowered === "undefined") {
|
||||||
|
return "";
|
||||||
|
}
|
||||||
|
|
||||||
|
return trimmed;
|
||||||
|
};
|
||||||
|
|
||||||
|
const displayText = (value: unknown, fallback: string) => {
|
||||||
|
const cleaned = cleanText(value);
|
||||||
|
return cleaned || fallback;
|
||||||
|
};
|
||||||
|
|
||||||
|
type CorpusExplorerProps = {
|
||||||
|
open: boolean;
|
||||||
|
onClose: () => void;
|
||||||
|
title: string;
|
||||||
|
description: string;
|
||||||
|
records: DatasetRecord[];
|
||||||
|
loading: boolean;
|
||||||
|
error: string;
|
||||||
|
emptyMessage: string;
|
||||||
|
};
|
||||||
|
|
||||||
|
const formatRecordDate = (record: DatasetRecord) => {
|
||||||
|
if (typeof record.dt === "string" && record.dt) {
|
||||||
|
const date = new Date(record.dt);
|
||||||
|
if (!Number.isNaN(date.getTime())) {
|
||||||
|
return date.toLocaleString();
|
||||||
|
}
|
||||||
|
}
|
||||||
|
|
||||||
|
if (typeof record.date === "string" && record.date) {
|
||||||
|
return record.date;
|
||||||
|
}
|
||||||
|
|
||||||
|
if (typeof record.timestamp === "number") {
|
||||||
|
return new Date(record.timestamp * 1000).toLocaleString();
|
||||||
|
}
|
||||||
|
|
||||||
|
return "Unknown time";
|
||||||
|
};
|
||||||
|
|
||||||
|
const getRecordKey = (record: DatasetRecord, index: number) =>
|
||||||
|
String(record.id ?? record.post_id ?? `${record.author ?? "record"}-${index}`);
|
||||||
|
|
||||||
|
const getRecordTitle = (record: DatasetRecord) => {
|
||||||
|
if (record.type === "comment") {
|
||||||
|
return "";
|
||||||
|
}
|
||||||
|
|
||||||
|
const title = cleanText(record.title);
|
||||||
|
if (title) {
|
||||||
|
return title;
|
||||||
|
}
|
||||||
|
|
||||||
|
const content = cleanText(record.content);
|
||||||
|
if (!content) {
|
||||||
|
return "Untitled record";
|
||||||
|
}
|
||||||
|
|
||||||
|
return content.length > 120 ? `${content.slice(0, 117)}...` : content;
|
||||||
|
};
|
||||||
|
|
||||||
|
const CorpusExplorer = ({
|
||||||
|
open,
|
||||||
|
onClose,
|
||||||
|
title,
|
||||||
|
description,
|
||||||
|
records,
|
||||||
|
loading,
|
||||||
|
error,
|
||||||
|
emptyMessage,
|
||||||
|
}: CorpusExplorerProps) => {
|
||||||
|
const [visibleCount, setVisibleCount] = useState(INITIAL_RECORD_COUNT);
|
||||||
|
const [expandedKeys, setExpandedKeys] = useState<Record<string, boolean>>({});
|
||||||
|
|
||||||
|
useEffect(() => {
|
||||||
|
if (open) {
|
||||||
|
setVisibleCount(INITIAL_RECORD_COUNT);
|
||||||
|
setExpandedKeys({});
|
||||||
|
}
|
||||||
|
}, [open, title, records.length]);
|
||||||
|
|
||||||
|
const hasMoreRecords = visibleCount < records.length;
|
||||||
|
|
||||||
|
return (
|
||||||
|
<Dialog open={open} onClose={onClose} style={styles.modalRoot}>
|
||||||
|
<div style={styles.modalBackdrop} />
|
||||||
|
|
||||||
|
<div style={styles.modalContainer}>
|
||||||
|
<DialogPanel
|
||||||
|
style={{
|
||||||
|
...styles.card,
|
||||||
|
...styles.modalPanel,
|
||||||
|
width: "min(960px, 96vw)",
|
||||||
|
maxHeight: "88vh",
|
||||||
|
display: "flex",
|
||||||
|
flexDirection: "column",
|
||||||
|
gap: 12,
|
||||||
|
overflow: "hidden",
|
||||||
|
}}
|
||||||
|
>
|
||||||
|
<div style={styles.headerBar}>
|
||||||
|
<div style={{ minWidth: 0 }}>
|
||||||
|
<DialogTitle style={styles.sectionTitle}>{title}</DialogTitle>
|
||||||
|
<p style={styles.sectionSubtitle}>
|
||||||
|
{description} {loading ? "Loading records..." : `${records.length.toLocaleString()} records.`}
|
||||||
|
</p>
|
||||||
|
</div>
|
||||||
|
|
||||||
|
<button onClick={onClose} style={styles.buttonSecondary}>
|
||||||
|
Close
|
||||||
|
</button>
|
||||||
|
</div>
|
||||||
|
|
||||||
|
{error ? <p style={styles.sectionSubtitle}>{error}</p> : null}
|
||||||
|
|
||||||
|
{!loading && !error && !records.length ? (
|
||||||
|
<p style={styles.sectionSubtitle}>{emptyMessage}</p>
|
||||||
|
) : null}
|
||||||
|
|
||||||
|
{loading ? <div style={styles.topUserMeta}>Preparing corpus slice...</div> : null}
|
||||||
|
|
||||||
|
{!loading && !error && records.length ? (
|
||||||
|
<>
|
||||||
|
<div
|
||||||
|
style={{
|
||||||
|
...styles.topUsersList,
|
||||||
|
overflowY: "auto",
|
||||||
|
overflowX: "hidden",
|
||||||
|
paddingRight: 4,
|
||||||
|
}}
|
||||||
|
>
|
||||||
|
{records.slice(0, visibleCount).map((record, index) => {
|
||||||
|
const recordKey = getRecordKey(record, index);
|
||||||
|
const titleText = getRecordTitle(record);
|
||||||
|
const content = cleanText(record.content);
|
||||||
|
const isExpanded = !!expandedKeys[recordKey];
|
||||||
|
const canExpand = content.length > EXCERPT_LENGTH;
|
||||||
|
const excerpt =
|
||||||
|
canExpand && !isExpanded
|
||||||
|
? `${content.slice(0, EXCERPT_LENGTH - 3)}...`
|
||||||
|
: content || "No content available.";
|
||||||
|
|
||||||
|
return (
|
||||||
|
<div key={recordKey} style={styles.topUserItem}>
|
||||||
|
<div style={{ ...styles.headerBar, alignItems: "flex-start" }}>
|
||||||
|
<div style={{ minWidth: 0, flex: 1 }}>
|
||||||
|
{titleText ? <div style={styles.topUserName}>{titleText}</div> : null}
|
||||||
|
<div
|
||||||
|
style={{
|
||||||
|
...styles.topUserMeta,
|
||||||
|
overflowWrap: "anywhere",
|
||||||
|
wordBreak: "break-word",
|
||||||
|
}}
|
||||||
|
>
|
||||||
|
{displayText(record.author, "Unknown author")} • {displayText(record.source, "Unknown source")} • {displayText(record.type, "record")} • {formatRecordDate(record)}
|
||||||
|
</div>
|
||||||
|
</div>
|
||||||
|
<div
|
||||||
|
style={{
|
||||||
|
...styles.topUserMeta,
|
||||||
|
marginLeft: 12,
|
||||||
|
textAlign: "right",
|
||||||
|
overflowWrap: "anywhere",
|
||||||
|
wordBreak: "break-word",
|
||||||
|
}}
|
||||||
|
>
|
||||||
|
{cleanText(record.topic) ? `Topic: ${cleanText(record.topic)}` : ""}
|
||||||
|
</div>
|
||||||
|
</div>
|
||||||
|
|
||||||
|
<div
|
||||||
|
style={{
|
||||||
|
...styles.topUserMeta,
|
||||||
|
marginTop: 8,
|
||||||
|
whiteSpace: "pre-wrap",
|
||||||
|
overflowWrap: "anywhere",
|
||||||
|
wordBreak: "break-word",
|
||||||
|
}}
|
||||||
|
>
|
||||||
|
{excerpt}
|
||||||
|
</div>
|
||||||
|
|
||||||
|
{canExpand ? (
|
||||||
|
<div style={{ marginTop: 10 }}>
|
||||||
|
<button
|
||||||
|
onClick={() =>
|
||||||
|
setExpandedKeys((current) => ({
|
||||||
|
...current,
|
||||||
|
[recordKey]: !current[recordKey],
|
||||||
|
}))
|
||||||
|
}
|
||||||
|
style={styles.buttonSecondary}
|
||||||
|
>
|
||||||
|
{isExpanded ? "Show Less" : "Show More"}
|
||||||
|
</button>
|
||||||
|
</div>
|
||||||
|
) : null}
|
||||||
|
</div>
|
||||||
|
);
|
||||||
|
})}
|
||||||
|
</div>
|
||||||
|
|
||||||
|
{hasMoreRecords ? (
|
||||||
|
<div style={{ display: "flex", justifyContent: "center" }}>
|
||||||
|
<button
|
||||||
|
onClick={() =>
|
||||||
|
setVisibleCount((current) => current + RECORD_BATCH_SIZE)
|
||||||
|
}
|
||||||
|
style={styles.buttonSecondary}
|
||||||
|
>
|
||||||
|
Show More Records
|
||||||
|
</button>
|
||||||
|
</div>
|
||||||
|
) : null}
|
||||||
|
</>
|
||||||
|
) : null}
|
||||||
|
</DialogPanel>
|
||||||
|
</div>
|
||||||
|
</Dialog>
|
||||||
|
);
|
||||||
|
};
|
||||||
|
|
||||||
|
export default CorpusExplorer;
|
||||||
@@ -1,110 +1,240 @@
|
|||||||
import Card from "./Card";
|
import Card from "./Card";
|
||||||
import StatsStyling from "../styles/stats_styling";
|
import StatsStyling from "../styles/stats_styling";
|
||||||
import type { CulturalAnalysisResponse } from "../types/ApiTypes";
|
import type { CulturalAnalysisResponse } from "../types/ApiTypes";
|
||||||
|
import {
|
||||||
|
buildCertaintySpec,
|
||||||
|
buildDeonticSpec,
|
||||||
|
buildEntitySpec,
|
||||||
|
buildHedgeSpec,
|
||||||
|
buildIdentityBucketSpec,
|
||||||
|
buildPermissionSpec,
|
||||||
|
type CorpusExplorerSpec,
|
||||||
|
} from "../utils/corpusExplorer";
|
||||||
|
|
||||||
const styles = StatsStyling;
|
const styles = StatsStyling;
|
||||||
|
const exploreButtonStyle = { padding: "4px 8px", fontSize: 12 };
|
||||||
|
|
||||||
type CulturalStatsProps = {
|
type CulturalStatsProps = {
|
||||||
data: CulturalAnalysisResponse;
|
data: CulturalAnalysisResponse;
|
||||||
|
onExplore: (spec: CorpusExplorerSpec) => void;
|
||||||
};
|
};
|
||||||
|
|
||||||
const CulturalStats = ({ data }: CulturalStatsProps) => {
|
const renderExploreButton = (onClick: () => void) => (
|
||||||
|
<button
|
||||||
|
onClick={onClick}
|
||||||
|
style={{ ...styles.buttonSecondary, ...exploreButtonStyle }}
|
||||||
|
>
|
||||||
|
Explore
|
||||||
|
</button>
|
||||||
|
);
|
||||||
|
|
||||||
|
const CulturalStats = ({ data, onExplore }: CulturalStatsProps) => {
|
||||||
const identity = data.identity_markers;
|
const identity = data.identity_markers;
|
||||||
const stance = data.stance_markers;
|
const stance = data.stance_markers;
|
||||||
|
const inGroupWords = identity?.in_group_usage ?? 0;
|
||||||
|
const outGroupWords = identity?.out_group_usage ?? 0;
|
||||||
|
const totalGroupWords = inGroupWords + outGroupWords;
|
||||||
|
const inGroupWordRate =
|
||||||
|
typeof identity?.in_group_ratio === "number"
|
||||||
|
? identity.in_group_ratio * 100
|
||||||
|
: null;
|
||||||
|
const outGroupWordRate =
|
||||||
|
typeof identity?.out_group_ratio === "number"
|
||||||
|
? identity.out_group_ratio * 100
|
||||||
|
: null;
|
||||||
const rawEntities = data.avg_emotion_per_entity?.entity_emotion_avg ?? {};
|
const rawEntities = data.avg_emotion_per_entity?.entity_emotion_avg ?? {};
|
||||||
const entities = Object.entries(rawEntities)
|
const entities = Object.entries(rawEntities)
|
||||||
.sort((a, b) => (b[1].post_count - a[1].post_count))
|
.sort((a, b) => b[1].post_count - a[1].post_count)
|
||||||
.slice(0, 20);
|
.slice(0, 20);
|
||||||
|
|
||||||
const topEmotion = (emotionAvg: Record<string, number> | undefined) => {
|
const topEmotion = (emotionAvg: Record<string, number> | undefined) => {
|
||||||
const entries = Object.entries(emotionAvg ?? {});
|
const entries = Object.entries(emotionAvg ?? {});
|
||||||
if (!entries.length) {
|
if (!entries.length) {
|
||||||
return "—";
|
return "-";
|
||||||
}
|
}
|
||||||
|
|
||||||
entries.sort((a, b) => b[1] - a[1]);
|
entries.sort((a, b) => b[1] - a[1]);
|
||||||
const dominant = entries[0] ?? ["emotion_unknown", 0];
|
const dominant = entries[0] ?? ["emotion_unknown", 0];
|
||||||
const dominantLabel = dominant[0].replace("emotion_", "");
|
const dominantLabel = dominant[0].replace("emotion_", "");
|
||||||
return `${dominantLabel} (${dominant[1].toFixed(3)})`;
|
return `${dominantLabel} (${(dominant[1] * 100).toFixed(1)}%)`;
|
||||||
};
|
};
|
||||||
|
|
||||||
return (
|
return (
|
||||||
<div style={styles.page}>
|
<div style={styles.page}>
|
||||||
<div style={{ ...styles.container, ...styles.grid }}>
|
<div style={{ ...styles.container, ...styles.grid }}>
|
||||||
|
<div style={{ ...styles.card, gridColumn: "span 12" }}>
|
||||||
|
<h2 style={styles.sectionTitle}>Community Framing Overview</h2>
|
||||||
|
<p style={styles.sectionSubtitle}>
|
||||||
|
Simple view of how often people use "us" words vs "them" words, and
|
||||||
|
the tone around that language.
|
||||||
|
</p>
|
||||||
|
</div>
|
||||||
|
|
||||||
<Card
|
<Card
|
||||||
label="In-Group Usage"
|
label="In-Group Words"
|
||||||
value={identity?.in_group_usage?.toLocaleString() ?? "—"}
|
value={inGroupWords.toLocaleString()}
|
||||||
sublabel="we/us/our references"
|
sublabel="Times we/us/our appears"
|
||||||
style={{ gridColumn: "span 3" }}
|
style={{ gridColumn: "span 3" }}
|
||||||
/>
|
/>
|
||||||
<Card
|
<Card
|
||||||
label="Out-Group Usage"
|
label="Out-Group Words"
|
||||||
value={identity?.out_group_usage?.toLocaleString() ?? "—"}
|
value={outGroupWords.toLocaleString()}
|
||||||
sublabel="they/them/their references"
|
sublabel="Times they/them/their appears"
|
||||||
style={{ gridColumn: "span 3" }}
|
style={{ gridColumn: "span 3" }}
|
||||||
/>
|
/>
|
||||||
<Card
|
<Card
|
||||||
label="In-Group Posts"
|
label="In-Group Posts"
|
||||||
value={identity?.in_group_posts?.toLocaleString() ?? "—"}
|
value={identity?.in_group_posts?.toLocaleString() ?? "-"}
|
||||||
sublabel="Posts with stronger in-group language"
|
sublabel='Posts leaning toward "us" language'
|
||||||
|
rightSlot={renderExploreButton(() =>
|
||||||
|
onExplore(buildIdentityBucketSpec("in")),
|
||||||
|
)}
|
||||||
style={{ gridColumn: "span 3" }}
|
style={{ gridColumn: "span 3" }}
|
||||||
/>
|
/>
|
||||||
<Card
|
<Card
|
||||||
label="Out-Group Posts"
|
label="Out-Group Posts"
|
||||||
value={identity?.out_group_posts?.toLocaleString() ?? "—"}
|
value={identity?.out_group_posts?.toLocaleString() ?? "-"}
|
||||||
sublabel="Posts with stronger out-group language"
|
sublabel='Posts leaning toward "them" language'
|
||||||
|
rightSlot={renderExploreButton(() =>
|
||||||
|
onExplore(buildIdentityBucketSpec("out")),
|
||||||
|
)}
|
||||||
style={{ gridColumn: "span 3" }}
|
style={{ gridColumn: "span 3" }}
|
||||||
/>
|
/>
|
||||||
|
|
||||||
<Card
|
<Card
|
||||||
label="Hedge Markers"
|
label="Balanced Posts"
|
||||||
value={stance?.hedge_total?.toLocaleString() ?? "—"}
|
value={identity?.tie_posts?.toLocaleString() ?? "-"}
|
||||||
sublabel={typeof stance?.hedge_per_1k_tokens === "number" ? `${stance.hedge_per_1k_tokens.toFixed(3)} per 1k tokens` : "Marker frequency"}
|
sublabel="Posts with equal us/them signals"
|
||||||
|
rightSlot={renderExploreButton(() =>
|
||||||
|
onExplore(buildIdentityBucketSpec("tie")),
|
||||||
|
)}
|
||||||
style={{ gridColumn: "span 3" }}
|
style={{ gridColumn: "span 3" }}
|
||||||
/>
|
/>
|
||||||
<Card
|
<Card
|
||||||
label="Certainty Markers"
|
label="Total Group Words"
|
||||||
value={stance?.certainty_total?.toLocaleString() ?? "—"}
|
value={totalGroupWords.toLocaleString()}
|
||||||
sublabel={typeof stance?.certainty_per_1k_tokens === "number" ? `${stance.certainty_per_1k_tokens.toFixed(3)} per 1k tokens` : "Marker frequency"}
|
sublabel="In-group + out-group words"
|
||||||
style={{ gridColumn: "span 3" }}
|
style={{ gridColumn: "span 3" }}
|
||||||
/>
|
/>
|
||||||
<Card
|
<Card
|
||||||
label="Deontic Markers"
|
label="In-Group Share"
|
||||||
value={stance?.deontic_total?.toLocaleString() ?? "—"}
|
value={
|
||||||
sublabel={typeof stance?.deontic_per_1k_tokens === "number" ? `${stance.deontic_per_1k_tokens.toFixed(3)} per 1k tokens` : "Marker frequency"}
|
inGroupWordRate === null ? "-" : `${inGroupWordRate.toFixed(2)}%`
|
||||||
|
}
|
||||||
|
sublabel="Share of all words"
|
||||||
style={{ gridColumn: "span 3" }}
|
style={{ gridColumn: "span 3" }}
|
||||||
/>
|
/>
|
||||||
<Card
|
<Card
|
||||||
label="Permission Markers"
|
label="Out-Group Share"
|
||||||
value={stance?.permission_total?.toLocaleString() ?? "—"}
|
value={
|
||||||
sublabel={typeof stance?.permission_per_1k_tokens === "number" ? `${stance.permission_per_1k_tokens.toFixed(3)} per 1k tokens` : "Marker frequency"}
|
outGroupWordRate === null ? "-" : `${outGroupWordRate.toFixed(2)}%`
|
||||||
|
}
|
||||||
|
sublabel="Share of all words"
|
||||||
|
style={{ gridColumn: "span 3" }}
|
||||||
|
/>
|
||||||
|
|
||||||
|
<Card
|
||||||
|
label="Hedging Words"
|
||||||
|
value={stance?.hedge_total?.toLocaleString() ?? "-"}
|
||||||
|
sublabel={
|
||||||
|
typeof stance?.hedge_per_1k_tokens === "number"
|
||||||
|
? `${stance.hedge_per_1k_tokens.toFixed(1)} per 1k words`
|
||||||
|
: "Word frequency"
|
||||||
|
}
|
||||||
|
rightSlot={renderExploreButton(() => onExplore(buildHedgeSpec()))}
|
||||||
|
style={{ gridColumn: "span 3" }}
|
||||||
|
/>
|
||||||
|
<Card
|
||||||
|
label="Certainty Words"
|
||||||
|
value={stance?.certainty_total?.toLocaleString() ?? "-"}
|
||||||
|
sublabel={
|
||||||
|
typeof stance?.certainty_per_1k_tokens === "number"
|
||||||
|
? `${stance.certainty_per_1k_tokens.toFixed(1)} per 1k words`
|
||||||
|
: "Word frequency"
|
||||||
|
}
|
||||||
|
rightSlot={renderExploreButton(() => onExplore(buildCertaintySpec()))}
|
||||||
|
style={{ gridColumn: "span 3" }}
|
||||||
|
/>
|
||||||
|
<Card
|
||||||
|
label="Need/Should Words"
|
||||||
|
value={stance?.deontic_total?.toLocaleString() ?? "-"}
|
||||||
|
sublabel={
|
||||||
|
typeof stance?.deontic_per_1k_tokens === "number"
|
||||||
|
? `${stance.deontic_per_1k_tokens.toFixed(1)} per 1k words`
|
||||||
|
: "Word frequency"
|
||||||
|
}
|
||||||
|
rightSlot={renderExploreButton(() => onExplore(buildDeonticSpec()))}
|
||||||
|
style={{ gridColumn: "span 3" }}
|
||||||
|
/>
|
||||||
|
<Card
|
||||||
|
label="Permission Words"
|
||||||
|
value={stance?.permission_total?.toLocaleString() ?? "-"}
|
||||||
|
sublabel={
|
||||||
|
typeof stance?.permission_per_1k_tokens === "number"
|
||||||
|
? `${stance.permission_per_1k_tokens.toFixed(1)} per 1k words`
|
||||||
|
: "Word frequency"
|
||||||
|
}
|
||||||
|
rightSlot={renderExploreButton(() => onExplore(buildPermissionSpec()))}
|
||||||
style={{ gridColumn: "span 3" }}
|
style={{ gridColumn: "span 3" }}
|
||||||
/>
|
/>
|
||||||
|
|
||||||
<div style={{ ...styles.card, gridColumn: "span 6" }}>
|
<div style={{ ...styles.card, gridColumn: "span 6" }}>
|
||||||
<h2 style={styles.sectionTitle}>In-Group Emotion Profile</h2>
|
<h2 style={styles.sectionTitle}>Mood in "Us" Posts</h2>
|
||||||
<p style={styles.sectionSubtitle}>Dominant average emotion where in-group framing is stronger.</p>
|
<p style={styles.sectionSubtitle}>
|
||||||
|
Most likely emotion when in-group wording is stronger.
|
||||||
|
</p>
|
||||||
<div style={styles.topUserName}>{topEmotion(identity?.in_group_emotion_avg)}</div>
|
<div style={styles.topUserName}>{topEmotion(identity?.in_group_emotion_avg)}</div>
|
||||||
|
<div style={{ marginTop: 12 }}>
|
||||||
|
<button
|
||||||
|
onClick={() => onExplore(buildIdentityBucketSpec("in"))}
|
||||||
|
style={styles.buttonSecondary}
|
||||||
|
>
|
||||||
|
Explore records
|
||||||
|
</button>
|
||||||
|
</div>
|
||||||
</div>
|
</div>
|
||||||
|
|
||||||
<div style={{ ...styles.card, gridColumn: "span 6" }}>
|
<div style={{ ...styles.card, gridColumn: "span 6" }}>
|
||||||
<h2 style={styles.sectionTitle}>Out-Group Emotion Profile</h2>
|
<h2 style={styles.sectionTitle}>Mood in "Them" Posts</h2>
|
||||||
<p style={styles.sectionSubtitle}>Dominant average emotion where out-group framing is stronger.</p>
|
<p style={styles.sectionSubtitle}>
|
||||||
|
Most likely emotion when out-group wording is stronger.
|
||||||
|
</p>
|
||||||
<div style={styles.topUserName}>{topEmotion(identity?.out_group_emotion_avg)}</div>
|
<div style={styles.topUserName}>{topEmotion(identity?.out_group_emotion_avg)}</div>
|
||||||
|
<div style={{ marginTop: 12 }}>
|
||||||
|
<button
|
||||||
|
onClick={() => onExplore(buildIdentityBucketSpec("out"))}
|
||||||
|
style={styles.buttonSecondary}
|
||||||
|
>
|
||||||
|
Explore records
|
||||||
|
</button>
|
||||||
|
</div>
|
||||||
</div>
|
</div>
|
||||||
|
|
||||||
<div style={{ ...styles.card, gridColumn: "span 12" }}>
|
<div style={{ ...styles.card, gridColumn: "span 12" }}>
|
||||||
<h2 style={styles.sectionTitle}>Entity Emotion Averages</h2>
|
<h2 style={styles.sectionTitle}>Entity Mood Snapshot</h2>
|
||||||
<p style={styles.sectionSubtitle}>Most frequent entities and their dominant average emotion signature.</p>
|
<p style={styles.sectionSubtitle}>
|
||||||
|
Most mentioned entities and the mood that appears most with each.
|
||||||
|
</p>
|
||||||
{!entities.length ? (
|
{!entities.length ? (
|
||||||
<div style={styles.topUserMeta}>No entity-level cultural data available.</div>
|
<div style={styles.topUserMeta}>No entity-level cultural data available.</div>
|
||||||
) : (
|
) : (
|
||||||
<div style={{ ...styles.topUsersList, maxHeight: 420, overflowY: "auto" }}>
|
<div
|
||||||
|
style={{
|
||||||
|
...styles.topUsersList,
|
||||||
|
maxHeight: 420,
|
||||||
|
overflowY: "auto",
|
||||||
|
}}
|
||||||
|
>
|
||||||
{entities.map(([entity, aggregate]) => (
|
{entities.map(([entity, aggregate]) => (
|
||||||
<div key={entity} style={styles.topUserItem}>
|
<div
|
||||||
|
key={entity}
|
||||||
|
style={{ ...styles.topUserItem, cursor: "pointer" }}
|
||||||
|
onClick={() => onExplore(buildEntitySpec(entity))}
|
||||||
|
>
|
||||||
<div style={styles.topUserName}>{entity}</div>
|
<div style={styles.topUserName}>{entity}</div>
|
||||||
<div style={styles.topUserMeta}>
|
<div style={styles.topUserMeta}>
|
||||||
{aggregate.post_count.toLocaleString()} posts • Dominant emotion: {topEmotion(aggregate.emotion_avg)}
|
{aggregate.post_count.toLocaleString()} posts • Likely mood:{" "}
|
||||||
|
{topEmotion(aggregate.emotion_avg)}
|
||||||
</div>
|
</div>
|
||||||
</div>
|
</div>
|
||||||
))}
|
))}
|
||||||
|
|||||||
@@ -1,14 +1,25 @@
|
|||||||
import type { ContentAnalysisResponse } from "../types/ApiTypes"
|
import type { EmotionalAnalysisResponse } from "../types/ApiTypes";
|
||||||
import StatsStyling from "../styles/stats_styling";
|
import StatsStyling from "../styles/stats_styling";
|
||||||
|
import {
|
||||||
|
buildDominantEmotionSpec,
|
||||||
|
buildSourceSpec,
|
||||||
|
buildTopicSpec,
|
||||||
|
type CorpusExplorerSpec,
|
||||||
|
} from "../utils/corpusExplorer";
|
||||||
|
|
||||||
const styles = StatsStyling;
|
const styles = StatsStyling;
|
||||||
|
|
||||||
type EmotionalStatsProps = {
|
type EmotionalStatsProps = {
|
||||||
contentData: ContentAnalysisResponse;
|
emotionalData: EmotionalAnalysisResponse;
|
||||||
}
|
onExplore: (spec: CorpusExplorerSpec) => void;
|
||||||
|
};
|
||||||
|
|
||||||
const EmotionalStats = ({contentData}: EmotionalStatsProps) => {
|
const EmotionalStats = ({ emotionalData, onExplore }: EmotionalStatsProps) => {
|
||||||
const rows = contentData.average_emotion_by_topic ?? [];
|
const rows = emotionalData.average_emotion_by_topic ?? [];
|
||||||
|
const overallEmotionAverage = emotionalData.overall_emotion_average ?? [];
|
||||||
|
const dominantEmotionDistribution =
|
||||||
|
emotionalData.dominant_emotion_distribution ?? [];
|
||||||
|
const emotionBySource = emotionalData.emotion_by_source ?? [];
|
||||||
const lowSampleThreshold = 20;
|
const lowSampleThreshold = 20;
|
||||||
const stableSampleThreshold = 50;
|
const stableSampleThreshold = 50;
|
||||||
const emotionKeys = rows.length
|
const emotionKeys = rows.length
|
||||||
@@ -31,7 +42,7 @@ const EmotionalStats = ({contentData}: EmotionalStatsProps) => {
|
|||||||
topic: String(row.topic),
|
topic: String(row.topic),
|
||||||
count: Number(row.n ?? 0),
|
count: Number(row.n ?? 0),
|
||||||
emotion: maxKey.replace("emotion_", "") || "unknown",
|
emotion: maxKey.replace("emotion_", "") || "unknown",
|
||||||
value: maxValue > Number.NEGATIVE_INFINITY ? maxValue : 0
|
value: maxValue > Number.NEGATIVE_INFINITY ? maxValue : 0,
|
||||||
};
|
};
|
||||||
});
|
});
|
||||||
|
|
||||||
@@ -45,8 +56,12 @@ const EmotionalStats = ({contentData}: EmotionalStatsProps) => {
|
|||||||
.filter((count) => Number.isFinite(count) && count > 0)
|
.filter((count) => Number.isFinite(count) && count > 0)
|
||||||
.sort((a, b) => a - b);
|
.sort((a, b) => a - b);
|
||||||
|
|
||||||
const lowSampleTopics = strongestPerTopic.filter((topic) => topic.count < lowSampleThreshold).length;
|
const lowSampleTopics = strongestPerTopic.filter(
|
||||||
const stableSampleTopics = strongestPerTopic.filter((topic) => topic.count >= stableSampleThreshold).length;
|
(topic) => topic.count < lowSampleThreshold,
|
||||||
|
).length;
|
||||||
|
const stableSampleTopics = strongestPerTopic.filter(
|
||||||
|
(topic) => topic.count >= stableSampleThreshold,
|
||||||
|
).length;
|
||||||
|
|
||||||
const medianSampleSize = sampleSizes.length
|
const medianSampleSize = sampleSizes.length
|
||||||
? sampleSizes[Math.floor(sampleSizes.length / 2)]
|
? sampleSizes[Math.floor(sampleSizes.length / 2)]
|
||||||
@@ -64,42 +79,184 @@ const EmotionalStats = ({contentData}: EmotionalStatsProps) => {
|
|||||||
return (
|
return (
|
||||||
<div style={styles.page}>
|
<div style={styles.page}>
|
||||||
<div style={{ ...styles.container, ...styles.card, marginTop: 16 }}>
|
<div style={{ ...styles.container, ...styles.card, marginTop: 16 }}>
|
||||||
<h2 style={styles.sectionTitle}>Average Emotion by Topic</h2>
|
<h2 style={styles.sectionTitle}>Topic Mood Overview</h2>
|
||||||
<p style={styles.sectionSubtitle}>Read confidence together with sample size. Topics with fewer than {lowSampleThreshold} events are usually noisy and less reliable.</p>
|
<p style={styles.sectionSubtitle}>
|
||||||
|
Use the strength score together with post count. Topics with fewer
|
||||||
|
than {lowSampleThreshold} events are often noisy.
|
||||||
|
</p>
|
||||||
<div style={styles.emotionalSummaryRow}>
|
<div style={styles.emotionalSummaryRow}>
|
||||||
<span><strong style={{ color: "#24292f" }}>Topics:</strong> {strongestPerTopic.length}</span>
|
<span>
|
||||||
<span><strong style={{ color: "#24292f" }}>Median Sample:</strong> {medianSampleSize} events</span>
|
<strong style={{ color: "#24292f" }}>Topics:</strong>{" "}
|
||||||
<span><strong style={{ color: "#24292f" }}>Low Sample (&lt;{lowSampleThreshold}):</strong> {lowSampleTopics}</span>
|
{strongestPerTopic.length}
|
||||||
<span><strong style={{ color: "#24292f" }}>Stable Sample ({stableSampleThreshold}+):</strong> {stableSampleTopics}</span>
|
</span>
|
||||||
|
<span>
|
||||||
|
<strong style={{ color: "#24292f" }}>Median Posts:</strong>{" "}
|
||||||
|
{medianSampleSize}
|
||||||
|
</span>
|
||||||
|
<span>
|
||||||
|
<strong style={{ color: "#24292f" }}>
|
||||||
|
Small Topics (&lt;{lowSampleThreshold}):
|
||||||
|
</strong>{" "}
|
||||||
|
{lowSampleTopics}
|
||||||
|
</span>
|
||||||
|
<span>
|
||||||
|
<strong style={{ color: "#24292f" }}>
|
||||||
|
Stable Topics ({stableSampleThreshold}+):
|
||||||
|
</strong>{" "}
|
||||||
|
{stableSampleTopics}
|
||||||
|
</span>
|
||||||
</div>
|
</div>
|
||||||
<p style={{ ...styles.sectionSubtitle, marginTop: 10, marginBottom: 0 }}>
|
<p
|
||||||
Confidence reflects how strongly one emotion leads within a topic, not model accuracy. Use larger samples for stronger conclusions.
|
style={{ ...styles.sectionSubtitle, marginTop: 10, marginBottom: 0 }}
|
||||||
|
>
|
||||||
|
Strength means how far the top emotion is ahead in that topic. It does
|
||||||
|
not mean model accuracy.
|
||||||
</p>
|
</p>
|
||||||
</div>
|
</div>
|
||||||
|
|
||||||
<div style={{ ...styles.container, ...styles.grid }}>
|
<div style={{ ...styles.container, ...styles.grid }}>
|
||||||
{strongestPerTopic.map((topic) => (
|
<div style={{ ...styles.card, gridColumn: "span 4" }}>
|
||||||
<div key={topic.topic} style={{ ...styles.card, gridColumn: "span 4" }}>
|
<h2 style={styles.sectionTitle}>Mood Averages</h2>
|
||||||
<h3 style={{ ...styles.sectionTitle, marginBottom: 6 }}>{topic.topic}</h3>
|
<p style={styles.sectionSubtitle}>Average score for each emotion.</p>
|
||||||
<div style={styles.emotionalTopicLabel}>
|
{!overallEmotionAverage.length ? (
|
||||||
Top Emotion
|
<div style={styles.topUserMeta}>
|
||||||
|
No overall emotion averages available.
|
||||||
</div>
|
</div>
|
||||||
<div style={styles.emotionalTopicValue}>
|
) : (
|
||||||
{formatEmotion(topic.emotion)}
|
<div
|
||||||
|
style={{
|
||||||
|
...styles.topUsersList,
|
||||||
|
maxHeight: 260,
|
||||||
|
overflowY: "auto",
|
||||||
|
}}
|
||||||
|
>
|
||||||
|
{[...overallEmotionAverage]
|
||||||
|
.sort((a, b) => b.score - a.score)
|
||||||
|
.map((row) => (
|
||||||
|
<div
|
||||||
|
key={row.emotion}
|
||||||
|
style={{ ...styles.topUserItem, cursor: "pointer" }}
|
||||||
|
onClick={() => onExplore(buildDominantEmotionSpec(row.emotion))}
|
||||||
|
>
|
||||||
|
<div style={styles.topUserName}>
|
||||||
|
{formatEmotion(row.emotion)}
|
||||||
|
</div>
|
||||||
|
<div style={styles.topUserMeta}>{row.score.toFixed(3)}</div>
|
||||||
|
</div>
|
||||||
|
))}
|
||||||
</div>
|
</div>
|
||||||
<div style={styles.emotionalMetricRow}>
|
)}
|
||||||
<span>Confidence</span>
|
</div>
|
||||||
<span style={styles.emotionalMetricValue}>{topic.value.toFixed(3)}</span>
|
|
||||||
|
<div style={{ ...styles.card, gridColumn: "span 4" }}>
|
||||||
|
<h2 style={styles.sectionTitle}>Mood Split</h2>
|
||||||
|
<p style={styles.sectionSubtitle}>
|
||||||
|
How often each emotion is dominant.
|
||||||
|
</p>
|
||||||
|
{!dominantEmotionDistribution.length ? (
|
||||||
|
<div style={styles.topUserMeta}>
|
||||||
|
No dominant-emotion split available.
|
||||||
</div>
|
</div>
|
||||||
<div style={styles.emotionalMetricRowCompact}>
|
) : (
|
||||||
<span>Sample Size</span>
|
<div
|
||||||
<span style={styles.emotionalMetricValue}>{topic.count} events</span>
|
style={{
|
||||||
|
...styles.topUsersList,
|
||||||
|
maxHeight: 260,
|
||||||
|
overflowY: "auto",
|
||||||
|
}}
|
||||||
|
>
|
||||||
|
{[...dominantEmotionDistribution]
|
||||||
|
.sort((a, b) => b.ratio - a.ratio)
|
||||||
|
.map((row) => (
|
||||||
|
<div
|
||||||
|
key={row.emotion}
|
||||||
|
style={{ ...styles.topUserItem, cursor: "pointer" }}
|
||||||
|
onClick={() => onExplore(buildDominantEmotionSpec(row.emotion))}
|
||||||
|
>
|
||||||
|
<div style={styles.topUserName}>
|
||||||
|
{formatEmotion(row.emotion)}
|
||||||
|
</div>
|
||||||
|
<div style={styles.topUserMeta}>
|
||||||
|
{(row.ratio * 100).toFixed(1)}% •{" "}
|
||||||
|
{row.count.toLocaleString()} events
|
||||||
|
</div>
|
||||||
|
</div>
|
||||||
|
))}
|
||||||
</div>
|
</div>
|
||||||
|
)}
|
||||||
|
</div>
|
||||||
|
|
||||||
|
<div style={{ ...styles.card, gridColumn: "span 4" }}>
|
||||||
|
<h2 style={styles.sectionTitle}>Mood by Source</h2>
|
||||||
|
<p style={styles.sectionSubtitle}>Leading emotion in each source.</p>
|
||||||
|
{!emotionBySource.length ? (
|
||||||
|
<div style={styles.topUserMeta}>
|
||||||
|
No source emotion profile available.
|
||||||
|
</div>
|
||||||
|
) : (
|
||||||
|
<div
|
||||||
|
style={{
|
||||||
|
...styles.topUsersList,
|
||||||
|
maxHeight: 260,
|
||||||
|
overflowY: "auto",
|
||||||
|
}}
|
||||||
|
>
|
||||||
|
{[...emotionBySource]
|
||||||
|
.sort((a, b) => b.event_count - a.event_count)
|
||||||
|
.map((row) => (
|
||||||
|
<div
|
||||||
|
key={row.source}
|
||||||
|
style={{ ...styles.topUserItem, cursor: "pointer" }}
|
||||||
|
onClick={() => onExplore(buildSourceSpec(row.source))}
|
||||||
|
>
|
||||||
|
<div style={styles.topUserName}>{row.source}</div>
|
||||||
|
<div style={styles.topUserMeta}>
|
||||||
|
{formatEmotion(row.dominant_emotion)} •{" "}
|
||||||
|
{row.dominant_score.toFixed(3)} •{" "}
|
||||||
|
{row.event_count.toLocaleString()} events
|
||||||
|
</div>
|
||||||
|
</div>
|
||||||
|
))}
|
||||||
|
</div>
|
||||||
|
)}
|
||||||
|
</div>
|
||||||
|
|
||||||
|
<div style={{ ...styles.card, gridColumn: "span 12" }}>
|
||||||
|
<h2 style={styles.sectionTitle}>Topic Snapshots</h2>
|
||||||
|
<p style={styles.sectionSubtitle}>
|
||||||
|
Per-topic mood with strength and post count.
|
||||||
|
</p>
|
||||||
|
<div style={{ ...styles.grid, marginTop: 10 }}>
|
||||||
|
{strongestPerTopic.map((topic) => (
|
||||||
|
<div
|
||||||
|
key={topic.topic}
|
||||||
|
style={{ ...styles.cardBase, gridColumn: "span 4", cursor: "pointer" }}
|
||||||
|
onClick={() => onExplore(buildTopicSpec(topic.topic))}
|
||||||
|
>
|
||||||
|
<h3 style={{ ...styles.sectionTitle, marginBottom: 6 }}>
|
||||||
|
{topic.topic}
|
||||||
|
</h3>
|
||||||
|
<div style={styles.emotionalTopicLabel}>Likely Mood</div>
|
||||||
|
<div style={styles.emotionalTopicValue}>
|
||||||
|
{formatEmotion(topic.emotion)}
|
||||||
|
</div>
|
||||||
|
<div style={styles.emotionalMetricRow}>
|
||||||
|
<span>Strength</span>
|
||||||
|
<span style={styles.emotionalMetricValue}>
|
||||||
|
{topic.value.toFixed(3)}
|
||||||
|
</span>
|
||||||
|
</div>
|
||||||
|
<div style={styles.emotionalMetricRowCompact}>
|
||||||
|
<span>Posts in Topic</span>
|
||||||
|
<span style={styles.emotionalMetricValue}>{topic.count}</span>
|
||||||
|
</div>
|
||||||
|
</div>
|
||||||
|
))}
|
||||||
</div>
|
</div>
|
||||||
))}
|
</div>
|
||||||
</div>
|
</div>
|
||||||
</div>
|
</div>
|
||||||
);
|
);
|
||||||
}
|
};
|
||||||
|
|
||||||
export default EmotionalStats;
|
export default EmotionalStats;
|
||||||
|
|||||||
@@ -24,22 +24,35 @@ type InteractionalStatsProps = {
|
|||||||
const InteractionalStats = ({ data }: InteractionalStatsProps) => {
|
const InteractionalStats = ({ data }: InteractionalStatsProps) => {
|
||||||
const graph = data.interaction_graph ?? {};
|
const graph = data.interaction_graph ?? {};
|
||||||
const userCount = Object.keys(graph).length;
|
const userCount = Object.keys(graph).length;
|
||||||
const edges = Object.values(graph).flatMap((targets) => Object.values(targets));
|
let edgeCount = 0;
|
||||||
const edgeCount = edges.length;
|
let interactionVolume = 0;
|
||||||
const interactionVolume = edges.reduce((sum, value) => sum + value, 0);
|
for (const targets of Object.values(graph)) {
|
||||||
|
for (const value of Object.values(targets)) {
|
||||||
|
edgeCount += 1;
|
||||||
|
interactionVolume += value;
|
||||||
|
}
|
||||||
|
}
|
||||||
const concentration = data.conversation_concentration;
|
const concentration = data.conversation_concentration;
|
||||||
const topTenCommentShare = typeof concentration?.top_10pct_comment_share === "number"
|
const topTenCommentShare =
|
||||||
? concentration?.top_10pct_comment_share
|
typeof concentration?.top_10pct_comment_share === "number"
|
||||||
: null;
|
? concentration?.top_10pct_comment_share
|
||||||
const topTenAuthorCount = typeof concentration?.top_10pct_author_count === "number"
|
: null;
|
||||||
? concentration.top_10pct_author_count
|
const topTenAuthorCount =
|
||||||
: null;
|
typeof concentration?.top_10pct_author_count === "number"
|
||||||
const totalCommentingAuthors = typeof concentration?.total_commenting_authors === "number"
|
? concentration.top_10pct_author_count
|
||||||
? concentration.total_commenting_authors
|
: null;
|
||||||
: null;
|
const totalCommentingAuthors =
|
||||||
const singleCommentAuthorRatio = typeof concentration?.single_comment_author_ratio === "number"
|
typeof concentration?.total_commenting_authors === "number"
|
||||||
? concentration.single_comment_author_ratio
|
? concentration.total_commenting_authors
|
||||||
: null;
|
: null;
|
||||||
|
const singleCommentAuthorRatio =
|
||||||
|
typeof concentration?.single_comment_author_ratio === "number"
|
||||||
|
? concentration.single_comment_author_ratio
|
||||||
|
: null;
|
||||||
|
const singleCommentAuthors =
|
||||||
|
typeof concentration?.single_comment_authors === "number"
|
||||||
|
? concentration.single_comment_authors
|
||||||
|
: null;
|
||||||
|
|
||||||
const topPairs = (data.top_interaction_pairs ?? [])
|
const topPairs = (data.top_interaction_pairs ?? [])
|
||||||
.filter((item): item is [[string, string], number] => {
|
.filter((item): item is [[string, string], number] => {
|
||||||
@@ -50,26 +63,28 @@ const InteractionalStats = ({ data }: InteractionalStatsProps) => {
|
|||||||
const pair = item[0];
|
const pair = item[0];
|
||||||
const count = item[1];
|
const count = item[1];
|
||||||
|
|
||||||
return Array.isArray(pair)
|
return (
|
||||||
&& pair.length === 2
|
Array.isArray(pair) &&
|
||||||
&& typeof pair[0] === "string"
|
pair.length === 2 &&
|
||||||
&& typeof pair[1] === "string"
|
typeof pair[0] === "string" &&
|
||||||
&& typeof count === "number";
|
typeof pair[1] === "string" &&
|
||||||
|
typeof count === "number"
|
||||||
|
);
|
||||||
})
|
})
|
||||||
.slice(0, 20);
|
.slice(0, 20);
|
||||||
|
|
||||||
const topPairChartData = topPairs.slice(0, 8).map(([[source, target], value], index) => ({
|
const topPairChartData = topPairs
|
||||||
pair: `${source} -> ${target}`,
|
.slice(0, 8)
|
||||||
replies: value,
|
.map(([[source, target], value], index) => ({
|
||||||
rank: index + 1,
|
pair: `${source} -> ${target}`,
|
||||||
}));
|
replies: value,
|
||||||
|
rank: index + 1,
|
||||||
|
}));
|
||||||
|
|
||||||
const topTenSharePercent = topTenCommentShare === null
|
const topTenSharePercent =
|
||||||
? null
|
topTenCommentShare === null ? null : topTenCommentShare * 100;
|
||||||
: topTenCommentShare * 100;
|
const nonTopTenSharePercent =
|
||||||
const nonTopTenSharePercent = topTenSharePercent === null
|
topTenSharePercent === null ? null : Math.max(0, 100 - topTenSharePercent);
|
||||||
? null
|
|
||||||
: Math.max(0, 100 - topTenSharePercent);
|
|
||||||
|
|
||||||
let concentrationPieData: { name: string; value: number }[] = [];
|
let concentrationPieData: { name: string; value: number }[] = [];
|
||||||
if (topTenSharePercent !== null && nonTopTenSharePercent !== null) {
|
if (topTenSharePercent !== null && nonTopTenSharePercent !== null) {
|
||||||
@@ -84,55 +99,78 @@ const InteractionalStats = ({ data }: InteractionalStatsProps) => {
|
|||||||
return (
|
return (
|
||||||
<div style={styles.page}>
|
<div style={styles.page}>
|
||||||
<div style={{ ...styles.container, ...styles.grid }}>
|
<div style={{ ...styles.container, ...styles.grid }}>
|
||||||
|
<div style={{ ...styles.card, gridColumn: "span 12" }}>
|
||||||
|
<h2 style={styles.sectionTitle}>Conversation Overview</h2>
|
||||||
|
<p style={styles.sectionSubtitle}>
|
||||||
|
Who talks to who, how much they interact, and how concentrated the replies are.
|
||||||
|
</p>
|
||||||
|
</div>
|
||||||
|
|
||||||
<Card
|
<Card
|
||||||
label="Avg Thread Depth"
|
label="Users in Network"
|
||||||
value={typeof data.average_thread_depth === "number" ? data.average_thread_depth.toFixed(2) : "—"}
|
|
||||||
sublabel="Depth from reply chains"
|
|
||||||
style={{ gridColumn: "span 3" }}
|
|
||||||
/>
|
|
||||||
<Card
|
|
||||||
label="Network Users"
|
|
||||||
value={userCount.toLocaleString()}
|
value={userCount.toLocaleString()}
|
||||||
sublabel="Authors in interaction graph"
|
sublabel="Users in the reply graph"
|
||||||
style={{ gridColumn: "span 3" }}
|
style={{ gridColumn: "span 4" }}
|
||||||
/>
|
/>
|
||||||
<Card
|
<Card
|
||||||
label="Unique Links"
|
label="User-to-User Links"
|
||||||
value={edgeCount.toLocaleString()}
|
value={edgeCount.toLocaleString()}
|
||||||
sublabel="Directed source-target pairs"
|
sublabel="Unique reply directions"
|
||||||
style={{ gridColumn: "span 3" }}
|
style={{ gridColumn: "span 4" }}
|
||||||
/>
|
/>
|
||||||
<Card
|
<Card
|
||||||
label="Interaction Volume"
|
label="Total Replies"
|
||||||
value={interactionVolume.toLocaleString()}
|
value={interactionVolume.toLocaleString()}
|
||||||
sublabel="Sum of link weights"
|
sublabel="All reply links combined"
|
||||||
style={{ gridColumn: "span 3" }}
|
style={{ gridColumn: "span 4" }}
|
||||||
/>
|
/>
|
||||||
<Card
|
<Card
|
||||||
label="Top 10% Comment Share"
|
label="Concentrated Replies"
|
||||||
value={topTenSharePercent === null ? "-" : `${topTenSharePercent.toFixed(1)}%`}
|
value={
|
||||||
sublabel={topTenAuthorCount === null || totalCommentingAuthors === null
|
topTenSharePercent === null
|
||||||
? "Comment volume held by top commenters"
|
? "-"
|
||||||
: `${topTenAuthorCount.toLocaleString()} of ${totalCommentingAuthors.toLocaleString()} authors`}
|
: `${topTenSharePercent.toFixed(1)}%`
|
||||||
|
}
|
||||||
|
sublabel={
|
||||||
|
topTenAuthorCount === null || totalCommentingAuthors === null
|
||||||
|
? "Reply share from the top 10% commenters"
|
||||||
|
: `${topTenAuthorCount.toLocaleString()} of ${totalCommentingAuthors.toLocaleString()} authors`
|
||||||
|
}
|
||||||
style={{ gridColumn: "span 6" }}
|
style={{ gridColumn: "span 6" }}
|
||||||
/>
|
/>
|
||||||
<Card
|
<Card
|
||||||
label="Single-Comment Authors"
|
label="Single-Comment Authors"
|
||||||
value={singleCommentAuthorRatio === null ? "-" : `${(singleCommentAuthorRatio * 100).toFixed(1)}%`}
|
value={
|
||||||
sublabel="Authors who commented exactly once"
|
singleCommentAuthorRatio === null
|
||||||
|
? "-"
|
||||||
|
: `${(singleCommentAuthorRatio * 100).toFixed(1)}%`
|
||||||
|
}
|
||||||
|
sublabel={
|
||||||
|
singleCommentAuthors === null
|
||||||
|
? "Authors who commented exactly once"
|
||||||
|
: `${singleCommentAuthors.toLocaleString()} authors commented exactly once`
|
||||||
|
}
|
||||||
style={{ gridColumn: "span 6" }}
|
style={{ gridColumn: "span 6" }}
|
||||||
/>
|
/>
|
||||||
|
|
||||||
<div style={{ ...styles.card, gridColumn: "span 12" }}>
|
<div style={{ ...styles.card, gridColumn: "span 12" }}>
|
||||||
<h2 style={styles.sectionTitle}>Interaction Visuals</h2>
|
<h2 style={styles.sectionTitle}>Conversation Visuals</h2>
|
||||||
<p style={styles.sectionSubtitle}>Quick charts for interaction direction and conversation concentration.</p>
|
<p style={styles.sectionSubtitle}>
|
||||||
|
Main reply links and concentration split.
|
||||||
|
</p>
|
||||||
|
|
||||||
<div style={{ ...styles.grid, marginTop: 12 }}>
|
<div style={{ ...styles.grid, marginTop: 12 }}>
|
||||||
<div style={{ ...styles.cardBase, gridColumn: "span 6" }}>
|
<div style={{ ...styles.cardBase, gridColumn: "span 6" }}>
|
||||||
<h3 style={{ ...styles.sectionTitle, fontSize: "1rem" }}>Top Interaction Pairs</h3>
|
<h3 style={{ ...styles.sectionTitle, fontSize: "1rem" }}>
|
||||||
|
Top Interaction Pairs
|
||||||
|
</h3>
|
||||||
<div style={{ width: "100%", height: 300 }}>
|
<div style={{ width: "100%", height: 300 }}>
|
||||||
<ResponsiveContainer>
|
<ResponsiveContainer>
|
||||||
<BarChart data={topPairChartData} layout="vertical" margin={{ top: 8, right: 16, left: 16, bottom: 8 }}>
|
<BarChart
|
||||||
|
data={topPairChartData}
|
||||||
|
layout="vertical"
|
||||||
|
margin={{ top: 8, right: 16, left: 16, bottom: 8 }}
|
||||||
|
>
|
||||||
<CartesianGrid strokeDasharray="3 3" stroke="#d9e2ec" />
|
<CartesianGrid strokeDasharray="3 3" stroke="#d9e2ec" />
|
||||||
<XAxis type="number" allowDecimals={false} />
|
<XAxis type="number" allowDecimals={false} />
|
||||||
<YAxis
|
<YAxis
|
||||||
@@ -142,14 +180,20 @@ const InteractionalStats = ({ data }: InteractionalStatsProps) => {
|
|||||||
width={36}
|
width={36}
|
||||||
/>
|
/>
|
||||||
<Tooltip />
|
<Tooltip />
|
||||||
<Bar dataKey="replies" fill="#2b6777" radius={[0, 6, 6, 0]} />
|
<Bar
|
||||||
|
dataKey="replies"
|
||||||
|
fill="#2b6777"
|
||||||
|
radius={[0, 6, 6, 0]}
|
||||||
|
/>
|
||||||
</BarChart>
|
</BarChart>
|
||||||
</ResponsiveContainer>
|
</ResponsiveContainer>
|
||||||
</div>
|
</div>
|
||||||
</div>
|
</div>
|
||||||
|
|
||||||
<div style={{ ...styles.cardBase, gridColumn: "span 6" }}>
|
<div style={{ ...styles.cardBase, gridColumn: "span 6" }}>
|
||||||
<h3 style={{ ...styles.sectionTitle, fontSize: "1rem" }}>Top 10% vs Other Comment Share</h3>
|
<h3 style={{ ...styles.sectionTitle, fontSize: "1rem" }}>
|
||||||
|
Top 10% vs Other Comment Share
|
||||||
|
</h3>
|
||||||
<div style={{ width: "100%", height: 300 }}>
|
<div style={{ width: "100%", height: 300 }}>
|
||||||
<ResponsiveContainer>
|
<ResponsiveContainer>
|
||||||
<PieChart>
|
<PieChart>
|
||||||
@@ -162,7 +206,10 @@ const InteractionalStats = ({ data }: InteractionalStatsProps) => {
|
|||||||
paddingAngle={2}
|
paddingAngle={2}
|
||||||
>
|
>
|
||||||
{concentrationPieData.map((entry, index) => (
|
{concentrationPieData.map((entry, index) => (
|
||||||
<Cell key={`${entry.name}-${index}`} fill={PIE_COLORS[index % PIE_COLORS.length]} />
|
<Cell
|
||||||
|
key={`${entry.name}-${index}`}
|
||||||
|
fill={PIE_COLORS[index % PIE_COLORS.length]}
|
||||||
|
/>
|
||||||
))}
|
))}
|
||||||
</Pie>
|
</Pie>
|
||||||
<Tooltip />
|
<Tooltip />
|
||||||
@@ -175,16 +222,33 @@ const InteractionalStats = ({ data }: InteractionalStatsProps) => {
|
|||||||
</div>
|
</div>
|
||||||
|
|
||||||
<div style={{ ...styles.card, gridColumn: "span 12" }}>
|
<div style={{ ...styles.card, gridColumn: "span 12" }}>
|
||||||
<h2 style={styles.sectionTitle}>Top Interaction Pairs</h2>
|
<h2 style={styles.sectionTitle}>Frequent Reply Paths</h2>
|
||||||
<p style={styles.sectionSubtitle}>Most frequent directed reply paths between users.</p>
|
<p style={styles.sectionSubtitle}>
|
||||||
|
Most common user-to-user reply paths.
|
||||||
|
</p>
|
||||||
{!topPairs.length ? (
|
{!topPairs.length ? (
|
||||||
<div style={styles.topUserMeta}>No interaction pair data available.</div>
|
<div style={styles.topUserMeta}>
|
||||||
|
No interaction pair data available.
|
||||||
|
</div>
|
||||||
) : (
|
) : (
|
||||||
<div style={{ ...styles.topUsersList, maxHeight: 420, overflowY: "auto" }}>
|
<div
|
||||||
|
style={{
|
||||||
|
...styles.topUsersList,
|
||||||
|
maxHeight: 420,
|
||||||
|
overflowY: "auto",
|
||||||
|
}}
|
||||||
|
>
|
||||||
{topPairs.map(([[source, target], value], index) => (
|
{topPairs.map(([[source, target], value], index) => (
|
||||||
<div key={`${source}->${target}-${index}`} style={styles.topUserItem}>
|
<div
|
||||||
<div style={styles.topUserName}>{source} -&gt; {target}</div>
|
key={`${source}->${target}-${index}`}
|
||||||
<div style={styles.topUserMeta}>{value.toLocaleString()} replies</div>
|
style={styles.topUserItem}
|
||||||
|
>
|
||||||
|
<div style={styles.topUserName}>
|
||||||
|
{source} -&gt; {target}
|
||||||
|
</div>
|
||||||
|
<div style={styles.topUserMeta}>
|
||||||
|
{value.toLocaleString()} replies
|
||||||
|
</div>
|
||||||
</div>
|
</div>
|
||||||
))}
|
))}
|
||||||
</div>
|
</div>
|
||||||
|
|||||||
@@ -1,14 +1,20 @@
|
|||||||
import Card from "./Card";
|
import Card from "./Card";
|
||||||
import StatsStyling from "../styles/stats_styling";
|
import StatsStyling from "../styles/stats_styling";
|
||||||
import type { LinguisticAnalysisResponse } from "../types/ApiTypes";
|
import type { LinguisticAnalysisResponse } from "../types/ApiTypes";
|
||||||
|
import {
|
||||||
|
buildNgramSpec,
|
||||||
|
buildWordSpec,
|
||||||
|
type CorpusExplorerSpec,
|
||||||
|
} from "../utils/corpusExplorer";
|
||||||
|
|
||||||
const styles = StatsStyling;
|
const styles = StatsStyling;
|
||||||
|
|
||||||
type LinguisticStatsProps = {
|
type LinguisticStatsProps = {
|
||||||
data: LinguisticAnalysisResponse;
|
data: LinguisticAnalysisResponse;
|
||||||
|
onExplore: (spec: CorpusExplorerSpec) => void;
|
||||||
};
|
};
|
||||||
|
|
||||||
const LinguisticStats = ({ data }: LinguisticStatsProps) => {
|
const LinguisticStats = ({ data, onExplore }: LinguisticStatsProps) => {
|
||||||
const lexical = data.lexical_diversity;
|
const lexical = data.lexical_diversity;
|
||||||
const words = data.word_frequencies ?? [];
|
const words = data.word_frequencies ?? [];
|
||||||
const bigrams = data.common_two_phrases ?? [];
|
const bigrams = data.common_two_phrases ?? [];
|
||||||
@@ -21,33 +27,54 @@ const LinguisticStats = ({ data }: LinguisticStatsProps) => {
|
|||||||
return (
|
return (
|
||||||
<div style={styles.page}>
|
<div style={styles.page}>
|
||||||
<div style={{ ...styles.container, ...styles.grid }}>
|
<div style={{ ...styles.container, ...styles.grid }}>
|
||||||
|
<div style={{ ...styles.card, gridColumn: "span 12" }}>
|
||||||
|
<h2 style={styles.sectionTitle}>Language Overview</h2>
|
||||||
|
<p style={styles.sectionSubtitle}>
|
||||||
|
Quick read on how broad and repetitive the wording is.
|
||||||
|
</p>
|
||||||
|
</div>
|
||||||
|
|
||||||
<Card
|
<Card
|
||||||
label="Total Tokens"
|
label="Total Words"
|
||||||
value={lexical?.total_tokens?.toLocaleString() ?? "—"}
|
value={lexical?.total_tokens?.toLocaleString() ?? "—"}
|
||||||
sublabel="After token filtering"
|
sublabel="Words after basic filtering"
|
||||||
style={{ gridColumn: "span 4" }}
|
style={{ gridColumn: "span 4" }}
|
||||||
/>
|
/>
|
||||||
<Card
|
<Card
|
||||||
label="Unique Tokens"
|
label="Unique Words"
|
||||||
value={lexical?.unique_tokens?.toLocaleString() ?? "—"}
|
value={lexical?.unique_tokens?.toLocaleString() ?? "—"}
|
||||||
sublabel="Distinct vocabulary items"
|
sublabel="Different words used"
|
||||||
style={{ gridColumn: "span 4" }}
|
style={{ gridColumn: "span 4" }}
|
||||||
/>
|
/>
|
||||||
<Card
|
<Card
|
||||||
label="Type-Token Ratio"
|
label="Vocabulary Variety"
|
||||||
value={typeof lexical?.ttr === "number" ? lexical.ttr.toFixed(4) : "—"}
|
value={
|
||||||
sublabel="Vocabulary richness proxy"
|
typeof lexical?.ttr === "number" ? lexical.ttr.toFixed(4) : "—"
|
||||||
|
}
|
||||||
|
sublabel="Higher means less repetition"
|
||||||
style={{ gridColumn: "span 4" }}
|
style={{ gridColumn: "span 4" }}
|
||||||
/>
|
/>
|
||||||
|
|
||||||
<div style={{ ...styles.card, gridColumn: "span 4" }}>
|
<div style={{ ...styles.card, gridColumn: "span 4" }}>
|
||||||
<h2 style={styles.sectionTitle}>Top Words</h2>
|
<h2 style={styles.sectionTitle}>Top Words</h2>
|
||||||
<p style={styles.sectionSubtitle}>Most frequent filtered terms.</p>
|
<p style={styles.sectionSubtitle}>Most used single words.</p>
|
||||||
<div style={{ ...styles.topUsersList, maxHeight: 360, overflowY: "auto" }}>
|
<div
|
||||||
|
style={{
|
||||||
|
...styles.topUsersList,
|
||||||
|
maxHeight: 360,
|
||||||
|
overflowY: "auto",
|
||||||
|
}}
|
||||||
|
>
|
||||||
{topWords.map((item) => (
|
{topWords.map((item) => (
|
||||||
<div key={item.word} style={styles.topUserItem}>
|
<div
|
||||||
|
key={item.word}
|
||||||
|
style={{ ...styles.topUserItem, cursor: "pointer" }}
|
||||||
|
onClick={() => onExplore(buildWordSpec(item.word))}
|
||||||
|
>
|
||||||
<div style={styles.topUserName}>{item.word}</div>
|
<div style={styles.topUserName}>{item.word}</div>
|
||||||
<div style={styles.topUserMeta}>{item.count.toLocaleString()} uses</div>
|
<div style={styles.topUserMeta}>
|
||||||
|
{item.count.toLocaleString()} uses
|
||||||
|
</div>
|
||||||
</div>
|
</div>
|
||||||
))}
|
))}
|
||||||
</div>
|
</div>
|
||||||
@@ -55,12 +82,24 @@ const LinguisticStats = ({ data }: LinguisticStatsProps) => {
|
|||||||
|
|
||||||
<div style={{ ...styles.card, gridColumn: "span 4" }}>
|
<div style={{ ...styles.card, gridColumn: "span 4" }}>
|
||||||
<h2 style={styles.sectionTitle}>Top Bigrams</h2>
|
<h2 style={styles.sectionTitle}>Top Bigrams</h2>
|
||||||
<p style={styles.sectionSubtitle}>Most frequent 2-word phrases.</p>
|
<p style={styles.sectionSubtitle}>Most used 2-word phrases.</p>
|
||||||
<div style={{ ...styles.topUsersList, maxHeight: 360, overflowY: "auto" }}>
|
<div
|
||||||
|
style={{
|
||||||
|
...styles.topUsersList,
|
||||||
|
maxHeight: 360,
|
||||||
|
overflowY: "auto",
|
||||||
|
}}
|
||||||
|
>
|
||||||
{topBigrams.map((item) => (
|
{topBigrams.map((item) => (
|
||||||
<div key={item.ngram} style={styles.topUserItem}>
|
<div
|
||||||
|
key={item.ngram}
|
||||||
|
style={{ ...styles.topUserItem, cursor: "pointer" }}
|
||||||
|
onClick={() => onExplore(buildNgramSpec(item.ngram))}
|
||||||
|
>
|
||||||
<div style={styles.topUserName}>{item.ngram}</div>
|
<div style={styles.topUserName}>{item.ngram}</div>
|
||||||
<div style={styles.topUserMeta}>{item.count.toLocaleString()} uses</div>
|
<div style={styles.topUserMeta}>
|
||||||
|
{item.count.toLocaleString()} uses
|
||||||
|
</div>
|
||||||
</div>
|
</div>
|
||||||
))}
|
))}
|
||||||
</div>
|
</div>
|
||||||
@@ -68,12 +107,24 @@ const LinguisticStats = ({ data }: LinguisticStatsProps) => {
|
|||||||
|
|
||||||
<div style={{ ...styles.card, gridColumn: "span 4" }}>
|
<div style={{ ...styles.card, gridColumn: "span 4" }}>
|
||||||
<h2 style={styles.sectionTitle}>Top Trigrams</h2>
|
<h2 style={styles.sectionTitle}>Top Trigrams</h2>
|
||||||
<p style={styles.sectionSubtitle}>Most frequent 3-word phrases.</p>
|
<p style={styles.sectionSubtitle}>Most used 3-word phrases.</p>
|
||||||
<div style={{ ...styles.topUsersList, maxHeight: 360, overflowY: "auto" }}>
|
<div
|
||||||
|
style={{
|
||||||
|
...styles.topUsersList,
|
||||||
|
maxHeight: 360,
|
||||||
|
overflowY: "auto",
|
||||||
|
}}
|
||||||
|
>
|
||||||
{topTrigrams.map((item) => (
|
{topTrigrams.map((item) => (
|
||||||
<div key={item.ngram} style={styles.topUserItem}>
|
<div
|
||||||
|
key={item.ngram}
|
||||||
|
style={{ ...styles.topUserItem, cursor: "pointer" }}
|
||||||
|
onClick={() => onExplore(buildNgramSpec(item.ngram))}
|
||||||
|
>
|
||||||
<div style={styles.topUserName}>{item.ngram}</div>
|
<div style={styles.topUserName}>{item.ngram}</div>
|
||||||
<div style={styles.topUserMeta}>{item.count.toLocaleString()} uses</div>
|
<div style={styles.topUserMeta}>
|
||||||
|
{item.count.toLocaleString()} uses
|
||||||
|
</div>
|
||||||
</div>
|
</div>
|
||||||
))}
|
))}
|
||||||
</div>
|
</div>
|
||||||
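Review note: the "Vocabulary Variety" card shows `lexical.ttr` with the sublabel "Higher means less repetition". The backend computation is not part of this diff; the sketch below assumes the usual type-token ratio definition (distinct tokens divided by total tokens) and lowercases tokens before counting, which is also an assumption. `computeLexicalDiversity` is a hypothetical name.

```typescript
// Field names follow the `lexical_diversity` object read by the component.
type LexicalDiversity = {
  total_tokens: number;
  unique_tokens: number;
  ttr: number;
};

// Hypothetical helper: type-token ratio = distinct tokens / total tokens.
function computeLexicalDiversity(tokens: string[]): LexicalDiversity {
  // Assumption: case is ignored when deciding whether two tokens match.
  const unique = new Set(tokens.map((t) => t.toLowerCase()));
  const total = tokens.length;
  return {
    total_tokens: total,
    unique_tokens: unique.size,
    // Guard against an empty corpus to avoid dividing by zero.
    ttr: total > 0 ? unique.size / total : 0,
  };
}

const sample = computeLexicalDiversity(["the", "cat", "saw", "The", "dog"]);
console.log(sample.ttr.toFixed(4)); // prints "0.8000"
```

A ratio near 1 means almost every word is new; a ratio near 0 means the same words repeat heavily. The ratio tends to fall as a corpus grows, so values are best compared between datasets of similar size.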
|
|||||||
@@ -1,4 +1,4 @@
|
|||||||
import { useState } from "react";
|
import { memo, useMemo } from "react";
|
||||||
import {
|
import {
|
||||||
LineChart,
|
LineChart,
|
||||||
Line,
|
Line,
|
||||||
@@ -6,32 +6,55 @@ import {
|
|||||||
YAxis,
|
YAxis,
|
||||||
Tooltip,
|
Tooltip,
|
||||||
CartesianGrid,
|
CartesianGrid,
|
||||||
ResponsiveContainer
|
ResponsiveContainer,
|
||||||
} from "recharts";
|
} from "recharts";
|
||||||
|
|
||||||
import ActivityHeatmap from "../stats/ActivityHeatmap";
|
import ActivityHeatmap from "../stats/ActivityHeatmap";
|
||||||
import { ReactWordcloud } from '@cp949/react-wordcloud';
|
import { ReactWordcloud } from "@cp949/react-wordcloud";
|
||||||
import StatsStyling from "../styles/stats_styling";
|
import StatsStyling from "../styles/stats_styling";
|
||||||
import Card from "../components/Card";
|
import Card from "../components/Card";
|
||||||
import UserModal from "../components/UserModal";
|
|
||||||
|
|
||||||
import {
|
import {
|
||||||
type SummaryResponse,
|
type SummaryResponse,
|
||||||
type FrequencyWord,
|
type FrequencyWord,
|
||||||
type UserAnalysisResponse,
|
type UserEndpointResponse,
|
||||||
type TimeAnalysisResponse,
|
type TimeAnalysisResponse,
|
||||||
type ContentAnalysisResponse,
|
type LinguisticAnalysisResponse,
|
||||||
type User
|
} from "../types/ApiTypes";
|
||||||
} from '../types/ApiTypes'
|
import {
|
||||||
|
buildAllRecordsSpec,
|
||||||
|
buildDateBucketSpec,
|
||||||
|
buildOneTimeUsersSpec,
|
||||||
|
buildUserSpec,
|
||||||
|
type CorpusExplorerSpec,
|
||||||
|
} from "../utils/corpusExplorer";
|
||||||
|
|
||||||
const styles = StatsStyling;
|
const styles = StatsStyling;
|
||||||
|
const MAX_WORDCLOUD_WORDS = 250;
|
||||||
|
const exploreButtonStyle = { padding: "4px 8px", fontSize: 12 };
|
||||||
|
|
||||||
|
const WORDCLOUD_OPTIONS = {
|
||||||
|
rotations: 2,
|
||||||
|
rotationAngles: [0, 90] as [number, number],
|
||||||
|
fontSizes: [14, 60] as [number, number],
|
||||||
|
enableTooltip: true,
|
||||||
|
};
|
||||||
|
|
||||||
type SummaryStatsProps = {
|
type SummaryStatsProps = {
|
||||||
userData: UserAnalysisResponse | null;
|
userData: UserEndpointResponse | null;
|
||||||
timeData: TimeAnalysisResponse | null;
|
timeData: TimeAnalysisResponse | null;
|
||||||
contentData: ContentAnalysisResponse | null;
|
linguisticData: LinguisticAnalysisResponse | null;
|
||||||
summary: SummaryResponse | null;
|
summary: SummaryResponse | null;
|
||||||
}
|
onExplore: (spec: CorpusExplorerSpec) => void;
|
||||||
|
};
|
||||||
|
|
||||||
|
type WordCloudPanelProps = {
|
||||||
|
words: { text: string; value: number }[];
|
||||||
|
};
|
||||||
|
|
||||||
|
const WordCloudPanel = memo(({ words }: WordCloudPanelProps) => (
|
||||||
|
<ReactWordcloud words={words} options={WORDCLOUD_OPTIONS} />
|
||||||
|
));
|
||||||
|
|
||||||
function formatDateRange(startUnix: number, endUnix: number) {
|
function formatDateRange(startUnix: number, endUnix: number) {
|
||||||
const start = new Date(startUnix * 1000);
|
const start = new Date(startUnix * 1000);
|
||||||
@@ -44,174 +67,188 @@ function formatDateRange(startUnix: number, endUnix: number) {
|
|||||||
day: "2-digit",
|
day: "2-digit",
|
||||||
});
|
});
|
||||||
|
|
||||||
return `${fmt(start)} → ${fmt(end)}`;
|
return `${fmt(start)} -> ${fmt(end)}`;
|
||||||
}
|
}
|
||||||
|
|
||||||
function convertFrequencyData(data: FrequencyWord[]) {
|
function convertFrequencyData(data: FrequencyWord[]) {
|
||||||
return data.map((d: FrequencyWord) => ({
|
return data.map((d: FrequencyWord) => ({
|
||||||
text: d.word,
|
text: d.word,
|
||||||
value: d.count,
|
value: d.count,
|
||||||
}))
|
}));
|
||||||
}
|
}
|
||||||
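Review note: the new side pairs `convertFrequencyData` with a `MAX_WORDCLOUD_WORDS` cap before the data reaches the word cloud. A standalone sketch of that combination follows. It assumes `word_frequencies` arrives sorted by count in descending order, since the diff slices without sorting; `toCloudWords` is a hypothetical name.

```typescript
// Shapes taken from the fields the diff reads (`word`, `count`) and emits (`text`, `value`).
type FrequencyWord = { word: string; count: number };
type CloudWord = { text: string; value: number };

const MAX_WORDCLOUD_WORDS = 250;

// Hypothetical helper: cap the list first, then map to the word-cloud shape.
// Assumption: `data` is already sorted by count, highest first.
function toCloudWords(
  data: FrequencyWord[],
  limit: number = MAX_WORDCLOUD_WORDS,
): CloudWord[] {
  return data.slice(0, limit).map((d) => ({ text: d.word, value: d.count }));
}

const cloudWords = toCloudWords(
  [
    { word: "data", count: 3 },
    { word: "model", count: 2 },
    { word: "graph", count: 1 },
  ],
  2,
);
console.log(cloudWords);
```

Slicing before mapping keeps the work proportional to the cap rather than to the full vocabulary, which matters when the result is recomputed inside `useMemo`.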
|
|
||||||
const SummaryStats = ({userData, timeData, contentData, summary}: SummaryStatsProps) => {
|
const renderExploreButton = (onClick: () => void) => (
|
||||||
const [selectedUser, setSelectedUser] = useState<string | null>(null);
|
<button
|
||||||
const selectedUserData: User | null = userData?.users.find((u) => u.author === selectedUser) ?? null;
|
onClick={onClick}
|
||||||
|
style={{ ...styles.buttonSecondary, ...exploreButtonStyle }}
|
||||||
|
>
|
||||||
|
Explore
|
||||||
|
</button>
|
||||||
|
);
|
||||||
|
|
||||||
console.log(summary)
|
const SummaryStats = ({
|
||||||
|
userData,
|
||||||
|
timeData,
|
||||||
|
linguisticData,
|
||||||
|
summary,
|
||||||
|
onExplore,
|
||||||
|
}: SummaryStatsProps) => {
|
||||||
|
const wordCloudWords = useMemo(
|
||||||
|
() =>
|
||||||
|
convertFrequencyData(
|
||||||
|
(linguisticData?.word_frequencies ?? []).slice(0, MAX_WORDCLOUD_WORDS),
|
||||||
|
),
|
||||||
|
[linguisticData?.word_frequencies],
|
||||||
|
);
|
||||||
|
|
||||||
return (
|
const topUsersPreview = useMemo(
|
||||||
|
() => (userData?.top_users ?? []).slice(0, 100),
|
||||||
|
[userData?.top_users],
|
||||||
|
);
|
||||||
|
|
||||||
|
return (
|
||||||
<div style={styles.page}>
|
<div style={styles.page}>
|
||||||
|
<div style={{ ...styles.container, ...styles.grid }}>
|
||||||
|
<Card
|
||||||
|
label="Total Activity"
|
||||||
|
value={summary?.total_events ?? "-"}
|
||||||
|
sublabel="Posts + comments"
|
||||||
|
rightSlot={renderExploreButton(() => onExplore(buildAllRecordsSpec()))}
|
||||||
|
style={{ gridColumn: "span 4" }}
|
||||||
|
/>
|
||||||
|
<Card
|
||||||
|
label="Active People"
|
||||||
|
value={summary?.unique_users ?? "-"}
|
||||||
|
sublabel="Distinct users"
|
||||||
|
rightSlot={renderExploreButton(() => onExplore(buildAllRecordsSpec()))}
|
||||||
|
style={{ gridColumn: "span 4" }}
|
||||||
|
/>
|
||||||
|
<Card
|
||||||
|
label="Posts vs Comments"
|
||||||
|
value={
|
||||||
|
summary ? `${summary.total_posts} / ${summary.total_comments}` : "-"
|
||||||
|
}
|
||||||
|
sublabel={`Comments per post: ${summary?.comments_per_post ?? "-"}`}
|
||||||
|
rightSlot={renderExploreButton(() => onExplore(buildAllRecordsSpec()))}
|
||||||
|
style={{ gridColumn: "span 4" }}
|
||||||
|
/>
|
||||||
|
|
||||||
{/* main grid*/}
|
<Card
|
||||||
<div style={{ ...styles.container, ...styles.grid}}>
|
label="Time Range"
|
||||||
<Card
|
value={
|
||||||
label="Total Events"
|
summary?.time_range
|
||||||
value={summary?.total_events ?? "—"}
|
? formatDateRange(summary.time_range.start, summary.time_range.end)
|
||||||
sublabel="Posts + comments"
|
: "-"
|
||||||
style={{
|
}
|
||||||
gridColumn: "span 4"
|
sublabel="Based on dataset timestamps"
|
||||||
}}
|
rightSlot={renderExploreButton(() => onExplore(buildAllRecordsSpec()))}
|
||||||
/>
|
style={{ gridColumn: "span 4" }}
|
||||||
<Card
|
/>
|
||||||
label="Unique Users"
|
|
||||||
value={summary?.unique_users ?? "—"}
|
|
||||||
sublabel="Distinct authors"
|
|
||||||
style={{
|
|
||||||
gridColumn: "span 4"
|
|
||||||
}}
|
|
||||||
/>
|
|
||||||
<Card
|
|
||||||
label="Posts / Comments"
|
|
||||||
value={
|
|
||||||
summary
|
|
||||||
? `${summary.total_posts} / ${summary.total_comments}`
|
|
||||||
: "—"
|
|
||||||
}
|
|
||||||
sublabel={`Comments per post: ${summary?.comments_per_post ?? "—"}`}
|
|
||||||
style={{
|
|
||||||
gridColumn: "span 4"
|
|
||||||
}}
|
|
||||||
/>
|
|
||||||
|
|
||||||
<Card
|
<Card
|
||||||
label="Time Range"
|
label="One-Time Users"
|
||||||
value={
|
value={
|
||||||
summary?.time_range
|
typeof summary?.lurker_ratio === "number"
|
||||||
? formatDateRange(summary.time_range.start, summary.time_range.end)
|
? `${Math.round(summary.lurker_ratio * 100)}%`
|
||||||
: "—"
|
: "-"
|
||||||
}
|
}
|
||||||
sublabel="Based on dataset timestamps"
|
sublabel="Users with only one event"
|
||||||
style={{
|
rightSlot={renderExploreButton(() => onExplore(buildOneTimeUsersSpec()))}
|
||||||
gridColumn: "span 4"
|
style={{ gridColumn: "span 4" }}
|
||||||
}}
|
/>
|
||||||
/>
|
|
||||||
|
|
||||||
<Card
|
<Card
|
||||||
label="Lurker Ratio"
|
label="Sources"
|
||||||
value={
|
value={summary?.sources?.length ?? "-"}
|
||||||
typeof summary?.lurker_ratio === "number"
|
sublabel={
|
||||||
? `${Math.round(summary.lurker_ratio * 100)}%`
|
summary?.sources?.length
|
||||||
: "—"
|
? summary.sources.slice(0, 3).join(", ") +
|
||||||
}
|
(summary.sources.length > 3 ? "..." : "")
|
||||||
sublabel="Users with only 1 event"
|
: "-"
|
||||||
style={{
|
}
|
||||||
gridColumn: "span 4"
|
rightSlot={renderExploreButton(() => onExplore(buildAllRecordsSpec()))}
|
||||||
}}
|
style={{ gridColumn: "span 4" }}
|
||||||
/>
|
/>
|
||||||
|
|
||||||
<Card
|
|
||||||
label="Sources"
|
|
||||||
value={summary?.sources?.length ?? "—"}
|
|
||||||
sublabel={
|
|
||||||
summary?.sources?.length
|
|
||||||
? summary.sources.slice(0, 3).join(", ") +
|
|
||||||
(summary.sources.length > 3 ? "…" : "")
|
|
||||||
: "—"
|
|
||||||
}
|
|
||||||
style={{
|
|
||||||
gridColumn: "span 4"
|
|
||||||
}}
|
|
||||||
/>
|
|
||||||
|
|
||||||
{/* events per day */}
|
|
||||||
<div style={{ ...styles.card, gridColumn: "span 5" }}>
|
<div style={{ ...styles.card, gridColumn: "span 5" }}>
|
||||||
<h2 style={styles.sectionTitle}>Events per Day</h2>
|
<h2 style={styles.sectionTitle}>Activity Over Time</h2>
|
||||||
<p style={styles.sectionSubtitle}>Trend of activity over time</p>
|
<p style={styles.sectionSubtitle}>How much posting happened each day.</p>
|
||||||
|
|
||||||
<div style={styles.chartWrapper}>
|
<div style={styles.chartWrapper}>
|
||||||
<ResponsiveContainer width="100%" height="100%">
|
<ResponsiveContainer width="100%" height="100%">
|
||||||
<LineChart data={timeData?.events_per_day.filter((d) => new Date(d.date) >= new Date('2026-01-10'))}>
|
<LineChart
|
||||||
|
data={timeData?.events_per_day ?? []}
|
||||||
|
onClick={(state: unknown) => {
|
||||||
|
const payload = (state as { activePayload?: Array<{ payload?: { date?: string } }> })
|
||||||
|
?.activePayload?.[0]?.payload as
|
||||||
|
| { date?: string }
|
||||||
|
| undefined;
|
||||||
|
if (payload?.date) {
|
||||||
|
onExplore(buildDateBucketSpec(String(payload.date)));
|
||||||
|
}
|
||||||
|
}}
|
||||||
|
>
|
||||||
<CartesianGrid strokeDasharray="3 3" />
|
<CartesianGrid strokeDasharray="3 3" />
|
||||||
<XAxis dataKey="date" />
|
<XAxis dataKey="date" />
|
||||||
<YAxis />
|
<YAxis />
|
||||||
<Tooltip />
|
<Tooltip />
|
||||||
<Line type="monotone" dataKey="count" name="Events" />
|
<Line
|
||||||
</LineChart>
|
type="monotone"
|
||||||
|
dataKey="count"
|
||||||
|
name="Events"
|
||||||
|
isAnimationActive={false}
|
||||||
|
/>
|
||||||
|
</LineChart>
|
||||||
</ResponsiveContainer>
|
</ResponsiveContainer>
|
||||||
</div>
|
</div>
|
||||||
</div>
|
</div>
|
||||||
|
|
||||||
{/* Word Cloud */}
|
|
||||||
<div style={{ ...styles.card, gridColumn: "span 4" }}>
|
<div style={{ ...styles.card, gridColumn: "span 4" }}>
|
||||||
<h2 style={styles.sectionTitle}>Word Cloud</h2>
|
<h2 style={styles.sectionTitle}>Common Words</h2>
|
||||||
<p style={styles.sectionSubtitle}>Most common terms across events</p>
|
<p style={styles.sectionSubtitle}>
|
||||||
|
Frequently used words across the dataset.
|
||||||
|
</p>
|
||||||
|
|
||||||
<div style={styles.chartWrapper}>
|
<div style={styles.chartWrapper}>
|
||||||
<ReactWordcloud
|
<WordCloudPanel words={wordCloudWords} />
|
||||||
words={convertFrequencyData(contentData?.word_frequencies ?? [])}
|
</div>
|
||||||
options={{
|
|
||||||
rotations: 2,
|
|
||||||
rotationAngles: [0, 90],
|
|
||||||
fontSizes: [14, 60],
|
|
||||||
enableTooltip: true,
|
|
||||||
}}
|
|
||||||
/>
|
|
||||||
</div>
|
|
||||||
</div>
|
</div>
|
||||||
|
|
||||||
{/* Top Users */}
|
<div
|
||||||
<div style={{...styles.card, ...styles.scrollArea, gridColumn: "span 3",
|
style={{ ...styles.card, ...styles.scrollArea, gridColumn: "span 3" }}
|
||||||
}}
|
|
||||||
>
|
>
|
||||||
<h2 style={styles.sectionTitle}>Top Users</h2>
|
<h2 style={styles.sectionTitle}>Most Active Users</h2>
|
||||||
<p style={styles.sectionSubtitle}>Most active authors</p>
|
<p style={styles.sectionSubtitle}>Who posted the most events.</p>
|
||||||
|
|
||||||
<div style={styles.topUsersList}>
|
<div style={styles.topUsersList}>
|
||||||
{userData?.top_users.slice(0, 100).map((item) => (
|
{topUsersPreview.map((item) => (
|
||||||
<div
|
<div
|
||||||
key={`${item.author}-${item.source}`}
|
key={`${item.author}-${item.source}`}
|
||||||
style={{ ...styles.topUserItem, cursor: "pointer" }}
|
style={{ ...styles.topUserItem, cursor: "pointer" }}
|
||||||
onClick={() => setSelectedUser(item.author)}
|
onClick={() => onExplore(buildUserSpec(item.author))}
|
||||||
>
|
>
|
||||||
<div style={styles.topUserName}>{item.author}</div>
|
<div style={styles.topUserName}>{item.author}</div>
|
||||||
<div style={styles.topUserMeta}>
|
<div style={styles.topUserMeta}>
|
||||||
{item.source} • {item.count} events
|
{item.source} • {item.count} events
|
||||||
</div>
|
|
||||||
</div>
|
</div>
|
||||||
|
</div>
|
||||||
))}
|
))}
|
||||||
</div>
|
</div>
|
||||||
</div>
|
</div>
|
||||||
|
|
||||||
{/* Heatmap */}
|
|
||||||
<div style={{ ...styles.card, gridColumn: "span 12" }}>
|
<div style={{ ...styles.card, gridColumn: "span 12" }}>
|
||||||
<h2 style={styles.sectionTitle}>Heatmap</h2>
|
<h2 style={styles.sectionTitle}>Weekly Activity Pattern</h2>
|
||||||
<p style={styles.sectionSubtitle}>Activity density across time</p>
|
<p style={styles.sectionSubtitle}>
|
||||||
|
When activity tends to happen by weekday and hour.
|
||||||
|
</p>
|
||||||
|
|
||||||
<div style={styles.heatmapWrapper}>
|
<div style={styles.heatmapWrapper}>
|
||||||
<ActivityHeatmap data={timeData?.weekday_hour_heatmap ?? []} />
|
<ActivityHeatmap data={timeData?.weekday_hour_heatmap ?? []} />
|
||||||
</div>
|
</div>
|
||||||
</div>
|
</div>
|
||||||
</div>
|
</div>
|
||||||
|
|
||||||
<UserModal
|
|
||||||
open={!!selectedUser}
|
|
||||||
onClose={() => setSelectedUser(null)}
|
|
||||||
username={selectedUser ?? ""}
|
|
||||||
userData={selectedUserData}
|
|
||||||
/>
|
|
||||||
</div>
|
</div>
|
||||||
);
|
);
|
||||||
}
|
};
|
||||||
|
|
||||||
export default SummaryStats;
|
export default SummaryStats;
|
||||||
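Review note: the `LineChart` `onClick` handler in this file casts the click state inline to reach `activePayload[0].payload.date`. One possible way to make that easier to test is to move the narrowing into a small pure function. This is a sketch, not the change under review; `extractClickedDate` and `ChartClickState` are hypothetical names, and the state shape is only what the diff itself reads.

```typescript
// Only the fields the handler in the diff touches; everything else is ignored.
type ChartClickState = {
  activePayload?: Array<{ payload?: { date?: unknown } }>;
};

// Hypothetical helper: return the clicked date, or undefined when the click
// did not land on a data point (or the state has an unexpected shape).
function extractClickedDate(state: unknown): string | undefined {
  if (typeof state !== "object" || state === null) {
    return undefined;
  }
  const first = (state as ChartClickState).activePayload?.[0];
  const date = first?.payload?.date;
  return typeof date === "string" && date.length > 0 ? date : undefined;
}

const clicked = extractClickedDate({
  activePayload: [{ payload: { date: "2026-01-10", count: 4 } }],
});
console.log(clicked); // prints "2026-01-10"
```

With a helper like this, the handler body reduces to a single check on the returned date before calling `onExplore(buildDateBucketSpec(date))`.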
|
|||||||
@@ -11,7 +11,16 @@ type Props = {
|
|||||||
username: string;
|
username: string;
|
||||||
};
|
};
|
||||||
|
|
||||||
export default function UserModal({ open, onClose, userData, username }: Props) {
|
export default function UserModal({
|
||||||
|
open,
|
||||||
|
onClose,
|
||||||
|
userData,
|
||||||
|
username,
|
||||||
|
}: Props) {
|
||||||
|
const dominantEmotionEntry = Object.entries(
|
||||||
|
userData?.avg_emotions ?? {},
|
||||||
|
).sort((a, b) => b[1] - a[1])[0];
|
||||||
|
|
||||||
return (
|
return (
|
||||||
<Dialog open={open} onClose={onClose} style={styles.modalRoot}>
|
<Dialog open={open} onClose={onClose} style={styles.modalRoot}>
|
||||||
<div style={styles.modalBackdrop} />
|
<div style={styles.modalBackdrop} />
|
||||||
@@ -33,7 +42,9 @@ export default function UserModal({ open, onClose, userData, username }: Props)
|
|||||||
<p style={styles.sectionSubtitle}>No data for this user.</p>
|
<p style={styles.sectionSubtitle}>No data for this user.</p>
|
||||||
) : (
|
) : (
|
||||||
<div style={styles.topUsersList}>
|
<div style={styles.topUsersList}>
|
||||||
<div style={{...styles.topUserName, fontSize: 20}}>{userData.author}</div>
|
<div style={{ ...styles.topUserName, fontSize: 20 }}>
|
||||||
|
{userData.author}
|
||||||
|
</div>
|
||||||
<div style={styles.topUserItem}>
|
<div style={styles.topUserItem}>
|
||||||
<div style={styles.topUserName}>Posts</div>
|
<div style={styles.topUserName}>Posts</div>
|
||||||
<div style={styles.topUserMeta}>{userData.post}</div>
|
<div style={styles.topUserMeta}>{userData.post}</div>
|
||||||
@@ -62,7 +73,27 @@ export default function UserModal({ open, onClose, userData, username }: Props)
|
|||||||
<div style={styles.topUserItem}>
|
<div style={styles.topUserItem}>
|
||||||
<div style={styles.topUserName}>Vocab Richness</div>
|
<div style={styles.topUserName}>Vocab Richness</div>
|
||||||
<div style={styles.topUserMeta}>
|
<div style={styles.topUserMeta}>
|
||||||
{userData.vocab.vocab_richness} (avg {userData.vocab.avg_words_per_event} words/event)
|
{userData.vocab.vocab_richness} (avg{" "}
|
||||||
|
{userData.vocab.avg_words_per_event} words/event)
|
||||||
|
</div>
|
||||||
|
</div>
|
||||||
|
) : null}
|
||||||
|
|
||||||
|
{dominantEmotionEntry ? (
|
||||||
|
<div style={styles.topUserItem}>
|
||||||
|
<div style={styles.topUserName}>Dominant Avg Emotion</div>
|
||||||
|
<div style={styles.topUserMeta}>
|
||||||
|
{dominantEmotionEntry[0].replace("emotion_", "")} (
|
||||||
|
{dominantEmotionEntry[1].toFixed(3)})
|
||||||
|
</div>
|
||||||
|
</div>
|
||||||
|
) : null}
|
||||||
|
|
||||||
|
{userData.dominant_topic ? (
|
||||||
|
<div style={styles.topUserItem}>
|
||||||
|
<div style={styles.topUserName}>Most Common Topic</div>
|
||||||
|
<div style={styles.topUserMeta}>
|
||||||
|
{userData.dominant_topic.topic} ({userData.dominant_topic.count} events)
|
||||||
</div>
|
</div>
|
||||||
</div>
|
</div>
|
||||||
) : null}
|
) : null}
|
||||||
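Review note: `dominantEmotionEntry` in this file is found by sorting every entry of `avg_emotions` and taking the first. A sketch of the same selection as a single pass is below; it swaps the sort for a linear scan and is not the code in the diff. `dominantEmotion` is a hypothetical name, and the `emotion_` key prefix is taken from the `replace("emotion_", "")` call on the new side.

```typescript
// Assumed shape: emotion key (e.g. "emotion_joy") -> average score.
type EmotionScores = Record<string, number>;

// Hypothetical helper: pick the highest-scoring emotion without sorting.
function dominantEmotion(
  avgEmotions: EmotionScores | undefined,
): { label: string; score: number } | null {
  const scores: EmotionScores = avgEmotions ?? {};
  let best: [string, number] | null = null;
  for (const entry of Object.entries(scores)) {
    if (!best || entry[1] > best[1]) {
      best = entry;
    }
  }
  // Strip the key prefix for display, as the modal does.
  return best
    ? { label: best[0].replace("emotion_", ""), score: best[1] }
    : null;
}

const dominant = dominantEmotion({ emotion_joy: 0.61, emotion_anger: 0.12 });
console.log(dominant?.label, dominant?.score.toFixed(3)); // prints "joy 0.610"
```

Returning `null` for missing or empty scores mirrors the `dominantEmotionEntry ? ... : null` guard in the modal.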
|
|||||||
@@ -1,49 +1,64 @@
|
|||||||
import { useEffect, useMemo, useRef, useState } from "react";
|
import { useEffect, useMemo, useRef, useState } from "react";
|
||||||
import ForceGraph3D from "react-force-graph-3d";
|
import ForceGraph3D from "react-force-graph-3d";
|
||||||
|
|
||||||
import {
|
import { type TopUser, type InteractionGraph } from "../types/ApiTypes";
|
||||||
type UserAnalysisResponse,
|
|
||||||
type InteractionGraph
|
|
||||||
} from '../types/ApiTypes';
|
|
||||||
|
|
||||||
import StatsStyling from "../styles/stats_styling";
|
import StatsStyling from "../styles/stats_styling";
|
||||||
import Card from "./Card";
|
import Card from "./Card";
|
||||||
|
import {
|
||||||
|
buildReplyPairSpec,
|
||||||
|
toText,
|
||||||
|
buildUserSpec,
|
||||||
|
type CorpusExplorerSpec,
|
||||||
|
} from "../utils/corpusExplorer";
|
||||||
|
|
||||||
const styles = StatsStyling;
|
const styles = StatsStyling;
|
||||||
|
|
||||||
type GraphLink = {
|
type GraphLink = {
|
||||||
source: string;
|
source: string;
|
||||||
target: string;
|
target: string;
|
||||||
value: number;
|
value: number;
|
||||||
};
|
};
|
||||||
|
|
||||||
function ApiToGraphData(apiData: InteractionGraph) {
|
function toGraphData(apiData: InteractionGraph) {
|
||||||
const nodes = Object.keys(apiData).map(username => ({ id: username }));
|
const links: GraphLink[] = [];
|
||||||
const links: GraphLink[] = [];
|
const connectedNodeIds = new Set<string>();
|
||||||
|
|
||||||
for (const [source, targets] of Object.entries(apiData)) {
|
for (const [source, targets] of Object.entries(apiData)) {
|
||||||
for (const [target, count] of Object.entries(targets)) {
|
for (const [target, count] of Object.entries(targets)) {
|
||||||
links.push({ source, target, value: count });
|
if (count < 2 || source === "[deleted]" || target === "[deleted]") {
|
||||||
}
|
continue;
|
||||||
|
}
|
||||||
|
links.push({ source, target, value: count });
|
||||||
|
connectedNodeIds.add(source);
|
||||||
|
connectedNodeIds.add(target);
|
||||||
}
|
}
|
||||||
|
}
|
||||||
// drop low-value and deleted interactions to reduce clutter
|
|
||||||
const filteredLinks = links.filter(link =>
|
|
||||||
link.value >= 2 &&
|
|
||||||
link.source !== "[deleted]" &&
|
|
||||||
link.target !== "[deleted]"
|
|
||||||
);
|
|
||||||
|
|
||||||
// also filter out nodes that are no longer connected after link filtering
|
const filteredNodes = Array.from(connectedNodeIds, (id) => ({ id }));
|
||||||
const connectedNodeIds = new Set(filteredLinks.flatMap(link => [link.source, link.target]));
|
|
||||||
const filteredNodes = nodes.filter(node => connectedNodeIds.has(node.id));
|
|
||||||
|
|
||||||
return { nodes: filteredNodes, links: filteredLinks};
|
return { nodes: filteredNodes, links };
|
||||||
}
|
}
|
||||||
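Review note: the rewritten `toGraphData` filters links and collects connected node ids in one pass instead of building everything and filtering afterwards. A standalone sketch with sample input is below so the effect of the filter is visible. The `minReplies` parameter is a hypothetical addition (the diff hard-codes `2`), and the `InteractionGraph` shape is inferred from how the function iterates it.

```typescript
// Assumed shape: source user -> target user -> reply count.
type InteractionGraph = Record<string, Record<string, number>>;
type GraphLink = { source: string; target: string; value: number };

function toGraphData(apiData: InteractionGraph, minReplies = 2) {
  const links: GraphLink[] = [];
  const connectedNodeIds = new Set<string>();

  for (const [source, targets] of Object.entries(apiData)) {
    for (const [target, count] of Object.entries(targets)) {
      // Skip weak links and anything touching a deleted account.
      if (count < minReplies || source === "[deleted]" || target === "[deleted]") {
        continue;
      }
      links.push({ source, target, value: count });
      connectedNodeIds.add(source);
      connectedNodeIds.add(target);
    }
  }

  // Only users that still have at least one surviving link become nodes.
  const nodes = Array.from(connectedNodeIds, (id) => ({ id }));
  return { nodes, links };
}

const graph = toGraphData({
  alice: { bob: 4, "[deleted]": 9 },
  bob: { alice: 1 },
  carol: {},
});
console.log(graph.nodes.map((n) => n.id), graph.links.length);
```

In the sample, `bob -> alice` is dropped for having a single reply, the `[deleted]` link is dropped regardless of its count, and `carol` never becomes a node because she has no surviving links.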
|
|
||||||
|
type UserStatsProps = {
|
||||||
|
topUsers: TopUser[];
|
||||||
|
interactionGraph: InteractionGraph;
|
||||||
|
totalUsers: number;
|
||||||
|
mostCommentHeavyUser: { author: string; commentShare: number } | null;
|
||||||
|
onExplore: (spec: CorpusExplorerSpec) => void;
|
||||||
|
};
|
||||||
|
|
||||||
const UserStats = (props: { data: UserAnalysisResponse }) => {
|
const UserStats = ({
|
||||||
const graphData = useMemo(() => ApiToGraphData(props.data.interaction_graph), [props.data.interaction_graph]);
|
topUsers,
|
||||||
|
interactionGraph,
|
||||||
|
totalUsers,
|
||||||
|
mostCommentHeavyUser,
|
||||||
|
onExplore,
|
||||||
|
}: UserStatsProps) => {
|
||||||
|
const graphData = useMemo(
|
||||||
|
() => toGraphData(interactionGraph),
|
||||||
|
[interactionGraph],
|
||||||
|
);
|
||||||
const graphContainerRef = useRef<HTMLDivElement | null>(null);
|
const graphContainerRef = useRef<HTMLDivElement | null>(null);
|
||||||
const [graphSize, setGraphSize] = useState({ width: 720, height: 540 });
|
const [graphSize, setGraphSize] = useState({ width: 720, height: 540 });
|
||||||
|
|
||||||
@@ -61,88 +76,155 @@ const UserStats = (props: { data: UserAnalysisResponse }) => {
|
|||||||
return () => window.removeEventListener("resize", updateGraphSize);
|
return () => window.removeEventListener("resize", updateGraphSize);
|
||||||
}, []);
|
}, []);
|
||||||
|
|
||||||
const totalUsers = props.data.users.length;
|
|
||||||
const connectedUsers = graphData.nodes.length;
|
const connectedUsers = graphData.nodes.length;
|
||||||
const totalInteractions = graphData.links.reduce((sum, link) => sum + link.value, 0);
|
const totalInteractions = graphData.links.reduce(
|
||||||
const avgInteractionsPerConnectedUser = connectedUsers ? totalInteractions / connectedUsers : 0;
|
(sum, link) => sum + link.value,
|
||||||
|
0,
|
||||||
|
);
|
||||||
|
const avgInteractionsPerConnectedUser = connectedUsers
|
||||||
|
? totalInteractions / connectedUsers
|
||||||
|
: 0;
|
||||||
|
|
||||||
const strongestLink = graphData.links.reduce<GraphLink | null>((best, current) => {
|
const strongestLink = graphData.links.reduce<GraphLink | null>(
|
||||||
if (!best || current.value > best.value) {
|
(best, current) => {
|
||||||
return current;
|
if (!best || current.value > best.value) {
|
||||||
}
|
return current;
|
||||||
return best;
|
}
|
||||||
}, null);
|
return best;
|
||||||
|
},
|
||||||
|
null,
|
||||||
|
);
|
||||||
|
|
||||||
const highlyInteractiveUser = [...props.data.users].sort((a, b) => b.comment_share - a.comment_share)[0];
|
const mostActiveUser = topUsers.find((u) => u.author !== "[deleted]");
|
||||||
|
const strongestLinkSource = strongestLink ? toText(strongestLink.source) : "";
|
||||||
const mostActiveUser = props.data.top_users.find(u => u.author !== "[deleted]");
|
const strongestLinkTarget = strongestLink ? toText(strongestLink.target) : "";
|
||||||
|
|
||||||
return (
|
return (
|
||||||
<div style={styles.page}>
|
<div style={styles.page}>
|
||||||
<div style={{ ...styles.container, ...styles.grid }}>
|
<div style={{ ...styles.container, ...styles.grid }}>
|
||||||
<Card
|
<Card
|
||||||
label="Users"
|
label="Users"
|
||||||
value={totalUsers.toLocaleString()}
|
value={totalUsers.toLocaleString()}
|
||||||
sublabel={`${connectedUsers.toLocaleString()} users in filtered graph`}
|
sublabel={`${connectedUsers.toLocaleString()} users in filtered graph`}
|
||||||
style={{ gridColumn: "span 3" }}
|
style={{ gridColumn: "span 3" }}
|
||||||
/>
|
/>
|
||||||
<Card
|
<Card
|
||||||
label="Interactions"
|
label="Replies"
|
||||||
value={totalInteractions.toLocaleString()}
|
value={totalInteractions.toLocaleString()}
|
||||||
sublabel="Filtered links (2+ interactions)"
|
sublabel="Links with at least 2 replies"
|
||||||
style={{ gridColumn: "span 3" }}
|
style={{ gridColumn: "span 3" }}
|
||||||
/>
|
/>
|
||||||
<Card
|
<Card
|
||||||
label="Average Intensity"
|
label="Replies per Connected User"
|
||||||
value={avgInteractionsPerConnectedUser.toFixed(1)}
|
value={avgInteractionsPerConnectedUser.toFixed(1)}
|
||||||
sublabel="Interactions per connected user"
|
sublabel="Average from visible graph links"
|
||||||
style={{ gridColumn: "span 3" }}
|
style={{ gridColumn: "span 3" }}
|
||||||
/>
|
/>
|
||||||
<Card
|
<Card
|
||||||
label="Most Active User"
|
label="Most Active User"
|
||||||
value={mostActiveUser?.author ?? "—"}
|
value={mostActiveUser?.author ?? "-"}
|
||||||
sublabel={mostActiveUser ? `${mostActiveUser.count.toLocaleString()} events` : "No user activity found"}
|
sublabel={
|
||||||
style={{ gridColumn: "span 3" }}
|
mostActiveUser
|
||||||
/>
|
? `${mostActiveUser.count.toLocaleString()} events`
|
||||||
|
: "No user activity found"
|
||||||
|
}
|
||||||
|
rightSlot={
|
||||||
|
mostActiveUser ? (
|
||||||
|
<button
|
||||||
|
onClick={() => onExplore(buildUserSpec(mostActiveUser.author))}
|
||||||
|
style={styles.buttonSecondary}
|
||||||
|
>
|
||||||
|
Explore
|
||||||
|
</button>
|
||||||
|
) : null
|
||||||
|
}
|
||||||
|
style={{ gridColumn: "span 3" }}
|
||||||
|
/>
|
||||||
|
|
||||||
<Card
|
<Card
|
||||||
label="Strongest Connection"
|
label="Strongest User Link"
|
||||||
value={strongestLink ? `${strongestLink.source} -> ${strongestLink.target}` : "—"}
|
value={
|
||||||
sublabel={strongestLink ? `${strongestLink.value.toLocaleString()} interactions` : "No graph edges after filtering"}
|
strongestLinkSource && strongestLinkTarget
|
||||||
style={{ gridColumn: "span 6" }}
|
? `${strongestLinkSource} -> ${strongestLinkTarget}`
|
||||||
/>
|
: "-"
|
||||||
<Card
|
}
|
||||||
label="Most Reply-Driven User"
|
sublabel={
|
||||||
value={highlyInteractiveUser?.author ?? "—"}
|
strongestLink
|
||||||
sublabel={
|
? `${strongestLink.value.toLocaleString()} replies`
|
||||||
highlyInteractiveUser
|
: "No graph links after filtering"
|
||||||
? `${Math.round(highlyInteractiveUser.comment_share * 100)}% comments`
|
}
|
||||||
: "No user distribution available"
|
rightSlot={
|
||||||
}
|
strongestLinkSource && strongestLinkTarget ? (
|
||||||
style={{ gridColumn: "span 6" }}
|
<button
|
||||||
/>
|
onClick={() =>
|
||||||
|
onExplore(buildReplyPairSpec(strongestLinkSource, strongestLinkTarget))
|
||||||
|
}
|
||||||
|
style={styles.buttonSecondary}
|
||||||
|
>
|
||||||
|
Explore
|
||||||
|
</button>
|
||||||
|
) : null
|
||||||
|
}
|
||||||
|
style={{ gridColumn: "span 6" }}
|
||||||
|
/>
|
||||||
|
<Card
|
||||||
|
label="Most Comment-Heavy User"
|
||||||
|
value={mostCommentHeavyUser?.author ?? "-"}
|
||||||
|
sublabel={
|
||||||
|
mostCommentHeavyUser
|
||||||
|
? `${Math.round(mostCommentHeavyUser.commentShare * 100)}% comments`
|
||||||
|
: "No user distribution available"
|
||||||
|
}
|
||||||
|
rightSlot={
|
||||||
|
mostCommentHeavyUser ? (
|
||||||
|
<button
|
||||||
|
onClick={() => onExplore(buildUserSpec(mostCommentHeavyUser.author))}
|
||||||
|
style={styles.buttonSecondary}
|
||||||
|
>
|
||||||
|
Explore
|
||||||
|
</button>
|
||||||
|
) : null
|
||||||
|
}
|
||||||
|
style={{ gridColumn: "span 6" }}
|
||||||
|
/>
|
||||||
|
|
||||||
<div style={{ ...styles.card, gridColumn: "span 12" }}>
|
<div style={{ ...styles.card, gridColumn: "span 12" }}>
|
||||||
<h2 style={styles.sectionTitle}>User Interaction Graph</h2>
|
<h2 style={styles.sectionTitle}>User Interaction Graph</h2>
|
||||||
<p style={styles.sectionSubtitle}>
|
<p style={styles.sectionSubtitle}>
|
||||||
Nodes represent users and links represent conversation interactions.
|
Each node is a user, and each link shows replies between them.
|
||||||
</p>
|
</p>
|
||||||
<div ref={graphContainerRef} style={{ width: "100%", height: graphSize.height }}>
|
<div
|
||||||
<ForceGraph3D
|
ref={graphContainerRef}
|
||||||
width={graphSize.width}
|
style={{ width: "100%", height: graphSize.height }}
|
||||||
height={graphSize.height}
|
>
|
||||||
graphData={graphData}
|
<ForceGraph3D
|
||||||
nodeAutoColorBy="id"
|
width={graphSize.width}
|
||||||
linkDirectionalParticles={1}
|
height={graphSize.height}
|
||||||
linkDirectionalParticleSpeed={0.004}
|
graphData={graphData}
|
||||||
linkWidth={(link) => Math.sqrt(Number(link.value))}
|
nodeAutoColorBy="id"
|
||||||
nodeLabel={(node) => `${node.id}`}
|
linkDirectionalParticles={1}
|
||||||
/>
|
linkDirectionalParticleSpeed={0.004}
|
||||||
</div>
|
linkWidth={(link) => Math.sqrt(Number(link.value))}
|
||||||
|
nodeLabel={(node) => `${node.id}`}
|
||||||
|
onNodeClick={(node) => {
|
||||||
|
const userId = toText(node.id);
|
||||||
|
if (userId) {
|
||||||
|
onExplore(buildUserSpec(userId));
|
||||||
|
}
|
||||||
|
}}
|
||||||
|
onLinkClick={(link) => {
|
||||||
|
const source = toText(link.source);
|
||||||
|
const target = toText(link.target);
|
||||||
|
if (source && target) {
|
||||||
|
onExplore(buildReplyPairSpec(source, target));
|
||||||
|
}
|
||||||
|
}}
|
||||||
|
/>
|
||||||
</div>
|
</div>
|
||||||
</div>
|
</div>
|
||||||
|
</div>
|
||||||
</div>
|
</div>
|
||||||
);
|
);
|
||||||
}
|
};
|
||||||
|
|
||||||
export default UserStats;
|
export default UserStats;
|
||||||
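Note on the new `onNodeClick` / `onLinkClick` handlers above: they pass `node.id`, `link.source`, and `link.target` through `toText`, whose definition is not part of this hunk. Force-directed graph libraries commonly replace a link's `source` and `target` ids with references to the node objects once the layout starts, so a helper like this probably has to accept both shapes. The sketch below is a hypothetical stand-in, not the helper from this change; the names `primitiveToText` and `toText` and the exact fallback behaviour are assumptions.

```typescript
// Hypothetical stand-in for the `toText` helper referenced in the diff.
// Assumption: ids are strings or finite numbers; anything else maps to "".
const primitiveToText = (value: unknown): string => {
  if (typeof value === "string") return value.trim();
  if (typeof value === "number" && Number.isFinite(value)) return String(value);
  return "";
};

// Accepts either a raw id or a node-like object carrying an `id` field.
// Only one level of `id` is unwrapped, so a self-referencing object cannot loop.
const toText = (value: unknown): string => {
  if (typeof value === "object" && value !== null && "id" in value) {
    return primitiveToText((value as { id?: unknown }).id);
  }
  return primitiveToText(value);
};
```

Returning an empty string for anything unrecognised keeps the `if (userId)` and `if (source && target)` guards in the handlers meaningful: a click on a malformed node or link simply does nothing.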
|
|||||||
530
frontend/src/pages/AutoFetch.tsx
Normal file
@@ -0,0 +1,530 @@
|
|||||||
|
import axios from "axios";
|
||||||
|
import { useEffect, useState } from "react";
|
||||||
|
import { useNavigate } from "react-router-dom";
|
||||||
|
import StatsStyling from "../styles/stats_styling";
|
||||||
|
|
||||||
|
const styles = StatsStyling;
|
||||||
|
const API_BASE_URL = import.meta.env.VITE_BACKEND_URL;
|
||||||
|
|
||||||
|
type SourceOption = {
|
||||||
|
id: string;
|
||||||
|
label: string;
|
||||||
|
search_enabled?: boolean;
|
||||||
|
categories_enabled?: boolean;
|
||||||
|
searchEnabled?: boolean;
|
||||||
|
categoriesEnabled?: boolean;
|
||||||
|
};
|
||||||
|
|
||||||
|
type SourceConfig = {
|
||||||
|
sourceName: string;
|
||||||
|
limit: string;
|
||||||
|
search: string;
|
||||||
|
category: string;
|
||||||
|
};
|
||||||
|
|
||||||
|
type TopicMap = Record<string, string>;
|
||||||
|
|
||||||
|
const buildEmptySourceConfig = (sourceName = ""): SourceConfig => ({
|
||||||
|
sourceName,
|
||||||
|
limit: "100",
|
||||||
|
search: "",
|
||||||
|
category: "",
|
||||||
|
});
|
||||||
|
|
||||||
|
const supportsSearch = (source?: SourceOption): boolean =>
|
||||||
|
Boolean(source?.search_enabled ?? source?.searchEnabled);
|
||||||
|
|
||||||
|
const supportsCategories = (source?: SourceOption): boolean =>
|
||||||
|
Boolean(source?.categories_enabled ?? source?.categoriesEnabled);
|
||||||
|
|
||||||
|
const AutoFetchPage = () => {
|
||||||
|
const navigate = useNavigate();
|
||||||
|
const [datasetName, setDatasetName] = useState("");
|
||||||
|
const [sourceOptions, setSourceOptions] = useState<SourceOption[]>([]);
|
||||||
|
const [sourceConfigs, setSourceConfigs] = useState<SourceConfig[]>([]);
|
||||||
|
const [returnMessage, setReturnMessage] = useState("");
|
||||||
|
const [isLoadingSources, setIsLoadingSources] = useState(true);
|
||||||
|
const [isSubmitting, setIsSubmitting] = useState(false);
|
||||||
|
const [hasError, setHasError] = useState(false);
|
||||||
|
const [useCustomTopics, setUseCustomTopics] = useState(false);
|
||||||
|
const [customTopicsText, setCustomTopicsText] = useState("");
|
||||||
|
|
||||||
|
useEffect(() => {
|
||||||
|
axios
|
||||||
|
.get<SourceOption[]>(`${API_BASE_URL}/datasets/sources`)
|
||||||
|
.then((response) => {
|
||||||
|
const options = response.data || [];
|
||||||
|
setSourceOptions(options);
|
||||||
|
setSourceConfigs([buildEmptySourceConfig(options[0]?.id || "")]);
|
||||||
|
})
|
||||||
|
.catch((requestError: unknown) => {
|
||||||
|
setHasError(true);
|
||||||
|
if (axios.isAxiosError(requestError)) {
|
||||||
|
setReturnMessage(
|
||||||
|
`Failed to load available sources: ${String(
|
||||||
|
requestError.response?.data?.error || requestError.message,
|
||||||
|
)}`,
|
||||||
|
);
|
||||||
|
} else {
|
||||||
|
setReturnMessage("Failed to load available sources.");
|
||||||
|
}
|
||||||
|
})
|
||||||
|
.finally(() => {
|
||||||
|
setIsLoadingSources(false);
|
||||||
|
});
|
||||||
|
}, []);
|
||||||
|
|
||||||
|
const updateSourceConfig = (
|
||||||
|
index: number,
|
||||||
|
field: keyof SourceConfig,
|
||||||
|
value: string,
|
||||||
|
) => {
|
||||||
|
setSourceConfigs((previous) =>
|
||||||
|
previous.map((config, configIndex) =>
|
||||||
|
configIndex === index
|
||||||
|
? field === "sourceName"
|
||||||
|
? { ...config, sourceName: value, search: "", category: "" }
|
||||||
|
: { ...config, [field]: value }
|
||||||
|
: config,
|
||||||
|
),
|
||||||
|
);
|
||||||
|
};
|
||||||
|
|
||||||
|
const getSourceOption = (sourceName: string) =>
|
||||||
|
sourceOptions.find((option) => option.id === sourceName);
|
||||||
|
|
||||||
|
const addSourceConfig = () => {
|
||||||
|
setSourceConfigs((previous) => [
|
||||||
|
...previous,
|
||||||
|
buildEmptySourceConfig(sourceOptions[0]?.id || ""),
|
||||||
|
]);
|
||||||
|
};
|
||||||
|
|
||||||
|
const removeSourceConfig = (index: number) => {
|
||||||
|
setSourceConfigs((previous) =>
|
||||||
|
previous.filter((_, configIndex) => configIndex !== index),
|
||||||
|
);
|
||||||
|
};
|
||||||
|
|
||||||
|
const autoFetch = async () => {
|
||||||
|
const token = localStorage.getItem("access_token");
|
||||||
|
if (!token) {
|
||||||
|
setHasError(true);
|
||||||
|
setReturnMessage("You must be signed in to auto fetch a dataset.");
|
||||||
|
return;
|
||||||
|
}
|
||||||
|
|
||||||
|
const normalizedDatasetName = datasetName.trim();
|
||||||
|
if (!normalizedDatasetName) {
|
||||||
|
setHasError(true);
|
||||||
|
setReturnMessage("Please add a dataset name before continuing.");
|
||||||
|
return;
|
||||||
|
}
|
||||||
|
|
||||||
|
if (sourceConfigs.length === 0) {
|
||||||
|
setHasError(true);
|
||||||
|
setReturnMessage("Please add at least one source.");
|
||||||
|
return;
|
||||||
|
}
|
||||||
|
|
||||||
|
const normalizedSources = sourceConfigs.map((source) => {
|
||||||
|
const sourceOption = getSourceOption(source.sourceName);
|
||||||
|
|
||||||
|
return {
|
||||||
|
name: source.sourceName,
|
||||||
|
limit: Number(source.limit || 100),
|
||||||
|
search: supportsSearch(sourceOption)
|
||||||
|
? source.search.trim() || undefined
|
||||||
|
: undefined,
|
||||||
|
category: supportsCategories(sourceOption)
|
||||||
|
? source.category.trim() || undefined
|
||||||
|
: undefined,
|
||||||
|
};
|
||||||
|
});
|
||||||
|
|
||||||
|
const invalidSource = normalizedSources.find(
|
||||||
|
(source) =>
|
||||||
|
!source.name || !Number.isFinite(source.limit) || source.limit <= 0,
|
||||||
|
);
|
||||||
|
|
||||||
|
if (invalidSource) {
|
||||||
|
setHasError(true);
|
||||||
|
setReturnMessage(
|
||||||
|
"Every source needs a name and a limit greater than zero.",
|
||||||
|
);
|
||||||
|
return;
|
||||||
|
}
|
||||||
|
|
||||||
|
let normalizedTopics: TopicMap | undefined;
|
||||||
|
|
||||||
|
if (useCustomTopics) {
|
||||||
|
const customTopicsJson = customTopicsText.trim();
|
||||||
|
|
||||||
|
if (!customTopicsJson) {
|
||||||
|
setHasError(true);
|
||||||
|
setReturnMessage(
|
||||||
|
"Custom topics are enabled, so please provide a JSON topic map.",
|
||||||
|
);
|
||||||
|
return;
|
||||||
|
}
|
||||||
|
|
||||||
|
let parsedTopics: unknown;
|
||||||
|
try {
|
||||||
|
parsedTopics = JSON.parse(customTopicsJson);
|
||||||
|
} catch {
|
||||||
|
setHasError(true);
|
||||||
|
setReturnMessage("Custom topic list must be valid JSON.");
|
||||||
|
return;
|
||||||
|
}
|
||||||
|
|
||||||
|
if (
|
||||||
|
!parsedTopics ||
|
||||||
|
Array.isArray(parsedTopics) ||
|
||||||
|
typeof parsedTopics !== "object"
|
||||||
|
) {
|
||||||
|
setHasError(true);
|
||||||
|
setReturnMessage(
|
||||||
|
"Custom topic list must be a JSON object: {\"Topic\": \"keywords\"}.",
|
||||||
|
);
|
||||||
|
return;
|
||||||
|
}
|
||||||
|
|
||||||
|
const entries = Object.entries(parsedTopics);
|
||||||
|
if (entries.length === 0) {
|
||||||
|
setHasError(true);
|
||||||
|
setReturnMessage("Custom topic list cannot be empty.");
|
||||||
|
return;
|
||||||
|
}
|
||||||
|
|
||||||
|
const hasInvalidTopic = entries.some(
|
||||||
|
([topicName, keywords]) =>
|
||||||
|
!topicName.trim() ||
|
||||||
|
typeof keywords !== "string" ||
|
||||||
|
!keywords.trim(),
|
||||||
|
);
|
||||||
|
|
||||||
|
if (hasInvalidTopic) {
|
||||||
|
setHasError(true);
|
||||||
|
setReturnMessage(
|
||||||
|
"Every custom topic must have a non-empty name and keyword string.",
|
||||||
|
);
|
||||||
|
return;
|
||||||
|
}
|
||||||
|
|
||||||
|
normalizedTopics = Object.fromEntries(
|
||||||
|
entries.map(([topicName, keywords]) => [
|
||||||
|
topicName.trim(),
|
||||||
|
String(keywords).trim(),
|
||||||
|
]),
|
||||||
|
);
|
||||||
|
}
|
||||||
|
|
||||||
|
const requestBody: {
|
||||||
|
name: string;
|
||||||
|
sources: Array<{
|
||||||
|
name: string;
|
||||||
|
limit: number;
|
||||||
|
search?: string;
|
||||||
|
category?: string;
|
||||||
|
}>;
|
||||||
|
topics?: TopicMap;
|
||||||
|
} = {
|
||||||
|
name: normalizedDatasetName,
|
||||||
|
sources: normalizedSources,
|
||||||
|
};
|
||||||
|
|
||||||
|
if (normalizedTopics) {
|
||||||
|
requestBody.topics = normalizedTopics;
|
||||||
|
}
|
||||||
|
|
||||||
|
try {
|
||||||
|
setIsSubmitting(true);
|
||||||
|
setHasError(false);
|
||||||
|
setReturnMessage("");
|
||||||
|
|
||||||
|
const response = await axios.post(
|
||||||
|
`${API_BASE_URL}/datasets/fetch`,
|
||||||
|
requestBody,
|
||||||
|
{
|
||||||
|
headers: {
|
||||||
|
Authorization: `Bearer ${token}`,
|
||||||
|
},
|
||||||
|
},
|
||||||
|
);
|
||||||
|
|
||||||
|
const datasetId = Number(response.data.dataset_id);
|
||||||
|
|
||||||
|
setReturnMessage(
|
||||||
|
`Auto fetch queued successfully (dataset #${datasetId}). Redirecting to processing status...`,
|
||||||
|
);
|
||||||
|
|
||||||
|
setTimeout(() => {
|
||||||
|
navigate(`/dataset/${datasetId}/status`);
|
||||||
|
}, 400);
|
||||||
|
} catch (requestError: unknown) {
|
||||||
|
setHasError(true);
|
||||||
|
if (axios.isAxiosError(requestError)) {
|
||||||
|
const message = String(
|
||||||
|
requestError.response?.data?.error ||
|
||||||
|
requestError.message ||
|
||||||
|
"Auto fetch failed.",
|
||||||
|
);
|
||||||
|
setReturnMessage(`Auto fetch failed: ${message}`);
|
||||||
|
} else {
|
||||||
|
setReturnMessage("Auto fetch failed due to an unexpected error.");
|
||||||
|
}
|
||||||
|
} finally {
|
||||||
|
setIsSubmitting(false);
|
||||||
|
}
|
||||||
|
};
|
||||||
|
|
||||||
|
return (
|
||||||
|
<div style={styles.page}>
|
||||||
|
<div style={styles.containerWide}>
|
||||||
|
<div style={{ ...styles.card, ...styles.headerBar }}>
|
||||||
|
<div>
|
||||||
|
<h1 style={styles.sectionHeaderTitle}>Auto Fetch Dataset</h1>
|
||||||
|
<p style={styles.sectionHeaderSubtitle}>
|
||||||
|
Select sources and fetch settings, then queue processing
|
||||||
|
automatically.
|
||||||
|
</p>
|
||||||
|
<p
|
||||||
|
style={{
|
||||||
|
...styles.subtleBodyText,
|
||||||
|
marginTop: 6,
|
||||||
|
color: "#9a6700",
|
||||||
|
}}
|
||||||
|
>
|
||||||
|
Warning: Fetching more than 250 posts from any single site can
|
||||||
|
take hours due to rate limits.
|
||||||
|
</p>
|
||||||
|
</div>
|
||||||
|
<button
|
||||||
|
type="button"
|
||||||
|
style={{
|
||||||
|
...styles.buttonPrimary,
|
||||||
|
opacity: isSubmitting || isLoadingSources ? 0.75 : 1,
|
||||||
|
}}
|
||||||
|
onClick={autoFetch}
|
||||||
|
disabled={isSubmitting || isLoadingSources}
|
||||||
|
>
|
||||||
|
{isSubmitting ? "Queueing..." : "Auto Fetch and Analyze"}
|
||||||
|
</button>
|
||||||
|
</div>
|
||||||
|
|
||||||
|
<div
|
||||||
|
style={{
|
||||||
|
...styles.grid,
|
||||||
|
marginTop: 14,
|
||||||
|
gridTemplateColumns: "repeat(auto-fit, minmax(280px, 1fr))",
|
||||||
|
}}
|
||||||
|
>
|
||||||
|
<div style={{ ...styles.card, gridColumn: "auto" }}>
|
||||||
|
<h2 style={{ ...styles.sectionTitle, color: "#24292f" }}>
|
||||||
|
Dataset Name
|
||||||
|
</h2>
|
||||||
|
<p style={styles.sectionSubtitle}>
|
||||||
|
Use a clear label so you can identify this run later.
|
||||||
|
</p>
|
||||||
|
<input
|
||||||
|
style={{ ...styles.input, ...styles.inputFullWidth }}
|
||||||
|
type="text"
|
||||||
|
placeholder="Example: r/cork subreddit - Jan 2026"
|
||||||
|
value={datasetName}
|
||||||
|
onChange={(event) => setDatasetName(event.target.value)}
|
||||||
|
/>
|
||||||
|
</div>
|
||||||
|
|
||||||
|
<div style={{ ...styles.card, gridColumn: "auto" }}>
|
||||||
|
<h2 style={{ ...styles.sectionTitle, color: "#24292f" }}>
|
||||||
|
Sources
|
||||||
|
</h2>
|
||||||
|
<p style={styles.sectionSubtitle}>
|
||||||
|
Configure source, limit, optional search, and optional category.
|
||||||
|
</p>
|
||||||
|
|
||||||
|
{isLoadingSources && (
|
||||||
|
<p style={styles.subtleBodyText}>Loading sources...</p>
|
||||||
|
)}
|
||||||
|
|
||||||
|
{!isLoadingSources && sourceOptions.length === 0 && (
|
||||||
|
<p style={styles.subtleBodyText}>
|
||||||
|
No source connectors are currently available.
|
||||||
|
</p>
|
||||||
|
)}
|
||||||
|
|
||||||
|
{!isLoadingSources && sourceOptions.length > 0 && (
|
||||||
|
<div
|
||||||
|
style={{ display: "flex", flexDirection: "column", gap: 10 }}
|
||||||
|
>
|
||||||
|
{sourceConfigs.map((source, index) => {
|
||||||
|
const sourceOption = getSourceOption(source.sourceName);
|
||||||
|
const searchEnabled = supportsSearch(sourceOption);
|
||||||
|
const categoriesEnabled = supportsCategories(sourceOption);
|
||||||
|
|
||||||
|
return (
|
||||||
|
<div
|
||||||
|
key={`source-${index}`}
|
||||||
|
style={{
|
||||||
|
border: "1px solid #d0d7de",
|
||||||
|
borderRadius: 8,
|
||||||
|
padding: 12,
|
||||||
|
background: "#f6f8fa",
|
||||||
|
display: "grid",
|
||||||
|
gap: 8,
|
||||||
|
}}
|
||||||
|
>
|
||||||
|
<select
|
||||||
|
value={source.sourceName}
|
||||||
|
style={{ ...styles.input, ...styles.inputFullWidth }}
|
||||||
|
onChange={(event) =>
|
||||||
|
updateSourceConfig(
|
||||||
|
index,
|
||||||
|
"sourceName",
|
||||||
|
event.target.value,
|
||||||
|
)
|
||||||
|
}
|
||||||
|
>
|
||||||
|
{sourceOptions.map((option) => (
|
||||||
|
<option key={option.id} value={option.id}>
|
||||||
|
{option.label}
|
||||||
|
</option>
|
||||||
|
))}
|
||||||
|
</select>
|
||||||
|
|
||||||
|
<input
|
||||||
|
type="number"
|
||||||
|
min={1}
|
||||||
|
value={source.limit}
|
||||||
|
placeholder="Limit"
|
||||||
|
style={{ ...styles.input, ...styles.inputFullWidth }}
|
||||||
|
onChange={(event) =>
|
||||||
|
updateSourceConfig(index, "limit", event.target.value)
|
||||||
|
}
|
||||||
|
/>
|
||||||
|
|
||||||
|
<input
|
||||||
|
type="text"
|
||||||
|
value={source.search}
|
||||||
|
placeholder={
|
||||||
|
searchEnabled
|
||||||
|
? "Search term (optional)"
|
||||||
|
: "Search not supported for this source"
|
||||||
|
}
|
||||||
|
style={{ ...styles.input, ...styles.inputFullWidth }}
|
||||||
|
disabled={!searchEnabled}
|
||||||
|
onChange={(event) =>
|
||||||
|
updateSourceConfig(
|
||||||
|
index,
|
||||||
|
"search",
|
||||||
|
event.target.value,
|
||||||
|
)
|
||||||
|
}
|
||||||
|
/>
|
||||||
|
|
||||||
|
<input
|
||||||
|
type="text"
|
||||||
|
value={source.category}
|
||||||
|
placeholder={
|
||||||
|
categoriesEnabled
|
||||||
|
? "Category (optional)"
|
||||||
|
: "Categories not supported for this source"
|
||||||
|
}
|
||||||
|
style={{ ...styles.input, ...styles.inputFullWidth }}
|
||||||
|
disabled={!categoriesEnabled}
|
||||||
|
onChange={(event) =>
|
||||||
|
updateSourceConfig(
|
||||||
|
index,
|
||||||
|
"category",
|
||||||
|
event.target.value,
|
||||||
|
)
|
||||||
|
}
|
||||||
|
/>
|
||||||
|
|
||||||
|
{sourceConfigs.length > 1 && (
|
||||||
|
<button
|
||||||
|
type="button"
|
||||||
|
style={styles.buttonSecondary}
|
||||||
|
onClick={() => removeSourceConfig(index)}
|
||||||
|
>
|
||||||
|
Remove source
|
||||||
|
</button>
|
||||||
|
)}
|
||||||
|
</div>
|
||||||
|
);
|
||||||
|
})}
|
||||||
|
|
||||||
|
<button
|
||||||
|
type="button"
|
||||||
|
style={styles.buttonSecondary}
|
||||||
|
onClick={addSourceConfig}
|
||||||
|
>
|
||||||
|
Add another source
|
||||||
|
</button>
|
||||||
|
</div>
|
||||||
|
)}
|
||||||
|
</div>
|
||||||
|
|
||||||
|
<div style={{ ...styles.card, gridColumn: "auto" }}>
|
||||||
|
<h2 style={{ ...styles.sectionTitle, color: "#24292f" }}>
|
||||||
|
Topic List
|
||||||
|
</h2>
|
||||||
|
<p style={styles.sectionSubtitle}>
|
||||||
|
Use the default topic list, or provide your own JSON topic map.
|
||||||
|
</p>
|
||||||
|
|
||||||
|
<label
|
||||||
|
style={{
|
||||||
|
display: "flex",
|
||||||
|
alignItems: "center",
|
||||||
|
gap: 8,
|
||||||
|
fontSize: 14,
|
||||||
|
color: "#24292f",
|
||||||
|
marginBottom: 10,
|
||||||
|
}}
|
||||||
|
>
|
||||||
|
<input
|
||||||
|
type="checkbox"
|
||||||
|
checked={useCustomTopics}
|
||||||
|
onChange={(event) => setUseCustomTopics(event.target.checked)}
|
||||||
|
/>
|
||||||
|
Use custom topic list
|
||||||
|
</label>
|
||||||
|
|
||||||
|
<textarea
|
||||||
|
value={customTopicsText}
|
||||||
|
onChange={(event) => setCustomTopicsText(event.target.value)}
|
||||||
|
disabled={!useCustomTopics}
|
||||||
|
placeholder='{"Politics": "election, policy, government", "Housing": "rent, landlords, tenancy"}'
|
||||||
|
style={{
|
||||||
|
...styles.input,
|
||||||
|
...styles.inputFullWidth,
|
||||||
|
minHeight: 170,
|
||||||
|
resize: "vertical",
|
||||||
|
fontFamily:
|
||||||
|
'"IBM Plex Mono", "Fira Code", "JetBrains Mono", monospace',
|
||||||
|
}}
|
||||||
|
/>
|
||||||
|
<p style={styles.subtleBodyText}>
|
||||||
|
Format: JSON object where each key is a topic and each value is a
|
||||||
|
keyword string.
|
||||||
|
</p>
|
||||||
|
</div>
|
||||||
|
</div>
|
||||||
|
|
||||||
|
<div
|
||||||
|
style={{
|
||||||
|
...styles.card,
|
||||||
|
marginTop: 14,
|
||||||
|
...(hasError ? styles.alertCardError : styles.alertCardInfo),
|
||||||
|
}}
|
||||||
|
>
|
||||||
|
{returnMessage ||
|
||||||
|
"After queueing, your dataset is fetched and processed in the background automatically."}
|
||||||
|
</div>
|
||||||
|
</div>
|
||||||
|
</div>
|
||||||
|
);
|
||||||
|
};
|
||||||
|
|
||||||
|
export default AutoFetchPage;
|
||||||
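Note on the custom topic handling in `autoFetch` above: the checks (non-empty text, valid JSON, plain object, at least one entry, string keywords) run inline and each one sets component state before returning. As a possible follow-up, the same rules could live in a pure function that is testable without rendering the page. This is a minimal sketch under that assumption; `parseTopicMap` and `TopicParseResult` are hypothetical names that do not appear in the change, and the error strings are copied or shortened from the ones in the diff.

```typescript
type TopicMap = Record<string, string>;

// Hypothetical result type: either the cleaned topic map or a user-facing error.
type TopicParseResult =
  | { ok: true; topics: TopicMap }
  | { ok: false; error: string };

const parseTopicMap = (rawText: string): TopicParseResult => {
  const text = rawText.trim();
  if (!text) {
    return { ok: false, error: "Please provide a JSON topic map." };
  }

  let parsed: unknown;
  try {
    parsed = JSON.parse(text);
  } catch {
    return { ok: false, error: "Custom topic list must be valid JSON." };
  }

  // Reject null, arrays, and primitives: only a plain JSON object is accepted.
  if (!parsed || Array.isArray(parsed) || typeof parsed !== "object") {
    return { ok: false, error: "Custom topic list must be a JSON object." };
  }

  const entries = Object.entries(parsed as Record<string, unknown>);
  if (entries.length === 0) {
    return { ok: false, error: "Custom topic list cannot be empty." };
  }

  const topics: TopicMap = {};
  for (const [topicName, keywords] of entries) {
    if (!topicName.trim() || typeof keywords !== "string" || !keywords.trim()) {
      return {
        ok: false,
        error: "Every custom topic must have a non-empty name and keyword string.",
      };
    }
    topics[topicName.trim()] = keywords.trim();
  }

  return { ok: true, topics };
};
```

With a helper of this shape, the component would only need to call it once and forward `error` to `setReturnMessage` when `ok` is false.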
@@ -1,341 +0,0 @@
|
|||||||
import axios from "axios";
|
|
||||||
import { useEffect, useState } from "react";
|
|
||||||
import { useNavigate } from "react-router-dom";
|
|
||||||
import StatsStyling from "../styles/stats_styling";
|
|
||||||
|
|
||||||
const styles = StatsStyling;
|
|
||||||
const API_BASE_URL = import.meta.env.VITE_BACKEND_URL;
|
|
||||||
|
|
||||||
type SourceOption = {
|
|
||||||
id: string;
|
|
||||||
label: string;
|
|
||||||
search_enabled?: boolean;
|
|
||||||
categories_enabled?: boolean;
|
|
||||||
searchEnabled?: boolean;
|
|
||||||
categoriesEnabled?: boolean;
|
|
||||||
};
|
|
||||||
|
|
||||||
type SourceConfig = {
|
|
||||||
sourceName: string;
|
|
||||||
limit: string;
|
|
||||||
search: string;
|
|
||||||
category: string;
|
|
||||||
};
|
|
||||||
|
|
||||||
const buildEmptySourceConfig = (sourceName = ""): SourceConfig => ({
|
|
||||||
sourceName,
|
|
||||||
limit: "100",
|
|
||||||
search: "",
|
|
||||||
category: "",
|
|
||||||
});
|
|
||||||
|
|
||||||
const supportsSearch = (source?: SourceOption): boolean =>
|
|
||||||
Boolean(source?.search_enabled ?? source?.searchEnabled);
|
|
||||||
|
|
||||||
const supportsCategories = (source?: SourceOption): boolean =>
|
|
||||||
Boolean(source?.categories_enabled ?? source?.categoriesEnabled);
|
|
||||||
|
|
||||||
const AutoScrapePage = () => {
|
|
||||||
const navigate = useNavigate();
|
|
||||||
const [datasetName, setDatasetName] = useState("");
|
|
||||||
const [sourceOptions, setSourceOptions] = useState<SourceOption[]>([]);
|
|
||||||
const [sourceConfigs, setSourceConfigs] = useState<SourceConfig[]>([]);
|
|
||||||
const [returnMessage, setReturnMessage] = useState("");
|
|
||||||
const [isLoadingSources, setIsLoadingSources] = useState(true);
|
|
||||||
const [isSubmitting, setIsSubmitting] = useState(false);
|
|
||||||
const [hasError, setHasError] = useState(false);
|
|
||||||
|
|
||||||
useEffect(() => {
|
|
||||||
axios
|
|
||||||
.get<SourceOption[]>(`${API_BASE_URL}/datasets/sources`)
|
|
||||||
.then((response) => {
|
|
||||||
const options = response.data || [];
|
|
||||||
setSourceOptions(options);
|
|
||||||
setSourceConfigs([buildEmptySourceConfig(options[0]?.id || "")]);
|
|
||||||
})
|
|
||||||
.catch((requestError: unknown) => {
|
|
||||||
setHasError(true);
|
|
||||||
if (axios.isAxiosError(requestError)) {
|
|
||||||
setReturnMessage(
|
|
||||||
`Failed to load available sources: ${String(
|
|
||||||
requestError.response?.data?.error || requestError.message
|
|
||||||
)}`
|
|
||||||
);
|
|
||||||
} else {
|
|
||||||
setReturnMessage("Failed to load available sources.");
|
|
||||||
}
|
|
||||||
})
|
|
||||||
.finally(() => {
|
|
||||||
setIsLoadingSources(false);
|
|
||||||
});
|
|
||||||
}, []);
|
|
||||||
|
|
||||||
const updateSourceConfig = (index: number, field: keyof SourceConfig, value: string) => {
|
|
||||||
setSourceConfigs((previous) =>
|
|
||||||
previous.map((config, configIndex) =>
|
|
||||||
configIndex === index
|
|
||||||
? field === "sourceName"
|
|
||||||
? { ...config, sourceName: value, search: "", category: "" }
|
|
||||||
: { ...config, [field]: value }
|
|
||||||
: config
|
|
||||||
)
|
|
||||||
);
|
|
||||||
};
|
|
||||||
|
|
||||||
const getSourceOption = (sourceName: string) =>
|
|
||||||
sourceOptions.find((option) => option.id === sourceName);
|
|
||||||
|
|
||||||
const addSourceConfig = () => {
|
|
||||||
setSourceConfigs((previous) => [
|
|
||||||
...previous,
|
|
||||||
buildEmptySourceConfig(sourceOptions[0]?.id || ""),
|
|
||||||
]);
|
|
||||||
};
|
|
||||||
|
|
||||||
const removeSourceConfig = (index: number) => {
|
|
||||||
setSourceConfigs((previous) => previous.filter((_, configIndex) => configIndex !== index));
|
|
||||||
};
|
|
||||||
|
|
||||||
const autoScrape = async () => {
|
|
||||||
const token = localStorage.getItem("access_token");
|
|
||||||
if (!token) {
|
|
||||||
setHasError(true);
|
|
||||||
setReturnMessage("You must be signed in to auto scrape a dataset.");
|
|
||||||
return;
|
|
||||||
}
|
|
||||||
|
|
||||||
const normalizedDatasetName = datasetName.trim();
|
|
||||||
if (!normalizedDatasetName) {
|
|
||||||
setHasError(true);
|
|
||||||
setReturnMessage("Please add a dataset name before continuing.");
|
|
||||||
return;
|
|
||||||
}
|
|
||||||
|
|
||||||
if (sourceConfigs.length === 0) {
|
|
||||||
setHasError(true);
|
|
||||||
setReturnMessage("Please add at least one source.");
|
|
||||||
return;
|
|
||||||
}
|
|
||||||
|
|
||||||
const normalizedSources = sourceConfigs.map((source) => {
|
|
||||||
const sourceOption = getSourceOption(source.sourceName);
|
|
||||||
|
|
||||||
return {
|
|
||||||
name: source.sourceName,
|
|
||||||
limit: Number(source.limit || 100),
|
|
||||||
search: supportsSearch(sourceOption) ? source.search.trim() || undefined : undefined,
|
|
||||||
category: supportsCategories(sourceOption)
|
|
||||||
? source.category.trim() || undefined
|
|
||||||
: undefined,
|
|
||||||
};
|
|
||||||
});
|
|
||||||
|
|
||||||
const invalidSource = normalizedSources.find(
|
|
||||||
(source) => !source.name || !Number.isFinite(source.limit) || source.limit <= 0
|
|
||||||
);
|
|
||||||
|
|
||||||
if (invalidSource) {
|
|
||||||
setHasError(true);
|
|
||||||
setReturnMessage("Every source needs a name and a limit greater than zero.");
|
|
||||||
return;
|
|
||||||
}
|
|
||||||
|
|
||||||
try {
|
|
||||||
setIsSubmitting(true);
|
|
||||||
setHasError(false);
|
|
||||||
setReturnMessage("");
|
|
||||||
|
|
||||||
const response = await axios.post(
|
|
||||||
`${API_BASE_URL}/datasets/scrape`,
|
|
||||||
{
|
|
||||||
name: normalizedDatasetName,
|
|
||||||
sources: normalizedSources,
|
|
||||||
},
|
|
||||||
{
|
|
||||||
headers: {
|
|
||||||
Authorization: `Bearer ${token}`,
|
|
||||||
},
|
|
||||||
}
|
|
||||||
);
|
|
||||||
|
|
||||||
const datasetId = Number(response.data.dataset_id);
|
|
||||||
|
|
||||||
setReturnMessage(
|
|
||||||
`Auto scrape queued successfully (dataset #${datasetId}). Redirecting to processing status...`
|
|
||||||
);
|
|
||||||
|
|
||||||
setTimeout(() => {
|
|
||||||
navigate(`/dataset/${datasetId}/status`);
|
|
||||||
}, 400);
|
|
||||||
} catch (requestError: unknown) {
|
|
||||||
setHasError(true);
|
|
||||||
if (axios.isAxiosError(requestError)) {
|
|
||||||
const message = String(
|
|
||||||
requestError.response?.data?.error || requestError.message || "Auto scrape failed."
|
|
||||||
);
|
|
||||||
setReturnMessage(`Auto scrape failed: ${message}`);
|
|
||||||
} else {
|
|
||||||
setReturnMessage("Auto scrape failed due to an unexpected error.");
|
|
||||||
}
|
|
||||||
} finally {
|
|
||||||
setIsSubmitting(false);
|
|
||||||
}
|
|
||||||
};
|
|
||||||
|
|
||||||
return (
|
|
||||||
<div style={styles.page}>
|
|
||||||
<div style={styles.containerWide}>
|
|
||||||
<div style={{ ...styles.card, ...styles.headerBar }}>
|
|
||||||
<div>
|
|
||||||
<h1 style={styles.sectionHeaderTitle}>Auto Scrape Dataset</h1>
|
|
||||||
<p style={styles.sectionHeaderSubtitle}>
|
|
||||||
Select sources and scrape settings, then queue processing automatically.
|
|
||||||
</p>
|
|
||||||
<p style={{ ...styles.subtleBodyText, marginTop: 6, color: "#9a6700" }}>
|
|
||||||
Warning: Scraping more than 250 posts from any single site can take hours.
|
|
||||||
</p>
|
|
||||||
</div>
|
|
||||||
<button
|
|
||||||
type="button"
|
|
||||||
style={{ ...styles.buttonPrimary, opacity: isSubmitting || isLoadingSources ? 0.75 : 1 }}
|
|
||||||
onClick={autoScrape}
|
|
||||||
disabled={isSubmitting || isLoadingSources}
|
|
||||||
>
|
|
||||||
{isSubmitting ? "Queueing..." : "Auto Scrape and Analyze"}
|
|
||||||
</button>
|
|
||||||
</div>
|
|
||||||
|
|
||||||
<div
|
|
||||||
style={{
|
|
||||||
...styles.grid,
|
|
||||||
marginTop: 14,
|
|
||||||
gridTemplateColumns: "repeat(auto-fit, minmax(280px, 1fr))",
|
|
||||||
}}
|
|
||||||
>
|
|
||||||
<div style={{ ...styles.card, gridColumn: "auto" }}>
|
|
||||||
<h2 style={{ ...styles.sectionTitle, color: "#24292f" }}>Dataset Name</h2>
|
|
||||||
<p style={styles.sectionSubtitle}>Use a clear label so you can identify this run later.</p>
|
|
||||||
<input
|
|
||||||
style={{ ...styles.input, ...styles.inputFullWidth }}
|
|
||||||
type="text"
|
|
||||||
placeholder="Example: r/cork subreddit - Jan 2026"
|
|
||||||
value={datasetName}
|
|
||||||
onChange={(event) => setDatasetName(event.target.value)}
|
|
||||||
/>
|
|
||||||
</div>
|
|
||||||
|
|
||||||
<div style={{ ...styles.card, gridColumn: "auto" }}>
|
|
||||||
<h2 style={{ ...styles.sectionTitle, color: "#24292f" }}>Sources</h2>
|
|
||||||
<p style={styles.sectionSubtitle}>
|
|
||||||
Configure source, limit, optional search, and optional category.
|
|
||||||
</p>
|
|
||||||
|
|
||||||
{isLoadingSources && <p style={styles.subtleBodyText}>Loading sources...</p>}
|
|
||||||
|
|
||||||
{!isLoadingSources && sourceOptions.length === 0 && (
|
|
||||||
<p style={styles.subtleBodyText}>No source connectors are currently available.</p>
|
|
||||||
)}
|
|
||||||
|
|
||||||
{!isLoadingSources && sourceOptions.length > 0 && (
|
|
||||||
<div style={{ display: "flex", flexDirection: "column", gap: 10 }}>
|
|
||||||
{sourceConfigs.map((source, index) => {
|
|
||||||
const sourceOption = getSourceOption(source.sourceName);
|
|
||||||
const searchEnabled = supportsSearch(sourceOption);
|
|
||||||
const categoriesEnabled = supportsCategories(sourceOption);
|
|
||||||
|
|
||||||
return (
|
|
||||||
<div
|
|
||||||
key={`source-${index}`}
|
|
||||||
style={{
|
|
||||||
border: "1px solid #d0d7de",
|
|
||||||
borderRadius: 8,
|
|
||||||
padding: 12,
|
|
||||||
background: "#f6f8fa",
|
|
||||||
display: "grid",
|
|
||||||
gap: 8,
|
|
||||||
}}
|
|
||||||
>
|
|
||||||
<select
|
|
||||||
-                        value={source.sourceName}
-                        style={{ ...styles.input, ...styles.inputFullWidth }}
-                        onChange={(event) => updateSourceConfig(index, "sourceName", event.target.value)}
-                      >
-                        {sourceOptions.map((option) => (
-                          <option key={option.id} value={option.id}>
-                            {option.label}
-                          </option>
-                        ))}
-                      </select>
-
-                      <input
-                        type="number"
-                        min={1}
-                        value={source.limit}
-                        placeholder="Limit"
-                        style={{ ...styles.input, ...styles.inputFullWidth }}
-                        onChange={(event) => updateSourceConfig(index, "limit", event.target.value)}
-                      />
-
-                      <input
-                        type="text"
-                        value={source.search}
-                        placeholder={
-                          searchEnabled
-                            ? "Search term (optional)"
-                            : "Search not supported for this source"
-                        }
-                        style={{ ...styles.input, ...styles.inputFullWidth }}
-                        disabled={!searchEnabled}
-                        onChange={(event) => updateSourceConfig(index, "search", event.target.value)}
-                      />
-
-                      <input
-                        type="text"
-                        value={source.category}
-                        placeholder={
-                          categoriesEnabled
-                            ? "Category (optional)"
-                            : "Categories not supported for this source"
-                        }
-                        style={{ ...styles.input, ...styles.inputFullWidth }}
-                        disabled={!categoriesEnabled}
-                        onChange={(event) => updateSourceConfig(index, "category", event.target.value)}
-                      />
-
-                      {sourceConfigs.length > 1 && (
-                        <button
-                          type="button"
-                          style={styles.buttonSecondary}
-                          onClick={() => removeSourceConfig(index)}
-                        >
-                          Remove source
-                        </button>
-                      )}
-                    </div>
-                  );
-                })}
-
-                <button type="button" style={styles.buttonSecondary} onClick={addSourceConfig}>
-                  Add another source
-                </button>
-              </div>
-            )}
-          </div>
-        </div>
-
-        <div
-          style={{
-            ...styles.card,
-            marginTop: 14,
-            ...(hasError ? styles.alertCardError : styles.alertCardInfo),
-          }}
-        >
-          {returnMessage ||
-            "After queueing, your dataset is fetched and processed in the background automatically."}
-        </div>
-      </div>
-    </div>
-  );
-};
-
-export default AutoScrapePage;
@@ -22,12 +22,10 @@ const DatasetEditPage = () => {
   const [isSaving, setIsSaving] = useState(false);
   const [isDeleting, setIsDeleting] = useState(false);
   const [isDeleteModalOpen, setIsDeleteModalOpen] = useState(false);
-  const [hasError, setHasError] = useState(false);
 
   const [datasetName, setDatasetName] = useState("");
   useEffect(() => {
     if (!Number.isInteger(parsedDatasetId) || parsedDatasetId <= 0) {
-      setHasError(true);
       setStatusMessage("Invalid dataset id.");
       setLoading(false);
       return;
@@ -35,7 +33,6 @@ const DatasetEditPage = () => {
 
     const token = localStorage.getItem("access_token");
     if (!token) {
-      setHasError(true);
       setStatusMessage("You must be signed in to edit datasets.");
       setLoading(false);
       return;
@@ -49,9 +46,10 @@ const DatasetEditPage = () => {
         setDatasetName(response.data.name || "");
       })
       .catch((error: unknown) => {
-        setHasError(true);
         if (axios.isAxiosError(error)) {
-          setStatusMessage(String(error.response?.data?.error || error.message));
+          setStatusMessage(
+            String(error.response?.data?.error || error.message),
+          );
         } else {
           setStatusMessage("Could not get dataset info.");
         }
@@ -61,40 +59,39 @@ const DatasetEditPage = () => {
       });
   }, [parsedDatasetId]);
 
-
   const saveDatasetName = async (event: FormEvent<HTMLFormElement>) => {
     event.preventDefault();
 
     const trimmedName = datasetName.trim();
     if (!trimmedName) {
-      setHasError(true);
       setStatusMessage("Please enter a valid dataset name.");
       return;
     }
 
     const token = localStorage.getItem("access_token");
     if (!token) {
-      setHasError(true);
       setStatusMessage("You must be signed in to save changes.");
       return;
     }
 
     try {
       setIsSaving(true);
-      setHasError(false);
       setStatusMessage("");
 
       await axios.patch(
         `${API_BASE_URL}/dataset/${parsedDatasetId}`,
         { name: trimmedName },
-        { headers: { Authorization: `Bearer ${token}` } }
+        { headers: { Authorization: `Bearer ${token}` } },
       );
 
       navigate("/datasets", { replace: true });
     } catch (error: unknown) {
-      setHasError(true);
       if (axios.isAxiosError(error)) {
-        setStatusMessage(String(error.response?.data?.error || error.message || "Save failed."));
+        setStatusMessage(
+          String(
+            error.response?.data?.error || error.message || "Save failed.",
+          ),
+        );
       } else {
         setStatusMessage("Save failed due to an unexpected error.");
       }
@@ -106,7 +103,6 @@ const DatasetEditPage = () => {
   const deleteDataset = async () => {
     const deleteToken = localStorage.getItem("access_token");
     if (!deleteToken) {
-      setHasError(true);
       setStatusMessage("You must be signed in to delete datasets.");
       setIsDeleteModalOpen(false);
       return;
@@ -114,20 +110,21 @@ const DatasetEditPage = () => {
 
     try {
       setIsDeleting(true);
-      setHasError(false);
       setStatusMessage("");
 
-      await axios.delete(
-        `${API_BASE_URL}/dataset/${parsedDatasetId}`,
-        { headers: { Authorization: `Bearer ${deleteToken}` } }
-      );
+      await axios.delete(`${API_BASE_URL}/dataset/${parsedDatasetId}`, {
+        headers: { Authorization: `Bearer ${deleteToken}` },
+      });
 
       setIsDeleteModalOpen(false);
       navigate("/datasets", { replace: true });
     } catch (error: unknown) {
-      setHasError(true);
       if (axios.isAxiosError(error)) {
-        setStatusMessage(String(error.response?.data?.error || error.message || "Delete failed."));
+        setStatusMessage(
+          String(
+            error.response?.data?.error || error.message || "Delete failed.",
+          ),
+        );
       } else {
         setStatusMessage("Delete failed due to an unexpected error.");
       }
@@ -142,7 +139,9 @@ const DatasetEditPage = () => {
       <div style={{ ...styles.card, ...styles.headerBar }}>
         <div>
           <h1 style={styles.sectionHeaderTitle}>Edit Dataset</h1>
-          <p style={styles.sectionHeaderSubtitle}>Update the dataset name shown in your datasets list.</p>
+          <p style={styles.sectionHeaderSubtitle}>
+            Update the dataset name shown in your datasets list.
+          </p>
         </div>
       </div>
 
@@ -173,8 +172,8 @@ const DatasetEditPage = () => {
|
|||||||
style={styles.buttonDanger}
|
style={styles.buttonDanger}
|
||||||
onClick={() => setIsDeleteModalOpen(true)}
|
onClick={() => setIsDeleteModalOpen(true)}
|
||||||
disabled={isSaving || isDeleting}
|
disabled={isSaving || isDeleting}
|
||||||
>
|
>
|
||||||
Delete Dataset
|
Delete Dataset
|
||||||
</button>
|
</button>
|
||||||
|
|
||||||
<button
|
<button
|
||||||
@@ -187,15 +186,16 @@ const DatasetEditPage = () => {
             </button>
             <button
               type="submit"
-              style={{ ...styles.buttonPrimary, opacity: loading || isSaving ? 0.75 : 1 }}
+              style={{
+                ...styles.buttonPrimary,
+                opacity: loading || isSaving ? 0.75 : 1,
+              }}
               disabled={loading || isSaving || isDeleting}
             >
               {isSaving ? "Saving..." : "Save"}
             </button>
 
-            {loading
-              ? "Loading dataset details..."
-              : statusMessage}
+            {loading ? "Loading dataset details..." : statusMessage}
           </div>
         </form>
 
|
|||||||
@@ -3,10 +3,10 @@ import axios from "axios";
 import { useNavigate, useParams } from "react-router-dom";
 import StatsStyling from "../styles/stats_styling";
 
-const API_BASE_URL = import.meta.env.VITE_BACKEND_URL
+const API_BASE_URL = import.meta.env.VITE_BACKEND_URL;
 
 type DatasetStatusResponse = {
-  status?: "processing" | "complete" | "error";
+  status?: "fetching" | "processing" | "complete" | "error";
   status_message?: string | null;
   completed_at?: string | null;
 };
@@ -17,7 +17,8 @@ const DatasetStatusPage = () => {
   const navigate = useNavigate();
   const { datasetId } = useParams<{ datasetId: string }>();
   const [loading, setLoading] = useState(true);
-  const [status, setStatus] = useState<DatasetStatusResponse["status"]>("processing");
+  const [status, setStatus] =
+    useState<DatasetStatusResponse["status"]>("processing");
   const [statusMessage, setStatusMessage] = useState("");
   const parsedDatasetId = useMemo(() => Number(datasetId), [datasetId]);
 
@@ -34,7 +35,7 @@ const DatasetStatusPage = () => {
     const pollStatus = async () => {
       try {
         const response = await axios.get<DatasetStatusResponse>(
-          `${API_BASE_URL}/dataset/${parsedDatasetId}/status`
+          `${API_BASE_URL}/dataset/${parsedDatasetId}/status`,
         );
 
         const nextStatus = response.data.status ?? "processing";
@@ -51,7 +52,9 @@ const DatasetStatusPage = () => {
         setLoading(false);
         setStatus("error");
         if (axios.isAxiosError(error)) {
-          const message = String(error.response?.data?.error || error.message || "Request failed");
+          const message = String(
+            error.response?.data?.error || error.message || "Request failed",
+          );
           setStatusMessage(message);
         } else {
           setStatusMessage("Unable to fetch dataset status.");
@@ -73,7 +76,8 @@ const DatasetStatusPage = () => {
     };
   }, [navigate, parsedDatasetId, status]);
 
-  const isProcessing = loading || status === "processing";
+  const isProcessing =
+    loading || status === "fetching" || status === "processing";
   const isError = status === "error";
 
   return (
@@ -81,26 +85,37 @@ const DatasetStatusPage = () => {
     <div style={styles.containerNarrow}>
       <div style={{ ...styles.card, marginTop: 28 }}>
         <h1 style={styles.sectionHeaderTitle}>
-          {isProcessing ? "Processing dataset..." : isError ? "Dataset processing failed" : "Dataset ready"}
+          {isProcessing
+            ? "Processing dataset..."
+            : isError
+              ? "Dataset processing failed"
+              : "Dataset ready"}
         </h1>
 
         <p style={{ ...styles.sectionSubtitle, marginTop: 10 }}>
           {isProcessing &&
             "Your dataset is being analyzed. This page will redirect to stats automatically once complete."}
-          {isError && "There was an issue while processing your dataset. Please review the error details."}
-          {status === "complete" && "Processing complete. Redirecting to your stats now..."}
+          {isError &&
+            "There was an issue while processing your dataset. Please review the error details."}
+          {status === "complete" &&
+            "Processing complete. Redirecting to your stats now..."}
         </p>
 
         <div
           style={{
             ...styles.card,
             ...styles.statusMessageCard,
-            borderColor: isError ? "rgba(185, 28, 28, 0.28)" : "rgba(0,0,0,0.06)",
+            borderColor: isError
+              ? "rgba(185, 28, 28, 0.28)"
+              : "rgba(0,0,0,0.06)",
             background: isError ? "#fff5f5" : "#ffffff",
             color: isError ? "#991b1b" : "#374151",
           }}
         >
-          {statusMessage || (isProcessing ? "Waiting for updates from the worker queue..." : "No details provided.")}
+          {statusMessage ||
+            (isProcessing
+              ? "Waiting for updates from the worker queue..."
+              : "No details provided.")}
         </div>
       </div>
     </div>
|
|||||||
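Review note on the DatasetStatusPage change: the new `"fetching"` status is treated as the same in-progress state as `"processing"`. The mapping can be sketched as below, assuming a hypothetical standalone helper `headingForStatus` (not part of the diff) that stands in for the inline JSX ternary.

```typescript
// Status values as declared in DatasetStatusResponse after this change.
type DatasetStatus = "fetching" | "processing" | "complete" | "error";

// Hypothetical helper (not in the diff): mirrors the heading ternary in the JSX.
const headingForStatus = (
  loading: boolean,
  status: DatasetStatus | undefined,
): string => {
  // "fetching" and "processing" both count as in progress.
  const isProcessing =
    loading || status === "fetching" || status === "processing";
  if (isProcessing) return "Processing dataset...";
  return status === "error" ? "Dataset processing failed" : "Dataset ready";
};

console.log(headingForStatus(false, "fetching"));
console.log(headingForStatus(false, "error"));
```

Keeping the check in one place would make it harder for a later status value to be handled in the heading but missed in the subtitle or the message card.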
@@ -39,7 +39,9 @@ const DatasetsPage = () => {
       })
       .catch((requestError: unknown) => {
         if (axios.isAxiosError(requestError)) {
-          setError(String(requestError.response?.data?.error || requestError.message));
+          setError(
+            String(requestError.response?.data?.error || requestError.message),
+          );
         } else {
           setError("Failed to load datasets.");
         }
@@ -61,13 +63,28 @@ const DatasetsPage = () => {
           </div>
 
           <div style={styles.loadingSkeleton}>
-            <div style={{ ...styles.loadingSkeletonLine, ...styles.loadingSkeletonLineLong }} />
-            <div style={{ ...styles.loadingSkeletonLine, ...styles.loadingSkeletonLineMed }} />
-            <div style={{ ...styles.loadingSkeletonLine, ...styles.loadingSkeletonLineShort }} />
+            <div
+              style={{
+                ...styles.loadingSkeletonLine,
+                ...styles.loadingSkeletonLineLong,
+              }}
+            />
+            <div
+              style={{
+                ...styles.loadingSkeletonLine,
+                ...styles.loadingSkeletonLineMed,
+              }}
+            />
+            <div
+              style={{
+                ...styles.loadingSkeletonLine,
+                ...styles.loadingSkeletonLineShort,
+              }}
+            />
           </div>
         </div>
       </div>
-    )
+    );
   }
 
   return (
@@ -81,15 +98,19 @@ const DatasetsPage = () => {
             </p>
           </div>
           <div style={styles.controlsWrapped}>
-            <button type="button" style={styles.buttonPrimary} onClick={() => navigate("/upload")}>
+            <button
+              type="button"
+              style={styles.buttonPrimary}
+              onClick={() => navigate("/upload")}
+            >
               Upload New Dataset
             </button>
             <button
               type="button"
               style={styles.buttonSecondary}
-              onClick={() => navigate("/auto-scrape")}
+              onClick={() => navigate("/auto-fetch")}
             >
-              Auto Scrape Dataset
+              Auto Fetch Dataset
             </button>
           </div>
         </div>
@@ -116,20 +137,25 @@ const DatasetsPage = () => {
         )}
 
         {!error && datasets.length > 0 && (
-          <div style={{ ...styles.card, marginTop: 14, padding: 0, overflow: "hidden" }}>
+          <div
+            style={{
+              ...styles.card,
+              marginTop: 14,
+              padding: 0,
+              overflow: "hidden",
+            }}
+          >
             <ul style={styles.listNoBullets}>
               {datasets.map((dataset) => {
-                const isComplete = dataset.status === "complete" || dataset.status === "error";
+                const isComplete =
+                  dataset.status === "complete" || dataset.status === "error";
                 const editPath = `/dataset/${dataset.id}/edit`;
                 const targetPath = isComplete
                   ? `/dataset/${dataset.id}/stats`
                   : `/dataset/${dataset.id}/status`;
 
                 return (
-                  <li
-                    key={dataset.id}
-                    style={styles.datasetListItem}
-                  >
+                  <li key={dataset.id} style={styles.datasetListItem}>
                     <div style={{ minWidth: 0 }}>
                       <div style={styles.datasetName}>
                         {dataset.name || `Dataset #${dataset.id}`}
@@ -145,19 +171,23 @@ const DatasetsPage = () => {
                     </div>
 
                     <div>
-                      { isComplete &&
+                      {isComplete && (
                         <button
                           type="button"
-                          style={{...styles.buttonSecondary, "margin": "5px"}}
+                          style={{ ...styles.buttonSecondary, margin: "5px" }}
                           onClick={() => navigate(editPath)}
                         >
                           Edit Dataset
                         </button>
-                      }
+                      )}
 
                       <button
                         type="button"
-                        style={isComplete ? styles.buttonPrimary : styles.buttonSecondary}
+                        style={
+                          isComplete
+                            ? styles.buttonPrimary
+                            : styles.buttonSecondary
+                        }
                         onClick={() => navigate(targetPath)}
                       >
                         {isComplete ? "Open stats" : "View status"}
|
|||||||
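Review note on the DatasetsPage list: each row links to either the stats page or the status page, and `isComplete` is also true for `"error"`. A minimal sketch of that routing rule follows. `targetPathFor` and the `DatasetStatus` union are assumptions for illustration. The diff computes the path inline and does not show the type of `dataset.status` on this page.

```typescript
// Assumed status union; only "complete" and "error" are compared in the diff.
type DatasetStatus = "fetching" | "processing" | "complete" | "error";

// Hypothetical helper (not in the diff): picks the page a dataset row links to.
const targetPathFor = (id: number, status: DatasetStatus): string => {
  // As in the diff, "error" counts as finished, so it links to stats as well.
  const isComplete = status === "complete" || status === "error";
  return isComplete ? `/dataset/${id}/stats` : `/dataset/${id}/status`;
};

console.log(targetPathFor(7, "fetching"));
console.log(targetPathFor(7, "error"));
```

Under this rule a dataset that is still being fetched links to its status page, the same as one that is processing.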
@@ -3,7 +3,7 @@ import axios from "axios";
 import { useNavigate } from "react-router-dom";
 import StatsStyling from "../styles/stats_styling";
 
-const API_BASE_URL = import.meta.env.VITE_BACKEND_URL
+const API_BASE_URL = import.meta.env.VITE_BACKEND_URL;
 
 const styles = StatsStyling;
 
@@ -44,13 +44,17 @@ const LoginPage = () => {
 
     try {
       if (isRegisterMode) {
-        await axios.post(`${API_BASE_URL}/register`, { username, email, password });
+        await axios.post(`${API_BASE_URL}/register`, {
+          username,
+          email,
+          password,
+        });
         setInfo("Account created. You can now sign in.");
         setIsRegisterMode(false);
       } else {
         const response = await axios.post<{ access_token: string }>(
           `${API_BASE_URL}/login`,
-          { username, password }
+          { username, password },
         );
 
         const token = response.data.access_token;
@@ -61,7 +65,11 @@ const LoginPage = () => {
     } catch (requestError: unknown) {
       if (axios.isAxiosError(requestError)) {
         setError(
-          String(requestError.response?.data?.error || requestError.message || "Request failed")
+          String(
+            requestError.response?.data?.error ||
+              requestError.message ||
+              "Request failed",
+          ),
         );
       } else {
         setError("Unexpected error occurred.");
@@ -73,90 +81,86 @@ const LoginPage = () => {
|
|||||||
|
|
||||||
return (
|
return (
|
||||||
<div style={styles.containerAuth}>
|
<div style={styles.containerAuth}>
|
||||||
<div style={{ ...styles.card, ...styles.authCard }}>
|
<div style={{ ...styles.card, ...styles.authCard }}>
|
||||||
<div style={styles.headingBlock}>
|
<div style={styles.headingBlock}>
|
||||||
<h1 style={styles.headingXl}>
|
<h1 style={styles.headingXl}>
|
||||||
{isRegisterMode ? "Create your account" : "Welcome back"}
|
{isRegisterMode ? "Create your account" : "Welcome back"}
|
||||||
</h1>
|
</h1>
|
||||||
<p style={styles.mutedText}>
|
<p style={styles.mutedText}>
|
||||||
{isRegisterMode
|
{isRegisterMode
|
||||||
? "Register to start uploading and exploring your dataset insights."
|
? "Register to start uploading and exploring your dataset insights."
|
||||||
: "Sign in to continue to your analytics workspace."}
|
: "Sign in to continue to your analytics workspace."}
|
||||||
</p>
|
</p>
|
||||||
</div>
|
|
||||||
|
|
||||||
<form onSubmit={handleSubmit} style={styles.authForm}>
|
|
||||||
<input
|
|
||||||
type="text"
|
|
||||||
placeholder="Username"
|
|
||||||
style={{ ...styles.input, ...styles.authControl }}
|
|
||||||
value={username}
|
|
||||||
onChange={(event) => setUsername(event.target.value)}
|
|
||||||
required
|
|
||||||
/>
|
|
||||||
|
|
||||||
{isRegisterMode && (
|
|
||||||
<input
|
|
||||||
type="email"
|
|
||||||
placeholder="Email"
|
|
||||||
style={{ ...styles.input, ...styles.authControl }}
|
|
||||||
value={email}
|
|
||||||
onChange={(event) => setEmail(event.target.value)}
|
|
||||||
required
|
|
||||||
/>
|
|
||||||
)}
|
|
||||||
|
|
||||||
<input
|
|
||||||
type="password"
|
|
||||||
placeholder="Password"
|
|
||||||
style={{ ...styles.input, ...styles.authControl }}
|
|
||||||
value={password}
|
|
||||||
onChange={(event) => setPassword(event.target.value)}
|
|
||||||
required
|
|
||||||
/>
|
|
||||||
|
|
||||||
<button
|
|
||||||
type="submit"
|
|
||||||
style={{ ...styles.buttonPrimary, ...styles.authControl, marginTop: 2 }}
|
|
||||||
disabled={loading}
|
|
||||||
>
|
|
||||||
{loading
|
|
||||||
? "Please wait..."
|
|
||||||
: isRegisterMode
|
|
||||||
? "Create account"
|
|
||||||
: "Sign in"}
|
|
||||||
</button>
|
|
||||||
</form>
|
|
||||||
|
|
||||||
{error && (
|
|
||||||
<p style={styles.authErrorText}>
|
|
||||||
{error}
|
|
||||||
</p>
|
|
||||||
)}
|
|
||||||
|
|
||||||
{info && (
|
|
||||||
<p style={styles.authInfoText}>
|
|
||||||
{info}
|
|
||||||
</p>
|
|
||||||
)}
|
|
||||||
|
|
||||||
<div style={styles.authSwitchRow}>
|
|
||||||
<span style={styles.authSwitchLabel}>
|
|
||||||
{isRegisterMode ? "Already have an account?" : "New here?"}
|
|
||||||
</span>
|
|
||||||
<button
|
|
||||||
type="button"
|
|
||||||
style={styles.authSwitchButton}
|
|
||||||
onClick={() => {
|
|
||||||
setError("");
|
|
||||||
setInfo("");
|
|
||||||
setIsRegisterMode((value) => !value);
|
|
||||||
}}
|
|
||||||
>
|
|
||||||
{isRegisterMode ? "Switch to sign in" : "Create account"}
|
|
||||||
</button>
|
|
||||||
</div>
|
|
||||||
</div>
|
</div>
|
||||||
|
|
||||||
|
<form onSubmit={handleSubmit} style={styles.authForm}>
|
||||||
|
<input
|
||||||
|
type="text"
|
||||||
|
placeholder="Username"
|
||||||
|
style={{ ...styles.input, ...styles.authControl }}
|
||||||
|
value={username}
|
||||||
|
onChange={(event) => setUsername(event.target.value)}
|
||||||
|
required
|
||||||
|
/>
|
||||||
|
|
||||||
|
{isRegisterMode && (
|
||||||
|
<input
|
||||||
|
type="email"
|
||||||
|
placeholder="Email"
|
||||||
|
style={{ ...styles.input, ...styles.authControl }}
|
||||||
|
value={email}
|
||||||
|
onChange={(event) => setEmail(event.target.value)}
|
||||||
|
required
|
||||||
|
/>
|
||||||
|
)}
|
||||||
|
|
||||||
|
<input
|
||||||
|
type="password"
|
||||||
|
placeholder="Password"
|
||||||
|
style={{ ...styles.input, ...styles.authControl }}
|
||||||
|
value={password}
|
||||||
|
onChange={(event) => setPassword(event.target.value)}
|
||||||
|
required
|
||||||
|
/>
|
||||||
|
|
||||||
|
<button
|
||||||
|
type="submit"
|
||||||
|
style={{
|
||||||
|
...styles.buttonPrimary,
|
||||||
|
...styles.authControl,
|
||||||
|
marginTop: 2,
|
||||||
|
}}
|
||||||
|
disabled={loading}
|
||||||
|
>
|
||||||
|
{loading
|
||||||
|
? "Please wait..."
|
||||||
|
: isRegisterMode
|
||||||
|
? "Create account"
|
||||||
|
: "Sign in"}
|
||||||
|
</button>
|
||||||
|
</form>
|
||||||
|
|
||||||
|
{error && <p style={styles.authErrorText}>{error}</p>}
|
||||||
|
|
||||||
|
{info && <p style={styles.authInfoText}>{info}</p>}
|
||||||
|
|
||||||
|
<div style={styles.authSwitchRow}>
|
||||||
|
<span style={styles.authSwitchLabel}>
|
||||||
|
{isRegisterMode ? "Already have an account?" : "New here?"}
|
||||||
|
</span>
|
||||||
|
<button
|
||||||
|
type="button"
|
||||||
|
style={styles.authSwitchButton}
|
||||||
|
onClick={() => {
|
||||||
|
setError("");
|
||||||
|
setInfo("");
|
||||||
|
setIsRegisterMode((value) => !value);
|
||||||
|
}}
|
||||||
|
>
|
||||||
|
{isRegisterMode ? "Switch to sign in" : "Create account"}
|
||||||
|
</button>
|
||||||
|
</div>
|
||||||
|
</div>
|
||||||
</div>
|
</div>
|
||||||
);
|
);
|
||||||
};
|
};
|
||||||
|
|||||||
@@ -1,4 +1,4 @@
-import { useEffect, useState, useRef } from "react";
+import { useEffect, useRef, useState } from "react";
 import axios from "axios";
 import { useParams } from "react-router-dom";
 import StatsStyling from "../styles/stats_styling";
@@ -8,48 +8,269 @@ import UserStats from "../components/UserStats";
|
|||||||
import LinguisticStats from "../components/LinguisticStats";
|
import LinguisticStats from "../components/LinguisticStats";
|
||||||
import InteractionalStats from "../components/InteractionalStats";
|
import InteractionalStats from "../components/InteractionalStats";
|
||||||
import CulturalStats from "../components/CulturalStats";
|
import CulturalStats from "../components/CulturalStats";
|
||||||
|
import CorpusExplorer from "../components/CorpusExplorer";
|
||||||
|
|
||||||
import {
|
import {
|
||||||
type SummaryResponse,
|
type SummaryResponse,
|
||||||
type UserAnalysisResponse,
|
|
||||||
type TimeAnalysisResponse,
|
type TimeAnalysisResponse,
|
||||||
type ContentAnalysisResponse,
|
type User,
|
||||||
type UserEndpointResponse,
|
type UserEndpointResponse,
|
||||||
type LinguisticAnalysisResponse,
|
type LinguisticAnalysisResponse,
|
||||||
type EmotionalAnalysisResponse,
|
type EmotionalAnalysisResponse,
|
||||||
type InteractionAnalysisResponse,
|
type InteractionAnalysisResponse,
|
||||||
type CulturalAnalysisResponse
|
type CulturalAnalysisResponse,
|
||||||
} from '../types/ApiTypes'
|
} from "../types/ApiTypes";
|
||||||
|
import {
|
||||||
|
buildExplorerContext,
|
||||||
|
type CorpusExplorerSpec,
|
||||||
|
type DatasetRecord,
|
||||||
|
} from "../utils/corpusExplorer";
|
||||||
|
|
||||||
const API_BASE_URL = import.meta.env.VITE_BACKEND_URL
|
const API_BASE_URL = import.meta.env.VITE_BACKEND_URL;
|
||||||
const styles = StatsStyling;
|
const styles = StatsStyling;
|
||||||
const DELETED_USERS = ["[deleted]"];
|
const DELETED_USERS = ["[deleted]", "automoderator"];
|
||||||
|
|
||||||
const isDeletedUser = (value: string | null | undefined) => (
|
const isDeletedUser = (value: string | null | undefined) =>
|
||||||
DELETED_USERS.includes((value ?? "").trim().toLowerCase())
|
DELETED_USERS.includes((value ?? "").trim().toLowerCase());
|
||||||
);
|
|
||||||
|
type ActiveView =
|
||||||
|
| "summary"
|
||||||
|
| "emotional"
|
||||||
|
| "user"
|
||||||
|
| "linguistic"
|
||||||
|
| "interactional"
|
||||||
|
| "cultural";
|
||||||
|
|
||||||
|
type UserStatsMeta = {
|
||||||
|
totalUsers: number;
|
||||||
|
mostCommentHeavyUser: { author: string; commentShare: number } | null;
|
||||||
|
};
|
||||||
|
|
||||||
|
type ExplorerState = {
|
||||||
|
open: boolean;
|
||||||
|
title: string;
|
||||||
|
description: string;
|
||||||
|
emptyMessage: string;
|
||||||
|
records: DatasetRecord[];
|
||||||
|
loading: boolean;
|
||||||
|
error: string;
|
||||||
|
};
|
||||||
|
|
||||||
|
const EMPTY_EXPLORER_STATE: ExplorerState = {
|
||||||
|
open: false,
|
||||||
|
title: "Corpus Explorer",
|
||||||
|
description: "",
|
||||||
|
emptyMessage: "No records found.",
|
||||||
|
records: [],
|
||||||
|
loading: false,
|
||||||
|
error: "",
|
||||||
|
};
|
||||||
|
|
||||||
|
const createExplorerState = (
|
||||||
|
spec: CorpusExplorerSpec,
|
||||||
|
patch: Partial<ExplorerState> = {},
|
||||||
|
): ExplorerState => ({
|
||||||
|
open: true,
|
||||||
|
title: spec.title,
|
||||||
|
description: spec.description,
|
||||||
|
emptyMessage: spec.emptyMessage ?? "No matching records found.",
|
||||||
|
records: [],
|
||||||
|
loading: false,
|
||||||
|
error: "",
|
||||||
|
...patch,
|
||||||
|
});
|
||||||
|
|
||||||
|
const compareRecordsByNewest = (a: DatasetRecord, b: DatasetRecord) => {
|
||||||
|
const aValue = String(a.dt ?? a.date ?? a.timestamp ?? "");
|
||||||
|
const bValue = String(b.dt ?? b.date ?? b.timestamp ?? "");
|
||||||
|
return bValue.localeCompare(aValue);
|
||||||
|
};
|
||||||
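Review note on `compareRecordsByNewest`: it orders records by comparing the stringified `dt`, `date` or `timestamp` field. A small sketch of how that behaves on ISO-8601 strings is given below. The `RecordLike` type and the sample values are assumptions for illustration. The real `DatasetRecord` type is imported from `../utils/corpusExplorer` and is not shown in this diff.

```typescript
// Hypothetical minimal record shape, standing in for DatasetRecord.
type RecordLike = { dt?: string; date?: string; timestamp?: string | number };

// Same comparison as in the diff, typed against the stand-in shape.
const compareRecordsByNewest = (a: RecordLike, b: RecordLike) => {
  const aValue = String(a.dt ?? a.date ?? a.timestamp ?? "");
  const bValue = String(b.dt ?? b.date ?? b.timestamp ?? "");
  return bValue.localeCompare(aValue);
};

// Illustrative sample data, not taken from the backend.
const sorted = [
  { dt: "2024-01-02T10:00:00" },
  { dt: "2024-03-01T08:30:00" },
  { dt: "2023-12-31T23:59:59" },
].sort(compareRecordsByNewest);

console.log(sorted.map((record) => record.dt));
```

String comparison works here because fixed-width ISO-8601 values sort in chronological order. Numeric epoch timestamps with differing digit counts would probably not order correctly this way, so a numeric comparison may be worth considering if the backend can return them.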
|
|
||||||
|
const parseJsonLikePayload = (value: string): unknown => {
|
||||||
|
const normalized = value
|
||||||
|
.replace(/\uFEFF/g, "")
|
||||||
|
.replace(/,\s*([}\]])/g, "$1")
|
||||||
|
.replace(/(:\s*)(NaN|Infinity|-Infinity)\b/g, "$1null")
|
||||||
|
.replace(/(\[\s*)(NaN|Infinity|-Infinity)\b/g, "$1null")
|
||||||
|
.replace(/(,\s*)(NaN|Infinity|-Infinity)\b/g, "$1null")
|
||||||
|
.replace(/(:\s*)None\b/g, "$1null")
|
||||||
|
.replace(/(:\s*)True\b/g, "$1true")
|
||||||
|
.replace(/(:\s*)False\b/g, "$1false")
|
||||||
|
.replace(/(\[\s*)None\b/g, "$1null")
|
||||||
|
.replace(/(\[\s*)True\b/g, "$1true")
|
||||||
|
.replace(/(\[\s*)False\b/g, "$1false")
|
||||||
|
.replace(/(,\s*)None\b/g, "$1null")
|
||||||
|
.replace(/(,\s*)True\b/g, "$1true")
|
||||||
|
.replace(/(,\s*)False\b/g, "$1false");
|
||||||
|
|
||||||
|
return JSON.parse(normalized);
|
||||||
|
};
|
||||||
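Review note on `parseJsonLikePayload`: the function rewrites Python-style tokens (`None`, `True`, `False`, `NaN`, `Infinity`) and trailing commas with regular expressions before calling `JSON.parse`. The sketch below reproduces the same replacement chain so it runs on its own. The two sample payloads are illustrative inputs, not responses captured from the backend.

```typescript
// Same replacement chain as in the diff, reproduced so the sketch is self-contained.
const parseJsonLikePayload = (value: string): unknown => {
  const normalized = value
    .replace(/\uFEFF/g, "") // strip byte-order marks
    .replace(/,\s*([}\]])/g, "$1") // drop trailing commas before } or ]
    .replace(/(:\s*)(NaN|Infinity|-Infinity)\b/g, "$1null")
    .replace(/(\[\s*)(NaN|Infinity|-Infinity)\b/g, "$1null")
    .replace(/(,\s*)(NaN|Infinity|-Infinity)\b/g, "$1null")
    .replace(/(:\s*)None\b/g, "$1null")
    .replace(/(:\s*)True\b/g, "$1true")
    .replace(/(:\s*)False\b/g, "$1false")
    .replace(/(\[\s*)None\b/g, "$1null")
    .replace(/(\[\s*)True\b/g, "$1true")
    .replace(/(\[\s*)False\b/g, "$1false")
    .replace(/(,\s*)None\b/g, "$1null")
    .replace(/(,\s*)True\b/g, "$1true")
    .replace(/(,\s*)False\b/g, "$1false");

  return JSON.parse(normalized);
};

// Python-flavoured payload with non-finite numbers and trailing commas.
const parsed = parseJsonLikePayload(
  '{"a": NaN, "b": None, "c": [True, False,], }',
);
console.log(JSON.stringify(parsed));

// The patterns are not string-aware: matching text inside a value is rewritten too.
const rewritten = parseJsonLikePayload('{"text": "status: None"}') as {
  text: string;
};
console.log(rewritten.text);
```

The second call shows a limitation: because the patterns match anywhere in the payload, a post body containing text such as `: None` comes back as `: null`. If corpus text can contain such sequences, having the backend emit strict JSON would likely be more robust than normalizing on the client.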
|
|
||||||
|
const tryParseRecords = (value: string) => {
|
||||||
|
try {
|
||||||
|
return normalizeRecordPayload(parseJsonLikePayload(value));
|
||||||
|
} catch {
|
||||||
|
return null;
|
||||||
|
}
|
||||||
|
};
|
||||||
|
|
||||||
|
const parseRecordStringPayload = (payload: string): DatasetRecord[] | null => {
|
||||||
|
const trimmed = payload.trim();
|
||||||
|
if (!trimmed) {
|
||||||
|
return [];
|
||||||
|
}
|
||||||
|
|
||||||
|
const direct = tryParseRecords(trimmed);
|
||||||
|
if (direct) {
|
||||||
|
return direct;
|
||||||
|
}
|
||||||
|
|
||||||
|
const ndjsonLines = trimmed
|
||||||
|
.split(/\r?\n/)
|
||||||
|
.map((line) => line.trim())
|
||||||
|
.filter(Boolean);
|
||||||
|
if (ndjsonLines.length > 0) {
|
||||||
|
try {
|
||||||
|
return ndjsonLines.map((line) => parseJsonLikePayload(line)) as DatasetRecord[];
|
||||||
|
} catch { // Not valid NDJSON; fall through to the bracket/brace extraction below.
|
||||||
|
}
|
||||||
|
}
|
||||||
|
|
||||||
|
const bracketStart = trimmed.indexOf("[");
|
||||||
|
const bracketEnd = trimmed.lastIndexOf("]");
|
||||||
|
if (bracketStart !== -1 && bracketEnd > bracketStart) {
|
||||||
|
const parsed = tryParseRecords(trimmed.slice(bracketStart, bracketEnd + 1));
|
||||||
|
if (parsed) {
|
||||||
|
return parsed;
|
||||||
|
}
|
||||||
|
}
|
||||||
|
|
||||||
|
const braceStart = trimmed.indexOf("{");
|
||||||
|
const braceEnd = trimmed.lastIndexOf("}");
|
||||||
|
if (braceStart !== -1 && braceEnd > braceStart) {
|
||||||
|
const parsed = tryParseRecords(trimmed.slice(braceStart, braceEnd + 1));
|
||||||
|
if (parsed) {
|
||||||
|
return parsed;
|
||||||
|
}
|
||||||
|
}
|
||||||
|
|
||||||
|
return null;
|
||||||
|
};
|
||||||
|
|
||||||
|
const normalizeRecordPayload = (payload: unknown): DatasetRecord[] => {
|
||||||
|
if (typeof payload === "string") {
|
||||||
|
const parsed = parseRecordStringPayload(payload);
|
||||||
|
if (parsed) {
|
||||||
|
return parsed;
|
||||||
|
}
|
||||||
|
|
||||||
|
const preview = payload.trim().slice(0, 120).replace(/\s+/g, " ");
|
||||||
|
throw new Error(
|
||||||
|
`Corpus endpoint returned a non-JSON string payload.${
|
||||||
|
preview ? ` Response preview: ${preview}` : ""
|
||||||
|
}`,
|
||||||
|
);
|
||||||
|
}
|
||||||
|
|
||||||
|
if (
|
||||||
|
payload &&
|
||||||
|
typeof payload === "object" &&
|
||||||
|
"error" in payload &&
|
||||||
|
typeof (payload as { error?: unknown }).error === "string"
|
||||||
|
) {
|
||||||
|
throw new Error((payload as { error: string }).error);
|
||||||
|
}
|
||||||
|
|
||||||
|
if (Array.isArray(payload)) {
|
||||||
|
return payload as DatasetRecord[];
|
||||||
|
}
|
||||||
|
|
||||||
|
if (
|
||||||
|
payload &&
|
||||||
|
typeof payload === "object" &&
|
||||||
|
"data" in payload &&
|
||||||
|
Array.isArray((payload as { data?: unknown }).data)
|
||||||
|
) {
|
||||||
|
return (payload as { data: DatasetRecord[] }).data;
|
||||||
|
}
|
||||||
|
|
||||||
|
if (
|
||||||
|
payload &&
|
||||||
|
typeof payload === "object" &&
|
||||||
|
"records" in payload &&
|
||||||
|
Array.isArray((payload as { records?: unknown }).records)
|
||||||
|
) {
|
||||||
|
return (payload as { records: DatasetRecord[] }).records;
|
||||||
|
}
|
||||||
|
|
||||||
|
if (
|
||||||
|
payload &&
|
||||||
|
typeof payload === "object" &&
|
||||||
|
"rows" in payload &&
|
||||||
|
Array.isArray((payload as { rows?: unknown }).rows)
|
||||||
|
) {
|
||||||
|
return (payload as { rows: DatasetRecord[] }).rows;
|
||||||
|
}
|
||||||
|
|
||||||
|
if (
|
||||||
|
payload &&
|
||||||
|
typeof payload === "object" &&
|
||||||
|
"result" in payload &&
|
||||||
|
Array.isArray((payload as { result?: unknown }).result)
|
||||||
|
) {
|
||||||
|
return (payload as { result: DatasetRecord[] }).result;
|
||||||
|
}
|
||||||
|
|
||||||
|
if (payload && typeof payload === "object") {
|
||||||
|
const values = Object.values(payload);
|
||||||
|
if (values.length === 1 && Array.isArray(values[0])) {
|
||||||
|
return values[0] as DatasetRecord[];
|
||||||
|
}
|
||||||
|
if (values.every((value) => value && typeof value === "object")) {
|
||||||
|
return values as DatasetRecord[];
|
||||||
|
}
|
||||||
|
}
|
||||||
|
|
||||||
|
throw new Error("Corpus endpoint returned an unexpected payload.");
|
||||||
|
};
|
||||||
|
|
||||||
const StatPage = () => {
|
const StatPage = () => {
|
||||||
const { datasetId: routeDatasetId } = useParams<{ datasetId: string }>();
|
const { datasetId: routeDatasetId } = useParams<{ datasetId: string }>();
|
||||||
const [error, setError] = useState('');
|
const [error, setError] = useState("");
|
||||||
const [loading, setLoading] = useState(false);
|
const [loading, setLoading] = useState(false);
|
||||||
const [activeView, setActiveView] = useState<"summary" | "emotional" | "user" | "linguistic" | "interactional" | "cultural">("summary");
|
const [activeView, setActiveView] = useState<ActiveView>("summary");
|
||||||
|
|
||||||
const [userData, setUserData] = useState<UserAnalysisResponse | null>(null);
|
const [userData, setUserData] = useState<UserEndpointResponse | null>(null);
|
||||||
const [timeData, setTimeData] = useState<TimeAnalysisResponse | null>(null);
|
const [timeData, setTimeData] = useState<TimeAnalysisResponse | null>(null);
|
||||||
const [contentData, setContentData] = useState<ContentAnalysisResponse | null>(null);
|
const [linguisticData, setLinguisticData] =
|
||||||
const [linguisticData, setLinguisticData] = useState<LinguisticAnalysisResponse | null>(null);
|
useState<LinguisticAnalysisResponse | null>(null);
|
||||||
const [interactionData, setInteractionData] = useState<InteractionAnalysisResponse | null>(null);
|
const [emotionalData, setEmotionalData] =
|
||||||
const [culturalData, setCulturalData] = useState<CulturalAnalysisResponse | null>(null);
|
useState<EmotionalAnalysisResponse | null>(null);
|
||||||
|
const [interactionData, setInteractionData] =
|
||||||
|
useState<InteractionAnalysisResponse | null>(null);
|
||||||
|
const [culturalData, setCulturalData] =
|
||||||
|
useState<CulturalAnalysisResponse | null>(null);
|
||||||
const [summary, setSummary] = useState<SummaryResponse | null>(null);
|
const [summary, setSummary] = useState<SummaryResponse | null>(null);
|
||||||
|
const [userStatsMeta, setUserStatsMeta] = useState<UserStatsMeta>({
|
||||||
|
totalUsers: 0,
|
||||||
|
mostCommentHeavyUser: null,
|
||||||
|
});
|
||||||
|
const [appliedFilters, setAppliedFilters] = useState<Record<string, string>>({});
|
||||||
|
const [allRecords, setAllRecords] = useState<DatasetRecord[] | null>(null);
|
||||||
|
const [allRecordsKey, setAllRecordsKey] = useState("");
|
||||||
|
const [explorerState, setExplorerState] = useState<ExplorerState>(
|
||||||
|
EMPTY_EXPLORER_STATE,
|
||||||
|
);
|
||||||
|
|
||||||
const searchInputRef = useRef<HTMLInputElement>(null);
|
const searchInputRef = useRef<HTMLInputElement>(null);
|
||||||
const beforeDateRef = useRef<HTMLInputElement>(null);
|
const beforeDateRef = useRef<HTMLInputElement>(null);
|
||||||
const afterDateRef = useRef<HTMLInputElement>(null);
|
const afterDateRef = useRef<HTMLInputElement>(null);
|
||||||
|
|
||||||
const parsedDatasetId = Number(routeDatasetId ?? "");
|
const parsedDatasetId = Number(routeDatasetId ?? "");
|
||||||
const datasetId = Number.isInteger(parsedDatasetId) && parsedDatasetId > 0 ? parsedDatasetId : null;
|
const datasetId =
|
||||||
|
Number.isInteger(parsedDatasetId) && parsedDatasetId > 0
|
||||||
|
? parsedDatasetId
|
||||||
|
: null;
|
||||||
|
|
||||||
const getFilterParams = () => {
|
const getFilterParams = () => {
|
||||||
const params: Record<string, string> = {};
|
const params: Record<string, string> = {};
|
||||||
@@ -83,6 +304,59 @@ const StatPage = () => {
|
|||||||
};
|
};
|
||||||
};
|
};
|
||||||
|
|
||||||
|
const getFilterKey = (params: Record<string, string>) =>
|
||||||
|
JSON.stringify(Object.entries(params).sort(([a], [b]) => a.localeCompare(b)));
|
||||||
|
|
||||||
|
const ensureFilteredRecords = async () => {
|
||||||
|
if (!datasetId) {
|
||||||
|
throw new Error("Missing dataset id.");
|
||||||
|
}
|
||||||
|
|
||||||
|
const authHeaders = getAuthHeaders();
|
||||||
|
if (!authHeaders) {
|
||||||
|
throw new Error("You must be signed in to load corpus records.");
|
||||||
|
}
|
||||||
|
|
||||||
|
const filterKey = getFilterKey(appliedFilters);
|
||||||
|
if (allRecords && allRecordsKey === filterKey) {
|
||||||
|
return allRecords;
|
||||||
|
}
|
||||||
|
|
||||||
|
const response = await axios.get<unknown>(
|
||||||
|
`${API_BASE_URL}/dataset/${datasetId}/all`,
|
||||||
|
{
|
||||||
|
params: appliedFilters,
|
||||||
|
headers: authHeaders,
|
||||||
|
},
|
||||||
|
);
|
||||||
|
|
||||||
|
const normalizedRecords = normalizeRecordPayload(response.data);
|
||||||
|
|
||||||
|
setAllRecords(normalizedRecords);
|
||||||
|
setAllRecordsKey(filterKey);
|
||||||
|
return normalizedRecords;
|
||||||
|
};
|
||||||
|
|
||||||
|
const openExplorer = async (spec: CorpusExplorerSpec) => {
|
||||||
|
setExplorerState(createExplorerState(spec, { loading: true }));
|
||||||
|
|
||||||
|
try {
|
||||||
|
const records = await ensureFilteredRecords();
|
||||||
|
const context = buildExplorerContext(records);
|
||||||
|
const matched = records
|
||||||
|
.filter((record) => spec.matcher(record, context))
|
||||||
|
.sort(compareRecordsByNewest);
|
||||||
|
|
||||||
|
setExplorerState(createExplorerState(spec, { records: matched }));
|
||||||
|
} catch (e) {
|
||||||
|
setExplorerState(
|
||||||
|
createExplorerState(spec, {
|
||||||
|
error: `Failed to load corpus records: ${String(e)}`,
|
||||||
|
}),
|
||||||
|
);
|
||||||
|
}
|
||||||
|
};
|
||||||
|
|
||||||
const getStats = (params: Record<string, string> = {}) => {
|
const getStats = (params: Record<string, string> = {}) => {
|
||||||
if (!datasetId) {
|
if (!datasetId) {
|
||||||
setError("Missing dataset id. Open /dataset/<id>/stats.");
|
setError("Missing dataset id. Open /dataset/<id>/stats.");
|
||||||
@@ -97,6 +371,10 @@ const StatPage = () => {
|
|||||||
|
|
||||||
setError("");
|
setError("");
|
||||||
setLoading(true);
|
setLoading(true);
|
||||||
|
setAppliedFilters(params);
|
||||||
|
setAllRecords(null);
|
||||||
|
setAllRecordsKey("");
|
||||||
|
setExplorerState((current) => ({ ...current, open: false }));
|
||||||
|
|
||||||
Promise.all([
|
Promise.all([
|
||||||
axios.get<TimeAnalysisResponse>(`${API_BASE_URL}/dataset/${datasetId}/temporal`, {
|
axios.get<TimeAnalysisResponse>(`${API_BASE_URL}/dataset/${datasetId}/temporal`, {
|
||||||
@@ -107,18 +385,24 @@ const StatPage = () => {
|
|||||||
params,
|
params,
|
||||||
headers: authHeaders,
|
headers: authHeaders,
|
||||||
}),
|
}),
|
||||||
axios.get<LinguisticAnalysisResponse>(`${API_BASE_URL}/dataset/${datasetId}/linguistic`, {
|
axios.get<LinguisticAnalysisResponse>(
|
||||||
params,
|
`${API_BASE_URL}/dataset/${datasetId}/linguistic`,
|
||||||
headers: authHeaders,
|
{
|
||||||
}),
|
params,
|
||||||
|
headers: authHeaders,
|
||||||
|
},
|
||||||
|
),
|
||||||
axios.get<EmotionalAnalysisResponse>(`${API_BASE_URL}/dataset/${datasetId}/emotional`, {
|
axios.get<EmotionalAnalysisResponse>(`${API_BASE_URL}/dataset/${datasetId}/emotional`, {
|
||||||
params,
|
params,
|
||||||
headers: authHeaders,
|
headers: authHeaders,
|
||||||
}),
|
}),
|
||||||
axios.get<InteractionAnalysisResponse>(`${API_BASE_URL}/dataset/${datasetId}/interactional`, {
|
axios.get<InteractionAnalysisResponse>(
|
||||||
params,
|
`${API_BASE_URL}/dataset/${datasetId}/interactional`,
|
||||||
headers: authHeaders,
|
{
|
||||||
}),
|
params,
|
||||||
|
headers: authHeaders,
|
||||||
|
},
|
||||||
|
),
|
||||||
axios.get<SummaryResponse>(`${API_BASE_URL}/dataset/${datasetId}/summary`, {
|
axios.get<SummaryResponse>(`${API_BASE_URL}/dataset/${datasetId}/summary`, {
|
||||||
params,
|
params,
|
||||||
headers: authHeaders,
|
headers: authHeaders,
|
||||||
@@ -127,85 +411,111 @@ const StatPage = () => {
|
|||||||
params,
|
params,
|
||||||
headers: authHeaders,
|
headers: authHeaders,
|
||||||
}),
|
}),
|
||||||
])
|
])
|
||||||
.then(([timeRes, userRes, linguisticRes, emotionalRes, interactionRes, summaryRes, culturalRes]) => {
|
.then(
|
||||||
const usersList = userRes.data.users ?? [];
|
([
|
||||||
const topUsersList = userRes.data.top_users ?? [];
|
timeRes,
|
||||||
const interactionGraphRaw = interactionRes.data?.interaction_graph ?? {};
|
userRes,
|
||||||
const topPairsRaw = interactionRes.data?.top_interaction_pairs ?? [];
|
linguisticRes,
|
||||||
|
emotionalRes,
|
||||||
|
interactionRes,
|
||||||
|
summaryRes,
|
||||||
|
culturalRes,
|
||||||
|
]) => {
|
||||||
|
const usersList = userRes.data.users ?? [];
|
||||||
|
const topUsersList = userRes.data.top_users ?? [];
|
||||||
|
const interactionGraphRaw = interactionRes.data?.interaction_graph ?? {};
|
||||||
|
const topPairsRaw = interactionRes.data?.top_interaction_pairs ?? [];
|
||||||
|
|
||||||
const filteredUsers: typeof usersList = [];
|
const filteredUsers: typeof usersList = [];
|
||||||
for (const user of usersList) {
|
for (const user of usersList) {
|
||||||
if (isDeletedUser(user.author)) continue;
|
if (isDeletedUser(user.author)) continue;
|
||||||
filteredUsers.push(user);
|
filteredUsers.push(user);
|
||||||
}
|
|
||||||
|
|
||||||
const filteredTopUsers: typeof topUsersList = [];
|
|
||||||
for (const user of topUsersList) {
|
|
||||||
if (isDeletedUser(user.author)) continue;
|
|
||||||
filteredTopUsers.push(user);
|
|
||||||
}
|
|
||||||
|
|
||||||
const filteredInteractionGraph: Record<string, Record<string, number>> = {};
|
|
||||||
for (const [source, targets] of Object.entries(interactionGraphRaw)) {
|
|
||||||
if (isDeletedUser(source)) {
|
|
||||||
continue;
|
|
||||||
}
|
}
|
||||||
|
|
||||||
const nextTargets: Record<string, number> = {};
|
const filteredTopUsers: typeof topUsersList = [];
|
||||||
for (const [target, count] of Object.entries(targets)) {
|
for (const user of topUsersList) {
|
||||||
if (isDeletedUser(target)) {
|
if (isDeletedUser(user.author)) continue;
|
||||||
|
filteredTopUsers.push(user);
|
||||||
|
}
|
||||||
|
|
||||||
|
let mostCommentHeavyUser: UserStatsMeta["mostCommentHeavyUser"] = null;
|
||||||
|
for (const user of filteredUsers) {
|
||||||
|
const currentShare = user.comment_share ?? 0;
|
||||||
|
if (!mostCommentHeavyUser || currentShare > mostCommentHeavyUser.commentShare) {
|
||||||
|
mostCommentHeavyUser = {
|
||||||
|
author: user.author,
|
||||||
|
commentShare: currentShare,
|
||||||
|
};
|
||||||
|
}
|
||||||
|
}
|
||||||
|
|
||||||
|
const topAuthors = new Set(filteredTopUsers.map((entry) => entry.author));
|
||||||
|
const summaryUsers: User[] = [];
|
||||||
|
for (const user of filteredUsers) {
|
||||||
|
if (topAuthors.has(user.author)) {
|
||||||
|
summaryUsers.push(user);
|
||||||
|
}
|
||||||
|
}
|
||||||
|
|
||||||
|
const filteredInteractionGraph: Record<string, Record<string, number>> = {};
|
||||||
|
for (const [source, targets] of Object.entries(interactionGraphRaw)) {
|
||||||
|
if (isDeletedUser(source)) {
|
||||||
continue;
|
continue;
|
||||||
}
|
}
|
||||||
nextTargets[target] = count;
|
|
||||||
|
const nextTargets: Record<string, number> = {};
|
||||||
|
for (const [target, count] of Object.entries(targets)) {
|
||||||
|
if (isDeletedUser(target)) {
|
||||||
|
continue;
|
||||||
|
}
|
||||||
|
nextTargets[target] = count;
|
||||||
|
}
|
||||||
|
|
||||||
|
filteredInteractionGraph[source] = nextTargets;
|
||||||
}
|
}
|
||||||
|
|
||||||
filteredInteractionGraph[source] = nextTargets;
|
const filteredTopInteractionPairs: typeof topPairsRaw = [];
|
||||||
}
|
for (const pairEntry of topPairsRaw) {
|
||||||
|
const pair = pairEntry[0];
|
||||||
const filteredTopInteractionPairs: typeof topPairsRaw = [];
|
const source = pair[0];
|
||||||
for (const pairEntry of topPairsRaw) {
|
const target = pair[1];
|
||||||
const pair = pairEntry[0];
|
if (isDeletedUser(source) || isDeletedUser(target)) {
|
||||||
const source = pair[0];
|
continue;
|
||||||
const target = pair[1];
|
}
|
||||||
if (isDeletedUser(source) || isDeletedUser(target)) {
|
filteredTopInteractionPairs.push(pairEntry);
|
||||||
continue;
|
|
||||||
}
|
}
|
||||||
filteredTopInteractionPairs.push(pairEntry);
|
|
||||||
}
|
|
||||||
|
|
||||||
const combinedUserData: UserAnalysisResponse = {
|
const filteredUserData: UserEndpointResponse = {
|
||||||
...userRes.data,
|
users: summaryUsers,
|
||||||
users: filteredUsers,
|
top_users: filteredTopUsers,
|
||||||
top_users: filteredTopUsers,
|
};
|
||||||
interaction_graph: filteredInteractionGraph,
|
|
||||||
};
|
|
||||||
|
|
||||||
const combinedContentData: ContentAnalysisResponse = {
|
const filteredInteractionData: InteractionAnalysisResponse = {
|
||||||
...linguisticRes.data,
|
...interactionRes.data,
|
||||||
...emotionalRes.data,
|
interaction_graph: filteredInteractionGraph,
|
||||||
};
|
top_interaction_pairs: filteredTopInteractionPairs,
|
||||||
|
};
|
||||||
|
|
||||||
const filteredInteractionData: InteractionAnalysisResponse = {
|
const filteredSummary: SummaryResponse = {
|
||||||
...interactionRes.data,
|
...summaryRes.data,
|
||||||
interaction_graph: filteredInteractionGraph,
|
unique_users: filteredUsers.length,
|
||||||
top_interaction_pairs: filteredTopInteractionPairs,
|
};
|
||||||
};
|
|
||||||
|
|
||||||
const filteredSummary: SummaryResponse = {
|
setUserData(filteredUserData);
|
||||||
...summaryRes.data,
|
setUserStatsMeta({
|
||||||
unique_users: filteredUsers.length,
|
totalUsers: filteredUsers.length,
|
||||||
};
|
mostCommentHeavyUser,
|
||||||
|
});
|
||||||
setUserData(combinedUserData);
|
setTimeData(timeRes.data || null);
|
||||||
setTimeData(timeRes.data || null);
|
setLinguisticData(linguisticRes.data || null);
|
||||||
setContentData(combinedContentData);
|
setEmotionalData(emotionalRes.data || null);
|
||||||
setLinguisticData(linguisticRes.data || null);
|
setInteractionData(filteredInteractionData || null);
|
||||||
setInteractionData(filteredInteractionData || null);
|
setCulturalData(culturalRes.data || null);
|
||||||
setCulturalData(culturalRes.data || null);
|
setSummary(filteredSummary || null);
|
||||||
setSummary(filteredSummary || null);
|
},
|
||||||
})
|
)
|
||||||
.catch((e) => setError("Failed to load statistics: " + String(e)))
|
.catch((e) => setError(`Failed to load statistics: ${String(e)}`))
|
||||||
.finally(() => setLoading(false));
|
.finally(() => setLoading(false));
|
||||||
};
|
};
|
||||||
|
|
||||||
@@ -228,12 +538,15 @@ const StatPage = () => {
|
|||||||
|
|
||||||
useEffect(() => {
|
useEffect(() => {
|
||||||
setError("");
|
setError("");
|
||||||
|
setAllRecords(null);
|
||||||
|
setAllRecordsKey("");
|
||||||
|
setExplorerState(EMPTY_EXPLORER_STATE);
|
||||||
if (!datasetId) {
|
if (!datasetId) {
|
||||||
setError("Missing dataset id. Open /dataset/<id>/stats.");
|
setError("Missing dataset id. Open /dataset/<id>/stats.");
|
||||||
return;
|
return;
|
||||||
}
|
}
|
||||||
getStats();
|
getStats();
|
||||||
}, [datasetId])
|
}, [datasetId]);
|
||||||
|
|
||||||
if (loading) {
|
if (loading) {
|
||||||
return (
|
return (
|
||||||
@@ -243,155 +556,217 @@ const StatPage = () => {
|
|||||||
<div style={styles.loadingSpinner} />
|
<div style={styles.loadingSpinner} />
|
||||||
<div>
|
<div>
|
||||||
<h2 style={styles.loadingTitle}>Loading analytics</h2>
|
<h2 style={styles.loadingTitle}>Loading analytics</h2>
|
||||||
<p style={styles.loadingSubtitle}>Fetching summary, timeline, user, and content insights.</p>
|
<p style={styles.loadingSubtitle}>
|
||||||
|
Fetching summary, timeline, user, and content insights.
|
||||||
|
</p>
|
||||||
</div>
|
</div>
|
||||||
</div>
|
</div>
|
||||||
|
|
||||||
<div style={styles.loadingSkeleton}>
|
<div style={styles.loadingSkeleton}>
|
||||||
<div style={{ ...styles.loadingSkeletonLine, ...styles.loadingSkeletonLineLong }} />
|
<div
|
||||||
<div style={{ ...styles.loadingSkeletonLine, ...styles.loadingSkeletonLineMed }} />
|
style={{
|
||||||
<div style={{ ...styles.loadingSkeletonLine, ...styles.loadingSkeletonLineShort }} />
|
...styles.loadingSkeletonLine,
|
||||||
|
...styles.loadingSkeletonLineLong,
|
||||||
|
}}
|
||||||
|
/>
|
||||||
|
<div
|
||||||
|
style={{
|
||||||
|
...styles.loadingSkeletonLine,
|
||||||
|
...styles.loadingSkeletonLineMed,
|
||||||
|
}}
|
||||||
|
/>
|
||||||
|
<div
|
||||||
|
style={{
|
||||||
|
...styles.loadingSkeletonLine,
|
||||||
|
...styles.loadingSkeletonLineShort,
|
||||||
|
}}
|
||||||
|
/>
|
||||||
</div>
|
</div>
|
||||||
</div>
|
</div>
|
||||||
</div>
|
</div>
|
||||||
);
|
);
|
||||||
}
|
}
|
||||||
if (error) return <p style={{...styles.page}}>{error}</p>;
|
if (error) return <p style={{ ...styles.page }}>{error}</p>;
|
||||||
|
|
||||||
return (
|
return (
|
||||||
<div style={styles.page}>
|
<div style={styles.page}>
|
||||||
<div style={{ ...styles.container, ...styles.card, ...styles.headerBar }}>
|
<div style={{ ...styles.container, ...styles.card, ...styles.headerBar }}>
|
||||||
<div style={styles.controls}>
|
<div style={styles.controls}>
|
||||||
<input
|
<input
|
||||||
type="text"
|
type="text"
|
||||||
id="query"
|
id="query"
|
||||||
ref={searchInputRef}
|
ref={searchInputRef}
|
||||||
placeholder="Search events..."
|
placeholder="Search events..."
|
||||||
style={styles.input}
|
style={styles.input}
|
||||||
/>
|
/>
|
||||||
|
|
||||||
<input
|
<input
|
||||||
type="date"
|
type="date"
|
||||||
ref={beforeDateRef}
|
ref={beforeDateRef}
|
||||||
placeholder="Search before date"
|
placeholder="Search before date"
|
||||||
style={styles.input}
|
style={styles.input}
|
||||||
/>
|
/>
|
||||||
|
|
||||||
<input
|
<input
|
||||||
type="date"
|
type="date"
|
||||||
ref={afterDateRef}
|
ref={afterDateRef}
|
||||||
placeholder="Search after date"
|
placeholder="Search after date"
|
||||||
style={styles.input}
|
style={styles.input}
|
||||||
/>
|
/>
|
||||||
|
|
||||||
<button onClick={onSubmitFilters} style={styles.buttonPrimary}>
|
<button onClick={onSubmitFilters} style={styles.buttonPrimary}>
|
||||||
Search
|
Search
|
||||||
</button>
|
</button>
|
||||||
|
|
||||||
<button onClick={resetFilters} style={styles.buttonSecondary}>
|
<button onClick={resetFilters} style={styles.buttonSecondary}>
|
||||||
Reset
|
Reset
|
||||||
</button>
|
</button>
|
||||||
</div>
|
|
||||||
|
|
||||||
<div style={styles.dashboardMeta}>Analytics Dashboard</div>
|
|
||||||
<div style={styles.dashboardMeta}>Dataset #{datasetId ?? "-"}</div>
|
|
||||||
</div>
|
</div>
|
||||||
|
|
||||||
<div style={{ ...styles.container, ...styles.tabsRow, justifyContent: "center" }}>
|
<div style={styles.dashboardMeta}>Analytics Dashboard</div>
|
||||||
<button
|
<div style={styles.dashboardMeta}>Dataset #{datasetId ?? "-"}</div>
|
||||||
onClick={() => setActiveView("summary")}
|
</div>
|
||||||
style={activeView === "summary" ? styles.buttonPrimary : styles.buttonSecondary}
|
|
||||||
>
|
|
||||||
Summary
|
|
||||||
</button>
|
|
||||||
<button
|
|
||||||
onClick={() => setActiveView("emotional")}
|
|
||||||
style={activeView === "emotional" ? styles.buttonPrimary : styles.buttonSecondary}
|
|
||||||
>
|
|
||||||
Emotional
|
|
||||||
</button>
|
|
||||||
|
|
||||||
<button
|
<div
|
||||||
onClick={() => setActiveView("user")}
|
style={{
|
||||||
style={activeView === "user" ? styles.buttonPrimary : styles.buttonSecondary}
|
...styles.container,
|
||||||
|
...styles.tabsRow,
|
||||||
|
justifyContent: "center",
|
||||||
|
}}
|
||||||
>
|
>
|
||||||
Users
|
<button
|
||||||
</button>
|
onClick={() => setActiveView("summary")}
|
||||||
<button
|
style={
|
||||||
onClick={() => setActiveView("linguistic")}
|
activeView === "summary" ? styles.buttonPrimary : styles.buttonSecondary
|
||||||
style={activeView === "linguistic" ? styles.buttonPrimary : styles.buttonSecondary}
|
}
|
||||||
>
|
>
|
||||||
Linguistic
|
Summary
|
||||||
</button>
|
</button>
|
||||||
<button
|
<button
|
||||||
onClick={() => setActiveView("interactional")}
|
onClick={() => setActiveView("emotional")}
|
||||||
style={activeView === "interactional" ? styles.buttonPrimary : styles.buttonSecondary}
|
style={
|
||||||
>
|
activeView === "emotional"
|
||||||
Interactional
|
? styles.buttonPrimary
|
||||||
</button>
|
: styles.buttonSecondary
|
||||||
<button
|
}
|
||||||
onClick={() => setActiveView("cultural")}
|
>
|
||||||
style={activeView === "cultural" ? styles.buttonPrimary : styles.buttonSecondary}
|
Emotional
|
||||||
>
|
</button>
|
||||||
Cultural
|
|
||||||
</button>
|
|
||||||
</div>
|
|
||||||
|
|
||||||
{activeView === "summary" && (
|
<button
|
||||||
<SummaryStats
|
onClick={() => setActiveView("user")}
|
||||||
userData={userData}
|
style={activeView === "user" ? styles.buttonPrimary : styles.buttonSecondary}
|
||||||
timeData={timeData}
|
>
|
||||||
contentData={contentData}
|
Users
|
||||||
summary={summary}
|
</button>
|
||||||
|
<button
|
||||||
|
onClick={() => setActiveView("linguistic")}
|
||||||
|
style={
|
||||||
|
activeView === "linguistic"
|
||||||
|
? styles.buttonPrimary
|
||||||
|
: styles.buttonSecondary
|
||||||
|
}
|
||||||
|
>
|
||||||
|
Linguistic
|
||||||
|
</button>
|
||||||
|
<button
|
||||||
|
onClick={() => setActiveView("interactional")}
|
||||||
|
style={
|
||||||
|
activeView === "interactional"
|
||||||
|
? styles.buttonPrimary
|
||||||
|
: styles.buttonSecondary
|
||||||
|
}
|
||||||
|
>
|
||||||
|
Interactional
|
||||||
|
</button>
|
||||||
|
<button
|
||||||
|
onClick={() => setActiveView("cultural")}
|
||||||
|
style={
|
||||||
|
activeView === "cultural" ? styles.buttonPrimary : styles.buttonSecondary
|
||||||
|
}
|
||||||
|
>
|
||||||
|
Cultural
|
||||||
|
</button>
|
||||||
|
</div>
|
||||||
|
|
||||||
|
{activeView === "summary" && (
|
||||||
|
<SummaryStats
|
||||||
|
userData={userData}
|
||||||
|
timeData={timeData}
|
||||||
|
linguisticData={linguisticData}
|
||||||
|
summary={summary}
|
||||||
|
onExplore={openExplorer}
|
||||||
|
/>
|
||||||
|
)}
|
||||||
|
|
||||||
|
{activeView === "emotional" && emotionalData && (
|
||||||
|
<EmotionalStats emotionalData={emotionalData} onExplore={openExplorer} />
|
||||||
|
)}
|
||||||
|
|
||||||
|
{activeView === "emotional" && !emotionalData && (
|
||||||
|
<div style={{ ...styles.container, ...styles.card, marginTop: 16 }}>
|
||||||
|
No emotional data available.
|
||||||
|
</div>
|
||||||
|
)}
|
||||||
|
|
||||||
|
{activeView === "user" && userData && interactionData && (
|
||||||
|
<UserStats
|
||||||
|
topUsers={userData.top_users}
|
||||||
|
interactionGraph={interactionData.interaction_graph}
|
||||||
|
totalUsers={userStatsMeta.totalUsers}
|
||||||
|
mostCommentHeavyUser={userStatsMeta.mostCommentHeavyUser}
|
||||||
|
onExplore={openExplorer}
|
||||||
|
/>
|
||||||
|
)}
|
||||||
|
|
||||||
|
{activeView === "user" && (!userData || !interactionData) && (
|
||||||
|
<div style={{ ...styles.container, ...styles.card, marginTop: 16 }}>
|
||||||
|
No user network data available.
|
||||||
|
</div>
|
||||||
|
)}
|
||||||
|
|
||||||
|
{activeView === "linguistic" && linguisticData && (
|
||||||
|
<LinguisticStats data={linguisticData} onExplore={openExplorer} />
|
||||||
|
)}
|
||||||
|
|
||||||
|
{activeView === "linguistic" && !linguisticData && (
|
||||||
|
<div style={{ ...styles.container, ...styles.card, marginTop: 16 }}>
|
||||||
|
No linguistic data available.
|
||||||
|
</div>
|
||||||
|
)}
|
||||||
|
|
||||||
|
{activeView === "interactional" && interactionData && (
|
||||||
|
<InteractionalStats data={interactionData} />
|
||||||
|
)}
|
||||||
|
|
||||||
|
{activeView === "interactional" && !interactionData && (
|
||||||
|
<div style={{ ...styles.container, ...styles.card, marginTop: 16 }}>
|
||||||
|
No interactional data available.
|
||||||
|
</div>
|
||||||
|
)}
|
||||||
|
|
||||||
|
{activeView === "cultural" && culturalData && (
|
||||||
|
<CulturalStats data={culturalData} onExplore={openExplorer} />
|
||||||
|
)}
|
||||||
|
|
||||||
|
{activeView === "cultural" && !culturalData && (
|
||||||
|
<div style={{ ...styles.container, ...styles.card, marginTop: 16 }}>
|
||||||
|
No cultural data available.
|
||||||
|
</div>
|
||||||
|
)}
|
||||||
|
|
||||||
|
<CorpusExplorer
|
||||||
|
open={explorerState.open}
|
||||||
|
onClose={() => setExplorerState((current) => ({ ...current, open: false }))}
|
||||||
|
title={explorerState.title}
|
||||||
|
description={explorerState.description}
|
||||||
|
records={explorerState.records}
|
||||||
|
loading={explorerState.loading}
|
||||||
|
error={explorerState.error}
|
||||||
|
emptyMessage={explorerState.emptyMessage}
|
||||||
/>
|
/>
|
||||||
)}
|
</div>
|
||||||
|
);
|
||||||
{activeView === "emotional" && contentData && (
|
};
|
||||||
<EmotionalStats contentData={contentData} />
|
|
||||||
)}
|
|
||||||
|
|
||||||
{activeView === "emotional" && !contentData && (
|
|
||||||
<div style={{ ...styles.container, ...styles.card, marginTop: 16 }}>
|
|
||||||
No emotional data available.
|
|
||||||
</div>
|
|
||||||
)}
|
|
||||||
|
|
||||||
{activeView === "user" && userData && (
|
|
||||||
<UserStats data={userData} />
|
|
||||||
)}
|
|
||||||
|
|
||||||
{activeView === "linguistic" && linguisticData && (
|
|
||||||
<LinguisticStats data={linguisticData} />
|
|
||||||
)}
|
|
||||||
|
|
||||||
{activeView === "linguistic" && !linguisticData && (
|
|
||||||
<div style={{ ...styles.container, ...styles.card, marginTop: 16 }}>
|
|
||||||
No linguistic data available.
|
|
||||||
</div>
|
|
||||||
)}
|
|
||||||
|
|
||||||
{activeView === "interactional" && interactionData && (
|
|
||||||
<InteractionalStats data={interactionData} />
|
|
||||||
)}
|
|
||||||
|
|
||||||
{activeView === "interactional" && !interactionData && (
|
|
||||||
<div style={{ ...styles.container, ...styles.card, marginTop: 16 }}>
|
|
||||||
No interactional data available.
|
|
||||||
</div>
|
|
||||||
)}
|
|
||||||
|
|
||||||
{activeView === "cultural" && culturalData && (
|
|
||||||
<CulturalStats data={culturalData} />
|
|
||||||
)}
|
|
||||||
|
|
||||||
{activeView === "cultural" && !culturalData && (
|
|
||||||
<div style={{ ...styles.container, ...styles.card, marginTop: 16 }}>
|
|
||||||
No cultural data available.
|
|
||||||
</div>
|
|
||||||
)}
|
|
||||||
|
|
||||||
</div>
|
|
||||||
);
|
|
||||||
}
|
|
||||||
|
|
||||||
export default StatPage;
|
export default StatPage;
|
||||||
|
|||||||
@@ -4,7 +4,7 @@ import { useNavigate } from "react-router-dom";
|
|||||||
import StatsStyling from "../styles/stats_styling";
|
import StatsStyling from "../styles/stats_styling";
|
||||||
|
|
||||||
const styles = StatsStyling;
|
const styles = StatsStyling;
|
||||||
const API_BASE_URL = import.meta.env.VITE_BACKEND_URL
|
const API_BASE_URL = import.meta.env.VITE_BACKEND_URL;
|
||||||
|
|
||||||
const UploadPage = () => {
|
const UploadPage = () => {
|
||||||
const [datasetName, setDatasetName] = useState("");
|
const [datasetName, setDatasetName] = useState("");
|
||||||
@@ -40,16 +40,20 @@ const UploadPage = () => {
|
|||||||
setHasError(false);
|
setHasError(false);
|
||||||
setReturnMessage("");
|
setReturnMessage("");
|
||||||
|
|
||||||
const response = await axios.post(`${API_BASE_URL}/datasets/upload`, formData, {
|
const response = await axios.post(
|
||||||
headers: {
|
`${API_BASE_URL}/datasets/upload`,
|
||||||
"Content-Type": "multipart/form-data",
|
formData,
|
||||||
|
{
|
||||||
|
headers: {
|
||||||
|
"Content-Type": "multipart/form-data",
|
||||||
|
},
|
||||||
},
|
},
|
||||||
});
|
);
|
||||||
|
|
||||||
const datasetId = Number(response.data.dataset_id);
|
const datasetId = Number(response.data.dataset_id);
|
||||||
|
|
||||||
setReturnMessage(
|
setReturnMessage(
|
||||||
`Upload queued successfully (dataset #${datasetId}). Redirecting to processing status...`
|
`Upload queued successfully (dataset #${datasetId}). Redirecting to processing status...`,
|
||||||
);
|
);
|
||||||
|
|
||||||
setTimeout(() => {
|
setTimeout(() => {
|
||||||
@@ -58,7 +62,9 @@ const UploadPage = () => {
|
|||||||
} catch (error: unknown) {
|
} catch (error: unknown) {
|
||||||
setHasError(true);
|
setHasError(true);
|
||||||
if (axios.isAxiosError(error)) {
|
if (axios.isAxiosError(error)) {
|
||||||
const message = String(error.response?.data?.error || error.message || "Upload failed.");
|
const message = String(
|
||||||
|
error.response?.data?.error || error.message || "Upload failed.",
|
||||||
|
);
|
||||||
setReturnMessage(`Upload failed: ${message}`);
|
setReturnMessage(`Upload failed: ${message}`);
|
||||||
} else {
|
} else {
|
||||||
setReturnMessage("Upload failed due to an unexpected error.");
|
setReturnMessage("Upload failed due to an unexpected error.");
|
||||||
@@ -75,12 +81,16 @@ const UploadPage = () => {
|
|||||||
<div>
|
<div>
|
||||||
<h1 style={styles.sectionHeaderTitle}>Upload Dataset</h1>
|
<h1 style={styles.sectionHeaderTitle}>Upload Dataset</h1>
|
||||||
<p style={styles.sectionHeaderSubtitle}>
|
<p style={styles.sectionHeaderSubtitle}>
|
||||||
Name your dataset, then upload posts and topic map files to generate analytics.
|
Name your dataset, then upload posts and topic map files to
|
||||||
|
generate analytics.
|
||||||
</p>
|
</p>
|
||||||
</div>
|
</div>
|
||||||
<button
|
<button
|
||||||
type="button"
|
type="button"
|
||||||
style={{ ...styles.buttonPrimary, opacity: isSubmitting ? 0.75 : 1 }}
|
style={{
|
||||||
|
...styles.buttonPrimary,
|
||||||
|
opacity: isSubmitting ? 0.75 : 1,
|
||||||
|
}}
|
||||||
onClick={uploadFiles}
|
onClick={uploadFiles}
|
||||||
disabled={isSubmitting}
|
disabled={isSubmitting}
|
||||||
>
|
>
|
||||||
@@ -96,8 +106,12 @@ const UploadPage = () => {
|
|||||||
}}
|
}}
|
||||||
>
|
>
|
||||||
<div style={{ ...styles.card, gridColumn: "auto" }}>
|
<div style={{ ...styles.card, gridColumn: "auto" }}>
|
||||||
<h2 style={{ ...styles.sectionTitle, color: "#24292f" }}>Dataset Name</h2>
|
<h2 style={{ ...styles.sectionTitle, color: "#24292f" }}>
|
||||||
<p style={styles.sectionSubtitle}>Use a clear label so you can identify this upload later.</p>
|
Dataset Name
|
||||||
|
</h2>
|
||||||
|
<p style={styles.sectionSubtitle}>
|
||||||
|
Use a clear label so you can identify this upload later.
|
||||||
|
</p>
|
||||||
<input
|
<input
|
||||||
style={{ ...styles.input, ...styles.inputFullWidth }}
|
style={{ ...styles.input, ...styles.inputFullWidth }}
|
||||||
type="text"
|
type="text"
|
||||||
@@ -108,8 +122,12 @@ const UploadPage = () => {
|
|||||||
</div>
|
</div>
|
||||||
|
|
||||||
<div style={{ ...styles.card, gridColumn: "auto" }}>
|
<div style={{ ...styles.card, gridColumn: "auto" }}>
|
||||||
<h2 style={{ ...styles.sectionTitle, color: "#24292f" }}>Posts File (.jsonl)</h2>
|
<h2 style={{ ...styles.sectionTitle, color: "#24292f" }}>
|
||||||
<p style={styles.sectionSubtitle}>Upload the raw post records export.</p>
|
Posts File (.jsonl)
|
||||||
|
</h2>
|
||||||
|
<p style={styles.sectionSubtitle}>
|
||||||
|
Upload the raw post records export.
|
||||||
|
</p>
|
||||||
<input
|
<input
|
||||||
style={{ ...styles.input, ...styles.inputFullWidth }}
|
style={{ ...styles.input, ...styles.inputFullWidth }}
|
||||||
type="file"
|
type="file"
|
||||||
@@ -122,16 +140,24 @@ const UploadPage = () => {
|
|||||||
</div>
|
</div>
|
||||||
|
|
||||||
<div style={{ ...styles.card, gridColumn: "auto" }}>
|
<div style={{ ...styles.card, gridColumn: "auto" }}>
|
||||||
<h2 style={{ ...styles.sectionTitle, color: "#24292f" }}>Topics File (.json)</h2>
|
<h2 style={{ ...styles.sectionTitle, color: "#24292f" }}>
|
||||||
<p style={styles.sectionSubtitle}>Upload your topic bucket mapping file.</p>
|
Topics File (.json)
|
||||||
|
</h2>
|
||||||
|
<p style={styles.sectionSubtitle}>
|
||||||
|
Upload your topic bucket mapping file.
|
||||||
|
</p>
|
||||||
<input
|
<input
|
||||||
style={{ ...styles.input, ...styles.inputFullWidth }}
|
style={{ ...styles.input, ...styles.inputFullWidth }}
|
||||||
type="file"
|
type="file"
|
||||||
accept=".json"
|
accept=".json"
|
||||||
onChange={(event) => setTopicBucketFile(event.target.files?.[0] ?? null)}
|
onChange={(event) =>
|
||||||
|
setTopicBucketFile(event.target.files?.[0] ?? null)
|
||||||
|
}
|
||||||
/>
|
/>
|
||||||
<p style={styles.subtleBodyText}>
|
<p style={styles.subtleBodyText}>
|
||||||
{topicBucketFile ? `Selected: ${topicBucketFile.name}` : "No file selected"}
|
{topicBucketFile
|
||||||
|
? `Selected: ${topicBucketFile.name}`
|
||||||
|
: "No file selected"}
|
||||||
</p>
|
</p>
|
||||||
</div>
|
</div>
|
||||||
</div>
|
</div>
|
||||||
@@ -143,7 +169,8 @@ const UploadPage = () => {
|
|||||||
...(hasError ? styles.alertCardError : styles.alertCardInfo),
|
...(hasError ? styles.alertCardError : styles.alertCardInfo),
|
||||||
}}
|
}}
|
||||||
>
|
>
|
||||||
{returnMessage || "After upload, your dataset is queued for processing and you'll land on stats."}
|
{returnMessage ||
|
||||||
|
"After upload, your dataset is queued for processing and you'll land on stats."}
|
||||||
</div>
|
</div>
|
||||||
</div>
|
</div>
|
||||||
</div>
|
</div>
|
||||||
|
|||||||
@@ -1,4 +1,5 @@
|
|||||||
import { ResponsiveHeatMap } from "@nivo/heatmap";
|
import { ResponsiveHeatMap } from "@nivo/heatmap";
|
||||||
|
import { memo, useMemo } from "react";
|
||||||
|
|
||||||
type ApiRow = Record<number, number>;
|
type ApiRow = Record<number, number>;
|
||||||
type ActivityHeatmapProps = {
|
type ActivityHeatmapProps = {
|
||||||
@@ -25,8 +26,7 @@ const DAYS = [
|
|||||||
"Sunday",
|
"Sunday",
|
||||||
];
|
];
|
||||||
|
|
||||||
const hourLabel = (h: number) =>
|
const hourLabel = (h: number) => `${h.toString().padStart(2, "0")}:00`;
|
||||||
`${h.toString().padStart(2, "0")}:00`;
|
|
||||||
|
|
||||||
const convertWeeklyData = (dataset: ApiRow[]): ChartSeries[] => {
|
const convertWeeklyData = (dataset: ApiRow[]): ChartSeries[] => {
|
||||||
return dataset.map((dayData, index) => ({
|
return dataset.map((dayData, index) => ({
|
||||||
@@ -40,32 +40,37 @@ const convertWeeklyData = (dataset: ApiRow[]): ChartSeries[] => {
|
|||||||
}));
|
}));
|
||||||
};
|
};
|
||||||
|
|
||||||
|
|
||||||
const ActivityHeatmap = ({ data }: ActivityHeatmapProps) => {
|
const ActivityHeatmap = ({ data }: ActivityHeatmapProps) => {
|
||||||
const convertedData = convertWeeklyData(data);
|
const convertedData = useMemo(() => convertWeeklyData(data), [data]);
|
||||||
|
|
||||||
const maxValue = Math.max(
|
const maxValue = useMemo(() => {
|
||||||
...convertedData.flatMap(day =>
|
let max = 0;
|
||||||
day.data.map(point => point.y)
|
for (const day of convertedData) {
|
||||||
)
|
for (const point of day.data) {
|
||||||
|
if (point.y > max) {
|
||||||
|
max = point.y;
|
||||||
|
}
|
||||||
|
}
|
||||||
|
}
|
||||||
|
return max;
|
||||||
|
}, [convertedData]);
|
||||||
|
|
||||||
|
return (
|
||||||
|
<ResponsiveHeatMap
|
||||||
|
data={convertedData}
|
||||||
|
valueFormat=">-.2s"
|
||||||
|
axisTop={{ tickRotation: -90 }}
|
||||||
|
axisRight={{ legend: "Weekday", legendOffset: 70 }}
|
||||||
|
axisLeft={{ legend: "Weekday", legendOffset: -72 }}
|
||||||
|
colors={{
|
||||||
|
type: "diverging",
|
||||||
|
scheme: "red_yellow_blue",
|
||||||
|
divergeAt: 0.3,
|
||||||
|
minValue: 0,
|
||||||
|
maxValue: maxValue,
|
||||||
|
}}
|
||||||
|
/>
|
||||||
);
|
);
|
||||||
|
};
|
||||||
|
|
||||||
return (
|
export default memo(ActivityHeatmap);
|
||||||
<ResponsiveHeatMap
|
|
||||||
data={convertedData}
|
|
||||||
valueFormat=">-.2s"
|
|
||||||
axisTop={{ tickRotation: -90 }}
|
|
||||||
axisRight={{ legend: 'Weekday', legendOffset: 70 }}
|
|
||||||
axisLeft={{ legend: 'Weekday', legendOffset: -72 }}
|
|
||||||
colors={{
|
|
||||||
type: 'diverging',
|
|
||||||
scheme: 'red_yellow_blue',
|
|
||||||
divergeAt: 0.3,
|
|
||||||
minValue: 0,
|
|
||||||
maxValue: maxValue
|
|
||||||
}}
|
|
||||||
/>
|
|
||||||
)
|
|
||||||
}
|
|
||||||
|
|
||||||
export default ActivityHeatmap;
|
|
||||||
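The `maxValue` change in the ActivityHeatmap hunk above replaces a spread call into `Math.max` with an explicit loop. The sketch below shows where the two approaches differ. It assumes a series shape of `{ id, data: [{ x, y }] }`. Only `data` and `y` are visible in the diff, so the type names and the `id` and `x` fields are hypothetical.

```typescript
// Hypothetical shapes: only `data` and `y` appear in the diff above.
type HeatmapPoint = { x: string; y: number };
type HeatmapSeries = { id: string; data: HeatmapPoint[] };

// Loop-based maximum, following the new useMemo body.
// It starts at 0, so an empty dataset yields 0.
const maxByLoop = (series: HeatmapSeries[]): number => {
  let max = 0;
  for (const day of series) {
    for (const point of day.data) {
      if (point.y > max) {
        max = point.y;
      }
    }
  }
  return max;
};

// Spread-based maximum, same idea as the removed code. The values are
// flattened with reduce here instead of flatMap, purely to keep the
// snippet independent of newer lib typings.
const maxBySpread = (series: HeatmapSeries[]): number => {
  const values = series
    .map((day) => day.data.map((point) => point.y))
    .reduce((all, ys) => all.concat(ys), [] as number[]);
  return Math.max(...values);
};

const sampleSeries: HeatmapSeries[] = [
  { id: "Monday", data: [{ x: "00:00", y: 3 }, { x: "01:00", y: 7 }] },
  { id: "Tuesday", data: [{ x: "00:00", y: 5 }] },
];

console.log(maxByLoop(sampleSeries)); // 7
console.log(maxBySpread(sampleSeries)); // 7
console.log(maxByLoop([])); // 0
console.log(maxBySpread([])); // -Infinity
```

With no data, `Math.max()` receives no arguments and returns `-Infinity`, which would then be passed as the colour scale's `maxValue`. The loop falls back to 0 instead. The spread form also passes every value as a separate function argument, which may hit engine argument limits on very large arrays.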
|
|||||||
@@ -17,7 +17,7 @@ type Emotion = {
|
|||||||
emotion_sadness: number;
|
emotion_sadness: number;
|
||||||
};
|
};
|
||||||
|
|
||||||
// User
|
// User
|
||||||
type TopUser = {
|
type TopUser = {
|
||||||
author: string;
|
author: string;
|
||||||
source: string;
|
source: string;
|
||||||
@@ -34,12 +34,19 @@ type Vocab = {
|
|||||||
top_words: FrequencyWord[];
|
top_words: FrequencyWord[];
|
||||||
};
|
};
|
||||||
|
|
||||||
|
type DominantTopic = {
|
||||||
|
topic: string;
|
||||||
|
count: number;
|
||||||
|
};
|
||||||
|
|
||||||
type User = {
|
type User = {
|
||||||
author: string;
|
author: string;
|
||||||
post: number;
|
post: number;
|
||||||
comment: number;
|
comment: number;
|
||||||
comment_post_ratio: number;
|
comment_post_ratio: number;
|
||||||
comment_share: number;
|
comment_share: number;
|
||||||
|
avg_emotions?: Record<string, number>;
|
||||||
|
dominant_topic?: DominantTopic | null;
|
||||||
vocab?: Vocab | null;
|
vocab?: Vocab | null;
|
||||||
};
|
};
|
||||||
|
|
||||||
@@ -56,7 +63,7 @@ type UserAnalysisResponse = {
|
|||||||
interaction_graph: InteractionGraph;
|
interaction_graph: InteractionGraph;
|
||||||
};
|
};
|
||||||
|
|
||||||
// Time
|
// Time
|
||||||
type EventsPerDay = {
|
type EventsPerDay = {
|
||||||
date: Date;
|
date: Date;
|
||||||
count: number;
|
count: number;
|
||||||
@@ -124,7 +131,7 @@ type EmotionalAnalysisResponse = {
|
|||||||
emotion_by_source?: EmotionBySource[];
|
emotion_by_source?: EmotionBySource[];
|
||||||
};
|
};
|
||||||
|
|
||||||
// Interactional
|
// Interactional
|
||||||
type ConversationConcentration = {
|
type ConversationConcentration = {
|
||||||
total_commenting_authors: number;
|
total_commenting_authors: number;
|
||||||
top_10pct_author_count: number;
|
top_10pct_author_count: number;
|
||||||
@@ -134,7 +141,6 @@ type ConversationConcentration = {
|
|||||||
};
|
};
|
||||||
|
|
||||||
type InteractionAnalysisResponse = {
|
type InteractionAnalysisResponse = {
|
||||||
average_thread_depth?: number;
|
|
||||||
top_interaction_pairs?: [[string, string], number][];
|
top_interaction_pairs?: [[string, string], number][];
|
||||||
conversation_concentration?: ConversationConcentration;
|
conversation_concentration?: ConversationConcentration;
|
||||||
interaction_graph: InteractionGraph;
|
interaction_graph: InteractionGraph;
|
||||||
@@ -162,6 +168,10 @@ type StanceMarkers = {
|
|||||||
certainty_per_1k_tokens: number;
|
certainty_per_1k_tokens: number;
|
||||||
deontic_per_1k_tokens: number;
|
deontic_per_1k_tokens: number;
|
||||||
permission_per_1k_tokens: number;
|
permission_per_1k_tokens: number;
|
||||||
|
hedge_emotion_avg?: Record<string, number>;
|
||||||
|
certainty_emotion_avg?: Record<string, number>;
|
||||||
|
deontic_emotion_avg?: Record<string, number>;
|
||||||
|
permission_emotion_avg?: Record<string, number>;
|
||||||
};
|
};
|
||||||
|
|
||||||
type EntityEmotionAggregate = {
|
type EntityEmotionAggregate = {
|
||||||
@@ -179,7 +189,7 @@ type CulturalAnalysisResponse = {
|
|||||||
avg_emotion_per_entity?: AverageEmotionPerEntity;
|
avg_emotion_per_entity?: AverageEmotionPerEntity;
|
||||||
};
|
};
|
||||||
|
|
||||||
// Summary
|
// Summary
|
||||||
type SummaryResponse = {
|
type SummaryResponse = {
|
||||||
total_events: number;
|
total_events: number;
|
||||||
total_posts: number;
|
total_posts: number;
|
||||||
@@ -194,7 +204,7 @@ type SummaryResponse = {
|
|||||||
sources: string[];
|
sources: string[];
|
||||||
};
|
};
|
||||||
|
|
||||||
// Filter
|
// Filter
|
||||||
type FilterResponse = {
|
type FilterResponse = {
|
||||||
rows: number;
|
rows: number;
|
||||||
data: any;
|
data: any;
|
||||||
@@ -202,6 +212,7 @@ type FilterResponse = {
|
|||||||
|
|
||||||
export type {
|
export type {
|
||||||
TopUser,
|
TopUser,
|
||||||
|
DominantTopic,
|
||||||
Vocab,
|
Vocab,
|
||||||
User,
|
User,
|
||||||
InteractionGraph,
|
InteractionGraph,
|
||||||
|
|||||||
371
frontend/src/utils/corpusExplorer.ts
Normal file
@@ -0,0 +1,371 @@
|
|||||||
|
type EntityRecord = {
|
||||||
|
text?: string;
|
||||||
|
[key: string]: unknown;
|
||||||
|
};
|
||||||
|
|
||||||
|
type DatasetRecord = {
|
||||||
|
id?: string | number;
|
||||||
|
post_id?: string | number | null;
|
||||||
|
parent_id?: string | number | null;
|
||||||
|
author?: string | null;
|
||||||
|
title?: string | null;
|
||||||
|
content?: string | null;
|
||||||
|
timestamp?: string | number | null;
|
||||||
|
date?: string | null;
|
||||||
|
dt?: string | null;
|
||||||
|
hour?: number | null;
|
||||||
|
weekday?: string | null;
|
||||||
|
reply_to?: string | number | null;
|
||||||
|
source?: string | null;
|
||||||
|
topic?: string | null;
|
||||||
|
topic_confidence?: number | null;
|
||||||
|
type?: string | null;
|
||||||
|
ner_entities?: EntityRecord[] | null;
|
||||||
|
emotion_anger?: number | null;
|
||||||
|
emotion_disgust?: number | null;
|
||||||
|
emotion_fear?: number | null;
|
||||||
|
emotion_joy?: number | null;
|
||||||
|
emotion_sadness?: number | null;
|
||||||
|
[key: string]: unknown;
|
||||||
|
};
|
||||||
|
|
||||||
|
type CorpusExplorerContext = {
|
||||||
|
authorByPostId: Map<string, string>;
|
||||||
|
authorEventCounts: Map<string, number>;
|
||||||
|
authorCommentCounts: Map<string, number>;
|
||||||
|
};
|
||||||
|
|
||||||
|
type CorpusExplorerSpec = {
|
||||||
|
title: string;
|
||||||
|
description: string;
|
||||||
|
emptyMessage?: string;
|
||||||
|
matcher: (record: DatasetRecord, context: CorpusExplorerContext) => boolean;
|
||||||
|
};
|
||||||
|
|
||||||
|
const IN_GROUP_PATTERN = /\b(we|us|our|ourselves)\b/gi;
|
||||||
|
const OUT_GROUP_PATTERN = /\b(they|them|their|themselves)\b/gi;
|
||||||
|
const HEDGE_PATTERN = /\b(maybe|perhaps|possibly|probably|likely|seems|seem|i think|i feel|i guess|kind of|sort of|somewhat)\b/i;
|
||||||
|
const CERTAINTY_PATTERN = /\b(definitely|certainly|clearly|obviously|undeniably|always|never)\b/i;
|
||||||
|
const DEONTIC_PATTERN = /\b(must|should|need|needs|have to|has to|ought|required|require)\b/i;
|
||||||
|
const PERMISSION_PATTERN = /\b(can|allowed|okay|ok|permitted)\b/i;
|
||||||
|
const EMOTION_KEYS = [
|
||||||
|
"emotion_anger",
|
||||||
|
"emotion_disgust",
|
||||||
|
"emotion_fear",
|
||||||
|
"emotion_joy",
|
||||||
|
"emotion_sadness",
|
||||||
|
] as const;
|
||||||
|
|
||||||
|
const toText = (value: unknown) => {
|
||||||
|
if (typeof value === "string") {
|
||||||
|
return value;
|
||||||
|
}
|
||||||
|
|
||||||
|
if (typeof value === "number" || typeof value === "boolean") {
|
||||||
|
return String(value);
|
||||||
|
}
|
||||||
|
|
||||||
|
if (value && typeof value === "object" && "id" in value) {
|
||||||
|
const id = (value as { id?: unknown }).id;
|
||||||
|
if (typeof id === "string" || typeof id === "number") {
|
||||||
|
return String(id);
|
||||||
|
}
|
||||||
|
}
|
||||||
|
|
||||||
|
return "";
|
||||||
|
};
|
||||||
|
|
||||||
|
const normalize = (value: unknown) => toText(value).trim().toLowerCase();
|
||||||
|
const getAuthor = (record: DatasetRecord) => toText(record.author).trim();
|
||||||
|
|
||||||
|
const getRecordText = (record: DatasetRecord) =>
|
||||||
|
`${record.title ?? ""} ${record.content ?? ""}`.trim();
|
||||||
|
|
||||||
|
const escapeRegExp = (value: string) =>
|
||||||
|
value.replace(/[.*+?^${}()|[\]\\]/g, "\\$&");
|
||||||
|
|
||||||
|
const buildPhrasePattern = (phrase: string) => {
|
||||||
|
const tokens = phrase
|
||||||
|
.toLowerCase()
|
||||||
|
.trim()
|
||||||
|
.split(/\s+/)
|
||||||
|
.filter(Boolean)
|
||||||
|
.map(escapeRegExp);
|
||||||
|
|
||||||
|
if (!tokens.length) {
|
||||||
|
return null;
|
||||||
|
}
|
||||||
|
|
||||||
|
return new RegExp(`\\b${tokens.join("\\s+")}\\b`, "i");
|
||||||
|
};
|
||||||
|
|
||||||
|
const countMatches = (pattern: RegExp, text: string) =>
|
||||||
|
Array.from(text.matchAll(new RegExp(pattern.source, "gi"))).length;
|
||||||
|
|
||||||
|
const getDateBucket = (record: DatasetRecord) => {
|
||||||
|
if (typeof record.date === "string" && record.date) {
|
||||||
|
return record.date.slice(0, 10);
|
||||||
|
}
|
||||||
|
|
||||||
|
if (typeof record.dt === "string" && record.dt) {
|
||||||
|
return record.dt.slice(0, 10);
|
||||||
|
}
|
||||||
|
|
||||||
|
if (typeof record.timestamp === "number") {
|
||||||
|
return new Date(record.timestamp * 1000).toISOString().slice(0, 10);
|
||||||
|
}
|
||||||
|
|
||||||
|
if (typeof record.timestamp === "string" && record.timestamp) {
|
||||||
|
const numeric = Number(record.timestamp);
|
||||||
|
if (Number.isFinite(numeric)) {
|
||||||
|
return new Date(numeric * 1000).toISOString().slice(0, 10);
|
||||||
|
}
|
||||||
|
}
|
||||||
|
|
||||||
|
return "";
|
||||||
|
};
|
||||||
|
|
||||||
|
const getDominantEmotion = (record: DatasetRecord) => {
|
||||||
|
let bestKey = "";
|
||||||
|
let bestValue = Number.NEGATIVE_INFINITY;
|
||||||
|
|
||||||
|
for (const key of EMOTION_KEYS) {
|
||||||
|
const value = Number(record[key] ?? Number.NEGATIVE_INFINITY);
|
||||||
|
if (value > bestValue) {
|
||||||
|
bestValue = value;
|
||||||
|
bestKey = key;
|
||||||
|
}
|
||||||
|
}
|
||||||
|
|
||||||
|
return bestKey.replace("emotion_", "");
|
||||||
|
};
|
||||||
|
|
||||||
|
const matchesPhrase = (record: DatasetRecord, phrase: string) => {
|
||||||
|
const pattern = buildPhrasePattern(phrase);
|
||||||
|
if (!pattern) {
|
||||||
|
return false;
|
||||||
|
}
|
||||||
|
|
||||||
|
return pattern.test(getRecordText(record));
|
||||||
|
};
|
||||||
|
|
||||||
|
const recordIdentityBucket = (record: DatasetRecord) => {
|
||||||
|
const text = getRecordText(record);
|
||||||
|
const inHits = countMatches(IN_GROUP_PATTERN, text);
|
||||||
|
const outHits = countMatches(OUT_GROUP_PATTERN, text);
|
||||||
|
|
||||||
|
if (inHits > outHits) {
|
||||||
|
return "in";
|
||||||
|
}
|
||||||
|
|
||||||
|
if (outHits > inHits) {
|
||||||
|
return "out";
|
||||||
|
}
|
||||||
|
|
||||||
|
return "tie";
|
||||||
|
};
|
||||||
|
|
||||||
|
const buildExplorerContext = (records: DatasetRecord[]): CorpusExplorerContext => {
|
||||||
|
const authorByPostId = new Map<string, string>();
|
||||||
|
const authorEventCounts = new Map<string, number>();
|
||||||
|
const authorCommentCounts = new Map<string, number>();
|
||||||
|
|
||||||
|
for (const record of records) {
|
||||||
|
const author = getAuthor(record);
|
||||||
|
if (!author) {
|
||||||
|
continue;
|
||||||
|
}
|
||||||
|
|
||||||
|
authorEventCounts.set(author, (authorEventCounts.get(author) ?? 0) + 1);
|
||||||
|
|
||||||
|
if (record.type === "comment") {
|
||||||
|
authorCommentCounts.set(author, (authorCommentCounts.get(author) ?? 0) + 1);
|
||||||
|
}
|
||||||
|
|
||||||
|
if (record.post_id !== null && record.post_id !== undefined) {
|
||||||
|
authorByPostId.set(String(record.post_id), author);
|
||||||
|
}
|
||||||
|
}
|
||||||
|
|
||||||
|
return { authorByPostId, authorEventCounts, authorCommentCounts };
|
||||||
|
};
|
||||||
|
|
||||||
|
const buildAllRecordsSpec = (): CorpusExplorerSpec => ({
|
||||||
|
title: "Corpus Explorer",
|
||||||
|
description: "All records in the current filtered dataset.",
|
||||||
|
emptyMessage: "No records match the current filters.",
|
||||||
|
matcher: () => true,
|
||||||
|
});
|
||||||
|
|
||||||
|
const buildUserSpec = (author: string): CorpusExplorerSpec => {
|
||||||
|
const target = normalize(author);
|
||||||
|
|
||||||
|
return {
|
||||||
|
title: `User: ${author}`,
|
||||||
|
description: `All records authored by ${author}.`,
|
||||||
|
emptyMessage: `No records found for ${author}.`,
|
||||||
|
matcher: (record) => normalize(record.author) === target,
|
||||||
|
};
|
||||||
|
};
|
||||||
|
|
||||||
|
const buildTopicSpec = (topic: string): CorpusExplorerSpec => {
|
||||||
|
const target = normalize(topic);
|
||||||
|
|
||||||
|
return {
|
||||||
|
title: `Topic: ${topic}`,
|
||||||
|
description: `Records assigned to the ${topic} topic bucket.`,
|
||||||
|
emptyMessage: `No records found in the ${topic} topic bucket.`,
|
||||||
|
matcher: (record) => normalize(record.topic) === target,
|
||||||
|
};
|
||||||
|
};
|
||||||
|
|
||||||
|
const buildDateBucketSpec = (date: string): CorpusExplorerSpec => ({
|
||||||
|
title: `Date Bucket: ${date}`,
|
||||||
|
description: `Records from the ${date} activity bucket.`,
|
||||||
|
emptyMessage: `No records found on ${date}.`,
|
||||||
|
matcher: (record) => getDateBucket(record) === date,
|
||||||
|
});
|
||||||
|
|
||||||
|
const buildWordSpec = (word: string): CorpusExplorerSpec => ({
|
||||||
|
title: `Word: ${word}`,
|
||||||
|
description: `Records containing the word ${word}.`,
|
||||||
|
emptyMessage: `No records mention ${word}.`,
|
||||||
|
matcher: (record) => matchesPhrase(record, word),
|
||||||
|
});
|
||||||
|
|
||||||
|
const buildNgramSpec = (ngram: string): CorpusExplorerSpec => ({
|
||||||
|
title: `N-gram: ${ngram}`,
|
||||||
|
description: `Records containing the phrase ${ngram}.`,
|
||||||
|
emptyMessage: `No records contain the phrase ${ngram}.`,
|
||||||
|
matcher: (record) => matchesPhrase(record, ngram),
|
||||||
|
});
|
||||||
|
|
||||||
|
const buildEntitySpec = (entity: string): CorpusExplorerSpec => {
|
||||||
|
const target = normalize(entity);
|
||||||
|
|
||||||
|
return {
|
||||||
|
title: `Entity: ${entity}`,
|
||||||
|
description: `Records mentioning the ${entity} entity.`,
|
||||||
|
emptyMessage: `No records found for the ${entity} entity.`,
|
||||||
|
matcher: (record) => {
|
||||||
|
const entities = Array.isArray(record.ner_entities) ? record.ner_entities : [];
|
||||||
|
return entities.some((item) => normalize(item?.text) === target) || matchesPhrase(record, entity);
|
||||||
|
},
|
||||||
|
};
|
||||||
|
};
|
||||||
|
|
||||||
|
const buildSourceSpec = (source: string): CorpusExplorerSpec => {
|
||||||
|
const target = normalize(source);
|
||||||
|
|
||||||
|
return {
|
||||||
|
title: `Source: ${source}`,
|
||||||
|
description: `Records from the ${source} source.`,
|
||||||
|
emptyMessage: `No records found for ${source}.`,
|
||||||
|
matcher: (record) => normalize(record.source) === target,
|
||||||
|
};
|
||||||
|
};
|
||||||
|
|
||||||
|
const buildDominantEmotionSpec = (emotion: string): CorpusExplorerSpec => {
|
||||||
|
const target = normalize(emotion);
|
||||||
|
|
||||||
|
return {
|
||||||
|
title: `Dominant Emotion: ${emotion}`,
|
||||||
|
description: `Records where ${emotion} is the strongest emotion score.`,
|
||||||
|
emptyMessage: `No records found with dominant emotion ${emotion}.`,
|
||||||
|
matcher: (record) => getDominantEmotion(record) === target,
|
||||||
|
};
|
||||||
|
};
|
||||||
|
|
||||||
|
const buildReplyPairSpec = (source: string, target: string): CorpusExplorerSpec => {
|
||||||
|
const sourceName = normalize(source);
|
||||||
|
const targetName = normalize(target);
|
||||||
|
|
||||||
|
return {
|
||||||
|
title: `Reply Path: ${source} -> ${target}`,
|
||||||
|
description: `Reply records authored by ${source} in response to ${target}.`,
|
||||||
|
emptyMessage: `No reply records found for ${source} -> ${target}.`,
|
||||||
|
matcher: (record, context) => {
|
||||||
|
if (normalize(record.author) !== sourceName) {
|
||||||
|
return false;
|
||||||
|
}
|
||||||
|
|
||||||
|
const replyTo = record.reply_to;
|
||||||
|
if (replyTo === null || replyTo === undefined || replyTo === "") {
|
||||||
|
return false;
|
||||||
|
}
|
||||||
|
|
||||||
|
return normalize(context.authorByPostId.get(String(replyTo))) === targetName;
|
||||||
|
},
|
||||||
|
};
|
||||||
|
};
|
||||||
|
|
||||||
|
const buildOneTimeUsersSpec = (): CorpusExplorerSpec => ({
|
||||||
|
title: "One-Time Users",
|
||||||
|
description: "Records written by authors who appear exactly once in the filtered corpus.",
|
||||||
|
emptyMessage: "No one-time-user records found.",
|
||||||
|
matcher: (record, context) => {
|
||||||
|
const author = getAuthor(record);
|
||||||
|
return !!author && context.authorEventCounts.get(author) === 1;
|
||||||
|
},
|
||||||
|
});
|
||||||
|
|
||||||
|
const buildIdentityBucketSpec = (bucket: "in" | "out" | "tie"): CorpusExplorerSpec => {
|
||||||
|
const labels = {
|
||||||
|
in: "In-Group Posts",
|
||||||
|
out: "Out-Group Posts",
|
||||||
|
tie: "Balanced Posts",
|
||||||
|
} as const;
|
||||||
|
|
||||||
|
return {
|
||||||
|
title: labels[bucket],
|
||||||
|
description: `Records in the ${labels[bucket].toLowerCase()} cultural bucket.`,
|
||||||
|
emptyMessage: `No records found for ${labels[bucket].toLowerCase()}.`,
|
||||||
|
matcher: (record) => recordIdentityBucket(record) === bucket,
|
||||||
|
};
|
||||||
|
};
|
||||||
|
|
||||||
|
const buildPatternSpec = (
|
||||||
|
title: string,
|
||||||
|
description: string,
|
||||||
|
pattern: RegExp,
|
||||||
|
): CorpusExplorerSpec => ({
|
||||||
|
title,
|
||||||
|
description,
|
||||||
|
emptyMessage: `No records found for ${title.toLowerCase()}.`,
|
||||||
|
matcher: (record) => pattern.test(getRecordText(record)),
|
||||||
|
});
|
||||||
|
|
||||||
|
const buildHedgeSpec = () =>
|
||||||
|
buildPatternSpec("Hedging Words", "Records containing hedging language.", HEDGE_PATTERN);
|
||||||
|
|
||||||
|
const buildCertaintySpec = () =>
|
||||||
|
buildPatternSpec("Certainty Words", "Records containing certainty language.", CERTAINTY_PATTERN);
|
||||||
|
|
||||||
|
const buildDeonticSpec = () =>
|
||||||
|
buildPatternSpec("Need/Should Words", "Records containing deontic language.", DEONTIC_PATTERN);
|
||||||
|
|
||||||
|
const buildPermissionSpec = () =>
|
||||||
|
buildPatternSpec("Permission Words", "Records containing permission language.", PERMISSION_PATTERN);
|
||||||
|
|
||||||
|
export type { DatasetRecord, CorpusExplorerSpec };
|
||||||
|
export {
|
||||||
|
buildAllRecordsSpec,
|
||||||
|
buildCertaintySpec,
|
||||||
|
buildDateBucketSpec,
|
||||||
|
buildDeonticSpec,
|
||||||
|
buildDominantEmotionSpec,
|
||||||
|
buildEntitySpec,
|
||||||
|
buildExplorerContext,
|
||||||
|
buildHedgeSpec,
|
||||||
|
buildIdentityBucketSpec,
|
||||||
|
buildNgramSpec,
|
||||||
|
buildOneTimeUsersSpec,
|
||||||
|
buildPermissionSpec,
|
||||||
|
buildReplyPairSpec,
|
||||||
|
buildSourceSpec,
|
||||||
|
buildTopicSpec,
|
||||||
|
buildUserSpec,
|
||||||
|
buildWordSpec,
|
||||||
|
getDateBucket,
|
||||||
|
toText,
|
||||||
|
};
|
||||||
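The word, n-gram, and entity specs in `corpusExplorer.ts` above all go through `buildPhrasePattern`. The sketch below copies that helper and `escapeRegExp` from the diff so it runs on its own. The `phraseMatches` wrapper and the sample strings are illustrative only and are not part of the commit.

```typescript
// Copied from the diff above so the snippet is self-contained.
const escapeRegExp = (value: string) =>
  value.replace(/[.*+?^${}()|[\]\\]/g, "\\$&");

const buildPhrasePattern = (phrase: string): RegExp | null => {
  const tokens = phrase
    .toLowerCase()
    .trim()
    .split(/\s+/)
    .filter(Boolean)
    .map(escapeRegExp);

  if (!tokens.length) {
    return null;
  }

  // Tokens are joined with \s+ so any run of whitespace between words
  // matches, and the phrase is wrapped in word boundaries.
  return new RegExp(`\\b${tokens.join("\\s+")}\\b`, "i");
};

// Hypothetical wrapper for the example; the commit uses matchesPhrase(record, phrase).
const phraseMatches = (phrase: string, text: string): boolean => {
  const pattern = buildPhrasePattern(phrase);
  return pattern ? pattern.test(text) : false;
};

// Case-insensitive, and tolerant of line breaks between the words.
console.log(phraseMatches("climate change", "Climate\n  change is real")); // true

// Word boundaries stop substring hits inside longer words.
console.log(phraseMatches("art", "a new startup")); // false

// A blank phrase produces no pattern at all.
console.log(buildPhrasePattern("   ")); // null

// Caveat: \b needs a word character on one side, so a phrase that ends
// in punctuation does not match when followed by whitespace.
console.log(phraseMatches("c++", "I like c++ a lot")); // false
```

The last case is worth keeping in mind for entity or n-gram labels that begin or end with non-word characters. If those need to match, lookarounds on whitespace would be one alternative to `\b`.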
@@ -3,7 +3,7 @@ const DEFAULT_TITLE = "Ethnograph View";
|
|||||||
const STATIC_TITLES: Record<string, string> = {
|
const STATIC_TITLES: Record<string, string> = {
|
||||||
"/login": "Sign In",
|
"/login": "Sign In",
|
||||||
"/upload": "Upload Dataset",
|
"/upload": "Upload Dataset",
|
||||||
"/auto-scrape": "Auto Scrape Dataset",
|
"/auto-fetch": "Auto Fetch Dataset",
|
||||||
"/datasets": "My Datasets",
|
"/datasets": "My Datasets",
|
||||||
};
|
};
|
||||||
|
|
||||||
@@ -13,7 +13,7 @@ export const getDocumentTitle = (pathname: string) => {
|
|||||||
}
|
}
|
||||||
|
|
||||||
if (pathname.includes("stats")) {
|
if (pathname.includes("stats")) {
|
||||||
return "Ethnography Analysis"
|
return "Ethnography Analysis";
|
||||||
}
|
}
|
||||||
|
|
||||||
return STATIC_TITLES[pathname] ?? DEFAULT_TITLE;
|
return STATIC_TITLES[pathname] ?? DEFAULT_TITLE;
|
||||||
|
|||||||
4
main.py
@@ -1,4 +0,0 @@
|
|||||||
import server.app
|
|
||||||
|
|
||||||
if __name__ == "__main__":
|
|
||||||
server.app.app.run(debug=True)
|
|
||||||
BIN report/img/analysis_bar.png (new file, 26 KiB)
BIN report/img/architecture.png (new file, 70 KiB)
BIN report/img/cork_temporal.png (new file, 274 KiB)
BIN report/img/flooding_posts.png (new file, 90 KiB)
BIN report/img/frontend.png (new file, 302 KiB)
BIN report/img/gantt.png (new file, 50 KiB)
BIN report/img/heatmap.png (new file, 86 KiB)
BIN report/img/interaction_graph.png (new file, 114 KiB)
BIN report/img/kpi_card.png (new file, 8.7 KiB)
BIN report/img/moods.png (new file, 16 KiB)
BIN report/img/navbar.png (new file, 14 KiB)
BIN report/img/ngrams.png (new file, 38 KiB)
BIN report/img/nlp_backoff.png (new file, 143 KiB)
BIN report/img/pipeline.png (new file, 26 KiB)
BIN report/img/reddit_bot.png (new file, 232 KiB)
BIN report/img/schema.png (new file, 64 KiB)
BIN report/img/signature.jpg (new file, 152 KiB)
BIN report/img/stance_markers.png (new file, 111 KiB)
BIN report/img/topic_emotions.png (new file, 17 KiB)
BIN report/img/ucc_crest.png (new file, 27 KiB)
1401
report/main.tex
Normal file
149
report/references.bib
Normal file
@@ -0,0 +1,149 @@
|
|||||||
|
@online{reddit_api,
|
||||||
|
author = {{Reddit Inc.}},
|
||||||
|
title = {Reddit API Documentation},
|
||||||
|
year = {2025},
|
||||||
|
url = {https://www.reddit.com/dev/api/},
|
||||||
|
urldate = {2026-04-08}
|
||||||
|
}
|
||||||
|
|
||||||
|
@misc{hartmann2022emotionenglish,
|
||||||
|
author={Hartmann, Jochen},
|
||||||
|
title={Emotion English DistilRoBERTa-base},
|
||||||
|
year={2022},
|
||||||
|
howpublished = {\url{https://huggingface.co/j-hartmann/emotion-english-distilroberta-base/}},
|
||||||
|
}
|
||||||
|
|
||||||
|
@misc{all_mpnet_base_v2,
|
||||||
|
author={Microsoft Research},
|
||||||
|
title={All-MPNet-Base-V2},
|
||||||
|
year={2021},
|
||||||
|
howpublished = {\url{https://huggingface.co/sentence-transformers/all-mpnet-base-v2}},
|
||||||
|
}
|
||||||
|
|
||||||
|
@misc{minilm_l6_v2,
|
||||||
|
author={Microsoft Research},
|
||||||
|
title={MiniLM-L6-V2},
|
||||||
|
year={2021},
|
||||||
|
howpublished = {\url{https://huggingface.co/sentence-transformers/all-MiniLM-L6-v2}},
|
||||||
|
}
|
||||||
|
|
||||||
|
@misc{dslim_bert_base_ner,
|
||||||
|
author={deepset},
|
||||||
|
title={dslim/bert-base-NER},
|
||||||
|
year={2018},
|
||||||
|
howpublished = {\url{https://huggingface.co/dslim/bert-base-NER}},
|
||||||
|
}
|
||||||
|
|
||||||
|
@inproceedings{demszky2020goemotions,
|
||||||
|
author = {Demszky, Dorottya and Movshovitz-Attias, Dana and Ko, Jeongwoo and Cowen, Alan and Nemade, Gaurav and Ravi, Sujith},
|
||||||
|
booktitle = {58th Annual Meeting of the Association for Computational Linguistics (ACL)},
|
||||||
|
title = {{GoEmotions: A Dataset of Fine-Grained Emotions}},
|
||||||
|
year = {2020}
|
||||||
|
}
|
||||||
|
|
||||||
|
@article{dominguez2007virtual,
|
||||||
|
author = {Domínguez, Daniel and Beaulieu, Anne and Estalella, Adolfo and Gómez, Edgar and Schnettler, Bernt and Read, Rosie},
|
||||||
|
title = {Virtual Ethnography},
|
||||||
|
journal = {Forum Qualitative Sozialforschung / Forum: Qualitative Social Research},
|
||||||
|
year = {2007},
|
||||||
|
volume = {8},
|
||||||
|
number = {3},
|
||||||
|
url = {http://nbn-resolving.de/urn:nbn:de:0114-fqs0703E19}
|
||||||
|
}
|
||||||
|
|
||||||
|
@article{sun2014lurkers,
|
||||||
|
author = {Sun, Na and Rau, Pei-Luen Patrick and Ma, Liang},
|
||||||
|
title = {Understanding Lurkers in Online Communities: A Literature Review},
|
||||||
|
journal = {Computers in Human Behavior},
|
||||||
|
year = {2014},
|
||||||
|
volume = {38},
|
||||||
|
pages = {110--117},
|
||||||
|
doi = {10.1016/j.chb.2014.05.022}
|
||||||
|
}
|
||||||
|
|
||||||
|
@article{ahmad2024sentiment,
|
||||||
|
author = {Ahmad, Waqar and others},
|
||||||
|
title = {Recent Advancements and Challenges of NLP-based Sentiment Analysis: A State-of-the-art Review},
|
||||||
|
journal = {Natural Language Processing Journal},
|
||||||
|
year = {2024},
|
||||||
|
doi = {10.1016/j.nlp.2024.100059}
|
||||||
|
}
|
||||||
|
|
||||||
|
@article{coleman2010ethnographic,
|
||||||
|
ISSN = {00846570},
|
||||||
|
URL = {http://www.jstor.org/stable/25735124},
|
||||||
|
abstract = {This review surveys and divides the ethnographic corpus on digital media into three broad but overlapping categories: the cultural politics of digital media, the vernacular cultures of digital media, and the prosaics of digital media. Engaging these three categories of scholarship on digital media, I consider how ethnographers are exploring the complex relationships between the local practices and global implications of digital media, their materiality and politics, and their banal, as well as profound, presence in cultural life and modes of communication. I consider the way these media have become central to the articulation of cherished beliefs, ritual practices, and modes of being in the world; the fact that digital media culturally matters is undeniable but showing how, where, and why it matters is necessary to push against peculiarly narrow presumptions about the universality of digital experience.},
|
||||||
|
author = {E. Gabriella Coleman},
|
||||||
|
journal = {Annual Review of Anthropology},
|
||||||
|
pages = {487--505},
|
||||||
|
publisher = {Annual Reviews},
|
||||||
|
title = {Ethnographic Approaches to Digital Media},
|
||||||
|
urldate = {2026-04-15},
|
||||||
|
volume = {39},
|
||||||
|
year = {2010}
|
||||||
|
}
|
||||||
|
|
||||||
+@article{shen2021stance,
+  author = {Shen, Qian and Tao, Yating},
+  title = {Stance Markers in {English} Medical Research Articles and Newspaper Opinion Columns: A Comparative Corpus-Based Study},
+  journal = {PLOS ONE},
+  volume = {16},
+  number = {3},
+  pages = {e0247981},
+  year = {2021},
+  doi = {10.1371/journal.pone.0247981}
+}
+
+@incollection{medvedev2019anatomy,
+  author = {Medvedev, Alexey N. and Lambiotte, Renaud and Delvenne, Jean-Charles},
+  title = {The Anatomy of Reddit: An Overview of Academic Research},
+  booktitle = {Dynamics On and Of Complex Networks III},
+  series = {Springer Proceedings in Complexity},
+  publisher = {Springer},
+  year = {2019},
+  pages = {183--204}
+}
+
+@misc{cook2023ethnography,
+  author = {Cook, Chloe},
+  title = {What is the Difference Between Ethnography and Digital Ethnography?},
+  year = {2023},
+  month = jan,
+  day = {19},
+  howpublished = {\url{https://ethosapp.com/blog/what-is-the-difference-between-ethnography-and-digital-ethnography/}},
+  note = {Accessed: 2026-04-16},
+  organization = {EthOS}
+}
+
+@misc{giuffre2026sentiment,
+  author = {Giuffre, Steven},
+  title = {What is Sentiment Analysis?},
+  year = {2026},
+  month = mar,
+  howpublished = {\url{https://www.vonage.com/resources/articles/sentiment-analysis/}},
+  note = {Accessed: 2026-04-16},
+  organization = {Vonage}
+}
+
+@misc{mungalpara2022stemming,
+  author = {Mungalpara, Jaimin},
+  title = {Stemming Lemmatization Stopwords and {N}-Grams in {NLP}},
+  year = {2022},
+  month = jul,
+  day = {26},
+  howpublished = {\url{https://jaimin-ml2001.medium.com/stemming-lemmatization-stopwords-and-n-grams-in-nlp-96f8e8b6aa6f}},
+  note = {Accessed: 2026-04-16},
+  organization = {Medium}
+}
+
+@misc{chugani2025ethicalscraping,
+  author = {Chugani, Vinod},
+  title = {Ethical Web Scraping: Principles and Practices},
+  year = {2025},
+  month = apr,
+  day = {21},
+  howpublished = {\url{https://www.datacamp.com/blog/ethical-web-scraping}},
+  note = {Accessed: 2026-04-16},
+  organization = {DataCamp}
+}
+
@@ -16,3 +16,4 @@ Requests==2.32.5
 sentence_transformers==5.2.2
 torch==2.10.0
 transformers==5.1.0
+gunicorn==25.3.0
@@ -15,7 +15,8 @@ class CulturalAnalysis:
 
         emotion_exclusions = {"emotion_neutral", "emotion_surprise"}
         emotion_cols = [
-            c for c in df.columns
+            c
+            for c in df.columns
             if c.startswith("emotion_") and c not in emotion_exclusions
         ]
 
@@ -40,7 +41,6 @@ class CulturalAnalysis:
             "out_group_usage": out_count,
             "in_group_ratio": round(in_count / max(total_tokens, 1), 5),
             "out_group_ratio": round(out_count / max(total_tokens, 1), 5),
-
             "in_group_posts": int(in_mask.sum()),
             "out_group_posts": int(out_mask.sum()),
             "tie_posts": int(tie_mask.sum()),
@@ -49,20 +49,40 @@ class CulturalAnalysis:
         if emotion_cols:
             emo = df[emotion_cols].apply(pd.to_numeric, errors="coerce").fillna(0.0)
 
-            in_avg = emo.loc[in_mask].mean() if in_mask.any() else pd.Series(0.0, index=emotion_cols)
-            out_avg = emo.loc[out_mask].mean() if out_mask.any() else pd.Series(0.0, index=emotion_cols)
+            in_avg = (
+                emo.loc[in_mask].mean()
+                if in_mask.any()
+                else pd.Series(0.0, index=emotion_cols)
+            )
+            out_avg = (
+                emo.loc[out_mask].mean()
+                if out_mask.any()
+                else pd.Series(0.0, index=emotion_cols)
+            )
 
             result["in_group_emotion_avg"] = in_avg.to_dict()
             result["out_group_emotion_avg"] = out_avg.to_dict()
 
         return result
 
     def get_stance_markers(self, df: pd.DataFrame) -> dict[str, Any]:
         s = df[self.content_col].fillna("").astype(str)
+        emotion_exclusions = {"emotion_neutral", "emotion_surprise"}
+        emotion_cols = [
+            c
+            for c in df.columns
+            if c.startswith("emotion_") and c not in emotion_exclusions
+        ]
 
-        hedge_pattern = re.compile(r"\b(maybe|perhaps|possibly|probably|likely|seems|seem|i think|i feel|i guess|kind of|sort of|somewhat)\b")
-        certainty_pattern = re.compile(r"\b(definitely|certainly|clearly|obviously|undeniably|always|never)\b")
-        deontic_pattern = re.compile(r"\b(must|should|need|needs|have to|has to|ought|required|require)\b")
+        hedge_pattern = re.compile(
+            r"\b(maybe|perhaps|possibly|probably|likely|seems|seem|i think|i feel|i guess|kind of|sort of|somewhat)\b"
+        )
+        certainty_pattern = re.compile(
+            r"\b(definitely|certainly|clearly|obviously|undeniably|always|never)\b"
+        )
+        deontic_pattern = re.compile(
+            r"\b(must|should|need|needs|have to|has to|ought|required|require)\b"
+        )
         permission_pattern = re.compile(r"\b(can|allowed|okay|ok|permitted)\b")
 
         hedge_counts = s.str.count(hedge_pattern)
@@ -70,31 +90,73 @@ class CulturalAnalysis:
         deontic_counts = s.str.count(deontic_pattern)
         perm_counts = s.str.count(permission_pattern)
 
-        token_counts = s.apply(lambda t: len(re.findall(r"\b[a-z]{2,}\b", t))).replace(0, 1)
+        token_counts = s.apply(lambda t: len(re.findall(r"\b[a-z]{2,}\b", t))).replace(
+            0, 1
+        )
 
-        return {
+        result = {
             "hedge_total": int(hedge_counts.sum()),
             "certainty_total": int(certainty_counts.sum()),
             "deontic_total": int(deontic_counts.sum()),
             "permission_total": int(perm_counts.sum()),
-            "hedge_per_1k_tokens": round(1000 * hedge_counts.sum() / token_counts.sum(), 3),
-            "certainty_per_1k_tokens": round(1000 * certainty_counts.sum() / token_counts.sum(), 3),
-            "deontic_per_1k_tokens": round(1000 * deontic_counts.sum() / token_counts.sum(), 3),
-            "permission_per_1k_tokens": round(1000 * perm_counts.sum() / token_counts.sum(), 3),
+            "hedge_per_1k_tokens": round(
+                1000 * hedge_counts.sum() / token_counts.sum(), 3
+            ),
+            "certainty_per_1k_tokens": round(
+                1000 * certainty_counts.sum() / token_counts.sum(), 3
+            ),
+            "deontic_per_1k_tokens": round(
+                1000 * deontic_counts.sum() / token_counts.sum(), 3
+            ),
+            "permission_per_1k_tokens": round(
+                1000 * perm_counts.sum() / token_counts.sum(), 3
+            ),
         }
 
-    def get_avg_emotions_per_entity(self, df: pd.DataFrame, top_n: int = 25, min_posts: int = 10) -> dict[str, Any]:
-        if "entities" not in df.columns:
+        if emotion_cols:
+            emo = df[emotion_cols].apply(pd.to_numeric, errors="coerce").fillna(0.0)
+
+            result["hedge_emotion_avg"] = (
+                emo.loc[hedge_counts > 0].mean()
+                if (hedge_counts > 0).any()
+                else pd.Series(0.0, index=emotion_cols)
+            ).to_dict()
+            result["certainty_emotion_avg"] = (
+                emo.loc[certainty_counts > 0].mean()
+                if (certainty_counts > 0).any()
+                else pd.Series(0.0, index=emotion_cols)
+            ).to_dict()
+            result["deontic_emotion_avg"] = (
+                emo.loc[deontic_counts > 0].mean()
+                if (deontic_counts > 0).any()
+                else pd.Series(0.0, index=emotion_cols)
+            ).to_dict()
+            result["permission_emotion_avg"] = (
+                emo.loc[perm_counts > 0].mean()
+                if (perm_counts > 0).any()
+                else pd.Series(0.0, index=emotion_cols)
+            ).to_dict()
+
+        return result
+
+    def get_avg_emotions_per_entity(
+        self, df: pd.DataFrame, top_n: int = 25, min_posts: int = 10
+    ) -> dict[str, Any]:
+        if "ner_entities" not in df.columns:
             return {"entity_emotion_avg": {}}
 
         emotion_cols = [c for c in df.columns if c.startswith("emotion_")]
 
-        entity_df = df[["entities"] + emotion_cols].explode("entities")
+        entity_df = df[["ner_entities"] + emotion_cols].explode("ner_entities")
 
-        entity_df["entity_text"] = entity_df["entities"].apply(
-            lambda e: e.get("text").strip()
-            if isinstance(e, dict) and isinstance(e.get("text"), str) and len(e.get("text")) >= 3
-            else None
+        entity_df["entity_text"] = entity_df["ner_entities"].apply(
+            lambda e: (
+                e.get("text").strip()
+                if isinstance(e, dict)
+                and isinstance(e.get("text"), str)
+                and len(e.get("text")) >= 3
+                else None
+            )
         )
 
         entity_df = entity_df.dropna(subset=["entity_text"])
@@ -114,4 +176,4 @@ class CulturalAnalysis:
                 "emotion_avg": emo_means,
             }
 
         return {"entity_emotion_avg": entity_emotion_avg}
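The per-1,000-token rates that `get_stance_markers` reports can be illustrated without pandas. What follows is a minimal sketch rather than the project's implementation: `stance_rates` and its sample posts are hypothetical, the two patterns are shortened versions of the ones in the diff, and the input is assumed to be lowercased already, as the `[a-z]` token pattern implies.

```python
import re

# Shortened versions of the hedge and certainty patterns from the diff.
# Assumption: input text is already lowercased.
HEDGE = re.compile(r"\b(maybe|perhaps|probably|i think|kind of)\b")
CERTAINTY = re.compile(r"\b(definitely|clearly|always|never)\b")
TOKEN = re.compile(r"\b[a-z]{2,}\b")


def stance_rates(posts):
    """Return hedge and certainty markers per 1,000 tokens (hypothetical helper)."""
    hedges = sum(len(HEDGE.findall(p)) for p in posts)
    certain = sum(len(CERTAINTY.findall(p)) for p in posts)
    # A post with no tokens counts as one token, mirroring the .replace(0, 1)
    # guard in the diff; "or 1" also covers an empty list of posts.
    tokens = sum(max(len(TOKEN.findall(p)), 1) for p in posts) or 1
    return {
        "hedge_per_1k_tokens": round(1000 * hedges / tokens, 3),
        "certainty_per_1k_tokens": round(1000 * certain / tokens, 3),
    }


rates = stance_rates(["maybe this works", "it definitely never fails"])
```

Normalising by token count rather than by post count keeps the rates comparable between datasets whose posts differ widely in length.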
@@ -2,6 +2,7 @@ import pandas as pd
 
 from server.analysis.nlp import NLP
 
+
 class DatasetEnrichment:
     def __init__(self, df: pd.DataFrame, topics: dict):
         self.df = self._explode_comments(df)
@@ -10,7 +11,9 @@ class DatasetEnrichment:
 
     def _explode_comments(self, df) -> pd.DataFrame:
         comments_df = df[["id", "comments"]].explode("comments")
-        comments_df = comments_df[comments_df["comments"].apply(lambda x: isinstance(x, dict))]
+        comments_df = comments_df[
+            comments_df["comments"].apply(lambda x: isinstance(x, dict))
+        ]
         comments_df = pd.json_normalize(comments_df["comments"])
 
         posts_df = df.drop(columns=["comments"])
@@ -24,16 +27,16 @@ class DatasetEnrichment:
         df.drop(columns=["post_id"], inplace=True, errors="ignore")
 
         return df
 
     def enrich(self) -> pd.DataFrame:
-        self.df['timestamp'] = pd.to_numeric(self.df['timestamp'], errors='raise')
-        self.df['date'] = pd.to_datetime(self.df['timestamp'], unit='s').dt.date
+        self.df["timestamp"] = pd.to_numeric(self.df["timestamp"], errors="raise")
+        self.df["date"] = pd.to_datetime(self.df["timestamp"], unit="s").dt.date
         self.df["dt"] = pd.to_datetime(self.df["timestamp"], unit="s", utc=True)
         self.df["hour"] = self.df["dt"].dt.hour
         self.df["weekday"] = self.df["dt"].dt.day_name()
 
         self.nlp.add_emotion_cols()
         self.nlp.add_topic_col()
         self.nlp.add_ner_cols()
 
         return self.df
@@ -1,6 +1,7 @@
 import pandas as pd
 import re
 
+
 class InteractionAnalysis:
     def __init__(self, word_exclusions: set[str]):
         self.word_exclusions = word_exclusions
@@ -30,28 +31,6 @@ class InteractionAnalysis:
 
         return interactions
 
-    def average_thread_depth(self, df: pd.DataFrame):
-        depths = []
-        id_to_reply = df.set_index("id")["reply_to"].to_dict()
-        for _, row in df.iterrows():
-            depth = 0
-            current_id = row["id"]
-
-            while True:
-                reply_to = id_to_reply.get(current_id)
-                if pd.isna(reply_to) or reply_to == "":
-                    break
-
-                depth += 1
-                current_id = reply_to
-
-            depths.append(depth)
-
-        if not depths:
-            return 0
-
-        return round(sum(depths) / len(depths), 2)
-
     def top_interaction_pairs(self, df: pd.DataFrame, top_n=10):
         graph = self.interaction_graph(df)
         pairs = []
@@ -62,7 +41,7 @@ class InteractionAnalysis:
 
         pairs.sort(key=lambda x: x[1], reverse=True)
         return pairs[:top_n]
 
     def conversation_concentration(self, df: pd.DataFrame) -> dict:
         if "type" not in df.columns:
             return {}
@@ -76,12 +55,16 @@ class InteractionAnalysis:
         total_authors = len(author_counts)
 
         top_10_pct_n = max(1, int(total_authors * 0.1))
-        top_10_pct_share = round(author_counts.head(top_10_pct_n).sum() / total_comments, 4)
+        top_10_pct_share = round(
+            author_counts.head(top_10_pct_n).sum() / total_comments, 4
+        )
 
         return {
             "total_commenting_authors": total_authors,
             "top_10pct_author_count": top_10_pct_n,
             "top_10pct_comment_share": float(top_10_pct_share),
             "single_comment_authors": int((author_counts == 1).sum()),
-            "single_comment_author_ratio": float(round((author_counts == 1).sum() / total_authors, 4)),
-        }
+            "single_comment_author_ratio": float(
+                round((author_counts == 1).sum() / total_authors, 4)
+            ),
+        }
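The arithmetic behind `conversation_concentration` can be reproduced with the standard library alone. The sketch below is an illustration under assumptions, not the repository's code: `concentration` is a hypothetical helper that takes one author name per comment, and `Counter.most_common` stands in for what appears to be a descending per-author comment count in the original.

```python
from collections import Counter


def concentration(comment_authors):
    """Share of comments written by the most active 10% of authors (hypothetical helper)."""
    counts = Counter(comment_authors)
    if not counts:
        return {}
    total_authors = len(counts)
    # Always include at least one author, as max(1, ...) does in the diff.
    top_n = max(1, int(total_authors * 0.1))
    top_comments = sum(c for _, c in counts.most_common(top_n))
    singles = sum(1 for c in counts.values() if c == 1)
    return {
        "top_10pct_author_count": top_n,
        "top_10pct_comment_share": round(top_comments / len(comment_authors), 4),
        "single_comment_author_ratio": round(singles / total_authors, 4),
    }


stats = concentration(["ann", "ann", "ann", "bob", "cy", "cy"])
```

With fewer than ten authors the "top 10%" is a single author, so the share is best read as "most active author" for small datasets.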
@@ -1,17 +1,30 @@
-import pandas as pd
 import re
-
 from collections import Counter
-from itertools import islice
+from dataclasses import dataclass
 
+import pandas as pd
+
+
+@dataclass(frozen=True)
+class NGramConfig:
+    min_token_length: int = 3
+    min_count: int = 2
+    max_results: int = 100
+
 
 class LinguisticAnalysis:
     def __init__(self, word_exclusions: set[str]):
         self.word_exclusions = word_exclusions
+        self.ngram_config = NGramConfig()
 
-    def _tokenize(self, text: str):
-        tokens = re.findall(r"\b[a-z]{3,}\b", text)
-        return [t for t in tokens if t not in self.word_exclusions]
+    def _tokenize(self, text: str, *, include_exclusions: bool = False) -> list[str]:
+        pattern = rf"\b[a-z]{{{self.ngram_config.min_token_length},}}\b"
+        tokens = re.findall(pattern, text)
+
+        if include_exclusions:
+            return tokens
+
+        return [token for token in tokens if token not in self.word_exclusions]
 
     def _clean_text(self, text: str) -> str:
         text = re.sub(r"http\S+", "", text)  # remove URLs
@@ -21,13 +34,24 @@ class LinguisticAnalysis:
         text = re.sub(r"\S+\.(jpg|jpeg|png|webp|gif)", "", text)
         return text
 
+    def _content_texts(self, df: pd.DataFrame) -> pd.Series:
+        return df["content"].dropna().astype(str).apply(self._clean_text).str.lower()
+
+    def _valid_ngram(self, tokens: tuple[str, ...]) -> bool:
+        if any(token in self.word_exclusions for token in tokens):
+            return False
+
+        if len(set(tokens)) == 1:
+            return False
+
+        return True
+
     def word_frequencies(self, df: pd.DataFrame, limit: int = 100) -> list[dict]:
-        texts = df["content"].dropna().astype(str).str.lower()
+        texts = self._content_texts(df)
 
         words = []
         for text in texts:
-            tokens = re.findall(r"\b[a-z]{3,}\b", text)
-            words.extend(w for w in tokens if w not in self.word_exclusions)
+            words.extend(self._tokenize(text))
 
         counts = Counter(words)
 
@@ -40,31 +64,48 @@ class LinguisticAnalysis:
 
         return word_frequencies.to_dict(orient="records")
 
-    def ngrams(self, df: pd.DataFrame, n=2, limit=100):
-        texts = df["content"].dropna().astype(str).apply(self._clean_text).str.lower()
+    def ngrams(self, df: pd.DataFrame, n: int = 2, limit: int | None = None) -> list[dict]:
+        if n < 2:
+            raise ValueError("n must be at least 2")
+
+        texts = self._content_texts(df)
         all_ngrams = []
+        result_limit = limit or self.ngram_config.max_results
 
         for text in texts:
-            tokens = re.findall(r"\b[a-z]{3,}\b", text)
+            tokens = self._tokenize(text, include_exclusions=True)
 
-            # stop word removal causes strange behaviors in ngrams
-            # tokens = [w for w in tokens if w not in self.word_exclusions]
+            if len(tokens) < n:
+                continue
 
-            ngrams = zip(*(islice(tokens, i, None) for i in range(n)))
-            all_ngrams.extend([" ".join(ng) for ng in ngrams])
+            for index in range(len(tokens) - n + 1):
+                ngram_tokens = tuple(tokens[index : index + n])
+                if self._valid_ngram(ngram_tokens):
+                    all_ngrams.append(" ".join(ngram_tokens))
 
         counts = Counter(all_ngrams)
+        filtered_counts = [
+            (ngram, count)
+            for ngram, count in counts.items()
+            if count >= self.ngram_config.min_count
+        ]
+
+        if not filtered_counts:
+            return []
 
         return (
-            pd.DataFrame(counts.items(), columns=["ngram", "count"])
-            .sort_values("count", ascending=False)
-            .head(limit)
+            pd.DataFrame(filtered_counts, columns=["ngram", "count"])
+            .sort_values(["count", "ngram"], ascending=[False, True])
+            .head(result_limit)
             .to_dict(orient="records")
         )
 
     def lexical_diversity(self, df: pd.DataFrame) -> dict:
         tokens = (
-            df["content"].fillna("").astype(str).str.lower()
+            df["content"]
+            .fillna("")
+            .astype(str)
+            .str.lower()
             .str.fillna_placeholder
         )
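The reworked `ngrams` method keeps stop words while sliding the window and only afterwards discards n-grams that contain one, so an n-gram can never bridge a word that was removed. A minimal standalone sketch of that idea follows. `ngram_counts`, its parameters, and the sample texts are hypothetical, and plain tuples stand in for the DataFrame records the method returns.

```python
import re
from collections import Counter


def ngram_counts(texts, exclusions, n=2, min_count=2):
    """Count n-grams, skipping any that contain an excluded word or repeat one token."""
    if n < 2:
        raise ValueError("n must be at least 2")
    counts = Counter()
    for text in texts:
        # Tokenize WITHOUT dropping exclusions so the windows stay contiguous.
        tokens = re.findall(r"\b[a-z]{3,}\b", text.lower())
        for i in range(len(tokens) - n + 1):
            gram = tuple(tokens[i : i + n])
            if any(t in exclusions for t in gram) or len(set(gram)) == 1:
                continue
            counts[" ".join(gram)] += 1
    kept = [(g, c) for g, c in counts.items() if c >= min_count]
    # Count descending, then alphabetical, so ties come out in a stable order.
    return sorted(kept, key=lambda item: (-item[1], item[0]))


top = ngram_counts(
    ["the quick fox saw the quick fox", "quick fox runs"],
    exclusions={"the"},
)
```

Had "the" been stripped before windowing, the first text would have produced the bigram "saw quick", which never occurs in the source. Filtering after windowing avoids that artefact.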
@@ -6,6 +6,7 @@ from typing import Any
 from transformers import pipeline
 from sentence_transformers import SentenceTransformer
 
+
 class NLP:
     _topic_models: dict[str, SentenceTransformer] = {}
     _emotion_classifiers: dict[str, Any] = {}
@@ -32,7 +33,7 @@ class NLP:
             )
             self.entity_recognizer = self._get_entity_recognizer(
                 self.device_str, self.pipeline_device
             )
         except RuntimeError as exc:
             if self.use_cuda and "out of memory" in str(exc).lower():
                 torch.cuda.empty_cache()
@@ -90,7 +91,7 @@ class NLP:
             )
             cls._emotion_classifiers[device_str] = classifier
         return classifier
 
     @classmethod
     def _get_entity_recognizer(cls, device_str: str, pipeline_device: int) -> Any:
         recognizer = cls._entity_recognizers.get(device_str)
@@ -207,8 +208,7 @@ class NLP:
             self.df.drop(columns=existing_drop, inplace=True)
 
         remaining_emotion_cols = [
-            c for c in self.df.columns
-            if c.startswith("emotion_")
+            c for c in self.df.columns if c.startswith("emotion_")
         ]
 
         if remaining_emotion_cols:
@@ -227,8 +227,6 @@ class NLP:
 
             self.df[remaining_emotion_cols] = normalized.values
 
-
-
     def add_topic_col(self, confidence_threshold: float = 0.3) -> None:
         titles = self.df[self.title_col].fillna("").astype(str)
         contents = self.df[self.content_col].fillna("").astype(str)
@@ -257,7 +255,7 @@ class NLP:
         self.df.loc[self.df["topic_confidence"] < confidence_threshold, "topic"] = (
             "Misc"
         )
 
     def add_ner_cols(self, max_chars: int = 512) -> None:
         texts = (
             self.df[self.content_col]
@@ -302,8 +300,4 @@ class NLP:
 
         for label in all_labels:
             col_name = f"entity_{label}"
-            self.df[col_name] = [
-                d.get(label, 0) for d in entity_count_dicts
-            ]
-
-
+            self.df[col_name] = [d.get(label, 0) for d in entity_count_dicts]
@@ -1,4 +1,5 @@
 import nltk
+import json
 import pandas as pd
 from nltk.corpus import stopwords
 
@@ -27,6 +28,8 @@ DOMAIN_STOPWORDS = {
     "one",
 }
 
+EXCLUDED_AUTHORS = {"[deleted]", "automoderator"}
+
 nltk.download("stopwords")
 EXCLUDE_WORDS = set(stopwords.words("english")) | DOMAIN_STOPWORDS
 
@@ -46,6 +49,12 @@ class StatGen:
         filters = filters or {}
         filtered_df = df.copy()
 
+        if "author" in filtered_df.columns:
+            normalized_authors = (
+                filtered_df["author"].fillna("").astype(str).str.strip().str.lower()
+            )
+            filtered_df = filtered_df[~normalized_authors.isin(EXCLUDED_AUTHORS)]
+
         search_query = filters.get("search_query", None)
         start_date_filter = filters.get("start_date", None)
         end_date_filter = filters.get("end_date", None)
@@ -75,11 +84,22 @@ class StatGen:
 
         return filtered_df
 
+    def _json_ready_records(self, df: pd.DataFrame) -> list[dict]:
+        return json.loads(
+            df.to_json(orient="records", date_format="iso", date_unit="s")
+        )
+
     ## Public Methods
     def filter_dataset(self, df: pd.DataFrame, filters: dict | None = None) -> list[dict]:
-        return self._prepare_filtered_df(df, filters).to_dict(orient="records")
+        filtered_df = self._prepare_filtered_df(df, filters)
+        return self._json_ready_records(filtered_df)
 
-    def temporal(self, df: pd.DataFrame, filters: dict | None = None) -> dict:
+    def temporal(
+        self,
+        df: pd.DataFrame,
+        filters: dict | None = None,
+        dataset_id: int | None = None,
+    ) -> dict:
         filtered_df = self._prepare_filtered_df(df, filters)
 
         return {
@@ -87,7 +107,12 @@ class StatGen:
             "weekday_hour_heatmap": self.temporal_analysis.heatmap(filtered_df),
         }
 
-    def linguistic(self, df: pd.DataFrame, filters: dict | None = None) -> dict:
+    def linguistic(
+        self,
+        df: pd.DataFrame,
+        filters: dict | None = None,
+        dataset_id: int | None = None,
+    ) -> dict:
         filtered_df = self._prepare_filtered_df(df, filters)
 
         return {
@@ -97,7 +122,12 @@ class StatGen:
             "lexical_diversity": self.linguistic_analysis.lexical_diversity(filtered_df)
         }
 
-    def emotional(self, df: pd.DataFrame, filters: dict | None = None) -> dict:
+    def emotional(
+        self,
+        df: pd.DataFrame,
+        filters: dict | None = None,
+        dataset_id: int | None = None,
+    ) -> dict:
         filtered_df = self._prepare_filtered_df(df, filters)
 
         return {
@@ -107,7 +137,12 @@ class StatGen:
             "emotion_by_source": self.emotional_analysis.emotion_by_source(filtered_df)
         }
 
-    def user(self, df: pd.DataFrame, filters: dict | None = None) -> dict:
+    def user(
+        self,
+        df: pd.DataFrame,
+        filters: dict | None = None,
+        dataset_id: int | None = None,
+    ) -> dict:
         filtered_df = self._prepare_filtered_df(df, filters)
 
         return {
@@ -115,17 +150,26 @@ class StatGen:
             "users": self.user_analysis.per_user_analysis(filtered_df)
         }
 
-    def interactional(self, df: pd.DataFrame, filters: dict | None = None) -> dict:
+    def interactional(
+        self,
+        df: pd.DataFrame,
+        filters: dict | None = None,
+        dataset_id: int | None = None,
+    ) -> dict:
         filtered_df = self._prepare_filtered_df(df, filters)
 
         return {
-            "average_thread_depth": self.interaction_analysis.average_thread_depth(filtered_df),
             "top_interaction_pairs": self.interaction_analysis.top_interaction_pairs(filtered_df, top_n=100),
             "interaction_graph": self.interaction_analysis.interaction_graph(filtered_df),
             "conversation_concentration": self.interaction_analysis.conversation_concentration(filtered_df)
         }
 
-    def cultural(self, df: pd.DataFrame, filters: dict | None = None) -> dict:
+    def cultural(
+        self,
+        df: pd.DataFrame,
+        filters: dict | None = None,
+        dataset_id: int | None = None,
+    ) -> dict:
         filtered_df = self._prepare_filtered_df(df, filters)
 
         return {
@@ -134,7 +178,12 @@ class StatGen:
             "avg_emotion_per_entity": self.cultural_analysis.get_avg_emotions_per_entity(filtered_df)
         }
 
-    def summary(self, df: pd.DataFrame, filters: dict | None = None) -> dict:
+    def summary(
+        self,
+        df: pd.DataFrame,
+        filters: dict | None = None,
+        dataset_id: int | None = None,
+    ) -> dict:
         filtered_df = self._prepare_filtered_df(df, filters)
 
         return self.summary_analysis.summary(filtered_df)
@@ -3,6 +3,7 @@ import re
|
|||||||
|
|
||||||
from collections import Counter
|
from collections import Counter
|
||||||
|
|
||||||
|
|
||||||
class UserAnalysis:
|
class UserAnalysis:
|
||||||
def __init__(self, word_exclusions: set[str]):
|
def __init__(self, word_exclusions: set[str]):
|
||||||
self.word_exclusions = word_exclusions
|
self.word_exclusions = word_exclusions
|
||||||
@@ -12,49 +13,49 @@ class UserAnalysis:
|
|||||||
return [t for t in tokens if t not in self.word_exclusions]
|
return [t for t in tokens if t not in self.word_exclusions]
|
||||||
|
|
||||||
def _vocab_richness_per_user(
|
def _vocab_richness_per_user(
|
||||||
self, df: pd.DataFrame, min_words: int = 20, top_most_used_words: int = 100
|
self, df: pd.DataFrame, min_words: int = 20, top_most_used_words: int = 100
|
||||||
) -> list:
|
) -> list:
|
||||||
df = df.copy()
|
df = df.copy()
|
||||||
df["content"] = df["content"].fillna("").astype(str).str.lower()
|
df["content"] = df["content"].fillna("").astype(str).str.lower()
|
||||||
df["tokens"] = df["content"].apply(self._tokenize)
|
df["tokens"] = df["content"].apply(self._tokenize)
|
||||||
|
|
||||||
rows = []
|
rows = []
|
||||||
for author, group in df.groupby("author"):
|
for author, group in df.groupby("author"):
|
||||||
all_tokens = [t for tokens in group["tokens"] for t in tokens]
|
all_tokens = [t for tokens in group["tokens"] for t in tokens]
|
||||||
|
|
||||||
total_words = len(all_tokens)
|
total_words = len(all_tokens)
|
||||||
unique_words = len(set(all_tokens))
|
unique_words = len(set(all_tokens))
|
||||||
events = len(group)
|
events = len(group)
|
||||||
|
|
||||||
# Min amount of words for a user, any less than this might give weird results
|
# Min amount of words for a user, any less than this might give weird results
|
||||||
if total_words < min_words:
|
if total_words < min_words:
|
||||||
continue
|
continue
|
||||||
|
|
||||||
# 100% = they never reused a word (excluding stop words)
|
# 100% = they never reused a word (excluding stop words)
|
||||||
vocab_richness = unique_words / total_words
|
vocab_richness = unique_words / total_words
|
||||||
avg_words = total_words / max(events, 1)
|
avg_words = total_words / max(events, 1)
|
||||||
|
|
||||||
counts = Counter(all_tokens)
|
counts = Counter(all_tokens)
|
||||||
top_words = [
|
top_words = [
|
||||||
{"word": w, "count": int(c)}
|
{"word": w, "count": int(c)}
|
||||||
for w, c in counts.most_common(top_most_used_words)
|
for w, c in counts.most_common(top_most_used_words)
|
||||||
]
|
]
|
||||||
|
|
||||||
rows.append(
|
rows.append(
|
||||||
{
|
{
|
||||||
"author": author,
|
"author": author,
|
||||||
"events": int(events),
|
"events": int(events),
|
||||||
"total_words": int(total_words),
|
"total_words": int(total_words),
|
||||||
"unique_words": int(unique_words),
|
"unique_words": int(unique_words),
|
||||||
"vocab_richness": round(vocab_richness, 3),
|
"vocab_richness": round(vocab_richness, 3),
|
||||||
"avg_words_per_event": round(avg_words, 2),
|
"avg_words_per_event": round(avg_words, 2),
|
||||||
"top_words": top_words,
|
"top_words": top_words,
|
||||||
}
|
}
|
||||||
)
|
)
|
||||||
|
|
||||||
rows = sorted(rows, key=lambda x: x["vocab_richness"], reverse=True)
|
rows = sorted(rows, key=lambda x: x["vocab_richness"], reverse=True)
|
||||||
|
|
||||||
return rows
|
return rows
|
||||||
|
|
||||||
def top_users(self, df: pd.DataFrame) -> list:
|
def top_users(self, df: pd.DataFrame) -> list:
|
||||||
counts = df.groupby(["author", "source"]).size().sort_values(ascending=False)
|
counts = df.groupby(["author", "source"]).size().sort_values(ascending=False)
|
||||||
@@ -70,6 +71,7 @@ class UserAnalysis:
|
|||||||
per_user = df.groupby(["author", "type"]).size().unstack(fill_value=0)
|
per_user = df.groupby(["author", "type"]).size().unstack(fill_value=0)
|
||||||
|
|
||||||
emotion_cols = [col for col in df.columns if col.startswith("emotion_")]
|
emotion_cols = [col for col in df.columns if col.startswith("emotion_")]
|
||||||
|
dominant_topic_by_author = {}
|
||||||
|
|
||||||
avg_emotions_by_author = {}
|
avg_emotions_by_author = {}
|
||||||
if emotion_cols:
|
if emotion_cols:
|
||||||
@@ -79,6 +81,31 @@ class UserAnalysis:
|
|||||||
for author, row in avg_emotions.iterrows()
|
for author, row in avg_emotions.iterrows()
|
||||||
}
|
}
|
||||||
|
|
||||||
|
if "topic" in df.columns:
|
||||||
|
topic_df = df[
|
||||||
|
df["topic"].notna()
|
||||||
|
& (df["topic"] != "")
|
||||||
|
& (df["topic"] != "Misc")
|
||||||
|
]
|
||||||
|
if not topic_df.empty:
|
||||||
|
topic_counts = (
|
||||||
|
topic_df.groupby(["author", "topic"])
|
||||||
|
.size()
|
||||||
|
.reset_index(name="count")
|
||||||
|
.sort_values(
|
||||||
|
["author", "count", "topic"],
|
||||||
|
ascending=[True, False, True],
|
||||||
|
)
|
||||||
|
.drop_duplicates(subset=["author"])
|
||||||
|
)
|
||||||
|
dominant_topic_by_author = {
|
||||||
|
row["author"]: {
|
||||||
|
"topic": row["topic"],
|
||||||
|
"count": int(row["count"]),
|
||||||
|
}
|
||||||
|
for _, row in topic_counts.iterrows()
|
||||||
|
}
|
||||||
|
|
||||||
# ensure columns always exist
|
# ensure columns always exist
|
||||||
for col in ("post", "comment"):
|
for col in ("post", "comment"):
|
||||||
if col not in per_user.columns:
|
if col not in per_user.columns:
|
||||||
@@ -108,6 +135,7 @@ class UserAnalysis:
|
|||||||
"comment_post_ratio": float(row.get("comment_post_ratio", 0)),
|
"comment_post_ratio": float(row.get("comment_post_ratio", 0)),
|
||||||
"comment_share": float(row.get("comment_share", 0)),
|
"comment_share": float(row.get("comment_share", 0)),
|
||||||
"avg_emotions": avg_emotions_by_author.get(author, {}),
|
"avg_emotions": avg_emotions_by_author.get(author, {}),
|
||||||
|
"dominant_topic": dominant_topic_by_author.get(author),
|
||||||
"vocab": vocab_by_author.get(
|
"vocab": vocab_by_author.get(
|
||||||
author,
|
author,
|
||||||
{
|
{
|
||||||
|
|||||||
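The dominant-topic step added to `UserAnalysis` above counts topics per author, skips missing, empty, and "Misc" topics, and keeps each author's most frequent topic, breaking ties by topic name in ascending order. The sketch below restates that rule without pandas. The standalone function and its list-of-dicts input are assumptions made for illustration, not the project's implementation.

```python
from collections import Counter, defaultdict


def dominant_topic_by_author(events: list[dict]) -> dict:
    """Return {author: {"topic": ..., "count": ...}} for each author's most frequent topic."""
    counts: dict[str, Counter] = defaultdict(Counter)
    for event in events:
        topic = event.get("topic")
        # Skip missing, empty, and catch-all topics, mirroring the filter in the diff.
        if topic is None or topic == "" or topic == "Misc":
            continue
        counts[event["author"]][topic] += 1

    result = {}
    for author, topic_counts in counts.items():
        # Highest count wins; ties are broken by topic name in ascending order.
        topic, count = min(topic_counts.items(), key=lambda kv: (-kv[1], kv[0]))
        result[author] = {"topic": topic, "count": count}
    return result


# Made-up sample rows, for illustration only.
events = [
    {"author": "ann", "topic": "Sport"},
    {"author": "ann", "topic": "Music"},
    {"author": "ann", "topic": "Music"},
    {"author": "bob", "topic": "Misc"},
    {"author": "cat", "topic": "Travel"},
    {"author": "cat", "topic": "Food"},
]
print(dominant_topic_by_author(events))
```

Authors whose only topics are filtered out (here `bob`) get no entry, which matches the diff's use of `dominant_topic_by_author.get(author)` returning `None` for them.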
162
server/app.py
@@ -30,7 +30,9 @@ load_dotenv()
|
|||||||
max_fetch_limit = int(get_env("MAX_FETCH_LIMIT"))
|
max_fetch_limit = int(get_env("MAX_FETCH_LIMIT"))
|
||||||
frontend_url = get_env("FRONTEND_URL")
|
frontend_url = get_env("FRONTEND_URL")
|
||||||
jwt_secret_key = get_env("JWT_SECRET_KEY")
|
jwt_secret_key = get_env("JWT_SECRET_KEY")
|
||||||
jwt_access_token_expires = int(os.getenv("JWT_ACCESS_TOKEN_EXPIRES", 1200)) # Default to 20 minutes
|
jwt_access_token_expires = int(
|
||||||
|
os.getenv("JWT_ACCESS_TOKEN_EXPIRES", 1200)
|
||||||
|
) # Default to 20 minutes
|
||||||
|
|
||||||
# Flask Configuration
|
# Flask Configuration
|
||||||
CORS(app, resources={r"/*": {"origins": frontend_url}})
|
CORS(app, resources={r"/*": {"origins": frontend_url}})
|
||||||
@@ -52,6 +54,28 @@ connectors = get_available_connectors()
|
|||||||
with open("server/topics.json") as f:
|
with open("server/topics.json") as f:
|
||||||
default_topic_list = json.load(f)
|
default_topic_list = json.load(f)
|
||||||
|
|
||||||
|
|
||||||
|
def normalize_topics(topics):
|
||||||
|
if not isinstance(topics, dict) or len(topics) == 0:
|
||||||
|
return None
|
||||||
|
|
||||||
|
normalized = {}
|
||||||
|
|
||||||
|
for topic_name, topic_keywords in topics.items():
|
||||||
|
if not isinstance(topic_name, str) or not isinstance(topic_keywords, str):
|
||||||
|
return None
|
||||||
|
|
||||||
|
clean_name = topic_name.strip()
|
||||||
|
clean_keywords = topic_keywords.strip()
|
||||||
|
|
||||||
|
if not clean_name or not clean_keywords:
|
||||||
|
return None
|
||||||
|
|
||||||
|
normalized[clean_name] = clean_keywords
|
||||||
|
|
||||||
|
return normalized
|
||||||
|
|
||||||
|
|
||||||
@app.route("/register", methods=["POST"])
|
@app.route("/register", methods=["POST"])
|
||||||
def register_user():
|
def register_user():
|
||||||
data = request.get_json()
|
data = request.get_json()
|
||||||
@@ -107,9 +131,13 @@ def login_user():
|
|||||||
def profile():
|
def profile():
|
||||||
current_user = get_jwt_identity()
|
current_user = get_jwt_identity()
|
||||||
|
|
||||||
return jsonify(
|
return (
|
||||||
message="Access granted", user=auth_manager.get_user_by_id(current_user)
|
jsonify(
|
||||||
), 200
|
message="Access granted", user=auth_manager.get_user_by_id(current_user)
|
||||||
|
),
|
||||||
|
200,
|
||||||
|
)
|
||||||
|
|
||||||
|
|
||||||
@app.route("/user/datasets")
|
@app.route("/user/datasets")
|
||||||
@jwt_required()
|
@jwt_required()
|
||||||
@@ -117,14 +145,16 @@ def get_user_datasets():
|
|||||||
current_user = int(get_jwt_identity())
|
current_user = int(get_jwt_identity())
|
||||||
return jsonify(dataset_manager.get_user_datasets(current_user)), 200
|
return jsonify(dataset_manager.get_user_datasets(current_user)), 200
|
||||||
|
|
||||||
|
|
||||||
@app.route("/datasets/sources", methods=["GET"])
|
@app.route("/datasets/sources", methods=["GET"])
|
||||||
def get_dataset_sources():
|
def get_dataset_sources():
|
||||||
list_metadata = list(get_connector_metadata().values())
|
list_metadata = list(get_connector_metadata().values())
|
||||||
return jsonify(list_metadata)
|
return jsonify(list_metadata)
|
||||||
|
|
||||||
@app.route("/datasets/scrape", methods=["POST"])
|
|
||||||
|
@app.route("/datasets/fetch", methods=["POST"])
|
||||||
@jwt_required()
|
@jwt_required()
|
||||||
def scrape_data():
|
def fetch_data():
|
||||||
data = request.get_json()
|
data = request.get_json()
|
||||||
connector_metadata = get_connector_metadata()
|
connector_metadata = get_connector_metadata()
|
||||||
|
|
||||||
@@ -137,6 +167,8 @@ def scrape_data():
|
|||||||
|
|
||||||
dataset_name = data["name"].strip()
|
dataset_name = data["name"].strip()
|
||||||
user_id = int(get_jwt_identity())
|
user_id = int(get_jwt_identity())
|
||||||
|
custom_topics = data.get("topics")
|
||||||
|
topics_for_processing = default_topic_list
|
||||||
|
|
||||||
source_configs = data["sources"]
|
source_configs = data["sources"]
|
||||||
|
|
||||||
@@ -160,7 +192,7 @@ def scrape_data():
|
|||||||
limit = int(limit)
|
limit = int(limit)
|
||||||
except (ValueError, TypeError):
|
except (ValueError, TypeError):
|
||||||
return jsonify({"error": "Limit must be an integer"}), 400
|
return jsonify({"error": "Limit must be an integer"}), 400
|
||||||
|
|
||||||
if limit > 1000:
|
if limit > 1000:
|
||||||
limit = 1000
|
limit = 1000
|
||||||
|
|
||||||
@@ -172,15 +204,27 @@ def scrape_data():
|
|||||||
|
|
||||||
if category and not connector_metadata[name]["categories_enabled"]:
|
if category and not connector_metadata[name]["categories_enabled"]:
|
||||||
return jsonify({"error": f"Source {name} does not support categories"}), 400
|
return jsonify({"error": f"Source {name} does not support categories"}), 400
|
||||||
|
|
||||||
if category and not connectors[name]().category_exists(category):
|
# if category and not connectors[name]().category_exists(category):
|
||||||
return jsonify({"error": f"Category does not exist for {name}"}), 400
|
# return jsonify({"error": f"Category does not exist for {name}"}), 400
|
||||||
|
|
||||||
|
if custom_topics is not None:
|
||||||
|
normalized_topics = normalize_topics(custom_topics)
|
||||||
|
if not normalized_topics:
|
||||||
|
return (
|
||||||
|
jsonify(
|
||||||
|
{
|
||||||
|
"error": "Topics must be a non-empty JSON object with non-empty string keys and values"
|
||||||
|
}
|
||||||
|
),
|
||||||
|
400,
|
||||||
|
)
|
||||||
|
|
||||||
|
topics_for_processing = normalized_topics
|
||||||
|
|
||||||
try:
|
try:
|
||||||
dataset_id = dataset_manager.save_dataset_info(
|
dataset_id = dataset_manager.save_dataset_info(
|
||||||
user_id,
|
user_id, dataset_name, topics_for_processing
|
||||||
dataset_name,
|
|
||||||
default_topic_list
|
|
||||||
)
|
)
|
||||||
|
|
||||||
dataset_manager.set_dataset_status(
|
dataset_manager.set_dataset_status(
|
||||||
@@ -189,22 +233,21 @@ def scrape_data():
|
|||||||
f"Data is being fetched from {', '.join(source['name'] for source in source_configs)}",
|
f"Data is being fetched from {', '.join(source['name'] for source in source_configs)}",
|
||||||
)
|
)
|
||||||
|
|
||||||
fetch_and_process_dataset.delay(
|
fetch_and_process_dataset.delay(dataset_id, source_configs, topics_for_processing)
|
||||||
dataset_id,
|
|
||||||
source_configs,
|
|
||||||
default_topic_list
|
|
||||||
)
|
|
||||||
except Exception:
|
except Exception:
|
||||||
print(traceback.format_exc())
|
print(traceback.format_exc())
|
||||||
return jsonify({"error": "Failed to queue dataset processing"}), 500
|
return jsonify({"error": "Failed to queue dataset processing"}), 500
|
||||||
|
|
||||||
return jsonify(
|
return (
|
||||||
{
|
jsonify(
|
||||||
"message": "Dataset queued for processing",
|
{
|
||||||
"dataset_id": dataset_id,
|
"message": "Dataset queued for processing",
|
||||||
"status": "processing",
|
"dataset_id": dataset_id,
|
||||||
}
|
"status": "processing",
|
||||||
), 202
|
}
|
||||||
|
),
|
||||||
|
202,
|
||||||
|
)
|
||||||
|
|
||||||
|
|
||||||
@app.route("/datasets/upload", methods=["POST"])
|
@app.route("/datasets/upload", methods=["POST"])
|
||||||
@@ -226,9 +269,12 @@ def upload_data():
|
|||||||
if not post_file.filename.endswith(".jsonl") or not topic_file.filename.endswith(
|
if not post_file.filename.endswith(".jsonl") or not topic_file.filename.endswith(
|
||||||
".json"
|
".json"
|
||||||
):
|
):
|
||||||
return jsonify(
|
return (
|
||||||
{"error": "Invalid file type. Only .jsonl and .json files are allowed."}
|
jsonify(
|
||||||
), 400
|
{"error": "Invalid file type. Only .jsonl and .json files are allowed."}
|
||||||
|
),
|
||||||
|
400,
|
||||||
|
)
|
||||||
|
|
||||||
try:
|
try:
|
||||||
current_user = int(get_jwt_identity())
|
current_user = int(get_jwt_identity())
|
||||||
@@ -241,13 +287,16 @@ def upload_data():
|
|||||||
|
|
||||||
process_dataset.delay(dataset_id, posts_df.to_dict(orient="records"), topics)
|
process_dataset.delay(dataset_id, posts_df.to_dict(orient="records"), topics)
|
||||||
|
|
||||||
return jsonify(
|
return (
|
||||||
{
|
jsonify(
|
||||||
"message": "Dataset queued for processing",
|
{
|
||||||
"dataset_id": dataset_id,
|
"message": "Dataset queued for processing",
|
||||||
"status": "processing",
|
"dataset_id": dataset_id,
|
||||||
}
|
"status": "processing",
|
||||||
), 202
|
}
|
||||||
|
),
|
||||||
|
202,
|
||||||
|
)
|
||||||
except ValueError as e:
|
except ValueError as e:
|
||||||
return jsonify({"error": f"Failed to read JSONL file"}), 400
|
return jsonify({"error": f"Failed to read JSONL file"}), 400
|
||||||
except Exception as e:
|
except Exception as e:
|
||||||
@@ -296,9 +345,12 @@ def update_dataset(dataset_id):
|
|||||||
return jsonify({"error": "A valid name must be provided"}), 400
|
return jsonify({"error": "A valid name must be provided"}), 400
|
||||||
|
|
||||||
dataset_manager.update_dataset_name(dataset_id, new_name.strip())
|
dataset_manager.update_dataset_name(dataset_id, new_name.strip())
|
||||||
return jsonify(
|
return (
|
||||||
{"message": f"Dataset {dataset_id} renamed to '{new_name.strip()}'"}
|
jsonify(
|
||||||
), 200
|
{"message": f"Dataset {dataset_id} renamed to '{new_name.strip()}'"}
|
||||||
|
),
|
||||||
|
200,
|
||||||
|
)
|
||||||
except NotAuthorisedException:
|
except NotAuthorisedException:
|
||||||
return jsonify({"error": "User is not authorised to access this content"}), 403
|
return jsonify({"error": "User is not authorised to access this content"}), 403
|
||||||
except NonExistentDatasetException:
|
except NonExistentDatasetException:
|
||||||
@@ -321,11 +373,14 @@ def delete_dataset(dataset_id):
|
|||||||
|
|
||||||
dataset_manager.delete_dataset_info(dataset_id)
|
dataset_manager.delete_dataset_info(dataset_id)
|
||||||
dataset_manager.delete_dataset_content(dataset_id)
|
dataset_manager.delete_dataset_content(dataset_id)
|
||||||
return jsonify(
|
return (
|
||||||
{
|
jsonify(
|
||||||
"message": f"Dataset {dataset_id} metadata and content successfully deleted"
|
{
|
||||||
}
|
"message": f"Dataset {dataset_id} metadata and content successfully deleted"
|
||||||
), 200
|
}
|
||||||
|
),
|
||||||
|
200,
|
||||||
|
)
|
||||||
except NotAuthorisedException:
|
except NotAuthorisedException:
|
||||||
return jsonify({"error": "User is not authorised to access this content"}), 403
|
return jsonify({"error": "User is not authorised to access this content"}), 403
|
||||||
except NonExistentDatasetException:
|
except NonExistentDatasetException:
|
||||||
@@ -369,7 +424,7 @@ def get_linguistic_analysis(dataset_id):
|
|||||||
|
|
||||||
dataset_content = dataset_manager.get_dataset_content(dataset_id)
|
dataset_content = dataset_manager.get_dataset_content(dataset_id)
|
||||||
filters = get_request_filters()
|
filters = get_request_filters()
|
||||||
return jsonify(stat_gen.linguistic(dataset_content, filters)), 200
|
return jsonify(stat_gen.linguistic(dataset_content, filters, dataset_id=dataset_id)), 200
|
||||||
except NotAuthorisedException:
|
except NotAuthorisedException:
|
||||||
return jsonify({"error": "User is not authorised to access this content"}), 403
|
return jsonify({"error": "User is not authorised to access this content"}), 403
|
||||||
except NonExistentDatasetException:
|
except NonExistentDatasetException:
|
||||||
@@ -393,7 +448,7 @@ def get_emotional_analysis(dataset_id):
|
|||||||
|
|
||||||
dataset_content = dataset_manager.get_dataset_content(dataset_id)
|
dataset_content = dataset_manager.get_dataset_content(dataset_id)
|
||||||
filters = get_request_filters()
|
filters = get_request_filters()
|
||||||
return jsonify(stat_gen.emotional(dataset_content, filters)), 200
|
return jsonify(stat_gen.emotional(dataset_content, filters, dataset_id=dataset_id)), 200
|
||||||
except NotAuthorisedException:
|
except NotAuthorisedException:
|
||||||
return jsonify({"error": "User is not authorised to access this content"}), 403
|
return jsonify({"error": "User is not authorised to access this content"}), 403
|
||||||
except NonExistentDatasetException:
|
except NonExistentDatasetException:
|
||||||
@@ -417,7 +472,7 @@ def get_summary(dataset_id):
|
|||||||
|
|
||||||
dataset_content = dataset_manager.get_dataset_content(dataset_id)
|
dataset_content = dataset_manager.get_dataset_content(dataset_id)
|
||||||
filters = get_request_filters()
|
filters = get_request_filters()
|
||||||
return jsonify(stat_gen.summary(dataset_content, filters)), 200
|
return jsonify(stat_gen.summary(dataset_content, filters, dataset_id=dataset_id)), 200
|
||||||
except NotAuthorisedException:
|
except NotAuthorisedException:
|
||||||
return jsonify({"error": "User is not authorised to access this content"}), 403
|
return jsonify({"error": "User is not authorised to access this content"}), 403
|
||||||
except NonExistentDatasetException:
|
except NonExistentDatasetException:
|
||||||
@@ -441,7 +496,7 @@ def get_temporal_analysis(dataset_id):
|
|||||||
|
|
||||||
dataset_content = dataset_manager.get_dataset_content(dataset_id)
|
dataset_content = dataset_manager.get_dataset_content(dataset_id)
|
||||||
filters = get_request_filters()
|
filters = get_request_filters()
|
||||||
return jsonify(stat_gen.temporal(dataset_content, filters)), 200
|
return jsonify(stat_gen.temporal(dataset_content, filters, dataset_id=dataset_id)), 200
|
||||||
except NotAuthorisedException:
|
except NotAuthorisedException:
|
||||||
return jsonify({"error": "User is not authorised to access this content"}), 403
|
return jsonify({"error": "User is not authorised to access this content"}), 403
|
||||||
except NonExistentDatasetException:
|
except NonExistentDatasetException:
|
||||||
@@ -465,7 +520,7 @@ def get_user_analysis(dataset_id):
|
|||||||
|
|
||||||
dataset_content = dataset_manager.get_dataset_content(dataset_id)
|
dataset_content = dataset_manager.get_dataset_content(dataset_id)
|
||||||
filters = get_request_filters()
|
filters = get_request_filters()
|
||||||
return jsonify(stat_gen.user(dataset_content, filters)), 200
|
return jsonify(stat_gen.user(dataset_content, filters, dataset_id=dataset_id)), 200
|
||||||
except NotAuthorisedException:
|
except NotAuthorisedException:
|
||||||
return jsonify({"error": "User is not authorised to access this content"}), 403
|
return jsonify({"error": "User is not authorised to access this content"}), 403
|
||||||
except NonExistentDatasetException:
|
except NonExistentDatasetException:
|
||||||
@@ -489,7 +544,7 @@ def get_cultural_analysis(dataset_id):
|
|||||||
|
|
||||||
dataset_content = dataset_manager.get_dataset_content(dataset_id)
|
dataset_content = dataset_manager.get_dataset_content(dataset_id)
|
||||||
filters = get_request_filters()
|
filters = get_request_filters()
|
||||||
return jsonify(stat_gen.cultural(dataset_content, filters)), 200
|
return jsonify(stat_gen.cultural(dataset_content, filters, dataset_id=dataset_id)), 200
|
||||||
except NotAuthorisedException:
|
except NotAuthorisedException:
|
||||||
return jsonify({"error": "User is not authorised to access this content"}), 403
|
return jsonify({"error": "User is not authorised to access this content"}), 403
|
||||||
except NonExistentDatasetException:
|
except NonExistentDatasetException:
|
||||||
@@ -513,7 +568,7 @@ def get_interaction_analysis(dataset_id):
|
|||||||
|
|
||||||
dataset_content = dataset_manager.get_dataset_content(dataset_id)
|
dataset_content = dataset_manager.get_dataset_content(dataset_id)
|
||||||
filters = get_request_filters()
|
filters = get_request_filters()
|
||||||
return jsonify(stat_gen.interactional(dataset_content, filters)), 200
|
return jsonify(stat_gen.interactional(dataset_content, filters, dataset_id=dataset_id)), 200
|
||||||
except NotAuthorisedException:
|
except NotAuthorisedException:
|
||||||
return jsonify({"error": "User is not authorised to access this content"}), 403
|
return jsonify({"error": "User is not authorised to access this content"}), 403
|
||||||
except NonExistentDatasetException:
|
except NonExistentDatasetException:
|
||||||
@@ -523,7 +578,8 @@ def get_interaction_analysis(dataset_id):
|
|||||||
except Exception as e:
|
except Exception as e:
|
||||||
print(traceback.format_exc())
|
print(traceback.format_exc())
|
||||||
return jsonify({"error": f"An unexpected error occurred"}), 500
|
return jsonify({"error": f"An unexpected error occurred"}), 500
|
||||||
|
|
||||||
|
|
||||||
@app.route("/dataset/<int:dataset_id>/all", methods=["GET"])
|
@app.route("/dataset/<int:dataset_id>/all", methods=["GET"])
|
||||||
@jwt_required()
|
@jwt_required()
|
||||||
def get_full_dataset(dataset_id: int):
|
def get_full_dataset(dataset_id: int):
|
||||||
@@ -535,7 +591,8 @@ def get_full_dataset(dataset_id: int):
|
|||||||
)
|
)
|
||||||
|
|
||||||
dataset_content = dataset_manager.get_dataset_content(dataset_id)
|
dataset_content = dataset_manager.get_dataset_content(dataset_id)
|
||||||
return jsonify(dataset_content.to_dict(orient="records")), 200
|
filters = get_request_filters()
|
||||||
|
return jsonify(stat_gen.filter_dataset(dataset_content, filters)), 200
|
||||||
except NotAuthorisedException:
|
except NotAuthorisedException:
|
||||||
return jsonify({"error": "User is not authorised to access this content"}), 403
|
return jsonify({"error": "User is not authorised to access this content"}), 403
|
||||||
except NonExistentDatasetException:
|
except NonExistentDatasetException:
|
||||||
@@ -546,5 +603,6 @@ def get_full_dataset(dataset_id: int):
|
|||||||
print(traceback.format_exc())
|
print(traceback.format_exc())
|
||||||
return jsonify({"error": f"An unexpected error occurred"}), 500
|
return jsonify({"error": f"An unexpected error occurred"}), 500
|
||||||
|
|
||||||
|
|
||||||
if __name__ == "__main__":
|
if __name__ == "__main__":
|
||||||
app.run(debug=True)
|
app.run(debug=True)
|
||||||
|
|||||||
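To make the validation rules of `normalize_topics` from the `server/app.py` changes above concrete, the snippet below restates the function as it appears in the diff (with comments added) and runs it on a few payloads. The sample topic names and keywords are made up for illustration.

```python
def normalize_topics(topics):
    # Reject anything that is not a non-empty dict.
    if not isinstance(topics, dict) or len(topics) == 0:
        return None

    normalized = {}

    for topic_name, topic_keywords in topics.items():
        # Both the topic name and its keywords must be strings.
        if not isinstance(topic_name, str) or not isinstance(topic_keywords, str):
            return None

        clean_name = topic_name.strip()
        clean_keywords = topic_keywords.strip()

        # A whitespace-only name or keyword string invalidates the whole payload.
        if not clean_name or not clean_keywords:
            return None

        normalized[clean_name] = clean_keywords

    return normalized


valid = normalize_topics({" Housing ": " rent, landlord, deposit "})
invalid_value = normalize_topics({"Housing": ["rent", "landlord"]})
invalid_blank = normalize_topics({"Housing": "   "})
print(valid, invalid_value, invalid_blank)
```

Because a single bad entry returns `None` for the whole mapping, the `/datasets/fetch` handler can treat any falsy result as a 400 response without reporting which key failed.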
@@ -1,29 +1,24 @@
|
|||||||
from abc import ABC, abstractmethod
|
from abc import ABC, abstractmethod
|
||||||
from dto.post import Post
|
from dto.post import Post
|
||||||
|
import os
|
||||||
|
|
||||||
|
|
||||||
class BaseConnector(ABC):
|
class BaseConnector(ABC):
|
||||||
# Each subclass declares these at the class level
|
source_name: str # machine readable
|
||||||
source_name: str # machine-readable: "reddit", "youtube"
|
display_name: str # human readable
|
||||||
display_name: str # human-readable: "Reddit", "YouTube"
|
required_env: list[str] = []
|
||||||
required_env: list[str] = [] # env vars needed to activate
|
|
||||||
|
|
||||||
search_enabled: bool
|
search_enabled: bool
|
||||||
categories_enabled: bool
|
categories_enabled: bool
|
||||||
|
|
||||||
@classmethod
|
@classmethod
|
||||||
def is_available(cls) -> bool:
|
def is_available(cls) -> bool:
|
||||||
"""Returns True if all required env vars are set."""
|
|
||||||
import os
|
|
||||||
return all(os.getenv(var) for var in cls.required_env)
|
return all(os.getenv(var) for var in cls.required_env)
|
||||||
|
|
||||||
@abstractmethod
|
@abstractmethod
|
||||||
def get_new_posts_by_search(self,
|
def get_new_posts_by_search(
|
||||||
search: str = None,
|
self, search: str = None, category: str = None, post_limit: int = 10
|
||||||
category: str = None,
|
) -> list[Post]: ...
|
||||||
post_limit: int = 10
|
|
||||||
) -> list[Post]:
|
|
||||||
...
|
|
||||||
|
|
||||||
@abstractmethod
|
@abstractmethod
|
||||||
def category_exists(self, category: str) -> bool:
|
def category_exists(self, category: str) -> bool: ...
|
||||||
...
|
|
||||||
|
|||||||
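The `is_available` classmethod in the `BaseConnector` changes above decides whether a connector can be used based on `required_env`. The sketch below is a trimmed copy of the base class (the `get_new_posts_by_search` method and the `Post` import are left out so the snippet runs on its own). `ExampleConnector` and the `EXAMPLE_API_KEY` variable are hypothetical names, not part of the project.

```python
import os
from abc import ABC, abstractmethod


class BaseConnector(ABC):
    source_name: str  # machine readable
    display_name: str  # human readable
    required_env: list[str] = []

    @classmethod
    def is_available(cls) -> bool:
        # all() over an empty list is True, so connectors that need
        # no environment variables are always available.
        return all(os.getenv(var) for var in cls.required_env)

    @abstractmethod
    def category_exists(self, category: str) -> bool: ...


class ExampleConnector(BaseConnector):  # hypothetical connector
    source_name = "example"
    display_name = "Example"
    required_env = ["EXAMPLE_API_KEY"]  # hypothetical variable name

    def category_exists(self, category: str) -> bool:
        return bool(category)


os.environ.pop("EXAMPLE_API_KEY", None)
before = ExampleConnector.is_available()  # variable unset

os.environ["EXAMPLE_API_KEY"] = "dummy"
after = ExampleConnector.is_available()  # variable set to a non-empty value

print(before, after)
```

Note that `os.getenv` returns an empty string for a variable that is set but blank, so such a variable also counts as missing under this check.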
@@ -11,9 +11,7 @@ from server.connectors.base import BaseConnector
|
|||||||
|
|
||||||
logger = logging.getLogger(__name__)
|
logger = logging.getLogger(__name__)
|
||||||
|
|
||||||
HEADERS = {
|
HEADERS = {"User-Agent": "Mozilla/5.0 (compatible; Digital-Ethnography-Aid/1.0)"}
|
||||||
"User-Agent": "Mozilla/5.0 (compatible; ForumScraper/1.0)"
|
|
||||||
}
|
|
||||||
|
|
||||||
class BoardsAPI(BaseConnector):
|
class BoardsAPI(BaseConnector):
|
||||||
source_name: str = "boards.ie"
|
source_name: str = "boards.ie"
|
||||||
@@ -25,19 +23,17 @@ class BoardsAPI(BaseConnector):
|
|||||||
def __init__(self):
|
def __init__(self):
|
||||||
self.base_url = "https://www.boards.ie"
|
self.base_url = "https://www.boards.ie"
|
||||||
|
|
||||||
def get_new_posts_by_search(self,
|
def get_new_posts_by_search(
|
||||||
search: str,
|
self, search: str, category: str, post_limit: int
|
||||||
category: str,
|
) -> list[Post]:
|
||||||
post_limit: int
|
|
||||||
) -> list[Post]:
|
|
||||||
if search:
|
if search:
|
||||||
raise NotImplementedError("Search not compatible with boards.ie")
|
raise NotImplementedError("Search not compatible with boards.ie")
|
||||||
|
|
||||||
if category:
|
if category:
|
||||||
return self._get_posts(f"{self.base_url}/categories/{category}", post_limit)
|
return self._get_posts(f"{self.base_url}/categories/{category}", post_limit)
|
||||||
else:
|
else:
|
||||||
return self._get_posts(f"{self.base_url}/discussions", post_limit)
|
return self._get_posts(f"{self.base_url}/discussions", post_limit)
|
||||||
|
|
||||||
def category_exists(self, category: str) -> bool:
|
def category_exists(self, category: str) -> bool:
|
||||||
if not category:
|
if not category:
|
||||||
return False
|
return False
|
||||||
@@ -59,7 +55,7 @@ class BoardsAPI(BaseConnector):
|
|||||||
except requests.RequestException as e:
|
except requests.RequestException as e:
|
||||||
logger.error(f"Error checking category '{category}': {e}")
|
logger.error(f"Error checking category '{category}': {e}")
|
||||||
return False
|
return False
|
||||||
|
|
||||||
## Private
|
## Private
|
||||||
def _get_posts(self, url, limit) -> list[Post]:
|
def _get_posts(self, url, limit) -> list[Post]:
|
||||||
urls = []
|
urls = []
|
||||||
@@ -78,7 +74,7 @@ class BoardsAPI(BaseConnector):
|
|||||||
href = a.get("href")
|
href = a.get("href")
|
||||||
if href:
|
if href:
|
||||||
urls.append(href)
|
urls.append(href)
|
||||||
|
|
||||||
current_page += 1
|
current_page += 1
|
||||||
|
|
||||||
logger.debug(f"Fetched {len(urls)} post URLs")
|
logger.debug(f"Fetched {len(urls)} post URLs")
|
||||||
@@ -91,12 +87,14 @@ class BoardsAPI(BaseConnector):
|
|||||||
post = self._parse_thread(html, post_url)
|
post = self._parse_thread(html, post_url)
|
||||||
return post
|
return post
|
||||||
|
|
||||||
with ThreadPoolExecutor(max_workers=30) as executor:
|
with ThreadPoolExecutor(max_workers=5) as executor:
|
||||||
futures = {executor.submit(fetch_and_parse, url): url for url in urls}
|
futures = {executor.submit(fetch_and_parse, url): url for url in urls}
|
||||||
|
|
||||||
for i, future in enumerate(as_completed(futures)):
|
for i, future in enumerate(as_completed(futures)):
|
||||||
post_url = futures[future]
|
post_url = futures[future]
|
||||||
logger.debug(f"Fetching Post {i + 1} / {len(urls)} details from URL: {post_url}")
|
logger.debug(
|
||||||
|
f"Fetching Post {i + 1} / {len(urls)} details from URL: {post_url}"
|
||||||
|
)
|
||||||
try:
|
try:
|
||||||
post = future.result()
|
post = future.result()
|
||||||
posts.append(post)
|
posts.append(post)
|
||||||
@@ -105,7 +103,6 @@ class BoardsAPI(BaseConnector):
|
|||||||
|
|
||||||
return posts
|
return posts
|
||||||
|
|
||||||
|
|
||||||
def _fetch_page(self, url: str) -> str:
|
def _fetch_page(self, url: str) -> str:
|
||||||
response = requests.get(url, headers=HEADERS)
|
response = requests.get(url, headers=HEADERS)
|
||||||
response.raise_for_status()
|
response.raise_for_status()
|
||||||
@@ -113,7 +110,7 @@ class BoardsAPI(BaseConnector):
|
|||||||
|
|
||||||
def _parse_thread(self, html: str, post_url: str) -> Post:
|
def _parse_thread(self, html: str, post_url: str) -> Post:
|
||||||
soup = BeautifulSoup(html, "html.parser")
|
soup = BeautifulSoup(html, "html.parser")
|
||||||
|
|
||||||
# Author
|
# Author
|
||||||
author_tag = soup.select_one(".userinfo-username-title")
|
author_tag = soup.select_one(".userinfo-username-title")
|
||||||
author = author_tag.text.strip() if author_tag else None
|
author = author_tag.text.strip() if author_tag else None
|
||||||
@@ -122,10 +119,16 @@ class BoardsAPI(BaseConnector):
|
|||||||
timestamp_tag = soup.select_one(".postbit-header")
|
timestamp_tag = soup.select_one(".postbit-header")
|
||||||
timestamp = None
|
timestamp = None
|
||||||
if timestamp_tag:
|
if timestamp_tag:
|
||||||
match = re.search(r"\d{2}-\d{2}-\d{4}\s+\d{2}:\d{2}[AP]M", timestamp_tag.get_text())
|
match = re.search(
|
||||||
|
r"\d{2}-\d{2}-\d{4}\s+\d{2}:\d{2}[AP]M", timestamp_tag.get_text()
|
||||||
|
)
|
||||||
timestamp = match.group(0) if match else None
|
timestamp = match.group(0) if match else None
|
||||||
# convert to unix epoch
|
# convert to unix epoch
|
||||||
timestamp = datetime.datetime.strptime(timestamp, "%d-%m-%Y %I:%M%p").timestamp() if timestamp else None
|
timestamp = (
|
||||||
|
datetime.datetime.strptime(timestamp, "%d-%m-%Y %I:%M%p").timestamp()
|
||||||
|
if timestamp
|
||||||
|
else None
|
||||||
|
)
|
||||||
|
|
||||||
# Post ID
|
# Post ID
|
||||||
post_num = re.search(r"discussion/(\d+)", post_url)
|
post_num = re.search(r"discussion/(\d+)", post_url)
|
||||||
@@ -133,7 +136,9 @@ class BoardsAPI(BaseConnector):
|
|||||||
|
|
||||||
# Content
|
# Content
|
||||||
content_tag = soup.select_one(".Message.userContent")
|
content_tag = soup.select_one(".Message.userContent")
|
||||||
content = content_tag.get_text(separator="\n", strip=True) if content_tag else None
|
content = (
|
||||||
|
content_tag.get_text(separator="\n", strip=True) if content_tag else None
|
||||||
|
)
|
||||||
|
|
||||||
# Title
|
# Title
|
||||||
title_tag = soup.select_one(".PageTitle h1")
|
title_tag = soup.select_one(".PageTitle h1")
|
||||||
@@ -150,7 +155,7 @@ class BoardsAPI(BaseConnector):
|
|||||||
url=post_url,
|
url=post_url,
|
||||||
timestamp=timestamp,
|
timestamp=timestamp,
|
||||||
source=self.source_name,
|
source=self.source_name,
|
||||||
comments=comments
|
comments=comments,
|
||||||
)
|
)
|
||||||
|
|
||||||
return post
|
return post
|
||||||
@@ -168,9 +173,9 @@ class BoardsAPI(BaseConnector):
|
|||||||
soup = BeautifulSoup(html, "html.parser")
|
soup = BeautifulSoup(html, "html.parser")
|
||||||
next_link = soup.find("a", class_="Next")
|
next_link = soup.find("a", class_="Next")
|
||||||
|
|
||||||
if next_link and next_link.get('href'):
|
if next_link and next_link.get("href"):
|
||||||
href = next_link.get('href')
|
href = next_link.get("href")
|
||||||
current_url = href if href.startswith('http') else url + href
|
current_url = href if href.startswith("http") else url + href
|
||||||
else:
|
else:
|
||||||
current_url = None
|
current_url = None
|
||||||
|
|
||||||
@@ -186,21 +191,29 @@ class BoardsAPI(BaseConnector):
|
|||||||
comment_id = tag.get("id")
|
comment_id = tag.get("id")
|
||||||
|
|
||||||
# Author
|
# Author
|
||||||
user_elem = tag.find('span', class_='userinfo-username-title')
|
user_elem = tag.find("span", class_="userinfo-username-title")
|
||||||
username = user_elem.get_text(strip=True) if user_elem else None
|
username = user_elem.get_text(strip=True) if user_elem else None
|
||||||
|
|
||||||
# Timestamp
|
# Timestamp
|
||||||
date_elem = tag.find('span', class_='DateCreated')
|
date_elem = tag.find("span", class_="DateCreated")
|
||||||
timestamp = date_elem.get_text(strip=True) if date_elem else None
|
timestamp = date_elem.get_text(strip=True) if date_elem else None
|
||||||
timestamp = datetime.datetime.strptime(timestamp, "%d-%m-%Y %I:%M%p").timestamp() if timestamp else None
|
timestamp = (
|
||||||
|
datetime.datetime.strptime(timestamp, "%d-%m-%Y %I:%M%p").timestamp()
|
||||||
|
if timestamp
|
||||||
|
else None
|
||||||
|
)
|
||||||
|
|
||||||
# Content
|
# Content
|
||||||
message_div = tag.find('div', class_='Message userContent')
|
message_div = tag.find("div", class_="Message userContent")
|
||||||
|
|
||||||
if message_div and message_div.blockquote:
|
if message_div and message_div.blockquote:
|
||||||
message_div.blockquote.decompose()
|
message_div.blockquote.decompose()
|
||||||
|
|
||||||
content = message_div.get_text(separator="\n", strip=True) if message_div else None
|
content = (
|
||||||
|
message_div.get_text(separator="\n", strip=True)
|
||||||
|
if message_div
|
||||||
|
else None
|
||||||
|
)
|
||||||
|
|
||||||
comment = Comment(
|
comment = Comment(
|
||||||
id=comment_id,
|
id=comment_id,
|
||||||
@@ -209,10 +222,8 @@ class BoardsAPI(BaseConnector):
|
|||||||
content=content,
|
content=content,
|
||||||
timestamp=timestamp,
|
timestamp=timestamp,
|
||||||
reply_to=None,
|
reply_to=None,
|
||||||
source=self.source_name
|
source=self.source_name,
|
||||||
)
|
)
|
||||||
comments.append(comment)
|
comments.append(comment)
|
||||||
|
|
||||||
return comments
|
return comments
|
||||||
|
|
||||||
|
|
||||||
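A minimal sketch of the timestamp handling in `_parse_thread` above, pulled out into one helper. The regular expression and the `strptime` format are the ones shown in the diff; the helper name `parse_forum_timestamp` is hypothetical, and pinning the result to UTC is an assumption made here so the value does not depend on the machine's local timezone (the diff calls `.timestamp()` on a naive datetime, which uses local time).

```python
import datetime
import re
from typing import Optional

# Same pattern and format string as in the diff above.
TIMESTAMP_PATTERN = re.compile(r"\d{2}-\d{2}-\d{4}\s+\d{2}:\d{2}[AP]M")
TIMESTAMP_FORMAT = "%d-%m-%Y %I:%M%p"


def parse_forum_timestamp(text: str) -> Optional[float]:
    """Return the first timestamp found in `text` as a Unix epoch, or None."""
    match = TIMESTAMP_PATTERN.search(text)
    if match is None:
        return None
    # Normalise any run of whitespace between date and time to a single space.
    raw = " ".join(match.group(0).split())
    parsed = datetime.datetime.strptime(raw, TIMESTAMP_FORMAT)
    # Assumption: treat the forum time as UTC so the result is machine-independent.
    return parsed.replace(tzinfo=datetime.timezone.utc).timestamp()
```

Returning `None` when no timestamp is present mirrors the `if timestamp else None` guard in the connector, so callers can pass the value straight into `Post` or `Comment`.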
|
|||||||
@@ -1,6 +1,10 @@
|
|||||||
import requests
|
import requests
|
||||||
import logging
|
import logging
|
||||||
import time
|
import time
|
||||||
|
import os
|
||||||
|
|
||||||
|
from dotenv import load_dotenv
|
||||||
|
from requests.auth import HTTPBasicAuth
|
||||||
|
|
||||||
from dto.post import Post
|
from dto.post import Post
|
||||||
from dto.user import User
|
from dto.user import User
|
||||||
@@ -9,6 +13,9 @@ from server.connectors.base import BaseConnector
|
|||||||
|
|
||||||
logger = logging.getLogger(__name__)
|
logger = logging.getLogger(__name__)
|
||||||
|
|
||||||
|
CLIENT_ID = os.getenv("REDDIT_CLIENT_ID")
|
||||||
|
CLIENT_SECRET = os.getenv("REDDIT_CLIENT_SECRET")
|
||||||
|
|
||||||
class RedditAPI(BaseConnector):
|
class RedditAPI(BaseConnector):
|
||||||
source_name: str = "reddit"
|
source_name: str = "reddit"
|
||||||
display_name: str = "Reddit"
|
display_name: str = "Reddit"
|
||||||
@@ -17,24 +24,22 @@ class RedditAPI(BaseConnector):
|
|||||||
|
|
||||||
def __init__(self):
|
def __init__(self):
|
||||||
self.url = "https://www.reddit.com/"
|
self.url = "https://www.reddit.com/"
|
||||||
|
self.token = None
|
||||||
|
self.token_expiry = 0
|
||||||
|
|
||||||
# Public Methods #
|
# Public Methods #
|
||||||
def get_new_posts_by_search(self,
|
def get_new_posts_by_search(
|
||||||
search: str,
|
self, search: str, category: str, post_limit: int
|
||||||
category: str,
|
) -> list[Post]:
|
||||||
post_limit: int
|
|
||||||
) -> list[Post]:
|
|
||||||
|
|
||||||
prefix = f"r/{category}/" if category else ""
|
prefix = f"r/{category}/" if category else ""
|
||||||
params = {'limit': post_limit}
|
params = {"limit": post_limit}
|
||||||
|
|
||||||
if search:
|
if search:
|
||||||
endpoint = f"{prefix}search.json"
|
endpoint = f"{prefix}search.json"
|
||||||
params.update({
|
params.update(
|
||||||
'q': search,
|
{"q": search, "sort": "new", "restrict_sr": "on" if category else "off"}
|
||||||
'sort': 'new',
|
)
|
||||||
'restrict_sr': 'on' if category else 'off'
|
|
||||||
})
|
|
||||||
else:
|
else:
|
||||||
endpoint = f"{prefix}new.json"
|
endpoint = f"{prefix}new.json"
|
||||||
|
|
||||||
@@ -43,24 +48,24 @@ class RedditAPI(BaseConnector):
|
|||||||
|
|
||||||
while len(posts) < post_limit:
|
while len(posts) < post_limit:
|
||||||
batch_limit = min(100, post_limit - len(posts))
|
batch_limit = min(100, post_limit - len(posts))
|
||||||
params['limit'] = batch_limit
|
params["limit"] = batch_limit
|
||||||
if after:
|
if after:
|
||||||
params['after'] = after
|
params["after"] = after
|
||||||
|
|
||||||
data = self._fetch_post_overviews(endpoint, params)
|
data = self._fetch_post_overviews(endpoint, params)
|
||||||
|
|
||||||
if not data or 'data' not in data or not data['data'].get('children'):
|
if not data or "data" not in data or not data["data"].get("children"):
|
||||||
break
|
break
|
||||||
|
|
||||||
batch_posts = self._parse_posts(data)
|
batch_posts = self._parse_posts(data)
|
||||||
posts.extend(batch_posts)
|
posts.extend(batch_posts)
|
||||||
|
|
||||||
after = data['data'].get('after')
|
after = data["data"].get("after")
|
||||||
if not after:
|
if not after:
|
||||||
break
|
break
|
||||||
|
|
||||||
return posts[:post_limit]
|
return posts[:post_limit]
|
||||||
|
|
||||||
def _get_new_subreddit_posts(self, subreddit: str, limit: int = 10) -> list[Post]:
|
def _get_new_subreddit_posts(self, subreddit: str, limit: int = 10) -> list[Post]:
|
||||||
posts = []
|
posts = []
|
||||||
after = None
|
after = None
|
||||||
@@ -70,37 +75,36 @@ class RedditAPI(BaseConnector):
|
|||||||
|
|
||||||
while len(posts) < limit:
|
while len(posts) < limit:
|
||||||
batch_limit = min(100, limit - len(posts))
|
batch_limit = min(100, limit - len(posts))
|
||||||
params = {
|
params = {"limit": batch_limit, "after": after}
|
||||||
'limit': batch_limit,
|
|
||||||
'after': after
|
|
||||||
}
|
|
||||||
|
|
||||||
data = self._fetch_post_overviews(url, params)
|
data = self._fetch_post_overviews(url, params)
|
||||||
batch_posts = self._parse_posts(data)
|
batch_posts = self._parse_posts(data)
|
||||||
|
|
||||||
logger.debug(f"Fetched {len(batch_posts)} new posts from subreddit {subreddit}")
|
logger.debug(
|
||||||
|
f"Fetched {len(batch_posts)} new posts from subreddit {subreddit}"
|
||||||
|
)
|
||||||
|
|
||||||
if not batch_posts:
|
if not batch_posts:
|
||||||
break
|
break
|
||||||
|
|
||||||
posts.extend(batch_posts)
|
posts.extend(batch_posts)
|
||||||
after = data['data'].get('after')
|
after = data["data"].get("after")
|
||||||
if not after:
|
if not after:
|
||||||
break
|
break
|
||||||
|
|
||||||
return posts
|
return posts
|
||||||
|
|
||||||
def get_user(self, username: str) -> User:
|
def get_user(self, username: str) -> User:
|
||||||
data = self._fetch_post_overviews(f"user/{username}/about.json", {})
|
data = self._fetch_post_overviews(f"user/{username}/about.json", {})
|
||||||
return self._parse_user(data)
|
return self._parse_user(data)
|
||||||
|
|
||||||
def category_exists(self, category: str) -> bool:
|
def category_exists(self, category: str) -> bool:
|
||||||
try:
|
try:
|
||||||
data = self._fetch_post_overviews(f"r/{category}/about.json", {})
|
data = self._fetch_post_overviews(f"r/{category}/about.json", {})
|
||||||
return (
|
return (
|
||||||
data is not None
|
data is not None
|
||||||
and 'data' in data
|
and "data" in data
|
||||||
and data['data'].get('id') is not None
|
and data["data"].get("id") is not None
|
||||||
)
|
)
|
||||||
except Exception:
|
except Exception:
|
||||||
return False
|
return False
|
||||||
@@ -109,25 +113,26 @@ class RedditAPI(BaseConnector):
|
|||||||
def _parse_posts(self, data) -> list[Post]:
|
def _parse_posts(self, data) -> list[Post]:
|
||||||
posts = []
|
posts = []
|
||||||
|
|
||||||
total_num_posts = len(data['data']['children'])
|
total_num_posts = len(data["data"]["children"])
|
||||||
current_index = 0
|
current_index = 0
|
||||||
|
|
||||||
for item in data['data']['children']:
|
for item in data["data"]["children"]:
|
||||||
current_index += 1
|
current_index += 1
|
||||||
logger.debug(f"Parsing post {current_index} of {total_num_posts}")
|
logger.debug(f"Parsing post {current_index} of {total_num_posts}")
|
||||||
|
|
||||||
post_data = item['data']
|
post_data = item["data"]
|
||||||
post = Post(
|
post = Post(
|
||||||
id=post_data['id'],
|
id=post_data["id"],
|
||||||
author=post_data['author'],
|
author=post_data["author"],
|
||||||
title=post_data['title'],
|
title=post_data["title"],
|
||||||
content=post_data.get('selftext', ''),
|
content=post_data.get("selftext", ""),
|
||||||
url=post_data['url'],
|
url=post_data["url"],
|
||||||
timestamp=post_data['created_utc'],
|
timestamp=post_data["created_utc"],
|
||||||
source=self.source_name,
|
source=self.source_name,
|
||||||
comments=self._get_post_comments(post_data['id']))
|
comments=self._get_post_comments(post_data["id"]),
|
||||||
post.subreddit = post_data['subreddit']
|
)
|
||||||
post.upvotes = post_data['ups']
|
post.subreddit = post_data["subreddit"]
|
||||||
|
post.upvotes = post_data["ups"]
|
||||||
|
|
||||||
posts.append(post)
|
posts.append(post)
|
||||||
return posts
|
return posts
|
||||||
@@ -140,56 +145,102 @@ class RedditAPI(BaseConnector):
|
|||||||
if len(data) < 2:
|
if len(data) < 2:
|
||||||
return comments
|
return comments
|
||||||
|
|
||||||
comment_data = data[1]['data']['children']
|
comment_data = data[1]["data"]["children"]
|
||||||
|
|
||||||
def _parse_comment_tree(items, parent_id=None):
|
def _parse_comment_tree(items, parent_id=None):
|
||||||
for item in items:
|
for item in items:
|
||||||
if item['kind'] != 't1':
|
if item["kind"] != "t1":
|
||||||
continue
|
continue
|
||||||
|
|
||||||
comment_info = item['data']
|
comment_info = item["data"]
|
||||||
comment = Comment(
|
comment = Comment(
|
||||||
id=comment_info['id'],
|
id=comment_info["id"],
|
||||||
post_id=post_id,
|
post_id=post_id,
|
||||||
author=comment_info['author'],
|
author=comment_info["author"],
|
||||||
content=comment_info.get('body', ''),
|
content=comment_info.get("body", ""),
|
||||||
timestamp=comment_info['created_utc'],
|
timestamp=comment_info["created_utc"],
|
||||||
reply_to=parent_id or comment_info.get('parent_id', None),
|
reply_to=parent_id or comment_info.get("parent_id", None),
|
||||||
source=self.source_name
|
source=self.source_name,
|
||||||
)
|
)
|
||||||
|
|
||||||
comments.append(comment)
|
comments.append(comment)
|
||||||
|
|
||||||
# Process replies recursively
|
# Process replies recursively
|
||||||
replies = comment_info.get('replies')
|
replies = comment_info.get("replies")
|
||||||
if replies and isinstance(replies, dict):
|
if replies and isinstance(replies, dict):
|
||||||
reply_items = replies.get('data', {}).get('children', [])
|
reply_items = replies.get("data", {}).get("children", [])
|
||||||
_parse_comment_tree(reply_items, parent_id=comment.id)
|
_parse_comment_tree(reply_items, parent_id=comment.id)
|
||||||
|
|
||||||
_parse_comment_tree(comment_data)
|
_parse_comment_tree(comment_data)
|
||||||
return comments
|
return comments
|
||||||
|
|
||||||
def _parse_user(self, data) -> User:
|
def _parse_user(self, data) -> User:
|
||||||
user_data = data['data']
|
user_data = data["data"]
|
||||||
user = User(
|
user = User(username=user_data["name"], created_utc=user_data["created_utc"])
|
||||||
username=user_data['name'],
|
user.karma = user_data["total_karma"]
|
||||||
created_utc=user_data['created_utc'])
|
|
||||||
user.karma = user_data['total_karma']
|
|
||||||
return user
|
return user
|
||||||
|
|
||||||
|
def _get_token(self):
|
||||||
|
if self.token and time.time() < self.token_expiry:
|
||||||
|
return self.token
|
||||||
|
|
||||||
|
logger.info("Fetching new Reddit access token...")
|
||||||
|
|
||||||
|
auth = HTTPBasicAuth(CLIENT_ID, CLIENT_SECRET)
|
||||||
|
|
||||||
|
data = {
|
||||||
|
"grant_type": "client_credentials"
|
||||||
|
}
|
||||||
|
|
||||||
|
headers = {
|
||||||
|
"User-Agent": "python:ethnography-college-project:0.1 (by /u/ThisBirchWood)"
|
||||||
|
}
|
||||||
|
|
||||||
|
response = requests.post(
|
||||||
|
"https://www.reddit.com/api/v1/access_token",
|
||||||
|
auth=auth,
|
||||||
|
data=data,
|
||||||
|
headers=headers,
|
||||||
|
)
|
||||||
|
|
||||||
|
response.raise_for_status()
|
||||||
|
token_json = response.json()
|
||||||
|
|
||||||
|
self.token = token_json["access_token"]
|
||||||
|
self.token_expiry = time.time() + token_json["expires_in"] - 60
|
||||||
|
|
||||||
|
logger.info(
|
||||||
|
f"Obtained new Reddit access token (expires in {token_json['expires_in']}s)"
|
||||||
|
)
|
||||||
|
|
||||||
|
return self.token
|
||||||
|
|
||||||
def _fetch_post_overviews(self, endpoint: str, params: dict) -> dict:
|
def _fetch_post_overviews(self, endpoint: str, params: dict) -> dict:
|
||||||
url = f"{self.url}{endpoint}"
|
url = f"https://oauth.reddit.com/{endpoint.lstrip('/')}"
|
||||||
max_retries = 15
|
max_retries = 15
|
||||||
backoff = 1 # seconds
|
backoff = 1 # seconds
|
||||||
|
|
||||||
for attempt in range(max_retries):
|
for attempt in range(max_retries):
|
||||||
try:
|
try:
|
||||||
response = requests.get(url, headers={'User-agent': 'python:ethnography-college-project:0.1 (by /u/ThisBirchWood)'}, params=params)
|
response = requests.get(
|
||||||
|
url,
|
||||||
|
headers={
|
||||||
|
"User-agent": "python:ethnography-college-project:0.1 (by /u/ThisBirchWood)",
|
||||||
|
"Authorization": f"Bearer {self._get_token()}",
|
||||||
|
},
|
||||||
|
params=params,
|
||||||
|
)
|
||||||
|
|
||||||
if response.status_code == 429:
|
if response.status_code == 429:
|
||||||
wait_time = response.headers.get("Retry-After", backoff)
|
try:
|
||||||
|
wait_time = int(response.headers.get("X-Ratelimit-Reset", backoff))
|
||||||
|
wait_time += 1 # Add a small buffer to ensure the rate limit has reset
|
||||||
|
except ValueError:
|
||||||
|
wait_time = backoff
|
||||||
|
|
||||||
logger.warning(f"Rate limited by Reddit API. Retrying in {wait_time} seconds...")
|
logger.warning(
|
||||||
|
f"Rate limited by Reddit API. Retrying in {wait_time} seconds..."
|
||||||
|
)
|
||||||
|
|
||||||
time.sleep(wait_time)
|
time.sleep(wait_time)
|
||||||
backoff *= 2
|
backoff *= 2
|
||||||
@@ -205,4 +256,4 @@ class RedditAPI(BaseConnector):
|
|||||||
return response.json()
|
return response.json()
|
||||||
except requests.RequestException as e:
|
except requests.RequestException as e:
|
||||||
logger.error(f"Error fetching data from Reddit API: {e}")
|
logger.error(f"Error fetching data from Reddit API: {e}")
|
||||||
return {}
|
return {}
|
||||||
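The token caching that `_get_token` introduces above can be sketched in isolation. This is an illustration under assumptions, not the connector's code: `TokenCache`, `fetch_token`, and `clock` are hypothetical names, and the fetcher is assumed to return a dict with `access_token` and `expires_in` keys, as the token response is read in the diff.

```python
import time
from typing import Callable, Optional


class TokenCache:
    """Caches an access token and refreshes it shortly before it expires."""

    def __init__(
        self,
        fetch_token: Callable[[], dict],
        clock: Callable[[], float] = time.time,
        safety_margin: float = 60.0,
    ):
        # Hypothetical fetcher: returns {"access_token": ..., "expires_in": ...}.
        self._fetch_token = fetch_token
        self._clock = clock
        self._safety_margin = safety_margin
        self._token: Optional[str] = None
        self._expiry = 0.0

    def get(self) -> str:
        # Reuse the cached token while it is still inside its validity window.
        if self._token is not None and self._clock() < self._expiry:
            return self._token
        payload = self._fetch_token()
        self._token = payload["access_token"]
        # Expire early by the safety margin so a token does not lapse mid-request.
        self._expiry = self._clock() + payload["expires_in"] - self._safety_margin
        return self._token
```

Injecting the clock and the fetcher keeps the expiry logic testable without network access; the 60-second margin matches the `- 60` applied to `expires_in` in the diff.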
|
|||||||
@@ -3,6 +3,7 @@ import importlib
|
|||||||
import server.connectors
|
import server.connectors
|
||||||
from server.connectors.base import BaseConnector
|
from server.connectors.base import BaseConnector
|
||||||
|
|
||||||
|
|
||||||
def _discover_connectors() -> list[type[BaseConnector]]:
|
def _discover_connectors() -> list[type[BaseConnector]]:
|
||||||
"""Walk the connectors package and collect all BaseConnector subclasses."""
|
"""Walk the connectors package and collect all BaseConnector subclasses."""
|
||||||
for _, module_name, _ in pkgutil.iter_modules(server.connectors.__path__):
|
for _, module_name, _ in pkgutil.iter_modules(server.connectors.__path__):
|
||||||
@@ -11,20 +12,24 @@ def _discover_connectors() -> list[type[BaseConnector]]:
|
|||||||
importlib.import_module(f"server.connectors.{module_name}")
|
importlib.import_module(f"server.connectors.{module_name}")
|
||||||
|
|
||||||
return [
|
return [
|
||||||
cls for cls in BaseConnector.__subclasses__()
|
cls
|
||||||
|
for cls in BaseConnector.__subclasses__()
|
||||||
if cls.source_name # guard against abstract intermediaries
|
if cls.source_name # guard against abstract intermediaries
|
||||||
]
|
]
|
||||||
|
|
||||||
|
|
||||||
def get_available_connectors() -> dict[str, type[BaseConnector]]:
|
def get_available_connectors() -> dict[str, type[BaseConnector]]:
|
||||||
return {c.source_name: c for c in _discover_connectors() if c.is_available()}
|
return {c.source_name: c for c in _discover_connectors() if c.is_available()}
|
||||||
|
|
||||||
|
|
||||||
def get_connector_metadata() -> dict[str, dict]:
|
def get_connector_metadata() -> dict[str, dict]:
|
||||||
res = {}
|
res = {}
|
||||||
for id, obj in get_available_connectors().items():
|
for id, obj in get_available_connectors().items():
|
||||||
res[id] = {"id": id,
|
res[id] = {
|
||||||
"label": obj.display_name,
|
"id": id,
|
||||||
"search_enabled": obj.search_enabled,
|
"label": obj.display_name,
|
||||||
"categories_enabled": obj.categories_enabled
|
"search_enabled": obj.search_enabled,
|
||||||
}
|
"categories_enabled": obj.categories_enabled,
|
||||||
|
}
|
||||||
|
|
||||||
return res
|
return res
|
||||||
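A self-contained sketch of the subclass-based discovery used by `_discover_connectors` and `get_available_connectors` above. The three connector classes are hypothetical stand-ins, and the `is_available` classmethod is an assumption about how the real `BaseConnector` reports availability; only the `__subclasses__()` filter on `source_name` is taken from the diff.

```python
class BaseConnector:
    # Empty on the base class and on abstract intermediaries.
    source_name: str = ""

    @classmethod
    def is_available(cls) -> bool:
        return True


class ForumConnector(BaseConnector):
    source_name = "forum"


class AbstractPagedConnector(BaseConnector):
    pass  # no source_name, so discovery skips it


class OfflineConnector(BaseConnector):
    source_name = "offline"

    @classmethod
    def is_available(cls) -> bool:
        return False  # stands in for e.g. missing credentials


def discover_connectors() -> list[type[BaseConnector]]:
    # Only direct subclasses with a non-empty source_name are collected.
    return [cls for cls in BaseConnector.__subclasses__() if cls.source_name]


def get_available_connectors() -> dict[str, type[BaseConnector]]:
    return {c.source_name: c for c in discover_connectors() if c.is_available()}
```

One consequence worth noting: `__subclasses__()` returns direct subclasses only, so a concrete connector that inherits from an intermediary such as `AbstractPagedConnector` would not be picked up by this filter.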
|
|||||||
@@ -1,5 +1,6 @@
|
|||||||
import os
|
import os
|
||||||
import datetime
|
import datetime
|
||||||
|
import logging
|
||||||
|
|
||||||
from dotenv import load_dotenv
|
from dotenv import load_dotenv
|
||||||
from googleapiclient.discovery import build
|
from googleapiclient.discovery import build
|
||||||
@@ -9,9 +10,12 @@ from dto.comment import Comment
|
|||||||
from server.connectors.base import BaseConnector
|
from server.connectors.base import BaseConnector
|
||||||
|
|
||||||
load_dotenv()
|
load_dotenv()
|
||||||
|
|
||||||
API_KEY = os.getenv("YOUTUBE_API_KEY")
|
API_KEY = os.getenv("YOUTUBE_API_KEY")
|
||||||
|
|
||||||
|
logger = logging.getLogger(__name__)
|
||||||
|
logger.setLevel(logging.INFO)
|
||||||
|
|
||||||
|
|
||||||
class YouTubeAPI(BaseConnector):
|
class YouTubeAPI(BaseConnector):
|
||||||
source_name: str = "youtube"
|
source_name: str = "youtube"
|
||||||
display_name: str = "YouTube"
|
display_name: str = "YouTube"
|
||||||
@@ -19,73 +23,91 @@ class YouTubeAPI(BaseConnector):
|
|||||||
categories_enabled: bool = False
|
categories_enabled: bool = False
|
||||||
|
|
||||||
def __init__(self):
|
def __init__(self):
|
||||||
self.youtube = build('youtube', 'v3', developerKey=API_KEY)
|
self.youtube = build("youtube", "v3", developerKey=API_KEY)
|
||||||
|
|
||||||
def get_new_posts_by_search(self,
|
def get_new_posts_by_search(
|
||||||
search: str,
|
self, search: str, category: str, post_limit: int
|
||||||
category: str,
|
) -> list[Post]:
|
||||||
post_limit: int
|
videos = self._search_videos(search, post_limit)
|
||||||
) -> list[Post]:
|
posts = []
|
||||||
videos = self._search_videos(search, post_limit)
|
|
||||||
posts = []
|
|
||||||
|
|
||||||
for video in videos:
|
for video in videos:
|
||||||
video_id = video['id']['videoId']
|
video_id = video["id"]["videoId"]
|
||||||
snippet = video['snippet']
|
snippet = video["snippet"]
|
||||||
title = snippet['title']
|
title = snippet["title"]
|
||||||
description = snippet['description']
|
description = snippet["description"]
|
||||||
published_at = datetime.datetime.strptime(snippet['publishedAt'], "%Y-%m-%dT%H:%M:%SZ").timestamp()
|
published_at = datetime.datetime.strptime(
|
||||||
channel_title = snippet['channelTitle']
|
snippet["publishedAt"], "%Y-%m-%dT%H:%M:%SZ"
|
||||||
|
).timestamp()
|
||||||
|
channel_title = snippet["channelTitle"]
|
||||||
|
|
||||||
comments = []
|
comments = []
|
||||||
comments_data = self._get_video_comments(video_id)
|
comments_data = self._get_video_comments(video_id)
|
||||||
for comment_thread in comments_data:
|
for comment_thread in comments_data:
|
||||||
comment_snippet = comment_thread['snippet']['topLevelComment']['snippet']
|
comment_snippet = comment_thread["snippet"]["topLevelComment"][
|
||||||
comment = Comment(
|
"snippet"
|
||||||
id=comment_thread['id'],
|
]
|
||||||
post_id=video_id,
|
comment = Comment(
|
||||||
content=comment_snippet['textDisplay'],
|
id=comment_thread["id"],
|
||||||
author=comment_snippet['authorDisplayName'],
|
post_id=video_id,
|
||||||
timestamp=datetime.datetime.strptime(comment_snippet['publishedAt'], "%Y-%m-%dT%H:%M:%SZ").timestamp(),
|
content=comment_snippet["textDisplay"],
|
||||||
reply_to=None,
|
author=comment_snippet["authorDisplayName"],
|
||||||
source=self.source_name
|
timestamp=datetime.datetime.strptime(
|
||||||
)
|
comment_snippet["publishedAt"], "%Y-%m-%dT%H:%M:%SZ"
|
||||||
|
).timestamp(),
|
||||||
comments.append(comment)
|
reply_to=None,
|
||||||
|
|
||||||
post = Post(
|
|
||||||
id=video_id,
|
|
||||||
content=f"{title}\n\n{description}",
|
|
||||||
author=channel_title,
|
|
||||||
timestamp=published_at,
|
|
||||||
url=f"https://www.youtube.com/watch?v={video_id}",
|
|
||||||
title=title,
|
|
||||||
source=self.source_name,
|
source=self.source_name,
|
||||||
comments=comments
|
|
||||||
)
|
)
|
||||||
|
|
||||||
posts.append(post)
|
comments.append(comment)
|
||||||
|
|
||||||
|
post = Post(
|
||||||
|
id=video_id,
|
||||||
|
content=f"{title}\n\n{description}",
|
||||||
|
author=channel_title,
|
||||||
|
timestamp=published_at,
|
||||||
|
url=f"https://www.youtube.com/watch?v={video_id}",
|
||||||
|
title=title,
|
||||||
|
source=self.source_name,
|
||||||
|
comments=comments,
|
||||||
|
)
|
||||||
|
|
||||||
|
posts.append(post)
|
||||||
|
|
||||||
|
return posts
|
||||||
|
|
||||||
return posts
|
|
||||||
|
|
||||||
def category_exists(self, category):
|
def category_exists(self, category):
|
||||||
return True
|
return True
|
||||||
|
|
||||||
def _search_videos(self, query, limit):
|
def _search_videos(self, query, limit):
|
||||||
request = self.youtube.search().list(
|
results = []
|
||||||
q=query,
|
next_page_token = None
|
||||||
part='snippet',
|
|
||||||
type='video',
|
while len(results) < limit:
|
||||||
maxResults=limit
|
batch_size = min(50, limit - len(results))
|
||||||
)
|
|
||||||
response = request.execute()
|
request = self.youtube.search().list(
|
||||||
return response.get('items', [])
|
q=query,
|
||||||
|
part="snippet",
|
||||||
|
type="video",
|
||||||
|
maxResults=batch_size,
|
||||||
|
pageToken=next_page_token
|
||||||
|
)
|
||||||
|
|
||||||
|
response = request.execute()
|
||||||
|
results.extend(response.get("items", []))
|
||||||
|
logger.info(f"Fetched {len(results)} out of {limit} videos for query '{query}'")
|
||||||
|
|
||||||
|
next_page_token = response.get("nextPageToken")
|
||||||
|
if not next_page_token:
|
||||||
|
logger.warning(f"No more pages of results available for query '{query}'")
|
||||||
|
break
|
||||||
|
|
||||||
|
return results[:limit]
|
||||||
|
|
||||||
def _get_video_comments(self, video_id):
|
def _get_video_comments(self, video_id):
|
||||||
request = self.youtube.commentThreads().list(
|
request = self.youtube.commentThreads().list(
|
||||||
part='snippet',
|
part="snippet", videoId=video_id, textFormat="plainText"
|
||||||
videoId=video_id,
|
|
||||||
textFormat='plainText'
|
|
||||||
)
|
)
|
||||||
|
|
||||||
try:
|
try:
|
||||||
@@ -93,4 +115,4 @@ class YouTubeAPI(BaseConnector):
|
|||||||
except HttpError as e:
|
except HttpError as e:
|
||||||
print(f"Error fetching comments for video {video_id}: {e}")
|
print(f"Error fetching comments for video {video_id}: {e}")
|
||||||
return []
|
return []
|
||||||
return response.get('items', [])
|
return response.get("items", [])
|
||||||
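The page-token loop added to `_search_videos` above follows a general pattern that can be sketched without the YouTube client. Here `collect_paged` and `fetch_page` are hypothetical names; `fetch_page` is assumed to return a dict with an `items` list and an optional `nextPageToken`, as the search response is read in the diff. The extra stop on an empty page is an addition not present in the diff.

```python
from typing import Callable, Optional


def collect_paged(
    fetch_page: Callable[[int, Optional[str]], dict],
    limit: int,
    max_page_size: int = 50,
) -> list:
    """Gather up to `limit` items by following `nextPageToken` values."""
    results: list = []
    page_token: Optional[str] = None
    while len(results) < limit:
        # Never request more than the per-page cap or more than is still needed.
        batch_size = min(max_page_size, limit - len(results))
        response = fetch_page(batch_size, page_token)
        items = response.get("items", [])
        results.extend(items)
        page_token = response.get("nextPageToken")
        # Stop on the last page, or on an empty page, to avoid looping forever.
        if not page_token or not items:
            break
    return results[:limit]
```

With the real client, `fetch_page` would wrap the `search().list(...).execute()` call and pass `batch_size` as `maxResults` and `page_token` as `pageToken`.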
|
|||||||
@@ -5,6 +5,7 @@ from flask_bcrypt import Bcrypt
|
|||||||
|
|
||||||
EMAIL_REGEX = re.compile(r"[^@]+@[^@]+\.[^@]+")
|
EMAIL_REGEX = re.compile(r"[^@]+@[^@]+\.[^@]+")
|
||||||
|
|
||||||
|
|
||||||
class AuthManager:
|
class AuthManager:
|
||||||
def __init__(self, db: PostgresConnector, bcrypt: Bcrypt):
|
def __init__(self, db: PostgresConnector, bcrypt: Bcrypt):
|
||||||
self.db = db
|
self.db = db
|
||||||
@@ -24,13 +25,13 @@ class AuthManager:
|
|||||||
|
|
||||||
if len(username) < 3:
|
if len(username) < 3:
|
||||||
raise ValueError("Username must be at least 3 characters long")
|
raise ValueError("Username must be at least 3 characters long")
|
||||||
|
|
||||||
if not EMAIL_REGEX.fullmatch(email):
|
if not EMAIL_REGEX.fullmatch(email):
|
||||||
raise ValueError("Please enter a valid email address")
|
raise ValueError("Please enter a valid email address")
|
||||||
|
|
||||||
if self.get_user_by_email(email):
|
if self.get_user_by_email(email):
|
||||||
raise ValueError("Email already registered")
|
raise ValueError("Email already registered")
|
||||||
|
|
||||||
if self.get_user_by_username(username):
|
if self.get_user_by_username(username):
|
||||||
raise ValueError("Username already taken")
|
raise ValueError("Username already taken")
|
||||||
|
|
||||||
@@ -38,20 +39,22 @@ class AuthManager:
|
|||||||
|
|
||||||
def authenticate_user(self, username, password):
|
def authenticate_user(self, username, password):
|
||||||
user = self.get_user_by_username(username)
|
user = self.get_user_by_username(username)
|
||||||
if user and self.bcrypt.check_password_hash(user['password_hash'], password):
|
if user and self.bcrypt.check_password_hash(user["password_hash"], password):
|
||||||
return user
|
return user
|
||||||
return None
|
return None
|
||||||
|
|
||||||
def get_user_by_id(self, user_id):
|
def get_user_by_id(self, user_id):
|
||||||
query = "SELECT id, username, email FROM users WHERE id = %s"
|
query = "SELECT id, username, email FROM users WHERE id = %s"
|
||||||
result = self.db.execute(query, (user_id,), fetch=True)
|
result = self.db.execute(query, (user_id,), fetch=True)
|
||||||
return result[0] if result else None
|
return result[0] if result else None
|
||||||
|
|
||||||
def get_user_by_username(self, username) -> dict:
|
def get_user_by_username(self, username) -> dict:
|
||||||
query = "SELECT id, username, email, password_hash FROM users WHERE username = %s"
|
query = (
|
||||||
|
"SELECT id, username, email, password_hash FROM users WHERE username = %s"
|
||||||
|
)
|
||||||
result = self.db.execute(query, (username,), fetch=True)
|
result = self.db.execute(query, (username,), fetch=True)
|
||||||
return result[0] if result else None
|
return result[0] if result else None
|
||||||
|
|
||||||
def get_user_by_email(self, email) -> dict:
|
def get_user_by_email(self, email) -> dict:
|
||||||
query = "SELECT id, username, email, password_hash FROM users WHERE email = %s"
|
query = "SELECT id, username, email, password_hash FROM users WHERE email = %s"
|
||||||
result = self.db.execute(query, (email,), fetch=True)
|
result = self.db.execute(query, (email,), fetch=True)
|
||||||
|
|||||||
@@ -3,6 +3,7 @@ from server.db.database import PostgresConnector
|
|||||||
from psycopg2.extras import Json
|
from psycopg2.extras import Json
|
||||||
from server.exceptions import NonExistentDatasetException
|
from server.exceptions import NonExistentDatasetException
|
||||||
|
|
||||||
|
|
||||||
class DatasetManager:
|
class DatasetManager:
|
||||||
def __init__(self, db: PostgresConnector):
|
def __init__(self, db: PostgresConnector):
|
||||||
self.db = db
|
self.db = db
|
||||||
@@ -15,18 +16,45 @@ class DatasetManager:
|
|||||||
|
|
||||||
if dataset_info.get("user_id") != user_id:
|
if dataset_info.get("user_id") != user_id:
|
||||||
return False
|
return False
|
||||||
|
|
||||||
return True
|
return True
|
||||||
|
|
||||||
def get_user_datasets(self, user_id: int) -> list[dict]:
|
def get_user_datasets(self, user_id: int) -> list[dict]:
|
||||||
query = "SELECT * FROM datasets WHERE user_id = %s"
|
query = "SELECT * FROM datasets WHERE user_id = %s"
|
||||||
return self.db.execute(query, (user_id, ), fetch=True)
|
return self.db.execute(query, (user_id,), fetch=True)
|
||||||
|
|
||||||
def get_dataset_content(self, dataset_id: int) -> pd.DataFrame:
|
def get_dataset_content(self, dataset_id: int) -> pd.DataFrame:
|
||||||
query = "SELECT * FROM events WHERE dataset_id = %s"
|
query = "SELECT * FROM events WHERE dataset_id = %s"
|
||||||
result = self.db.execute(query, (dataset_id,), fetch=True)
|
result = self.db.execute(query, (dataset_id,), fetch=True)
|
||||||
return pd.DataFrame(result)
|
df = pd.DataFrame(result)
|
||||||
|
if df.empty:
|
||||||
|
return df
|
||||||
|
|
||||||
|
dedupe_columns = [
|
||||||
|
column
|
||||||
|
for column in [
|
||||||
|
"post_id",
|
||||||
|
"parent_id",
|
||||||
|
"reply_to",
|
||||||
|
"author",
|
||||||
|
"type",
|
||||||
|
"timestamp",
|
||||||
|
"dt",
|
||||||
|
"title",
|
||||||
|
"content",
|
||||||
|
"source",
|
||||||
|
"topic",
|
||||||
|
]
|
||||||
|
if column in df.columns
|
||||||
|
]
|
||||||
|
|
||||||
|
if dedupe_columns:
|
||||||
|
df = df.drop_duplicates(subset=dedupe_columns, keep="first")
|
||||||
|
else:
|
||||||
|
df = df.drop_duplicates(keep="first")
|
||||||
|
|
||||||
|
return df.reset_index(drop=True)
|
||||||
|
|
||||||
def get_dataset_info(self, dataset_id: int) -> dict:
|
def get_dataset_info(self, dataset_id: int) -> dict:
|
||||||
query = "SELECT * FROM datasets WHERE id = %s"
|
query = "SELECT * FROM datasets WHERE id = %s"
|
||||||
result = self.db.execute(query, (dataset_id,), fetch=True)
|
result = self.db.execute(query, (dataset_id,), fetch=True)
|
||||||
@@ -35,20 +63,32 @@ class DatasetManager:
|
|||||||
raise NonExistentDatasetException(f"Dataset {dataset_id} does not exist")
|
raise NonExistentDatasetException(f"Dataset {dataset_id} does not exist")
|
||||||
|
|
||||||
return result[0]
|
return result[0]
|
||||||
|
|
||||||
def save_dataset_info(self, user_id: int, dataset_name: str, topics: dict) -> int:
|
def save_dataset_info(self, user_id: int, dataset_name: str, topics: dict) -> int:
|
||||||
query = """
|
query = """
|
||||||
INSERT INTO datasets (user_id, name, topics)
|
INSERT INTO datasets (user_id, name, topics)
|
||||||
VALUES (%s, %s, %s)
|
VALUES (%s, %s, %s)
|
||||||
RETURNING id
|
RETURNING id
|
||||||
"""
|
"""
|
||||||
result = self.db.execute(query, (user_id, dataset_name, Json(topics)), fetch=True)
|
result = self.db.execute(
|
||||||
|
query, (user_id, dataset_name, Json(topics)), fetch=True
|
||||||
|
)
|
||||||
return result[0]["id"] if result else None
|
return result[0]["id"] if result else None
|
||||||
|
|
||||||
def save_dataset_content(self, dataset_id: int, event_data: pd.DataFrame):
|
def save_dataset_content(self, dataset_id: int, event_data: pd.DataFrame):
|
||||||
if event_data.empty:
|
if event_data.empty:
|
||||||
return
|
return
|
||||||
|
|
||||||
|
dedupe_columns = [
|
||||||
|
column for column in ["id", "type", "source"] if column in event_data.columns
|
||||||
|
]
|
||||||
|
if dedupe_columns:
|
||||||
|
event_data = event_data.drop_duplicates(subset=dedupe_columns, keep="first")
|
||||||
|
else:
|
||||||
|
event_data = event_data.drop_duplicates(keep="first")
|
||||||
|
|
||||||
|
self.delete_dataset_content(dataset_id)
|
||||||
|
|
||||||
query = """
|
query = """
|
||||||
INSERT INTO events (
|
INSERT INTO events (
|
||||||
dataset_id,
|
dataset_id,
|
||||||
@@ -113,7 +153,9 @@ class DatasetManager:
 
         self.db.execute_batch(query, values)
 
-    def set_dataset_status(self, dataset_id: int, status: str, status_message: str | None = None):
+    def set_dataset_status(
+        self, dataset_id: int, status: str, status_message: str | None = None
+    ):
         if status not in ["fetching", "processing", "complete", "error"]:
             raise ValueError("Invalid status")
 
@@ -137,24 +179,24 @@ class DatasetManager:
             WHERE id = %s
         """
 
-        result = self.db.execute(query, (dataset_id, ), fetch=True)
+        result = self.db.execute(query, (dataset_id,), fetch=True)
 
         if not result:
             print(result)
             raise NonExistentDatasetException(f"Dataset {dataset_id} does not exist")
 
         return result[0]
 
     def update_dataset_name(self, dataset_id: int, new_name: str):
         query = "UPDATE datasets SET name = %s WHERE id = %s"
         self.db.execute(query, (new_name, dataset_id))
 
     def delete_dataset_info(self, dataset_id: int):
         query = "DELETE FROM datasets WHERE id = %s"
 
-        self.db.execute(query, (dataset_id, ))
+        self.db.execute(query, (dataset_id,))
 
     def delete_dataset_content(self, dataset_id: int):
         query = "DELETE FROM events WHERE dataset_id = %s"
 
-        self.db.execute(query, (dataset_id, ))
+        self.db.execute(query, (dataset_id,))
@@ -1,8 +1,17 @@
 import os
 import psycopg2
+import os
+from dotenv import load_dotenv
 from psycopg2.extras import RealDictCursor
 from psycopg2.extras import execute_batch
 
+load_dotenv()
+postgres_host = os.getenv("POSTGRES_HOST", "localhost")
+postgres_port = os.getenv("POSTGRES_PORT", 5432)
+postgres_user = os.getenv("POSTGRES_USER", "postgres")
+postgres_password = os.getenv("POSTGRES_PASSWORD", "postgres")
+postgres_db = os.getenv("POSTGRES_DB", "postgres")
+
 from server.exceptions import DatabaseNotConfiguredException
 
 
@@ -15,15 +24,17 @@ class PostgresConnector:
 
         try:
             self.connection = psycopg2.connect(
-                host=os.getenv("POSTGRES_HOST", "localhost"),
-                port=os.getenv("POSTGRES_PORT", 5432),
-                user=os.getenv("POSTGRES_USER", "postgres"),
-                password=os.getenv("POSTGRES_PASSWORD", "postgres"),
-                database=os.getenv("POSTGRES_DB", "postgres"),
+                host=postgres_host,
+                port=postgres_port,
+                user=postgres_user,
+                password=postgres_password,
+                database=postgres_db,
             )
         except psycopg2.OperationalError as e:
-            raise DatabaseNotConfiguredException(f"Ensure database is up and running: {e}")
+            raise DatabaseNotConfiguredException(
+                f"Ensure database is up and running: {e}"
+            )
 
         self.connection.autocommit = False
 
     def execute(self, query, params=None, fetch=False) -> list:
@@ -48,4 +59,4 @@ class PostgresConnector:
 
     def close(self):
         if self.connection:
             self.connection.close()
@@ -1,16 +1,23 @@
 from celery import Celery
+from dotenv import load_dotenv
+from server.utils import get_env
+
+load_dotenv()
+REDIS_URL = get_env("REDIS_URL")
+
+
 def create_celery():
     celery = Celery(
         "ethnograph",
-        broker="redis://redis:6379/0",
-        backend="redis://redis:6379/0",
+        broker=REDIS_URL,
+        backend=REDIS_URL,
     )
     celery.conf.task_serializer = "json"
     celery.conf.result_serializer = "json"
     celery.conf.accept_content = ["json"]
     return celery
 
 
 celery = create_celery()
 
 from server.queue import tasks
@@ -1,3 +1,5 @@
+from time import time
+
 import pandas as pd
 import logging
 
@@ -9,6 +11,7 @@ from server.connectors.registry import get_available_connectors
 
 logger = logging.getLogger(__name__)
 
+
 @celery.task(bind=True, max_retries=3)
 def process_dataset(self, dataset_id: int, posts: list, topics: dict):
     db = PostgresConnector()
@@ -17,19 +20,27 @@ def process_dataset(self, dataset_id: int, posts: list, topics: dict):
     try:
         df = pd.DataFrame(posts)
 
+        dataset_manager.set_dataset_status(
+            dataset_id, "processing", "NLP Processing Started"
+        )
+
         processor = DatasetEnrichment(df, topics)
         enriched_df = processor.enrich()
 
         dataset_manager.save_dataset_content(dataset_id, enriched_df)
-        dataset_manager.set_dataset_status(dataset_id, "complete", "NLP Processing Completed Successfully")
+        dataset_manager.set_dataset_status(
+            dataset_id, "complete", "NLP Processing Completed Successfully"
+        )
     except Exception as e:
-        dataset_manager.set_dataset_status(dataset_id, "error", f"An error occurred: {e}")
+        dataset_manager.set_dataset_status(
+            dataset_id, "error", f"An error occurred: {e}"
+        )
 
+
 @celery.task(bind=True, max_retries=3)
-def fetch_and_process_dataset(self,
-                              dataset_id: int,
-                              source_info: list[dict],
-                              topics: dict):
+def fetch_and_process_dataset(
+    self, dataset_id: int, source_info: list[dict], topics: dict
+):
     connectors = get_available_connectors()
     db = PostgresConnector()
     dataset_manager = DatasetManager(db)
@@ -37,6 +48,7 @@ def fetch_and_process_dataset(self,
 
     try:
         for metadata in source_info:
+            fetch_start = time()
             name = metadata["name"]
             search = metadata.get("search")
             category = metadata.get("category")
@@ -44,18 +56,29 @@ def fetch_and_process_dataset(self,
 
             connector = connectors[name]()
             raw_posts = connector.get_new_posts_by_search(
-                search=search,
-                category=category,
-                post_limit=limit
+                search=search, category=category, post_limit=limit
             )
             posts.extend(post.to_dict() for post in raw_posts)
 
+        fetch_time = time() - fetch_start
         df = pd.DataFrame(posts)
 
+        nlp_start = time()
+
+        dataset_manager.set_dataset_status(
+            dataset_id, "processing", "NLP Processing Started"
+        )
+
         processor = DatasetEnrichment(df, topics)
         enriched_df = processor.enrich()
 
+        nlp_time = time() - nlp_start
+
         dataset_manager.save_dataset_content(dataset_id, enriched_df)
-        dataset_manager.set_dataset_status(dataset_id, "complete", "NLP Processing Completed Successfully")
+        dataset_manager.set_dataset_status(
+            dataset_id, "complete", f"Completed Successfully. Fetch time: {fetch_time:.2f}s, NLP time: {nlp_time:.2f}s"
+        )
     except Exception as e:
-        dataset_manager.set_dataset_status(dataset_id, "error", f"An error occurred: {e}")
+        dataset_manager.set_dataset_status(
+            dataset_id, "error", f"An error occurred: {e}"
+        )