Commit Graph

367 Commits

Author SHA1 Message Date
6378015726 fix(stats): remove duplicated entries in corpus explorer 2026-04-01 00:22:29 +01:00
430793cd09 feat(frontend): add "show more" functionality to corpus explorer 2026-04-01 00:09:20 +01:00
b270ed03ae feat(frontend): implement corpus explorer
This allows you to view the posts & comments associated with a specific aggregate.
2026-04-01 00:04:25 +01:00
1dde5f7b08 fix(nlp): fix missing processing dataset status update 2026-03-31 20:59:09 +01:00
a841c6f6a1 perf(stats): memoize derived state and reduce intermediate allocations 2026-03-31 20:15:07 +01:00
2045ccebb5 build(docker): update CMD to include host binding 2026-03-31 19:31:58 +01:00
efb4c8384d chore(stats): remove average_thread_depth 2026-03-31 16:40:54 +01:00
75fd042d74 feat(api): add support for custom topic lists when autoscraping 2026-03-31 13:36:37 +01:00
e776ef53ac refactor(database): configurable database source 2026-03-29 21:30:18 +01:00
f996b38fa5 fix(report): remove unicode char 2026-03-25 19:46:29 +00:00
6d8ae3e811 docs: add section on Topic Modelling in NLP 2026-03-25 19:44:14 +00:00
376773a0cc style: run python linter & prettifier on backend code 2026-03-25 19:34:43 +00:00
aae10c4d9d style: run prettifier plugin on entire frontend 2026-03-25 19:30:21 +00:00
8730af146d chore: remove main.py
Not used anymore.
2026-03-22 14:41:47 +00:00
7716ee0bff build(env): extract Redis URL into env file
This could allow one to connect to a remote Redis instance on a machine with a powerful GPU, so the NLP work can be offloaded.
2026-03-22 14:41:15 +00:00
97e897c240 fix(analysis): broken entity handling in cultural endpoint 2026-03-22 14:34:05 +00:00
c3762f189c build(docker): comment out GPU deployment configuration from worker service
While this works for NVIDIA GPUs, it breaks on a MacBook or any non-NVIDIA machine. I commented it out rather than deleting it because it's still useful on NVIDIA machines.
2026-03-22 13:34:51 +00:00
078716754c feat(report): add main.tex for project documentation and analysis 2026-03-21 23:54:42 +00:00
e43eae5afd fix(frontend): missing "fetching" status from auto-scrape
When auto-scraping, the dataset status page would say "Dataset Ready" when it was still fetching.
2026-03-21 22:49:16 +00:00
b537b5ef16 docs: update .gitignore 2026-03-21 19:24:51 +00:00
acc591ff1e Merge pull request 'Finish off the links between frontend and backend' (#10) from feat/add-frontend-pages into main
Reviewed-on: #10
2026-03-18 20:30:19 +00:00
e054997bb1 feat(frontend): reword CulturalStats to improve understandability 2026-03-18 19:23:35 +00:00
e5414befa7 feat(frontend): add dominant emotion display to UserModal 2026-03-18 19:12:25 +00:00
86926898ce feat(frontend): improve labels to be more understandable 2026-03-18 19:12:11 +00:00
b1177540a1 feat(frontend): enhance EmotionalStats component with detailed mood analysis 2026-03-18 19:11:18 +00:00
f604fcc531 feat(frontend): add warning message for scraping limits 2026-03-18 19:02:11 +00:00
b7aec2b0ea feat(frontend): add favicon
Credit goes to `srip` on flaticon for the image.
2026-03-18 19:00:31 +00:00
1446dd176d feat(frontend): center page selection 2026-03-18 18:53:14 +00:00
c215024ef2 feat(frontend): add deleted user filter
Reddit often contains "[Deleted]" when a user is banned or deletes their post/comment. Keeping the backend faithful to the original dataset is important, so the filtering is done on the frontend.
2026-03-18 18:50:51 +00:00
17ef42e548 feat!(frontend): add cultural, interactional and linguistic stat pages 2026-03-18 18:43:49 +00:00
7e4a91bb5e style(frontend): style api types to be in order of the endpoint 2026-03-18 18:40:39 +00:00
436549641f chore(frontend): add api types for new backend data 2026-03-18 18:37:39 +00:00
3e78a54388 feat(stat): add conversation concentration metric
Remove the old `initiator_ratio` metric, which wasn't working due to every event having a `reply_to` value.

This metric was suggested by AI and gave surprisingly interesting insights.
2026-03-18 18:36:09 +00:00
71998c450e fix(db): change title type to text
Occasionally a Reddit post would have a title too long for the schema, which caused a failure.
2026-03-17 19:49:03 +00:00
2a00384a55 feat(interaction): add top interaction pairs and initiator ratio methods 2026-03-17 19:03:56 +00:00
8372aa7278 feat(api): add endpoint to view entire dataset 2026-03-17 13:36:41 +00:00
7b5a939271 fix(stats): missing private methods in User obj 2026-03-17 13:36:10 +00:00
2fa1dff4b7 feat(stat): add lexical diversity stat 2026-03-17 13:27:49 +00:00
31fb275ee3 fix(db): incorrect NER column being inserted 2026-03-17 12:53:30 +00:00
8a0f6e71e8 chore(api): rename cultural entity emotion endpoint 2026-03-17 12:31:53 +00:00
9093059d05 refactor(stats): move user stats out of interactional into users 2026-03-17 12:23:03 +00:00
8a13444b16 chore(frontend): add new API types 2026-03-16 16:46:07 +00:00
3468fdc2ea feat(api): add new user and linguistic endpoints 2026-03-16 16:45:11 +00:00
09a4f9036f refactor(stats): add summary and user stat classes for consistency 2026-03-16 16:43:24 +00:00
97fccd073b feat(emotional): add average emotion & dominant emotion stats 2026-03-16 16:41:28 +00:00
94befb61c5 Merge pull request 'Automatic Scraping of dataset options' (#9) from feat/automatic-scraping-datasets into main
Reviewed-on: #9
2026-03-14 21:58:49 +00:00
12f5953146 fix(api): remove error exceptions in API responses
Mainly a security measure: we don't want actual code errors returned in the API response, as someone could use them to learn about the inner workings of the code.
2026-03-14 21:58:00 +00:00
5b0441c34b fix(connector): unnecessary comment limits
In addition, I made some methods private to better align with the BaseConnector parent class.
2026-03-14 21:53:13 +00:00
d2b919cd66 fix(api): enforce integer limit and cap at 1000 in scrape_data function 2026-03-14 17:35:05 +00:00
062937ec3c fix(api): incorrect validation on search 2026-03-14 17:12:02 +00:00