Commit Graph

413 Commits

SHA1 Message Date
162a4de64e fix(frontend): detects which sources support category or search 2026-03-12 10:07:28 +00:00
6684780d23 fix(connectors): add stronger validation to scrape endpoint
Strong validation is needed; otherwise invalid data is passed to Celery and the task crashes silently. In addition, it checks whether the selected source supports search or category.
2026-03-12 09:59:07 +00:00
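The function below is a minimal sketch of the kind of check described above. It assumes that the endpoint receives a JSON payload with `source`, `search`, `category` and `limit` keys, and that each source advertises its capabilities as a dict of booleans. The function name `validate_scrape_request`, the payload shape and the default limit are hypothetical, not the project's actual code. Because the request is rejected up front, nothing malformed is ever queued for Celery.

```python
def validate_scrape_request(payload: dict, capabilities: dict) -> list:
    """Return a list of validation errors; an empty list means the request is valid.

    `capabilities` maps a source name to flags such as
    {"search": True, "category": False} (hypothetical shape).
    """
    source = payload.get("source")
    if not isinstance(source, str) or source not in capabilities:
        # Unknown sources are rejected before any other check runs.
        return [f"unknown source: {source!r}"]

    errors = []
    supported = capabilities[source]
    for field in ("search", "category"):
        value = payload.get(field)
        if value is None:
            continue  # optional field left out
        if not isinstance(value, str) or not value.strip():
            errors.append(f"{field} must be a non-empty string")
        elif not supported.get(field, False):
            errors.append(f"source {source!r} does not support {field}")

    limit = payload.get("limit", 100)  # hypothetical default
    # bool is a subclass of int, so exclude it explicitly.
    if isinstance(limit, bool) or not isinstance(limit, int) or limit <= 0:
        errors.append("limit must be a positive integer")
    return errors
```

A Flask view would return a 400 response whenever the list is non-empty, and only dispatch the Celery task when it is empty.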
c12f1b4371 chore(connectors): add category and search validation fields 2026-03-12 09:56:34 +00:00
01d6bd0164 fix(connectors): category / search fields breaking
Ideally, category and search would be fully optional; however, some sites break if one or the other is not provided.

Unfortunately, `boards.ie` uses a different page type for searches, and I'm not inclined to implement a scraper for it from scratch.

In addition, the comment limit options have been removed.
2026-03-11 21:16:26 +00:00
12cbc24074 chore(utils): remove split_limit function 2026-03-11 19:47:44 +00:00
0658713f42 chore: remove unused dataset creation script 2026-03-11 19:44:38 +00:00
b2ae1a9f70 feat(frontend): add page for scraping endpoint 2026-03-11 19:41:34 +00:00
eff416c34e fix(connectors): hardcoded source name in Youtube connector 2026-03-10 23:36:09 +00:00
524c9c50a0 fix(api): incorrect dataset status update message 2026-03-10 23:28:21 +00:00
2ab74d922a feat(api): support per-source search, category and limit configuration 2026-03-10 23:15:33 +00:00
d520e2af98 fix(auth): missing email and username business rules 2026-03-10 22:48:04 +00:00
8fe84a30f6 fix: data leak when opening topics file 2026-03-10 22:45:07 +00:00
dc330b87b9 fix(celery): process dataset directly in fetch task
Calling the original `process_dataset` function led to issues with JSON serialisation.
2026-03-10 22:17:00 +00:00
7ccc934f71 build: change celery to debug mode 2026-03-10 22:14:45 +00:00
a3dbe04a57 fix(frontend): option to delete dataset not shown after fail 2026-03-10 19:23:48 +00:00
a65c4a461c fix(api): flask delegates dataset fetch to celery 2026-03-10 19:17:41 +00:00
15704a0782 chore(db): update db schema to include "fetching" status 2026-03-10 19:17:08 +00:00
6ec47256d0 feat(api): add database scraping endpoints 2026-03-10 19:04:33 +00:00
2572664e26 chore(utils): add env getter that fails if env not found 2026-03-10 18:50:53 +00:00
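The getter below is a minimal sketch of such a utility. It assumes the value is read from `os.environ` and that a missing variable raises instead of returning `None`. The name `require_env` is hypothetical. Failing at startup makes a misconfigured deployment obvious immediately, instead of letting it surface later as a stray `None`.

```python
import os


def require_env(name: str) -> str:
    """Return the environment variable `name`, raising if it is not set."""
    value = os.environ.get(name)
    if value is None:
        raise RuntimeError(f"required environment variable {name!r} is not set")
    return value
```

Typical usage would be a module-level call such as `DATABASE_URL = require_env("DATABASE_URL")` (variable name illustrative only).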
17bd4702b2 fix(connectors): connector detectors returning name of ID alongside connector obj 2026-03-10 18:36:40 +00:00
53cb5c2ea5 feat(topics): add generalised topic list
This is easier and quicker compared to deriving a topics list based on the dataset that has been scraped.

While using LLMs to create a personalised topic list based on the query, category or dataset itself would yield better results in most cases, it is beyond the scope of this project.
2026-03-10 18:36:08 +00:00
0866dda8b3 chore: add util to always split evenly 2026-03-10 18:25:05 +00:00
5ccb2e73cd fix(connectors): incorrect registry location
Registry paths were pointing to the wrong connector locations.
2026-03-10 18:18:42 +00:00
2a8d7c7972 refactor(connectors): Youtube & Reddit connectors implement BaseConnector 2026-03-10 18:11:33 +00:00
e7a8c17be4 chore(connectors): add base connector inheritance 2026-03-10 18:08:01 +00:00
cc799f7368 feat(connectors): add base connector and registry for detection
The idea is to have a plugin-style system, where new connectors extend the `BaseConnector` class and implement the fetch posts method.

These are detected automatically by the registry and exposed through new Flask endpoints that list the available sources.

This allows for an open-ended system where new data scrapers and API consumers can be added dynamically.
2026-03-09 21:29:03 +00:00
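The code below is a minimal sketch of the plugin idea. It assumes connectors are discovered by walking `BaseConnector.__subclasses__()` once their modules have been imported; the actual registry may instead scan connector module paths. The method name `fetch_posts`, the `source_name` attribute, the capability flags and `discover_connectors` are all hypothetical.

```python
from abc import ABC, abstractmethod


class BaseConnector(ABC):
    """Base class every connector extends (hypothetical attribute names)."""

    source_name: str = ""
    supports_search: bool = False
    supports_category: bool = False

    @abstractmethod
    def fetch_posts(self, search=None, category=None, limit=100):
        """Return a list of post dicts from the source."""


def discover_connectors():
    """Map each concrete connector's source name to its class."""
    registry = {}
    pending = list(BaseConnector.__subclasses__())
    while pending:
        cls = pending.pop()
        pending.extend(cls.__subclasses__())  # include indirect subclasses
        if not cls.__abstractmethods__:  # skip classes that are still abstract
            registry[cls.source_name] = cls
    return registry


class ExampleConnector(BaseConnector):
    """Toy connector standing in for a real Reddit or YouTube connector."""

    source_name = "example"
    supports_search = True

    def fetch_posts(self, search=None, category=None, limit=100):
        return [{"source": self.source_name, "title": f"post {i}"} for i in range(limit)]
```

A sources endpoint could then serialise `discover_connectors()` into a list of names and capability flags, so a new connector appears in the API as soon as its module is imported.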
262a70dbf3 refactor(api): rename /upload endpoint
Ensures consistency with the other dataset-based endpoints and follows REST API conventions more closely.
2026-03-09 20:55:12 +00:00
ca444e9cb0 refactor: move connectors to backend dir
They will now be used more heavily by the backend.
2026-03-09 20:53:13 +00:00
738af5415b Merge pull request 'Editable and removable datasets' (#8) from feat/editable-datasets into main
Reviewed-on: #8
2026-03-05 16:55:48 +00:00
2b14a8a417 feat(frontend): add deletion modal confirmation box 2026-03-05 12:29:53 +00:00
a154b25415 fix(db): missing rollback on execute_batch method
A rollback is arguably even more important in a batch function.
2026-03-05 10:09:14 +00:00
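The class below is a minimal sketch of a batch write that rolls back on failure. It uses the standard-library `sqlite3` module so that it runs anywhere. The project's own `execute_batch` method presumably targets its production database, and the `Database` wrapper here is hypothetical. Without the rollback, rows inserted before the failing one would stay in the open transaction and could be committed by a later, unrelated `commit()`.

```python
import sqlite3


class Database:
    """Hypothetical thin wrapper around a DB-API connection."""

    def __init__(self, conn: sqlite3.Connection):
        self.conn = conn

    def execute_batch(self, query: str, rows: list) -> None:
        """Run `query` once per row; commit all rows or none of them."""
        cur = self.conn.cursor()
        try:
            cur.executemany(query, rows)
            self.conn.commit()
        except Exception:
            # Undo any rows from this batch that were inserted before the failure.
            self.conn.rollback()
            raise
        finally:
            cur.close()
```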
eb273efe61 Merge remote-tracking branch 'origin/main' into feat/editable-datasets 2026-03-04 22:34:55 +00:00
a9001c79e1 build: add frontend to main docker compose
Forgot to add this earlier
2026-03-04 22:34:32 +00:00
eec8f2417e feat(frontend): add ability to delete datasets 2026-03-04 22:32:19 +00:00
f5835b5a97 feat(frontend): add frontend option to change name 2026-03-04 22:17:31 +00:00
64e3f9eea8 feat: implement PATCH dataset route
At the moment this only allows updating the name, which seems to be the only editable part of the dataset metadata.
2026-03-04 21:38:06 +00:00
4f01bf0419 fix(db): incorrect SQL condition when deleting dataset content 2026-03-04 21:35:10 +00:00
6948891677 Merge remote-tracking branch 'origin/main' into feat/editable-datasets 2026-03-04 21:30:13 +00:00
f1f33e2fe4 feat: implement delete dataset route 2026-03-04 21:29:01 +00:00
e20d0689e8 fix(celery): adjust try-catch logic to improve error handling
Instantiating the database and dataset manager objects inside the try-catch causes errors if something else fails.

If an exception occurs before dataset_manager is initialised, the code inside the catch block will itself fail.
2026-03-04 21:18:59 +00:00
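The code below is a minimal sketch of the adjusted structure, written as a plain function standing in for the Celery task. `DatasetManager`, `set_status` and the status strings are hypothetical. The manager is created before the `try`, so the `except` block can always use it to record the failure.

```python
class DatasetManager:
    """Hypothetical stand-in that records dataset statuses in a dict."""

    def __init__(self, store: dict):
        self.store = store

    def set_status(self, dataset_id: int, status: str, message: str = "") -> None:
        self.store[dataset_id] = (status, message)


def process_dataset_task(store: dict, dataset_id: int, work) -> None:
    # Created outside the try block: if construction itself fails, the error
    # propagates directly instead of the except block hitting an unbound name.
    dataset_manager = DatasetManager(store)
    try:
        work(dataset_id)
        dataset_manager.set_status(dataset_id, "complete")
    except Exception as exc:
        # dataset_manager is guaranteed to exist here, so recording the failure is safe.
        dataset_manager.set_status(dataset_id, "error", str(exc))
```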
fcdac6f3bb Merge pull request 'Fix the frontend API calls and implement logins on frontend' (#7) from feat/update-frontend-api-calls into main
Reviewed-on: #7
2026-03-04 20:20:50 +00:00
5fc1f1532f feat(user stats): updated styling and stats in user page
The interaction graph was taking up too much space and was the only thing on the screen. Further statistics were added; however, these may be removed in favour of more informative ones.
2026-03-04 20:20:34 +00:00
24277e0104 fix(frontend): move loading card higher up
Looks weird lower down on the screen
2026-03-04 20:09:55 +00:00
4e99b77492 fix(db): missing post ID in db schema
Caused surprisingly few errors; it only broke the interaction graph.
2026-03-04 20:05:20 +00:00
b6815c490a feat: add loading page for when dataset is loading
Originally there was a simple "Loading" text; however, this looked bad and might lead a user to think that the page had frozen.

There is now a more comprehensive loading animation, which users might be happy to sit through for a few minutes.
2026-03-04 18:39:20 +00:00
29c90ddfff feat: update name on topbar
Crosspost Analysis Engine sounds far cooler than "Ethnograph View"
2026-03-04 18:37:48 +00:00
3fe08b9c67 fix(backend): buggy reply_time_by_emotion metric
This metric was never statistically significant and held no real value. It also happened to contain accidental NaN values in the dataframe, which broke the frontend.

Happy to remove.
2026-03-04 18:37:11 +00:00
f9bc9cf9c9 fix: remove Datasets tab when not logged in 2026-03-03 20:32:33 +00:00
249528bb5c feat(frontend): remove "Upload" and "Last Stats" page
These are redundant and clunky; everything can be accessed from the Datasets tab.
2026-03-03 20:30:42 +00:00
bd0e1a9050 refactor(frontend): move stylings out of logic into centralized file 2026-03-03 20:28:23 +00:00