Commit Graph

192 Commits

Author SHA1 Message Date
2ab74d922a feat(api): support per-source search, category and limit configuration 2026-03-10 23:15:33 +00:00
d520e2af98 fix(auth): missing email and username business rules 2026-03-10 22:48:04 +00:00
8fe84a30f6 fix: data leak when opening topics file 2026-03-10 22:45:07 +00:00
dc330b87b9 fix(celery): process dataset directly in fetch task
Calling the original `process_dataset` function led to issues with JSON serialisation.
2026-03-10 22:17:00 +00:00
a65c4a461c fix(api): flask delegates dataset fetch to celery 2026-03-10 19:17:41 +00:00
15704a0782 chore(db): update db schema to include "fetching" status 2026-03-10 19:17:08 +00:00
6ec47256d0 feat(api): add database scraping endpoints 2026-03-10 19:04:33 +00:00
2572664e26 chore(utils): add env getter that fails if env not found 2026-03-10 18:50:53 +00:00
17bd4702b2 fix(connectors): connector detectors returning name of ID alongside connector obj 2026-03-10 18:36:40 +00:00
53cb5c2ea5 feat(topics): add generalised topic list
This is easier and quicker than deriving a topic list from the dataset that has been scraped.

While using LLMs to create a personalised topic list based on the query, category or dataset itself would yield better results for most, it is beyond the scope of this project.
2026-03-10 18:36:08 +00:00
0866dda8b3 chore: add util to always split evenly 2026-03-10 18:25:05 +00:00
5ccb2e73cd fix(connectors): incorrect registry location
Registry paths were using the incorrect connector path locations.
2026-03-10 18:18:42 +00:00
2a8d7c7972 refactor(connectors): YouTube & Reddit connectors implement BaseConnector 2026-03-10 18:11:33 +00:00
e7a8c17be4 chore(connectors): add base connector inheritance 2026-03-10 18:08:01 +00:00
cc799f7368 feat(connectors): add base connector and registry for detection
Idea is to have a "plugin-type" system, where new connectors can extend the `BaseConnector` class and implement the fetch posts method.

These are automatically detected by the registry, and automatically used in new Flask endpoints that give a list of possible sources.

Allows for an open-ended system where new data scrapers / API consumers can be added dynamically.
2026-03-09 21:29:03 +00:00
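The plugin idea described in cc799f7368 can be sketched roughly as follows. This is a minimal illustration, not the project's code: only `BaseConnector` and the notion of a fetch-posts method come from the commit message. The `registry` dict, the `source_id` attribute, the `fetch_posts` signature, `ExampleConnector`, and `available_sources` are all hypothetical, and registering via `__init_subclass__` is an assumed detection mechanism (the real registry may discover connectors differently, for example by scanning modules).

```python
from abc import ABC, abstractmethod


class BaseConnector(ABC):
    """Base class that every connector extends (illustrative sketch)."""

    # Hypothetical registry: maps a source ID to its connector class.
    registry = {}

    # Hypothetical identifier that each concrete connector overrides.
    source_id = ""

    def __init_subclass__(cls, **kwargs):
        # Runs once for every subclass definition, so new connectors
        # register themselves without any extra wiring.
        super().__init_subclass__(**kwargs)
        if cls.source_id:
            BaseConnector.registry[cls.source_id] = cls

    @abstractmethod
    def fetch_posts(self, query, limit):
        """Return a list of post dicts for the given query."""


class ExampleConnector(BaseConnector):
    """Hypothetical connector that returns placeholder posts."""

    source_id = "example"

    def fetch_posts(self, query, limit):
        return [
            {"source": self.source_id, "query": query, "index": i}
            for i in range(limit)
        ]


def available_sources():
    """List the source IDs an endpoint could expose to clients."""
    return sorted(BaseConnector.registry)


sources = available_sources()
posts = BaseConnector.registry["example"]().fetch_posts("python", 2)
```

With this shape, adding a new scraper or API consumer only requires defining another subclass; an endpoint that lists sources would pick it up through `available_sources()`.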
262a70dbf3 refactor(api): rename /upload endpoint
Ensures consistency with the other dataset-based endpoints and follows REST API conventions more cleanly.
2026-03-09 20:55:12 +00:00
ca444e9cb0 refactor: move connectors to backend dir
They will now be used more in the backend.
2026-03-09 20:53:13 +00:00
a154b25415 fix(db): missing rollback on execute_batch method
Arguably more important on a batch function to have rollback.
2026-03-05 10:09:14 +00:00
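The reason a rollback matters for a batch method (a154b25415) can be shown with a small sketch. It uses the standard-library `sqlite3` module as a stand-in, which is an assumption made only to keep the example self-contained; the project's database driver, the real `execute_batch` signature, and the `posts` table below are not taken from the source.

```python
import sqlite3


def execute_batch(conn, query, rows):
    """Run `query` once per row; commit every row or none of them."""
    cursor = conn.cursor()
    try:
        cursor.executemany(query, rows)
        conn.commit()
    except Exception:
        # Without this rollback, rows inserted before the failure would
        # stay pending in the open transaction.
        conn.rollback()
        raise
    finally:
        cursor.close()


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE posts (id INTEGER PRIMARY KEY, title TEXT NOT NULL)")
conn.commit()

try:
    # The second row violates NOT NULL, so the whole batch is discarded.
    execute_batch(
        conn,
        "INSERT INTO posts (id, title) VALUES (?, ?)",
        [(1, "first"), (2, None)],
    )
except sqlite3.IntegrityError:
    pass

remaining = conn.execute("SELECT COUNT(*) FROM posts").fetchone()[0]
```

After the failed batch, `remaining` is 0: the first row, which was valid on its own, is rolled back together with the invalid one.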
f5835b5a97 feat(frontend): add frontend option to change name 2026-03-04 22:17:31 +00:00
64e3f9eea8 feat: implement PATCH dataset route
At the moment this only allows updating the name, which seems to be the only editable part of the dataset metadata.
2026-03-04 21:38:06 +00:00
4f01bf0419 fix(db): incorrect SQL condition when deleting dataset content 2026-03-04 21:35:10 +00:00
6948891677 Merge remote-tracking branch 'origin/main' into feat/editable-datasets 2026-03-04 21:30:13 +00:00
f1f33e2fe4 feat: implement delete dataset route 2026-03-04 21:29:01 +00:00
e20d0689e8 fix(celery): adjust try-catch logic to improve error handling
Capturing the instantiation of the database and dataset manager objects inside the try-catch will cause errors if something else fails.

If an exception occurs and the dataset_manager is not initialised, the code inside the catch block will fail.
2026-03-04 21:18:59 +00:00
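The try-catch adjustment in e20d0689e8 can be sketched as below. This is an illustration under assumptions, not the task's actual code: `DatasetManager` here is a hypothetical stand-in, and `set_status`, `run_task`, and the `"complete"` / `"error"` status names are invented for the example.

```python
class DatasetManager:
    """Hypothetical stand-in for the project's dataset manager."""

    def __init__(self):
        self.statuses = {}

    def set_status(self, dataset_id, status):
        self.statuses[dataset_id] = status


def run_task(dataset_id, work):
    # Created before the try block: if construction fails, the error
    # propagates directly instead of reaching an except block that
    # would reference an unbound `manager`.
    manager = DatasetManager()
    try:
        work(dataset_id)
        manager.set_status(dataset_id, "complete")
    except Exception:
        # `manager` is guaranteed to exist here.
        manager.set_status(dataset_id, "error")
    return manager


def failing_work(dataset_id):
    raise ValueError("processing failed")


failed = run_task(1, failing_work)
succeeded = run_task(2, lambda dataset_id: None)
```

Here `failed.statuses` ends up as `{1: "error"}` and `succeeded.statuses` as `{2: "complete"}`; in both cases the except block can rely on `manager` being initialised.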
4e99b77492 fix(db): missing post ID in db schema
Caused surprisingly few errors. It only broke the interaction graph.
2026-03-04 20:05:20 +00:00
3fe08b9c67 fix(backend): buggy reply_time_by_emotion metric
This metric was never statistically significant and held no real value. It also happened to contain accidental NaN values in the dataframe, which broke the frontend.

Happy to remove.
2026-03-04 18:37:11 +00:00
e2ac4495fd chore(frontend): add extra types to frontend 2026-03-03 20:13:13 +00:00
207c4b67da feat(frontend): add dataset name requirements to the upload page 2026-03-03 17:28:46 +00:00
772205d3df feat(api): add ability to fetch all datasets by a user 2026-03-03 17:25:00 +00:00
5310568631 feat: add React layout and a topbar allowing for easy logins 2026-03-03 17:17:57 +00:00
9d1e8960fc perf: update cultural analysis to use regex instead of Counter 2026-03-03 14:25:25 +00:00
eb4187c559 feat(api): add status returns for NLP processing 2026-03-03 13:46:37 +00:00
63cd465189 feat(db): add status and constraints to the schema 2026-03-03 13:46:06 +00:00
f93e45b827 fix(dataset): silent errors if dataset did not exist 2026-03-03 13:13:40 +00:00
075e1fba85 fix: typo in exception naming 2026-03-03 13:12:28 +00:00
a4c527ce5b fix(db): execute not committing if fetch flag was set 2026-03-03 13:10:50 +00:00
3772f83d11 fix: add title column to db
This was accidentally removed in a previous merge
2026-03-03 12:41:02 +00:00
3a58705635 feat: add celery & redis for background data processing 2026-03-03 12:27:14 +00:00
6248b32ce2 refactor: move app.py into main server dir 2026-03-03 11:14:51 +00:00
87bdc0245a refactor: move core files into separate dirs 2026-03-03 11:13:33 +00:00
8b8462fd58 chore: add non-existent database error check 2026-03-03 11:11:10 +00:00
36bede42d9 style: clean up imports 2026-03-03 11:08:56 +00:00
4bec0dd32c refactor: extract dataset functionality out of db class 2026-03-02 19:18:05 +00:00
4961ddc349 refactor: move db dir into server 2026-03-02 19:05:56 +00:00
c9151da643 feat: add custom error for non-existent dataset 2026-03-02 18:59:31 +00:00
18c8539646 fix: server error when attempting to access non-existent dataset 2026-03-02 18:55:27 +00:00
6d8f2fa4e0 feat: add custom exceptions file 2026-03-02 18:54:11 +00:00
5ea71023b5 refactor: move query parameter extraction function out of flask app 2026-03-02 18:29:09 +00:00
37cb2c9ff4 feat(querying): make filters stateless
Stateless filters are required as the server cannot store them in the StatGen object
2026-03-02 16:18:02 +00:00
82a98f84bd refactor: combine query results into one endpoint 2026-03-01 19:06:49 +00:00