Commit Graph

425 Commits

Author SHA1 Message Date
3468fdc2ea feat(api): add new user and linguistic endpoints 2026-03-16 16:45:11 +00:00
09a4f9036f refactor(stats): add summary and user stat classes for consistency 2026-03-16 16:43:24 +00:00
97fccd073b feat(emotional): add average emotion & dominant emotion stats 2026-03-16 16:41:28 +00:00
94befb61c5 Merge pull request 'Automatic Scraping of dataset options' (#9) from feat/automatic-scraping-datasets into main
Reviewed-on: #9
2026-03-14 21:58:49 +00:00
12f5953146 fix(api): remove error exceptions in API responses
Mainly a security measure: we don't want actual code errors returned in the API response, as someone could use them to learn how the inner workings of the code behave.
2026-03-14 21:58:00 +00:00
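
A minimal sketch of the idea behind this fix, not the project's actual handler: the real exception is logged server-side and the client only sees a generic payload. The names `safe_error_response` and `error_id` are hypothetical and introduced only for illustration.

```python
import logging
import uuid

logger = logging.getLogger("api")


def safe_error_response(exc: Exception) -> tuple[dict, int]:
    """Log the real exception and return a generic body plus HTTP status."""
    # Hypothetical correlation ID so a log entry can be matched to a report.
    error_id = uuid.uuid4().hex
    logger.error("Unhandled error %s: %r", error_id, exc)
    # The exception text is deliberately left out of the response body.
    return {"error": "Internal server error", "error_id": error_id}, 500


try:
    raise ValueError("relation 'datasets' does not exist")
except ValueError as exc:
    body, status = safe_error_response(exc)
```

In a Flask app, a function like this could be called from a registered error handler, so individual routes never format exceptions themselves.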
5b0441c34b fix(connector): unnecessary comment limits
In addition, I made some methods private to better align with the BaseConnector parent class.
2026-03-14 21:53:13 +00:00
d2b919cd66 fix(api): enforce integer limit and cap at 1000 in scrape_data function 2026-03-14 17:35:05 +00:00
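
The limit rule named in this commit could look roughly like the sketch below. Only the cap of 1000 comes from the commit subject; the function name `parse_limit`, the rejection of values below 1, and the choice to clamp rather than reject oversized values are assumptions.

```python
MAX_LIMIT = 1000  # cap taken from the commit subject


def parse_limit(raw, max_limit: int = MAX_LIMIT) -> int:
    """Return raw as a positive integer no larger than max_limit."""
    # bool is a subclass of int and floats would be silently truncated,
    # so both are rejected up front.
    if isinstance(raw, (bool, float)):
        raise ValueError("limit must be an integer")
    try:
        limit = int(raw)
    except (TypeError, ValueError):
        raise ValueError("limit must be an integer") from None
    if limit < 1:
        raise ValueError("limit must be at least 1")
    # Assumption: oversized values are clamped instead of rejected.
    return min(limit, max_limit)
```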
062937ec3c fix(api): incorrect validation on search 2026-03-14 17:12:02 +00:00
2a00795cc2 chore(connectors): implement category_exists for Boards API 2026-03-14 17:11:49 +00:00
c990f29645 fix(frontend): misaligned loading page for datasets 2026-03-14 17:05:46 +00:00
8a423b2a29 feat(connectors): implement category validation in scraping process 2026-03-14 16:59:43 +00:00
d96f459104 fix(connectors): update URL references to use base_url in BoardsAPI 2026-03-13 21:59:17 +00:00
162a4de64e fix(frontend): detects which sources support category or search 2026-03-12 10:07:28 +00:00
6684780d23 fix(connectors): add stronger validation to scrape endpoint
Strong validation is needed; otherwise invalid data goes to Celery and crashes silently. In addition, the endpoint checks whether the specific source supports search or category.
2026-03-12 09:59:07 +00:00
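
One way to express the capability check described above, as a sketch under stated assumptions: the `SourceCapabilities` class, the `CAPABILITIES` table, and `validate_scrape_request` are hypothetical names, and the per-source flags are illustrative rather than taken from the codebase.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class SourceCapabilities:
    supports_search: bool
    supports_category: bool


# Hypothetical capability table; the real values live in the connectors.
CAPABILITIES = {
    "reddit": SourceCapabilities(supports_search=True, supports_category=True),
    "boards": SourceCapabilities(supports_search=False, supports_category=True),
}


def validate_scrape_request(source, search=None, category=None):
    """Return a list of validation errors; an empty list means safe to enqueue."""
    caps = CAPABILITIES.get(source)
    if caps is None:
        return [f"unknown source: {source}"]
    errors = []
    if search and not caps.supports_search:
        errors.append(f"{source} does not support search")
    if category and not caps.supports_category:
        errors.append(f"{source} does not support category")
    return errors
```

Running a check like this before the task is queued means a bad request fails in the HTTP response rather than inside a worker.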
c12f1b4371 chore(connectors): add category and search validation fields 2026-03-12 09:56:34 +00:00
01d6bd0164 fix(connectors): category / search fields breaking
Ideally, category and search are fully optional; however, some sites break if one or the other is not provided.

Unfortunately, `boards.ie` has a different page type for searches, and I'm not going to implement a scraper for it from scratch.

In addition, removed comment limit options.
2026-03-11 21:16:26 +00:00
12cbc24074 chore(utils): remove split_limit function 2026-03-11 19:47:44 +00:00
0658713f42 chore: remove unused dataset creation script 2026-03-11 19:44:38 +00:00
b2ae1a9f70 feat(frontend): add page for scraping endpoint 2026-03-11 19:41:34 +00:00
eff416c34e fix(connectors): hardcoded source name in Youtube connector 2026-03-10 23:36:09 +00:00
524c9c50a0 fix(api): incorrect dataset status update message 2026-03-10 23:28:21 +00:00
2ab74d922a feat(api): support per-source search, category and limit configuration 2026-03-10 23:15:33 +00:00
d520e2af98 fix(auth): missing email and username business rules 2026-03-10 22:48:04 +00:00
8fe84a30f6 fix: data leak when opening topics file 2026-03-10 22:45:07 +00:00
dc330b87b9 fix(celery): process dataset directly in fetch task
Calling the original `process_dataset` function led to issues with JSON serialisation.
2026-03-10 22:17:00 +00:00
7ccc934f71 build: change celery to debug mode 2026-03-10 22:14:45 +00:00
a3dbe04a57 fix(frontend): option to delete dataset not shown after fail 2026-03-10 19:23:48 +00:00
a65c4a461c fix(api): flask delegates dataset fetch to celery 2026-03-10 19:17:41 +00:00
15704a0782 chore(db): update db schema to include "fetching" status 2026-03-10 19:17:08 +00:00
6ec47256d0 feat(api): add database scraping endpoints 2026-03-10 19:04:33 +00:00
2572664e26 chore(utils): add env getter that fails if env not found 2026-03-10 18:50:53 +00:00
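
A fail-fast environment getter of the kind this commit describes might look like the following sketch. The name `get_env` and the choice of `RuntimeError` are assumptions, not the project's actual utility.

```python
import os


def get_env(name: str) -> str:
    """Return the value of an environment variable or raise if it is unset."""
    value = os.environ.get(name)
    # An empty string still counts as "set"; only a missing variable fails.
    if value is None:
        raise RuntimeError(f"Required environment variable {name!r} is not set")
    return value
```

Failing at startup with the variable's name in the message is usually easier to debug than a `None` surfacing later in a database or API call.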
17bd4702b2 fix(connectors): connector detectors returning name of ID alongside connector obj 2026-03-10 18:36:40 +00:00
53cb5c2ea5 feat(topics): add generalised topic list
This is easier and quicker compared to deriving a topics list based on the dataset that has been scraped.

While using LLMs to create a personalised topic list based on the query, category or dataset itself would yield better results for most, it is beyond the scope of this project.
2026-03-10 18:36:08 +00:00
0866dda8b3 chore: add util to always split evenly 2026-03-10 18:25:05 +00:00
5ccb2e73cd fix(connectors): incorrect registry location
Registry paths were using the incorrect connector path locations.
2026-03-10 18:18:42 +00:00
2a8d7c7972 refactor(connectors): Youtube & Reddit connectors implement BaseConnector 2026-03-10 18:11:33 +00:00
e7a8c17be4 chore(connectors): add base connector inheritance 2026-03-10 18:08:01 +00:00
cc799f7368 feat(connectors): add base connector and registry for detection
The idea is to have a "plugin-type" system, where new connectors can extend the `BaseConnector` class and implement the fetch posts method.

These are automatically detected by the registry, and automatically used in new Flask endpoints that give a list of possible sources.

Allows for an open-ended system where new data scrapers / API consumers can be added dynamically.
2026-03-09 21:29:03 +00:00
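
The plugin idea above can be sketched as follows. This is not the project's registry: later commits mention registry paths, which suggests module scanning, whereas this sketch swaps in `__init_subclass__` because it is shorter. Only `BaseConnector` is named in the commit; `CONNECTOR_REGISTRY`, `source_name`, `fetch_posts`, `ExampleConnector`, and `available_sources` are assumed names.

```python
from abc import ABC, abstractmethod

# Maps a source name to the connector class that handles it.
CONNECTOR_REGISTRY: dict[str, type] = {}


class BaseConnector(ABC):
    source_name: str = ""

    def __init_subclass__(cls, **kwargs):
        super().__init_subclass__(**kwargs)
        # Every subclass that declares a source_name registers itself.
        if cls.source_name:
            CONNECTOR_REGISTRY[cls.source_name] = cls

    @abstractmethod
    def fetch_posts(self, limit: int) -> list[dict]:
        """Return up to `limit` posts from the source."""


class ExampleConnector(BaseConnector):
    source_name = "example"

    def fetch_posts(self, limit: int) -> list[dict]:
        # Placeholder data; a real connector would call an API or scraper.
        return [{"id": i, "source": self.source_name} for i in range(limit)]


def available_sources() -> list[str]:
    """List the registered sources, e.g. for an endpoint that shows options."""
    return sorted(CONNECTOR_REGISTRY)
```

With this shape, adding a source means defining one new subclass; nothing else has to be edited for it to appear in `available_sources()`.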
262a70dbf3 refactor(api): rename /upload endpoint
Ensures consistency with the other dataset-based endpoints and follows REST API conventions more closely.
2026-03-09 20:55:12 +00:00
ca444e9cb0 refactor: move connectors to backend dir
They will now be used mostly in the backend.
2026-03-09 20:53:13 +00:00
738af5415b Merge pull request 'Editable and removable datasets' (#8) from feat/editable-datasets into main
Reviewed-on: #8
2026-03-05 16:55:48 +00:00
2b14a8a417 feat(frontend): add deletion modal confirmation box 2026-03-05 12:29:53 +00:00
a154b25415 fix(db): missing rollback on execute_batch method
Arguably it is even more important for a batch function to have a rollback.
2026-03-05 10:09:14 +00:00
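
To illustrate why the rollback matters for batches, here is a sketch that uses the standard-library `sqlite3` module as a stand-in for the project's database layer, which is an assumption. The function shares the name `execute_batch` with the commit subject, but its signature and the `posts` table are hypothetical.

```python
import sqlite3


def execute_batch(conn: sqlite3.Connection, sql: str, rows) -> None:
    """Run sql once per row; either every row is committed or none is."""
    try:
        conn.executemany(sql, rows)
        conn.commit()
    except sqlite3.Error:
        # Without this, rows inserted before the failure would stay in the
        # open transaction and could be committed by a later call.
        conn.rollback()
        raise


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE posts (id INTEGER PRIMARY KEY, body TEXT NOT NULL)")
conn.commit()
```

If the second row of a batch violates a constraint, the first row of that batch is discarded too, so the table never holds a half-applied batch.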
eb273efe61 Merge remote-tracking branch 'origin/main' into feat/editable-datasets 2026-03-04 22:34:55 +00:00
a9001c79e1 build: add frontend to main docker compose
Forgot to add this earlier.
2026-03-04 22:34:32 +00:00
eec8f2417e feat(frontend): add ability to delete datasets 2026-03-04 22:32:19 +00:00
f5835b5a97 feat(frontend): add frontend option to change name 2026-03-04 22:17:31 +00:00
64e3f9eea8 feat: implement PATCH dataset route
At the moment this only allows updating the name, which seems to be the only editable part of the dataset metadata.
2026-03-04 21:38:06 +00:00
4f01bf0419 fix(db): incorrect SQL condition when deleting dataset content 2026-03-04 21:35:10 +00:00
6948891677 Merge remote-tracking branch 'origin/main' into feat/editable-datasets 2026-03-04 21:30:13 +00:00