5b0441c34b
fix(connector): unnecessary comment limits
...
In addition, I made some methods private to better align with the BaseConnector parent class.
2026-03-14 21:53:13 +00:00
d2b919cd66
fix(api): enforce integer limit and cap at 1000 in scrape_data function
2026-03-14 17:35:05 +00:00
062937ec3c
fix(api): incorrect validation on search
2026-03-14 17:12:02 +00:00
2a00795cc2
chore(connectors): implement category_exists for Boards API
2026-03-14 17:11:49 +00:00
c990f29645
fix(frontend): misaligned loading page for datasets
2026-03-14 17:05:46 +00:00
8a423b2a29
feat(connectors): implement category validation in scraping process
2026-03-14 16:59:43 +00:00
d96f459104
fix(connectors): update URL references to use base_url in BoardsAPI
2026-03-13 21:59:17 +00:00
162a4de64e
fix(frontend): detects which sources support category or search
2026-03-12 10:07:28 +00:00
6684780d23
fix(connectors): add stronger validation to scrape endpoint
...
Strong validation is needed; otherwise invalid data goes to Celery and crashes silently. In addition, it checks whether that specific source supports search or category.
2026-03-12 09:59:07 +00:00
c12f1b4371
chore(connectors): add category and search validation fields
2026-03-12 09:56:34 +00:00
01d6bd0164
fix(connectors): category / search fields breaking
...
Ideally category and search are fully optional, however some sites break if one or the other is not provided.
Unfortunately, `boards.ie` has a different page type for searches and I'm not bothered to implement a scraper from scratch.
In addition, removed comment limit options.
2026-03-11 21:16:26 +00:00
12cbc24074
chore(utils): remove split_limit function
2026-03-11 19:47:44 +00:00
0658713f42
chore: remove unused dataset creation script
2026-03-11 19:44:38 +00:00
b2ae1a9f70
feat(frontend): add page for scraping endpoint
2026-03-11 19:41:34 +00:00
eff416c34e
fix(connectors): hardcoded source name in Youtube connector
2026-03-10 23:36:09 +00:00
524c9c50a0
fix(api): incorrect dataset status update message
2026-03-10 23:28:21 +00:00
2ab74d922a
feat(api): support per-source search, category and limit configuration
2026-03-10 23:15:33 +00:00
d520e2af98
fix(auth): missing email and username business rules
2026-03-10 22:48:04 +00:00
8fe84a30f6
fix: data leak when opening topics file
2026-03-10 22:45:07 +00:00
dc330b87b9
fix(celery): process dataset directly in fetch task
...
Calling the original `process_dataset` function led to issues with JSON serialisation.
2026-03-10 22:17:00 +00:00
7ccc934f71
build: change celery to debug mode
2026-03-10 22:14:45 +00:00
a3dbe04a57
fix(frontend): option to delete dataset not shown after fail
2026-03-10 19:23:48 +00:00
a65c4a461c
fix(api): flask delegates dataset fetch to celery
2026-03-10 19:17:41 +00:00
15704a0782
chore(db): update db schema to include "fetching" status
2026-03-10 19:17:08 +00:00
6ec47256d0
feat(api): add database scraping endpoints
2026-03-10 19:04:33 +00:00
2572664e26
chore(utils): add env getter that fails if env not found
2026-03-10 18:50:53 +00:00
17bd4702b2
fix(connectors): connector detectors returning name of ID alongside connector obj
2026-03-10 18:36:40 +00:00
53cb5c2ea5
feat(topics): add generalised topic list
...
This is easier and quicker compared to deriving a topics list based on the dataset that has been scraped.
While using LLMs to create a personalised topic list based on the query, category or dataset itself would yield better results for most, it is beyond the scope of this project.
2026-03-10 18:36:08 +00:00
0866dda8b3
chore: add util to always split evenly
2026-03-10 18:25:05 +00:00
5ccb2e73cd
fix(connectors): incorrect registry location
...
Registry paths were using the incorrect connector path locations.
2026-03-10 18:18:42 +00:00
2a8d7c7972
refactor(connectors): Youtube & Reddit connectors implement BaseConnector
2026-03-10 18:11:33 +00:00
e7a8c17be4
chore(connectors): add base connector inheritance
2026-03-10 18:08:01 +00:00
cc799f7368
feat(connectors): add base connector and registry for detection
...
Idea is to have a "plugin-type" system, where new connectors can extend the `BaseConnector` class and implement the fetch posts method.
These are automatically detected by the registry, and automatically used in new Flask endpoints that give a list of possible sources.
Allows for an open-ended system where new data scrapers / API consumers can be added dynamically.
2026-03-09 21:29:03 +00:00
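A minimal sketch of the "plugin-type" idea described in the commit above, not the project's actual code. Only `BaseConnector` and the notion of a fetch-posts method come from the commit message; `source_name`, `registry`, `ExampleConnector`, `available_sources` and the `fetch_posts` signature are hypothetical. Registration here uses `__init_subclass__` as a stand-in technique; the real registry may discover connectors differently (for example by scanning module paths).

```python
from abc import ABC, abstractmethod


class BaseConnector(ABC):
    # Hypothetical registry: maps a source name to its connector class.
    registry: dict = {}
    # Hypothetical attribute: subclasses set this to be registered.
    source_name: str = ""

    def __init_subclass__(cls, **kwargs):
        super().__init_subclass__(**kwargs)
        # Register every subclass that declares a non-empty source name.
        if cls.source_name:
            BaseConnector.registry[cls.source_name] = cls

    @abstractmethod
    def fetch_posts(self, limit: int) -> list:
        """Return up to `limit` posts from the source."""


class ExampleConnector(BaseConnector):
    # Illustrative connector; it returns placeholder posts, not real data.
    source_name = "example"

    def fetch_posts(self, limit: int) -> list:
        return [{"id": i, "source": self.source_name} for i in range(limit)]


def available_sources() -> list:
    # What an endpoint listing the possible sources could return.
    return sorted(BaseConnector.registry)
```

With this shape, adding a new scraper only requires defining another subclass; nothing else has to be edited for it to show up in `available_sources()`.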
262a70dbf3
refactor(api): rename /upload endpoint
...
Ensures consistency with the other dataset-based endpoints and follows REST API conventions more closely.
2026-03-09 20:55:12 +00:00
ca444e9cb0
refactor: move connectors to backend dir
...
They will now be used more in the backend.
2026-03-09 20:53:13 +00:00
738af5415b
Merge pull request 'Editable and removable datasets' ( #8 ) from feat/editable-datasets into main
...
Reviewed-on: #8
2026-03-05 16:55:48 +00:00
2b14a8a417
feat(frontend): add deletion modal confirmation box
2026-03-05 12:29:53 +00:00
a154b25415
fix(db): missing rollback on execute_batch method
...
A rollback is arguably even more important in a batch function.
2026-03-05 10:09:14 +00:00
eb273efe61
Merge remote-tracking branch 'origin/main' into feat/editable-datasets
2026-03-04 22:34:55 +00:00
a9001c79e1
build: add frontend to main docker compose
...
Forgot to add this earlier
2026-03-04 22:34:32 +00:00
eec8f2417e
feat(frontend): add ability to delete datasets
2026-03-04 22:32:19 +00:00
f5835b5a97
feat(frontend): add frontend option to change name
2026-03-04 22:17:31 +00:00
64e3f9eea8
feat: implement PATCH dataset route
...
At the moment it only allows updating the name, which seems to be the only editable part of the dataset metadata.
2026-03-04 21:38:06 +00:00
4f01bf0419
fix(db): incorrect SQL condition when deleting dataset content
2026-03-04 21:35:10 +00:00
6948891677
Merge remote-tracking branch 'origin/main' into feat/editable-datasets
2026-03-04 21:30:13 +00:00
f1f33e2fe4
feat: implement delete dataset route
2026-03-04 21:29:01 +00:00
e20d0689e8
fix(celery): adjust try-catch logic to improve error handling
...
Instantiating the database and dataset manager objects inside the try-catch causes further errors when the instantiation itself fails.
If an exception occurs before the dataset_manager is initialised, the code inside the catch block, which relies on it, will fail as well.
2026-03-04 21:18:59 +00:00
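The try-catch adjustment described in the commit above can be sketched roughly as follows. This is an illustration under assumptions, not the project's task code: `DatasetManager`, `set_status`, `run_task` and the status strings are hypothetical stand-ins.

```python
class DatasetManager:
    """Hypothetical stand-in for the project's dataset manager."""

    def __init__(self):
        self.statuses = {}

    def set_status(self, dataset_id, status):
        self.statuses[dataset_id] = status


def run_task(dataset_id, work, manager_factory=DatasetManager):
    # The manager is built outside the try block. If construction raises,
    # the error propagates directly instead of reaching an except block
    # that would then fail on an unbound `manager`.
    manager = manager_factory()
    try:
        work(dataset_id)
        manager.set_status(dataset_id, "complete")
    except Exception as exc:
        # Safe: `manager` is guaranteed to exist at this point.
        manager.set_status(dataset_id, f"error: {exc}")
    return manager
```

The trade-off is that a failure while constructing the manager is no longer recorded against the dataset; it surfaces as an ordinary exception from the task.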
fcdac6f3bb
Merge pull request 'Fix the frontend API calls and implement logins on frontend' ( #7 ) from feat/update-frontend-api-calls into main
...
Reviewed-on: #7
2026-03-04 20:20:50 +00:00
5fc1f1532f
feat(user stats): updated styling and stats in user page
...
The interaction graph was taking up too much space and was the only thing on the screen. Further statistics were added; however, these may be removed in favour of more informative statistics.
2026-03-04 20:20:34 +00:00
24277e0104
fix(frontend): move loading card higher up
...
Looks weird lower down on the screen
2026-03-04 20:09:55 +00:00