e35e51d295
fix(reddit_api): handle rate limit wait time conversion error
2026-04-14 17:35:21 +01:00
4dd2721e98
Merge remote-tracking branch 'origin/main' into feat/corpus-explorer
2026-04-10 13:19:17 +01:00
ec64551881
fix(connectors): update User-Agent header for BoardsAPI
2026-04-08 19:34:30 +01:00
c6cae040f0
feat(analysis): add emotional averages to stance markers
2026-04-07 12:49:18 +01:00
e903e1b738
feat(user): add dominant topic information to user data
2026-04-07 11:34:03 +01:00
6efa75dfe6
chore(connectors): reduce aggressive parallel connections to boards.ie
2026-04-04 12:33:06 +01:00
de61e7653f
perf(connector): add reddit API authentication to speed up fetching
...
This aligns better with ethical data collection practices and greatly increases rate limits.
2026-04-04 12:26:54 +01:00
98aa04256b
fix(reddit_api): fix reddit ratelimit check
2026-04-04 10:20:48 +01:00
37d08c63b8
chore: rename auto-scraper to auto-fetcher
...
Improves how the tool's ethics are perceived.
2026-04-01 09:50:53 +01:00
1482e96051
feat(datasets): implement deduplication of dataset records in get_dataset_content
2026-04-01 09:06:07 +01:00
cd6030a760
fix(ngrams): remove stop words from ngrams
2026-04-01 08:44:47 +01:00
6378015726
fix(stats): remove duplicated entries in corpus explorer
2026-04-01 00:22:29 +01:00
b270ed03ae
feat(frontend): implement corpus explorer
...
This allows you to view the posts & comments associated with a specific aggregate.
2026-04-01 00:04:25 +01:00
1dde5f7b08
fix(nlp): fix missing processing dataset status update
2026-03-31 20:59:09 +01:00
efb4c8384d
chore(stats): remove average_thread_depth
2026-03-31 16:40:54 +01:00
75fd042d74
feat(api): add support for custom topic lists when autoscraping
2026-03-31 13:36:37 +01:00
e776ef53ac
refactor(database): configurable database source
2026-03-29 21:30:18 +01:00
376773a0cc
style: run python linter & prettifier on backend code
2026-03-25 19:34:43 +00:00
7716ee0bff
build(env): extract Redis URL into env file
...
This makes it possible to connect to a remote Redis instance, so the NLP work could be offloaded to a more powerful machine (e.g. one with a GPU).
2026-03-22 14:41:15 +00:00
97e897c240
fix(analysis): broken entity handling in cultural endpoint
2026-03-22 14:34:05 +00:00
3e78a54388
feat(stat): add conversation concentration metric
...
Remove the old `initiator_ratio` metric, which wasn't working because every event had a `reply_to` value.
This metric was suggested by AI and turned out to give surprisingly useful insights.
2026-03-18 18:36:09 +00:00
71998c450e
fix(db): change title type to text
...
Occasionally a Reddit post had a title too long for the previous column type, which broke the insert.
2026-03-17 19:49:03 +00:00
2a00384a55
feat(interaction): add top interaction pairs and initiator ratio methods
2026-03-17 19:03:56 +00:00
8372aa7278
feat(api): add endpoint to view entire dataset
2026-03-17 13:36:41 +00:00
7b5a939271
fix(stats): missing private methods in User obj
2026-03-17 13:36:10 +00:00
2fa1dff4b7
feat(stat): add lexical diversity stat
2026-03-17 13:27:49 +00:00
31fb275ee3
fix(db): incorrect NER column being inserted
2026-03-17 12:53:30 +00:00
8a0f6e71e8
chore(api): rename cultural entity emotion endpoint
2026-03-17 12:31:53 +00:00
9093059d05
refactor(stats): move user stats out of interactional into users
2026-03-17 12:23:03 +00:00
3468fdc2ea
feat(api): add new user and linguistic endpoints
2026-03-16 16:45:11 +00:00
09a4f9036f
refactor(stats): add summary and user stat classes for consistency
2026-03-16 16:43:24 +00:00
97fccd073b
feat(emotional): add average emotion & dominant emotion stats
2026-03-16 16:41:28 +00:00
12f5953146
fix(api): remove error exceptions in API responses
...
Mainly a security measure: raw code errors should not be returned in API responses, as they could reveal how the code works internally.
2026-03-14 21:58:00 +00:00
5b0441c34b
fix(connector): unnecessary comment limits
...
In addition, I made some methods private to better align with the BaseConnector parent class.
2026-03-14 21:53:13 +00:00
d2b919cd66
fix(api): enforce integer limit and cap at 1000 in scrape_data function
2026-03-14 17:35:05 +00:00
062937ec3c
fix(api): incorrect validation on search
2026-03-14 17:12:02 +00:00
2a00795cc2
chore(connectors): implement category_exists for Boards API
2026-03-14 17:11:49 +00:00
8a423b2a29
feat(connectors): implement category validation in scraping process
2026-03-14 16:59:43 +00:00
d96f459104
fix(connectors): update URL references to use base_url in BoardsAPI
2026-03-13 21:59:17 +00:00
6684780d23
fix(connectors): add stronger validation to scrape endpoint
...
Strong validation is needed; otherwise invalid data is passed to Celery and the task crashes silently. It also checks whether the selected source supports search or category.
2026-03-12 09:59:07 +00:00
c12f1b4371
chore(connectors): add category and search validation fields
2026-03-12 09:56:34 +00:00
01d6bd0164
fix(connectors): category / search fields breaking
...
Ideally, category and search would be fully optional; however, some sites break if one or the other is not provided.
Unfortunately, `boards.ie` uses a different page type for searches, and I haven't implemented a separate scraper for it from scratch.
In addition, removed the comment limit options.
2026-03-11 21:16:26 +00:00
12cbc24074
chore(utils): remove split_limit function
2026-03-11 19:47:44 +00:00
eff416c34e
fix(connectors): hardcoded source name in Youtube connector
2026-03-10 23:36:09 +00:00
524c9c50a0
fix(api): incorrect dataset status update message
2026-03-10 23:28:21 +00:00
2ab74d922a
feat(api): support per-source search, category and limit configuration
2026-03-10 23:15:33 +00:00
d520e2af98
fix(auth): missing email and username business rules
2026-03-10 22:48:04 +00:00
8fe84a30f6
fix: data leak when opening topics file
2026-03-10 22:45:07 +00:00
dc330b87b9
fix(celery): process dataset directly in fetch task
...
Calling the original `process_dataset` function led to issues with JSON serialisation.
2026-03-10 22:17:00 +00:00
a65c4a461c
fix(api): flask delegates dataset fetch to celery
2026-03-10 19:17:41 +00:00