Commit Graph

80 Commits

Author SHA1 Message Date
5c1e65b606 remove unused package.json 2026-01-27 17:29:53 +00:00
200645a4e0 update requirements.txt 2026-01-27 15:06:56 +00:00
8394673b3c feat: posts_per_day and comments_per_day endpoints in Flask 2026-01-27 13:20:51 +00:00
2482c1da1c add error message in React Page 2026-01-27 13:20:07 +00:00
d78c58a70c fix comment datetime to be parsed to timestamp in YoutubeAPI 2026-01-27 12:37:16 +00:00
e72d336de4 remove /data subdir
Dataset is now created in the pwd. Avoids issues if the folder didn't exist
2026-01-27 12:36:01 +00:00
2401875a19 combine posts and comments endpoint 2026-01-27 12:14:33 +00:00
ffba2d78c2 add two inputs for posts and comments 2026-01-27 12:14:22 +00:00
322b69825c add package lock 2026-01-27 12:00:06 +00:00
82bd9a7a9b add upload post endpoint in flask app 2026-01-27 11:59:01 +00:00
524a5d2619 Add react app 2026-01-27 11:58:08 +00:00
ff2b08fc2d update gitignore 2026-01-27 11:50:56 +00:00
7d94494fe2 youtube connector returns posts and comments in a flat manner 2026-01-24 20:19:15 +00:00
d96845d48b implement pagination to search subreddit method & remove timeframe attr
In addition, it now searches new posts instead of top
2026-01-22 17:10:16 +00:00
8f504b7d4d updated reddit api to flatten comments and posts into separate data structures 2026-01-22 17:05:32 +00:00
79cdb7babf remove unused top subreddit posts method 2026-01-22 15:55:33 +00:00
152264bda9 separate comment and post data structures
This allows for a flat data structure, beneficial to data analysis
2026-01-22 15:53:47 +00:00
3c4aad77ef update number of fetched comments and videos from youtube 2026-01-22 15:29:55 +00:00
501dec9dd5 convert YouTube published_at to timestamp 2026-01-22 15:02:55 +00:00
096a415f3b fix datetime from boards.ie not being parsed properly 2026-01-22 14:49:01 +00:00
a34252deda Add response code 500 error handling in reddit api 2026-01-19 22:55:28 +00:00
245ab19183 Add error handling for YouTube comments fetching 2026-01-19 22:54:48 +00:00
2243558e56 update gitignore 2026-01-19 20:57:26 +00:00
09a7c6fc9f remove debug print 2026-01-19 20:53:56 +00:00
187401c5eb Implement YouTube API integration for video and comment fetching 2026-01-19 20:50:17 +00:00
2b0aed0f74 add .env to gitignore 2026-01-19 20:33:25 +00:00
85388ef6aa Add comment limit to _parse_comments method in BoardsAPI
Some boards.ie threads have thousands of comments which is slow to fetch with pagination
2026-01-19 20:23:11 +00:00
9c66ec8b82 Save to jsonl file after every fetch
Reduces errors and lost data
2026-01-19 20:22:47 +00:00
e9cf51731d Add comment parsing functionality to BoardsAPI
Pagination required due to multiple pages of comments on boards.
2026-01-19 18:24:44 +00:00
415b1ca87e update README 2026-01-17 22:16:56 +00:00
8088417a37 Remove docker-compose.yml file 2026-01-17 22:16:19 +00:00
4ea9bc8b45 Increase max_workers in ThreadPoolExecutor to improve post fetching performance 2026-01-17 22:14:34 +00:00
d7baf39087 Implement exponential backoff for handling Reddit API rate limits in _fetch_data method 2026-01-17 22:14:26 +00:00
193ff43975 Refactor dataset creation to use post_to_dict for improved data structure and limit API calls to 400 2026-01-17 22:14:15 +00:00
1d2865470b Add comment parsing to _parse_posts 2026-01-17 18:20:21 +00:00
db21e86b8e Fix post ID extraction in _parse_thread method 2026-01-17 16:18:04 +00:00
09d12ae173 Add logging to reddit api class 2026-01-17 16:12:18 +00:00
38cf57e198 Include Ireland posts in dataset creation 2026-01-17 16:05:42 +00:00
ed3d89fd27 Refactor post fetching to use ThreadPoolExecutor for improved concurrency 2026-01-17 16:05:37 +00:00
d44b247bda rename dataset output to "posts.json" 2026-01-17 14:52:32 +00:00
d5e6b7a895 Refactor post detail fetching into separate _parse_thread method 2026-01-17 14:51:57 +00:00
610bab67d5 Add boards.ie to dataset creation & add logging config 2026-01-17 14:43:56 +00:00
b8ed409e04 implement slight efficiency gain in boards.ie pagination 2026-01-17 14:43:14 +00:00
0523c1a091 Refactor logging to use class logger in BoardsAPI 2026-01-17 14:37:28 +00:00
a1c1e1e0d8 patch broken title scrape 2026-01-17 14:28:16 +00:00
9eec7b00e3 Implement BoardsAPI to fetch new category posts and their details 2026-01-17 14:25:43 +00:00
c3a81d8b01 update requirements.txt 2026-01-17 13:59:43 +00:00
ad416d4966 Add Comment DTO 2026-01-17 13:59:35 +00:00
47e71113f6 Merge branch 'main' of github:ThisBirchWood/ethnograph-view 2026-01-15 12:43:53 +00:00
b0e079599a Rename fetch data script & add check for empty posts 2026-01-13 19:06:00 +00:00