Commit Graph

62 Commits

SHA1 Message Date
501dec9dd5 convert YouTube published_at to timestamp 2026-01-22 15:02:55 +00:00
096a415f3b fix datetime from boards.ie not being parsed properly 2026-01-22 14:49:01 +00:00
a34252deda Add response code 500 error handling in reddit api 2026-01-19 22:55:28 +00:00
245ab19183 Add error handling for YouTube comments fetching 2026-01-19 22:54:48 +00:00
2243558e56 update gitignore 2026-01-19 20:57:26 +00:00
09a7c6fc9f remove debug print 2026-01-19 20:53:56 +00:00
187401c5eb Implement YouTube API integration for video and comment fetching 2026-01-19 20:50:17 +00:00
2b0aed0f74 add .env to gitignore 2026-01-19 20:33:25 +00:00
85388ef6aa Add comment limit to _parse_comments method in BoardsAPI 2026-01-19 20:23:11 +00:00
    Some boards.ie threads have thousands of comments which is slow to fetch with pagination
9c66ec8b82 Save to jsonl file after every fetch 2026-01-19 20:22:47 +00:00
    Reduces errors and lost data
e9cf51731d Add comment parsing functionality to BoardsAPI 2026-01-19 18:24:44 +00:00
    Pagination required due to multiple pages of comments on boards.
415b1ca87e update README 2026-01-17 22:16:56 +00:00
8088417a37 Remove docker-compose.yml file 2026-01-17 22:16:19 +00:00
4ea9bc8b45 Increase max_workers in ThreadPoolExecutor to improve post fetching performance 2026-01-17 22:14:34 +00:00
d7baf39087 Implement exponential backoff for handling Reddit API rate limits in _fetch_data method 2026-01-17 22:14:26 +00:00
193ff43975 Refactor dataset creation to use post_to_dict for improved data structure and limit API calls to 400 2026-01-17 22:14:15 +00:00
1d2865470b Add comment parsing to _parse_posts 2026-01-17 18:20:21 +00:00
db21e86b8e Fix post ID extraction in _parse_thread method 2026-01-17 16:18:04 +00:00
09d12ae173 Add logging to reddit api class 2026-01-17 16:12:18 +00:00
38cf57e198 Include Ireland posts in dataset creation 2026-01-17 16:05:42 +00:00
ed3d89fd27 Refactor post fetching to use ThreadPoolExecutor for improved concurrency 2026-01-17 16:05:37 +00:00
d44b247bda rename dataset output to "posts.json" 2026-01-17 14:52:32 +00:00
d5e6b7a895 Refactor post detail fetching into separate _parse_thread method 2026-01-17 14:51:57 +00:00
610bab67d5 Add boards.ie to dataset creation & add logging config 2026-01-17 14:43:56 +00:00
b8ed409e04 implement slight efficiency gain in board.ie pagination 2026-01-17 14:43:14 +00:00
0523c1a091 Refactor logging to use class logger in BoardsAPI 2026-01-17 14:37:28 +00:00
a1c1e1e0d8 patch broken title scrape 2026-01-17 14:28:16 +00:00
9eec7b00e3 Implement BoardsAPI to fetch new category posts and their details 2026-01-17 14:25:43 +00:00
c3a81d8b01 update requirements.txt 2026-01-17 13:59:43 +00:00
ad416d4966 Add Comment DTO 2026-01-17 13:59:35 +00:00
47e71113f6 Merge branch 'main' of github:ThisBirchWood/ethnograph-view 2026-01-15 12:43:53 +00:00
b0e079599a Rename fetch data script & add check for empty posts 2026-01-13 19:06:00 +00:00
538ea9fe12 Remove database connection and schema setup from the project 2026-01-13 19:01:18 +00:00
73a19f3ce3 Add script to orchestrate dataset creation 2026-01-13 18:59:42 +00:00
e58c18bf99 add json files and vscode workspaces to gitignore 2026-01-13 18:57:29 +00:00
d4fb78aac4 Add pagination to new_subreddit method to bypass 100 post limit 2026-01-13 18:46:43 +00:00
05874d233f Implement subreddit search method for new posts 2026-01-13 18:39:55 +00:00
b5624035ec rename reddit_connecter to reddit_api 2026-01-13 14:45:20 +00:00
7c01c335fa remove base_connector and remove non-subreddit specific methods 2026-01-13 14:19:43 +00:00
    Project will focus on specific communities, not enact a reddit-wide search
62823bfd44 update requirements.txt 2026-01-12 15:31:14 +00:00
0cc95c5358 add ID field to post dto 2026-01-11 20:36:19 +00:00
68642709b7 add rudimentary sentiment analysis endpoint to calculate average sentiment of posts 2026-01-11 17:31:37 +00:00
4d459f2035 update main.py to launch flask app 2026-01-11 17:21:09 +00:00
195188dcd7 update User-agent header in _fetch_data method and add __exit__ method to Database class 2026-01-11 15:30:34 +00:00
b5a2b01402 remove debug print statements from fetch_subreddit function 2026-01-11 15:11:49 +00:00
2a8e3fd4db update README to clarify requirements 2026-01-11 15:11:09 +00:00
4b8aebd312 add fetch_subreddit endpoint to retrieve and insert top posts from a specified subreddit 2026-01-11 15:07:44 +00:00
d3c985ba1f update posts table schema to include title and author_username fields 2026-01-11 14:44:34 +00:00
5e1bccb2a8 add execute_many method to Database class and update fetch_reddit endpoint to insert posts into database 2026-01-11 14:44:22 +00:00
1e9eb11aa1 update docker-compose to drop volume on restart 2026-01-11 14:44:12 +00:00
    For faster development of database schema