|
|
d96845d48b
|
implement pagination to search subreddit method & remove timeframe attr
In addition, it now searches new posts instead of top
|
2026-01-22 17:10:16 +00:00 |
|
|
|
8f504b7d4d
|
updated reddit api to flatten comments and posts into separate data structures
|
2026-01-22 17:05:32 +00:00 |
|
|
|
79cdb7babf
|
remove unused top subreddit posts method
|
2026-01-22 15:55:33 +00:00 |
|
|
|
152264bda9
|
separate comment and post data structures
This allows for a flat data structure, beneficial to data analysis
|
2026-01-22 15:53:47 +00:00 |
|
|
|
3c4aad77ef
|
update number of fetched comments and videos from youtube
|
2026-01-22 15:29:55 +00:00 |
|
|
|
501dec9dd5
|
convert YouTube published_at to timestamp
|
2026-01-22 15:02:55 +00:00 |
|
|
|
096a415f3b
|
fix datetime from boards.ie not being parsed properly
|
2026-01-22 14:49:01 +00:00 |
|
|
|
a34252deda
|
Add response code 500 error handling in reddit api
|
2026-01-19 22:55:28 +00:00 |
|
|
|
245ab19183
|
Add error handling for YouTube comments fetching
|
2026-01-19 22:54:48 +00:00 |
|
|
|
2243558e56
|
update gitignore
|
2026-01-19 20:57:26 +00:00 |
|
|
|
09a7c6fc9f
|
remove debug print
|
2026-01-19 20:53:56 +00:00 |
|
|
|
187401c5eb
|
Implement YouTube API integration for video and comment fetching
|
2026-01-19 20:50:17 +00:00 |
|
|
|
2b0aed0f74
|
add .env to gitignore
|
2026-01-19 20:33:25 +00:00 |
|
|
|
85388ef6aa
|
Add comment limit to _parse_comments method in BoardsAPI
Some boards.ie threads have thousands of comments, which are slow to fetch with pagination
|
2026-01-19 20:23:11 +00:00 |
|
|
|
9c66ec8b82
|
Save to jsonl file after every fetch
Reduces errors and lost data
|
2026-01-19 20:22:47 +00:00 |
|
|
|
e9cf51731d
|
Add comment parsing functionality to BoardsAPI
Pagination is required because comments on boards.ie span multiple pages.
|
2026-01-19 18:24:44 +00:00 |
|
|
|
415b1ca87e
|
update README
|
2026-01-17 22:16:56 +00:00 |
|
|
|
8088417a37
|
Remove docker-compose.yml file
|
2026-01-17 22:16:19 +00:00 |
|
|
|
4ea9bc8b45
|
Increase max_workers in ThreadPoolExecutor to improve post fetching performance
|
2026-01-17 22:14:34 +00:00 |
|
|
|
d7baf39087
|
Implement exponential backoff for handling Reddit API rate limits in _fetch_data method
|
2026-01-17 22:14:26 +00:00 |
|
|
|
193ff43975
|
Refactor dataset creation to use post_to_dict for improved data structure and limit API calls to 400
|
2026-01-17 22:14:15 +00:00 |
|
|
|
1d2865470b
|
Add comment parsing to _parse_posts
|
2026-01-17 18:20:21 +00:00 |
|
|
|
db21e86b8e
|
Fix post ID extraction in _parse_thread method
|
2026-01-17 16:18:04 +00:00 |
|
|
|
09d12ae173
|
Add logging to reddit api class
|
2026-01-17 16:12:18 +00:00 |
|
|
|
38cf57e198
|
Include Ireland posts in dataset creation
|
2026-01-17 16:05:42 +00:00 |
|
|
|
ed3d89fd27
|
Refactor post fetching to use ThreadPoolExecutor for improved concurrency
|
2026-01-17 16:05:37 +00:00 |
|
|
|
d44b247bda
|
rename dataset output to "posts.json"
|
2026-01-17 14:52:32 +00:00 |
|
|
|
d5e6b7a895
|
Refactor post detail fetching into separate _parse_thread method
|
2026-01-17 14:51:57 +00:00 |
|
|
|
610bab67d5
|
Add boards.ie to dataset creation & add logging config
|
2026-01-17 14:43:56 +00:00 |
|
|
|
b8ed409e04
|
implement slight efficiency gain in boards.ie pagination
|
2026-01-17 14:43:14 +00:00 |
|
|
|
0523c1a091
|
Refactor logging to use class logger in BoardsAPI
|
2026-01-17 14:37:28 +00:00 |
|
|
|
a1c1e1e0d8
|
patch broken title scrape
|
2026-01-17 14:28:16 +00:00 |
|
|
|
9eec7b00e3
|
Implement BoardsAPI to fetch new category posts and their details
|
2026-01-17 14:25:43 +00:00 |
|
|
|
c3a81d8b01
|
update requirements.txt
|
2026-01-17 13:59:43 +00:00 |
|
|
|
ad416d4966
|
Add Comment DTO
|
2026-01-17 13:59:35 +00:00 |
|
|
|
47e71113f6
|
Merge branch 'main' of github:ThisBirchWood/ethnograph-view
|
2026-01-15 12:43:53 +00:00 |
|
|
|
b0e079599a
|
Rename fetch data script & add check for empty posts
|
2026-01-13 19:06:00 +00:00 |
|
|
|
538ea9fe12
|
Remove database connection and schema setup from the project
|
2026-01-13 19:01:18 +00:00 |
|
|
|
73a19f3ce3
|
Add script to orchestrate dataset creation
|
2026-01-13 18:59:42 +00:00 |
|
|
|
e58c18bf99
|
add json files and vscode workspaces to gitignore
|
2026-01-13 18:57:29 +00:00 |
|
|
|
d4fb78aac4
|
Add pagination to new_subreddit method to bypass 100 post limit
|
2026-01-13 18:46:43 +00:00 |
|
|
|
05874d233f
|
Implement subreddit search method for new posts
|
2026-01-13 18:39:55 +00:00 |
|
|
|
b5624035ec
|
rename reddit_connecter to reddit_api
|
2026-01-13 14:45:20 +00:00 |
|
|
|
7c01c335fa
|
remove base_connector and remove non-subreddit specific methods
Project will focus on specific communities, not enact a reddit-wide search
|
2026-01-13 14:19:43 +00:00 |
|
|
|
62823bfd44
|
update requirements.txt
|
2026-01-12 15:31:14 +00:00 |
|
|
|
0cc95c5358
|
add ID field to post dto
|
2026-01-11 20:36:19 +00:00 |
|
|
|
68642709b7
|
add rudimentary sentiment analysis endpoint to calculate average sentiment of posts
|
2026-01-11 17:31:37 +00:00 |
|
|
|
4d459f2035
|
update main.py to launch flask app
|
2026-01-11 17:21:09 +00:00 |
|
|
|
195188dcd7
|
update User-agent header in _fetch_data method and add __exit__ method to Database class
|
2026-01-11 15:30:34 +00:00 |
|
|
|
b5a2b01402
|
remove debug print statements from fetch_subreddit function
|
2026-01-11 15:11:49 +00:00 |
|