
Update crawler with Google Docs support #32

Merged (9 commits) on May 12, 2024


nattvara (Owner) commented on May 2, 2024

Description

This PR adds support for downloading and indexing the content of Google Docs files. It also fixes several bugs in the crawler.
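The PR does not describe how Google files are fetched. One common approach, sketched here in Python under that assumption, is to rewrite a sharing link into Google's export URL for the file type and download that instead. The function name `google_export_url` and the chosen formats (plain text for Docs, CSV for Sheets, PDF for Slides) are hypothetical, and such export URLs generally only work without authentication for files shared with "anyone with the link".

```python
import re
from typing import Optional

# Hypothetical export format per Google file type; the PR does not state
# which formats (if any) the crawler actually requests.
_EXPORT_TEMPLATES = {
    "document": "https://docs.google.com/document/d/{id}/export?format=txt",
    "spreadsheets": "https://docs.google.com/spreadsheets/d/{id}/export?format=csv",
    "presentation": "https://docs.google.com/presentation/d/{id}/export/pdf",
}

# Matches sharing links such as https://docs.google.com/document/d/<ID>/edit?usp=sharing
_GOOGLE_FILE_RE = re.compile(
    r"^https?://docs\.google\.com/(document|spreadsheets|presentation)/d/([A-Za-z0-9_-]+)"
)


def google_export_url(url: str) -> Optional[str]:
    """Return a direct-download export URL for a Google Docs/Sheets/Slides
    link, or None if the URL is not one of those file types."""
    match = _GOOGLE_FILE_RE.match(url)
    if match is None:
        return None
    kind, file_id = match.groups()
    return _EXPORT_TEMPLATES[kind].format(id=file_id)


if __name__ == "__main__":
    print(google_export_url("https://docs.google.com/document/d/abc123/edit?usp=sharing"))
    # prints https://docs.google.com/document/d/abc123/export?format=txt
```

Returning `None` for non-Google URLs lets a crawler fall back to its normal page handling for everything else.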

What will this PR change?

  • Exclude some new KTH URLs from being indexed
  • Add support for downloading Google Docs, Sheets, and Slides
  • Add support for specifying an extra URL for a course that is always crawled, even if the course contains no link to it
  • Fix an issue where secure cookies couldn't be added
  • Fix issues where large files would crash the crawler
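The PR does not say how the large-file crash was fixed. A minimal sketch of one common mitigation, assuming the download is consumed as a stream of byte chunks: stop reading once a size cap is exceeded and skip the file, instead of buffering it whole. The helper name `read_capped` and the 50 MiB limit are hypothetical.

```python
from typing import Iterable, Optional

# Hypothetical cap; the PR does not state what limit (if any) the crawler uses.
MAX_DOWNLOAD_BYTES = 50 * 1024 * 1024


def read_capped(chunks: Iterable[bytes], max_bytes: int = MAX_DOWNLOAD_BYTES) -> Optional[bytes]:
    """Accumulate streamed chunks, returning None as soon as the running
    total exceeds max_bytes, so oversized files are skipped, not loaded."""
    buffer = bytearray()
    for chunk in chunks:
        buffer.extend(chunk)
        if len(buffer) > max_bytes:
            return None  # too large: skip this file instead of crashing
    return bytes(buffer)
```

With the `requests` library, for example, the chunks could come from `requests.get(url, stream=True, timeout=30).iter_content(chunk_size=65536)`. Checking the `Content-Length` header first is a cheap pre-filter, but the header can be missing or wrong, so the running count is the safer guard.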

Screenshots

N/A

nattvara merged commit 5516574 into main on May 12, 2024
12 checks passed
nattvara deleted the feature/fix-crawling-bug branch on May 12, 2024 at 15:44