Avoid spidering PDF URLs - causes crash or incomplete record #29

Open
wswtizer opened this issue Dec 21, 2016 · 0 comments

Problem: I managed to crash the devCenter Uploader while trying to add a link to a PDF using the 'Create New Document' tab. (The 'Create provisional document' tab behaves differently: it creates a record but doesn't populate the title, so I can't access that record via the UI. Issue #28 has been opened for that.)

The problem is related to the fact that the tool tries to crawl (spider) the PDF, but there is no data. Every time a document is edited with a blank body, it tries to fetch the content again.

Glynn suggested that the URL could be 'pre-fetched' in order to get its content-type and if it's not text/html just skip the crawler.
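
A minimal sketch of that pre-fetch check, in case it helps whoever picks this up. It is written in Python for illustration only; the Uploader's actual language and crawler entry point aren't stated in this issue, so the function names `is_html`, `fetch_content_type`, and `should_crawl` are hypothetical. It assumes the server answers a HEAD request with a usable Content-Type header; servers that don't would need a fallback.

```python
from typing import Optional
from urllib.request import Request, urlopen


def is_html(content_type: Optional[str]) -> bool:
    """Return True only when the Content-Type header names text/html."""
    if not content_type:
        return False
    # Drop parameters such as "; charset=utf-8" and normalise case.
    media_type = content_type.split(";", 1)[0].strip().lower()
    return media_type == "text/html"


def fetch_content_type(url: str, timeout: float = 10.0) -> Optional[str]:
    """Send a HEAD request and return the Content-Type header, if any."""
    request = Request(url, method="HEAD")
    with urlopen(request, timeout=timeout) as response:
        return response.headers.get("Content-Type")


def should_crawl(url: str) -> bool:
    """Hypothetical gate: run the crawler only for text/html URLs."""
    try:
        return is_html(fetch_content_type(url))
    except (OSError, ValueError):
        # Network errors or a malformed URL: skip the crawl instead of crashing.
        return False
```

With a gate like this, a PDF URL (typically served as `application/pdf`) would skip the crawler, and the record could be saved with whatever title and body the user entered.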

Workaround suggested: for a PDF URL, make sure you supply a title and put some words in the body field when creating a new record with the 'Create New Document' tab. This avoids the attempt to fetch the content again.
