Download attached PDF documents when opening page #108
Comments
Okay, I should have checked before: This issue occurs with many devices from various vendors. Does it make sense to compile a list of working and broken pages?
This is not really a bug, it is a limitation ^^ As mentioned in the error message, the current scraper simply does not retrieve this kind of item into the ZIM. Supporting it would need additional effort. It is however new to me that we now have "Documents" in iFixit guides, we will need to check that. Thank you for reporting, and there is no need to list the affected pages. Unless I missed something, I think that all "Documents" are simply missing.
Thank you for the quick reply, and for the clarification! I have changed the title now, so this might be considered a feature request. 🙂 Would the respective wiki page be scraped if it contained no PDF attachment? There is also normal text missing; the PDF issue was just additional information. If downloading the PDF in addition to the ZIM is out of scope for this project (as that seems to be a client task), maybe PDFs (or rather all kinds of attachments) could be treated as external links? That way my system would be responsible for downloading the PDF and opening it with my default PDF reader.
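To illustrate the external-link idea, here is a minimal sketch in Python. It is not taken from the scraper: the function names, the `/Document/` prefix check (inferred from the path in the Kiwix error), and the default base URL are all assumptions made for illustration.

```python
from urllib.parse import urljoin, urlparse

# Assumption: attached documents are served under /Document/, as in the
# path copied from the Kiwix error in this issue.
DOCUMENT_PREFIX = "/Document/"


def is_document_link(href: str) -> bool:
    """Return True if the link path points at an attached document."""
    return urlparse(href).path.startswith(DOCUMENT_PREFIX)


def to_external_link(href: str, base_url: str = "https://de.ifixit.com") -> str:
    """Hypothetical helper: turn a document link into an absolute online URL.

    Links that are not documents are returned unchanged, so they can still
    be resolved inside the ZIM.
    """
    if is_document_link(href):
        return urljoin(base_url, href)
    return href


if __name__ == "__main__":
    print(to_external_link("/Document/PDhLL6RFxYZE3Hre/t460p_hmm_en_sp40k04964_02.pdf"))
    print(to_external_link("/Device/Lenovo_ThinkPad_T460p"))
```

With something like this, a click on a document link would be handed to the system browser or PDF reader instead of ending in a missing-entry error. It only helps while the user is online.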
Do you have an example of normal text missing? This is not really expected. The philosophy so far has been to focus on what is really important for an offline user (categories, guides, ...) and postpone to "later" what is less important: items (parts and tools), wikis, ... I had a quick look, and documents seem to have become a very important part of iFixit now that some companies are providing them to iFixit. I think we should "urgently" add support for these. Most of our users are offline and won't be able to use the external link. I cannot provide an ETA however, hopefully in the coming months. You mention other kinds of attachments, do you have an example? Is it still a document (i.e. in a Documents section, and served on a /Document URL)?
Sorry for the confusion, I did not have any other file formats in mind. 🙂 Pictures seem to be fetched fine, and I saw no videos or any attachments other than PDFs. Concerning the missing text, I was referring to my initial link: This page contains at least a summary, a TOC, and some categories. As already shown by the above screenshot, this information seems to be missing. I could test this with some more pages if needed. 👍 And yes, I perfectly understand that external links are not a real solution (not even a workaround), but as a short-term "hack" (until this gets solved) it might be more intuitive for new users than the information currently displayed (which I obviously did not fully understand without your explanation in this issue 🫢). What do you think? Thanks for the great work, and in case I can help with some more testing, just let me know.
Hi, I stumbled over a dead link in the ZIM file.
The iFixit German archive has just been downloaded on a fresh install of Kiwix.
Now I wanted to verify that the page is really missing in the wiki, but "unfortunately" it's not. 🙂
Here's the link:
https://de.ifixit.com/Device/Lenovo_ThinkPad_T460p
Original link:
/Document/PDhLL6RFxYZE3Hre/t460p_hmm_en_sp40k04964_02.pdf
(The above was copied from the Kiwix error.)