Improve title text extraction #8

Closed
davorpa opened this issue Sep 17, 2022 · 3 comments · Fixed by #11

Comments

@davorpa
Member

davorpa commented Sep 17, 2022

According to the current code:

const [link, ...otherStuff] = listItem; // head of listItem = url, the rest is "other stuff"
entry.url = link.url;
entry.title = link.children[0].value;
// remember to get OTHER STUFF!! remember there may be multiple links!

The first child node, children[0], is used as the resource title without checking whether the link contains more meaningful tokens. The rest is stripped, which sometimes makes it difficult to search for resources by title.


Therefore, the title part of resource links needs to be escaped when submitting, and rebuilding the Markdown here is mandatory.
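As a rough illustration of the problem and one possible workaround, the sketch below collects the text of every child of a link node instead of only children[0]. The helper name `extractText` and the hand-built node are hypothetical; the node shape only assumes the usual `type` / `value` / `children` layout of a Markdown syntax tree and is not taken from this project's parser.

```javascript
// Hypothetical helper (not part of the current code): concatenate the text
// of a node and all of its descendants.
function extractText(node) {
  // Leaf nodes such as text or inline code carry their content in `value`.
  if (typeof node.value === "string") return node.value;
  // Parent nodes (link, emphasis, strong, ...) are flattened recursively.
  if (Array.isArray(node.children)) {
    return node.children.map(extractText).join("");
  }
  return "";
}

// Hand-built node, roughly what [Learn `git` *fast*](https://example.com)
// could look like after parsing (assumed shape, for illustration only).
const link = {
  type: "link",
  url: "https://example.com",
  children: [
    { type: "text", value: "Learn " },
    { type: "inlineCode", value: "git" },
    { type: "text", value: " " },
    { type: "emphasis", children: [{ type: "text", value: "fast" }] },
  ],
};

const entry = { url: link.url, title: extractText(link) };

console.log(link.children[0].value); // "Learn " (what the current code keeps)
console.log(entry.title); // "Learn git fast"
```

With this approach the title keeps the text of inline code and emphasis, at the cost of dropping the formatting itself; whether that trade-off suits the search use case is a judgment call for the maintainers.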

Context

See EbookFoundation/free-programming-books#7086
Related to #2 (same workaround)

@eshellman
Contributor

> first node children[0] is used as resource titles without check if there are more meaningful tokens.

Could you present an example from our current parsed data? Thanks

@davorpa
Member Author

davorpa commented Sep 24, 2022

> > first node children[0] is used as resource titles without check if there are more meaningful tokens.
>
> Could you present an example from our current parsed data? Thanks

Not anymore, since EbookFoundation/free-programming-books#7086 has already been fixed. Anyway, I see you merged #11. Should I do anything else?

@eshellman
Contributor

I think we're good
