When run on kenkyuusha certain headers are incomplete #5
Comments
I added the following lines at line 143 of book.c:
boef: たしなむ【嗜む】 33548 130
boef: 嗜む 38028 1326
Basically, things go wrong in book_undupe(book). We need to be smarter about what we are removing.
I would propose keeping the heading with the largest content when removing duplicates in book_undupe(book). I don't understand your code at first glance. Could you have a look at it?
I replaced the undupe code with this quicksort and removeDuplicates. The resulting file is a bit smaller, but it seems to work as it should.
int partition_entries(Book_Entry arr[], int low, int high) { ... }
/* The main function that implements QuickSort */ ...
int removeDuplicates_subbook(Book_Subbook* subbook) { ... }
static void subbook_undupe(Book_Subbook* subbook) { ... }
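For readers following the sort-then-dedupe idea, here is a minimal sketch of that approach. It is not the patch from this comment; it uses the standard library's qsort instead of a hand-written quicksort. The SketchEntry struct, its field names, and the rule that two entries are duplicates when they share a text page and offset are all assumptions made for illustration, and the real Book_Entry in book.c may look different.

```c
#include <stddef.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical, simplified entry; not the real Book_Entry from book.c. */
typedef struct {
    const char* heading;  /* may be NULL */
    int text_page;        /* assumed duplicate key, part 1 */
    int text_offset;      /* assumed duplicate key, part 2 */
} SketchEntry;

static size_t sketch_heading_len(const SketchEntry* e) {
    return e->heading ? strlen(e->heading) : 0;  /* length in bytes */
}

/* Order by text position; among equal positions, longest heading first. */
static int compare_entries(const void* a, const void* b) {
    const SketchEntry* x = a;
    const SketchEntry* y = b;
    if (x->text_page != y->text_page) {
        return x->text_page < y->text_page ? -1 : 1;
    }
    if (x->text_offset != y->text_offset) {
        return x->text_offset < y->text_offset ? -1 : 1;
    }
    size_t lx = sketch_heading_len(x);
    size_t ly = sketch_heading_len(y);
    if (lx != ly) {
        return lx > ly ? -1 : 1;
    }
    return 0;
}

/* Sorts the array, then keeps only the first entry of each run of equal
 * positions. Returns the new entry count. Does not free any headings. */
size_t sort_and_undupe(SketchEntry* entries, size_t count) {
    if (count == 0) {
        return 0;
    }
    qsort(entries, count, sizeof(entries[0]), compare_entries);
    size_t kept = 0;
    for (size_t i = 1; i < count; ++i) {
        if (entries[i].text_page != entries[kept].text_page ||
            entries[i].text_offset != entries[kept].text_offset) {
            entries[++kept] = entries[i];
        }
    }
    return kept + 1;
}
```

Because the comparator puts the longest heading first within each group of duplicates, the compaction loop only has to keep the first entry of every group. Note that sorting changes the order of the entries, which may or may not matter for the exported file.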
It crashes on gakken, though.
And it doesn't work as it should. I'm working on an updated version.
I think the easiest fix is just to check lengths when looking for dupes. If there is a dupe with a longer header length, swap it with the current entry and delete the dupe. You shouldn't have to sort anything.
That being said, I'm not sure you actually want to use headers for anything. All of that information can be found in the entry text, and you are going to have to parse all of that stuff out with regex anyway. Honestly, if anything, this made me wonder if I should even be exporting the headers out of zero-epwing, as AFAIK they are just some weird artifact of the EPWING format.
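The length check suggested here could look roughly like the sketch below. The Entry struct, its field names, and the assumption that two entries are duplicates when they point at the same text page and offset are hypothetical; the actual structures and duplicate test in book.c may differ. Header length is measured in bytes with strlen, which is only a rough proxy for "more complete".

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical, simplified entry; not the real Book_Entry from book.c. */
typedef struct {
    const char* heading;  /* may be NULL */
    int text_page;        /* assumed duplicate key, part 1 */
    int text_offset;      /* assumed duplicate key, part 2 */
} Entry;

static size_t heading_len(const Entry* e) {
    return e->heading ? strlen(e->heading) : 0;  /* length in bytes */
}

/* Removes duplicates in place without sorting and returns the new count.
 * When a later duplicate has a longer heading, it replaces the current
 * entry before the duplicate slot is deleted. Does not free headings. */
size_t undupe_keep_longest(Entry* entries, size_t count) {
    for (size_t i = 0; i < count; ++i) {
        size_t j = i + 1;
        while (j < count) {
            if (entries[j].text_page == entries[i].text_page &&
                entries[j].text_offset == entries[i].text_offset) {
                if (heading_len(&entries[j]) > heading_len(&entries[i])) {
                    entries[i] = entries[j];  /* keep the longer heading */
                }
                /* Delete slot j by shifting the tail down one position. */
                memmove(&entries[j], &entries[j + 1],
                        (count - j - 1) * sizeof(entries[0]));
                --count;
                /* j is not advanced: a new element now sits in slot j. */
            } else {
                ++j;
            }
        }
    }
    return count;
}
```

This version is quadratic in the number of entries, which may be slow on a large dictionary, but it preserves the original order of the surviving entries. With a 嗜む entry and a たしなむ【嗜む】 entry pointing at the same text, the survivor carries the longer heading.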
For reference articles, you don't have a header in the entry text itself. Looking at your code to remove dupes, I don't see how you can get at the entry you are comparing against from a Page pointer alone.
The heading of たしなむ is "heading": "嗜む" while it should be "たしなむ【嗜む】"