Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

PLATEAU Document 02 work #101

Merged
merged 13 commits into from
Jun 7, 2024
Merged

Conversation

hmdne
Copy link
Contributor

@hmdne hmdne commented Jun 6, 2024

Tasks that need to be done for the document 02 to be parsed (it will grow over time):

Breaking:

Semantic:

  • handle aside and table headers
  • handle code block
  • handle sections
  • handle Annexes somehow

Minor bugs:

  • in 1.4.2: line break character and list item issue
  • in 1.4.2.4 (and further): numbered list made with CSS markup based on unnumbered list
  • missing images fix introduced a bug that floods document with FIXME

Manual fixes needed:

  • Missing image (search for // FIXME)
  • in 1.4.2.2: 4th list level issue (using full width spaces and ・ to denote a 4th level list item)
  • in 1.4.2.2: list misaligned
  • in 4.4.1: list misaligned
  • in 5.4.5: lists to correct
  • some code blocks aren't distinguished from aside blocks. this needs to be manually corrected
  • in C.3.1.3: some extraneous \
  • in H.2.2: HTML tags
  • in general - inspect the lists. they are kinda hacky, since the document is hacky.

cc: @ronaldtse @ReesePlews

Metanorma PR checklist

@hmdne
Copy link
Contributor Author

hmdne commented Jun 6, 2024

First question: this document starts with section 0. Should I remove the numbers, and if yes, then how to handle that section 0?

@hmdne hmdne marked this pull request as draft June 6, 2024 20:08
Copy link

codecov bot commented Jun 6, 2024

Codecov Report

Attention: Patch coverage is 96.15385% with 1 line in your changes missing coverage. Please review.

Project coverage is 98.34%. Comparing base (defb04a) to head (78225b1).
Report is 67 commits behind head on main.

Files Patch % Lines
lib/coradoc/reverse_adoc/postprocessor.rb 85.71% 1 Missing ⚠️
Additional details and impacted files
@@            Coverage Diff             @@
##             main     #101      +/-   ##
==========================================
+ Coverage   96.67%   98.34%   +1.66%     
==========================================
  Files          42       46       +4     
  Lines        1054     1327     +273     
==========================================
+ Hits         1019     1305     +286     
+ Misses         35       22      -13     

☔ View full report in Codecov by Sentry.
📢 Have feedback on the report? Share it here.

@hmdne
Copy link
Contributor Author

hmdne commented Jun 6, 2024

I have decided to set section 0's style to "abstract".

Then we have a number of annexes. Those are lettered. It would make sense to set them as "appendix", but then:

          when "改訂履歴" # Document history

is also appendix.

@hmdne
Copy link
Contributor Author

hmdne commented Jun 6, 2024

So, I handled annexes as appendices. But document history is also an appendix for now.

@hmdne hmdne marked this pull request as ready for review June 6, 2024 23:56
@hmdne
Copy link
Contributor Author

hmdne commented Jun 6, 2024

This is ready. I can find no more issues that can be fixed automatically. I am including the document 02 below:

document.tar.gz

@ReesePlews
Copy link

hello @hmdne thank you for starting work on document two.

the issues about "strange tables" may either be mistakes or just editorial changes to the document to help mitigate page breaks, etc. they could be restyled manually to make them more normal, then the conversion tool would handle them, and metanorma would compensate for things like page breaks.

what would you like me to do to help with this?

@hmdne
Copy link
Contributor Author

hmdne commented Jun 7, 2024

@ReesePlews The problem in general is much more complicated unfortunately. There are different semantics in HTML and AsciiDoc, for instance, in AsciiDoc you should define a number of columns up front (in some cases it's mandatory, in some not?). Then there are colspans and rowspans - if only they were applied correctly, we wouldn't have an issue, but the document often has tables with colspans and rowspans that escape the table boundary. After all, it's not a real error in HTML - the table renders correctly, but in AsciiDoc we basically have no delimiter for rows like in HTML (at least if the number of columns is specified) and it's more strict from what I experienced.

Either way, I have fixed those issues and both 01 and 02 documents do handle all tables correctly. So in some cases we just adjust the colspans/rowspans, in some we add empty cells that are missing in a row. You can take a look at the test [1], for the cases that I have covered, though I'm sure there will be more problems in the future.

In case of document 02, there was one table which I had to add cells in front of, not at the end of the row.

Anyway, to be clear, the document is now rendered correctly, mostly, with the issues I have mentioned above.

[1] https://github.com/hmdne/coradoc/blob/6f24ceb124b1804420439bc1b35641bc354c2075/spec/reverse_adoc/lib/reverse_adoc/converters/table_ensure_spec.rb

@ReesePlews
Copy link

@hmdne perhaps we should add this to the mn-samples-plateau repo, so we can work with that more easily?
something like 002-v4 ?

hmdne added a commit to metanorma/mn-samples-plateau that referenced this pull request Jun 7, 2024
@hmdne
Copy link
Contributor Author

hmdne commented Jun 7, 2024

@ReesePlews I have created a PR to this repository, feel free to merge.

@ReesePlews
Copy link

@hmdne thank you for the explanation. if i am understanding correctly there were some cases in the HTML for both doc1 and doc2 that had to be corrected manually, is that correct? and those manual corrections you made are in the html files currently found in reference-docs folder in the repo? if so, should we change the name so we know they are modified?

@hmdne
Copy link
Contributor Author

hmdne commented Jun 7, 2024

@ReesePlews

No. I made no corrections by hand. I added a verbatim output from reverse_adoc and marked some places (in the first post) where I found hand corrections will be needed.

Anyway, while the tables were quite hard to do, this I was able to solve with algorithms. The bigger problem are lists (and indentation). In the incoming document, they tend to be... without any apparent order. Those are made using one or more of:

  • CSS styles
  • hand written (ie. using 1., 1), other symbols)
  • wide space (the CJK character) indentation

I was able to just handle the CSS styles, which is why sometimes there are fragments like:

[none]
**** List item

@hmdne
Copy link
Contributor Author

hmdne commented Jun 7, 2024

To be clear, I used this document as input, also verbatim:

#77

(The second HTML)

@hmdne
Copy link
Contributor Author

hmdne commented Jun 7, 2024

where I found hand corrections will be needed.

In AsciiDoc though, not in HTML.

@ReesePlews
Copy link

@hmdne thank you for the information. many things are new to me about this conversion process, so there is some confusion. for both docs 1 and 2 once a converted set of asciidoc files have been produced, i dont think we will be using the original html anymore, as the metanorma tool chain would be generating all of the doc types (html, doc, pdf). of course we want the converter to handle as much as possible but when the input html data is so messed up, there is only so much it can do.

if i am understanding correctly, the manual revisions you have made now (in the original html for both documents) and is enough to get highly usable asciidoc bases ready for further work.

please do the merge to the main branch. i am still somewhat hesitant about doing that, especially when so much manual work has been done i dont want to cause a problem. thank you

@hmdne
Copy link
Contributor Author

hmdne commented Jun 7, 2024

@ReesePlews As I said before, no manual modifications have been done to the original HTML document (nor to AsciiDoc). All modifications are done automatically, on the HTML document during the conversion process.

If you use reverse_adoc (current branch) on the input document, you will have exactly the same AsciiDoc output as I have committed to the -samples repo.

Do you want me to merge this PR, or metanorma/mn-samples-plateau#23 ?

@ReesePlews
Copy link

@hmdne thank you for clarifying and updating the conversion tool to handle more unique import cases.

yes, please the PR so that doc2 will appear in the sources directory. later today i will try and run the converter on doc2 using the instructions from our work with doc1.

thank you again for making these updates to the conversion tool.

hmdne added a commit to metanorma/mn-samples-plateau that referenced this pull request Jun 7, 2024
@ReesePlews
Copy link

@hmdne a question...

i have tried to modify my .gitignore file so that my working tests do not get pushed to the main repo. however github desktop is asking me to submit these to "main" should i do that, for the following two files?

image

image

@hmdne
Copy link
Contributor Author

hmdne commented Jun 7, 2024

Gemfile surely shouldn't be gitignored in upstream code. Also, instead of using .gitignore, you may use a local exclude:

.git/info/exclude

It has the same syntax as .gitignore, but is local to your repository. It works alongside .gitignore.

@ReesePlews
Copy link

thank you @hmdne i will remove that line from gitignore and read about "exclude"

@ReesePlews
Copy link

hello @hmdne what should i do with the edit to my Gemfile? should i commit back to the server or stash that?

@ronaldtse ronaldtse merged commit d0fc234 into metanorma:main Jun 7, 2024
11 of 16 checks passed
@hmdne
Copy link
Contributor Author

hmdne commented Jun 7, 2024

@ReesePlews IIRC you just uncommented "sassc". It should be safe to commit.

ReesePlews added a commit to metanorma/mn-samples-plateau that referenced this pull request Jun 7, 2024
@ReesePlews
Copy link

hello @hmdne i committed the Gemfile but it stopped with that same --no-no-install-fonts error can you check that when you have time. thank you.
https://github.com/metanorma/mn-samples-plateau/actions/runs/9416095705/job/25938475324

@ronaldtse
Copy link
Contributor

@ReesePlews I will update the workflow. It's a separate issue.

@ronaldtse
Copy link
Contributor

@ReesePlews the fixes are here: metanorma/mn-samples-plateau#24

@ReesePlews
Copy link

thanks @ronaldtse, does each repo need to be updated?

@ronaldtse
Copy link
Contributor

@ReesePlews yes.

FYI @CAMOBAP the impact of not updating v1 is real.

@CAMOBAP
Copy link
Contributor

CAMOBAP commented Jun 7, 2024

@ronaldtse I understand, but I will still try to defend the approach to not do it.

GitHub Actions delivery is not much different from delivering other software, we have to update major on breaking changes, this is what happens with install-fonts flag rename

If we decide to update v1 instead of moving to the next one, why do we use versioning for actions at all?

Versioning is not always about comfort but about consistency too.

We have some abandoned flavors that are working with old version of metanorma, v1 update will break them

If one day metanorma will introduce breaking changes i suppose we will not update previous one, so why should we do this way with GHA?

@ronaldtse
Copy link
Contributor

I don’t disagree with the principles you stated but I have two comments.

  1. The “ruby/setup-ruby” action supports all versions of Ruby, and has been at v1 forever. This is good for ensuring that users who adopted it previously will continue to have their jobs working.

  2. If an action only works for a limited set of versions of Metanorma, we should lock the action only for those versions.

From the users’ perspective, doing an upgrade on an action step, when they have no idea what has even changed (and clearly they won’t care about the difference in no-install or no-no-install), is just an annoyance and leaves a bad taste.

In any case, we cannot have an action version that fails by default.

@ronaldtse
Copy link
Contributor

And this is probably the moment we think about allowing users to choose Metanorma versions in the action!

@CAMOBAP
Copy link
Contributor

CAMOBAP commented Jun 8, 2024

In any case, we cannot have an action version that fails by default.

Good point, let me implement the fix that will handle flags based on a version of metanorma

And this is probably the moment we think about allowing users to choose Metanorma versions in the action!

Users already have control over this, by choosing the version in:

  1. docker container setup
  2. actions-mn/setup

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
Archived in project
Development

Successfully merging this pull request may close these issues.

5 participants