Handling bilingual documents with weaving (sequential vs parallel) #101

Open
ronaldtse opened this issue Jan 4, 2021 · 21 comments
Labels
enhancement New feature or request

Comments

@ronaldtse
Contributor

The JCGM_200_2012 document uses a different layout from the bilingual SI Brochure, which is sequential at the document level.

The SI Brochure is composed of the English and French documents placed one after another (document-level weaving).

JCGM 200 is composed of multiple weaving methods, where two documents are stitched together at different points, sequential in some but parallel in others.

There are different cases that need encoding in this layout.

Sequential

A section of English text corresponds to a section of French text; show the English first, then the French.

[Screenshots: English page, French page]

Parallel

Side by side

English text corresponds to French text, placed side by side

[Screenshots: Example 1, Example 2]

Shared element between both languages

A document element applies to both texts, e.g. one table or image shared by both languages.

[Screenshots: Example 1, Example 2]

@Intelligent2013 could you spell out the requirements for "weaving/stitching" the bilingual documents?

Originally posted by @ronaldtse in #88 (comment)

ronaldtse added the enhancement label Jan 4, 2021
@Intelligent2013
Contributor

Some additional cases for further decision making:

  • left and right blocks align on start of claim, not paragraph, note, example
    [image]
    so there is a vertical misalignment between blocks (NOTE 3 in this case) at the end of the claim:
    [image]

This means that 'en' claim[1] can be displayed next to 'fr' claim[1], ..., and 'en' claim[n] next to 'fr' claim[n].

  • there are cells that contain only one value (m, kg, ...) and cells with equal values in both languages (kelvin, mole)
    [image]

so cell values need to be checked when merging cells from different languages.

  • there are images with text in each language:
    [image]

and a common image (without text) for both languages:

[image]

  • A figure with a title is displayed as an independent image:
    [image]

then for the other language:

[image]

  • need to check how to do two-column footnotes in Apache FOP:
    [image]

@Intelligent2013
Contributor

  • left and right blocks align on
    • image,
    • table,
    • bibitem

@Intelligent2013
Contributor

So there are 3 cases for displaying images:

  1. display a separate image for each language, each in its own column
  2. display a common image for both languages
  3. display a separate image for each language: the image for the first language, then the image for the second language below it.

@Intelligent2013
Contributor

Intelligent2013 commented Jan 9, 2021

@ronaldtse The requirements for "weaving/stitching" the bilingual documents:

  1. table/image shared by both languages
    examples:
    [image]

[image]

should be marked with the attribute common="true" (in both documents) like this:

<table id="table1" common="true">...

and the table in the adoc of the first document should contain the text for both languages (the logic for merging table cells via XSLT is too complicated)

  2. table/image which should be displayed one after another

examples:
[image]
....
[image]

should be marked with the attribute span="true" (in both documents) like this:

<figure id="figure2" span="true">...
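
To make the two prototype attributes concrete, here is a minimal Python sketch (not the actual XSLT) of how a renderer might decide where each table/figure goes. The function names, the "columns" default and the `_fr` ids are assumptions for illustration only; the common/span attributes are the ones proposed in this comment.

```python
import xml.etree.ElementTree as ET


def layout_mode(element):
    """Classify a table/figure by the prototype attributes from this comment."""
    if element.get("common") == "true":
        return "common"
    if element.get("span") == "true":
        return "span"
    return "columns"  # assumed default: one copy per language column


def plan_layout(first, second):
    """Return (placement, id) pairs for a matching pair of elements."""
    mode = layout_mode(first)
    if mode == "common":
        # Shared element: only the first document's copy is rendered,
        # since it is expected to carry the text for both languages.
        return [("full-width", first.get("id"))]
    if mode == "span":
        # Each language's copy spans the page, one after another.
        return [("full-width", first.get("id")),
                ("full-width", second.get("id"))]
    return [("left", first.get("id")), ("right", second.get("id"))]


# Hypothetical elements; the "_fr" ids are made up for this example.
en_table = ET.fromstring('<table id="table1" common="true"/>')
fr_table = ET.fromstring('<table id="table1_fr" common="true"/>')
en_figure = ET.fromstring('<figure id="figure2" span="true"/>')
fr_figure = ET.fromstring('<figure id="figure2_fr" span="true"/>')

print(plan_layout(en_table, fr_table))
print(plan_layout(en_figure, fr_figure))
```

Since common and span are mutually exclusive, a single three-way classification is enough here.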

Some other comments:

  • at the moment I can't figure out how to do two-column footnotes like this:
    [image]

Currently, they are shown like this, one footnote below another:
[image]

  • here is a draft JCGM PDF example (generated from XML manually encoded from the en and fr ISO Rice XMLs in the metanorma-collection structure), just to demonstrate the current result:
    document.col.presentation.pdf

@Intelligent2013
Contributor

@ronaldtse An alternative to common="true" and span="true" could be:

  • class="common"
  • class="span"

@ronaldtse
Contributor Author

left and right blocks align on start of claim, not paragraph, note, example

The easy way is to align on every (significant) document element. I wonder if this should be configurable (i.e. which elements should align across languages).

so there is a vertical misalignment between block (NOTE 3 in this case) at the end of claim:

This should be an anomaly as the original document was done manually.

there are cells, which contain only one value (m, kg, ....) and equal values (kelvin, mole)

I consider this table a shared table that is encoded as bilingual and inseparable into two separate per-language tables.

there are images with text on each language:

Let's consider these part of the language-specific image.

So there are 3 cases for displaying images:

  • display a separate image for each language, each in its own column
  • display a common image for both languages
  • display a separate image for each language: the image for the first language, then the image for the second language below it.

Correct.

Currently, they are shown like this, one footnote below another:

This is acceptable for now.

@Intelligent2013
Contributor

The easy way is to align on every (significant) document element. I wonder if this should be configurable (i.e. which elements should align across languages).

@ronaldtse would you like to configure it:

  • via XSLT, by specifying a list of the kinds of elements that should align across languages, for example across_elements='clause note', i.e. all clauses and notes will be aligned across languages
    or
  • via .adoc markup, by specifying a mark/class on each individual element that should be aligned across languages?

@ronaldtse
Contributor Author

ronaldtse commented Jan 19, 2021

Both:

ADOC: specifying a list of the kinds of elements that should align across languages, for example across_elements='clause note', i.e. all clauses and notes will be aligned across languages

This can be a configurable attribute in Adoc.

and

ADOC: specifying a mark/class on each individual element that should be aligned across languages

  • We can have the two language versions "match" on some sort of anchor or ID; then we know they are parallel (e.g. [en=my_id] will match [fr=my_id]).
  • A shared element should have both the English and French IDs marked (e.g. [en=my_id,fr=my_id]).

Should work, right?

@opoudjis
Contributor

  • We can have the 2 language pairs "match" some sort of anchor or ID, then we know they are parallel. (e.g. [en=my_id] will match [fr=my_id])
  • A shared element should have both the English and French IDs marked (e.g. [en=my_id,fr=my_id])
    Should work, right?

Apart from the minor detail that you've just made this markup up, Asciidoc will not do anything useful with it, and in any case the collections processing will destroy it because they will insert the document suffix after the id, precisely in order to guarantee identifier uniqueness in the aggregated document.

You cannot just insert [en=my_id,fr=my_id] and have it automatically work. Whatever ends up being put in will be extra work and novel markup. It is likely to be a novel, inter-document bookmark, which will not have an id but a name attribute, so that it can be exempted from the global and entirely correct requirement that id attributes must be globally unique within a collection.

@Intelligent2013
Contributor

I agree that id should be unique in the collection xml.

Some thoughts on how XSLT would process the XML in this manner (a preliminary solution; I have to do some experiments):

  • the 1st document is the lead
  • the 2nd document is the slave
  • XSLT processes each element that should be aligned in the 1st document, together with the matching element in the 2nd document.
  • if the 1st document has an additional element (absent from the 2nd), it is shown as is, and the right column is empty.
  • if the 2nd document has an additional element (absent from the 1st), it is shown as is, and the left column is empty.
  • To match elements between the two languages we can use a few methods:
    • match by element number (examples: the 1st clause in the en doc matches the 1st clause in the fr doc; the 1st note in the 2nd clause of the en doc matches the 1st note in the 2nd clause of the fr doc)
    • if the documents are mismatched in the elements that should be aligned, the user has to add a name/bookmark attribute in adoc. Example: one document has an additional clause or note. The next element common to both documents should then be marked with an additional unique name/bookmark attribute.
    • if both documents have matching notes, but we want to align only some of them rather than all notes in the document, then we have to add an attribute to those elements: either a unique name attribute or just an attribute such as cross-align.
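
The matching rules above could be sketched roughly as follows. This is a Python illustration under stated assumptions, not the XSLT itself: elements are flattened in document order (the hierarchy-aware "1st note in 2nd clause" matching is simplified away), pairing is positional by default, and a shared name attribute acts as a resynchronisation point when one document has an extra element. The function names and sample ids are hypothetical.

```python
import xml.etree.ElementTree as ET


def aligned(root, tags):
    """Elements to align, flattened in document order (a simplification)."""
    return [e for e in root.iter() if e.tag in tags]


def pair_elements(lead_root, second_root, tags):
    """Return rows of (lead id, second id); None marks an empty column."""
    lead, second = aligned(lead_root, tags), aligned(second_root, tags)
    rows, i, j = [], 0, 0
    while i < len(lead) and j < len(second):
        l_name, s_name = lead[i].get("name"), second[j].get("name")
        later_second = {e.get("name") for e in second[j + 1:]}
        later_lead = {e.get("name") for e in lead[i + 1:]}
        if l_name != s_name and l_name is not None and l_name in later_second:
            # The lead element's partner comes later: the second
            # document has an extra element, so the left column is empty.
            rows.append((None, second[j].get("id")))
            j += 1
        elif l_name != s_name and s_name is not None and s_name in later_lead:
            # Mirror case: the lead document has an extra element.
            rows.append((lead[i].get("id"), None))
            i += 1
        else:
            # Default: match by position.
            rows.append((lead[i].get("id"), second[j].get("id")))
            i += 1
            j += 1
    rows += [(e.get("id"), None) for e in lead[i:]]
    rows += [(None, e.get("id")) for e in second[j:]]
    return rows


en = ET.fromstring(
    '<doc><clause id="a"/><clause id="b" name="x"/><clause id="c"/></doc>')
fr = ET.fromstring(
    '<doc><clause id="a2"/><clause id="extra"/>'
    '<clause id="b2" name="x"/><clause id="c2"/></doc>')

for row in pair_elements(en, fr, {"clause"}):
    print(row)
```

In this example the French document has one extra clause, and the name="x" mark on the next common clause is what lets the pairing recover.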

Intelligent2013 added a commit to metanorma/mn-native-pdf that referenced this issue Jan 19, 2021
Intelligent2013 added a commit to metanorma/mn-native-pdf that referenced this issue Jan 20, 2021
@Intelligent2013
Contributor

JCGM XSLT updated to produce a bilingual document with these rules/properties:

  • table/figure attributes common='true' and span='true' - see above
  • to align a particular kind of element (note, term, p) across languages, specify it in the property (XSLT variable) align-cross-elements
  • to align a particular element (at the same place in both documents' hierarchy) across languages, specify the attribute cross-align="true" on the element in both documents
  • to align a particular element (possibly at a different place in the document hierarchy) across languages, specify the attribute name="..." on the element in both documents, with a value that is unique within each document (not across documents)

Note: attribute names can be changed, or replaced by a class, etc. This can be changed in the prototype XSLT.

There are a few restrictions:

  • alignment of list items works only for the first paragraph in a list item
  • alignment of tables works only for the whole table (alignment on the table's title)
  • if the second document has an additional element that is meant to be aligned across languages (i.e. the element's name is in the align-cross-elements property, or the element is marked with the attribute @cross-align), then it cannot be shown. Finding such an 'unknown/non-linked' element requires very complicated XSLT logic, and I can't figure out how to do it.

So, if this solution is acceptable, these properties should be added in adoc:

  • in the bibliography: property align-cross-elements (XML block element names delimited by spaces)
  • in the document body:
    • properties common='true' and span='true' for table and figure
    • property cross-align="true"
    • property name (unique name within the document)

@opoudjis
Contributor

opoudjis commented Jan 24, 2021

Wait. These are updates to the information model of Metanorma, and I need to understand them before I can approve them, and make sure they are clear within the context of Metanorma as well. So:

  • I still do not understand what align-cross-elements is doing, and why it belongs in the bibliography. Is this a specification that, say, the elements "p, note, term" shall always be aligned in bilingual text? If it is, it does not belong in the semantic XML, and I'm not sure that it even belongs in Presentation XML; if I do inject it there, it will be in //bibdata/ext, and it will be as separate tags, not space-delimited. So presumably, a repeating //bibdata/ext/bilingual-align-element tag.

  • I'm not going to use @common and @span, which are much too open-ended in interpretation. They are mutually exclusive anyway, so instead of @common, @span, @cross-align, I suggest using @multilingual-rendering = common (or shared), @multilingual-rendering = full-width, @multilingual-rendering = cross-align respectively.

  • I'm not enthusiastic about @name, but I can't come up with anything better. But @name is distinct from @multilingual-rendering = name; the latter means "align with any element in the other document which has the same name attribute".

If you're ok with these, I'll realise them in metanorma/metanorma-standoc#420
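
As a rough illustration of the "separate tags, not space-delimited" point, the following Python sketch turns a space-delimited align-cross-elements value into repeating bilingual-align-element tags under bibdata/ext. The helper name is hypothetical, and namespaces (which real Metanorma XML carries) are left out for brevity.

```python
import xml.etree.ElementTree as ET


def add_align_elements(bibdata, spec, tag="bilingual-align-element"):
    """Add one <tag> child under bibdata/ext per name in the spec string."""
    ext = bibdata.find("ext")
    if ext is None:
        ext = ET.SubElement(bibdata, "ext")
    for element_name in spec.split():
        ET.SubElement(ext, tag).text = element_name
    return ext


bibdata = ET.Element("bibdata")
add_align_elements(bibdata, "clause li p")
print(ET.tostring(bibdata, encoding="unicode"))
```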


@Intelligent2013
Contributor

  • I still do not understand what align-cross-elements is doing,

Example 1: align-cross-elements="clause", i.e. clauses (the start of each clause) are always aligned in bilingual text.
[image]

Example 2: align-cross-elements="clause li", i.e. clauses and list items (the start of each clause and list item) are always aligned in bilingual text.
[image]

Example 3: align-cross-elements="clause li p", i.e. clauses, list items and paragraphs (the start of each clause, list item and paragraph) are always aligned in bilingual text.
[image]

Note that in align-cross-elements we set the names of XML elements (<clause>, <p>, <li>), not the @name attribute.

and why it belongs in the bibliography.

Actually align-cross-elements can go anywhere; maybe it would be better in metanorma-collection/align-cross-elements or metanorma-collection/align-cross-elements/manifest.

Is this a specification that, say, the elements "p, note, term" shall always be aligned in bilingual text? If it is, it does not belong in the semantic XML, and I'm not sure that it even belongs in Presentation XML; if I do inject it there, it will be in //bibdata/ext, and it will be as separate tags, not space-delimited.

No problem, the element's structure does not matter.

So presumably, a repeating //bibdata/ext/bilingual-align-element tag.

Maybe. Is there a use case for this tag in real documents?

  • I'm not going to use @common and @span, which are much too open-ended in interpretation. They are mutually exclusive anyway, so instead of @common, @span, @cross-align, I suggest using @multilingual-rendering = common (or shared), @multilingual-rendering = full-width, @multilingual-rendering = cross-align respectively.

Agree.

But @name is distinct from @multilingual-rendering = name; the latter means "align with any element in the other document which has the same name attribute".

Sorry, I don't understand it. In my proposal @name means, for example:

  • in first document we set
<figure id="figureC-2" name="figtest">
          <name>Figure C.2 — Stages of gelatinization</name>
  • in second document we set
<figure id="figureA-1" name="figtest">
              <name>Figure A.1test — Diviseur d’échantillon de type «Bon diviseur» (@common=true)</name>

in the resulting PDF:
[image]

I.e. the element with @name="figtest" from the 2nd document should be aligned next to the element with @name="figtest" from the 1st document.

But what does @multilingual-rendering = name mean in your proposal? I don't see a difference.

@ronaldtse
Contributor Author

Also a note that we will need validation on compilation for the semantic XML, i.e. so that the user won't miss alignments (e.g., "the element foo [en] does not have a corresponding element in [fr]").
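
A minimal sketch of what such a check could look like, assuming alignment is keyed on the name attribute discussed above. The function names and sample documents are hypothetical; a real check would run during compilation and would also need to cover the other alignment modes.

```python
import xml.etree.ElementTree as ET


def named(root):
    """Collect the values of every name attribute in a document."""
    return {e.get("name") for e in root.iter() if e.get("name") is not None}


def missing_counterparts(docs):
    """docs maps a language code to its document root; returns warnings."""
    warnings = []
    for lang, root in docs.items():
        for other, other_root in docs.items():
            if other == lang:
                continue
            for name in sorted(named(root) - named(other_root)):
                warnings.append(
                    f"the element {name} [{lang}] does not have "
                    f"a corresponding element in [{other}]")
    return warnings


en_doc = ET.fromstring(
    '<doc><figure id="f1" name="figtest"/><note id="n1" name="foo"/></doc>')
fr_doc = ET.fromstring('<doc><figure id="f9" name="figtest"/></doc>')

for warning in missing_counterparts({"en": en_doc, "fr": fr_doc}):
    print(warning)
```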

@opoudjis
Contributor

@Intelligent2013 I'm proposing:

<figure id="figureC-2" name="figtest" multilingual-rendering = "name">
          <name>Figure C.2 — Stages of gelatinization</name>

<figure id="figureA-1" name="figtest" multilingual-rendering = "name">
              <name>Figure A.1test — Diviseur d’échantillon de type «Bon diviseur» (@common=true)</name>

It seems laboured, perhaps, but I don't want to presuppose what @name is used for in documents, and @multilingual-rendering makes it explicit that there is alignment, and is consistent with the other instances of @multilingual-rendering.

@Intelligent2013
Contributor

It seems laboured, perhaps, but I don't want to presuppose what @name is used for in documents, and @multilingual-rendering makes it explicit that there is alignment, and is consistent with the other instances of @multilingual-rendering.

I see. Agree.

Regarding these use cases (https://www.ctan.org/pkg/paracol):

https://tex.stackexchange.com/questions/308260/parallel-text-translation-including-the-same-double-parallel-heading-numbering as an FYI to myself...

maybe it would be better to rename some properties:

  • align-cross-elements -> parallel-elements, or use bilingual-align-element if it exists already.
  • @multilingual-rendering = full-width -> @multilingual-rendering = double-column
  • @multilingual-rendering = cross-align -> @multilingual-rendering = parallel

for a clearer understanding of the purpose of each element.

@opoudjis
Contributor

opoudjis commented Jan 25, 2021

  • align-cross-elements -> //bibdata/ext/parallel-align-element
  • @multilingual-rendering = full-width -> @multilingual-rendering = all-columns
  • @multilingual-rendering = cross-align -> @multilingual-rendering = parallel
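
For anyone with XML already encoded using the prototype attributes, the renaming could be applied mechanically. The Python sketch below is a hypothetical migration helper, not part of Metanorma; it assumes that common="true" becomes multilingual-rendering="common" (the thread left "common" versus "shared" open) and uses the all-columns and parallel values listed above.

```python
import xml.etree.ElementTree as ET

# Prototype attribute -> agreed @multilingual-rendering value.
# "common" is an assumption; "shared" was mentioned as an alternative.
RENAMES = {
    "common": "common",
    "span": "all-columns",
    "cross-align": "parallel",
}


def migrate(root):
    """Rewrite prototype attributes set to "true" in place."""
    for element in root.iter():
        for old, new in RENAMES.items():
            if element.get(old) == "true":
                del element.attrib[old]
                element.set("multilingual-rendering", new)
    return root


doc = ET.fromstring(
    '<doc>'
    '<table id="t" common="true"/>'
    '<figure id="f" span="true"/>'
    '<p id="p" cross-align="true"/>'
    '</doc>')
migrate(doc)
print(ET.tostring(doc, encoding="unicode"))
```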

@Intelligent2013
Contributor

@manuel489 and @anermina could you encode https://www.bipm.org/utils/common/documents/jcgm/JCGM_200_2012.pdf into asciidoc? Thank you!

@ronaldtse
Contributor Author

We're moving JCGM documents into https://github.com/metanorma/mn-samples-jcgm. Thanks!

@ronaldtse
Contributor Author

@manuel489 @anermina we will wait for JCGM to provide further information on JCGM 200; we don't want to convert that document manually, since it's long. Thanks!


3 participants