Integrate libreoffice conversions into a2x #24

Open
dagwieers opened this issue Feb 13, 2012 · 16 comments

@dagwieers
Owner

For some time now, LibreOffice has been able to convert documents between various formats from the command line. For example:

libreoffice --convert-to pdf some-file.fodt
libreoffice --convert-to doc some-file.fodt

It would be nice to make a2x understand that one can go from AsciiDoc to PDF, DOC or PPT by going through the ODF backend and using libreoffice for the final conversion.
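A minimal Python sketch of how a2x might shell out to LibreOffice for that final step. It assumes the binary is called `libreoffice` (some installations only ship `soffice`) and adds `--headless` and `--outdir`, which are standard LibreOffice command-line options but are not part of the commands above. The helper names are hypothetical.

```python
import shutil
import subprocess
from pathlib import Path


def libreoffice_convert_cmd(src, target, outdir=None, binary="libreoffice"):
    """Build the argv for a headless conversion, e.g. target="pdf" or "doc"."""
    cmd = [binary, "--headless", "--convert-to", target]
    if outdir is not None:
        cmd += ["--outdir", str(outdir)]
    cmd.append(str(src))
    return cmd


def expected_output(src, target, outdir=None):
    """Path LibreOffice is expected to write: same stem, new extension.

    A target may name an explicit filter ("pdf:writer_pdf_Export"); only the
    part before the colon is the extension. Without --outdir we assume the
    current working directory.
    """
    ext = target.split(":", 1)[0]
    base = Path(outdir) if outdir is not None else Path.cwd()
    return base / f"{Path(src).stem}.{ext}"


def convert(src, target, outdir=None):
    """Run the conversion; raises if LibreOffice is missing or exits non-zero."""
    binary = shutil.which("libreoffice") or shutil.which("soffice")
    if binary is None:
        raise FileNotFoundError("neither 'libreoffice' nor 'soffice' is on PATH")
    subprocess.run(libreoffice_convert_cmd(src, target, outdir, binary), check=True)
    return expected_output(src, target, outdir)
```

Keeping the argv builder separate from `convert()` makes the command testable without LibreOffice installed. For example, `convert("some-file.fodt", "pdf", outdir="out")` would return `out/some-file.pdf`.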

@ghost ghost assigned elextr Feb 13, 2012
@elextr
Collaborator

elextr commented Feb 14, 2012

This is possible, the steps are:

  1. FODT generation needs to go through a2x, which means we need a backend-opt to choose between flat (fodt) and packaged output.
  2. The a2x-backend can then check for -f pdf and run the extra command shown above.
  3. For DOC output, since -f accepts only a fixed set of formats and doc is only reachable through the odt backend, add another backend-opt to request DOC output.

@dagwieers
Owner Author

Unless we drop the packaged support and have LibreOffice create the packaged ODT file instead. Merging styles would not be possible in that case (or at least not at the moment), but we could add that functionality to LibreOffice (or rather convince someone to implement it ;-))

@elextr
Collaborator

elextr commented Feb 15, 2012

Then the files can't be used with OpenOffice, and the whole backend becomes usable with LibreOffice only. By generating packaged files we support both suites.

Flat ODT files can also get large if they contain many images, since embedding them as base64 text increases their size by at least 30%, whereas a zipped package might even shrink them.

So I don't think this is a good idea just to avoid an option.
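The size argument is easy to quantify. Flat ODF embeds binary data such as images as base64 text inside the XML, and base64 turns every 3 bytes into 4 characters, which is roughly 33% overhead before any line wrapping. A small Python check (the function name is ours, not from the project):

```python
import base64


def embedding_overhead(n_bytes, wrapped=False):
    """Ratio of base64 text size to raw size for n_bytes of binary data.

    base64 maps every 3 input bytes to 4 output characters. With wrapped=True
    a newline is added after every 76 output characters (MIME-style), which
    adds a little more on top.
    """
    raw = bytes(n_bytes)  # the byte values do not affect the encoded length
    encoded = base64.encodebytes(raw) if wrapped else base64.b64encode(raw)
    return len(encoded) / n_bytes


print(f"{embedding_overhead(3_000_000):.3f}")  # 1.333
print(f"{embedding_overhead(3_000_000, wrapped=True):.3f}")
```

Images such as PNG or JPEG are usually already compressed, so the ZIP container mostly avoids this overhead rather than shrinking the images themselves.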

@dagwieers
Owner Author

Agreed, but the a2x solution does need some more love. Somehow it seems easier to create a tool that turns a flat XML ODF (.fodt) into a packaged ODF (.odt) than to create the content.xml directly. And we currently lack a meta.xml.

Based on the minimal-odf directory it should be quite straightforward to do, although it might require some python XML foo.
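A rough Python sketch of the conversion from .fodt to packaged parts, using only the standard library. It assumes the usual flat ODF layout: a single `office:document` root whose children are `office:meta`, `office:settings`, `office:styles`, `office:body` and so on. The sketch rewraps those children under the packaged root elements. Two simplifications are deliberate. First, `office:automatic-styles` and `office:font-face-decls` are copied into both content.xml and styles.xml instead of being split by usage. Second, ElementTree only declares namespaces that appear in element or attribute names, so a prefix used only inside an attribute value would still be missing. That is the same "incomplete set of namespaces" trap discussed later in this thread. The function and table names are hypothetical.

```python
import copy
import io
import xml.etree.ElementTree as ET

OFFICE_NS = "urn:oasis:names:tc:opendocument:xmlns:office:1.0"


def office(tag):
    """Qualified name in the office: namespace, in ElementTree's {uri}tag form."""
    return f"{{{OFFICE_NS}}}{tag}"


# packaged part -> (root element, office:* children copied from the flat file)
PARTS = {
    "content.xml": ("document-content",
                    ["scripts", "font-face-decls", "automatic-styles", "body"]),
    "styles.xml": ("document-styles",
                   ["font-face-decls", "styles", "automatic-styles", "master-styles"]),
    "meta.xml": ("document-meta", ["meta"]),
    "settings.xml": ("document-settings", ["settings"]),
}
ALWAYS = {"content.xml", "styles.xml"}


def split_flat_odf(fodt):
    """Split flat ODF bytes into (mimetype, {part name: XML bytes})."""
    # Reuse the document's own prefixes (office:, text:, ...) instead of ns0:, ns1:.
    # Note: register_namespace changes ElementTree's global prefix registry.
    for _event, (prefix, uri) in ET.iterparse(io.BytesIO(fodt), events=("start-ns",)):
        if prefix:
            ET.register_namespace(prefix, uri)
    root = ET.fromstring(fodt)
    if root.tag != office("document"):
        raise ValueError(f"not a flat ODF document: {root.tag}")
    version = root.get(office("version"), "1.2")
    mimetype = root.get(office("mimetype"), "application/vnd.oasis.opendocument.text")

    parts = {}
    for name, (root_tag, children) in PARTS.items():
        found = [root.find(office(child)) for child in children]
        found = [el for el in found if el is not None]
        if not found and name not in ALWAYS:
            continue  # e.g. no office:settings in the flat file -> no settings.xml
        new_root = ET.Element(office(root_tag), {office("version"): version})
        new_root.extend([copy.deepcopy(el) for el in found])
        parts[name] = ET.tostring(new_root, encoding="utf-8", xml_declaration=True)
    return mimetype, parts
```

The returned mimetype comes from the flat file's `office:mimetype` attribute, so the same string can be written to the package's `mimetype` entry.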

@elextr
Collaborator

elextr commented Feb 15, 2012

Yes, "should be", but I could never get LO or OO to read a package without their extra files; they always said it was damaged. That's why I took the approach of simply replacing the contents in an existing package. It would be nice to try some more; the problem is time, maybe soon.

For the meta, it should be easy to get asciidoc to write a meta.xml file if, say, an odf-meta-file attribute is set. That file can then be included in the package in place of the existing meta.xml.

@dagwieers
Owner Author

I fixed the "it is damaged" error! The solution is to make sure the mimetype file is the first file in the ZIP archive and is stored uncompressed (which should be the default considering its size). The procedure is in the Makefile:

cd minimal-odf; \
echo -n 'application/vnd.oasis.opendocument.text' >mimetype; \
echo '<office:document-styles office:version="1.2" **...snip all namespaces...** >' >styles.xml; \
cat ../backends/odt/asciidoc.odt.styles >>styles.xml; \
echo '</office:document-styles>' >>styles.xml; \
rm -rf ../backends/odt/asciidoc.ott; \
zip -X -r ../backends/odt/asciidoc.ott mimetype *

If you make sure this is the case, you can check with "file" on Linux to see if it is considered an "OpenDocument Text":

[dag@moria asciidoc-odf]$ file backends/odt/asciidoc.ott 
backends/odt/asciidoc.ott: OpenDocument Text
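The same conditions can also be checked from Python without relying on `file`. As I understand the libmagic rule for OpenDocument, it looks for the string `mimetype` at byte 30 of the archive, right after the first 30-byte ZIP local file header, and for the media type immediately after it. That only matches when the `mimetype` entry is first, stored rather than deflated, and carries no extra field. Dropping those extra fields is what `zip -X` does. The sketch below parses the first local header instead of hard-coding offsets. The function name is hypothetical.

```python
import struct

LOCAL_HEADER = struct.Struct("<4s5H3I2H")  # fixed 30-byte ZIP local file header
ODT_MIMETYPE = "application/vnd.oasis.opendocument.text"


def check_odf_mimetype(path, expected=ODT_MIMETYPE):
    """Return a list of problems with the first ZIP entry (empty list = looks fine)."""
    with open(path, "rb") as f:
        header = f.read(LOCAL_HEADER.size)
        if len(header) < LOCAL_HEADER.size:
            return ["file too short to be a ZIP archive"]
        (sig, _ver, _flags, method, _time, _date,
         _crc, _csize, _usize, name_len, extra_len) = LOCAL_HEADER.unpack(header)
        if sig != b"PK\x03\x04":
            return ["not a ZIP local file header"]
        name = f.read(name_len)
        f.seek(extra_len, 1)  # skip the extra field, if any
        data = f.read(len(expected))

    problems = []
    if name != b"mimetype":
        problems.append(f"first entry is {name!r}, not b'mimetype'")
    if method != 0:
        problems.append(f"first entry is compressed (method {method}), must be stored")
    if extra_len:
        problems.append(f"first entry has a {extra_len}-byte extra field (try zip -X)")
    if name == b"mimetype" and method == 0 and data != expected.encode("ascii"):
        problems.append(f"mimetype content is {data!r}")
    return problems
```

Running `check_odf_mimetype("backends/odt/asciidoc.ott")` on a correctly built package should return an empty list.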

The minimum is:

  • mimetype
  • content.xml
  • styles.xml

I spent nearly 4 hours narrowing this down to the minimum working set. Unfortunately, you need to include all the namespaces in styles.xml, and they are not inside my stylesheets (because the very first element differs between flat XML ODF files and packaged ODF).
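The Makefile recipe translates fairly directly to Python's `zipfile` module, which is roughly what a2x would have to do if it assembled the package itself. This is a sketch, not the project's code. It assumes the caller supplies the XML parts as bytes: for the minimal set above that means content.xml and styles.xml, plus META-INF/manifest.xml if the target application insists on one. The function name is hypothetical.

```python
import zipfile

ODT_MIMETYPE = "application/vnd.oasis.opendocument.text"


def write_odf_package(path, parts, mimetype=ODT_MIMETYPE):
    """Write an ODF package: 'mimetype' first and stored, everything else deflated.

    parts maps archive names (e.g. "content.xml") to bytes.
    """
    with zipfile.ZipFile(path, "w") as zf:
        # A bare ZipInfo defaults to ZIP_STORED with an empty extra field, like
        # `echo -n ... >mimetype; zip -X ... mimetype *` (no trailing newline).
        zf.writestr(zipfile.ZipInfo("mimetype"), mimetype.encode("ascii"))
        for name, data in parts.items():
            if name == "mimetype":
                continue  # already written, and it must not appear twice
            zf.writestr(name, data, compress_type=zipfile.ZIP_DEFLATED)
```

Usage: `write_odf_package("demo.odt", {"content.xml": content, "styles.xml": styles})`. The entries are written in the dict's insertion order after `mimetype`.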

@dagwieers
Copy link
Owner Author

The meta-information is also created by backends/odt/odt.conf; it contains various fields, including the document title used as a field on the cover or on the first page. I don't know how a2x would generate this meta.xml by itself, though.

@elextr
Copy link
Collaborator

elextr commented Feb 16, 2012

On 16 February 2012 17:54, Dag Wieërs wrote:

> The meta-information is also created by backends/odt/odt.conf, it contains various fields, including the document title used as a field on the cover or on the first page. I don't know how a2x will generate this meta.xml by itself though.

Yes, what I suggest is that if the not-flat-odf attribute is defined, asciidoc writes the meta inside a suitably delimited XML comment so that the a2x backend can find it and copy it.

e.g.

ifdef::not-flat-odf[]

endif::not-flat-odf[]

Then it can be found the same way the image files are found now.

Cheers
Lex
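One way the backend could pick the metadata out of the generated flat file, sketched in Python. The thread does not specify the delimiters, so the `a2x-meta-begin` and `a2x-meta-end` markers below are purely hypothetical, as is the function name. Note also that an XML comment may not contain `--`, so the embedded metadata would have to avoid that sequence.

```python
import re

# Hypothetical markers; the real ones would be whatever odt.conf emits.
META_BEGIN = "<!-- a2x-meta-begin"
META_END = "a2x-meta-end -->"
_META_RE = re.compile(re.escape(META_BEGIN) + r"(.*?)" + re.escape(META_END), re.DOTALL)


def extract_meta_xml(fodt_text):
    """Return the meta.xml text hidden in the delimited comment, or None if absent."""
    match = _META_RE.search(fodt_text)
    return match.group(1).strip() if match else None
```

The extracted text could then be written as meta.xml in the package, replacing the template's copy.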



@elextr
Collaborator

elextr commented Feb 16, 2012

On 16 February 2012 17:49, Dag Wieërs wrote:

> I fixed that it says it is damaged !

That isn't the problem I had when writing the packages from Python: the mimetype was always first and uncompressed (confirmed by hex-dumping the file :) The problem was something else. Since that was a while ago and things have moved on significantly since then, I will look at it again based on the Makefile.

In the a2x-backend.py file I just didn't bother putting mimetype first, since neither LO nor OO seemed to care; I guess that was Murphy's law.

Cheers
Lex


@dagwieers
Owner Author

Hmm, when I changed the a2x file to specifically make it uncompressed and first, it suddenly worked. (At least file reported the correct file type and not just a zip file). You can find the change here: 998c4b8#diff-3

But it is only one of the requirements. Now that jing/xmllint validation works again, I noticed that LibreOffice can also complain about certain validity problems. One was an incomplete set of namespaces. Another was a missing or broken META-INF/manifest.xml, which strictly speaking is not required by the specification. I understand how hard it is to find out why LibreOffice cannot open a document (or says it is damaged) without any clues whatsoever :-(

I was planning to open bug reports against LibreOffice for these issues; I just need to break the minimal-odf in various simple ways and attach the results to one of the bug reports. Either the error message should be more specific, or LibreOffice should be more liberal in what it accepts.
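A missing or broken META-INF/manifest.xml is one of the failure modes, so here is a Python sketch that generates one rather than copying it from a template. It follows the ODF 1.2 manifest format as I understand it: a root `manifest:manifest`, an entry for `/` carrying the document's media type, and one entry per file in the package, with the `mimetype` file itself not listed. The function name and the use of `text/xml` for the XML parts are assumptions.

```python
from xml.sax.saxutils import quoteattr

MANIFEST_NS = "urn:oasis:names:tc:opendocument:xmlns:manifest:1.0"


def build_manifest(mimetype, entries, version="1.2"):
    """Return META-INF/manifest.xml as bytes.

    entries is a list of (path inside the package, media type) pairs; the
    'mimetype' file and the manifest itself are not listed.
    """
    lines = [
        '<?xml version="1.0" encoding="UTF-8"?>',
        f'<manifest:manifest xmlns:manifest="{MANIFEST_NS}" manifest:version="{version}">',
        f' <manifest:file-entry manifest:full-path="/" manifest:version="{version}"'
        f" manifest:media-type={quoteattr(mimetype)}/>",
    ]
    for full_path, media_type in entries:
        # quoteattr adds the surrounding quotes and escapes &, <, > and quotes
        lines.append(
            f" <manifest:file-entry manifest:full-path={quoteattr(full_path)}"
            f" manifest:media-type={quoteattr(media_type)}/>"
        )
    lines.append("</manifest:manifest>")
    return ("\n".join(lines) + "\n").encode("utf-8")
```

For the minimal package discussed above this would be called with `[("content.xml", "text/xml"), ("styles.xml", "text/xml")]`. The result can be validated with jing against the manifest schema before packaging.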

@elextr
Collaborator

elextr commented Feb 17, 2012

On 16 February 2012 20:24, Dag Wieërs wrote:

> Hmm, when I changed the a2x file to specifically make it uncompressed and first, it suddenly worked. (At least file reported the correct file type and not just a zip file). You can find the change here: 998c4b8#diff-3

Yeah, as I said, Murphy ensured that it worked for me without the change but failed for someone else :) Good fix.

> But it is only one of the requirements. Now with jing/xmllint working again, I noticed that it can complain also for certain invalidities. One was an incomplete set of namespaces. Another was related to a missing or broken META-INF/manifest.xml, which strictly speaking is not necessary according to the specifications. I understand how hard it is to find the cause of LibreOffice not being able to open the document (or saying it is damaged) without any clues whatsoever :-(

That's a good suggestion; if I have trouble when I try again, I will run xmllint and see if it gives any hints.

> I was planning to open bug-reports to LibreOffice regarding these issues, I just need to break the minimal-odf in various simple ways and attach them to one of the bug-reports. Either the error-message should be more specific, or it should be more liberal in acceptance.

Prefer liberal acceptance :)



@dagwieers
Owner Author

I recommend jing over xmllint for debugging, as it sometimes gives better clues.

(See the README for details on how to use it)

@elextr
Collaborator

elextr commented Feb 22, 2012

On 16 February 2012 17:49, Dag Wieërs wrote:

> I fixed that it says it is damaged ! The solution is to make sure the mimetype file is in the ZIP file as the first file, and uncompressed (which should be the default considering its size). The procedure is in the Makefile:
>
>         cd minimal-odf; \
>         echo -n 'application/vnd.oasis.opendocument.text' >mimetype; \
>         echo '<office:document-styles office:version="1.2" **...snip all namespaces...** >' >styles.xml; \
>         cat ../backends/odt/asciidoc.odt.styles >>styles.xml; \
>         echo '</office:document-styles>' >>styles.xml; \
>         rm -rf ../backends/odt/asciidoc.ott; \
>         zip -X -r ../backends/odt/asciidoc.ott mimetype *

I should have seen it before: the quoted Makefile doesn't make an acceptable minimal OTT file. There is no manifest, no META-INF, etc. If you were trying to use it as the styles-doc for the a2x-backend, then yes, it wouldn't work. The styles-doc must be a valid OTT file loadable by LO.


@dagwieers
Owner Author

The minimal-odf/ tree does include a META-INF directory and a manifest.xml file, and it does open in LibreOffice. That was the sole purpose of creating the minimal ODF: it implements the minimum LibreOffice needs.

@elextr
Collaborator

elextr commented Feb 22, 2012

On 22 February 2012 19:10, Dag Wieërs wrote:

> The minimal-odf/ tree does include a META-INF and manifest.xml file, and it does open in LibreOffice. That was the sole purpose of creating the minimal ODF, it implements the minimum for LibreOffice.

Oh, they are already in the directory. As I noted in another post, LO seems to want a settings.xml when I try it; not sure why. Anyhow, making a settings.xml is easy once it is known to be needed.

Cheers
Lex



@dagwieers
Owner Author

I wonder what version of LibreOffice this is, because with both LibreOffice 3.4.5 and 3.5.0 I have no issues opening the minimal ODF file. Even OpenOffice 3.2.1 can open this file without a settings.xml, so I think every LibreOffice version should work correctly. Or does opening the file in Writer not show the problem you are experiencing?
