Wrong conversion of German characters äöüß #294

Open
ai-arie opened this issue Jan 19, 2025 · 3 comments

ai-arie commented Jan 19, 2025

I'm using the Python API to convert some German documents, and I narrowed down the problem in my conversion to MarkItDown.

from markitdown import MarkItDown

md = MarkItDown()
result = md.convert(file_name)  # file_name: path to the German source document
markdown_text = result.text_content

Before the conversion the document in "file_name" contains: ...für künstliche Intelligenz und zur Änderung...

After the conversion markdown_text contains: ...fΟΦr kΟΦnstliche Intelligenz und zur Ο³nderung...

Any help with that?
Preferably, the solution should not change the output when converting English text, only German (and potentially other languages; I did not test any).

Thanks!
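
Output of this shape is typical of mojibake: bytes written in one encoding and decoded with another. The following is a minimal sketch of that effect, assuming UTF-8 bytes read back as cp1252. The codec pair is an illustration only; the thread does not establish which codecs are involved here or at which step the mismatch happens.

```python
# Illustration only: the utf-8 / cp1252 pair is an assumption, not a diagnosis
# of what MarkItDown does internally.
original = "für künstliche Intelligenz"

raw = original.encode("utf-8")      # bytes as they would be stored on disk
garbled = raw.decode("cp1252")      # decoded with the wrong codec: mojibake

# When the wrong codec is known and lossless, the damage can be reversed.
repaired = garbled.encode("cp1252").decode("utf-8")
```

Here `garbled` no longer matches `original`, while `repaired` does, which is why identifying the mismatched codec pair is usually the first debugging step.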

@Pindar777

I had the same issue when saving the converted text as .md, and the following fixed it:

...

with open(md_file, 'w', encoding="utf-8") as f:
    f.write(result.text_content)

...
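
A self-contained sketch of the same idea, for anyone who wants to check it without MarkItDown installed: write the text with an explicit encoding, read it back with the same encoding, and compare. The helper names `save_text` and `load_text` and the sample string are hypothetical; the sample stands in for `result.text_content`.

```python
import tempfile
from pathlib import Path


def save_text(path, text):
    """Write text to path, always as UTF-8 regardless of the platform default."""
    with open(path, "w", encoding="utf-8") as f:
        f.write(text)


def load_text(path):
    """Read text from path, decoding it as UTF-8."""
    with open(path, "r", encoding="utf-8") as f:
        return f.read()


# Stand-in for result.text_content (assumption: the converted text is a str).
sample = "für künstliche Intelligenz und zur Änderung"

with tempfile.TemporaryDirectory() as tmp:
    out = Path(tmp) / "output.md"
    save_text(out, sample)
    round_tripped = load_text(out)
```

If `round_tripped` equals `sample`, the file step is not where the characters get damaged. Note that this only covers saving and loading; if `result.text_content` is already garbled in memory, the explicit encoding on write will not repair it.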


ai-arie commented Jan 20, 2025

Thanks @Pindar777. I'm saving as .txt, but that does not work.
I have not checked .md yet; I will try it out.

@Pindar777

@ai-arie .md or .txt should behave the same way. In fact, I could not save any output at all without the encoding parameter.
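
One possible explanation, not confirmed in this thread: when `open()` is called without `encoding`, Python falls back to a platform-dependent default, and a write fails with `UnicodeEncodeError` if that codec cannot represent a character in the text. The sketch below checks this for a given codec; `can_encode` is a hypothetical helper, and `"ascii"` is used only as an example of a restrictive codec.

```python
import locale


def can_encode(text, encoding):
    """Return True if every character in text is representable in encoding."""
    try:
        text.encode(encoding)
        return True
    except UnicodeEncodeError:
        return False


# The default that open() typically uses when no encoding is given;
# the value varies by operating system and locale settings.
default_encoding = locale.getpreferredencoding(False)

utf8_ok = can_encode("Änderung", "utf-8")
ascii_ok = can_encode("Änderung", "ascii")  # example of a codec that cannot hold Ä
```

Checking `can_encode(result.text_content, default_encoding)` on the affected machine would show whether the platform default is what blocks the write.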
