Release 4.0.0 · ilius/pyglossary

Changes since 3.3.0

Require Python 3.7 or 3.8, drop support for Python 3.4, 3.5 and 3.6
Fix / rewrite setup.py
- Fix python3 setup.py sdist bdist_wheel, and pypi paackage
  - Had to move ui/ directory into pyglossary/
- Switch from distutils to setuptools
- Remove py2exe
Add interactive command line user interface
- Automatically selected if input & ouput file arguments are not passed and one of these:
  - On Linux and no $DISPLAY is not set
  - On Mac and no tkinter module is found
  - --ui=cmd flag is passed
New format support:
- Add read support for FreeDict, #206
- Add read support for Zim (Kiwix)
- Add read and write support for Kobo E-Reader Dictfile (.df)
- Add write support for DICT.org dictfmt source file
- Add read support for dictunformat output file
- Add write support for JSON
- Add read support for Dict.cc (SQLite3)
- Add read support for JMDict, #239
- Add basic read support for Wiktionary Dump (.xml)
- Add read support for cc-kedict
- Add read support for DigitalNK (SQLite3)
- Add read support for Wordset.org JSON directory
Remove Omnidic write support (Unmaintained J2ME dictionary)
Remove Octopus MDict Source plugin
Remove Babylon Source plugin
BGL Weader: improvements
DictionaryForMIDs Writer: fix non-working code
Gettext Source (po) Writer: fix info header
MOBI E-Book Writer: fix sort order, fix and test kindlegen codes, add kindlegen_path option, #112
EPUB-2 E-Book Writer: fix sort order
XDXF Reader: rewrite with etree.iterparse to avoid using too much RAM
Lingoes Source (LDF) Reader: fix ignoring info/metadata header
dict_org.py: rewrite broken plugin (Reader and Writer)
DSL Reader: fix loosing metadata/info
Aard 2 (slob) Reader:
- Fix adding css/js files as normal entries
- Add bword:// prefix to entry links
- Fix duplicate entries issue by keeping a set of blob IDs, #224
- Detect and pass defiFormat
Aard 2 (slob) Writer:
- Fix content_type detection
- Remove bword:// prefix from entry links
- Add resource files / data entries, #243
- Fix replacing image paths
- Show log events from slob.py in debug mode
- Change default compression to zlib
- Allow passing empty compression
Octopus MDict Reader:
- Read MDX file twice to load links
- Count data entries as part of len(reader) for progressbar
StarDict Writer:
- Copy "copyright" and "publisher" values to "description"
- Add source and target language codes to the end of bookname
- Add write-option stardict_client: bool
  Set True to make glossary more compatible with StarDict 3.x
- Fix broken result when sametypesequence option is given and a definitions contains |
- Allow sametypesequence=x for xdxf
- Add merge_syns option
- Allow sametypesequence=None option
XDXF Reader:
- Fix/improve xdxf to html transformation
Kobo Writer:
- Fix get_prefix algorithm and sorting order, with tests, #219
- Replace <img src=... tags with [Image: name.bmp], #219
  - and show a warning about data entries
- Additional keywords as alternatives, #232
- Fix support for alternates: duplicate entries based on word prefix, #238
- Show headword in title of alternate entries, #238, #245
- Strip full html definition, #246
CSV:
- Add delimiter option to Reader and Writer
- Read and write info
- Writer: accept bool option add_defi_format=True (default False)
AppleDict Writer:
- AppleDict Writer: replace fix_sound_link() code with a single line
- AppleDict Writer should not call glos.setDefaultDefiFormat
MDX Reader:
- Replace entry:// with bword:// in MDX Reader instead of AppleDict Writer
- Fix internal href="x:" and href="d:" links
- Fix file:// in images path, fix #243
User Interface improvements and fixes:
- ui_gtk: add About tab and more improvements
- ui_tk: replace About dialog with About tab and more improvements
- ui_cmd: improvements in progressbar
- ui_cmd: allow "=" in value of read/write options
Add a list of 208 languages and ~40 writing systems
- Detect sourceLang and targetLang from glossary name/title
- Auto-select between <b> and <big> tags depending on writing system
  - Using glos.titleElement method, used in FreeDict, JMDict and Dict.cc writers
- glos.sourceLang and glos.targetLang properties (with setters) as Lang objects
- glos.sourceLangName and glos.targetLangName properties (with setters) as str
  - Used in several plugins
Break compatibilty of plugins
- Drop support for read and write functions (outside a class)
- Now we only support Reader class and Writer class
- Reader class must have these methods
  - __init__(self, glos)
  - open(self, filename)
    - Here glossary info must be read from file and set with glos.setInfo
  - __len__(self) -> int
    - Should return the number or entries, or zero if it's too costly
  - __iter__(self) -> "Iterator[BaseEntry]"
    - Can be a generator
  - close(self)
- Writer class must have these methods
  - __init__(self, glos)
  - open(self, filename)
    - Here glossary info must be read from glos.getInfo or glos.iterInfo and written to file
  - write(self) -> "Generator[None, BaseEntry, None]"
    - Entries must be fetched with entry = yield in a while True loop:
      while True: entry = yield if entry is None: break # process and write entry into file(s)
  - finish(self)
- Read options and write options must be set to their default values as class attributes
  - See pyglossary/plugins/csv_pyg.py plugin for example
- sortKey must be an intance method of Writer, instead of a function outside any class
  - Only for plugins that need sorting before write
Refactor and cleanup Glossary class
- Removed or replaced most of class/static attributes of Glossary
  - To see the diff, run git diff 3.3.0..master -- pyglossary/glossary.py
- Removed glos.addEntry method
  - If you use it in your program, replace with glos.addEntryObj(glos.newEntry(word, defi, defiFormat))
- Removed instance methods:
  - getMostUsedDefiFormats
  - iterEntryBuckets
  - zipOutDir and archiveOutDir
    - Moved to pyglossary/glossary_utils.py
    - archiveOutDir renamed to compressOutDir
  - writeDict
  - iterSqlLines -> moved to pyglossary/plugins/sql.py
  - reverse, takeOutputWords, searchWordInDef -> moved to pyglossary/reverse.py
- Values of Glossary.plugins is changed to plugin_prop.PluginProp instances
- Change glos.writeTxt arguments
  - Replace sep1 and sep2 with entryFmt
  - Replace rplList with defiEscapeFunc, wordEscapeFunc and tail
  - Remove iterEntries, entryFilterFunc
  - Method returns Generator[None, BaseEntry, None] instead of bool
  - See for usage example:
    - pyglossary/glossary.py -> def writeTabfile
    - pyglossary/plugins/dict_org_source.py
    - pyglossary/plugins/json_plugin.py
    - pyglossary/plugins/lingoes_ldf.py
    - pyglossary/plugins/sdict_source.py
Refactor, cleanup and fixes in Entry and DataEntry classes
- Replace entry.getWord() with entry.word
- Replace entry.getWords() with entry.l_word
- Replace entry.getDefi() with entry.defi
- Remove entry.getDefis()
  - Drop handling alternate definitions in Entry objects
- Replace entry.getDefiFormat() with entry.defiFormat
- Add entry.b_word and entry.b_defi shortcuts that give bytes (UTF-8)
- Replace dataEntry.getData() with dataEntry.data
- Add __slots__ to Entry and DataEntry classes
- Fix DataEntry in indirect mode
  - Mistaken for Entry with defi=DATA, and file content discarded
  - Save resource files in user's cache directory when loading input glossary into memory
    - Move file to output glossary on dataEntry.save(...)
- Fix Entry.getRawEntrySortKey not being alternates-aware, broke StarDict Writer
- DataEntry: save: use shutil.copy if has _tmpPath, and set _tmpPath
New features of Entry
- entry.stripFullHtml(), remove <html... <head>...</head>...<body>
  - Used in Kobo and Kobo Dictfile writers
  - Add tests
Fix glos.writeTabfile:
- Remove \r from definitions and info values
- Fix not escaping word
Fix/improve html detection in definitions
Switch to lazy imports of non-standard modules in plugins
Optimize RAM usage of indirect conversion
- To write StarDict, EPUB and DictionaryForMIDs glossaries, we need to load all entries into RAM to sort them
Other new features of Glossary class
- glos.getAuthor() to get "author", or "publisher" (as fallback)
- glos.removeHtmlTagsAll() method, can be called by plugins' writer
- glos.collectDefiFormat(maxCount) extract defiFormat counts
  - by reading first maxCount entries. (then iterator will be reset)
  - Used in StarDict Writer
- Show memory usage in trace mode
Bug fixes and improvements in code base
- Apply entry filter when iterating over reader, fix #251
  - Fixes wrong sort order for some glossaries (converting to StarDict or other formats that need sort)
- Fixes and improvements in TextGlossaryReader class
  - Fix ignoring glossary defaultDefiFormat
- Fix evaluating None value in read/write options
Support reading multi-file Tabfile or other text formats
- Example: file.txt, file.txt.1, file.txt.2
- Need to add file_count info key, for example: ##file_count 3
Fixes in Tabfile Writer
- Fix not escaping ""
Add/update documentation
- Update README.md
- Add Termux guides in doc/termux.md
- Move AppleDict guides to doc/apple.md
- Move LZO notes to doc/lzo.md
- Minify and compress .svg files in doc/ folder
Switch to f-strings, pep8 fixes, add types, style changes and refactoring
New command line flags:
- --log-time to show datetime in logs (override log_time in config.json)
- --no-alts to disable alternates handling
- --normalize-html to lowercase tags (for now)
- --cleanup and --no-cleanup
- --info to save .info file alongside output file

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

4.0.0

Changes since 3.3.0