Skip to content

4.0.0

Compare
Choose a tag to compare
@ilius ilius released this 24 Oct 11:45
· 2099 commits to master since this release
047b747

Changes since 3.3.0

  • Require Python 3.7 or 3.8, drop support for Python 3.4, 3.5 and 3.6

  • Fix / rewrite setup.py

    • Fix python3 setup.py sdist bdist_wheel, and pypi paackage
      • Had to move ui/ directory into pyglossary/
    • Switch from distutils to setuptools
    • Remove py2exe
  • Add interactive command line user interface

    • Automatically selected if input & ouput file arguments are not passed and one of these:
      • On Linux and no $DISPLAY is not set
      • On Mac and no tkinter module is found
      • --ui=cmd flag is passed
  • New format support:

    • Add read support for FreeDict, #206
    • Add read support for Zim (Kiwix)
    • Add read and write support for Kobo E-Reader Dictfile (.df)
    • Add write support for DICT.org dictfmt source file
    • Add read support for dictunformat output file
    • Add write support for JSON
    • Add read support for Dict.cc (SQLite3)
    • Add read support for JMDict, #239
    • Add basic read support for Wiktionary Dump (.xml)
    • Add read support for cc-kedict
    • Add read support for DigitalNK (SQLite3)
    • Add read support for Wordset.org JSON directory
  • Remove Omnidic write support (Unmaintained J2ME dictionary)

  • Remove Octopus MDict Source plugin

  • Remove Babylon Source plugin

  • BGL Weader: improvements

  • DictionaryForMIDs Writer: fix non-working code

  • Gettext Source (po) Writer: fix info header

  • MOBI E-Book Writer: fix sort order, fix and test kindlegen codes, add kindlegen_path option, #112

  • EPUB-2 E-Book Writer: fix sort order

  • XDXF Reader: rewrite with etree.iterparse to avoid using too much RAM

  • Lingoes Source (LDF) Reader: fix ignoring info/metadata header

  • dict_org.py: rewrite broken plugin (Reader and Writer)

  • DSL Reader: fix loosing metadata/info

  • Aard 2 (slob) Reader:

    • Fix adding css/js files as normal entries
    • Add bword:// prefix to entry links
    • Fix duplicate entries issue by keeping a set of blob IDs, #224
    • Detect and pass defiFormat
  • Aard 2 (slob) Writer:

    • Fix content_type detection
    • Remove bword:// prefix from entry links
    • Add resource files / data entries, #243
    • Fix replacing image paths
    • Show log events from slob.py in debug mode
    • Change default compression to zlib
    • Allow passing empty compression
  • Octopus MDict Reader:

    • Read MDX file twice to load links
    • Count data entries as part of len(reader) for progressbar
  • StarDict Writer:

    • Copy "copyright" and "publisher" values to "description"
    • Add source and target language codes to the end of bookname
    • Add write-option stardict_client: bool
      Set True to make glossary more compatible with StarDict 3.x
    • Fix broken result when sametypesequence option is given and a definitions contains |
    • Allow sametypesequence=x for xdxf
    • Add merge_syns option
    • Allow sametypesequence=None option
  • XDXF Reader:

    • Fix/improve xdxf to html transformation
  • Kobo Writer:

    • Fix get_prefix algorithm and sorting order, with tests, #219
    • Replace <img src=... tags with [Image: name.bmp], #219
      • and show a warning about data entries
    • Additional keywords as alternatives, #232
    • Fix support for alternates: duplicate entries based on word prefix, #238
    • Show headword in title of alternate entries, #238, #245
    • Strip full html definition, #246
  • CSV:

    • Add delimiter option to Reader and Writer
    • Read and write info
    • Writer: accept bool option add_defi_format=True (default False)
  • AppleDict Writer:

    • AppleDict Writer: replace fix_sound_link() code with a single line
    • AppleDict Writer should not call glos.setDefaultDefiFormat
  • MDX Reader:

    • Replace entry:// with bword:// in MDX Reader instead of AppleDict Writer
    • Fix internal href="x:" and href="d:" links
    • Fix file:// in images path, fix #243
  • User Interface improvements and fixes:

    • ui_gtk: add About tab and more improvements
    • ui_tk: replace About dialog with About tab and more improvements
    • ui_cmd: improvements in progressbar
    • ui_cmd: allow "=" in value of read/write options
  • Add a list of 208 languages and ~40 writing systems

    • Detect sourceLang and targetLang from glossary name/title
    • Auto-select between <b> and <big> tags depending on writing system
      • Using glos.titleElement method, used in FreeDict, JMDict and Dict.cc writers
    • glos.sourceLang and glos.targetLang properties (with setters) as Lang objects
    • glos.sourceLangName and glos.targetLangName properties (with setters) as str
      • Used in several plugins
  • Break compatibilty of plugins

    • Drop support for read and write functions (outside a class)
    • Now we only support Reader class and Writer class
    • Reader class must have these methods
      • __init__(self, glos)
      • open(self, filename)
        • Here glossary info must be read from file and set with glos.setInfo
      • __len__(self) -> int
        • Should return the number or entries, or zero if it's too costly
      • __iter__(self) -> "Iterator[BaseEntry]"
        • Can be a generator
      • close(self)
    • Writer class must have these methods
      • __init__(self, glos)

      • open(self, filename)

        • Here glossary info must be read from glos.getInfo or glos.iterInfo and written to file
      • write(self) -> "Generator[None, BaseEntry, None]"

        • Entries must be fetched with entry = yield in a while True loop:

           while True:
           	entry = yield
           	if entry is None:
           		break
           	# process and write entry into file(s)
      • finish(self)

    • Read options and write options must be set to their default values as class attributes
      • See pyglossary/plugins/csv_pyg.py plugin for example
    • sortKey must be an intance method of Writer, instead of a function outside any class
      • Only for plugins that need sorting before write
  • Refactor and cleanup Glossary class

    • Removed or replaced most of class/static attributes of Glossary
      • To see the diff, run git diff 3.3.0..master -- pyglossary/glossary.py
    • Removed glos.addEntry method
      • If you use it in your program, replace with glos.addEntryObj(glos.newEntry(word, defi, defiFormat))
    • Removed instance methods:
      • getMostUsedDefiFormats
      • iterEntryBuckets
      • zipOutDir and archiveOutDir
        • Moved to pyglossary/glossary_utils.py
        • archiveOutDir renamed to compressOutDir
      • writeDict
      • iterSqlLines -> moved to pyglossary/plugins/sql.py
      • reverse, takeOutputWords, searchWordInDef -> moved to pyglossary/reverse.py
    • Values of Glossary.plugins is changed to plugin_prop.PluginProp instances
    • Change glos.writeTxt arguments
      • Replace sep1 and sep2 with entryFmt
      • Replace rplList with defiEscapeFunc, wordEscapeFunc and tail
      • Remove iterEntries, entryFilterFunc
      • Method returns Generator[None, BaseEntry, None] instead of bool
      • See for usage example:
        • pyglossary/glossary.py -> def writeTabfile
        • pyglossary/plugins/dict_org_source.py
        • pyglossary/plugins/json_plugin.py
        • pyglossary/plugins/lingoes_ldf.py
        • pyglossary/plugins/sdict_source.py
  • Refactor, cleanup and fixes in Entry and DataEntry classes

    • Replace entry.getWord() with entry.word
    • Replace entry.getWords() with entry.l_word
    • Replace entry.getDefi() with entry.defi
    • Remove entry.getDefis()
      • Drop handling alternate definitions in Entry objects
    • Replace entry.getDefiFormat() with entry.defiFormat
    • Add entry.b_word and entry.b_defi shortcuts that give bytes (UTF-8)
    • Replace dataEntry.getData() with dataEntry.data
    • Add __slots__ to Entry and DataEntry classes
    • Fix DataEntry in indirect mode
      • Mistaken for Entry with defi=DATA, and file content discarded
      • Save resource files in user's cache directory when loading input glossary into memory
        • Move file to output glossary on dataEntry.save(...)
    • Fix Entry.getRawEntrySortKey not being alternates-aware, broke StarDict Writer
    • DataEntry: save: use shutil.copy if has _tmpPath, and set _tmpPath
  • New features of Entry

    • entry.stripFullHtml(), remove <html... <head>...</head>...<body>
      • Used in Kobo and Kobo Dictfile writers
      • Add tests
  • Fix glos.writeTabfile:

    • Remove \r from definitions and info values
    • Fix not escaping word
  • Fix/improve html detection in definitions

  • Switch to lazy imports of non-standard modules in plugins

  • Optimize RAM usage of indirect conversion

    • To write StarDict, EPUB and DictionaryForMIDs glossaries, we need to load all entries into RAM to sort them
  • Other new features of Glossary class

    • glos.getAuthor() to get "author", or "publisher" (as fallback)
    • glos.removeHtmlTagsAll() method, can be called by plugins' writer
    • glos.collectDefiFormat(maxCount) extract defiFormat counts
      • by reading first maxCount entries. (then iterator will be reset)
      • Used in StarDict Writer
    • Show memory usage in trace mode
  • Bug fixes and improvements in code base

    • Apply entry filter when iterating over reader, fix #251

      • Fixes wrong sort order for some glossaries (converting to StarDict or other formats that need sort)
    • Fixes and improvements in TextGlossaryReader class

      • Fix ignoring glossary defaultDefiFormat
    • Fix evaluating None value in read/write options

  • Support reading multi-file Tabfile or other text formats

    • Example: file.txt, file.txt.1, file.txt.2
    • Need to add file_count info key, for example: ##file_count 3
  • Fixes in Tabfile Writer

    • Fix not escaping ""
  • Add/update documentation

    • Update README.md
    • Add Termux guides in doc/termux.md
    • Move AppleDict guides to doc/apple.md
    • Move LZO notes to doc/lzo.md
    • Minify and compress .svg files in doc/ folder
  • Switch to f-strings, pep8 fixes, add types, style changes and refactoring

  • New command line flags:

    • --log-time to show datetime in logs (override log_time in config.json)
    • --no-alts to disable alternates handling
    • --normalize-html to lowercase tags (for now)
    • --cleanup and --no-cleanup
    • --info to save .info file alongside output file