4.0.0
Changes since 3.3.0
- Require Python 3.7 or 3.8, drop support for Python 3.4, 3.5 and 3.6
- Fix / rewrite setup.py
  - Fix python3 setup.py sdist bdist_wheel, and the PyPI package
    - Had to move ui/ directory into pyglossary/
  - Switch from distutils to setuptools
  - Remove py2exe
- Add interactive command line user interface
  - Automatically selected if input & output file arguments are not passed and one of these holds:
    - On Linux and $DISPLAY is not set
    - On Mac and no tkinter module is found
    - --ui=cmd flag is passed
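The selection rule for the interactive command line interface can be sketched as a small predicate. This is only an illustration of the conditions listed above, not the actual PyGlossary code: the function name `should_use_cmd_ui` and all of its parameters are hypothetical.

```python
from typing import Mapping, Optional


def should_use_cmd_ui(
    has_io_args: bool,
    ui_flag: Optional[str],
    platform: str,
    environ: Mapping[str, str],
    tkinter_available: bool,
) -> bool:
    """Hypothetical helper mirroring the selection rules in the list above."""
    if has_io_args:
        # Input and output files were given, so nothing needs to be asked.
        return False
    if ui_flag == "cmd":
        return True
    if platform.startswith("linux") and not environ.get("DISPLAY"):
        # No X display is available on Linux.
        return True
    if platform == "darwin" and not tkinter_available:
        # No tkinter module was found on Mac.
        return True
    return False
```

In a real program the `platform` and `environ` arguments would typically come from `sys.platform` and `os.environ`.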
- New format support:
  - Add read support for FreeDict, #206
  - Add read support for Zim (Kiwix)
  - Add read and write support for Kobo E-Reader Dictfile (.df)
  - Add write support for DICT.org dictfmt source file
  - Add read support for dictunformat output file
  - Add write support for JSON
  - Add read support for Dict.cc (SQLite3)
  - Add read support for JMDict, #239
  - Add basic read support for Wiktionary Dump (.xml)
  - Add read support for cc-kedict
  - Add read support for DigitalNK (SQLite3)
  - Add read support for Wordset.org JSON directory
- Remove Omnidic write support (Unmaintained J2ME dictionary)
- Remove Octopus MDict Source plugin
- Remove Babylon Source plugin
- BGL Reader: improvements
- DictionaryForMIDs Writer: fix non-working code
- Gettext Source (po) Writer: fix info header
- MOBI E-Book Writer: fix sort order, fix and test kindlegen codes, add kindlegen_path option, #112
- EPUB-2 E-Book Writer: fix sort order
- XDXF Reader: rewrite with etree.iterparse to avoid using too much RAM
- Lingoes Source (LDF) Reader: fix ignoring info/metadata header
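The iterparse approach mentioned for the XDXF Reader can be illustrated with the standard library. This is a minimal sketch, not the plugin's code: it uses `xml.etree.ElementTree` rather than whichever etree implementation the plugin imports, the function name `iter_articles` is hypothetical, and the definition is simplified to the plain text that follows the key.

```python
import io
import xml.etree.ElementTree as ET
from typing import BinaryIO, Iterator, Tuple


def iter_articles(stream: BinaryIO) -> Iterator[Tuple[str, str]]:
    """Yield (key, definition) pairs one article at a time."""
    for _event, elem in ET.iterparse(stream, events=("end",)):
        if elem.tag != "ar":
            continue
        key_elem = elem.find("k")
        if key_elem is None:
            continue
        key = key_elem.text or ""
        # Simplification: treat the text after </k> as the definition.
        defi = (key_elem.tail or "").strip()
        yield key, defi
        # Drop the article's children and text so that already-processed
        # articles do not keep accumulating in memory.
        elem.clear()


sample = (
    b"<xdxf><ar><k>cat</k>a small animal</ar>"
    b"<ar><k>dog</k>a loyal animal</ar></xdxf>"
)
pairs = list(iter_articles(io.BytesIO(sample)))
```

Because articles are yielded as soon as their closing tag is seen, the whole document never has to be held as a fully populated tree.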
- dict_org.py: rewrite broken plugin (Reader and Writer)
- DSL Reader: fix losing metadata/info
- Aard 2 (slob) Reader:
  - Fix adding css/js files as normal entries
  - Add bword:// prefix to entry links
  - Fix duplicate entries issue by keeping a set of blob IDs, #224
  - Detect and pass defiFormat
- Aard 2 (slob) Writer:
  - Fix content_type detection
  - Remove bword:// prefix from entry links
  - Add resource files / data entries, #243
  - Fix replacing image paths
  - Show log events from slob.py in debug mode
  - Change default compression to zlib
  - Allow passing empty compression
- Octopus MDict Reader:
  - Read MDX file twice to load links
  - Count data entries as part of len(reader) for progressbar
- StarDict Writer:
  - Copy "copyright" and "publisher" values to "description"
  - Add source and target language codes to the end of bookname
  - Add write-option stardict_client: bool (set True to make glossary more compatible with StarDict 3.x)
  - Fix broken result when sametypesequence option is given and a definition contains |
  - Allow sametypesequence=x for xdxf
  - Add merge_syns option
  - Allow sametypesequence=None option
- XDXF Reader:
  - Fix/improve xdxf to html transformation
- Kobo Writer:
  - Fix get_prefix algorithm and sorting order, with tests, #219
  - Replace <img src=... tags with [Image: name.bmp], #219
    - and show a warning about data entries
  - Additional keywords as alternatives, #232
  - Fix support for alternates: duplicate entries based on word prefix, #238
  - Show headword in title of alternate entries, #238, #245
  - Strip full html definition, #246
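The image replacement listed for the Kobo Writer could look roughly like the following. This is a hedged sketch only: the regular expression and the function name `replace_images` are assumptions for illustration, and the real writer may handle more attribute styles (for example single-quoted values).

```python
import re

# Assumption: src is double-quoted and appears inside a single <img ...> tag.
_img_re = re.compile(r'<img\s[^<>]*?src="([^"<>]*)"[^<>]*>', re.IGNORECASE)


def replace_images(defi: str) -> str:
    """Replace each <img> tag with a textual [Image: name] placeholder."""
    return _img_re.sub(r"[Image: \1]", defi)


converted = replace_images('see <img src="name.bmp" alt="x"/> here')
```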
- CSV:
  - Add delimiter option to Reader and Writer
  - Read and write info
  - Writer: accept bool option add_defi_format=True (default False)
- AppleDict Writer:
  - Replace fix_sound_link() code with a single line
  - Should not call glos.setDefaultDefiFormat
- MDX Reader:
  - Replace entry:// with bword:// in MDX Reader instead of AppleDict Writer
  - Fix internal href="x:" and href="d:" links
  - Fix file:// in images path, fix #243
- User Interface improvements and fixes:
  - ui_gtk: add About tab and more improvements
  - ui_tk: replace About dialog with About tab and more improvements
  - ui_cmd: improvements in progressbar
  - ui_cmd: allow "=" in value of read/write options
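Allowing "=" inside an option value usually comes down to splitting on the first "=" only. The sketch below shows that idea; `parse_option` is a hypothetical helper, not the function used in ui_cmd.

```python
from typing import Tuple


def parse_option(arg: str) -> Tuple[str, str]:
    """Split "key=value" on the first "=" so the value may contain "="."""
    key, sep, value = arg.partition("=")
    if not sep:
        raise ValueError(f"expected key=value, got {arg!r}")
    return key.strip(), value


key, value = parse_option("delimiter==")
```

Using `str.partition` instead of `str.split("=")` keeps everything after the first separator intact, which is what lets a value such as a literal "=" delimiter through.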
- Add a list of 208 languages and ~40 writing systems
  - Detect sourceLang and targetLang from glossary name/title
  - Auto-select between <b> and <big> tags depending on writing system
    - Using glos.titleElement method, used in FreeDict, JMDict and Dict.cc writers
  - glos.sourceLang and glos.targetLang properties (with setters) as Lang objects
  - glos.sourceLangName and glos.targetLangName properties (with setters) as str
    - Used in several plugins
- Break compatibility of plugins
  - Drop support for read and write functions (outside a class)
  - Now we only support Reader class and Writer class
  - Reader class must have these methods
    - __init__(self, glos)
    - open(self, filename)
      - Here glossary info must be read from file and set with glos.setInfo
    - __len__(self) -> int
      - Should return the number of entries, or zero if it's too costly
    - __iter__(self) -> "Iterator[BaseEntry]"
      - Can be a generator
    - close(self)
  - Writer class must have these methods
    - __init__(self, glos)
    - open(self, filename)
      - Here glossary info must be read from glos.getInfo or glos.iterInfo and written to file
    - write(self) -> "Generator[None, BaseEntry, None]"
      - Entries must be fetched with entry = yield in a while True loop:

            while True:
                entry = yield
                if entry is None:
                    break
                # process and write entry into file(s)

    - finish(self)
  - Read options and write options must be set to their default values as class attributes
    - See pyglossary/plugins/csv_pyg.py plugin for example
  - sortKey must be an instance method of Writer, instead of a function outside any class
    - Only for plugins that need sorting before write
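To make the Reader/Writer contract above concrete, here is a self-contained sketch of both classes for a toy tab-separated format, plus a driver that feeds the writer's generator. Everything outside the required method names is an assumption: `FakeGlossary`, the `Entry` namedtuple, the toy file format and the `convert` helper are stand-ins so the example runs without PyGlossary, and they do not reproduce its internals.

```python
from collections import namedtuple

# Stand-ins for illustration only: real plugins receive a Glossary
# instance and BaseEntry objects from PyGlossary.
Entry = namedtuple("Entry", ["word", "defi"])


class FakeGlossary:
    def __init__(self):
        self._info = {}

    def setInfo(self, key, value):
        self._info[key] = value

    def iterInfo(self):
        return iter(self._info.items())


class Reader:
    def __init__(self, glos):
        self._glos = glos
        self._file = None

    def open(self, filename):
        self._file = open(filename, encoding="utf-8")
        # Assumed toy format: the first line holds the glossary name.
        self._glos.setInfo("name", self._file.readline().rstrip("\n"))

    def __len__(self):
        # Counting entries would need a full pass, so report zero.
        return 0

    def __iter__(self):
        for line in self._file:
            word, _, defi = line.rstrip("\n").partition("\t")
            yield Entry(word, defi)

    def close(self):
        if self._file is not None:
            self._file.close()
            self._file = None


class Writer:
    def __init__(self, glos):
        self._glos = glos
        self._file = None

    def open(self, filename):
        self._file = open(filename, "w", encoding="utf-8")
        # Glossary info is written to the file at open time.
        for key, value in self._glos.iterInfo():
            self._file.write(f"##{key}\t{value}\n")

    def write(self):
        while True:
            entry = yield
            if entry is None:
                break
            self._file.write(f"{entry.word}\t{entry.defi}\n")

    def finish(self):
        if self._file is not None:
            self._file.close()
            self._file = None


def convert(reader, writer):
    """Hypothetical driver: push every entry into the writer generator."""
    gen = writer.write()
    next(gen)  # advance to the first `entry = yield`
    for entry in reader:
        gen.send(entry)
    try:
        gen.send(None)  # signals the end; the generator then returns
    except StopIteration:
        pass
```

The generator-based `write` lets the caller decide when entries arrive (for example after sorting), while the writer keeps its file handle and state between entries.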
- Refactor and cleanup Glossary class
  - Removed or replaced most of class/static attributes of Glossary
    - To see the diff, run git diff 3.3.0..master -- pyglossary/glossary.py
  - Removed glos.addEntry method
    - If you use it in your program, replace with glos.addEntryObj(glos.newEntry(word, defi, defiFormat))
  - Removed instance methods:
    - getMostUsedDefiFormats
    - iterEntryBuckets
    - zipOutDir and archiveOutDir
      - Moved to pyglossary/glossary_utils.py
      - archiveOutDir renamed to compressOutDir
    - writeDict
    - iterSqlLines -> moved to pyglossary/plugins/sql.py
    - reverse, takeOutputWords, searchWordInDef -> moved to pyglossary/reverse.py
  - Values of Glossary.plugins are changed to plugin_prop.PluginProp instances
  - Change glos.writeTxt arguments
    - Replace sep1 and sep2 with entryFmt
    - Replace rplList with defiEscapeFunc, wordEscapeFunc and tail
    - Remove iterEntries, entryFilterFunc
    - Method returns Generator[None, BaseEntry, None] instead of bool
    - See for usage example:
      - pyglossary/glossary.py -> def writeTabfile
      - pyglossary/plugins/dict_org_source.py
      - pyglossary/plugins/json_plugin.py
      - pyglossary/plugins/lingoes_ldf.py
      - pyglossary/plugins/sdict_source.py
- Refactor, cleanup and fixes in Entry and DataEntry classes
  - Replace entry.getWord() with entry.word
  - Replace entry.getWords() with entry.l_word
  - Replace entry.getDefi() with entry.defi
  - Remove entry.getDefis()
    - Drop handling alternate definitions in Entry objects
  - Replace entry.getDefiFormat() with entry.defiFormat
  - Add entry.b_word and entry.b_defi shortcuts that give bytes (UTF-8)
  - Replace dataEntry.getData() with dataEntry.data
  - Add __slots__ to Entry and DataEntry classes
  - Fix DataEntry in indirect mode
    - Mistaken for Entry with defi=DATA, and file content discarded
    - Save resource files in user's cache directory when loading input glossary into memory
      - Move file to output glossary on dataEntry.save(...)
  - Fix Entry.getRawEntrySortKey not being alternates-aware, broke StarDict Writer
  - DataEntry: save: use shutil.copy if it has _tmpPath, and set _tmpPath
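The move from getter methods to properties, together with `__slots__` and the bytes shortcuts, can be pictured with a stand-in class. This is not PyGlossary's Entry: it is a simplified illustration that assumes a single headword per entry (alternates are left out), and the default `defiFormat` value used here is only a placeholder.

```python
class Entry:
    """Simplified stand-in showing the property-based access style."""

    __slots__ = ("_word", "_defi", "_defiFormat")

    def __init__(self, word: str, defi: str, defiFormat: str = "m"):
        self._word = word
        self._defi = defi
        self._defiFormat = defiFormat

    @property
    def word(self) -> str:
        return self._word

    @property
    def l_word(self) -> list:
        # Assumption: only one headword is stored in this stand-in.
        return [self._word]

    @property
    def defi(self) -> str:
        return self._defi

    @property
    def defiFormat(self) -> str:
        return self._defiFormat

    @property
    def b_word(self) -> bytes:
        return self._word.encode("utf-8")

    @property
    def b_defi(self) -> bytes:
        return self._defi.encode("utf-8")


entry = Entry("café", "a coffee shop")
```

Because of `__slots__`, instances carry no per-object `__dict__`, which saves memory when many entries are loaded and also rejects accidental new attributes.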
- New features of Entry
  - entry.stripFullHtml(), remove <html... <head>...</head>...<body>
    - Used in Kobo and Kobo Dictfile writers
    - Add tests
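The idea behind stripping a full HTML document down to its body can be sketched with a regular expression. This is an assumption-laden illustration, not the implementation of `entry.stripFullHtml()`: the function name `strip_full_html` and the pattern are hypothetical, and definitions without a body element are returned unchanged.

```python
import re

# Assumption: a full-HTML definition has exactly one <body>...</body> pair.
_body_re = re.compile(r"<body[^<>]*>(.*)</body\s*>", re.IGNORECASE | re.DOTALL)


def strip_full_html(defi: str) -> str:
    """Return only the content of <body>, or the input if there is none."""
    match = _body_re.search(defi)
    if match is None:
        return defi
    return match.group(1).strip()


stripped = strip_full_html(
    "<html><head><title>x</title></head><body><b>hi</b></body></html>"
)
```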
- Fix glos.writeTabfile:
  - Remove \r from definitions and info values
  - Fix not escaping word
- Fix/improve html detection in definitions
- Switch to lazy imports of non-standard modules in plugins
- Optimize RAM usage of indirect conversion
  - To write StarDict, EPUB and DictionaryForMIDs glossaries, we need to load all entries into RAM to sort them
- Other new features of Glossary class
  - glos.getAuthor() to get "author", or "publisher" (as fallback)
  - glos.removeHtmlTagsAll() method, can be called by plugins' writer
  - glos.collectDefiFormat(maxCount) extract defiFormat counts
    - by reading first maxCount entries. (then iterator will be reset)
    - Used in StarDict Writer
  - Show memory usage in trace mode
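Counting definition formats over only the first entries, as described for `glos.collectDefiFormat(maxCount)`, might be sketched as below. The function `collect_defi_format` is hypothetical and works on any iterable of objects with a `defiFormat` attribute; the sample uses a plain list, so the "iterator will be reset" step from the item above does not need to be modelled here.

```python
from collections import Counter
from itertools import islice
from types import SimpleNamespace
from typing import Dict, Iterable


def collect_defi_format(entries: Iterable, max_count: int) -> Dict[str, int]:
    """Count defiFormat values among at most max_count leading entries."""
    counts = Counter(entry.defiFormat for entry in islice(entries, max_count))
    return dict(counts)


sample_entries = [
    SimpleNamespace(defiFormat="h"),
    SimpleNamespace(defiFormat="h"),
    SimpleNamespace(defiFormat="m"),
    SimpleNamespace(defiFormat="h"),
]
format_counts = collect_defi_format(sample_entries, 3)
```

Sampling a bounded prefix keeps the cost independent of glossary size, at the price of possibly missing formats that only appear later.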
- Bug fixes and improvements in code base
  - Apply entry filter when iterating over reader, fix #251
    - Fixes wrong sort order for some glossaries (converting to StarDict or other formats that need sort)
  - Fixes and improvements in TextGlossaryReader class
    - Fix ignoring glossary defaultDefiFormat
  - Fix evaluating None value in read/write options
- Support reading multi-file Tabfile or other text formats
  - Example: file.txt, file.txt.1, file.txt.2
  - Need to add file_count info key, for example: ##file_count 3
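The naming scheme in the example above (a base file followed by numbered parts) can be expressed as a short helper. `part_paths` is a hypothetical name used only for illustration; it derives the expected file names from the base name and the `file_count` value.

```python
from typing import List


def part_paths(filename: str, file_count: int) -> List[str]:
    """List the base file followed by its numbered continuation files."""
    return [filename] + [f"{filename}.{i}" for i in range(1, file_count)]


paths = part_paths("file.txt", 3)
```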
- Fixes in Tabfile Writer
  - Fix not escaping ""
- Add/update documentation
  - Update README.md
  - Add Termux guides in doc/termux.md
  - Move AppleDict guides to doc/apple.md
  - Move LZO notes to doc/lzo.md
  - Minify and compress .svg files in doc/ folder
- Switch to f-strings, pep8 fixes, add types, style changes and refactoring
- New command line flags:
  - --log-time to show datetime in logs (override log_time in config.json)
  - --no-alts to disable alternates handling
  - --normalize-html to lowercase tags (for now)
  - --cleanup and --no-cleanup
  - --info to save .info file alongside output file