Releases: naptha/tesseract.js
v4.1.1
What's Changed
- Fixed detection of image orientation metadata (#783)
  - Allows Tesseract.js to work with images taken on iOS devices
  - See this comment for explanation
- Minor changes to documentation and types (#781, #782, #778)
New Contributors
- @racosa made their first contribution in #781
- @Tshetrim made their first contribution in #782
- @l2ysho made their first contribution in #778
Full Changelog: v4.1.0...v4.1.1
v4.1.0
What's Changed
- Added ability to run layout analysis without recognition (#656)
  - See this comment for instructions
- Added support for `OffscreenCanvas` in browser version by @nathanbabcock (#766)
- Fixed bug where `recognize` was running OCR even when not necessary (#769)
- Fixed bug where certain valid `langPath` URLs caused errors in browser version (#558)
- Removed problematic `file-type` and `resolve-url` dependencies (#773, #711)
Full Changelog: v4.0.6...v4.1.0
v4.0.6
What's Changed
- Invalid langData (`.traineddata` files) are now cleared from cache (#753)
  - Note: setting `cacheMethod: 'none'` or `cacheMethod: 'refresh'` to prevent invalid files from being cached should no longer be necessary
  - See this comment for an explanation
- Added source maps to esm build (#761)
- Various updates to documentation
Full Changelog: v4.0.5...v4.0.6
v4.0.5
What's Changed
- No changes to code
- Removed unnecessary files to reduce the size of the npm package
Full Changelog: v4.0.4...v4.0.5
v4.0.4
What's Changed
- Added SIMD-detection when `corePath` is manually specified (#735)
  - Important note for users who set `corePath`: for significantly faster performance, set `corePath` to a directory that includes both `tesseract-core.wasm.js` and `tesseract-core-simd.wasm.js`
  - See this comment for explanation
- Improved auto-rotate feature (`rotateAuto: true`) (#747)
- Switched default CDN from unpkg to jsdelivr (#743)
- Updated various dependencies (#729, #736, #737, #739, #741)
- Reduced size of npm package (#731, #734, #740)
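
For users who self-host the core files, the `corePath` note above can be checked with a small helper before workers are pointed at a directory. This is only a sketch: `missingCoreBuilds` is a hypothetical function, not part of Tesseract.js, and how the file listing is obtained (for example `fs.readdirSync` in Node) is left to the caller.

```javascript
// The two builds named in the release note: the plain build and the SIMD build.
const REQUIRED_CORE_FILES = [
  'tesseract-core.wasm.js',
  'tesseract-core-simd.wasm.js',
];

// Hypothetical helper (not a Tesseract.js API): given the file names found in
// a candidate corePath directory, return the required builds that are missing.
function missingCoreBuilds(fileNames) {
  const present = new Set(fileNames);
  return REQUIRED_CORE_FILES.filter((name) => !present.has(name));
}
```

An empty result means the directory holds both builds, so SIMD detection has something to choose between.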
New Contributors
Full Changelog: v4.0.3...v4.0.4
v4.0.3
What's Changed
- Updated Tesseract to v5.3.0
  - This resolves bug with inverted (white on black) text recognition (#717)
- Minor documentation fixes (#612, #614, #682, #673)
- Better types for `addJob` by @nathanbabcock in #719
New Contributors
- @Sacramentix made their first contribution in #612
- @Porush made their first contribution in #682
- @eltociear made their first contribution in #673
- @Woutervdvelde made their first contribution in #614
- @nathanbabcock made their first contribution in #719
Full Changelog: v4.0.2...v4.0.3
v4.0.2
v4.0.1
What's Changed
- Running `recognize` or `detect` with invalid `image` argument now throws error message (#699)
- Fixed bug with custom `langdata` paths (#697)
New Contributors
- @fmonpelat made their first contribution in #697
Full Changelog: v4.0.0...v4.0.1
v4.0.0
Breaking Changes
- `createWorker` is now async
  - In most code this means `worker = Tesseract.createWorker()` should be replaced with `worker = await Tesseract.createWorker()`
  - Calling with invalid `workerPath` or `corePath` now produces error/rejected promise (#654)
- `worker.load` is no longer needed (`createWorker` now returns worker pre-loaded)
- `getPDF` function replaced by `pdf` recognize option (#488)
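
The `createWorker` migration above can be sketched as follows. This is a minimal sketch assuming the v4 worker methods `loadLanguage`, `initialize`, `recognize`, and `terminate`. The helper name `recognizeOnce` and its `tesseract` parameter (the imported Tesseract.js module, passed in so the snippet carries no hard dependency) are illustrative and not part of the library.

```javascript
// Hypothetical helper: `tesseract` is the imported Tesseract.js v4 module,
// e.g. the result of require('tesseract.js'); passing it in is an
// illustration choice, not a library requirement.
async function recognizeOnce(tesseract, image, lang = 'eng') {
  // v4: createWorker returns a promise and the worker arrives pre-loaded,
  // so there is no separate worker.load() call. An invalid workerPath or
  // corePath surfaces here as a rejected promise.
  const worker = await tesseract.createWorker();
  try {
    await worker.loadLanguage(lang);
    await worker.initialize(lang);
    const { data } = await worker.recognize(image);
    return data.text;
  } finally {
    // Release the worker even if recognition throws.
    await worker.terminate();
  }
}
```

Code written for v3 that skipped the `await` would receive a promise instead of a worker, so calls such as `worker.recognize` would fail until the `await` is added.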
Major New Features
- Processed images created by Tesseract can be retrieved using `imageColor`, `imageGrey`, and `imageBinary` options (#588)
  - See image-processing.html example for usage
- Image rotation options `rotateAuto` and `rotateRadians` have been added, which significantly improve accuracy on certain documents
  - See Issue #648 for an example of how auto-rotation improves accuracy
  - See image-processing.html example for usage of `rotateAuto` option
- Tesseract parameters (usually set using `worker.setParameters`) can now be set for single jobs using `worker.recognize` options (#665)
  - For example, a single job can be set to recognize only numbers using `worker.recognize(image, {tessedit_char_whitelist: "0123456789"})`
  - As these settings are reverted after the job, this allows for using different parameters for specific jobs when working with schedulers
- Initialization parameters (e.g. `load_system_dawg`, `load_number_dawg`, and `load_punc_dawg`) can now be set (#613)
  - The third argument to `worker.initialize` now accepts either (1) an object with key/value pairs or (2) a string containing contents to write to a config file
  - For example, both of these lines set `load_number_dawg` to 0:
    `worker.initialize('eng', "0", {load_number_dawg: "0"});`
    `worker.initialize('eng', "0", "load_number_dawg 0");`
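
The per-job parameters and the two `worker.initialize` forms described above can be illustrated with two small helpers. Both `recognizeDigits` and `toConfigString` are hypothetical names introduced here. The second only mirrors the equivalence stated in the note (one `key value` pair per line) and makes no claim about how the library converts between the two forms internally.

```javascript
// Hypothetical helper: run one job restricted to digits. Because the option
// is passed to recognize rather than setParameters, it applies to this job
// only and is reverted afterwards.
function recognizeDigits(worker, image) {
  return worker.recognize(image, { tessedit_char_whitelist: '0123456789' });
}

// Hypothetical helper: render an object of initialization parameters in the
// config-file string form, one "key value" pair per line.
function toConfigString(params) {
  return Object.entries(params)
    .map(([key, value]) => `${key} ${value}`)
    .join('\n');
}
```

With these, `worker.initialize('eng', "0", toConfigString({load_number_dawg: "0"}))` passes the same string as the second example line above.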
Other Changes
- `loadLanguage` now resolves without error when language is loaded but writing to cache fails
  - This allows for running in Firefox incognito mode using default settings (#609)
- `detect` returns `null` values when OS detection fails rather than throwing error (#526)
- Memory leak causing crashes fixed (#678)
- Cache corruption should now be much less common (#666)
New Contributors
- @reda-alaoui made their first contribution in #570
Full Changelog: v3.0.3...v4.0.0