Releases: intersystems/iknow
Extra CRC data now available from the iKnow engine
As of version 1.4, CRC (concept-relation-concept) data is part of the indexing data produced by the engine. See the Wiki for more information.
C interface
Alongside the existing Python and C++ APIs, the iKnow engine is now also available through a simple C interface. This allows embedding the iKnow engine in virtually any environment. The new API uses a simple JSON format for input and output, but otherwise has a very similar feature surface to what you're familiar with from the iknowpy package. Please see the wiki for more details.
This release also includes a significant update of the Japanese language model.
Automatic Language Identification
We've added Automatic Language Identification (ALI) to the iKnow engine and made it available through a new iknowpy function.
To use ALI when indexing text, simply pass '*' as the language name to the index() method and iKnow will figure it out for you. If all you need is the language and not the rest of the indexing results, you can use the new IdentifyLanguage() method. Check the wiki for more details!
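As a minimal sketch of the '*' convention described above, the helper below wraps the call. It assumes an engine object that behaves like iknowpy.iKnowEngine, meaning index(text, language) stores its results in the m_index attribute; the helper name index_with_auto_language is hypothetical and not part of iknowpy.

```python
def index_with_auto_language(engine, text):
    """Index `text` and let the engine identify the language itself.

    Assumption: `engine` behaves like iknowpy.iKnowEngine, i.e. calling
    index(text, language) populates the `m_index` attribute.
    This helper is illustrative and not part of iknowpy.
    """
    # '*' as the language name requests Automatic Language Identification
    engine.index(text, '*')
    return engine.m_index
```

With iknowpy installed, this would typically be called as index_with_auto_language(iknowpy.iKnowEngine(), "Some text to index").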
Generic Attributes & Japanese Measurements
This release builds on the infrastructure added in v1.1 and now includes attribute labels and a basic set of rules for three Generic Attributes. These can be used by developers to tag use-case-specific attributes not covered by the built-in attribute types. Developers can add their own marker terms for these to leverage attribute expansion to flag syntactically "affected" portions of a sentence. A basic set of expansion rules is included for these generic attributes.
For example, we've helped customers in the healthcare industry add marker terms such as "mother", "brother", etc. so that mentions of "family history" can be identified in the text: "Patient mentioned mother suffered a stroke 10y ago, but denied experiencing chest pain himself".
Furthermore, this release includes one of the biggest extensions of the Japanese language model, significantly extending its support for Attributes including measurements and time expressions. To accommodate the nature of the language in which a single entity can include several measurements, we have enabled the measurement spans to include more than two pairs of value & unit when necessary.
See the list of supported attributes for the most up-to-date information.
Improved Attributes, CI/CD, and more
This release rolls up a large number of changes applied since the first full v1.0 release:
- Extended support for semantic attributes
- Many improvements to the language models, especially English, Japanese and Czech
- Enhancements to the CI/CD procedures' speed and reliability
- Enhancements to user and developer documentation
- Various bugfixes to previously reported issues
Semantic Attributes
The v1.1 release significantly expands iKnow's ability to identify semantic attributes in natural language text, and in particular enhances support for measurements, time and certainty. iKnow now recognizes more markers in the various supported languages and has more accurate expansion rules to identify the affected span within each sentence. Check the wiki for more details on which attributes are supported in which language.
New since v1.0 is the introduction of a Certainty attribute, which has an attribute property expressing the level of certainty. A level of 9 means an expression of absolute certainty and a level of 1 means very low confidence. While you can specify (or override) an initial level of certainty with the attribute marker definition (e.g. in the User Dictionary), rules processing may modify the value, e.g. in the context of a Negation Attribute.
This release also introduces three new Generic attributes, which can be used by developers to tag use-case-specific attributes not covered by the built-in attribute types. Developers can add their own marker terms for these to leverage attribute expansion to flag syntactically "affected" portions of a sentence. A basic set of expansion rules is included for these generic attributes.
For example, we've helped customers in the healthcare industry add marker terms such as "mother", "brother", etc. so that mentions of "family history" can be identified in the text: "Patient mentioned mother suffered a stroke 10y ago, but denied experiencing chest pain himself".
CI/CD Pipeline
The Continuous Integration / Continuous Deployment pipeline for this repository is implemented through GitHub Actions, and now includes standard unit tests as well as reference tests against a gold standard to ensure the highest quality output.
Compatibility Notes
We made a change to the Sentence attribute structure emitted by the iknowpy module. The fixed set of properties used in v1.0 (value, unit, value2, unit2) has been converted to a list of pairs, enabling a more flexible way of passing sentence attribute properties:
struct Sent_Attribute:
    Attribute type "type_"
    size_t offset_start "offset_start_", offset_stop "offset_stop_"
    string marker "marker_"
    string value "value_", unit "unit_", value2 "value2_", unit2 "unit2_"
    Entity_Ref entity_ref
    Path entity_vector
was changed to:
ctypedef vector[pair[string, string]] Sent_Attribute_Parameters
struct Sent_Attribute:
    Attribute type "type_"
    size_t offset_start "offset_start_", offset_stop "offset_stop_"
    string marker "marker_"
    Sent_Attribute_Parameters parameters "parameters_"
    Entity_Ref entity_ref
    Path entity_vector
Existing code that reads the old fields can obtain the same values from the new parameters list as follows:
sent_attribute['value'] = sent_attribute['parameters'][0][0]
sent_attribute['unit'] = sent_attribute['parameters'][0][1]
sent_attribute['value2'] = sent_attribute['parameters'][1][0]
sent_attribute['unit2'] = sent_attribute['parameters'][1][1]
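The mapping above fails with an IndexError when an attribute carries fewer than two pairs. A small compatibility shim can pad the missing entries instead. The following is a sketch only: it assumes a sentence attribute is exposed as a dict with a 'parameters' list of (value, unit) pairs, as in the lines above, and the helper name legacy_fields is hypothetical, not part of iknowpy.

```python
def legacy_fields(sent_attribute):
    """Rebuild the v1.0-style fields from a v1.1-style sentence attribute.

    Assumption: `sent_attribute` is a dict whose 'parameters' entry is a
    list of (value, unit) pairs. Missing pairs yield empty strings;
    pairs beyond the second are ignored.
    """
    params = list(sent_attribute.get('parameters', []))
    # Pad to at least two pairs so short lists do not raise IndexError
    params += [('', '')] * (2 - len(params))
    value, unit = params[0]
    value2, unit2 = params[1]
    return {'value': value, 'unit': unit, 'value2': value2, 'unit2': unit2}
```

Code that needs every pair, for example Japanese measurement spans with more than two value and unit pairs, should iterate over 'parameters' directly rather than rely on the four legacy fields.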
iKnow 1.0
First full release of the iKnow NLP library for Python:
- Core indexing functions, identifying concepts and their context
- Support for 11 languages (en, es, pt, fr, de, nl, sv, ja, ru, uk & cs)
- Tuning available through the User Dictionary object
- Full documentation, including sample sentences for all rules