ts-edifact
is an Edifact parsing library written in typescript. This implementation was initially based onnode-edifact but has changed a bit since the start of the project. It now is able to build a full object tree based on the general message structure defined in the Edifact documentation and update the tokenizer with the the appropriate charset, which was defined in the UNB
segment.
By default ts-edifact
ships with a selection of D:01B
message structure definitions as well as their respective segment and element definition files. For convenience a parser was added recently to generate such definitions from the UNECE Web page. Plans to generate these definition files from the Edifact Directory exist, though due to time limitations this feature wasn't added yet.
Currently supported functionality:
- An ES6 streaming parser reading UN/EDIFACT messages.
- Provide your own event listeners to get the parser to do something useful.
- Construct structured javascript objects from UN/EDIFACT messages.
- Support for the UNA header and custom separators.
- Validating data elements and components accepted by a given segment.
- Parsing and checking standard UN/EDIFACT messages with segment tables.
- Support for envelopes.
- Check for well-formed Edifact documents according to the defined message type, version and revision within the
UNH
message header. - Generation of Edifact specification definition files (i.e.
D01B_INVOIC.struct.json
,D01B_INVOIC.segments.json
andD01B_INVOIC.elements.json
) obtained from the UNECE page directly. - On using the
Reader
class, updating the charset according to the one specified in theUNB
segment of the interchange is supported.
We switched from ts-edifact
to a currently proprietary Java-based implementation I developed in the past year. During this process I learned a lot in regards to parsing Edifact files and, sadly, the current ts-edifact
version has its limitations which briefly summarized look as such:
- The syntax version has no impact on the Edifact document. Neither charsets are correctly restricted or timestamps validated here correctly nor are service segments initialized here for the appropriate syntax version
- Charsets (v3, v4) in
UNB
segments are not correctly supported which can lead to issues forUNOX
andUNOY
encodings that may utilize up to 4 bytes per character - Code-list values are not parsed and therefore not included in the validation process
- Objects generated for the object tree do not respect the specifications in different Edifact directories
segments.ts
andelements.ts
only support basic service segments (UNB
,UNH
, ...), that are more or less mixing things of different syntax versions, but are missing a lot of others, especially ones defined in syntax version 4- Parsing and validation should be decoupled
- More configuration options are needed
- More sample files are needed, especially ones with uncommon charsets
- Better mechanism for loading message structure, segments and element definition tables needed to reduce the size of this artifact and to load definitions only when really needed. Any help from more experienced JS/TS developers is more than appreciated here on how such a mechanism may look like
Changing these things, unfortunately, takes a bit time of which I'm currently not having that much of. I try to port the necessary changes to this project as soon as possible, but my schedule for the upcoming months looks pretty exhausting TBH.
This example parses a document and translates it to a javascript array result
containing segments. Each segment is an object containing a name
and an elements
array. An element is an array of components.
import { Parser, Validator, ResultType } from 'ts-edifact';
const enc: string = ...;
const doc: string = ...;
const parser: Parser = new Parser(new Configuration({ validator: new ValidatorImpl() }));
// Provide some segment and element definitions.
parser.configuration.validator.define(...);
// Parsed segments will be collected in the result array.
let result: ResultType = [];
let elements: string[][];
let components: string[];
parser.onOpenSegment = (segment: string): void => {
// Started a new segment.
elements = [];
result.push({ name: segment, elements: elements });
});
parser.onElement = (): void => {
// Parsed a new element.
components = [];
elements.push(components);
});
parser.onComponent = (value: string): void => {
// Got a new component.
components.push(value);
});
// by default the tokenizer of the parser will use UNOA charset
// if characters outside of those defined in the UNOA charset should be used
// the parser needs to be informed about which characters it should allow for alpha(numeric) values
parser.updateCharset("UNOC");
// send the Edifact data to the parser. This method supports streaming and can be invoked when new data
// is available and it will continue from where it left
parser.write(doc);
// tell the parser that no more data is expected to be received
parser.end();
or more streamlined using the Reader
utility class
import { Reader, ResultType } from "ts-edifact";
const document: string = ...;
const specDir: string = ...;
const reader: Reader = new Reader(specDir);
const result: ResultType[] = reader.parse(document);
...
or if a custom set of segment- and element definitions should be used
import { Reader, ResultType, Dictionary, SegmentEntry, ElementEntry } from "ts-edifact";
import * as segmentsData from ".../segments.json";
import * as elementsData from ".../elements.json";
const document: string = ...;
const specDir: string = ...;
const segments: Dictionary<SegmentEntry> = new Dictionary<SegmentEntry>(segmentsData);
const elements: Dictionary<ElementEntry> = new Dictionary<ElmentEntry>(elementsData);
const reader: Reader = new Reader(specDir);
reader.define(segments);
reader.define(elements);
const result: ResultType[] = reader.parse(document);
...
The module can be installled through:
npm install ts-edifact
Keep in mind that this is an ES6 library. It currently can be used with node 4.0 or higher.
This module is build around a central Parser
class which provides the core UN/EDIFACT parsing functionality. It only exposes four methods:
- the
updateCharset(string)
method updates the charset used by theTokenizer
class to determine valid data values. If a charset is provided which the parser does not yet recognize, this method will throw an error. - the
write()
method to write some data to the parser - the
separators()
method does return the separators which are used by the parser - the
end()
method to close an EDI interchange.
Data read by the parser can be read by using hooks which will be called on specific parsing events.
Definitions can be provided to describe the structure of segments and elements. An example of a segment definition:
{
"BGM": {
"requires": 0,
"elements": ["C002", "C106", "1225", "4343"]
}
}
A corresponding definition from the UN/EDIFACT D01A
spec for the above mentioned BGM
segment can be seen following this link.
The requires
property indicates the number of elements which are required to obtain a valid segment. The elements
array contains the names of the elements which should be provided. Definitions can also be provided for these elements:
{
"C002": {
"requires": 4,
"components": ["an..3", "an..17", "an..3", "an..35"]
},
"C106": {
"requires": 3,
"components": ["an..35", "an..9", "an..6"]
}
}
An incomplete set of D01B definition files can be found in the src/messageSpec
folder. The *.struct.json
files contain the general structure definition of an Edifact message, i.e. INVOIC.struct.json
contains the message structure specification of a D01B
Edifact invoice, while *.segments.json
contain the respective admissible segments of the acutal processed message type and the *.elements.json
contain the respective component definitions of elements used by segments.
As of version 0.0.7
such definition files can be generated via the UNECEMessageStructureParser
class in case the definition is available online at the unece.org page. This parser will generate a EdifactMessageSpecification
object structure that holds the actual message type structe definition as well as the segment- and element tables needed to validate the document to process.
The persist
utility function of the src/util
class allows to persist the generated definitions to files which can be used onwards.
import { MessageStructureParser, UNECEMessageStructureParser } from "ts-edifact";
import { persist } from "ts-edifact/lib/util";
...
function storeSpecFiles(specDir: string, type: string, version: string, revision: string): Promise<...> {
const structureParser: MessageStructureParser = new UNECEMessageStructureParser(version + revision, type);
return loadTypeSpec()
.then((response: EdifactMessageSpecification) => {
// store downloaded and parsed definition files to the specified directory
persist(response, specDir));
... // some other work
}
.catch((error: Error) => handle(error));
}
...
await storeSpecFiles("/home/SomeUser/edifact", "invoic", "d", "01b")
.then(...);
On using the Reader
class it will attempt to read such specification files from either the provided directory or, if none was provided, it will try to read such definition files from the local directory. The *.segments.json
and *.elements.json
files are used during parsing time of the Edifact document to validate that only admissible values are provided for the respective segments/elements. By default, the ValidatorImpl
class will ignore any unknown segments or element definitions found. If a strict validation should be performed, that throws an error in case an unknown segment or element is contained within the document the validator needs to be initialized with the optional throwOnMissingDefinitions
parameter set to true.
// use strict validation; will throw an error if unknown segments and elements are found
const validator: Validator = new ValidatorImpl(true);
The *.struct.json
file is only used in case an object structure should be generated via the InterchangeBuilder
class.
Parsing speed including validation but without matching against a segment table is around 20Mbps. Around 30% of the time spent seems to be needed for the validation part.
If performance is critical the event callbacks can also be directly defined as methods on the Parser
instance. Defining an event callback onOpenSegment(callback)
then becomes:
const parser: Parser = new Parser();
const callback = (segment: string) => { ... };
parser.onOpenSegment = callback;
Keep in mind that this avoids any openSegment
events to be produced and as such, also it's associated overhead.
Class | Description |
---|---|
Parser | The Parser class encapsulates an online parsing algorithm. By itself it doesn't do anything useful, however the parser can be extended through several event callbacks. |
Reader | A convenience class which assigns default callbacks to the respective event callbacks on the parser and returns an array of name- and elements entries, where name is a string referencing the segment name and elements is a multidimensional array where the outer array represents an element of the segment and the inner array will contain the respective components of an element. |
Tracker | A utility class which validates segment order against a given message structure. |
Validator | The Validator can be used as an add-on to the Parser class, to enable validation of segments, elements and components. This class implements a tolerant validator, only segments and elements for which definitions are provided will be validated. Other segments or elements will pass through untouched. Validation includes:
|
Counter | The Counter class can be used as a validator for the Parser class. However it doesn't perform any validation, it only keeps track of segment, element and component counts. Component counts are reset when starting a new element, just like element counts are reset when closing the segment. |
InterchangeBuilder | The InterchangeBuilder class will use the parsed result obtained by either the reader or the parser and convert the array of segments, by using a corresponding message version definition, into a JavaScript object structure containing the respective messages contained in the parsed Edifact as well as respective segment groups which are further subgrouped by the iteration count on respective segments. I.e. if multiple LIN and accompanying segments are found, they are grouped in their own subgroup and any accompanying segment belonging to that segment group will be added to that subgroup as well. |
UNECEMessageStructureParser | A helper class to parse the online version of the UNECE hompeage for the respective Edifact message type structure as well as the admissible segments and elements for the respective message type |
SegmentTableBuilder | A builder for segment definition objects used by the validator and tracker classes |
ElementTableBuilder | A builder for element definition objects used by the validator and tracker classes |
A parser capable of accepting data formatted as an UN/EDIFACT interchange. The constructor accepts a Configuration
instance as an optional argument:
new Parser();
new Parser(configuration: Configuration);
The first constructor will initialize a NullValidator
, which does not perform any validation and therefore also not throw any errors.
Function | Description |
---|---|
onOpenSegment(segment: string): void |
Add a listener for a specific open segment event. |
onCloseSegment(): void |
Add a listener for a close segment event. |
onElement(): void |
Add a listener for starting processing an element within a segment. |
onComponent(data: string): void |
Add a listener for a parsed component part of an Edifact element. |
updateCharset(charset: string): void |
Specifies the character set to use while parsing the Edifact document. By default UNOA will be used. This method throws an error if an unknown charset is provided. |
write(chunk) |
Write a chunk of data to the parser |
separators() |
Since v0.0.7 Returns an object of the identified and used separators |
end() |
Terminate the EDI interchange |
A convenience class which already implements default callbacks for common parsing tasks.
new Reader(specDir?: string);
The optional specDir
parameter should point to the location where the Edifact specification files can be found. If none was provided the reader tries to find them in the local directory.
Function | Description |
---|---|
define(definitions: (Dictionary<SegmentEntry> | Dictionary<ElementEntry>)): void |
Feeds the validator used inside the reader with the set of known segment and element definitions. Parsed segments which do not adhere to the segments or elements defined in these tables will lead to a failure being thrown and therefore fail the parsing of the Edifact document. |
parse(document: string): ResultType[] |
Will attempt to parse the document to an array of segment objects where each segment object contains a name and a further multidimensional array of strings representing the elements in the outer array and the respective components of an element in the inner array. Any validation error encountered while reading the Edifact document will lead to an error being thrown and does ending the parsing process preemptively. |
Since 0.0.12
the encoding(string)
function was removed as the charset is now set once the UNB
header is processed. By default the parser will use the UNOA
charset and update to the specified one once the respective charset was extracted.
A utility class which validates segment order against a given message structure. The constructor accepts a segment table as its first argument:
new Tracker(table: MessageType[]);
Function | Description |
---|---|
accept(segment: string | MessageType): void |
Match a segment to the message structure and update the current position of the tracker. |
reset(): void |
Reset the tracker to the initial position of the current segment table. |
The Validator
can be used to validate segments, elements and components. It keeps track of element and component counts and checks if the component types match those in the segment definition.
new ValidatorImpl(throwOnMissingDefinitions?: boolean = false);
Since v0.0.7
a strict validation can be performed on setting the throwOnMissingDefinitions
parameter to true
which leads to failures if unknown segments or elements are used within the document under validation. Be default, the current implementation will ignore any unknown segments or elements.
Function | Description |
---|---|
disable(): void |
Disable validation. |
enable(): void |
Enable validation. |
define(definitions: (Dictionary<SegmentEntry> | Dictionary<ElementEntry>)): void |
Provision the validator with an array of segment and element definitions. |
format(formatString: string): FormatType | undefined |
Requests a component definition associated with a format string |
onOpenSegment(segment: string): void |
Start validation of a new segment |
onElement(): void |
Add an element |
onOpenComponent(buffer: Tokenizer): void |
Open a component |
onCloseComponent(buffer: Tokenizer): void |
Close a component |
onCloseSegment(segment: string): void |
Finish the segment |
The buffer
argument to both onOpenComponent()
and onCloseComponent()
should provide three methods alpha()
, alphanumeric()
, and numeric()
allowing the mode of the buffer to be set. It should also expose a length()
method to check the length of the data currently in the buffer.
The InterchangeBuilder
uses the result of the Parser
or Reader
to convert the array of segments into a JavaScript/TypeScript object structure representing the interchange envelop and the respective Edifact messages parsed.
It will attempt to use the specific message structure definition for the concrete version defined in the message header (UNH
) to check whether a mandatory segment according to the Edifact specification is missing. A D96A
invoice should therefore attempt to use the respective D96A_INVOIC
message structure specification and if not obtainable fall back of a common INVOIC
message struture specification.
The constructuro expects, besides the parsing result of the Edifact document a base path where the Edifact message structure definition files are located. An incomplete list can be found using the ./node_modules/ts-edifact/lib/messages/
base path.
new InterchangeBuilder(parsingResult: ResultType[], basePath: string);
Since v0.0.7
: This class will parse the unece.org Website in order to generate message structure definition as well as segment- and element definitions for a requested message type.
new UNECEmessageStructureParser(version: string, type: string);
Function | Description |
---|---|
loadTypeSpec(): Promise<EdifactMessageSpecification> |
Downloads the message structure definition page of a respective Edifact message type, i.e. INVOIC , and for all specified segments the respective segment definition pages. These pages will be parsed and converted to a object structure supporting the lookup of the respective entries. |
The generated EdifactMessageSpecification
object will hold the parsed values for the message type structure as well as the segments- and elements used by this message type.
Note that the respective segments and elements are mapped to own typescript objects which are defined in src/edifact.ts. The current set of classes is probably incomplete and may fall victim to differences between multiple versions and release tags of the Edifact specification. Consider further work to be done here in the future.
A helper class to load the respective segment definition files, from either the local directory or a specified one, and convert them to a usable data structure needed by the validator and tracker classes.
new SegmentTableBuilder(type: string);
Function | Description |
---|---|
forVersion(version: string): TableBuilder<SegmentEntry> |
Sets the version of the Edifact document this builder should fetch. |
specLocation(location: string): TableBuilder<SegmentEntry> |
Sets the path where to look for the segment definition files. |
build(): Dictionary<SegmentEntry> |
Attempts to load the *.segments.json definition file of the specified message type (and version if specified) and returns a dictionary object with the loaded data. If no file could be found only the basic UNB , UNH , UNS , UNT and UNZ segment definitions are loaded. |
A helper class to load the respective element definition files, from either the local directory or a specified one, and convert them to a usable data structure needed by the validator and tracker classes.
new ElementTableBuilder(type: string);
Function | Description |
---|---|
forVersion(version: string): TableBuilder<ElementEntry> |
Sets the version of the Edifact document this builder should fetch. |
specLocation(location: string): TableBuilder<ElementEntry> |
Sets the path where to look for the element definition files. |
build(): Dictionary<ElementEntry> |
Attempts to load the *.elements.json definition file of the specified message type (and version if specified) and returns a dictionary object with the loaded data. If no file could be found only the basic element definitions used by the UNB , UNH , UNS , UNT and UNZ segments are loaded. |