(NB: perhaps some terms are unnecessary, see #2/Rationale)
tuple: a logical ordered union of several elements with possible duplicates. (#todo #maybe harmonize with XY.3.1.x Tuple)
lexeme: a syntactic unit of a program (of Forth source code); unless otherwise noted, it is a sequence of non-blank characters delimited by a blank.
to recognize a lexeme: to determine the interpretation semantics and the compilation semantics for the lexeme.
lexical context: the set of all possible system states on which the recognition of lexemes depends.
lexical context of a lexeme: the element of lexical context in which this lexeme is recognized.
to interpret a lexeme: to perform the interpretation semantics for the lexeme in its lexical context.
to compile a lexeme: to perform the compilation semantics for the lexeme in its lexical context.
to translate a lexeme: to interpret the lexeme if interpreting, or to compile the lexeme if compiling.
unqualified token: a tuple of arbitrary data objects that determines the interpretation semantics and the compilation semantics for a lexeme in its lexical context.
token: unqualified token (a synonym, when it is clear from context).
to interpret a token: to perform the interpretation semantics that are determined by the token.
to compile a token: to perform the compilation semantics that are determined by the token.
to translate a token: to interpret the token if interpreting, or to compile the token if compiling.
token translator: a Forth definition that translates a token; also, depending on context, an execution token for this Forth definition.
resolver: a Forth definition that tries to recognize a lexeme, producing a tuple of a token and its token translator.
token descriptor object: an implementation-dependent data object (a set of information) that describes how to interpret and how to compile a token.
token descriptor: a value that identifies a token descriptor object; also, less formally and depending on context, a Forth definition that just returns this value, or a token descriptor object itself.
fully qualified token: a tuple of a token and its token descriptor.
recognizer: a Forth definition that tries to recognize a lexeme, producing a fully qualified token.
simple recognizer: a recognizer that can produce only one and the same token descriptor.
compound recognizer: a recognizer that may produce different token descriptors.
perceptor: a recognizer that is currently used by the Forth text interpreter to translate a lexeme.
current recognizer: the perceptor (an informal synonym).
initial perceptor: the perceptor before it was changed by a program, or after reverting these changes.
A tuple is described by a space-separated list of data type symbols that is enclosed in parentheses:
( symbol_i ... symbol_2 symbol_1 )
A tuple that contains a nested tuple:
( symbol_k ... ( symbol_j ... symbol_i ) ... symbol_1 )
is equal to the tuple with removed nested parentheses:
( symbol_k ... symbol_j ... symbol_i ... symbol_1 )
In a stack diagram a tuple can be shown in abbreviated form as i*x, or surrounded by curly brackets as
( stack-id: symbol_k ... { symbol_j ... symbol_i } ... symbol_1 -- ... )
(the latter option is a subject for a discussion)
Append table XY.1 to table 3.1
Symbol | Data type | Size on stack |
---|---|---|
td | token descriptor | 1 cell |
tt | token translator | 1 cell |
t | tuple | 0 or more cells |
ut | unqualified token | 0 or more cells |
qt | fully qualified token (qtoken) | 1 or more cells |
td => x ;
tt => xt ;
t => ( S: i*x F: j*r C: k*x ) ; where i >=0, j >= 0, k >= 0 ;
ut => t ;
qt => ( ut td ) ;
A tuple is an ordered union of data objects, with possible duplicates.
A tuple is characterized by the number of data objects and the data type of each object, which together constitute the tuple signature. The number of data objects in a tuple may be uncertain.
A tuple may be empty, that means the number of its data objects is zero.
When a tuple is placed on the stacks, the elements of the tuple (that are particular data objects) are placed on the data stack, the floating-point stack, and the control-flow stack, in accordance with the corresponding data types and the order of their symbols. The rightmost element in the tuple becomes the topmost in a stack. It is possible that no data object is placed on some stack.
(Rationale: the control-flow stack is mentioned to allow the tuple notation to be used for control-flow operations too, if any)
(todo: maybe replace "token" with "tuple" everywhere)
An unqualified token is a tuple.
An unqualified token shall be placeable on the data and floating-point stacks only.
(In this section, "translator" means "token translator")
A translator translates a tuple. A translator is specialized to a tuple having some particular signature only. An ambiguous condition exists if a translator is applied to a tuple having another signature. An ambiguous condition exists if a translator cannot translate a tuple in the current state.
The stack effect of performing a translator is:
( t_2 t_1 -- t_3 )
It takes the tuple t_1 from the stack and translates it.
Other stack effects and side effects depend on the particular translator and on the translation of this tuple.
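As an illustration only, a translator for a token with the signature ( x ) could be sketched as below. The name `tt-lit` is hypothetical and is not part of this proposal; the sketch assumes a system that provides `STATE`.

```forth
\ Hypothetical translator for a one-cell token ( x ).
\ Interpreting: leaves x on the data stack.
\ Compiling: compiles x as a literal into the current definition.
: tt-lit ( x -- x | )
  state @ if postpone literal then ;
```

While interpreting, `5 tt-lit` simply leaves 5 on the data stack. Applying this translator to a tuple of another signature (for example, a floating-point token) would be the ambiguous condition described above.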
(In this section, "descriptor" means "token descriptor")
A descriptor is specialized to a tuple having some particular signature only.
NB: A token translator can play the role of the descriptor.
A fully qualified token is a tuple of an unqualified token and a corresponding descriptor. The descriptor can be used to translate the token.
The stack effect of performing a recognizer is:
( c-addr u -- qt | 0 )
A recognizer tries to recognize the lexeme identified by the string ( c-addr u ) in its lexical context.
It returns a fully qualified token qt if successful, or zero otherwise.
Neither interpretation state nor compilation state is part of the lexical context.
A recognizer shall not have side effects that are detectable by a standard program that is unaware of the internal details of this recognizer. A recognizer shall return semantically the same results when it is performed consecutively with the same arguments.
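A minimal sketch of a simple recognizer that follows these rules is shown below. It accepts only lexemes of the form 'c' (a character between two apostrophes). The names `tt-lit` and `recognize-char` are hypothetical, and the sketch assumes that a token translator is used in the role of the token descriptor.

```forth
\ Hypothetical translator for a one-cell token ( x ).
: tt-lit ( x -- x | ) state @ if postpone literal then ;

\ Hypothetical simple recognizer: accepts exactly three characters 'c'.
\ It reads only the given string, so it has no detectable side effects
\ and returns the same result for the same arguments.
: recognize-char ( c-addr u -- char tt | 0 )
  3 <> if drop 0 exit then                          \ wrong length
  dup c@ [char] ' <> if drop 0 exit then            \ no opening apostrophe
  dup 2 chars + c@ [char] ' <> if drop 0 exit then  \ no closing apostrophe
  char+ c@ ['] tt-lit ;                             \ ut = ( char ), td = tt
```

The recognizer does not inspect `STATE`; the distinction between interpreting and compiling is left entirely to the translator it returns.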
If the Recognizer word set is present, the following specification should be used instead of the specification in section 3.4, beginning with "a. Skip leading spaces" and up to sub-section 3.4.1.
- a. Skip leading spaces and parse a lexeme (see 3.4.1);
- b. Recognize the lexeme using the perceptor and producing a fully qualified token;
- 1. if interpreting, according to the token descriptor, perform the interpretation semantics that are determined by the token and continue at a).
- 2. if compiling, according to the token descriptor, perform the compilation semantics that are determined by the token and continue at a).
- c. If unsuccessful, an ambiguous condition exists (see 3.4.4).
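The steps above can be sketched as a loop over the parse area. This is a sketch under assumptions, not the specified text interpreter: the perceptor is modelled as a deferred word with the hypothetical name `perceptor-sketch`, every token descriptor is assumed to be a token translator (an xt), and throwing -13 is only one possible response to the ambiguous condition in step c.

```forth
\ Assumption: the perceptor is modelled as a deferred word, and every
\ token descriptor it returns is a token translator (an xt).
defer perceptor-sketch ( c-addr u -- qt | 0 )

: translate-source ( i*x -- j*x )
  begin
    parse-name dup            \ a. skip leading spaces, parse a lexeme
  while
    perceptor-sketch          \ b. recognize it: ( qt | 0 )
    dup 0= if -13 throw then  \ c. unsuccessful (here: throw "undefined word")
    execute                   \ b.1 / b.2: the translator itself checks STATE
  repeat 2drop ;              \ parse area exhausted: drop the empty string
```

A recognizer has to be assigned with `IS` before `translate-source` is executed. The loop then translates the rest of the current parse area.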
Initially the perceptor should recognize a lexeme in the following order:
- 1. As the name of a local variable if the Locals word set is provided.
- 2. As the name of a Forth word according to the search order if the Search-Order word set is provided, or in the dictionary header space otherwise.
- 3. As a number according to 3.4.1.3.
- 4. As other implementation-defined forms, if any.
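An ordered attempt of this kind can be sketched by combining two recognizers into a compound one. All names below are hypothetical; the two component recognizers are modelled as deferred words so that any recognizers with the stack effect ( c-addr u -- qt | 0 ) can be plugged in.

```forth
\ Assumption: the two recognizers to combine are plugged in as deferred words.
defer rec-first  ( c-addr u -- qt | 0 )
defer rec-second ( c-addr u -- qt | 0 )

\ Hypothetical compound recognizer: try rec-first, then rec-second.
: rec-in-order ( c-addr u -- qt | 0 )
  2>r 2r@ rec-first           \ keep the string on the return stack
  dup if 2r> 2drop exit then  \ success: discard the saved string
  drop 2r> rec-second ;       \ failure: drop the 0, try the next recognizer
```

The string is kept on the return stack because a fully qualified token may occupy any number of cells, so the string cannot be reached reliably underneath it on the data stack.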
The regions of data space produced by the operations described in 3.3.3.2 Contiguous regions may be non-contiguous if the following words are executed between allocations.
PERCEPTOR
SET-PERCEPTOR
SET-PERCEPTOR-BEFORE
SET-PERCEPTOR-AFTER
REVERT-PERCEPTOR
(the words are under construction, see also Issue #3)