From b925b8961588c19ac66e9fe9723120018ce8f113 Mon Sep 17 00:00:00 2001
From: aaaaaa123456789 <aaaaaa123456789@acidch.at>
Date: Wed, 22 Aug 2018 05:11:23 -0300
Subject: [PATCH 1/7] Added 'forked off' note to version in specification

---
 specification.md | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/specification.md b/specification.md
index d88dae6..390137b 100644
--- a/specification.md
+++ b/specification.md
@@ -1,6 +1,6 @@
 # BSP specification
 
-([version](versions.md): 0.5.2, rev. 37)
+(forked off [version](versions.md): 0.5.2, rev. 37)
 
 * [Introduction](#introduction)
 * [Execution model](#execution-model)

From 3edaee5c603069a09cbfb4f81979c3b6b098fd2f Mon Sep 17 00:00:00 2001
From: aaaaaa123456789 <aaaaaa123456789@acidch.at>
Date: Wed, 22 Aug 2018 06:05:11 -0300
Subject: [PATCH 2/7] Initial string handling section

---
 specification.md | 58 ++++++++++++++++++++++++++++++++++++++++++++++++
 1 file changed, 58 insertions(+)

diff --git a/specification.md b/specification.md
index 390137b..a05315c 100644
--- a/specification.md
+++ b/specification.md
@@ -7,6 +7,7 @@
 * [Opcodes](#opcodes)
 * [Instruction set](#instruction-set)
 * [Instruction description](#instruction-description)
+* [String handling](#string-handling)
 
 ## Introduction
 
@@ -1086,3 +1087,60 @@ child to the parent.
 If the child's execution triggers a fatal error, this fatal error must be propagated to the parent; in other words,
 a fatal error at any depth must halt the whole engine. Execution of the parent must **not** be resumed after a fatal
 error occurs in the child.
+
+## String handling
+
+Several instructions in this specification deal with strings — namely, the [`print`][print] and [`menu`][menu]
+instructions, as well as [those that manipulate the message buffer][msgbuffer]. This section specifies how the engine
+must behave when handling strings, and which part of the functionality is implementation-dependent.
+
+Valid strings in the BSP itself must be in UTF-8 format, as specified by [RFC 3629][rfc3629], regardless of the
+effective output format of the engine. Any UTF-8 decoding errors must be treated as fatal (including recoverable ones
+such as overlong encodings or surrogate characters (codepoints between `0x00d800` and `0x00dfff`) being encoded).
+
+Despite the engine must accept any valid UTF-8 string, it isn't required to be able to effectively display any Unicode
+character; an engine incapable of handling the full Unicode character set may choose to use a reduced one and replace
+characters not in its reduced set with zero or more suitable substitution characters. However, an engine is required
+to support at least Latin letters (A-Z, a-z), digits (0-9), spaces, and the following punctuation characters:
+`'-,.;:#%&!?/()[]`. All of these characters are encoded as single UTF-8 bytes, and belong to the following ranges:
+`0x20` - `0x21`, `0x23`, `0x25` - `0x29`, `0x2c` - `0x3b`, `0x3f`, `0x41` - `0x5b`, `0x5d`, and `0x61` - `0x7a`.
+
+The engine may enforce a limit on the number of bytes and/or characters that the message buffer can accept; this limit
+may also be dynamically determined during execution. If such a limit is enforced, characters and/or bytes in excess
+must be silently discarded without error; the engine must take care to discard multibyte characters as a whole, and
+not only some of their bytes. (For instance, if the last character to be added to the buffer is codepoint `0x0000a0`,
+encoded as `0xc2` `0xa0`, the engine may keep both bytes or discard them both, but it must not discard just the last
+byte.) If any of the instructions that append to the message buffer (i.e., `bufstring`, `bufchar` or `bufnumber`)
+cause some data to be silently discarded due to the buffer being full, any further such instructions must be silently
+ignored (i.e., wholly discarded) until the buffer is cleared via the `printbuf` or `clearbuf` instructions.
+
+The engine may enforce a similar limit on the number of bytes and/or characters to be printed by a single `print`
+instruction, as well as a maximum length for option labels for the `menu` instruction. Any text exceeding these limits
+must be silently truncated as given in the previous paragraph.
+
+Invalid UTF-8 strings given as arguments to the `bufstring` instruction cause a fatal error; this error may occur at
+the time of executing that instruction, or when executing any further instruction that manipulates the message buffer,
+up to the point where the message buffer is printed via the `printbuf` instruction. If the message buffer is never
+printed, the error may occur up to the point where the message buffer is cleared (either via the `clearbuf`
+instruction or due to terminating execution) or not at all; this is implementation-defined.
+
+Multibyte UTF-8 characters appended to the message buffer via `bufstring` instructions must be fully contained within
+a single string; if two or more consecutive instructions append parts of a multibyte UTF-8 character that build up to
+a valid character, the engine may accept those parts as a whole character or trigger a fatal error. For instance, the
+following snippet:
+
+```
+	bufstring .first
+	bufstring .second
+	; ...
+
+	; UTF encoding of U+00A0: 0xc2, 0xa0
+.first
+	db 0xc2, 0
+.second
+	db 0xa0, 0
+```
+
+may either append a `0x0000a0` codepoint to the message or cause a fatal error.
+
+[rfc3629]: https://tools.ietf.org/html/rfc3629

From 63e2cfed220b58791dfe33ecaf487d8c6218066a Mon Sep 17 00:00:00 2001
From: aaaaaa123456789 <aaaaaa123456789@acidch.at>
Date: Wed, 22 Aug 2018 06:12:37 -0300
Subject: [PATCH 3/7] Point to the new section and simplify the description of
 instructions that point to it

---
 specification.md | 17 ++++++-----------
 1 file changed, 6 insertions(+), 11 deletions(-)

diff --git a/specification.md b/specification.md
index a05315c..2383d37 100644
--- a/specification.md
+++ b/specification.md
@@ -672,12 +672,7 @@ This document does not specify how the engine will display the message; however,
 (or an environment that behaves in a similar fashion), it is recommended that the engine prints a newline character
 after the message.
 
-If the message is not valid UTF-8, the engine may choose to display the message anyway (handling the invalid characters
-in any way it can) or to treat it as a fatal error.
-
-An engine incapable of handling the full Unicode character set may choose to use a reduced character set and replace
-the remaining characters with a suitable substitution character; however, an engine must at least support the Latin
-letters (A-Z, a-z), digits (0-9), spaces, and the following punctuation characters: `'-,.;:#%&!?/()[]`.
+Further considerations regarding message strings are given in the [String handling](#string-handling) section.
 
 ### Manipulating the message buffer
 
@@ -694,11 +689,9 @@ The first three instructions concatenate data at the end of the message buffer.
 The `bufstring` instruction concatenates a string (in the same format as for the `print` instruction) at the end of
 the message buffer. No separator is inserted before or after the string.
 
-The `bufchar` instruction appends a single Unicode character to the message buffer. An engine incapable of handling the
-full Unicode character set may choose to use a reduced character set and replace the remaining characters with suitable
-substitutes; it must however support at least the letters (A-Z, a-z), numbers (0-9), basic punctuation characters
-(`'-,.;:#%&!?/()[]`) and the space character. Passing a value that isn't a valid Unicode codepoint (`0x000000` to 
-`0x00d7ff` and `0x00e000` to `0x10ffff`) is a fatal error.
+The `bufchar` instruction appends a single Unicode character to the message buffer. Passing a value that isn't a valid
+Unicode codepoint (`0x000000` to `0x00d7ff` and `0x00e000` to `0x10ffff`) is a fatal error; values above `0x1fffff`
+are reserved for further versions of the specification.
 
 The `bufnumber` instruction appends the decimal representation of a number to the message buffer. The number is
 treated as a 32-bit unsigned value and converted to decimal, and printed using the regular digit characters (0-9,
@@ -709,6 +702,8 @@ The `printbuf` instruction prints the contents of the message buffer as a messag
 `print` instruction) and clears the buffer, resetting it to the empty string. The `clearbuf` instruction resets the
 message buffer to the empty string without printing it.
 
+Further considerations regarding the message buffer are given in the [String handling](#string-handling) section.
+
 ### Option menus
 
 ```

From 1e2bf7fb3ecf770a151ecb4687d4b875b8a6924e Mon Sep 17 00:00:00 2001
From: aaaaaa123456789 <aaaaaa123456789@acidch.at>
Date: Wed, 22 Aug 2018 06:17:06 -0300
Subject: [PATCH 4/7] Add a note regarding control characters

---
 specification.md | 6 ++++++
 1 file changed, 6 insertions(+)

diff --git a/specification.md b/specification.md
index 2383d37..0f309b0 100644
--- a/specification.md
+++ b/specification.md
@@ -1100,6 +1100,12 @@ to support at least Latin letters (A-Z, a-z), digits (0-9), spaces, and the foll
 `'-,.;:#%&!?/()[]`. All of these characters are encoded as single UTF-8 bytes, and belong to the following ranges:
 `0x20` - `0x21`, `0x23`, `0x25` - `0x29`, `0x2c` - `0x3b`, `0x3f`, `0x41` - `0x5b`, `0x5d`, and `0x61` - `0x7a`.
 
+Control characters in strings must be accepted, as they are valid UTF-8 characters; they are also valid arguments to
+the `bufchar` instruction. (In particular, `0` is a valid argument to `bufchar`, and therefore must not be treated as
+a string terminator in that context.) However, since they are not in the ranges listed in the previous paragraph,
+engines are not required to support them; control characters may be ignored (i.e., substituted by nothing) when the
+string (or the message buffer) is displayed to the user, or handled in any other appropriate way.
+
 The engine may enforce a limit on the number of bytes and/or characters that the message buffer can accept; this limit
 may also be dynamically determined during execution. If such a limit is enforced, characters and/or bytes in excess
 must be silently discarded without error; the engine must take care to discard multibyte characters as a whole, and

From 3a9e84f0d2765a163d8213de269ec5a09f78cbac Mon Sep 17 00:00:00 2001
From: aaaaaa123456789 <aaaaaa123456789@acidch.at>
Date: Wed, 22 Aug 2018 06:21:36 -0300
Subject: [PATCH 5/7] Another note for the `menu` instruction

---
 specification.md | 3 +++
 1 file changed, 3 insertions(+)

diff --git a/specification.md b/specification.md
index 0f309b0..02d3497 100644
--- a/specification.md
+++ b/specification.md
@@ -740,6 +740,9 @@ Options:
 If the list of pointers is empty (i.e., if the first pointer is `0xffffffff`), no menu is shown to the user, and the
 variable is set to `0xffffffff`.
 
+Further considerations regarding the text used as option labels are given in the [String handling](#string-handling)
+section.
+
 Note that a menu with just one option must still be shown to the user, as it is possible to use such a menu to give the
 user the possibility of aborting the process by stopping the BSP engine.
 

From d6d9e4a31cac6dbed07e5aae99d8724468d1353c Mon Sep 17 00:00:00 2001
From: aaaaaa123456789 <aaaaaa123456789@acidch.at>
Date: Mon, 27 Aug 2018 23:47:16 -0300
Subject: [PATCH 6/7] Fix various writing errors and Unicode inaccuracies

---
 specification.md | 22 ++++++++++++----------
 1 file changed, 12 insertions(+), 10 deletions(-)

diff --git a/specification.md b/specification.md
index 02d3497..162fd87 100644
--- a/specification.md
+++ b/specification.md
@@ -689,9 +689,10 @@ The first three instructions concatenate data at the end of the message buffer.
 The `bufstring` instruction concatenates a string (in the same format as for the `print` instruction) at the end of
 the message buffer. No separator is inserted before or after the string.
 
-The `bufchar` instruction appends a single Unicode character to the message buffer. Passing a value that isn't a valid
-Unicode codepoint (`0x000000` to `0x00d7ff` and `0x00e000` to `0x10ffff`) is a fatal error; values above `0x1fffff`
-are reserved for further versions of the specification.
+The `bufchar` instruction appends a single Unicode character to the message buffer. The value passed to the `bufchar`
+instruction as an argument must represent a valid non-surrogate Unicode codepoint (i.e., it must be between `0x000000`
+and `0x00d7ff`, or `0x00e000` and `0x10ffff`); passing a value outside of those ranges is a fatal error. Values above
+`0x1fffff` are reserved for further versions of the specification.
 
 The `bufnumber` instruction appends the decimal representation of a number to the message buffer. The number is
 treated as a 32-bit unsigned value and converted to decimal, and printed using the regular digit characters (0-9,
@@ -1093,13 +1094,14 @@ instructions, as well as [those that manipulate the message buffer][msgbuffer].
 must behave when handling strings, and which part of the functionality is implementation-dependent.
 
 Valid strings in the BSP itself must be in UTF-8 format, as specified by [RFC 3629][rfc3629], regardless of the
-effective output format of the engine. Any UTF-8 decoding errors must be treated as fatal (including recoverable ones
-such as overlong encodings or surrogate characters (codepoints between `0x00d800` and `0x00dfff`) being encoded).
-
-Despite the engine must accept any valid UTF-8 string, it isn't required to be able to effectively display any Unicode
-character; an engine incapable of handling the full Unicode character set may choose to use a reduced one and replace
-characters not in its reduced set with zero or more suitable substitution characters. However, an engine is required
-to support at least Latin letters (A-Z, a-z), digits (0-9), spaces, and the following punctuation characters:
+effective output format of the engine. Any UTF-8 decoding errors (such as overlong encodings) must be treated as
+fatal, without attempting any recovery; surrogate codepoints (i.e., those between `0x00d800` and `0x00dfff`) must be
+treated as fatal errors as well.
+
+Although the engine must accept any valid UTF-8 string, it isn't required to be able to effectively display any
+Unicode character; an engine incapable of handling the full Unicode character set may choose to use a reduced one and
+replace characters not in its reduced set with zero or more suitable substitution characters. However, an engine is
+required to support at least Latin letters (A-Z, a-z), digits (0-9), spaces, and the following punctuation characters:
 `'-,.;:#%&!?/()[]`. All of these characters are encoded as single UTF-8 bytes, and belong to the following ranges:
 `0x20` - `0x21`, `0x23`, `0x25` - `0x29`, `0x2c` - `0x3b`, `0x3f`, `0x41` - `0x5b`, `0x5d`, and `0x61` - `0x7a`.
 

From 4224b8ef7a77e68d4a217031fbd94b3c040b45d3 Mon Sep 17 00:00:00 2001
From: aaaaaa123456789 <aaaaaa123456789@acidch.at>
Date: Mon, 24 Sep 2018 21:09:15 -0300
Subject: [PATCH 7/7] Assign version number and date

---
 specification.md |  2 +-
 versions.md      | 18 +++++++++++++++++-
 2 files changed, 18 insertions(+), 2 deletions(-)

diff --git a/specification.md b/specification.md
index 162fd87..95fe4d1 100644
--- a/specification.md
+++ b/specification.md
@@ -1,6 +1,6 @@
 # BSP specification
 
-(forked off [version](versions.md): 0.5.2, rev. 37)
+([version](versions.md): 0.6.0, rev. 38)
 
 * [Introduction](#introduction)
 * [Execution model](#execution-model)
diff --git a/versions.md b/versions.md
index d7f9f46..54bbb28 100644
--- a/versions.md
+++ b/versions.md
@@ -36,7 +36,23 @@ specification. However, this does not convey as much information as the full ver
 
 ## Changelog
 
-#### [Version 0.5.2, rev. 37 (19 August 2018)](https://github.com/aaaaaa123456789/bsp/blob/master/specification.md)
+#### [Version 0.6.0, rev. 38 (25 September 2018)](https://github.com/aaaaaa123456789/bsp/blob/master/specification.md)
+
+Adds a section about characters and strings (referenced from all instructions that handle text data), including
+changes such as:
+
+* Forbidding the use of invalid UTF-8 strings, requiring a fatal error in all cases
+* Actually pointing to the UTF-8 spec (i.e., the RFC) instead of leaving it implied
+* Specifying the acceptable behaviors of valid UTF-8 strings: what characters must be supported by the engine, and
+  which ones can be substituted
+* Requiring that any valid UTF-8 string is accepted, including those containing control characters (despite those
+  control characters need not be actually displayed)
+* Elaborating on character/byte limits for strings, detailing how an engine that imposes such limits must behave
+* Adding a few notes that address corner cases regarding invalid UTF-8 strings
+* Indicating that arguments to bufchar above `0x1fffff` are reserved and may actually have some purpose in future
+  versions of the spec (since they are completely outside the range of Unicode)
+
+#### [Version 0.5.2, rev. 37 (19 August 2018)](https://github.com/aaaaaa123456789/bsp/blob/92d13c851899eeb06d26ce346ee4f6ab46123ee7/specification.md)
 
 * Adds a version number to the specification and a link to the (newly-written) changelog