Commit e59f39d

Author: Markus Armbruster
json: Reject invalid UTF-8 sequences
We reject bytes that can't occur in valid UTF-8 (\xC0..\xC1, \xF5..\xFF) in the lexer. That's insufficient; there's plenty of invalid UTF-8 not containing these bytes, as demonstrated by check-qjson:

* Malformed sequences
  - Unexpected continuation bytes
  - Missing continuation bytes after start bytes other than \xC0..\xC1, \xF5..\xFD.
* Overlong sequences with start bytes other than \xC0..\xC1, \xF5..\xFD.
* Invalid code points

Fixing this in the lexer would be bothersome. Fixing it in the parser is straightforward, so do that.

Signed-off-by: Markus Armbruster <[email protected]>
Reviewed-by: Eric Blake <[email protected]>
Message-Id: <[email protected]>
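The categories listed in the commit message can be illustrated with a small classifier. This is a minimal sketch, not QEMU code: `utf8_classify` and the `utf8_status` enum are hypothetical names invented here, and the check follows strict RFC 3629 rules (surrogates and values above U+10FFFF count as invalid code points), which may differ from what QEMU's `mod_utf8_codepoint()` accepts.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical classification, named here for illustration only. */
enum utf8_status {
    UTF8_OK,
    UTF8_UNEXPECTED_CONTINUATION,
    UTF8_INVALID_START,
    UTF8_MISSING_CONTINUATION,
    UTF8_OVERLONG,
    UTF8_INVALID_CODEPOINT
};

/*
 * Classify the first sequence in s[0..n-1].  Not QEMU code: a strict
 * check written to mirror the list in the commit message.
 */
static enum utf8_status utf8_classify(const unsigned char *s, size_t n)
{
    /* Smallest code point that legitimately needs 1..4 bytes */
    static const unsigned long min_cp[5] = { 0, 0x0, 0x80, 0x800, 0x10000 };
    unsigned long cp;
    size_t len, i;

    assert(n > 0);
    if (s[0] < 0x80) {
        return UTF8_OK;                         /* ASCII */
    } else if (s[0] < 0xC0) {
        return UTF8_UNEXPECTED_CONTINUATION;    /* 10xxxxxx at the start */
    } else if (s[0] < 0xE0) {
        len = 2; cp = s[0] & 0x1F;
    } else if (s[0] < 0xF0) {
        len = 3; cp = s[0] & 0x0F;
    } else if (s[0] < 0xF8) {
        len = 4; cp = s[0] & 0x07;
    } else {
        return UTF8_INVALID_START;              /* \xF8..\xFF */
    }

    /* Every byte after the start byte must look like 10xxxxxx */
    for (i = 1; i < len; i++) {
        if (i >= n || (s[i] & 0xC0) != 0x80) {
            return UTF8_MISSING_CONTINUATION;
        }
        cp = (cp << 6) | (s[i] & 0x3F);
    }

    if (cp < min_cp[len]) {
        return UTF8_OVERLONG;                   /* e.g. \xC0\x80 */
    }
    if (cp > 0x10FFFF || (cp >= 0xD800 && cp <= 0xDFFF)) {
        return UTF8_INVALID_CODEPOINT;          /* out of range or surrogate */
    }
    return UTF8_OK;
}
```

Note that the overlong and invalid-code-point cases can only be detected after the whole sequence has been decoded, which is consistent with the commit's point that a byte-level filter in the lexer is not enough.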
1 parent a89d310 commit e59f39d

4 files changed, +122 -105 lines changed


include/qemu/unicode.h

Lines changed: 1 addition & 0 deletions
@@ -2,5 +2,6 @@
 #define QEMU_UNICODE_H
 
 int mod_utf8_codepoint(const char *s, size_t n, char **end);
+ssize_t mod_utf8_encode(char buf[], size_t bufsz, int codepoint);
 
 #endif

qobject/json-parser.c

Lines changed: 14 additions & 6 deletions
@@ -13,6 +13,7 @@
 
 #include "qemu/osdep.h"
 #include "qemu/cutils.h"
+#include "qemu/unicode.h"
 #include "qapi/error.h"
 #include "qemu-common.h"
 #include "qapi/qmp/qbool.h"
@@ -133,6 +134,10 @@ static QString *qstring_from_escaped_str(JSONParserContext *ctxt,
     const char *ptr = token->str;
     QString *str;
     char quote;
+    int cp;
+    char *end;
+    ssize_t len;
+    char utf8_buf[5];
 
     assert(*ptr == '"' || *ptr == '\'');
     quote = *ptr++;
@@ -194,12 +199,15 @@ static QString *qstring_from_escaped_str(JSONParserContext *ctxt,
                 goto out;
             }
         } else {
-            char dummy[2];
-
-            dummy[0] = *ptr++;
-            dummy[1] = 0;
-
-            qstring_append(str, dummy);
+            cp = mod_utf8_codepoint(ptr, 6, &end);
+            if (cp <= 0) {
+                parse_error(ctxt, token, "invalid UTF-8 sequence in string");
+                goto out;
+            }
+            ptr = end;
+            len = mod_utf8_encode(utf8_buf, sizeof(utf8_buf), cp);
+            assert(len >= 0);
+            qstring_append(str, utf8_buf);
         }
     }
 