go version of mysql rangen for db fuzz test
go get -u github.com/pingcap/go-randgen/cmd/go-randgen
try it:
go-randgen -h
- install
go-bindata
command
go get -u github.com/jteeuwen/go-bindata/...
- compile
go-randgen
make all
- It has a built-in zz file, you can generate sqls just with a yy file
- Be able to generate sql without connection to db, so it is very fast if you just want sqls
- yy syntax is mysql randgen compatible, as long as it doesn't contain embedded perl code
- go-randgen supports embedded lua code blocks, just as mysql randgen supports perl code blocks
- Implemented with pure Golang, it's design is very flexible, easy to hack
- All exported functions are stateless other than cmd package, if needed, you can call them as a library
generate sqls to test window functions:
# -Y the path of yy file to use
# -Q sql num to generate by yy
# -B break ddls and generated sqls into two files
# if user doesn't specfied a zz file path, it will use built-in zz
./go-randgen gentest -Y examples/windows.yy -Q 10 -B
After execution, you will see a output.data.sql
, which includes
ddl (define table structrue) and dml (init data in tables), and
a output.rand.sql
, which includes sqls generated by specified
yy file.
If you don't want to use built-in zz,
you can write a zz by hand, and use -Z
to specify path,
the detail of zz syntax to see following Grammar Guide.
If you just want to generate some sqls by yy, without generating ddl,
you can use --skip-zz
to skip ddl generation,
but yy file can not include key word
related with table name or field name in this condition.
the detail of yy syntax to see following Grammar Guide.
generate table structrue and data in user specified dsns according to user specified or built-in zz file.
# generate table structure and data specified by zz in specified dsn
# dsns are split by comma
./go-randgen gendata --dsns "root:@tcp(127.0.0.1:3306)/randgen,root:@tcp(127.0.0.1:4000)/randgen"
parse yy to generate sqls by user specified dsn:
./go-randgen gensql -Y examples/functions.yy \
--dsn "root:@tcp(127.0.0.1:3306)/randgen" \
-Q 100
Note that gensql
assumes structures of tables in the dsn
are the same (Because data genrated by randgen has this trait).
User sepecifies two dsns, then exec command will execute generated sqls in two dsn and dump sqls with inconsistent execution result.
Example:
./go-randgen exec -Y examples/functions.yy \
--dsn1 "root:@tcp(127.0.0.1:4000)/randgen" \
--dsn2 "root:@tcp(127.0.0.1:3306)/randgen" \
-Q 100
In above command, go randgen will generate data in two dsns
first, then execute 100 sqls generated according to functions.yy,
if execution result of a sql is inconsistent, it will dump
related info into dump
directory.
(you can change this directory by --dump
)
Note that go randgen compare sql result without order in default. For example, go-randgen think that the following result is consistent:
Result1:
+------+------+
| p | s |
+------+------+
| 1 | aaa |
| 2 | bbb |
+------+------+
Result2:
+------+------+
| p | s |
+------+------+
| 2 | bbb |
| 1 | aaa |
+------+------+
If you want to compare in order byte by byte, you
should add --order
.
exec
can also skip data generation by set --skip-zz
,
it will generate sqls just like gensql
command.
Except packages under cmd
directory, all exported
functions is stateless, you can call them again and again
safely.
The implementation under cmd
directory is the best examples
to use it as a library.
A simple example: get an iterator by yy and generate ten sqls
package main
import (
"fmt"
"github.com/pingcap/go-randgen/grammar"
"github.com/pingcap/go-randgen/grammar/sql_generator"
"log"
)
func main() {
yy := `
{
i = 1
}
query:
create
create:
CREATE TABLE
{print(string.format("table%d", i)); i = i+1}
(a int)
`
iterator, err := grammar.NewIter(yy, "query", 5, nil, false)
if err != nil {
log.Fatalf("get iter err %v\n", err)
}
iterator.Visit(sql_generator.FixedTimesVisitor(func(_ int, sql string) {
fmt.Println(sql)
}, 5))
}
Note that there is no problem to pass nil in the third param because of the absence of keyword in the example yy. But if you use yy keyword, you should use
gendata.NewKeyfun()
to create this param.
Print result:
CREATE TABLE table1 (a int)
CREATE TABLE table2 (a int)
CREATE TABLE table3 (a int)
CREATE TABLE table4 (a int)
CREATE TABLE table5 (a int)
zz file is a lua script file, it will define the following three things:
- tables to generate
- fields in tables
- data in fields
Use built-in zz as an example:
-- table related defination
tables = {
-- record number
rows = {10, 20, 30, 90},
-- table character set
charsets = {'utf8', 'latin1', 'binary'},
-- table partition number, 'undef' represents no partition
partitions = {4, 6, 'undef'},
}
-- field related defination
fields = {
-- field data types
types = {'bigint', 'float', 'double', 'decimal(40, 20)',
'char(20)', 'varchar(20)'},
-- generate signed and unsigned field with above type
sign = {'signed', 'unsigned'}
}
-- data init related defination
data = {
-- optional value of number type
numbers = {'null', 'tinyint', 'smallint',
'12.991', '1.009', '-9.183',
'decimal',
},
-- optional value of char type
strings = {'null', 'letter', 'english'},
}
As you can see, there must be three lua Table variables in zz file, which is tables, fields and data.
tables defines attributes on table, like rows
, charsets
and partitions
in above example, these attributes'
cartesian product will be evaluated,
every tuple in which will generate a table.
tables definition in above example will generate
4(rows)*3(charsets)*3(partitions)=36 tables in total.
fields defines what fields in tables,
these infos' cartesian product
also will be evaluated,
every tuple in which will generate a field,
but the fields generated by above example is
less than 6(types)*2(sign)=12, because
sign
attribute can only apply on numberic type,
go-randgen will automatically ignore it for non-numberic
types, so example definition will only genrate
4(number)*2(sign)+2(char)=10 fields. Note that tables
generated by randgen all have the same fields.
data defines data in tables, key represents field types
(see following for detail), value is an array, which are
optional values of this type. For each genrated record,
it will random choose a value for fields of
corresponding type. Optional values include
"literal" and "generator".Take the definition of numbers
above as an example, null
,12.991
, etc. are literals,
they will be take directly as a value, but when tinyint
(it is a generator) is taken, it will generate a random
value in -128~127
(to know more generators, see following for detail).
keys | mean | options | default |
---|---|---|---|
rows | record number in table | any positive number | [0, 1, 2, 10, 100] |
charsets | table's character set | 'utf8','utf8mb4','ascii','latin1','binary', 'undef' means not set charset explicitly | ['undef'] |
partitions | partition number of table | any positive number or 'undef', 'undef' means no partition | ['undef'] |
Related source code is tablesVars
variable in
gendata/tables.go
keys | mean | options | default |
---|---|---|---|
types | field type | any valid MySQL type | ['int', 'varchar', 'date', 'time', 'datetime'] |
keys | index or not | 'key' means add index to the field, 'undef' means not | ['undef', 'key'] |
sign | unsigned or not | 'signed', 'unsigned' | ['signed'] |
Related source code is fieldVars
variable in
gendata/fields.go.
data definition in go-randgen is enhanced compared with mysql
randgen. mysql randgen only support five summary types, which
is numbers
, blobs
, temporals
, enum
and strings
.
go-randgen not only support the five summary types, but also
support fine-grained definition.
For example:
data = {
numbers = {'null', 'tinyint', 'smallint',
'12.991', '1.009', '-9.183',
'decimal',
},
bigint = {100, 10, 3},
}
For above data definition, it will take
randomly from 100
, 10
and 3
when go-randgen generate bigint field,
instead from numbers
definition, which is more rough.
The map from concrete data types to summary types is
summaryType
variable in gendata/data.go.
Among above numbers
definition, 'tinyint', 'smallint', 'decimal'
are go-randgen built-in generators.
To know all generators in go-randgen, you can see
init
function in
gendata/generators/register.go
A simple example:
# one line comment
/*
multiline comment
*/
query:
select
| select1
select:
SELECT fields FROM _table
fields:
_field
| _field_int
The generated result of this yy maybe:
select1
SELECT "a random field" FROM "a random table in this db"
SELECT "a random int field" FROM "a random table in this db"
select1
- one line comment:
#
- multiline comment:
/**/
- Non-Terminal: be composed of lowercase, number of underline, but can not begin with number
- Terminal: be composed of uppercase, special character or number, but can not begin with underline
- Keyword: begin with underline
For Non-Terminal in the right of production, if the corresponding production can not be found, the Non-Terminal degrade to Terminal
Keyword begin with underline.
The keywords to get table names and field names:
_table
: randomly get a table name from generated tables_field
: randomly get a field name from generated fields_field_int
: randomly get a field from generated int fields_field_char
: randomly get a field from generated char and varchar fields_field_list
: get all field split by comma in the table_field_int_list
: get all int field split by comma in the table_field_char_list
: get all char field split by comma in the table
Some sugars to randomly generate data(it will generate double quotes automatically in character related data):
_digit
: random number in 0-9_letter
: random char from 'a' to 'z'_english
: random English word_int
: radom int value_date
: generate randomlyyyyy-MM-dd
formatted date_year
: random year_time
: randomhh:mm:ss
formatted time_datetime
: randomyyyy-MM-dd hh:mm:ss
formatted time
I only write a part of keywords. If you want to know all keywords,
You can see the NewKeyfun
function in gendata/gendata.go
You can write lua code in "{}", call print
function to splice content
to sql
query:{a = 1}
CREATE TABLE
{print(string.format("t%d", a))} (a INT)
Above example will always generate sql CREATE TABLE t1 (a INT)
.
You can call any funcation in lua standard library, for example:
# random number from 10-20
query:
{print(math.random(10,20))}
Normal code block will be executed every time the corresponding branch is selected.
go-randgen support insert code in the head of yy file, this code will only be executed once, called head code block, the main application of head code block is the declaration of variables or functions:
# head code block
{
i = 1
a = 100
function add(num1, num2)
return num1 + num2
end
}
query:
select
select:
SELECT * FROM _table WHERE where_clause
where_clause:
_field_int > {print(i)}
| _field_char > {print(a)}
| _field_int + _field_int > {print(add(i, a))}
You may be afraid that the brace may be conflict with lua Table definition.In fact, you are worried for no thing, you are free to use lua Table in code block:
{
f={a=1, b=3}
arr={0,2,3,4}
}
query:
{print(arr[f.a])} | {print(arr[f.b])}
Above example will only genrate "0" or "3" (Note that index of lua array begin from 1).
Another important feature is that you can call yy key word in
lua code block by _xxx()
, for example:
query:{table = _table()}
BEGIN ; update ; select ; END
update:
UPDATE {print(table)} SET _field_int = 10
select:
SELECT * FROM {print(table)}
Above yy will ensure that random table in update
and select
is the same.
- recursive subquery
query:
select
select:
SELECT * FROM
(select)
WHERE _field_int > 10
| SELECT * FROM _table WHERE _field_char = _english
- empty rule
order:
ASC
|DESC
| # order can product empty
#...
- generate a sequence of sqls
For exampe, we want to test Prepared statement
query:
SET @stmt = {print('"')} select {print('"')};
PREPARE stmt FROM @stmt_create ;
EXECUTE stmt ;
select:
SELECT * FROM _table
If you generate 3 sqls with above yy(namely set -Q
with 3),
it will generate sqls as follow:
SET @stmt = " SELECT * FROM _table ";
PREPARE stmt FROM @stmt_create;
EXECUTE stmt;
If you generate 6 sqls, it will generate above sqls twice.
If you generate 2 sqls, result is as follow:
SET @stmt = " SELECT * FROM _table ";
PREPARE stmt FROM @stmt_create;
From above examples, we can see the mean of ;
. The symbol extends
from mysql randgen, represents generation of a sequence of sqls.
- create unique table when test create statements
Program 1: insert lua script, apply head code block
# declare i to 1
{
i = 1
}
query:
create
create:
CREATE TABLE
{print(string.format("table%d", i)); i = i+1}
(a int)
Program 2: create table first, then drop it in next sql
query:
create
create:
CREATE TABLE t (a int); DROP TABLE t
Result:
CREATE TABLE t (a int);
DROP TABLE t;
CREATE TABLE t (a int);
DROP TABLE t;
...
Compared to mysql randgen, go-randgen is very easy to hack. If you want some feature it doesn't have yet, you can add it conveniently.
If you think all built-in generators do not meet your demand,
you can register new generator
in init
function in
gendata/generators/register.go
Assume that you add a aaa
generator, you will not only can use
"aaa"
in zz data field, but alse can use _aaa
key word in yy.
You can add your keyword in
NewKeyfun
function in gendata/gendata.go
- do not need to add
;
at the end of bnf expression, which is the habit of mysql randgen. go-randgen do not depend on;
to recognize different bnf expression. Of course, there is no problem if you add;
at the end of bnf expression, because we are compatible with mysql randgen - go-randgen init zz data with
insert
statement, other thaninsert ignore
. So if it meetsunsigned
field, it will try 10 times at most to generate positive value, if it fails in 10 times, 1 will be assigned to this field - data definition in zz can be more precise, other than the only four types in mysql randgen
- support generate sqls without connection to db, which takes advantage of schema cache in ddls generation, it is very fast.