Skip to content

Mysql ROW_FORMAT, UTF8MB4

runout-at edited this page Mar 2, 2020 · 2 revisions

This reflects issue #242 and a solution to the error #1071 - Specified key was too long; max key length is 767 bytes which was caused on trying to solve a problem with DB lookups on header fields because of 4-byte-emoji/utf8 could not be stored/retreived correctly. For the latter problem we tried to convert tables from utf8 to utf8mb4.

I found an interesting solution to dump-modify-reimport a whole database.

The new server has following in the config: (deprecated: MariaDB >=10.2 https://mariadb.com/kb/en/innodb-system-variables/#innodb_large_prefix)

default_storage_engine  = InnoDB
innodb_file_format      = Barracuda
innodb_file_format_max  = Barracuda
innodb_file_per_table   = 1
innodb_large_prefix     = ON

There is a new RAW_FORMAT=DYNAMIC. This will be default from Mariadb 10.2.2 on. MariaDB 10.2.2, MariaDB 10.1.32 and MySQL 5.7 are introducing a new config variable innodb-default-row-format to set this system wide: https://mariadb.com/kb/en/library/xtradbinnodb-server-system-variables/#innodb_default_row_format But it is possible in earlier versions to use this. Tested with 10.1.26 (Debian Stretch).

Howto: Maybe the sed command could be shorter. It's just a proof of concept (created by try&error). I don't care if the string ' utf8 ' appears in a dumped row. This should be improofed.

First of all: BACKUP, BACKUP, BACKUP!

mysqldump -u <user> -p -h <oldmysql.server> -c -e --default-character-set=utf8mb4 --single-transaction --skip-set-charset --add-drop-database -B <databasename> > dump-utf8.sql
sed -e 's/DEFAULT CHARACTER SET utf8/DEFAULT CHARACTER SET utf8mb4 COLLATE utf8mb4_unicode_ci/' \
    -e 's/CHARACTER SET utf8 COLLATE utf8_unicode_ci/CHARACTER SET utf8mb4 COLLATE utf8mb4_unicode_ci/' \
    -e 's/COLLATE=utf8_unicode_ci/COLLATE=utf8mb4_unicode_ci/' \
    -e 's/DEFAULT CHARSET=utf8;/DEFAULT CHARSET=utf8mb4 ROW_FORMAT=DYNAMIC COLLATE=utf8mb4_unicode_ci;/' \
    -e 's/DEFAULT CHARSET=utf8 COLLATE/DEFAULT CHARSET=utf8mb4 ROW_FORMAT=DYNAMIC COLLATE/' \
    -e 's/DEFAULT CHARSET=utf8 COMMENT/DEFAULT CHARSET=utf8mb4 ROW_FORMAT=DYNAMIC COMMENT/' \
    -e 's/ DEFAULT CHARSET=utf8 / DEFAULT CHARSET=utf8mb4 /' \
    -e 's/utf8_/utf8mb4_/' \
    -e 's/utf8 /utf8mb4 /' \
    -e 's/ENGINE=MyISAM/ENGINE=InnoDB/' \
    dump-utf8.sql > dump-utf8mb4.sql
mysql -u <user> -p -v -h <newmysql.server> <dump-utf8mb4.sql