A wrapper library for *nix platforms that provides compatibility with projects written for the Windows API, where `wchar_t` is 2 bytes in size.
It replaces the `libc`/`glibc` functions that use the `wchar_t` type.
Software ported from the Windows API platform that uses the built-in `wchar_t` type requires `wchar_t` to be two bytes (16 bits) wide to work correctly. On *nix systems, `wchar_t` is 4 bytes (32 bits) wide by default.
However, a program can be compiled with a 2-byte `wchar_t`. Both the `GCC` and `clang` compilers support this.
For programs built this way to work correctly, libc, and consequently every other library, would also have to be built with a 2-byte `wchar_t`. This is usually impractical.
Compiler flags for building your program with this library.
These flags enable building with a 2-byte `wchar_t` type.
CC | key |
---|---|
GCC | -fshort-wchar |
clang | -fwchar-type=short -fno-signed-wchar |
- The library has no dependencies.
- It was tested under `Linux`; behavior on other *nix platforms is not guaranteed.
- It is not intended to be built in the `Windows API` environment, because it is not needed on that platform.
- The library includes the main functions from the `libc` and `glibc` libraries for working with the `wchar_t` type, as well as third-party code and other extensions.
- An additional library of extensions based on `LibWchar2` is available for `Windows API` platforms.
- Configuration files for the `cppcheck` code analyzer are provided for checking programs that use this library.
- Attention: not all functions are properly tested. Helping with tests and reporting inconsistencies will make the library better and more reliable.
A short, free paraphrase of what the standards say about `wchar_t` and related types.
- `wchar_t` - an integer type whose range of values can represent distinct codes for all members of the largest extended character set specified among the supported locales. The null character must have the code value zero. Each member of the basic character set must have a code value equal to its value when used as the lone character in an integer character constant, unless the implementation defines `__STDC_MB_MIGHT_NEQ_WC__`.
- `wint_t` - an integer type unchanged by default argument promotions that can hold any value corresponding to a member of the extended character set, as well as at least one value that does not correspond to any member of the extended character set.
- `mbstate_t` - a complete object type, other than an array type, that can hold the conversion state information needed to convert between sequences of multibyte characters and wide characters.
- If you use the compiler flags that make the `wchar_t` type 2 bytes wide, you do so at your own risk. You lose the ability to interoperate correctly with external libraries built without these flags. On *nix systems, the standard libraries are built with a 32-bit `wchar_t` by default.
- The situations in which a 2-byte `wchar_t` is usable are very limited, and mostly concern standalone implementations where the code of every library that uses wide characters is changed as well.
- Some *nix systems define `wchar_t` as a 16-bit type and thus follow `Unicode` very strictly. This definition conforms to the standard, but it also means that representing all characters from `Unicode` and `ISO 10646` requires `UTF-16` surrogate pairs, which is in effect a multi-wide-character encoding. Resorting to a multi-wide-character encoding, however, contradicts the purpose of the `wchar_t` type.
- Mixing wide-character and byte-oriented `input/output` functions on the same stream is not recommended. The functions `printf/fprintf/putchar` output bytes, even when the format includes wide-character conversions. The functions `wprintf/vfwprintf/putwchar`, on the contrary, output wide characters, although they can also print byte strings.
- Another problem with `glibc` and `libc` is the difference in signedness between the `wchar_t` and `wint_t` types, `unsigned` and `signed` respectively. Some wide-character functions in `glibc` and `libc` take parameters of type `wint_t`, which leads to compiler warnings about mismatched signedness.
- An interesting historical note: "wchar_t: Unsafe at any size" by Andy Finnell.
The LibWchar2 library removes these restrictions: it does not require rebuilding every library, yet lets you create applications with a two-byte `wchar_t` type.
In the LibWchar2 library, a variable of type `mbstate_t` is ignored, even if you pass one. This removes the stored intermediate states that prevent mixing `input/output` operations on one stream.
Stream orientation handling is also removed from the `input/output` functions. Whether it is needed at all is debatable, but removing it also affects the stability of the functions involved in `input/output` operations.
The LibWchar2 library solves the type problem: all functions that work with wide characters in one way or another use the single type `wchar_t`.
On a *nix platform the library is built in the usual way, using the autotools package.
Run the `configure` script from the project's root directory.
In addition to the standard options, the script understands the following:
- `--enable-werror` - build the library and tests with the `-Werror` flag.
- `--enable-devel` - build the library and tests with the `-Wextra` flag.
- `--disable-testlib` - do not build the library tests.
- `--enable-debug` - build the library with debugging information.
Next, build and install the library with the standard commands:
./configure --prefix=/usr
make
make check
make install
You can also use the `build.sh` script from the root directory, which runs these commands for you.
If you need to regenerate the `./configure` script, run:
./autogen.sh
- Installing the `LibWchar2` extension library for `Windows API`.
To use the library in a project, include its header last, after all system headers. `wchar.h` and `wctype.h` can be omitted because they are already included.
#include <stdio.h>
#include <string.h>
#include ...
#include <wchar2.h>
The library itself is linked in the standard way. Keep the `CFLAGS` line that matches your compiler:
Makefile:
CFLAGS = -I. -fshort-wchar # GCC
CFLAGS = -I. -fwchar-type=short -fno-signed-wchar # clang
LDFLAGS = -L. -lwchar2
Conveniently, the library can redefine standard functions that work with the file system, such as `mkdir`, `remove`, `rename`, `stat`, `access`, `basename`, `dirname`, `fopen`, `fputc`, `fputs`.
To enable this, define the following macro before including the header:
#define WS_FS_REDEFINE 1
#include <wchar2.h>
or, if only `UTF-8` encoding is needed:
#define WS_FS_REDEFINE 1
#define WS_FS_UTF8 1
#include <wchar2.h>
Attention: the `WS_FS_UTF8` key has no effect on its own; `WS_FS_REDEFINE` must be defined as well.
Example code snippets
When `WS_FS_UTF8` is defined, the functions `mkdir`, `remove`, `rename`, `stat`, `fopen`, `access` accept input only in `wchar_t` format. Otherwise the input can be of any type shown in the table below, and the type is detected automatically.
See: the `__wchar_type_id(..)` macro in `wchar2.h`.
Type | const | array | const array |
---|---|---|---|
char* | const char* | char[] | const char[] |
wchar_t* | const wchar_t* | wchar_t[] | const wchar_t[] |
string_ws* | const string_ws* |
- Checking your code with static analysis tools makes it more reliable. If you use `cppcheck`, the configuration files describing the library functions may be useful for the analysis.
- If you have the time and the desire to help make the library better, you can take part in testing by writing tests for the library functions.
- libexpat - Fast streaming XML parser written in `C`. Author: Expat development team
- tinydir - patch v.1.2.3 - Lightweight, portable and easy to integrate C directory and file reader. Author: cxong
The project uses revised code by the following authors:
- Tim J. Robbins
- Nexenta Systems, Inc
- Fredrik Fornwall/Rich Felker
- Angel Ortega
- Mayu Laierlence
- Nodir Temirkhodjaev
to whom special thanks are due :)
MIT