In this subdirectory you will find implementations of a very simple string extractor in a number of languages. The extractor will split its input into lines and make every non-empty line into a PO entry.
All examples, except for the Perl example, use the Inline Perl module, which allows embedding code written in other languages directly into Perl code.
In order to use the example scanners you usually have to do the following:
sudo cpan install Inline::LANGUAGE
perl -Ilib samples/LANGUAGE/xgettext-lines.pl --help
perl -Ilib samples/LANGUAGE/xgettext-lines.pl README.md
Replace LANGUAGE with the language you want to test. In the case of Java you may have to run the command "sudo cpan install Inline::Java::Class" (not just "Inline::Java").
If your package manager already has a prebuilt package for the Inline module of your choice, you should give it a try. In general, installing Inline modules is quite challenging.
The following is a list of candidate languages for which "Inline::*" bindings exist:
Some of the language names above link to a fully functional example.
The list is not complete; notably, the languages Go and JavaScript (via NodeJS) are still missing at the time of this writing.
Let's walk through writing an xgettext program in Python step by step. The source code of the sample implementations for other languages should give you enough information to adapt the example to your own needs.
We need the module that allows calling Python from Perl (and vice versa):
$ sudo cpan install Inline::Python
This will install "Inline::Python".
The Python subdirectory contains a script xgettext-lines.pl and a Python module PythonXGettext.py.
The Perl script can be used without modification for any extractor that you want to write in Python. It will turn every method it finds in PythonXGettext.py into a Perl method. See Locale::XGettext(3pm) for details about the methods you can implement.
The minimal implementation in PythonXGettext.py looks like this:
class PythonXGettext:
    def __init__(self, xgettext):
        self.xgettext = xgettext
The constructor is called with the instance of the Perl object in the variable xgettext. Note that the Perl object is not yet initialized at this point, so you should not call any of its methods.
Your extractor is already almost functional:
$ ./xgettext-lines.pl --help
Usage: xgettext-lines.pl [OPTION] [INPUTFILE]...
...
It already prints out usage information. If you see an error message "Can't locate Locale/XGettext.pm in @INC ..." instead, you haven't installed the Perl module. Either install it or tell the sample script to use the source instead:
$ perl -I../../lib xgettext-lines.pl --help
Usage: xgettext-lines.pl [OPTION] [INPUTFILE]...
Not much happens, except that a directory _Inline containing cached information from Inline::Python was created. You can safely delete this directory at any time.
Now let's try our extractor with a real input file, for example the one that you are currently reading:
$ ./xgettext-lines.pl ../README.md
Can't locate object method "readFile" via package "Locale::XGettext::Python" at ../../lib/Locale/XGettext.pm line 184.
The method readFile gets called for every input file and has to be implemented. Let's add it to PythonXGettext.py:
    def readFile(self, filename):
        with open(filename) as f:
            lineno = 0
            for line in f:
                lineno = lineno + 1
                # The slice strips the b'...' wrapper, assuming the
                # filename arrives from Perl as a bytes object.
                reference = "%s:%u" % (str(filename)[2:-1], lineno)
                self.xgettext.addEntry({'msgid': line, 'reference': reference})
Now try it again:
$ ./xgettext-lines.pl ../README.md
This time nothing should happen. Don't worry, no news is good news. A file messages.po was created with one PO entry per line.
The other good news is: You're done! At least in many cases, this is already sufficient and you can now focus on writing a real parser for your source files.
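The introduction says that only non-empty lines should become PO entries. The following is a minimal sketch of such a filter, not the code shipped with the samples. `RecordingXGettext` is a hypothetical stand-in for the Perl object so that the snippet runs without Inline::Python, and the filename is assumed to be a plain `str`.

```python
class RecordingXGettext:
    """Hypothetical stand-in for the Perl object; it only records entries."""

    def __init__(self):
        self.entries = []

    def addEntry(self, entry):
        self.entries.append(entry)


class PythonXGettext:
    def __init__(self, xgettext):
        self.xgettext = xgettext

    def readFile(self, filename):
        with open(filename) as f:
            # enumerate() keeps the line counter in step with the loop.
            for lineno, line in enumerate(f, start=1):
                msgid = line.rstrip("\n")
                if not msgid.strip():
                    continue  # skip empty and whitespace-only lines
                self.xgettext.addEntry({
                    'msgid': msgid,
                    'reference': "%s:%u" % (filename, lineno),
                })
```

Dropping the trailing newline from the message id is a design choice of this sketch; the example above passes the line on unchanged.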
Note that the single-file approach described below is only implemented in the Python example. If you want to do the same for another language, please refer to the Python source code!
The Perl wrapper script xgettext-lines.pl reads the Python code from a separate file, the Python module PythonXGettext.py, so that the two languages are separated cleanly. The script xgettext-lines.py shows another approach. It still contains Perl code at the top, but the Python code is appended at the bottom. The overall layout of the script looks like this:
#! /usr/bin/env perl
# Boilerplate Perl code.
# ...
use Inline Python => 'DATA';
# More boilerplate Perl code.
# ...
__DATA__
__Python__
class PythonXGettext:
    def __init__(self, xgettext):
        self.xgettext = xgettext

    def readFile(self, filename):
        with open(filename) as f:
            for line in f:
                self.xgettext.addEntry({'msgid': line})
The line use Inline Python => 'DATA' makes Perl, or rather the Inline module, look for the Python code to compile at the end of the file, after two lines containing the special markers __DATA__ and __Python__ (or __Ruby__, or __Java__ for other programming languages).
Instead of __DATA__ you can also use __END__. It has the same effect in this particular case.
The script works standalone, without a separate Python module:
$ ./xgettext-lines.py
./xgettext-lines.py: no input file given
Try './xgettext-lines.py --help' for more information!
If Locale::XGettext is not yet installed, you have to specify the path to the Perl library:
$ perl -I../../lib xgettext-lines.py
./xgettext-lines.py: no input file given
Try './xgettext-lines.py --help' for more information!
Whether you mix Perl and Python in one file or keep them separate, as described above, is a matter of taste and of your individual requirements.
In the example, the PO entries only contain the message id and the source reference. You can set a lot more properties though, in particular the following:
- msgid_plural: A possible plural form for the entry.
- keyword: The name of the keyword used, such as "gettext" or "ngettext". Users can specify automatic comments for certain keywords with the option "--keyword". You have to tell Locale::XGettext which keyword triggered the entry.
- flags: A comma-separated list of flags to add to the entry, for example "perl-brace-format, no-wrap".
See http://search.cpan.org/~guido/Locale-XGettext/lib/Locale/XGettext.pm#METHODS or try the command perldoc Locale::XGettext for more information.
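To make the properties above concrete, here is a small sketch that builds an entry for a plural message. The helper `plural_entry` and its arguments are hypothetical; only the dictionary keys come from the text above.

```python
def plural_entry(singular, plural, filename, lineno):
    """Build an entry dictionary for a hypothetical ngettext() call."""
    return {
        'msgid': singular,
        'msgid_plural': plural,
        'reference': "%s:%u" % (filename, lineno),
        # Tells Locale::XGettext which keyword triggered the entry.
        'keyword': 'ngettext',
        'flags': 'perl-brace-format, no-wrap',
    }


entry = plural_entry("one file", "{n} files", "example.txt", 12)
# Inside an extractor the dictionary would be handed over with:
#     self.xgettext.addEntry(entry)
```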
Sometimes you want to read strings from a data source that is not a file. One option is to interpret the command-line arguments not as filenames but as identifiers for your data sources, for example URLs, and then change the method readFile() to read from that data source.
Another option is to override the method extractFromNonFiles. This method is invoked after all input files have been read but before the output is created:
    def extractFromNonFiles(self):
        # Read, for example, from a database.
        for string in database_records:
            self.xgettext.addEntry({'msgid': string})
Your Python code can invoke all methods of the Perl object (that is, the property xgettext of the Python object):
self.xgettext.addEntry({'msgid': line})
Adding a new message ID is just one example. See the documentation at http://search.cpan.org/~guido/Locale-XGettext/lib/Locale/XGettext.pm for the complete interface.
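As an illustration of extractFromNonFiles(), the following sketch reads message ids from an in-memory SQLite database. The table `labels`, the helper `open_database`, and the class `RecordingXGettext` (a stand-in for the Perl object) are all assumptions made for the example; a real extractor would connect to its own data source.

```python
import sqlite3


class RecordingXGettext:
    """Hypothetical stand-in for the Perl object; it only records entries."""

    def __init__(self):
        self.entries = []

    def addEntry(self, entry):
        self.entries.append(entry)


def open_database():
    """Hypothetical data source: an in-memory table of UI labels."""
    connection = sqlite3.connect(":memory:")
    connection.execute(
        "CREATE TABLE labels (id INTEGER PRIMARY KEY, label TEXT)")
    connection.executemany(
        "INSERT INTO labels (label) VALUES (?)", [("Save",), ("Cancel",)])
    return connection


class PythonXGettext:
    def __init__(self, xgettext):
        self.xgettext = xgettext

    def extractFromNonFiles(self):
        connection = open_database()
        try:
            rows = connection.execute(
                "SELECT label FROM labels ORDER BY id").fetchall()
        finally:
            connection.close()
        for (label,) in rows:
            self.xgettext.addEntry({'msgid': label})


recorder = RecordingXGettext()
PythonXGettext(recorder).extractFromNonFiles()
```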
When you run your extractor script with the option --help, you see a lot of usage information from Locale::XGettext. The API allows you to modify the command-line interface of your extractor to a certain degree.
If you implement the method fileInformation(), you can describe the type of input files you expect:
    def fileInformation(self):
        return "Input files are plain text files and are converted into one PO entry\nfor every non-empty line."
Look at the usage information:
$ ./xgettext-lines.pl --help
Usage: ./xgettext-lines.pl [OPTION] [INPUTFILE]...
Extract translatable strings from given input files.
Input files are plain text files and are converted into one PO entry
for every non-empty line.
...
Your description is now printed after the generic usage information.
In order to add your own command-line options, you have to override the method languageSpecificOptions. See this example:
    def languageSpecificOptions(self):
        return [
            [
                'test-binding',
                'test_binding',
                ' --test-binding',
                'print additional information for testing the language binding'
            ]
        ]
Print the usage description to see the effect:
$ ./xgettext-lines.pl --help
...
Language specific options:
  -a, --extract-all           extract all strings
  -kWORD, --keyword=WORD      look for WORD as an additional keyword
  -k, --keyword               do not to use default keywords
      --flag=WORD:ARG:FLAG    additional flag for strings inside the argument
                              number ARG of keyword WORD
      --test-binding          print additional information for testing the
                              language binding
...
Your new option --test-binding is printed after the generic options.
Custom options are defined as an array of arrays. Each definition has four elements:
The first element ('test-binding') contains the option specification. By default, options are boolean switches that do not take arguments. For a string argument you would use 'test-binding=s', for an integer argument 'test-binding=i'. For a complete description please see http://search.cpan.org/~jv/Getopt-Long/lib/Getopt/Long.pm.
The next element ('test_binding') is the name of the option. It is the identifier that you have to use in order to access the value of the option. See below for details.
The third element (' --test-binding') contains the left part of the usage description. You can use leading spaces for aligning the string with the rest of the usage description.
The last element contains the description of the option in the usage description.
You access command-line options with the method option():
self.xgettext.option('test_binding')
The argument to the method is the name (the second element) from the option definition.
You can access the values of all other options as well. The option name is always the bare option description with hyphens converted to underscores:
self.xgettext.option('extract_all')
The above retrieves the value of the option '--extract-all'.
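The naming rule for built-in options can be sketched as a small helper. The function `option_name` is hypothetical and not part of Locale::XGettext; it assumes Getopt::Long-style specifications, where a type suffix such as "=s" or an alias after "|" may follow the bare name.

```python
import re


def option_name(spec):
    """Derive the name to pass to option() from an option specification."""
    bare = spec.lstrip('-')
    # Drop a type or modifier suffix such as "=s", "=i", ":s", "!" or "+".
    bare = re.sub(r'[=:!+].*$', '', bare)
    # Keep only the first name if aliases are given ("extract-all|a").
    bare = bare.split('|')[0]
    return bare.replace('-', '_')
```

For custom options the name is whatever you put into the second element of the definition, so this helper only mirrors the convention used for the built-in options.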
Keywords and flags are always a mixture of default settings and those specified on the command line with --keyword or --flag. There are two ways of retrieving the merged definitions:
keywords = self.xgettext.keywords()
This will return a list of objects (hashes or associative arrays, depending on the language), each one having the following keys:
- function: the function name (the keyword)
- singular: the argument number of the singular form
- plural: the argument number of the plural form or 0
- context: the argument number of the message context or 0
- comment: an automatic comment or None, undefined, nil, NULL, ...
See the example source code for details on how to retrieve them.
Alternatively, you can get an array of the corresponding option strings and parse them yourself:
keywords = self.xgettext.keywordOptionStrings()
This would produce something like:
[
"gettext:1",
"ngettext:1,2",
"ncgettext:1c,2,3",
"greet:1,\"Hello, world!\""
]
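If you decide to parse these strings yourself, a parser could look roughly like the sketch below. The function `parse_keyword` is hypothetical and only covers the forms shown above; treating a bare keyword without arguments as "first argument is the singular form" is an assumption borrowed from GNU xgettext.

```python
import re


def parse_keyword(spec):
    """Parse one keyword option string into a dictionary (illustrative)."""
    function, _, args = spec.partition(":")
    result = {"function": function, "singular": 0, "plural": 0,
              "context": 0, "comment": None}
    if not args:
        result["singular"] = 1  # assumption: "id" means the same as "id:1"
        return result
    # A token is either a double-quoted comment or a run of non-commas.
    for token in re.findall(r'"[^"]*"|[^,]+', args):
        token = token.strip()
        if token.startswith('"'):
            result["comment"] = token.strip('"')
        elif token.endswith("c"):
            result["context"] = int(token[:-1])
        elif result["singular"] == 0:
            result["singular"] = int(token)
        else:
            result["plural"] = int(token)
    return result
```

The quoted comment is matched before the comma split, so a comma inside the comment does not break the parse.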
Accessing flags is almost the same. The objects returned by the method "flags()" have the properties "function", "arg" (for the argument number), "flag" for the flag ("c-format", "perl-format", ...), "no" (true for "no-c-format", "no-perl-format"), and "pass" (for "pass-c-format", "pass-perl-format"). Alternatively, you have the method "flagOptionStrings()" if you want to parse the values yourself.
Under normal circumstances, you don't have to access flags. Just pass the "keyword" property with "addEntry()" and Locale::XGettext will process the flags automatically.
By overriding certain methods you can enable or disable more options in your extractor:
    def canExtractAll(self):
        return 1
If canExtractAll() returns a truthy value, the option --extract-all is offered to the user. The default is false.
    def canKeywords(self):
        return 1
If canKeywords() returns a truthy value, options for keyword specification (see below) are added to the interface. The default is true.
    def canFlags(self):
        return 1
If canFlags() returns a truthy value, options for flag specification are added to the interface. The default is true.
Note: Locale::XGettext does not yet support flags.
If your extractor does not honor keyword specifications, you should override the method canKeywords() and return false. If it does, you can define the default keywords for your language like this:
    def defaultKeywords(self):
        return [
            'gettext:1',
            'ngettext:1,2',
            'pgettext:1c,2',
            'npgettext:1c,2,3'
        ]
The return value of defaultKeywords() should be an array of strings suitable as arguments for the command-line option "--keyword". In the above example, the extractor should extract the first argument to the function npgettext() and interpret it as the message context (hence the c after the position). Arguments 2 and 3 should be interpreted as the singular and plural form of the message.
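How a keyword definition maps call arguments to an entry can be sketched as follows. The helper `entry_from_call` is hypothetical, the definition uses the keys listed for keywords() above, and "msgctxt" as the property name for the message context is an assumption; check the Locale::XGettext documentation before relying on it.

```python
def entry_from_call(keyword, args):
    """Map the arguments of one function call to an entry dictionary.

    "keyword" is a dictionary with the keys described for keywords();
    "args" is the list of string arguments found in the call.
    """
    entry = {
        'keyword': keyword['function'],
        # Argument numbers are 1-based, Python lists are 0-based.
        'msgid': args[keyword['singular'] - 1],
    }
    if keyword['plural']:
        entry['msgid_plural'] = args[keyword['plural'] - 1]
    if keyword['context']:
        # Assumed property name for the message context.
        entry['msgctxt'] = args[keyword['context'] - 1]
    return entry


npgettext = {'function': 'npgettext', 'context': 1, 'singular': 2,
             'plural': 3, 'comment': None}
entry = entry_from_call(npgettext, ['menu', 'One file', 'Many files'])
```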