Advanced usage
Please note that on this page, the term configuration file refers to any way of configuring Phobos. It covers both the actual configuration file and the alternative configuration methods as explained here.
To function properly, Phobos uses a daemon called phobosd (internally called the Local Resource Scheduler, or LRS). It centralizes all requests on its node and schedules them in an optimized way. However, it does not perform any I/O itself, which is left to the client. Instead, it reserves media, handles loading and unloading them on devices, and mounts the underlying filesystems.
When trying to put data in Phobos, a client will thus ask the phobosd for media on which to perform the I/O. In turn, the phobosd will reserve those media, load/mount them if necessary, and return them to the client. The client will then actually perform the I/O and, when done, will notify the daemon that the operation is over.
To properly manage and keep the system information, Phobos also uses a component called the Distributed State Service (DSS) which is responsible for interaction with a database. This component is used both by the clients when trying to put data (for instance to check whether an object already exists) and by the phobosd (to check the devices already known at startup for example).
With this architecture in mind, we define a Phobos Server as:
- a set of devices to access a media library, shared or not,
- a Phobos service (the phobosd as explained above) to schedule I/Os,
- a tool/script which receives I/O requests and calls the Phobos client API.
Using this definition of a Phobos server, we can then parallelize Phobos so that it runs on multiple servers, synchronized by the DSS as a common database. I/O distribution is then done by a front-end which has access to every server and can choose which one to use with the locate feature, as explained in Store commands.
When putting data in Phobos, you can specify which storage layout you want and the characteristics of this layout.
Currently, though, Phobos only supports the raid1 storage layout, used for mirroring, with the number of replicas as its parameter.
In Phobos, the number of replicas is the total number of copies of the data, so a replica count of 1 means that there is only one copy of the data (the original) and no additional copies. A replica count of 2 therefore means the original copy of the data plus 1 additional copy.
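The relation between the replica count and the number of additional copies can be written down as a one-line rule. The sketch below is only an illustration of the definition above; the function name is hypothetical, and rejecting values below 1 is an assumption of this sketch, not a documented Phobos behavior.

```python
def additional_copies(repl_count: int) -> int:
    """Return how many copies exist beyond the original.

    Assumption of this sketch: a replica count below 1 is rejected.
    """
    if repl_count < 1:
        raise ValueError("repl_count must be at least 1")
    # The replica count includes the original copy, hence the "- 1".
    return repl_count - 1


for repl_count in (1, 2, 4):
    print(repl_count, "->", additional_copies(repl_count), "additional copies")
```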
Specifying the storage layout to use and its parameters can be done in multiple ways.
To specify the layout to use with the CLI, use the --layout option of the phobos put command, with the value being the name of the layout to use.
Alongside it, you can also provide the --lyt-params option, with a comma-separated list of key=value pairs.
These parameters are interpreted by the layout, so a key/value pair that does not make sense for the layout is ignored.
Here is an example of a put call with both parameters given, which aims to put the data on 4 different media:
phobos put --layout raid1 --lyt-params repl_count=4 /etc/hosts blob
Similarly, to specify the layout and its parameters in the API, use the layout_name and lyt_params fields of the pho_xfer_put_params structure.
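To make the comma-separated key=value format of the layout parameters concrete, here is a minimal Python sketch that parses such a string and then drops the keys a layout does not understand. It is not the Phobos parser: the function names and the RAID1_KNOWN_KEYS set are hypothetical, and the only raid1 key assumed here is repl_count, as used in the example above.

```python
# Hypothetical set of keys the raid1 layout understands (assumption).
RAID1_KNOWN_KEYS = {"repl_count"}


def parse_lyt_params(raw: str) -> dict:
    """Parse 'key=value,key=value' into a dictionary of strings."""
    params = {}
    for item in raw.split(","):
        item = item.strip()
        if not item:
            continue  # tolerate empty items such as a trailing comma
        key, sep, value = item.partition("=")
        if not sep or not key.strip():
            raise ValueError(f"expected key=value, got {item!r}")
        params[key.strip()] = value.strip()
    return params


def keep_known(params: dict, known_keys: set) -> dict:
    """Ignore the pairs that make no sense for the layout."""
    return {k: v for k, v in params.items() if k in known_keys}


parsed = parse_lyt_params("repl_count=4,unknown_key=1")
print(keep_known(parsed, RAID1_KNOWN_KEYS))
```

In this sketch, unknown keys are silently dropped, which mirrors the statement that a pair which does not make sense for the layout is ignored.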
In the configuration file (see Configure Phobos for more information), in the store section, you can define the default_layout key. The associated value is used as the default layout for putting data if none is provided by the client in the API or CLI.
If a default layout is given in the configuration file and a layout is also provided at put, the latter is used if valid.
Moreover, you can also define default layout parameters in the configuration file. These are defined per layout, meaning you have to put them in another section, called [layout_<layout>], and define in it the key lyt-params whose value is a comma-separated list of key=value pairs.
Here is an example for these sections:
[store]
default_layout = raid1
[layout_raid1]
repl_count = 2
However, note that if layout parameters are provided by the user during the put call, the ones in the configuration file are not used at all, even though they may specify more information than the client gave.
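The precedence rules above can be summarized in a short Python sketch: a layout given at put wins over default_layout, and parameters given at put replace the configured ones entirely instead of being merged with them. The function name and the dictionary-based view of the configuration are assumptions made for illustration only; validity checking of the layout name is left out.

```python
def resolve_layout(put_layout, put_params, config):
    """Pick the layout and parameters for a put (illustrative only).

    put_layout / put_params are None when the client did not give them.
    config maps section names to {key: value} dictionaries.
    """
    layout = put_layout or config.get("store", {}).get("default_layout")
    if layout is None:
        raise ValueError("no layout given and no default_layout configured")

    if put_params is not None:
        # Parameters given at put: the configured ones are not used at all.
        params = dict(put_params)
    else:
        params = dict(config.get(f"layout_{layout}", {}))
    return layout, params


config = {
    "store": {"default_layout": "raid1"},
    "layout_raid1": {"repl_count": "2"},
}
print(resolve_layout(None, None, config))
print(resolve_layout("raid1", {"repl_count": "4"}, config))
```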
Phobos also provides a way to generate templates to ease the process of putting data.
These templates are called aliases and are defined in the configuration file, each in its own section of the form [alias "<alias name>"].
An alias can define multiple parameters: the family to use, the layout, the layout parameters, and the tags for the put call (which are used to write data on a specific subset of media).
For instance, in the configuration file we use for testing, we define the simple alias, which corresponds to just writing data in Phobos without additional copies. Its definition is the following:
[alias "simple"]
family = tape
layout = raid1
lyt-params = repl_count=1
tags = foo-tag
Moreover, the phobos put command also has an option to specify the alias to use.
For instance:
phobos put --alias simple file.in obj123
This will read the configuration file and use the information specified for that alias to complete the rest of the put call.
Moreover, just like you can define a default storage layout in the store section, you can also define a default alias, so that a put without any option still works.
For instance:
[store]
default_alias = simple
Note that, as with the default storage layout, if an alias is given by the client, the default alias specified in the configuration file is not used.
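As an illustration of how an alias section and the default_alias key fit together, the following sketch reads the two configuration samples above with Python's standard configparser module. This is a stand-in for the real Phobos configuration parser, whose behavior may differ; the load_alias function is hypothetical.

```python
import configparser

CONFIG_TEXT = """
[store]
default_alias = simple

[alias "simple"]
family = tape
layout = raid1
lyt-params = repl_count=1
tags = foo-tag
"""


def load_alias(config, alias_name=None):
    """Return the settings of the given alias, or of the default alias."""
    if alias_name is None:
        # No alias given by the client: fall back to the configured default.
        alias_name = config.get("store", "default_alias", fallback=None)
    if alias_name is None:
        return {}
    section = f'alias "{alias_name}"'
    if not config.has_section(section):
        raise KeyError(f"unknown alias: {alias_name}")
    return dict(config[section])


config = configparser.ConfigParser()
config.read_string(CONFIG_TEXT)
print(load_alias(config))
```

Here the section name is kept verbatim, quotes included, so the alias named simple is looked up as the section alias "simple".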
As explained above, when putting data in Phobos, you can do it so that there are multiple copies of it on different media.
Moreover, when putting data, Phobos will first attempt to find a medium that has enough space for the whole content of the file. If it cannot, it will try to find a combination of media with enough space to do so. These parts of an object, which are distinct pieces of the data rather than copies of it, are called splits.
Finally, Phobos can manage replicas of splits: if you do not have a medium large enough to hold all the data of one object but still want to replicate it, then as long as you have enough media, Phobos will divide the object into multiple splits and replicate those splits on different media.
Both replicas and splits are called extents. Simply put, they represent where data is stored on a medium, and are the lowest level of data representation.
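To see how splits and replicas combine, the sketch below enumerates the extents of one object under a simplifying assumption: every split is replicated the same number of times, so the number of extents is the number of splits multiplied by the replica count. The function name and the (split index, replica index) representation are hypothetical and do not reflect how Phobos stores extents internally.

```python
def list_extents(n_splits: int, repl_count: int) -> list:
    """Return one (split_index, replica_index) pair per extent."""
    if n_splits < 1 or repl_count < 1:
        raise ValueError("n_splits and repl_count must be at least 1")
    return [
        (split, replica)
        for split in range(n_splits)
        for replica in range(repl_count)
    ]


# An object divided into 3 splits, each written twice.
extents = list_extents(n_splits=3, repl_count=2)
print(len(extents), "extents:", extents)
```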
To list extents, Phobos provides the phobos extent list command.
This command accepts several options:
- --output <column_name> to only show the given attributes of the listed extents (if not given, only the object IDs are output; if given all, every attribute is output),
- --format <format> to output information in the specified format (defaults to "human"),
- --pattern "<pattern>" to filter extents based on whether the oid matches a given pattern,
- --name <medium_name> to filter extents based on whether they are on a given medium,
- --degroup to list by extents rather than by objects.
Note: The accepted patterns are Basic Regular Expressions (BRE) and Extended Regular Expressions (ERE). As described in the PostgreSQL manual, PostgreSQL also accepts Advanced Regular Expressions (ARE), but we will not maintain this feature as ARE is not a POSIX standard.