
Advanced usage

Yoann Valeri edited this page Mar 14, 2023 · 6 revisions

Please note that on this page, we use the term configuration file to mean any way of configuring Phobos. This term encompasses both the actual configuration file and the alternative configuration methods as explained here.

General architecture

To function properly, Phobos uses a daemon called phobosd (internally called the Local Resource Scheduler (LRS)). It centralizes all requests on its node and schedules them in an optimized way. It does not perform any I/O itself, which is left to the client. Instead, it reserves media, handles loading and unloading them on devices, and mounts the underlying filesystems.

image

When putting data in Phobos, a client thus asks the phobosd for media on which to do the I/O. In turn, the phobosd reserves those media, loads and mounts them if necessary, and returns them to the client. The client then performs the I/O itself and, when done, notifies the daemon that the operation is over.
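The exchange described above can be sketched as a toy model. This is only an illustration of the division of work between the client and the daemon: ToyScheduler, toy_put and the medium names are hypothetical and are not part of the Phobos API.

```python
# Toy model of the put workflow; none of these names exist in Phobos.


class ToyScheduler:
    """Stands in for phobosd: reserves and mounts media, never touches data."""

    def __init__(self, media):
        self.free = list(media)   # media nobody is using
        self.reserved = set()     # media currently handed to a client
        self.mounted = set()      # media already loaded and mounted

    def reserve(self, count):
        if count > len(self.free):
            raise RuntimeError("not enough free media")
        granted = [self.free.pop(0) for _ in range(count)]
        for medium in granted:
            self.mounted.add(medium)    # load/mount if necessary
            self.reserved.add(medium)
        return granted

    def release(self, media):
        for medium in media:
            self.reserved.discard(medium)
            self.free.append(medium)


def toy_put(scheduler, storage, oid, data):
    """Client side: ask for a medium, do the I/O itself, then notify."""
    media = scheduler.reserve(1)
    try:
        # The client, not the scheduler, performs the actual write.
        storage.setdefault(media[0], {})[oid] = data
    finally:
        # Tell the scheduler that the operation is over.
        scheduler.release(media)
    return media[0]


scheduler = ToyScheduler(["tape0", "tape1"])
storage = {}
used = toy_put(scheduler, storage, "obj123", b"hello")
print(used, sorted(scheduler.reserved), storage[used])
```

The point of the sketch is that the scheduler only tracks which media are reserved and mounted, while the data path stays entirely in the client.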

To properly manage and keep the system information, Phobos also uses a component called the Distributed State Service (DSS) which is responsible for interaction with a database. This component is used both by the clients when trying to put data (for instance to check whether an object already exists) and by the phobosd (to check the devices already known at startup for example).

With this architecture in mind, we define a Phobos Server as:

  • a set of devices to access a media library, shared or not,
  • a Phobos service (the phobosd as explained above) to schedule I/Os,
  • a tool/script which receives I/O requests and calls the Phobos client API.

image

Using this definition of a Phobos server, we can then parallelize Phobos so that it runs on multiple servers, synchronized by the DSS as a common database. I/O distribution is then done by a front-end which has access to every server and can choose which one to use with the locate feature, as explained in Store commands.

image

Storage layouts

When putting data in Phobos, you can specify which storage layout you want and characteristics about this layout.

Currently though, Phobos can only manage the raid1 storage layout for mirroring, along with its number of replicas. In Phobos, this value counts every copy of the data, so a replica count of 1 means that there is only one copy of the data (the original) and 0 additional copies. Likewise, a replica count of 2 means the original copy of the data plus 1 additional copy.

Specifying the storage layout to use and its parameter can be done in multiple ways.

Layout specification at put

To specify the layout to use with the CLI, you can use the --layout option of the phobos put command, with the value being the name of the layout to use.

Alongside it, you can also provide the --lyt-params option, with a list of comma-separated key=value pairs. These parameters are interpreted by the layout, so if a given key/value pair does not make sense for the layout, it is ignored.

Here is an example of a put call with both parameters given, which aims to put the data on 4 different media:

phobos put --layout raid1 --lyt-params repl_count=4 /etc/hosts blob
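To make the key=value format concrete, here is a minimal sketch of how such a parameter string could be split and filtered. The helpers parse_lyt_params and raid1_params are hypothetical, and the assumption that repl_count is the only key understood by raid1 is made for illustration only. This is not how Phobos itself parses the option.

```python
def parse_lyt_params(raw):
    """Split "k1=v1,k2=v2" into a dict (hypothetical helper, not Phobos code)."""
    params = {}
    for item in raw.split(","):
        item = item.strip()
        if not item:
            continue
        key, sep, value = item.partition("=")
        if not sep:
            raise ValueError(f"expected key=value, got {item!r}")
        params[key.strip()] = value.strip()
    return params


# Assumption: repl_count is the only key this toy raid1 layout understands.
RAID1_KEYS = {"repl_count"}


def raid1_params(raw):
    """Keep the keys raid1 understands and silently ignore the others."""
    parsed = parse_lyt_params(raw)
    return {key: value for key, value in parsed.items() if key in RAID1_KEYS}


params = raid1_params("repl_count=4,unknown_key=1")
repl_count = int(params["repl_count"])
# A replica count of N means the original plus N - 1 additional copies.
print(params, "additional copies:", repl_count - 1)
```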

Similarly, to specify the layout and its parameters in the API, you have to use the layout_name and lyt_params fields of the pho_xfer_put_params structure.

Default layout in the configuration file

In the configuration file (see Configure Phobos for more information), specifically in the store section, you can define the default_layout key. The associated value is used as the default layout for putting data if the client does not provide one through the API or CLI.

If a default layout is given in the configuration file and a layout is also provided at put, the latter is used if valid.

Moreover, you can also define in the configuration file the default parameters to use. Those are however defined per layout, meaning you have to put them in another section, called [layout_<layout>], and define the layout parameters in it as key = value entries.

Here is an example for these sections:

[store]
default_layout = raid1

[layout_raid1]
repl_count = 2 

However, do note that if the layout parameters have been provided by the user during the put call, the ones in the configuration file will not be used, even though they may specify more information than given by the client.
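The precedence rules of this section can be summarized with a short sketch that reads the example configuration above using Python's standard configparser module. The function resolve_layout is a hypothetical name, and the logic only mirrors what is described here (put options win, and put parameters replace the configured ones instead of being merged with them). It is not Phobos's actual implementation.

```python
import configparser

CONFIG_TEXT = """
[store]
default_layout = raid1

[layout_raid1]
repl_count = 2
"""


def resolve_layout(config, put_layout=None, put_params=None):
    """Return (layout, params) following the precedence described above."""
    # A layout given at put overrides the default from the [store] section.
    layout = put_layout or config.get("store", "default_layout", fallback=None)
    if put_params:
        # Parameters given at put replace the configured ones entirely.
        params = dict(put_params)
    else:
        section = f"layout_{layout}"
        params = dict(config[section]) if config.has_section(section) else {}
    return layout, params


config = configparser.ConfigParser()
config.read_string(CONFIG_TEXT)

print(resolve_layout(config))
print(resolve_layout(config, put_params={"repl_count": "4"}))
```

The first call falls back to the configuration file for both the layout and its parameters, while the second keeps the default layout but uses only the parameters supplied with the put.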

Default aliases

Phobos also provides a way to define templates to ease the process of putting data. These templates are called aliases, and each one is defined in the configuration file as its own section, in the form of [alias "<alias name>"].

An alias can define multiple parameters: the family to use, the layout, the layout parameters, and the tags for the put call (which are used to write data on a specific subset of media).

For instance, in the configuration file we use for testing, we define the simple alias, which corresponds to just writing data in Phobos without additional copies. Its definition is the following:

[alias "simple"]
family = tape
layout = raid1
lyt-params = repl_count=1
tags = foo-tag

Moreover, the phobos put command also has an option to specify the alias to use. For instance:

phobos put --alias simple file.in obj123

This will read the configuration file and use the information specified for that alias to complete the rest of the put call.

Finally, just as you can define a default storage layout in the store section, you can also define a default alias, so that a put without any option still works. For instance:

[store]
default_alias = simple

Note that, as with the default storage layout, if an alias is given by the client, the default alias specified in the configuration file is not used.
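As an illustration of how an alias lookup with a default could behave, the following sketch parses the alias and store sections shown in this section with Python's standard configparser module. The function expand_alias is a hypothetical name, and the behaviour on an unknown alias (raising an error) is an assumption. Phobos may handle these cases differently.

```python
import configparser

CONFIG_TEXT = """
[store]
default_alias = simple

[alias "simple"]
family = tape
layout = raid1
lyt-params = repl_count=1
tags = foo-tag
"""


def expand_alias(config, alias=None):
    """Return the settings of the given alias, or of the default alias."""
    # An alias given by the client takes precedence over default_alias.
    name = alias or config.get("store", "default_alias", fallback=None)
    if name is None:
        return {}
    section = f'alias "{name}"'
    if not config.has_section(section):
        raise KeyError(f"unknown alias: {name}")
    return dict(config[section])


config = configparser.ConfigParser()
config.read_string(CONFIG_TEXT)

settings = expand_alias(config)   # no alias given: default_alias is used
print(settings["family"], settings["layout"], settings["lyt-params"])
```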

Extents

As explained above, when putting data in Phobos, you can store multiple copies of it on different media. Moreover, when putting data, Phobos first attempts to find a medium that has enough space for the whole content of the file; if it cannot, it tries to find a combination of media with enough space. The resulting parts of an object, which are not replicas of the data, are called splits. Finally, Phobos can manage replicas of splits: if no medium is large enough to hold all the data of an object but you still want to replicate it, then as long as you have enough media, Phobos divides the object into multiple splits and replicates those splits on different media.

Both of these concepts (replicas and splits) are called extents. Simply put, they represent where data is stored on a medium, and are the lowest level of data representation.
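To show how replicas and splits combine into extents, here is a toy planner. It uses a simple first-fit policy that is an assumption made for illustration: the function plan_extents, the medium names and the sizes are hypothetical, and Phobos's real allocation logic is certainly more elaborate.

```python
def plan_extents(size, free_space, repl_count):
    """Return a list of (replica, split, medium, length) tuples.

    free_space maps a medium name to its free bytes. Toy first-fit policy,
    an assumption for illustration and not Phobos's real allocator.
    """
    remaining = dict(free_space)   # copy so the caller's dict is untouched
    extents = []
    for replica in range(repl_count):
        # First choice: one medium that can hold the whole object.
        whole = next((m for m, free in remaining.items() if free >= size), None)
        if whole is not None:
            extents.append((replica, 0, whole, size))
            del remaining[whole]   # each medium serves at most one replica
            continue
        # Otherwise: spread the object over several media, one split each.
        left, split = size, 0
        for medium in list(remaining):
            if left == 0:
                break
            free = remaining.pop(medium)
            if free == 0:
                continue
            length = min(left, free)
            extents.append((replica, split, medium, length))
            left -= length
            split += 1
        if left:
            raise RuntimeError("not enough free space for all replicas")
    return extents


# A 100-byte object, two replicas, four media with 60 free bytes each:
# no medium fits the object, so each replica is cut into two splits.
media = {"t0": 60, "t1": 60, "t2": 60, "t3": 60}
for extent in plan_extents(100, media, repl_count=2):
    print(extent)
```

In this run the plan contains four extents, two splits for each of the two replicas, each on a different medium.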

To display this information, Phobos provides the phobos extent list command.

This command can be provided with multiple options:

  • --output <column_name> to show only the given attributes of the listed extents (if omitted, only the object IDs are shown; if set to all, every attribute is shown),
  • --format <format> to output information in the specified format, defaults to "human",
  • --pattern "<pattern>" to keep only the extents whose oid matches a given pattern,
  • --name <medium_name> to keep only the extents located on a given medium,
  • --degroup to list by extents rather than by objects.
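The options above can be combined in a single call. The sketch below only assembles a command line from the option names listed in this section and prints it, without running Phobos. The helper extent_list_command and the pattern value are hypothetical.

```python
import shlex


def extent_list_command(output=None, fmt=None, pattern=None, name=None,
                        degroup=False):
    """Build the argument list for a phobos extent list call (sketch only)."""
    argv = ["phobos", "extent", "list"]
    if output:
        argv += ["--output", output]
    if fmt:
        argv += ["--format", fmt]
    if pattern:
        argv += ["--pattern", pattern]
    if name:
        argv += ["--name", name]
    if degroup:
        argv.append("--degroup")
    return argv


# Every attribute, for oids starting with "obj", listed extent by extent.
argv = extent_list_command(output="all", pattern="^obj", degroup=True)
print(shlex.join(argv))
```

Building the call as a list and quoting it with shlex.join keeps a pattern that contains shell metacharacters intact when the command is pasted into a terminal.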

Note: The accepted patterns are Basic Regular Expressions (BRE) and Extended Regular Expressions (ERE). As described in the PostgreSQL manual, PostgreSQL also accepts Advanced Regular Expressions (ARE), but we will not maintain this feature as ARE is not a POSIX standard.