This chapter explains how to configure dCache to allow FTP, a common network protocol that many clients support.
FTP is a long established protocol that allows clients to transfer files, and manage files and directories within dCache. FTP was originally specified without any encryption, with later standards adding support for encrypted communication. FTP differs from many other protocols by using separate TCP connections for issuing commands (the control channel) and transferring file data (the data channel).
Various extensions to FTP exist to support additional functionality. These are typically backwards compatible, allowing the ftp door to work with clients that support the extension in addition to those that don't.
The control channel is the TCP connection established by the client over which the client issues commands and receives replies indicating whether those commands were successful.
In general, dCache supports four flavours of control channel: plain,
tls (also known as FTPS), gsiftp (also known as GridFTP), and
kerberos. Each FTP door supports exactly one of these flavours.
These flavours differ in how the control channel is handled. In plain
FTP, the control channel is unencrypted; in many cases, this is
insecure and requires additional protection. With tls, gsiftp and
Kerberos FTP, the control channel is encrypted, preventing
eavesdropping on, or interference with, requests. Authentication with
tls is based on username and password, with gsiftp it is based on
X.509 credentials, while Kerberos FTP uses Kerberos.
Although tls and gsi FTP doors are both X.509 based, they differ in how the encryption is handled. Support for tls FTP is more common and is often referred to as FTPS, FTP(E)S, FTPS-explicit or FTPES. Support for gsi FTP is limited to grid tools.
The door may be configured to accept network connections only from
specific clients. This is perhaps most useful with plain
(unencrypted) FTP, but may be used with all flavours. The
configuration property ftp.net.allowed-subnets is a space-separated
list of IP addresses or subnets (written in CIDR notation).
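As an illustration, a door's layout entry might restrict access as follows. This is a minimal sketch: the addresses come from the reserved documentation ranges and are placeholders, not recommended values.

```ini
[<domainName>/ftp]
ftp.authn.protocol = plain
# Accept connections only from one host and one /24 subnet (placeholder addresses)
ftp.net.allowed-subnets = 192.0.2.10 198.51.100.0/24
```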
A client may request a directory listing. In the original FTP specification, the format of a directory listing was unspecified, so different FTP servers could respond in different ways. Returning the output of 'ls -l' became a de facto standard, although different implementations of the 'ls' command also differ in their output.
Some clients exist that attempt to parse the directory listing, using various heuristics to guess in which format the server is replying.
dCache supports two formats: a legacy format and a format that more
closely emulates the output from 'ls -l'. The ftp.list-format
configuration property controls which format is returned.
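A sketch of selecting the listing format is shown below. The value names 'ls-l' and 'legacy' are an assumption made for illustration; check the property's description in your dCache installation for the accepted values.

```ini
[<domainName>/ftp]
# Assumed value names: 'ls-l' for the 'ls -l' style output, 'legacy' for the older format
ftp.list-format = ls-l
```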
Subsequent extensions to FTP support directory listing in a precise, prescribed format. This extension removes any ambiguity and allows clients to work with different servers. dCache supports this extension.
It is possible for dCache to run several ftp doors of the same type. These doors could be accessed through a DNS alias or through an haproxy server.
Many load-balancers support sending the client's IP address when
establishing the control channel, using the ha-proxy protocol. If the
door receives such messages then it must be configured to process
them via the ftp.enable.proxy-protocol configuration property. The
door accepts both version 1 and version 2 of the ha-proxy protocol.
The FTP door will limit the number of concurrent connections. This is
controlled by the ftp.limits.clients configuration property.
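For a door that sits behind a load-balancer, these two settings might be combined as in the following sketch. It assumes the enable property takes true/false values, and the connection limit shown is an arbitrary illustrative number rather than a recommendation.

```ini
[<domainName>/ftp]
# Process ha-proxy protocol messages sent by the load-balancer (assumed boolean)
ftp.enable.proxy-protocol = true
# Illustrative upper bound on concurrent client connections
ftp.limits.clients = 500
```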
Anonymous access (also known as anonymous FTP or anonFTP) is a long established practice where publicly available data is made available to anyone who wants it. The client authenticates with a specific username (typically 'anonymous'). Although there is no specific password for these accounts, it is common practice that the client sends the user's email address as the password, as a courtesy.
dCache supports anonymous FTP for the plain and tls FTP doors. This
is disabled by default, but may be enabled using the
ftp.enable.anonymous-ftp configuration property. When enabled, users
may access dCache as user NOBODY; e.g., world-readable files may be
downloaded and world-readable directories may be listed.
dCache may be configured to expose only part of the namespace by
configuring an anonymous-ftp specific root directory. This is
controlled by the ftp.anonymous-ftp.root property. When someone uses
the anonymous FTP service then the root directory they see is
whichever path is configured in this property.
The name of the anonymous account is configured with the
ftp.authn.anonymous.user property. If a regular dCache user has the
same username as this property then that dCache user will no longer be
able to log into their account via FTP username and password
authentication. The default value is "anonymous", which is the widely
accepted account name for anonymous access: many FTP clients will use
this name automatically.
The ftp.authn.anonymous.require-email-password configuration property
controls whether to reject anonymous login attempts where the password
is not a valid email address. Note that the Globus transfer service
currently sends "dummy" as the password, which is not a valid email
address.
If the plain or tls FTP door should be used only for anonymous access
then regular username and password access may be disabled by
configuring the ftp.enable.username-password property. This is
perhaps most useful with plain FTP doors to prevent normal dCache
users from typing in their password unencrypted.
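Putting the anonymous-access properties together, an anonymous-only plain door might look like the sketch below. The root path '/public' is hypothetical, and the true/false values assume these properties are simple booleans.

```ini
[<domainName>/ftp]
ftp.authn.protocol = plain
# Allow anonymous logins (disabled by default)
ftp.enable.anonymous-ftp = true
# Account name that triggers anonymous access; "anonymous" is the default
ftp.authn.anonymous.user = anonymous
# Hypothetical path: anonymous users see this directory as their root
ftp.anonymous-ftp.root = /public
# Accept passwords that are not email addresses (e.g. "dummy")
ftp.authn.anonymous.require-email-password = false
# Refuse regular username and password logins on this unencrypted door
ftp.enable.username-password = false
```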
Transferring a file with FTP involves establishing one (or more) TCP connections over which the data will travel. These are independent of the control channel.
The TCP connections over which a file's data travels (the data channels) are established either by dCache or by the client. Transfers where the data channel is established by dCache are called active transfers; those where the client establishes the data channel are called passive transfers.
The FTP specification is clear that an upload that targets an existing file should overwrite that file (provided the user is authorised to do so). However, historically, dCache has preferred to fail such transfers.
Currently, the ftp.enable.overwrite configuration property controls
whether or not dCache allows clients to overwrite existing files when
uploading data. By default, this is allowed.
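A site that prefers the historical behaviour, where uploads targeting an existing file fail, could set the property as sketched here (assuming it accepts true/false):

```ini
[<domainName>/ftp]
# Reject uploads that would replace an existing file
ftp.enable.overwrite = false
```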
There are extensions to the basic FTP protocol (called GridFTP) that allow the ftp door to redirect the client to the pool. Clients that support this GridFTP extension will advertise their support, allowing dCache to establish a transfer directly between the client and the pool.
If the client does not support GridFTP, or is configured not to redirect, or if the ftp door is configured not to redirect the client then the door will create a proxy for data transfers. The data channel(s) will be established between the door and the client and an additional TCP connection is established between the door and the pool.
There are two configuration options that control whether a proxy is used for transfers: ftp.proxy.on-passive and ftp.proxy.on-active. These control whether a proxy is used for passive and active data channels, respectively. If set to 'true' then those transfers will always use a proxy, irrespective of whether the client supports the GridFTP extension.
A transfer involving a proxy requires more CPU and memory on the node hosting the ftp door. It also increases the network traffic that the door sees, which may become a performance bottleneck. Therefore, it is desirable not to proxy transfers, if possible. If proxying is necessary then these increased requirements may be managed by spreading the load over multiple ftp doors.
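The two proxy properties can be set independently. The following sketch forces a proxy for passive transfers only; it assumes that 'false' leaves the door free to redirect capable clients to the pool.

```ini
[<domainName>/ftp]
ftp.authn.protocol = gsi
# Always proxy passive transfers through the door
ftp.proxy.on-passive = true
# Do not force a proxy for active transfers (assumed meaning of 'false')
ftp.proxy.on-active = false
```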
Data transfers may use one of two modes: MODE S and MODE E.
In MODE S, a single TCP connection is established to transfer a file's data. The TCP connection is closed once all the file's data is sent. This is the standard way of delivering data and most commonly supported.
MODE E is part of the GridFTP extension. It allows multiple TCP connections to be used when transferring a file's data. It also allows those TCP connections to be reused when sending multiple files.
The configuration property ftp.limits.streams-per-client controls the
maximum number of TCP connections that dCache will establish for
active MODE E transfers. For passive MODE E transfers, the client
establishes the data channels, so dCache cannot control the number of
streams.
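For example, the stream limit for active MODE E transfers might be capped as in this sketch; the number is purely illustrative and not a tuned or default value.

```ini
[<domainName>/ftp]
# Illustrative cap on data channels dCache opens per active MODE E transfer
ftp.limits.streams-per-client = 10
```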
For proxied transfers involving MODE E, dCache will try to keep any established data channels open. This is an optimisation, especially for smaller files, where the time needed to establish a data channel is significant compared to the time needed to transfer the file's data.
An aborted transfer is one where the client requested a new file be uploaded or all data from an existing file be downloaded, but only part of the file's contents were transferred. Such aborted transfers normally indicate a problem. As the problem may be transitory, it may not be clear what caused it and therefore it may be hard to recreate the problem.
To aid in fixing aborted transfers, the door can provide considerable
information about a transfer at the point the transfer is aborted.
This is currently disabled by default, but may be controlled with the
ftp.enable.log-aborted-transfers configuration property.
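Turning this logging on while investigating a problem could look like the following sketch, which assumes the property is a true/false switch:

```ini
[<domainName>/ftp]
# Log detailed information whenever a transfer is aborted
ftp.enable.log-aborted-transfers = true
```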
Hosting an FTP server in a domain is as simple as:
[<domainName>/ftp]
The 'ftp.authn.protocol' configuration property controls which flavour of FTP server should be started; for example, to start a gsiftp server:
[<domainName>/ftp]
ftp.authn.protocol = gsi
There are distinct default TCP ports on which the different flavours of FTP server will listen, so a single host may run multiple FTP doors without requiring any port configuration:
[<domainName>/ftp]
ftp.authn.protocol = plain
[<domainName>/ftp]
ftp.authn.protocol = gsi
[<domainName>/ftp]
ftp.authn.protocol = kerberos
[<domainName>/ftp]
ftp.authn.protocol = tls