Setting_Cassandra_Data_File_Locations
Joe Winter edited this page Sep 4, 2014
Setting Cassandra Data File Locations
Cassandra creates three kinds of data files: commit logs, SSTables, and saved caches. (Somewhat confusingly, the SSTable files are sometimes called “data files” even though all three kinds of files hold data.) The folder path of each file kind is defined in cassandra.yaml. You should change all of the following folder options:
commitlog_directory: /var/lib/cassandra/commitlog
Set this option to the folder where commit logs should be stored. It should be on a different disk than any of the SSTable disks (see below). Multiple commit logs are created in this folder, but they are deleted when they become obsolete, so commit logs typically do not require a lot of space.
data_file_directories:
    - /var/lib/cassandra/data
Set this option to at least one root folder where SSTable files are to be stored. SSTables are the primary files containing application data. Each folder is listed on a secondary line, indented and beginning with a dash. Multiple data folders are recommended for better performance (see below).
saved_caches_directory: /var/lib/cassandra/saved_caches
Set this option to a valid folder name where Cassandra will save the key and row caches that it builds. It can be on the same disk as the commit log or the installed software, but it shouldn’t be on one of the SSTable disks. The amount of disk space needed for caches depends on the cache option settings.
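The placement guidance for the three options can be checked mechanically. The following is a minimal sketch, not part of Cassandra: the function names `device_id` and `check_layout` are hypothetical, and it assumes that the filesystem device ID reported by `os.stat` is a good enough stand-in for "which disk" (this may not hold with LVM, RAID, or bind mounts).

```python
import os
from typing import Callable, Iterable, List


def device_id(path: str) -> int:
    """Return the device ID of the filesystem that holds `path`.

    Assumption: one filesystem device corresponds to one physical disk.
    """
    return os.stat(path).st_dev


def check_layout(
    commitlog_dir: str,
    data_dirs: Iterable[str],
    saved_caches_dir: str,
    device_of: Callable[[str], int] = device_id,
) -> List[str]:
    """Return a warning for each rule from the option descriptions that is broken."""
    warnings = []
    data_dirs = list(data_dirs)
    if not data_dirs:
        warnings.append("data_file_directories must list at least one folder")

    # Devices that hold SSTable folders; the other two folders should avoid them.
    data_devices = {device_of(d) for d in data_dirs}

    if device_of(commitlog_dir) in data_devices:
        warnings.append("commitlog_directory shares a device with an SSTable folder")
    if device_of(saved_caches_dir) in data_devices:
        warnings.append("saved_caches_directory shares a device with an SSTable folder")
    return warnings
```

Calling `check_layout("/var/lib/cassandra/commitlog", ["/var/lib/cassandra/data"], "/var/lib/cassandra/saved_caches")` on a node where those folders exist would return an empty list when the layout follows the guidance. The `device_of` parameter exists only so the check can be exercised without real disks.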
When updates are sent to Cassandra, they are first written to a commit log file. The commit log files are “replayed” when a restart occurs, thereby providing recovery for updates that may not have been written to an SSTable file. Because commit logs are removed when they are no longer needed, they typically do not use much disk space.
After updates are written to the commit log, they are stored in memory and eventually sorted and flushed to disk as SSTables. Each SSTable is represented by multiple files including data, hash, and index files. When Cassandra is configured with multiple data file directories, it flushes each SSTable to the directory that has the most available space. Therefore, best practices for the commit log and SSTable files are:
1. Each SSTable folder should reside on a separate disk. This allows concurrent I/Os: a separate I/O can be initiated for each disk.
2. Each SSTable disk should be of the same size and used solely for SSTables. This prevents disk contention with other files, and it allows all disks to grow at the same rate.
3. The commit log folder should reside on its own disk. Because data is flushed quickly as it is received, the commit log folder can receive a high volume of I/O, hence it should use its own disk to prevent contention with SSTable files. The disk does not have to be large since commit logs are discarded fairly quickly.
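The flush placement rule described earlier (each SSTable goes to the data directory with the most available space) can be illustrated with a short sketch. This is not Cassandra's actual code: `free_bytes` and `pick_flush_directory` are hypothetical names, and the sketch only models the selection rule as stated in this page.

```python
import shutil
from typing import Callable, Sequence


def free_bytes(path: str) -> int:
    """Return the free space, in bytes, on the filesystem that holds `path`."""
    return shutil.disk_usage(path).free


def pick_flush_directory(
    data_dirs: Sequence[str],
    free_space_of: Callable[[str], int] = free_bytes,
) -> str:
    """Pick the data directory with the most available space.

    On a tie, the directory listed first wins (the behaviour of `max`).
    """
    if not data_dirs:
        raise ValueError("at least one data directory is required")
    return max(data_dirs, key=free_space_of)
```

Under this rule, disks of equal size that hold only SSTables tend to fill evenly, because each new SSTable lands on whichever disk is currently emptiest. That is the reasoning behind the second practice above.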