---
layout: post
title: PYTHON SDK
permalink: /docs/python-sdk
redirect_from:
---
AIStore Python SDK is a growing set of client-side objects and methods to access and utilize AIS clusters. This document contains API documentation for the AIStore Python SDK.
For our PyTorch integration, refer to the PyTorch docs. For more information, see the AIS Python SDK on the Python Package Index (PyPI), or visit https://github.com/NVIDIA/aistore/tree/main/python/aistore.
- authn.authn_client
- authn.cluster_manager
- authn.role_manager
- authn.token_manager
- authn.user_manager
- authn.access_attr
- bucket
- client
- cluster
- etl
- job
- multiobj.object_group
- multiobj.object_names
- multiobj.object_range
- multiobj.object_template
- obj.object
- obj.object_reader
- obj.object_file
- obj.object_props
- obj.object_attributes
class AuthNClient()
AuthN client for managing authentication.
This client provides methods to interact with AuthN Server. For more info on AuthN Server, see https://github.com/NVIDIA/aistore/blob/main/docs/authn.md
Arguments:
- `endpoint` (str): AuthN service endpoint URL.
- `skip_verify` (bool, optional): If True, skip SSL certificate verification. Defaults to False.
- `ca_cert` (str, optional): Path to a CA certificate file for SSL verification.
- `timeout` (Union[float, Tuple[float, float], None], optional): Request timeout in seconds; a single float for both connect/read timeouts (e.g., 5.0), a tuple for separate connect/read timeouts (e.g., (3.0, 10.0)), or None to disable timeout.
- `retry` (urllib3.Retry, optional): Retry configuration object from the urllib3 library.
- `token` (str, optional): Authorization token.
@property
def client() -> RequestClient
Get the request client.
Returns:
- `RequestClient`: The client this AuthN client uses to make requests.
def login(username: str,
password: str,
expires_in: Optional[Union[int, float]] = None) -> str
Logs in to the AuthN Server and returns an authorization token.
Arguments:
- `username` (str): The username to log in with.
- `password` (str): The password to log in with.
- `expires_in` (Optional[Union[int, float]]): The expiration duration of the token in seconds.

Returns:
- `str`: An authorization token to use for future requests.

Raises:
- `ValueError`: If the password is empty or consists only of spaces.
- `AISError`: If the login request fails.
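As a minimal sketch, a login flow might look like the following. The endpoint and credentials are placeholders, and the import path is assumed from the module list above (authn.authn_client):

```python
from aistore.sdk.authn import AuthNClient

# Hypothetical AuthN endpoint and credentials -- substitute your own
authn_client = AuthNClient("http://localhost:52001")
token = authn_client.login("admin", "admin", expires_in=3600)

# The returned token can then be supplied to an AIStore client (see Client below)
```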
def logout() -> None
Logs out and revokes current token from the AuthN Server.
Raises:
- `AISError`: If the logout request fails.
def cluster_manager() -> ClusterManager
Factory method to create a ClusterManager instance.
Returns:
- `ClusterManager`: An instance to manage cluster operations.
def role_manager() -> RoleManager
Factory method to create a RoleManager instance.
Returns:
- `RoleManager`: An instance to manage role operations.
def user_manager() -> UserManager
Factory method to create a UserManager instance.
Returns:
- `UserManager`: An instance to manage user operations.
def token_manager() -> TokenManager
Factory method to create a TokenManager instance.
Returns:
- `TokenManager`: An instance to manage token operations.
class ClusterManager()
ClusterManager class for handling operations on clusters within the context of authentication.
This class provides methods to list, get, register, update, and delete clusters on AuthN server.
Arguments:
- `client` (RequestClient): The request client to make HTTP requests.
@property
def client() -> RequestClient
RequestClient: The client this cluster manager uses to make requests.
def list() -> ClusterList
Retrieve all clusters.
Returns:
- `ClusterList`: A list of all clusters.

Raises:
- `AISError`: If an error occurs while listing clusters.
def get(cluster_id: Optional[str] = None,
cluster_alias: Optional[str] = None) -> ClusterInfo
Retrieve a specific cluster by ID or alias.
Arguments:
- `cluster_id` (Optional[str]): The ID of the cluster. Defaults to None.
- `cluster_alias` (Optional[str]): The alias of the cluster. Defaults to None.

Returns:
- `ClusterInfo`: Information about the specified cluster.

Raises:
- `ValueError`: If neither cluster_id nor cluster_alias is provided.
- `RuntimeError`: If no cluster matches the provided ID or alias.
- `AISError`: If an error occurs while getting the cluster.
def register(cluster_alias: str, urls: List[str]) -> ClusterInfo
Register a new cluster.
Arguments:
- `cluster_alias` (str): The alias for the new cluster.
- `urls` (List[str]): A list of URLs for the new cluster.

Returns:
- `ClusterInfo`: Information about the registered cluster.

Raises:
- `ValueError`: If no URLs are provided or an invalid URL is provided.
- `AISError`: If an error occurs while registering the cluster.
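A sketch of registering a cluster with AuthN; the alias and URL here are hypothetical, and `authn_client` is assumed to be logged in as in the earlier example:

```python
# Register an AIS cluster with the AuthN server under a chosen alias
cluster_manager = authn_client.cluster_manager()
info = cluster_manager.register("my-cluster", ["http://localhost:8080"])
print(info)
```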
def update(cluster_id: str,
cluster_alias: Optional[str] = None,
urls: Optional[List[str]] = None) -> ClusterInfo
Update an existing cluster.
Arguments:
- `cluster_id` (str): The ID of the cluster to update.
- `cluster_alias` (Optional[str]): The new alias for the cluster. Defaults to None.
- `urls` (Optional[List[str]]): The new list of URLs for the cluster. Defaults to None.

Returns:
- `ClusterInfo`: Information about the updated cluster.

Raises:
- `ValueError`: If neither cluster_alias nor urls is provided.
- `AISError`: If an error occurs while updating the cluster.
def delete(cluster_id: Optional[str] = None,
cluster_alias: Optional[str] = None)
Delete a specific cluster by ID or alias.
Arguments:
- `cluster_id` (Optional[str]): The ID of the cluster to delete. Defaults to None.
- `cluster_alias` (Optional[str]): The alias of the cluster to delete. Defaults to None.

Raises:
- `ValueError`: If neither cluster_id nor cluster_alias is provided.
- `AISError`: If an error occurs while deleting the cluster.
class RoleManager()
Manages role-related operations.
This class provides methods to interact with roles, including retrieving, creating, updating, and deleting role information.
Arguments:
- `client` (RequestClient): The RequestClient used to make HTTP requests.
@property
def client() -> RequestClient
Returns the RequestClient instance used by this RoleManager.
def list() -> RolesList
Retrieves information about all roles.
Returns:
- `RolesList`: A list containing information about all roles.

Raises:
- `aistore.sdk.errors.AISError`: All other types of errors with AIStore.
- `requests.RequestException`: If the HTTP request fails.
def get(role_name: str) -> RoleInfo
Retrieves information about a specific role.
Arguments:
- `role_name` (str): The name of the role to retrieve.

Returns:
- `RoleInfo`: Information about the specified role.

Raises:
- `aistore.sdk.errors.AISError`: All other types of errors with AIStore.
- `requests.RequestException`: If the HTTP request fails.
def create(name: str,
desc: str,
cluster_alias: str,
perms: List[AccessAttr],
bucket_name: str = None) -> RoleInfo
Creates a new role.
Arguments:
- `name` (str): The name of the role.
- `desc` (str): A description of the role.
- `cluster_alias` (str): The alias of the cluster this role will have access to.
- `perms` (List[AccessAttr]): A list of permissions to be granted for this role.
- `bucket_name` (str, optional): The name of the bucket this role will have access to.

Returns:
- `RoleInfo`: Information about the newly created role.

Raises:
- `aistore.sdk.errors.AISError`: All other types of errors with AIStore.
- `requests.RequestException`: If the HTTP request fails.
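As a sketch, creating a read-only role scoped to a single bucket might look like this. The AccessAttr member names used below are illustrative assumptions; consult the AccessAttr enum for the exact flag names:

```python
from aistore.sdk.authn import AccessAttr  # listed above as authn.access_attr

# Hypothetical role granting read-only access to one bucket
role_manager = authn_client.role_manager()
role = role_manager.create(
    name="bucket-reader",
    desc="Read-only access to a single bucket",
    cluster_alias="my-cluster",
    perms=[AccessAttr.GET, AccessAttr.HEAD_OBJECT],  # assumed member names
    bucket_name="my-bucket",
)
```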
def update(name: str,
desc: str = None,
cluster_alias: str = None,
perms: List[AccessAttr] = None,
bucket_name: str = None) -> RoleInfo
Updates an existing role.
Arguments:
- `name` (str): The name of the role.
- `desc` (str, optional): An updated description of the role.
- `cluster_alias` (str, optional): The alias of the cluster this role will have access to.
- `perms` (List[AccessAttr], optional): A list of updated permissions to be granted for this role.
- `bucket_name` (str, optional): The name of the bucket this role will have access to.

Raises:
- `aistore.sdk.errors.AISError`: All other types of errors with AIStore.
- `requests.RequestException`: If the HTTP request fails.
- `ValueError`: If the role does not exist or if invalid parameters are provided.
def delete(name: str, missing_ok: bool = False) -> None
Deletes a role.
Arguments:
- `name` (str): The name of the role to delete.
- `missing_ok` (bool): Ignore error if role does not exist. Defaults to False.

Raises:
- `aistore.sdk.errors.AISError`: All other types of errors with AIStore.
- `requests.RequestException`: If the HTTP request fails.
- `ValueError`: If the role does not exist.
class TokenManager()
Manages token-related operations.
This class provides methods to interact with tokens in the AuthN server.
Arguments:
- `client` (RequestClient): The RequestClient used to make HTTP requests.
@property
def client() -> RequestClient
Returns the RequestClient instance used by this TokenManager.
def revoke(token: str) -> None
Revokes the specified authentication token.
Arguments:
- `token` (str): The token to be revoked.

Raises:
- `ValueError`: If the token is not provided.
- `AISError`: If the revoke token request fails.
class UserManager()
UserManager provides methods to manage users in the AuthN service.
Arguments:
- `client` (RequestClient): The RequestClient used to make HTTP requests.
@property
def client() -> RequestClient
Returns the RequestClient instance used by this UserManager.
def get(username: str) -> UserInfo
Retrieve user information from the AuthN Server.
Arguments:
- `username` (str): The username to retrieve.

Returns:
- `UserInfo`: The user's information.

Raises:
- `AISError`: If the user retrieval request fails.
def delete(username: str, missing_ok: bool = False) -> None
Delete an existing user from the AuthN Server.
Arguments:
- `username` (str): The username of the user to delete.
- `missing_ok` (bool): Ignore error if user does not exist. Defaults to False.

Raises:
- `AISError`: If the user deletion request fails.
def create(username: str, roles: List[str], password: str) -> UserInfo
Create a new user in the AuthN Server.
Arguments:
- `username` (str): The name or ID of the user to create.
- `password` (str): The password for the user.
- `roles` (List[str]): The list of names of roles to assign to the user.

Returns:
- `UserInfo`: The created user's information.

Raises:
- `AISError`: If the user creation request fails.
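For instance, a sketch of creating a user and assigning it the (hypothetical) role from the RoleManager example above:

```python
# Username, role name, and password are placeholders
user_manager = authn_client.user_manager()
user = user_manager.create("alice", roles=["bucket-reader"], password="s3cr3t")
```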
def list()
List all users in the AuthN Server.
Returns:
- `str`: The list of users in the AuthN Server.

Raises:
- `AISError`: If the user list request fails.
def update(username: str,
password: Optional[str] = None,
roles: Optional[List[str]] = None) -> UserInfo
Update an existing user's information in the AuthN Server.
Arguments:
- `username` (str): The ID of the user to update.
- `password` (str, optional): The new password for the user.
- `roles` (List[str], optional): The list of names of roles to assign to the user.

Returns:
- `UserInfo`: The updated user's information.

Raises:
- `AISError`: If the user update request fails.
class AccessAttr(IntFlag)
AccessAttr defines permissions as bitwise flags for access control (for more details, refer to the Go API).
@staticmethod
def describe(perms: int) -> str
Returns a comma-separated string describing the permissions based on the provided bitwise flags.
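Because AccessAttr is an IntFlag, individual permissions combine with bitwise OR, and describe() renders the combined value. A sketch, assuming GET and PUT are defined members of the enum:

```python
from aistore.sdk.authn import AccessAttr  # listed above as authn.access_attr

# Combine flags bitwise, then render them as a string
perms = AccessAttr.GET | AccessAttr.PUT  # assumed member names
print(AccessAttr.describe(perms))        # e.g. "GET,PUT"
```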
class Bucket(AISSource)
A class representing a bucket that contains user data.
Arguments:
- `client` (RequestClient): Client for interfacing with AIS cluster.
- `name` (str): Name of bucket.
- `provider` (str, optional): Provider of bucket (one of "ais", "aws", "gcp", ...). Defaults to "ais".
- `namespace` (Namespace, optional): Namespace of bucket. Defaults to None.
@property
def client() -> RequestClient
The client bound to this bucket.
@client.setter
def client(client) -> RequestClient
Update the client bound to this bucket.
@property
def qparam() -> Dict
Default query parameters to use with API calls from this bucket.
@property
def provider() -> str
The provider for this bucket.
@property
def name() -> str
The name of this bucket.
@property
def namespace() -> Namespace
The namespace for this bucket.
def list_urls(prefix: str = "", etl_name: str = None) -> Iterable[str]
Implementation of the abstract method from AISSource that provides an iterator of full URLs to every object in this bucket matching the specified prefix
Arguments:
- `prefix` (str, optional): Limit objects selected by a given string prefix.
- `etl_name` (str, optional): ETL to include in URLs.

Returns:
Iterator of full URLs of all objects matching the prefix.
def list_all_objects_iter(prefix: str = "",
props: str = "name,size") -> Iterable[Object]
Implementation of the abstract method from AISSource that provides an iterator of all the objects in this bucket matching the specified prefix.
Arguments:
- `prefix` (str, optional): Limit objects selected by a given string prefix.
- `props` (str, optional): Comma-separated list of object properties to return. Default value is "name,size". Properties: "name", "size", "atime", "version", "checksum", "target_url", "copies".

Returns:
Iterator of all objects matching the prefix.
def create(exist_ok=False)
Creates a bucket in the AIStore cluster. Can only create a bucket for the AIS provider on the local cluster; remote cloud buckets do not support creation.
Arguments:
- `exist_ok` (bool, optional): Ignore error if the cluster already contains this bucket.

Raises:
- `aistore.sdk.errors.AISError`: All other types of errors with AIStore.
- `aistore.sdk.errors.InvalidBckProvider`: Invalid bucket provider for requested operation.
- `requests.ConnectionError`: Connection error.
- `requests.ConnectionTimeout`: Timed out connecting to AIStore.
- `requests.exceptions.HTTPError`: Service unavailable.
- `requests.RequestException`: "There was an ambiguous exception that occurred while handling..."
- `requests.ReadTimeout`: Timed out receiving response from AIStore.
def delete(missing_ok=False)
Destroys bucket in AIStore cluster. In all cases removes both the bucket's content and the bucket's metadata from the cluster. Note: AIS will not call the remote backend provider to delete the corresponding Cloud bucket (iff the bucket in question is, in fact, a Cloud bucket).
Arguments:
- `missing_ok` (bool, optional): Ignore error if bucket does not exist.

Raises:
- `aistore.sdk.errors.AISError`: All other types of errors with AIStore.
- `aistore.sdk.errors.InvalidBckProvider`: Invalid bucket provider for requested operation.
- `requests.ConnectionError`: Connection error.
- `requests.ConnectionTimeout`: Timed out connecting to AIStore.
- `requests.exceptions.HTTPError`: Service unavailable.
- `requests.RequestException`: "There was an ambiguous exception that occurred while handling..."
- `requests.ReadTimeout`: Timed out receiving response from AIStore.
def rename(to_bck_name: str) -> str
Renames bucket in AIStore cluster. Only works on AIS buckets. Returns job ID that can be used later to check the status of the asynchronous operation.
Arguments:
- `to_bck_name` (str): New name for the bucket.

Returns:
Job ID (as str) that can be used to check the status of the operation.

Raises:
- `aistore.sdk.errors.AISError`: All other types of errors with AIStore.
- `aistore.sdk.errors.InvalidBckProvider`: Invalid bucket provider for requested operation.
- `requests.ConnectionError`: Connection error.
- `requests.ConnectionTimeout`: Timed out connecting to AIStore.
- `requests.exceptions.HTTPError`: Service unavailable.
- `requests.RequestException`: "There was an ambiguous exception that occurred while handling..."
- `requests.ReadTimeout`: Timed out receiving response from AIStore.
def evict(keep_md: bool = False)
Evicts bucket in AIStore cluster. NOTE: only Cloud buckets can be evicted.
Arguments:
- `keep_md` (bool, optional): If True, evicts objects but keeps the bucket's metadata (i.e., the bucket's name and its properties).

Raises:
- `aistore.sdk.errors.AISError`: All other types of errors with AIStore.
- `aistore.sdk.errors.InvalidBckProvider`: Invalid bucket provider for requested operation.
- `requests.ConnectionError`: Connection error.
- `requests.ConnectionTimeout`: Timed out connecting to AIStore.
- `requests.exceptions.HTTPError`: Service unavailable.
- `requests.RequestException`: "There was an ambiguous exception that occurred while handling..."
- `requests.ReadTimeout`: Timed out receiving response from AIStore.
def head() -> Header
Requests bucket properties.
Returns:
Response header with the bucket properties.

Raises:
- `aistore.sdk.errors.AISError`: All other types of errors with AIStore.
- `requests.ConnectionError`: Connection error.
- `requests.ConnectionTimeout`: Timed out connecting to AIStore.
- `requests.exceptions.HTTPError`: Service unavailable.
- `requests.RequestException`: "There was an ambiguous exception that occurred while handling..."
- `requests.ReadTimeout`: Timed out receiving response from AIStore.
def summary(uuid: str = "",
prefix: str = "",
cached: bool = True,
present: bool = True)
Returns bucket summary (starts xaction job and polls for results).
Arguments:
- `uuid` (str): Identifier for the bucket summary. Defaults to an empty string.
- `prefix` (str): Prefix for objects to be included in the bucket summary. Defaults to an empty string (all objects).
- `cached` (bool): If True, summary entails cached entities. Defaults to True.
- `present` (bool): If True, summary entails present entities. Defaults to True.

Raises:
- `requests.ConnectionError`: Connection error.
- `requests.ConnectionTimeout`: Timed out connecting to AIStore.
- `requests.exceptions.HTTPError`: Service unavailable.
- `requests.RequestException`: "There was an ambiguous exception that occurred while handling..."
- `requests.ReadTimeout`: Timed out receiving response from AIStore.
- `aistore.sdk.errors.AISError`: All other types of errors with AIStore.
def info(flt_presence: int = FLTPresence.FLT_EXISTS,
bsumm_remote: bool = True,
prefix: str = "")
Returns bucket summary and information/properties.
Arguments:
- `flt_presence` (FLTPresence): Describes the presence of buckets and objects with respect to their existence or non-existence in the AIS cluster. Defaults to FLT_EXISTS. Values:
  - FLT_EXISTS: object or bucket exists inside and/or outside cluster
  - FLT_EXISTS_NO_PROPS: same as FLT_EXISTS but no need to return summary
  - FLT_PRESENT: bucket is present or object is present and properly located
  - FLT_PRESENT_NO_PROPS: same as FLT_PRESENT but no need to return summary
  - FLT_PRESENT_CLUSTER: objects present anywhere/how in the cluster as replica, ec-slices, misplaced
  - FLT_EXISTS_OUTSIDE: not present; exists outside cluster
- `bsumm_remote` (bool): If True, returned bucket info will include remote objects as well.
- `prefix` (str): Only include objects with the given prefix in the bucket.

Raises:
- `requests.ConnectionError`: Connection error.
- `requests.ConnectionTimeout`: Timed out connecting to AIStore.
- `requests.exceptions.HTTPError`: Service unavailable.
- `requests.RequestException`: "There was an ambiguous exception that occurred while handling..."
- `requests.ReadTimeout`: Timed out receiving response from AIStore.
- `ValueError`: `flt_presence` is not one of the expected values.
- `aistore.sdk.errors.AISError`: All other types of errors with AIStore.
def copy(to_bck: Bucket,
prefix_filter: str = "",
prepend: str = "",
dry_run: bool = False,
force: bool = False,
latest: bool = False,
sync: bool = False) -> str
Copies objects from this bucket to the destination bucket, returning the job ID that can be used later to check the status of the asynchronous operation.
Arguments:
- `to_bck` (Bucket): Destination bucket.
- `prefix_filter` (str, optional): Only copy objects with names starting with this prefix.
- `prepend` (str, optional): Value to prepend to the name of copied objects.
- `dry_run` (bool, optional): Determines if the copy should actually happen or not.
- `force` (bool, optional): Override existing destination bucket.
- `latest` (bool, optional): GET the latest object version from the associated remote bucket.
- `sync` (bool, optional): Synchronize destination bucket with its remote (e.g., Cloud or remote AIS) source.

Returns:
Job ID (as str) that can be used to check the status of the operation.

Raises:
- `aistore.sdk.errors.AISError`: All other types of errors with AIStore.
- `requests.ConnectionError`: Connection error.
- `requests.ConnectionTimeout`: Timed out connecting to AIStore.
- `requests.exceptions.HTTPError`: Service unavailable.
- `requests.RequestException`: "There was an ambiguous exception that occurred while handling..."
- `requests.ReadTimeout`: Timed out receiving response from AIStore.
def list_objects(prefix: str = "",
props: str = "",
page_size: int = 0,
uuid: str = "",
continuation_token: str = "",
flags: List[ListObjectFlag] = None,
target: str = "") -> BucketList
Returns a structure that contains a page of objects, job ID, and continuation token (to read the next page, if available).
Arguments:
- `prefix` (str, optional): Return only objects that start with the prefix.
- `props` (str, optional): Comma-separated list of object properties to return. Default value is "name,size". Properties: "name", "size", "atime", "version", "checksum", "cached", "target_url", "status", "copies", "ec", "custom", "node".
- `page_size` (int, optional): Return at most "page_size" objects. The maximum number of objects in response depends on the bucket backend, e.g., an AWS bucket cannot return more than 5,000 objects in a single page. NOTE: if "page_size" is greater than a backend maximum, the backend maximum objects are returned. Defaults to "0" - return maximum number of objects.
- `uuid` (str, optional): Job ID, required to get the next page of objects.
- `continuation_token` (str, optional): Marks the object to start reading the next page.
- `flags` (List[ListObjectFlag], optional): Optional list of ListObjectFlag enums to include as flags in the request.
- `target` (str, optional): Only list objects on this specific target node.

Returns:
- `BucketList`: The page of objects in the bucket and the continuation token to get the next page. An empty continuation token marks the final page of the object list.

Raises:
- `aistore.sdk.errors.AISError`: All other types of errors with AIStore.
- `requests.ConnectionError`: Connection error.
- `requests.ConnectionTimeout`: Timed out connecting to AIStore.
- `requests.exceptions.HTTPError`: Service unavailable.
- `requests.RequestException`: "There was an ambiguous exception that occurred while handling..."
- `requests.ReadTimeout`: Timed out receiving response from AIStore.
def list_objects_iter(prefix: str = "",
props: str = "",
page_size: int = 0,
flags: List[ListObjectFlag] = None,
target: str = "") -> ObjectIterator
Returns an iterator for all objects in bucket
Arguments:
- `prefix` (str, optional): Return only objects that start with the prefix.
- `props` (str, optional): Comma-separated list of object properties to return. Default value is "name,size". Properties: "name", "size", "atime", "version", "checksum", "cached", "target_url", "status", "copies", "ec", "custom", "node".
- `page_size` (int, optional): Return at most "page_size" objects. The maximum number of objects in response depends on the bucket backend, e.g., an AWS bucket cannot return more than 5,000 objects in a single page. NOTE: if "page_size" is greater than a backend maximum, the backend maximum objects are returned. Defaults to "0" - return maximum number of objects.
- `flags` (List[ListObjectFlag], optional): Optional list of ListObjectFlag enums to include as flags in the request.
- `target` (str, optional): Only list objects on this specific target node.

Returns:
- `ObjectIterator`: Object iterator.

Raises:
- `aistore.sdk.errors.AISError`: All other types of errors with AIStore.
- `requests.ConnectionError`: Connection error.
- `requests.ConnectionTimeout`: Timed out connecting to AIStore.
- `requests.exceptions.HTTPError`: Service unavailable.
- `requests.RequestException`: "There was an ambiguous exception that occurred while handling..."
- `requests.ReadTimeout`: Timed out receiving response from AIStore.
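A short listing sketch; the bucket name is a placeholder, and `client` is assumed to be a connected aistore.sdk Client (see the Client class below). Each yielded entry is a BucketEntry with the requested properties populated:

```python
# Iterate over all objects under a prefix, 1000 per page
bucket = client.bucket("my-bucket")
for entry in bucket.list_objects_iter(prefix="data/", props="name,size", page_size=1000):
    print(entry.name, entry.size)
```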
def list_all_objects(prefix: str = "",
props: str = "",
page_size: int = 0,
flags: List[ListObjectFlag] = None,
target: str = "") -> List[BucketEntry]
Returns a list of all objects in bucket
Arguments:
- `prefix` (str, optional): Return only objects that start with the prefix.
- `props` (str, optional): Comma-separated list of object properties to return. Default value is "name,size". Properties: "name", "size", "atime", "version", "checksum", "cached", "target_url", "status", "copies", "ec", "custom", "node".
- `page_size` (int, optional): Return at most "page_size" objects. The maximum number of objects in response depends on the bucket backend, e.g., an AWS bucket cannot return more than 5,000 objects in a single page. NOTE: if "page_size" is greater than a backend maximum, the backend maximum objects are returned. Defaults to "0" - return maximum number of objects.
- `flags` (List[ListObjectFlag], optional): Optional list of ListObjectFlag enums to include as flags in the request.
- `target` (str, optional): Only list objects on this specific target node.

Returns:
- `List[BucketEntry]`: List of objects in the bucket.

Raises:
- `aistore.sdk.errors.AISError`: All other types of errors with AIStore.
- `requests.ConnectionError`: Connection error.
- `requests.ConnectionTimeout`: Timed out connecting to AIStore.
- `requests.exceptions.HTTPError`: Service unavailable.
- `requests.RequestException`: "There was an ambiguous exception that occurred while handling..."
- `requests.ReadTimeout`: Timed out receiving response from AIStore.
def transform(etl_name: str,
to_bck: Bucket,
timeout: str = DEFAULT_ETL_TIMEOUT,
prefix_filter: str = "",
prepend: str = "",
ext: Dict[str, str] = None,
force: bool = False,
dry_run: bool = False,
latest: bool = False,
sync: bool = False) -> str
Visits all selected objects in the source bucket and for each object, puts the transformed result to the destination bucket
Arguments:
- `etl_name` (str): Name of ETL to be used for transformations.
- `to_bck` (Bucket): Destination bucket for transformations.
- `timeout` (str, optional): Timeout of the ETL job (e.g. 5m for 5 minutes).
- `prefix_filter` (str, optional): Only transform objects with names starting with this prefix.
- `prepend` (str, optional): Value to prepend to the name of resulting transformed objects.
- `ext` (Dict[str, str], optional): Dict of new extension followed by extension to be replaced (i.e. {"jpg": "txt"}).
- `dry_run` (bool, optional): Determines if the copy should actually happen or not.
- `force` (bool, optional): Override existing destination bucket.
- `latest` (bool, optional): GET the latest object version from the associated remote bucket.
- `sync` (bool, optional): Synchronize destination bucket with its remote (e.g., Cloud or remote AIS) source.

Returns:
Job ID (as str) that can be used to check the status of the operation.
def put_files(path: str,
prefix_filter: str = "",
pattern: str = "*",
basename: bool = False,
prepend: str = None,
recursive: bool = False,
dry_run: bool = False,
verbose: bool = True) -> List[str]
Puts files found in a given filepath as objects to a bucket in AIS storage.
Arguments:
- `path` (str): Local filepath, can be relative or absolute.
- `prefix_filter` (str, optional): Only put files with names starting with this prefix.
- `pattern` (str, optional): Shell-style wildcard pattern to filter files.
- `basename` (bool, optional): Whether to use the file names only as object names and omit the path information.
- `prepend` (str, optional): Optional string to use as a prefix in the object name for all objects uploaded. No delimiter ("/", "-", etc.) is automatically applied between the prepend value and the object name.
- `recursive` (bool, optional): Whether to recurse through the provided path directories.
- `dry_run` (bool, optional): Option to only show expected behavior without an actual put operation.
- `verbose` (bool, optional): Whether to print upload info to standard output.

Returns:
List of object names put to a bucket in AIS.

Raises:
- `requests.RequestException`: "There was an ambiguous exception that occurred while handling..."
- `requests.ConnectionError`: Connection error.
- `requests.ConnectionTimeout`: Timed out connecting to AIStore.
- `requests.ReadTimeout`: Timed out waiting for response from AIStore.
- `ValueError`: The path provided is not a valid directory.
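For instance, uploading a local directory tree might look like the following sketch; the local path, pattern, and prefix are placeholders:

```python
# Recursively upload all .txt files, prefixing object names with "uploads/"
uploaded = bucket.put_files(
    "/tmp/data",           # hypothetical local directory
    pattern="*.txt",
    recursive=True,
    prepend="uploads/",    # note: no delimiter is added automatically
)
print(f"Put {len(uploaded)} objects")
```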
def object(obj_name: str, props: ObjectProps = None) -> Object
Factory constructor for an object in this bucket. Does not make any HTTP request, only instantiates an object in a bucket owned by the client.
Arguments:
- `obj_name` (str): Name of object.
- `props` (ObjectProps, optional): Properties of object.

Returns:
The object created.
def objects(obj_names: List = None,
obj_range: ObjectRange = None,
obj_template: str = None) -> ObjectGroup
Factory constructor for multiple objects belonging to this bucket.
Arguments:
- `obj_names` (list): Names of objects to include in the group.
- `obj_range` (ObjectRange): Range of objects to include in the group.
- `obj_template` (str): String template defining objects to include in the group.

Returns:
The ObjectGroup created.
def make_request(method: str,
action: str,
value: Dict = None,
params: Dict = None) -> requests.Response
Use the bucket's client to make a request to the bucket endpoint on the AIS server
Arguments:
- `method` (str): HTTP method to use, e.g. POST/GET/DELETE.
- `action` (str): Action string used to create an ActionMsg to pass to the server.
- `value` (dict): Additional value parameter to pass in the ActionMsg.
- `params` (dict, optional): Optional parameters to pass in the request.

Returns:
Response from the server.
def verify_cloud_bucket()
Verify the bucket provider is a cloud provider
def get_path() -> str
Get the path representation of this bucket
def as_model() -> BucketModel
Return a data-model of the bucket
Returns:
BucketModel representation
def write_dataset(config: DatasetConfig, skip_missing: bool = True, **kwargs)
Write a dataset to a bucket in AIS in webdataset format using wds.ShardWriter. Logs the missing attributes
Arguments:
- `config` (DatasetConfig): Configuration dict specifying how to process and store each part of the dataset item.
- `skip_missing` (bool, optional): Skip samples that are missing one or more attributes. Defaults to True.
- `**kwargs` (optional): Optional keyword arguments to pass to the ShardWriter.
class Client()
AIStore client for managing buckets, objects, ETL jobs
Arguments:
- `endpoint` (str): AIStore endpoint.
- `skip_verify` (bool, optional): If True, skip SSL certificate verification. Defaults to False.
- `ca_cert` (str, optional): Path to a CA certificate file for SSL verification. If not provided, the 'AIS_CLIENT_CA' environment variable will be used. Defaults to None.
- `timeout` (Union[float, Tuple[float, float], None], optional): Request timeout in seconds; a single float for both connect/read timeouts (e.g., 5.0), a tuple for separate connect/read timeouts (e.g., (3.0, 10.0)), or None to disable timeout.
- `retry` (urllib3.Retry, optional): Retry configuration object from the urllib3 library.
- `token` (str, optional): Authorization token. If not provided, the 'AIS_AUTHN_TOKEN' environment variable will be used. Defaults to None.
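A minimal construction sketch; the endpoint is a placeholder for your AIS proxy URL:

```python
from aistore.sdk import Client

# Hypothetical proxy endpoint; the tuple sets separate connect/read timeouts
client = Client("http://localhost:8080", timeout=(3.0, 20.0))
bucket = client.bucket("my-bucket")  # factory call only -- no HTTP request yet
```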
def bucket(bck_name: str,
provider: str = PROVIDER_AIS,
namespace: Namespace = None)
Factory constructor for bucket object. Does not make any HTTP request, only instantiates a bucket object.
Arguments:
- `bck_name` (str): Name of bucket.
- `provider` (str): Provider of bucket, one of "ais", "aws", "gcp", ... (optional, defaults to "ais").
- `namespace` (Namespace): Namespace of bucket (optional, defaults to None).

Returns:
The bucket object created.
def cluster()
Factory constructor for cluster object. Does not make any HTTP request, only instantiates a cluster object.
Returns:
The cluster object created.
def job(job_id: str = "", job_kind: str = "")
Factory constructor for job object, which contains job-related functions. Does not make any HTTP request, only instantiates a job object.
Arguments:
- `job_id` (str, optional): Optional ID for interacting with a specific job.
- `job_kind` (str, optional): Optional specific type of job; empty for all kinds.

Returns:
The job object created.
def etl(etl_name: str)
Factory constructor for ETL object. Contains APIs related to AIStore ETL operations. Does not make any HTTP request, only instantiates an ETL object.
Arguments:
- `etl_name` (str): Name of the ETL.

Returns:
The ETL object created.
def dsort(dsort_id: str = "")
Factory constructor for dSort object. Contains APIs related to AIStore dSort operations. Does not make any HTTP request, only instantiates a dSort object.
Arguments:
- `dsort_id` (str): ID of the dSort job.

Returns:
The dSort object created.
def fetch_object_by_url(url: str) -> Object
Retrieve an object based on its URL.
Arguments:
- `url` (str): Full URL of the object (e.g., "ais://bucket1/file.txt").

Returns:
- `Object`: The object retrieved from the specified URL.
class Cluster()
A class representing a cluster bound to an AIS client.
@property
def client()
Client this cluster uses to make requests
def get_info() -> Smap
Returns state of AIS cluster, including the detailed information about its nodes.
Returns:
- `aistore.sdk.types.Smap`: Smap containing cluster information.

Raises:
- `requests.RequestException`: "There was an ambiguous exception that occurred while handling..."
- `requests.ConnectionError`: Connection error.
- `requests.ConnectionTimeout`: Timed out connecting to AIStore.
- `requests.ReadTimeout`: Timed out waiting for response from AIStore.
def get_primary_url() -> str
Returns: URL of primary proxy
def list_buckets(provider: str = PROVIDER_AIS)
Returns list of buckets in AIStore cluster.
Arguments:
- `provider` (str, optional): Name of bucket provider, one of "ais", "aws", "gcp", "az" or "ht". Defaults to "ais". Empty provider returns buckets of all providers.

Returns:
- `List[BucketModel]`: A list of buckets.

Raises:
- `requests.RequestException`: "There was an ambiguous exception that occurred while handling..."
- `requests.ConnectionError`: Connection error.
- `requests.ConnectionTimeout`: Timed out connecting to AIStore.
- `requests.ReadTimeout`: Timed out waiting for response from AIStore.
def list_jobs_status(job_kind="", target_id="") -> List[JobStatus]
List the status of jobs on the cluster
Arguments:
- `job_kind` (str, optional): Only show jobs of a particular type.
- `target_id` (str, optional): Limit to jobs on a specific target node.

Returns:
List of JobStatus objects.
def list_running_jobs(job_kind="", target_id="") -> List[str]
List the currently running jobs on the cluster
Arguments:
- `job_kind` (str, optional): Only show jobs of a particular type.
- `target_id` (str, optional): Limit to jobs on a specific target node.

Returns:
List of jobs in the format job_kind[job_id].
def list_running_etls() -> List[ETLInfo]
Lists all running ETLs.
Note: Does not list ETLs that have been stopped or deleted.
Returns:
- `List[ETLInfo]`: A list of details on running ETLs.
def is_ready() -> bool
Checks if cluster is ready or still setting up.
Returns:
- `bool`: True if the cluster is ready, or False if the cluster is still setting up.
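Putting a few of these calls together, as a sketch (assuming BucketModel exposes a `name` field):

```python
# Wait until the cluster is usable, then enumerate AIS buckets
cluster = client.cluster()
if cluster.is_ready():
    for bck in cluster.list_buckets(provider="ais"):
        print(bck.name)
```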
def get_performance(get_throughput: bool = True,
get_latency: bool = True,
get_counters: bool = True) -> ClusterPerformance
Retrieves and calculates the performance metrics for each target node in the AIStore cluster. Compiles throughput, latency, and various operational counters from each target node, providing a comprehensive view of the cluster's overall performance.
Arguments:
- `get_throughput` (bool, optional): Get cluster throughput.
- `get_latency` (bool, optional): Get cluster latency.
- `get_counters` (bool, optional): Get cluster counters.

Returns:
- `ClusterPerformance`: An object encapsulating the detailed performance metrics of the cluster, including throughput, latency, and counters for each node.

Raises:
- `requests.RequestException`: If there's an ambiguous exception while processing the request.
- `requests.ConnectionError`: If there's a connection error with the cluster.
- `requests.ConnectionTimeout`: If the connection to the cluster times out.
- `requests.ReadTimeout`: If the timeout is reached while awaiting a response from the cluster.
def get_uuid() -> str
Returns: UUID of AIStore Cluster
class Job()
A class containing job-related functions.
Arguments:
- `client` (RequestClient): Client for interfacing with AIS cluster.
- `job_id` (str, optional): ID of a specific job, empty for all jobs.
- `job_kind` (str, optional): Specific kind of job, empty for all kinds.
@property
def job_id()
Return job id
@property
def job_kind()
Return job kind
def status() -> JobStatus
Return status of a job
Returns:
The job status, including ID, finish time, and error info.

Raises:
- `requests.RequestException`: "There was an ambiguous exception that occurred while handling..."
- `requests.ConnectionError`: Connection error.
- `requests.ConnectionTimeout`: Timed out connecting to AIStore.
- `requests.ReadTimeout`: Timed out waiting for response from AIStore.
def wait(timeout: int = DEFAULT_JOB_WAIT_TIMEOUT, verbose: bool = True)
Wait for a job to finish
Arguments:
- `timeout` (int, optional): The maximum time to wait for the job, in seconds. Default timeout is 5 minutes.
- `verbose` (bool, optional): Whether to log wait status to standard output.

Returns:
None

Raises:
- `requests.RequestException`: "There was an ambiguous exception that occurred while handling..."
- `requests.ConnectionError`: Connection error.
- `requests.ConnectionTimeout`: Timed out connecting to AIStore.
- `requests.ReadTimeout`: Timed out waiting for response from AIStore.
- `errors.Timeout`: Timeout while waiting for the job to finish.
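A typical pattern is to start an asynchronous operation, then wait on the returned job ID; the bucket names below are placeholders:

```python
# rename() returns the job ID of the asynchronous operation
job_id = client.bucket("my-bucket").rename("my-renamed-bucket")
client.job(job_id).wait(timeout=300)  # block for up to 5 minutes
```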
def wait_for_idle(timeout: int = DEFAULT_JOB_WAIT_TIMEOUT,
verbose: bool = True)
Wait for a job to reach an idle state
Arguments:
- `timeout` (int, optional): The maximum time to wait for the job, in seconds. Default timeout is 5 minutes.
- `verbose` (bool, optional): Whether to log wait status to standard output.

Returns:
None

Raises:
- `requests.RequestException`: "There was an ambiguous exception that occurred while handling..."
- `requests.ConnectionError`: Connection error.
- `requests.ConnectionTimeout`: Timed out connecting to AIStore.
- `requests.ReadTimeout`: Timed out waiting for response from AIStore.
- `errors.Timeout`: Timeout while waiting for the job to finish.
- `errors.JobInfoNotFound`: Raised when information on a job's status could not be found on the AIS cluster.
def wait_single_node(timeout: int = DEFAULT_JOB_WAIT_TIMEOUT,
verbose: bool = True)
Wait for a job running on a single node
Arguments:
- `timeout` (int, optional): The maximum time to wait for the job, in seconds. Default timeout is 5 minutes.
- `verbose` (bool, optional): Whether to log wait status to standard output.

Returns:
None

Raises:
- `requests.RequestException`: "There was an ambiguous exception that occurred while handling..."
- `requests.ConnectionError`: Connection error.
- `requests.ConnectionTimeout`: Timed out connecting to AIStore.
- `requests.ReadTimeout`: Timed out waiting for response from AIStore.
- `errors.Timeout`: Timeout while waiting for the job to finish.
- `errors.JobInfoNotFound`: Raised when information on a job's status could not be found on the AIS cluster.
def start(daemon_id: str = "",
force: bool = False,
buckets: List[Bucket] = None) -> str
Start a job and return its ID.
Arguments:
- `daemon_id` (str, optional): For running a job that must run on a specific target node (e.g. resilvering).
- `force` (bool, optional): Override existing restrictions for a bucket (e.g., run LRU eviction even if the bucket has LRU disabled).
- `buckets` (List[Bucket], optional): List of one or more buckets; applicable only for jobs that have bucket scope (for details on job types, see the Table in xact/api.go).

Returns:
The running job ID.

Raises:
- `requests.RequestException`: "There was an ambiguous exception that occurred while handling..."
- `requests.ConnectionError`: Connection error.
- `requests.ConnectionTimeout`: Timed out connecting to AIStore.
- `requests.ReadTimeout`: Timed out waiting for response from AIStore.
def get_within_timeframe(start_time: datetime.datetime,
end_time: datetime.datetime) -> List[JobSnapshot]
Checks for jobs that started and finished within a specified timeframe.
Arguments:
- `start_time` (datetime.datetime): The start of the timeframe for monitoring jobs.
- `end_time` (datetime.datetime): The end of the timeframe for monitoring jobs.

Returns:
- `List[JobSnapshot]`: A list of jobs that have finished within the specified timeframe.

Raises:
- `JobInfoNotFound`: Raised when information on a job's status could not be found.
class ObjectGroup(AISSource)
A class representing multiple objects within the same bucket. Only one of obj_names, obj_range, or obj_template should be provided.
Arguments:
- `bck` (Bucket): Bucket the objects belong to.
- `obj_names` (list[str], optional): List of object names to include in this collection.
- `obj_range` (ObjectRange, optional): Range defining which object names in the bucket should be included.
- `obj_template` (str, optional): String argument to pass as template value directly to api.
@property
def client() -> RequestClient
The client bound to the bucket used by the ObjectGroup.
@client.setter
def client(client) -> RequestClient
Update the client bound to the bucket used by the ObjectGroup.
def list_urls(prefix: str = "", etl_name: str = None) -> Iterable[str]
Implementation of the abstract method from AISSource that provides an iterator of full URLs to every object in this bucket matching the specified prefix
Arguments:
- `prefix` (str, optional): Limit objects selected by a given string prefix.
- `etl_name` (str, optional): ETL to include in URLs.

Returns:
Iterator of all object URLs in the group.
def list_all_objects_iter(prefix: str = "",
props: str = "name,size") -> Iterable[Object]
Implementation of the abstract method from AISSource that provides an iterator of all the objects in this bucket matching the specified prefix.
Arguments:
- `prefix` (str, optional): Limit objects selected by a given string prefix.
- `props` (str, optional): By default, will include all object properties. Pass in None to skip and avoid the extra API call.

Returns:
Iterator of all the objects in the group.
def delete()
Deletes a list or range of objects in a bucket
Raises:
- `aistore.sdk.errors.AISError`: All other types of errors with AIStore.
- `requests.ConnectionError`: Connection error.
- `requests.ConnectionTimeout`: Timed out connecting to AIStore.
- `requests.exceptions.HTTPError`: Service unavailable.
- `requests.RequestException`: "There was an ambiguous exception that occurred while handling..."
- `requests.ReadTimeout`: Timed out receiving response from AIStore.

Returns:
Job ID (as str) that can be used to check the status of the operation.
def evict()
Evicts a list or range of objects in a bucket so that they are no longer cached in AIS. NOTE: only Cloud buckets can be evicted.
Raises:
- `aistore.sdk.errors.AISError`: All other types of errors with AIStore.
- `requests.ConnectionError`: Connection error.
- `requests.ConnectionTimeout`: Timed out connecting to AIStore.
- `requests.exceptions.HTTPError`: Service unavailable.
- `requests.RequestException`: "There was an ambiguous exception that occurred while handling..."
- `requests.ReadTimeout`: Timed out receiving response from AIStore.

Returns:
Job ID (as str) that can be used to check the status of the operation.
def prefetch(blob_threshold: int = None,
num_workers: int = None,
latest: bool = False,
continue_on_error: bool = False)
Prefetches a list or range of objects in a bucket so that they are cached in AIS. NOTE: only Cloud buckets can be prefetched.
Arguments:
- `latest` (bool, optional): GET the latest object version from the associated remote bucket.
- `continue_on_error` (bool, optional): Whether to continue if there is an error prefetching a single object.
- `blob_threshold` (int, optional): Utilize built-in blob-downloader for remote objects greater than the specified (threshold) size in bytes.
- `num_workers` (int, optional): Number of concurrent workers (readers). Defaults to the number of target mountpaths if omitted or zero. A value of -1 indicates no workers at all (i.e., single-threaded execution). Any positive value will be adjusted not to exceed the number of target CPUs.

Raises:
- `aistore.sdk.errors.AISError`: All other types of errors with AIStore.
- `requests.ConnectionError`: Connection error.
- `requests.ConnectionTimeout`: Timed out connecting to AIStore.
- `requests.exceptions.HTTPError`: Service unavailable.
- `requests.RequestException`: "There was an ambiguous exception that occurred while handling..."
- `requests.ReadTimeout`: Timed out receiving response from AIStore.

Returns:
Job ID (as str) that can be used to check the status of the operation.
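A sketch of prefetching a range of remote objects and waiting for completion; the bucket, provider, and template are hypothetical:

```python
# Prefetch applies only to Cloud buckets (here, an assumed AWS-backed bucket)
remote_bck = client.bucket("my-bucket", provider="aws")
group = remote_bck.objects(obj_template="shard-{000..099}.tar")
job_id = group.prefetch(latest=True)
client.job(job_id).wait(timeout=600)
```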
def copy(to_bck: "Bucket",
prepend: str = "",
continue_on_error: bool = False,
dry_run: bool = False,
force: bool = False,
latest: bool = False,
sync: bool = False,
num_workers: int = None)
Copies a list or range of objects in a bucket
Arguments:
- `to_bck` (Bucket): Destination bucket.
- `prepend` (str, optional): Value to prepend to the name of copied objects.
- `continue_on_error` (bool, optional): Whether to continue if there is an error copying a single object.
- `dry_run` (bool, optional): Skip performing the copy and just log the intended actions.
- `force` (bool, optional): Force this job to run over others in case it conflicts (see "limited coexistence" and xact/xreg/xreg.go).
- `latest` (bool, optional): GET the latest object version from the associated remote bucket.
- `sync` (bool, optional): Synchronize destination bucket with its remote (e.g., Cloud or remote AIS) source.
- `num_workers` (int, optional): Number of concurrent workers (readers). Defaults to the number of target mountpaths if omitted or zero. A value of -1 indicates no workers at all (i.e., single-threaded execution). Any positive value will be adjusted not to exceed the number of target CPUs.

Raises:
- `aistore.sdk.errors.AISError`: All other types of errors with AIStore.
- `requests.ConnectionError`: Connection error.
- `requests.ConnectionTimeout`: Timed out connecting to AIStore.
- `requests.exceptions.HTTPError`: Service unavailable.
- `requests.RequestException`: "There was an ambiguous exception that occurred while handling..."
- `requests.ReadTimeout`: Timed out receiving response from AIStore.

Returns:
Job ID (as str) that can be used to check the status of the operation.
def transform(to_bck: "Bucket",
etl_name: str,
timeout: str = DEFAULT_ETL_TIMEOUT,
prepend: str = "",
continue_on_error: bool = False,
dry_run: bool = False,
force: bool = False,
latest: bool = False,
sync: bool = False,
num_workers: int = None)
Performs ETL operation on a list or range of objects in a bucket, placing the results in the destination bucket
Arguments:
- `to_bck` (Bucket): Destination bucket.
- `etl_name` (str): Name of existing ETL to apply.
- `timeout` (str): Timeout of the ETL job (e.g. 5m for 5 minutes).
- `prepend` (str, optional): Value to prepend to the name of resulting transformed objects.
- `continue_on_error` (bool, optional): Whether to continue if there is an error transforming a single object.
- `dry_run` (bool, optional): Skip performing the transform and just log the intended actions.
- `force` (bool, optional): Force this job to run over others in case it conflicts (see "limited coexistence" and xact/xreg/xreg.go).
- `latest` (bool, optional): GET the latest object version from the associated remote bucket.
- `sync` (bool, optional): Synchronize destination bucket with its remote (e.g., Cloud or remote AIS) source.
- `num_workers` (int, optional): Number of concurrent workers (readers). Defaults to the number of target mountpaths if omitted or zero. A value of -1 indicates no workers at all (i.e., single-threaded execution). Any positive value will be adjusted not to exceed the number of target CPUs.

Raises:
- `aistore.sdk.errors.AISError`: All other types of errors with AIStore.
- `requests.ConnectionError`: Connection error.
- `requests.ConnectionTimeout`: Timed out connecting to AIStore.
- `requests.exceptions.HTTPError`: Service unavailable.
- `requests.RequestException`: "There was an ambiguous exception that occurred while handling..."
- `requests.ReadTimeout`: Timed out receiving response from AIStore.

Returns:
Job ID (as str) that can be used to check the status of the operation.
def archive(archive_name: str,
mime: str = "",
to_bck: "Bucket" = None,
include_source_name: bool = False,
allow_append: bool = False,
continue_on_err: bool = False)
Create or append to an archive
Arguments:
- `archive_name` (str): Name of archive to create or append.
- `mime` (str, optional): MIME type of the content.
- `to_bck` (Bucket, optional): Destination bucket, defaults to current bucket.
- `include_source_name` (bool, optional): Include the source bucket name in the archived objects' names.
- `allow_append` (bool, optional): Allow appending to an existing archive.
- `continue_on_err` (bool, optional): Whether to continue if there is an error archiving a single object.

Returns:
Job ID (as str) that can be used to check the status of the operation.
def list_names() -> List[str]
List all the object names included in this group of objects
Returns:
List of object names
class ObjectNames(ObjectCollection)
A collection of object names, provided as a list of strings
Arguments:
- `names` (List[str]): A list of object names.
class ObjectRange(ObjectCollection)
Class representing a range of object names
Arguments:
- `prefix` (str): Prefix contained in all names of objects.
- `min_index` (int): Starting index in the name of objects.
- `max_index` (int): Last index in the name of all objects.
- `pad_width` (int, optional): Left-pad indices with zeros up to the width provided, e.g. pad_width = 3 will transform 1 to 001.
- `step` (int, optional): Size of iterator steps between each item.
- `suffix` (str, optional): Suffix at the end of all object names.
@classmethod
def from_string(cls, range_string: str)
Construct an ObjectRange instance from a valid range string like 'input-{00..99..1}.txt'
Arguments:
- `range_string` (str): The range string to parse.

Returns:
- `ObjectRange`: An instance of the ObjectRange class.
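Both construction styles describe the same range. A sketch, assuming ObjectRange is exported from aistore.sdk.multiobj per the module list above:

```python
from aistore.sdk.multiobj import ObjectRange

# Two equivalent ways to express input-00.txt .. input-99.txt
rng = ObjectRange(prefix="input-", min_index=0, max_index=99, pad_width=2, suffix=".txt")
rng2 = ObjectRange.from_string("input-{00..99..1}.txt")
```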
class ObjectTemplate(ObjectCollection)
A collection of object names specified by a template in the bash brace expansion format
Arguments:
- `template` (str): A string template that defines the names of objects to include in the collection.
class Object()
A class representing an object of a bucket bound to a client.
Arguments:
- `bucket` (Bucket): Bucket to which this object belongs.
- `name` (str): Name of object.
- `size` (int, optional): Size of object in bytes.
- `props` (ObjectProps, optional): Properties of object.
@property
def bucket()
Bucket containing this object.
@property
def name() -> str
Name of this object.
@property
def props() -> ObjectProps
Properties of this object.
def head() -> Header
Requests object properties and returns headers. Updates props.
Returns:
Response header with the object properties.

Raises:
- `requests.RequestException`: "There was an ambiguous exception that occurred while handling..."
- `requests.ConnectionError`: Connection error.
- `requests.ConnectionTimeout`: Timed out connecting to AIStore.
- `requests.ReadTimeout`: Timed out waiting for response from AIStore.
- `requests.exceptions.HTTPError(404)`: The object does not exist.
def get(archive_config: ArchiveConfig = None,
blob_download_config: BlobDownloadConfig = None,
chunk_size: int = DEFAULT_CHUNK_SIZE,
etl_name: str = None,
writer: BufferedWriter = None,
latest: bool = False,
byte_range: str = None) -> ObjectReader
Creates and returns an ObjectReader with access to object contents and optionally writes to a provided writer.
Arguments:
- `archive_config` (ArchiveConfig, optional): Settings for archive extraction.
- `blob_download_config` (BlobDownloadConfig, optional): Settings for using blob download.
- `chunk_size` (int, optional): Chunk size to use while reading from stream.
- `etl_name` (str, optional): Transforms an object based on ETL with etl_name.
- `writer` (BufferedWriter, optional): User-provided writer for writing content output. User is responsible for closing the writer.
- `latest` (bool, optional): GET the latest object version from the associated remote bucket.
- `byte_range` (str, optional): Specify a specific data segment of the object for transfer, including both the start and end of the range (e.g. "bytes=0-499" to request the first 500 bytes).

Returns:
An ObjectReader which can be iterated over to stream chunks of object content or used to read all content directly.

Raises:
- `requests.RequestException`: "There was an ambiguous exception that occurred while handling..."
- `requests.ConnectionError`: Connection error.
- `requests.ConnectionTimeout`: Timed out connecting to AIStore.
- `requests.ReadTimeout`: Timed out waiting for response from AIStore.
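Two common read patterns, as a sketch; the object and local path are placeholders:

```python
obj = bucket.object("data/file-001.bin")

# Read everything at once (all content must fit in memory)
content = obj.get().read_all()

# Or stream the object in chunks
with open("/tmp/file-001.bin", "wb") as out:
    for chunk in obj.get(chunk_size=1 << 20):  # 1 MiB chunks
        out.write(chunk)
```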
def get_semantic_url() -> str
Get the semantic URL to the object
Returns:
Semantic URL to get object
def get_url(archpath: str = "", etl_name: str = None) -> str
Get the full url to the object including base url and any query parameters
Arguments:
- `archpath` (str, optional): If the object is an archive, use `archpath` to extract a single file from the archive.
- `etl_name` (str, optional): Transforms an object based on ETL with etl_name.

Returns:
Full URL to get object.
def put_content(content: bytes) -> Response
Puts bytes as an object to a bucket in AIS storage.
Arguments:
- `content` (bytes): Bytes to put as an object.

Raises:
- `requests.RequestException`: "There was an ambiguous exception that occurred while handling..."
- `requests.ConnectionError`: Connection error.
- `requests.ConnectionTimeout`: Timed out connecting to AIStore.
- `requests.ReadTimeout`: Timed out waiting for response from AIStore.
def put_file(path: str = None) -> Response
Puts a local file as an object to a bucket in AIS storage.
Arguments:
- `path` (str): Path to local file.

Raises:
- `requests.RequestException`: "There was an ambiguous exception that occurred while handling..."
- `requests.ConnectionError`: Connection error.
- `requests.ConnectionTimeout`: Timed out connecting to AIStore.
- `requests.ReadTimeout`: Timed out waiting for response from AIStore.
- `ValueError`: The path provided is not a valid file.
def promote(path: str,
target_id: str = "",
recursive: bool = False,
overwrite_dest: bool = False,
delete_source: bool = False,
src_not_file_share: bool = False) -> str
Promotes a file or folder an AIS target can access to a bucket in AIS storage. These files can be either on the physical disk of an AIS target itself or on a network file system the cluster can access. See more info here: https://aiatscale.org/blog/2022/03/17/promote
Arguments:
- `path` (str): Path to file or folder the AIS cluster can reach.
- `target_id` (str, optional): Promote files from a specific target node.
- `recursive` (bool, optional): Recursively promote objects from files in directories inside the path.
- `overwrite_dest` (bool, optional): Overwrite objects already on AIS.
- `delete_source` (bool, optional): Delete the source files when done promoting.
- `src_not_file_share` (bool, optional): Optimize if the source is guaranteed to not be on a file share.

Returns:
Job ID (as str) that can be used to check the status of the operation, or empty if the job is done synchronously.

Raises:
- `requests.RequestException`: "There was an ambiguous exception that occurred while handling..."
- `requests.ConnectionError`: Connection error.
- `requests.ConnectionTimeout`: Timed out connecting to AIStore.
- `requests.ReadTimeout`: Timed out waiting for response from AIStore.
- `AISError`: Path does not exist on the AIS cluster storage.
def delete() -> Response
Delete an object from a bucket.
Returns:
None

Raises:
- `requests.RequestException`: "There was an ambiguous exception that occurred while handling..."
- `requests.ConnectionError`: Connection error.
- `requests.ConnectionTimeout`: Timed out connecting to AIStore.
- `requests.ReadTimeout`: Timed out waiting for response from AIStore.
- `requests.exceptions.HTTPError(404)`: The object does not exist.
def blob_download(chunk_size: int = None,
num_workers: int = None,
latest: bool = False) -> str
A special facility to download very large remote objects, a.k.a. BLOBs. Returns the job ID of the blob download operation.
Arguments:
- `chunk_size` (int): Chunk size in bytes.
- `num_workers` (int): Number of concurrent blob-downloading workers (readers).
- `latest` (bool): GET the latest object version from the associated remote bucket.

Returns:
Job ID (as str) that can be used to check the status of the operation.

Raises:
- `aistore.sdk.errors.AISError`: All other types of errors with AIStore.
- `requests.ConnectionError`: Connection error.
- `requests.ConnectionTimeout`: Timed out connecting to AIStore.
- `requests.exceptions.HTTPError`: Service unavailable.
- `requests.RequestException`: "There was an ambiguous exception that occurred while handling..."
def append_content(content: bytes,
handle: str = "",
flush: bool = False) -> str
Append bytes as an object to a bucket in AIS storage.
Arguments:
- `content` (bytes): Bytes to append to the object.
- `handle` (str): Handle string to use for subsequent appends or flush (empty for the first append).
- `flush` (bool): Whether to flush and finalize the append operation, making the object accessible.

Returns:
- `handle` (str): Handle string to pass for subsequent appends or flush.

Raises:
- `requests.RequestException`: "There was an ambiguous exception that occurred while handling..."
- `requests.ConnectionError`: Connection error.
- `requests.ConnectionTimeout`: Timed out connecting to AIStore.
- `requests.ReadTimeout`: Timed out waiting for response from AIStore.
- `requests.exceptions.HTTPError(404)`: The object does not exist.
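The append protocol threads the returned handle through successive calls and finalizes with flush. A sketch of the intended flow (object name is a placeholder):

```python
obj = bucket.object("logs/combined.log")

handle = obj.append_content(b"first part\n")                 # first append: empty handle
handle = obj.append_content(b"second part\n", handle=handle)
obj.append_content(b"", handle=handle, flush=True)           # finalize; object becomes accessible
```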
def set_custom_props(custom_metadata: Dict[str, str],
replace_existing: bool = False) -> Response
Set custom properties for the object.
Arguments:
- `custom_metadata` (Dict[str, str]): Custom metadata key-value pairs.
- `replace_existing` (bool, optional): Whether to replace existing metadata. Defaults to False.
class ObjectReader()
Provide a way to read an object's contents and attributes, optionally iterating over a stream of content.
Arguments:
- `object_client` (ObjectClient): Client for making requests to a specific object in AIS.
- `chunk_size` (int, optional): Size of each data chunk to be fetched from the stream. Defaults to DEFAULT_CHUNK_SIZE.
def head() -> ObjectAttributes
Make a head request to AIS to update and return only object attributes.
Returns:
`ObjectAttributes` containing metadata for this object.
@property
def attributes() -> ObjectAttributes
Object metadata attributes.
Returns:
- `ObjectAttributes`: Parsed object attributes from the headers returned by AIS.
def read_all() -> bytes
Read all byte data directly from the object response without using a stream.
This requires all object content to fit in memory at once and downloads all content before returning.
Returns:
- `bytes`: Object content as bytes.
def raw() -> requests.Response
Return the raw byte stream of object content.
Returns:
- `requests.Response`: Raw byte stream of the object content.
def as_file(max_resume: Optional[int] = 5) -> ObjectFile
Create an ObjectFile
for reading object data in chunks. ObjectFile
supports
resuming and retrying from the last known position in the case the object stream
is prematurely closed due to an unexpected error.
Arguments:
- `max_resume` (int, optional): Maximum number of resume attempts in case of streaming failure. Defaults to 5.

Returns:
- `ObjectFile`: A file-like object that can be used to read the object content.

Raises:
- `requests.RequestException`: An ambiguous exception occurred while handling the request.
- `requests.ConnectionError`: A connection error occurred.
- `requests.ConnectionTimeout`: The connection to AIStore timed out.
- `requests.ReadTimeout`: Waiting for a response from AIStore timed out.
- `requests.exceptions.HTTPError(404)`: The object does not exist.
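Since ObjectFile extends BufferedIOBase, it can be used as a context manager. A sketch (object name is a placeholder):

```python
# Stream a large object with up to 3 automatic resume attempts on stream errors
with bucket.object("big/blob.bin").get().as_file(max_resume=3) as f:
    header = f.read(4096)  # fixed-size read
    rest = f.read()        # read to EOF
```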
def iter_from_position(start_position: int = 0) -> Iterator[bytes]
Make a request to get a stream from the provided object starting at a specific byte position and yield chunks of the stream content.
Arguments:
- `start_position` (int, optional): The byte position to start reading from. Defaults to 0.

Returns:
- `Iterator[bytes]`: An iterator over each chunk of bytes in the object starting from the specific position.
def __iter__() -> Iterator[bytes]
Make a request to get a stream from the provided object and yield chunks of the stream content.
Returns:
- `Iterator[bytes]`: An iterator over each chunk of bytes in the object.
class SimpleBuffer()
A buffer for efficiently handling streamed data with position tracking.
It stores incoming chunks of data in a bytearray and tracks the current read position. Once data is read, it is discarded from the buffer to free memory, ensuring efficient usage.
def __len__()
Return the number of unread bytes in the buffer.
Returns:
- `int`: The number of unread bytes remaining in the buffer.
def read(size: int = -1) -> bytes
Read bytes from the buffer and advance the read position.
Arguments:
- `size` (int, optional): Number of bytes to read from the buffer. If -1, reads all remaining bytes.

Returns:
- `bytes`: The data read from the buffer.
def fill(source: Iterator[bytes], size: int = -1)
Fill the buffer with data from the source, up to the specified size.
Arguments:
- `source` (Iterator[bytes]): The data source (chunks).
- `size` (int, optional): The target size to fill the buffer up to. Default is -1 for unlimited.

Returns:
- `int`: Number of bytes in the buffer.
def empty()
Empty the buffer.
class ObjectFile(BufferedIOBase)
A file-like object for reading object data, with support for both reading a fixed size of data and reading until the end of the stream (EOF). It provides the ability to resume and continue reading from the last known position in the event of a ChunkedEncodingError.
Data is fetched in chunks via the object reader iterator and temporarily stored in an internal buffer. The buffer is filled either to the required size or until EOF is reached. If a ChunkedEncodingError occurs during this process, ObjectFile catches it and automatically attempts to resume the buffer filling process from the last known chunk position. The number of resume attempts is tracked across the entire object file; if the total number of attempts exceeds the configurable max_resume, a ChunkedEncodingError is raised.
Once the buffer is adequately filled, the read() method reads and returns the requested amount of data from the buffer.
Arguments:
- `content_iterator` (ContentIterator): An iterator that can fetch object data from AIS in chunks.
- `max_resume` (int): Maximum number of retry attempts in case of a streaming failure.
def close() -> None
Close the file and release resources.
Raises:
- `ValueError`: I/O operation on closed file.
def tell() -> int
Return the current file position.
Returns:
The current file position.

Raises:
- `ValueError`: I/O operation on closed file.
def readable() -> bool
Return whether the file is readable.
Returns:
True if the file is readable, False otherwise.

Raises:
- `ValueError`: I/O operation on closed file.
def seekable() -> bool
Return whether the file supports seeking.
Returns:
False since the file does not support seeking.
def read(size=-1)
Read bytes from the object, handling retries in case of stream errors.
Arguments:
- `size` (int, optional): Number of bytes to read. If -1, reads until the end of the stream.

Returns:
- `bytes`: The data read from the object.
class ObjectProps(ObjectAttributes)
Represents the attributes parsed from the response headers returned from an API call to get an object. Extends ObjectAttributes and is a superset of that class.
Arguments:
- `response_headers` (CaseInsensitiveDict, optional): Response header dict containing object attributes.
@property
def bucket_name()
Name of object's bucket
@property
def bucket_provider()
Provider of object's bucket.
@property
def name() -> str
Name of the object.
@property
def location() -> str
Location of the object.
@property
def mirror_paths() -> List[str]
List of mirror paths.
@property
def mirror_copies() -> int
Number of mirror copies.
@property
def present() -> bool
True if object is present in cluster.
class ObjectAttributes()
Represents the attributes parsed from the response headers returned from an API call to get an object.
Arguments:
- `response_headers` (CaseInsensitiveDict): Response header dict containing object attributes.
@property
def size() -> int
Size of object content.
@property
def checksum_type() -> str
Type of checksum, e.g. xxhash or md5.
@property
def checksum_value() -> str
Checksum value.
@property
def access_time() -> str
Time this object was accessed.
@property
def obj_version() -> str
Object version.
@property
def custom_metadata() -> Dict[str, str]
Dictionary of custom metadata.