Labels: component/storage, nextgen, type/feature-request
Description
Feature Request
Is your feature request related to a problem? Please describe:
Describe the feature you'd like:
A part of #10205
Support deploying the TiFlash disaggregated architecture on Azure, so that read and write workloads with TiFlash replicas work correctly. TiFlash needs to support accessing Azure Blob Storage using managed identities.
Environment variables used by tikv-next-gen to connect to Azure:
DFS_PREFIX: cse
DFS_S3_BUCKET: tidbcloud123abc/321efg-westus3-data
DFS_S3_REGION: westus3
DFS_S3_ENDPOINT: tidbcloud123abc.blob.core.windows.net
DFS_BACKEND: azure
AZURE_CLIENT_ID: xxxxxxxx-xxxx-xxxx-xxxx-xxxxxxxxxxxx
DFS_AZURE_ACCOUNT_NAME: tidbcloud123abc
DFS_AZURE_CONTAINER: 321efg-westus3-data
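As a rough illustration of how a component might consume the variables above, here is a minimal C++ sketch that loads them into a typed struct. `AzureDfsConfig`, `requireVar`, and `parseAzureDfsConfig` are hypothetical names, not part of tikv-next-gen or TiFlash. The check that `DFS_S3_BUCKET` equals `<account>/<container>` is only an assumption drawn from the example values, and treating `AZURE_CLIENT_ID` as optional is likewise an assumption.

```cpp
#include <cassert>
#include <map>
#include <stdexcept>
#include <string>

// Hypothetical holder for the Azure-related settings listed above.
struct AzureDfsConfig
{
    std::string prefix;
    std::string endpoint;
    std::string account_name;
    std::string container;
    std::string client_id; // empty when AZURE_CLIENT_ID is not set
};

// The environment is passed in as a map so the parser is easy to test.
using Env = std::map<std::string, std::string>;

// Return the value of `key`, or throw if it is missing or empty.
inline std::string requireVar(const Env & env, const std::string & key)
{
    auto it = env.find(key);
    if (it == env.end() || it->second.empty())
        throw std::invalid_argument("missing environment variable: " + key);
    return it->second;
}

inline AzureDfsConfig parseAzureDfsConfig(const Env & env)
{
    if (requireVar(env, "DFS_BACKEND") != "azure")
        throw std::invalid_argument("DFS_BACKEND is not 'azure'");

    AzureDfsConfig cfg;
    cfg.prefix = requireVar(env, "DFS_PREFIX");
    cfg.endpoint = requireVar(env, "DFS_S3_ENDPOINT");
    cfg.account_name = requireVar(env, "DFS_AZURE_ACCOUNT_NAME");
    cfg.container = requireVar(env, "DFS_AZURE_CONTAINER");
    assert(!cfg.account_name.empty() && !cfg.container.empty());

    // Assumption: AZURE_CLIENT_ID is optional and only selects an identity.
    if (auto it = env.find("AZURE_CLIENT_ID"); it != env.end())
        cfg.client_id = it->second;

    // Assumption taken from the example values: the bucket is "<account>/<container>".
    if (requireVar(env, "DFS_S3_BUCKET") != cfg.account_name + "/" + cfg.container)
        throw std::invalid_argument("DFS_S3_BUCKET does not match account/container");

    return cfg;
}
```

Failing fast on an inconsistent bucket value is a design choice for this sketch only; the real components may well accept other layouts.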
Sub Tasks:
- Support accessing Azure Blob Storage in tiflash-proxy-next-gen
  - Goal: Let tiflash-proxy-next-gen fetch the SSTs and other files that are generated by TiKV and stored on Azure Blob Storage.
  - Try to port the tikv-next-gen changes that added Azure Blob Storage support to tiflash-proxy-next-gen: https://github.com/tidbcloud/cloud-storage-engine/pull/4285
- Support accessing Azure Blob Storage in the TiFlash disaggregated architecture
  - Goal: TiFlash currently depends heavily on aws-sdk-cpp to access object storage. However, Azure Blob Storage does not provide an S3-compatible API. So we need to add an abstraction over how TiFlash manipulates objects on object storage services such as AWS S3 and Azure Blob Storage (and maybe Google Cloud Storage in the future), and then implement the logic for manipulating objects on Azure Blob Storage.
  - Add configurations for Azure Blob Storage. We need to support the configurations `connection_string`, `account_name`, `account_key`, `container`, and `endpoint`. To support accessing Azure Blob Storage using managed identities, we likely need to support loading the environment variables `AZURE_CLIENT_ID`, `AZURE_TENANT_ID`, `AZURE_CLIENT_SECRET`, `AZURE_STORAGE_ACCOUNT`, and `AZURE_STORAGE_KEY` as credentials.
  - Add an abstraction layer for manipulating objects on object storage services. Currently most APIs for accessing objects on object storage services are defined under `dbms/src/Storages/S3/S3Common.h` and `dbms/src/Storages/S3/S3RandomAccessFile.h`.
  - Implement the logic for manipulating objects on Azure Blob Storage.
    - Consider using azure-sdk-for-cpp or the C++ bindings of OpenDAL.
      - With OpenDAL, TiFlash could adapt to Google Cloud Storage with fewer changes if needed in the future.
      - To keep the current logic for accessing AWS S3 stable, we had better keep using aws-sdk-cpp for AWS S3 even if we use OpenDAL for Azure Blob Storage in the first release.
      - Note that to overcome the inefficiency of AWS SDK C++'s default use of `curl` for HTTP access, TiFlash ports `dbms/src/Storages/S3/PocoHTTPClient.h` and replaces the curl HTTP client in AWS SDK C++ with the Poco network library. This should not be needed with the Azure Blob Storage SDK. But we should consider porting the necessary logging and HTTP metrics that were added in `PocoHTTPClient`.
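
To make the abstraction-layer sub-task more concrete, the following C++ sketch shows one possible shape for a backend-neutral interface. Every name here (`IObjectStorage`, `InMemoryObjectStorage`, `copyObject`) is hypothetical and does not correspond to existing TiFlash code. The in-memory backend only stands in for real backends built on aws-sdk-cpp, azure-sdk-for-cpp, or OpenDAL, and the method set is an assumption about what callers of `S3Common.h` and `S3RandomAccessFile.h` would need.

```cpp
#include <cassert>
#include <cstddef>
#include <map>
#include <optional>
#include <string>
#include <vector>

// Hypothetical backend-neutral interface; callers depend only on this type.
class IObjectStorage
{
public:
    virtual ~IObjectStorage() = default;
    virtual void putObject(const std::string & key, const std::string & data) = 0;
    virtual std::optional<std::string> getObject(const std::string & key) const = 0;
    // Read up to `size` bytes starting at `offset` (random access reads).
    virtual std::optional<std::string> getRange(const std::string & key, std::size_t offset, std::size_t size) const = 0;
    virtual std::vector<std::string> listObjects(const std::string & prefix) const = 0;
    virtual bool deleteObject(const std::string & key) = 0;
};

// Stand-in backend for illustration; a real one would call a cloud SDK.
class InMemoryObjectStorage : public IObjectStorage
{
public:
    void putObject(const std::string & key, const std::string & data) override { objects[key] = data; }

    std::optional<std::string> getObject(const std::string & key) const override
    {
        auto it = objects.find(key);
        if (it == objects.end())
            return std::nullopt;
        return it->second;
    }

    std::optional<std::string> getRange(const std::string & key, std::size_t offset, std::size_t size) const override
    {
        auto it = objects.find(key);
        if (it == objects.end())
            return std::nullopt;
        if (offset >= it->second.size())
            return std::string{}; // reading past the end yields no bytes
        return it->second.substr(offset, size);
    }

    std::vector<std::string> listObjects(const std::string & prefix) const override
    {
        std::vector<std::string> keys;
        // Keys are sorted, so all matches form one contiguous run.
        for (auto it = objects.lower_bound(prefix); it != objects.end(); ++it)
        {
            if (it->first.compare(0, prefix.size(), prefix) != 0)
                break;
            keys.push_back(it->first);
        }
        return keys;
    }

    bool deleteObject(const std::string & key) override { return objects.erase(key) > 0; }

private:
    std::map<std::string, std::string> objects;
};

// Example of backend-agnostic logic: copy one object between two storages.
inline bool copyObject(const IObjectStorage & from, IObjectStorage & to, const std::string & key)
{
    assert(!key.empty());
    auto data = from.getObject(key);
    if (!data)
        return false;
    to.putObject(key, *data);
    return true;
}
```

With an interface of this kind, the existing S3 path could keep using aws-sdk-cpp behind one implementation while an Azure implementation is added alongside it, which matches the stability concern raised in the list above.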
Describe alternatives you've considered:
Teachability, Documentation, Adoption, Migration Strategy: