[next gen] Support deploy on Azure #10712

@JaySon-Huang

Feature Request

Is your feature request related to a problem? Please describe:

Describe the feature you'd like:

A part of #10205

Support deploying the TiFlash disaggregated architecture on Azure, so that read/write workloads with TiFlash replicas work well. TiFlash needs to support accessing Azure Blob Storage using managed identities.

Environment variables for tikv-next-gen connecting to Azure:

DFS_PREFIX:              cse
DFS_S3_BUCKET:           tidbcloud123abc/321efg-westus3-data
DFS_S3_REGION:           westus3
DFS_S3_ENDPOINT:         tidbcloud123abc.blob.core.windows.net
DFS_BACKEND:             azure
AZURE_CLIENT_ID:         xxxxxxxx-xxxx-xxxx-xxxx-xxxxxxxxxxxx
DFS_AZURE_ACCOUNT_NAME:  tidbcloud123abc
DFS_AZURE_CONTAINER:     321efg-westus3-data
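
As a rough illustration of how a component could consume the variables above, here is a minimal C++ sketch. The struct `AzureDfsEnvConfig` and the helpers `readEnv`, `requireEnv`, and `loadAzureDfsEnvConfig` are hypothetical names, not existing TiFlash or TiKV code. Which variables are mandatory is also an assumption made for the example: here `AZURE_CLIENT_ID` and `DFS_PREFIX` are treated as optional and the rest as required.

```cpp
#include <cassert>
#include <cstdlib>
#include <optional>
#include <stdexcept>
#include <string>

// Hypothetical holder for the env variables listed in this issue.
struct AzureDfsEnvConfig
{
    std::string prefix;                   // DFS_PREFIX (assumed optional)
    std::string backend;                  // DFS_BACKEND, expected to be "azure"
    std::string endpoint;                 // DFS_S3_ENDPOINT
    std::string account_name;             // DFS_AZURE_ACCOUNT_NAME
    std::string container;                // DFS_AZURE_CONTAINER
    std::optional<std::string> client_id; // AZURE_CLIENT_ID (assumed optional)
};

// Returns the value of an env variable, or nullopt if it is unset or empty.
inline std::optional<std::string> readEnv(const char * name)
{
    const char * value = std::getenv(name);
    if (value == nullptr || *value == '\0')
        return std::nullopt;
    return std::string(value);
}

// Same as readEnv, but throws if the variable is missing.
inline std::string requireEnv(const char * name)
{
    auto value = readEnv(name);
    if (!value)
        throw std::runtime_error(std::string("missing env variable: ") + name);
    return *value;
}

// Loads the Azure-related settings; the required/optional split is an assumption.
inline AzureDfsEnvConfig loadAzureDfsEnvConfig()
{
    AzureDfsEnvConfig cfg;
    cfg.backend = requireEnv("DFS_BACKEND");
    if (cfg.backend != "azure")
        throw std::runtime_error("DFS_BACKEND is not azure: " + cfg.backend);
    cfg.prefix = readEnv("DFS_PREFIX").value_or("");
    cfg.endpoint = requireEnv("DFS_S3_ENDPOINT");
    cfg.account_name = requireEnv("DFS_AZURE_ACCOUNT_NAME");
    cfg.container = requireEnv("DFS_AZURE_CONTAINER");
    cfg.client_id = readEnv("AZURE_CLIENT_ID");
    return cfg;
}
```

Failing fast on a missing or mismatched variable keeps misconfiguration visible at startup instead of surfacing later as an authentication error.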

Sub Tasks:

  • Support accessing Azure Blob Storage for tiflash-proxy-next-gen
  • Support accessing Azure Blob Storage for the TiFlash disaggregated architecture
    • Goal: TiFlash currently depends heavily on aws-sdk-cpp to access object storage. However, Azure Blob Storage does not provide an S3-compatible API. So we need to add an abstraction over how TiFlash manipulates objects on object storage services such as AWS S3 and Azure Blob Storage (and maybe Google Cloud Storage in the future), and then implement the logic for manipulating objects on Azure Blob Storage.
    • Add configurations for Azure Blob Storage. We need to support the configurations connection_string, account_name, account_key, container, and endpoint. To support accessing Azure Blob Storage using managed identities, we likely need to support loading the env variables AZURE_CLIENT_ID, AZURE_TENANT_ID, AZURE_CLIENT_SECRET, AZURE_STORAGE_ACCOUNT, and AZURE_STORAGE_KEY as credentials.
    • Add an abstraction layer for manipulating objects on object storage services. Currently most APIs for accessing objects on object storage services are defined in dbms/src/Storages/S3/S3Common.h and dbms/src/Storages/S3/S3RandomAccessFile.h.
    • Implement the logic of manipulating the objects on Azure Blob Storage.
      • Consider using azure-sdk-for-cpp or the C++ bindings of OpenDAL
        • With OpenDAL, TiFlash could adapt to Google Cloud Storage with fewer changes if needed in the future.
        • To keep the current logic for accessing AWS S3 stable, we'd better keep using aws-sdk-cpp for AWS S3 even if we use OpenDAL for Azure Blob Storage in the first release.
        • Note that to overcome the inefficiency of AWS SDK C++'s default use of curl for HTTP access, TiFlash ports dbms/src/Storages/S3/PocoHTTPClient.h and replaces the curl HTTP client in AWS SDK C++ with the Poco network library. This should not be needed for the Azure Blob Storage SDK, but we should consider porting the necessary logging and HTTP metrics that were added in PocoHTTPClient.
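
To make the abstraction-layer sub-task more concrete, here is one possible shape for it, sketched in C++. Everything in it is an assumption for discussion: `ObjectStorageClient`, `InMemoryObjectStorageClient`, and `MeteredObjectStorageClient` are hypothetical names and do not reflect the current contents of S3Common.h or S3RandomAccessFile.h. The in-memory class only stands in for a real backend (aws-sdk-cpp, azure-sdk-for-cpp, or OpenDAL), and the metered wrapper hints at how request metrics could be kept backend-independent instead of living inside an HTTP client.

```cpp
#include <cassert>
#include <cstddef>
#include <map>
#include <memory>
#include <optional>
#include <string>
#include <utility>
#include <vector>

// Hypothetical backend-neutral interface; method names are illustrative only.
class ObjectStorageClient
{
public:
    virtual ~ObjectStorageClient() = default;
    virtual void putObject(const std::string & key, std::string data) = 0;
    // Returns up to `length` bytes starting at `offset`, or nullopt if the key
    // is absent or the offset is past the end of the object.
    virtual std::optional<std::string> getObject(const std::string & key, std::size_t offset, std::size_t length) = 0;
    virtual std::vector<std::string> listObjects(const std::string & prefix) = 0;
    virtual bool deleteObject(const std::string & key) = 0;
};

// In-memory stand-in for a real backend such as AWS S3 or Azure Blob Storage.
class InMemoryObjectStorageClient final : public ObjectStorageClient
{
public:
    void putObject(const std::string & key, std::string data) override { objects[key] = std::move(data); }

    std::optional<std::string> getObject(const std::string & key, std::size_t offset, std::size_t length) override
    {
        auto it = objects.find(key);
        if (it == objects.end() || offset > it->second.size())
            return std::nullopt;
        return it->second.substr(offset, length);
    }

    std::vector<std::string> listObjects(const std::string & prefix) override
    {
        std::vector<std::string> keys;
        // Keys are sorted, so all keys sharing the prefix are contiguous.
        for (auto it = objects.lower_bound(prefix); it != objects.end(); ++it)
        {
            if (it->first.compare(0, prefix.size(), prefix) != 0)
                break;
            keys.push_back(it->first);
        }
        return keys;
    }

    bool deleteObject(const std::string & key) override { return objects.erase(key) > 0; }

private:
    std::map<std::string, std::string> objects;
};

// Decorator that counts requests regardless of which backend is wrapped.
class MeteredObjectStorageClient final : public ObjectStorageClient
{
public:
    explicit MeteredObjectStorageClient(std::unique_ptr<ObjectStorageClient> inner_) : inner(std::move(inner_)) {}

    void putObject(const std::string & key, std::string data) override
    {
        ++write_requests;
        inner->putObject(key, std::move(data));
    }
    std::optional<std::string> getObject(const std::string & key, std::size_t offset, std::size_t length) override
    {
        ++read_requests;
        return inner->getObject(key, offset, length);
    }
    std::vector<std::string> listObjects(const std::string & prefix) override
    {
        ++read_requests;
        return inner->listObjects(prefix);
    }
    bool deleteObject(const std::string & key) override
    {
        ++write_requests;
        return inner->deleteObject(key);
    }

    std::size_t read_requests = 0;
    std::size_t write_requests = 0;

private:
    std::unique_ptr<ObjectStorageClient> inner;
};
```

With an interface along these lines, the existing aws-sdk-cpp path could stay behind one implementation while an Azure implementation is added next to it, and callers would depend only on the interface.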

Describe alternatives you've considered:

Teachability, Documentation, Adoption, Migration Strategy:

Labels: component/storage, nextgen, type/feature-request