Deployed 6e44998 with MkDocs version: 1.6.0
windsonsea committed Nov 15, 2024
1 parent 8f6fa39 commit 89a7dd8
Showing 52 changed files with 103 additions and 104 deletions.
2 changes: 1 addition & 1 deletion admin/kpanda/backup/index.html
Original file line number Diff line number Diff line change
@@ -532,7 +532,7 @@ <h1 id="_1">Backup and Restore<a class="headerlink" href="#_1" title="Permanent link">
<li>
<p>Application Backup</p>
<p>Application backup refers to backing up the data of a specific workload in the cluster and then restoring that workload's data to the same or another cluster. It supports backing up all resources under an entire namespace, as well as filtering by label selector to back up only resources with specific labels.</p>
<p>Application backup supports cross-cluster backup of stateful applications. For detailed steps, refer to <a href="../../best-practice/backup-mysql-on-nfs.md">Cross-Cluster Backup and Restore of MySQL Applications and Data</a></p>
<p>Application backup supports cross-cluster backup of stateful applications. For detailed steps, refer to <a href="../best-practice/backup-mysql-on-nfs.html">Cross-Cluster Backup and Restore of MySQL Applications and Data</a></p>
</li>
<li>
<p>ETCD Backup</p>
2 changes: 1 addition & 1 deletion en/admin/ghippo/report-billing/index.html
@@ -483,7 +483,7 @@ <h1 id="operation-management">Operation Management<a class="headerlink" href="#o
<p>You need to <a href="./gmagpie-offline-install.md">install or upgrade the Operations Management module</a> first, and then you can experience report management and billing metering.</p>
<h2 id="report-management">Report Management<a class="headerlink" href="#report-management" title="Permanent link"></a></h2>
<p>Report Management provides data statistics for cluster, node, pods, workspace, and namespace across
five dimensions: CPU Utilization, Memory Utilization, Storage Utilization, GPU Computing Power Utilization,
five dimensions: CPU Utilization, Memory Utilization, Storage Utilization, GPU Utilization,
and GPU Memory Utilization. It also integrates with the audit and alert modules to support the statistical
management of audit and alert data, supporting a total of seven types of reports.</p>
<h2 id="accounting-billing">Accounting &amp; Billing<a class="headerlink" href="#accounting-billing" title="Permanent link"></a></h2>
2 changes: 1 addition & 1 deletion en/admin/kpanda/gpu/Iluvatar_usage.html
@@ -646,7 +646,7 @@ <h2 id="procedure">Procedure<a class="headerlink" href="#procedure" title="Perma
<h3 id="configuration-via-user-interface">Configuration via User Interface<a class="headerlink" href="#configuration-via-user-interface" title="Permanent link"></a></h3>
<ol>
<li>
<p>Check if the GPU card in the cluster has been detected. Click <strong>Clusters</strong> -&gt; <strong>Cluster Settings</strong> -&gt; <strong>Addon Plugins</strong> , and check if the proper GPU type has been automatically enabled and detected.
<p>Check if the GPU in the cluster has been detected. Click <strong>Clusters</strong> -&gt; <strong>Cluster Settings</strong> -&gt; <strong>Addon Plugins</strong> , and check if the proper GPU type has been automatically enabled and detected.
Currently, the cluster will automatically enable <strong>GPU</strong> and set the GPU type as <strong>Iluvatar</strong> .</p>
</li>
<li>
2 changes: 1 addition & 1 deletion en/admin/kpanda/gpu/ascend/Ascend_usage.html
@@ -422,7 +422,7 @@ <h2 id="quick-start">Quick Start<a class="headerlink" href="#quick-start" title=
<h2 id="ui-usage">UI Usage<a class="headerlink" href="#ui-usage" title="Permanent link"></a></h2>
<ol>
<li>
<p>Confirm whether the cluster has detected the GPU card. Click <strong>Clusters</strong> -&gt; <strong>Cluster Settings</strong> -&gt; <strong>Addon Plugins</strong> ,
<p>Confirm whether the cluster has detected the GPU. Click <strong>Clusters</strong> -&gt; <strong>Cluster Settings</strong> -&gt; <strong>Addon Plugins</strong> ,
and check whether the proper GPU type is automatically enabled and detected.
Currently, the cluster will automatically enable <strong>GPU</strong> and set the <strong>GPU</strong> type to <strong>Ascend</strong> .</p>
<p><img alt="Cluster Settings" src="https://docs.daocloud.io/daocloud-docs-images/docs/zh/docs/kpanda/gpu/images/cluster-setting-ascend-gpu.jpg"/></p>
2 changes: 1 addition & 1 deletion en/admin/kpanda/gpu/index.html
@@ -591,7 +591,7 @@ <h2 id="introduction-to-gpu-capabilities">Introduction to GPU Capabilities<a cla
<li>Compatibility with various training frameworks such as TensorFlow and PyTorch.</li>
</ul>
<h2 id="introduction-to-gpu-operator">Introduction to GPU Operator<a class="headerlink" href="#introduction-to-gpu-operator" title="Permanent link"></a></h2>
<p>Similar to regular computer hardware, NVIDIA GPUs, as physical devices, need to have the NVIDIA GPU driver installed in order to be used. To reduce the cost of using GPUs on Kubernetes, NVIDIA provides the NVIDIA GPU Operator component to manage various components required for using NVIDIA GPUs. These components include the NVIDIA driver (for enabling CUDA), NVIDIA container runtime, GPU node labeling, DCGM-based monitoring, and more. In theory, users only need to plug the GPU card into a compute device managed by Kubernetes, and they can use all the capabilities of NVIDIA GPUs through the GPU Operator. For more information about NVIDIA GPU Operator, refer to the <a href="https://docs.nvidia.com/datacenter/cloud-native/gpu-operator/latest/index.html">NVIDIA official documentation</a>. For deployment instructions, refer to <a href="nvidia/install_nvidia_driver_of_operator.html">Offline Installation of GPU Operator</a>.</p>
<p>Similar to regular computer hardware, NVIDIA GPUs, as physical devices, need to have the NVIDIA GPU driver installed in order to be used. To reduce the cost of using GPUs on Kubernetes, NVIDIA provides the NVIDIA GPU Operator component to manage various components required for using NVIDIA GPUs. These components include the NVIDIA driver (for enabling CUDA), NVIDIA container runtime, GPU node labeling, DCGM-based monitoring, and more. In theory, users only need to plug the GPU into a compute device managed by Kubernetes, and they can use all the capabilities of NVIDIA GPUs through the GPU Operator. For more information about NVIDIA GPU Operator, refer to the <a href="https://docs.nvidia.com/datacenter/cloud-native/gpu-operator/latest/index.html">NVIDIA official documentation</a>. For deployment instructions, refer to <a href="nvidia/install_nvidia_driver_of_operator.html">Offline Installation of GPU Operator</a>.</p>
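<p>The components listed above map roughly onto toggles in the NVIDIA gpu-operator Helm chart. A minimal sketch (key names are assumptions based on common chart versions — verify against the chart's own values before use):</p>

```yaml
# Illustrative values for the NVIDIA gpu-operator Helm chart.
# Key names are assumptions — confirm with
# `helm show values nvidia/gpu-operator` for your chart version.
driver:
  enabled: true       # NVIDIA driver container (enables CUDA on the node)
toolkit:
  enabled: true       # NVIDIA container toolkit / runtime integration
devicePlugin:
  enabled: true       # exposes GPUs as schedulable extended resources
dcgmExporter:
  enabled: true       # DCGM-based GPU monitoring metrics
gfd:
  enabled: true       # GPU Feature Discovery: GPU node labeling
```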
<p>Architecture diagram of NVIDIA GPU Operator:</p>
</article>
</div>
6 changes: 3 additions & 3 deletions en/admin/kpanda/gpu/metax/usemetax.html
@@ -662,7 +662,7 @@
<div class="md-content" data-md-component="content">
<article class="md-content__inner md-typeset">
<h1 id="metax-gpu-component-installation-and-usage">MetaX GPU Component Installation and Usage<a class="headerlink" href="#metax-gpu-component-installation-and-usage" title="Permanent link"></a></h1>
<p>This chapter provides installation guidance for MetaX's gpu-extensions, gpu-operator, and other components, as well as usage methods for both the full GPU card and vGPU modes.</p>
<p>This chapter provides installation guidance for MetaX's gpu-extensions, gpu-operator, and other components, as well as usage methods for both the full GPU and vGPU modes.</p>
<h2 id="prerequisites">Prerequisites<a class="headerlink" href="#prerequisites" title="Permanent link"></a></h2>
<ol>
<li>The required tar package has been downloaded and installed from the <a href="https://sw-download.metax-tech.com/software-list">MetaX Software Center</a>. This article uses metax-gpu-k8s-package.0.7.10.tar.gz as an example.</li>
@@ -671,8 +671,8 @@ <h2 id="prerequisites">Prerequisites<a class="headerlink" href="#prerequisites"
<h2 id="component-introduction">Component Introduction<a class="headerlink" href="#component-introduction" title="Permanent link"></a></h2>
<p>Metax provides two helm-chart packages: metax-extensions and gpu-operator. Depending on the usage scenario, different components can be selected for installation.</p>
<ol>
<li>Metax-extensions: Includes two components, gpu-device and gpu-label. When using the Metax-extensions solution, the user's application container image needs to be built based on the MXMACA® base image. Moreover, Metax-extensions is only suitable for scenarios using the full GPU card.</li>
<li>gpu-operator: Includes components such as gpu-device, gpu-label, driver-manager, container-runtime, and operator-controller. When using the gpu-operator solution, users can choose to create application container images that do not include the MXMACA® SDK. The gpu-operator is suitable for both full GPU card and vGPU scenarios.</li>
<li>Metax-extensions: Includes two components, gpu-device and gpu-label. When using the Metax-extensions solution, the user's application container image needs to be built based on the MXMACA® base image. Moreover, Metax-extensions is only suitable for scenarios using the full GPU.</li>
<li>gpu-operator: Includes components such as gpu-device, gpu-label, driver-manager, container-runtime, and operator-controller. When using the gpu-operator solution, users can choose to create application container images that do not include the MXMACA® SDK. The gpu-operator is suitable for both full GPU and vGPU scenarios.</li>
</ol>
<h2 id="operation-steps">Operation Steps<a class="headerlink" href="#operation-steps" title="Permanent link"></a></h2>
<ol>
16 changes: 8 additions & 8 deletions en/admin/kpanda/gpu/mlu/use-mlu.html
@@ -484,9 +484,9 @@
</a>
</li>
<li class="md-nav__item">
<a class="md-nav__link" href="#using-cambricon-in-suanfeng-ai-computing-platform">
<a class="md-nav__link" href="#using-cambricon-in-suanova-ai-computing-platform">
<span class="md-ellipsis">
Using Cambricon in SuanFeng AI Computing Platform
Using Cambricon in Suanova AI Computing Platform
</span>
</a>
</li>
@@ -605,9 +605,9 @@
</a>
</li>
<li class="md-nav__item">
<a class="md-nav__link" href="#using-cambricon-in-suanfeng-ai-computing-platform">
<a class="md-nav__link" href="#using-cambricon-in-suanova-ai-computing-platform">
<span class="md-ellipsis">
Using Cambricon in SuanFeng AI Computing Platform
Using Cambricon in Suanova AI Computing Platform
</span>
</a>
</li>
@@ -626,18 +626,18 @@
<div class="md-content" data-md-component="content">
<article class="md-content__inner md-typeset">
<h1 id="using-cambricon-gpu">Using Cambricon GPU<a class="headerlink" href="#using-cambricon-gpu" title="Permanent link"></a></h1>
<p>This article introduces how to use Cambricon GPU in the SuanFeng AI computing platform.</p>
<p>This article introduces how to use Cambricon GPU in the Suanova AI computing platform.</p>
<h2 id="prerequisites">Prerequisites<a class="headerlink" href="#prerequisites" title="Permanent link"></a></h2>
<ul>
<li>The SuanFeng AI computing platform's container management platform has been deployed and is running normally.</li>
<li>The Suanova AI computing platform's container management platform has been deployed and is running normally.</li>
<li>The container management module has either <a href="../../clusters/integrate-cluster.html">integrated with a Kubernetes cluster</a> or <a href="../../clusters/create-cluster.html">created a Kubernetes cluster</a>, and is able to access the cluster's UI interface.</li>
<li>The current cluster has installed the Cambricon firmware, drivers, and DevicePlugin components. For installation details, please refer to the official documentation:<ul>
<li><a href="https://www.cambricon.com/docs/sdk_1.15.0/driver_5.10.22/user_guide/index.html">Driver Firmware Installation</a></li>
<li><a href="https://github.com/Cambricon/cambricon-k8s-device-plugin/blob/master/device-plugin/README.md">DevicePlugin Installation</a></li>
</ul>
</li>
</ul>
<p>When installing DevicePlugin, please disable the <strong>--enable-device-type</strong> parameter; otherwise, the SuanFeng AI computing platform will not be able to correctly recognize the Cambricon GPU.</p>
<p>When installing DevicePlugin, please disable the <strong>--enable-device-type</strong> parameter; otherwise, the Suanova AI computing platform will not be able to correctly recognize the Cambricon GPU.</p>
<h2 id="introduction-to-cambricon-gpu-modes">Introduction to Cambricon GPU Modes<a class="headerlink" href="#introduction-to-cambricon-gpu-modes" title="Permanent link"></a></h2>
<p>Cambricon GPUs have the following modes:</p>
<ul>
@@ -646,7 +646,7 @@ <h2 id="introduction-to-cambricon-gpu-modes">Introduction to Cambricon GPU Modes
<li>Dynamic SMLU Mode: Further refines resource allocation, allowing control over the size of memory and computing power allocated to containers.</li>
<li>MIM Mode: Allows the Cambricon GPU to be divided into multiple GPUs of fixed specifications for use.</li>
</ul>
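<p>Once the device plugin is running (with <strong>--enable-device-type</strong> disabled, as noted above), a workload requests Cambricon devices through Kubernetes extended resources. A minimal sketch, assuming the plugin's default whole-card resource name <code>cambricon.com/mlu</code> — resource names vary with plugin configuration, so check the DevicePlugin README linked above:</p>

```yaml
# Hypothetical pod spec requesting one whole Cambricon MLU.
# The resource name `cambricon.com/mlu` is an assumption tied to the
# device plugin's default configuration; sMLU/MIM modes expose
# different resource names.
apiVersion: v1
kind: Pod
metadata:
  name: mlu-demo
spec:
  containers:
    - name: app
      image: ubuntu:22.04
      command: ["sleep", "infinity"]
      resources:
        limits:
          cambricon.com/mlu: 1   # whole-card allocation
```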
<h2 id="using-cambricon-in-suanfeng-ai-computing-platform">Using Cambricon in SuanFeng AI Computing Platform<a class="headerlink" href="#using-cambricon-in-suanfeng-ai-computing-platform" title="Permanent link"></a></h2>
<h2 id="using-cambricon-in-suanova-ai-computing-platform">Using Cambricon in Suanova AI Computing Platform<a class="headerlink" href="#using-cambricon-in-suanova-ai-computing-platform" title="Permanent link"></a></h2>
<p>Here, we take the Dynamic SMLU mode as an example:</p>
<ol>
<li>
8 changes: 4 additions & 4 deletions en/admin/kpanda/gpu/nvidia/full_gpu_userguide.html
@@ -26,7 +26,7 @@
<input autocomplete="off" class="md-toggle" data-md-toggle="search" id="__search" type="checkbox"/>
<label class="md-overlay" for="__drawer"></label>
<div data-md-component="skip">
<a class="md-skip" href="#using-the-whole-nvidia-gpu-card-for-an-application">
<a class="md-skip" href="#using-the-whole-nvidia-gpu-for-an-application">
Skip to content
</a>
</div>
@@ -714,14 +714,14 @@
</div>
<div class="md-content" data-md-component="content">
<article class="md-content__inner md-typeset">
<h1 id="using-the-whole-nvidia-gpu-card-for-an-application">Using the Whole NVIDIA GPU Card for an Application<a class="headerlink" href="#using-the-whole-nvidia-gpu-card-for-an-application" title="Permanent link"></a></h1>
<p>This section describes how to allocate the entire NVIDIA GPU card to a single application on the AI platform.</p>
<h1 id="using-the-whole-nvidia-gpu-for-an-application">Using the Whole NVIDIA GPU for an Application<a class="headerlink" href="#using-the-whole-nvidia-gpu-for-an-application" title="Permanent link"></a></h1>
<p>This section describes how to allocate the entire NVIDIA GPU to a single application on the AI platform.</p>
<h2 id="prerequisites">Prerequisites<a class="headerlink" href="#prerequisites" title="Permanent link"></a></h2>
<ul>
<li>AI platform container management platform has been <a href="https://docs.daocloud.io/install/index.html">deployed</a> and is running properly.</li>
<li>The container management module has been <a href="../../clusters/integrate-cluster.html">connected to a Kubernetes cluster</a> or a Kubernetes cluster has been <a href="../../clusters/create-cluster.html">created</a>, and you can access the UI interface of the cluster.</li>
<li>GPU Operator has been offline installed and NVIDIA DevicePlugin has been enabled on the current cluster. Refer to <a href="install_nvidia_driver_of_operator.html">Offline Installation of GPU Operator</a> for instructions.</li>
<li>The GPU card in the current cluster has not undergone any virtualization operations or been occupied by other applications.</li>
<li>The GPU in the current cluster has not undergone any virtualization operations or been occupied by other applications.</li>
</ul>
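<p>With the prerequisites in place, allocating a whole GPU comes down to requesting the <code>nvidia.com/gpu</code> extended resource exposed by the NVIDIA device plugin. A minimal sketch (image and command are illustrative):</p>

```yaml
# Pod requesting one entire NVIDIA GPU via the standard
# `nvidia.com/gpu` extended resource exposed by the NVIDIA device plugin.
apiVersion: v1
kind: Pod
metadata:
  name: full-gpu-demo
spec:
  containers:
    - name: cuda-app
      image: nvcr.io/nvidia/cuda:12.2.0-base-ubuntu22.04
      command: ["nvidia-smi"]
      resources:
        limits:
          nvidia.com/gpu: 1   # whole card; no sharing or virtualization
```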
<h2 id="procedure">Procedure<a class="headerlink" href="#procedure" title="Permanent link"></a></h2>
<h3 id="configuring-via-the-user-interface">Configuring via the User Interface<a class="headerlink" href="#configuring-via-the-user-interface" title="Permanent link"></a></h3>
4 changes: 2 additions & 2 deletions en/admin/kpanda/gpu/nvidia/index.html
@@ -26,7 +26,7 @@
<input autocomplete="off" class="md-toggle" data-md-toggle="search" id="__search" type="checkbox"/>
<label class="md-overlay" for="__drawer"></label>
<div data-md-component="skip">
<a class="md-skip" href="#nvidia-gpu-card-usage-modes">
<a class="md-skip" href="#nvidia-gpu-usage-modes">
Skip to content
</a>
</div>
@@ -642,7 +642,7 @@
</div>
<div class="md-content" data-md-component="content">
<article class="md-content__inner md-typeset">
<h1 id="nvidia-gpu-card-usage-modes">NVIDIA GPU Card Usage Modes<a class="headerlink" href="#nvidia-gpu-card-usage-modes" title="Permanent link"></a></h1>
<h1 id="nvidia-gpu-usage-modes">NVIDIA GPU Usage Modes<a class="headerlink" href="#nvidia-gpu-usage-modes" title="Permanent link"></a></h1>
<p>NVIDIA, as a well-known graphics computing provider, offers various software and hardware solutions to enhance computational power. Among them, NVIDIA provides the following three solutions for GPU usage:</p>
<h4 id="full-gpu">Full GPU<a class="headerlink" href="#full-gpu" title="Permanent link"></a></h4>
<p>Full GPU refers to allocating the entire NVIDIA GPU to a single user or application. In this configuration, the application can fully occupy all the resources of the GPU and achieve maximum computational performance. Full GPU is suitable for workloads that require a large amount of computational resources and memory, such as deep learning training, scientific computing, etc.</p>
@@ -803,7 +803,7 @@ <h1 id="offline-install-gpu-operator">Offline Install gpu-operator<a class="head
<h2 id="prerequisites">Prerequisites<a class="headerlink" href="#prerequisites" title="Permanent link"></a></h2>
<ul>
<li>The kernel version of the cluster nodes where the gpu-operator is to be deployed must be
completely consistent. The distribution and GPU card model of the nodes must fall within
completely consistent. The distribution and GPU model of the nodes must fall within
the scope specified in the <a href="../gpu_matrix.html">GPU Support Matrix</a>.</li>
<li>When installing the gpu-operator, select v23.9.0+2 or above.</li>
</ul>
2 changes: 1 addition & 1 deletion en/admin/kpanda/gpu/nvidia/mig/create_mig.html
@@ -768,7 +768,7 @@ <h1 id="enabling-mig-features">Enabling MIG Features<a class="headerlink" href="
<li><strong>Single mode</strong> : Nodes expose a single type of MIG device on all their GPUs.</li>
<li><strong>Mixed mode</strong> : Nodes expose a mixture of MIG device types on all their GPUs.</li>
</ul>
<p>For more details, refer to the <a href="../index.html">NVIDIA GPU Card Usage Modes</a>.</p>
<p>For more details, refer to the <a href="../index.html">NVIDIA GPU Usage Modes</a>.</p>
<h2 id="prerequisites">Prerequisites<a class="headerlink" href="#prerequisites" title="Permanent link"></a></h2>
<ul>
<li>Check the system requirements for the GPU driver installation on the target node: <a href="../../gpu_matrix.html">GPU Support Matrix</a></li>