Merge pull request #94 from sophongo/quick
add end-user/baize/quick-start.md
windsonsea authored Nov 22, 2024
2 parents 243c34e + fcb2a3c commit 151f7ad
Showing 5 changed files with 217 additions and 0 deletions.
Binary file added docs/en/end-user/baize/images/baize-05.png
Binary file added docs/en/end-user/baize/images/baize-07.png
103 changes: 103 additions & 0 deletions docs/en/end-user/baize/quick-start.md
@@ -0,0 +1,103 @@
# Quick Start Guide

This guide walks you through a complete AI Lab workflow, from preparing datasets to debugging in a Notebook and running a training job.

## Preparing Your Dataset

Start by clicking on **Data Management** -> **Datasets**, and then select the **Create** button to set up the three datasets outlined below.

### Dataset: Training Code

- **Code Source:** [https://github.com/samzong/training-sample-code.git](https://github.com/samzong/training-sample-code.git). This repository contains a simple TensorFlow code sample.
- If you're located in China, you can access it more quickly via Gitee: [https://gitee.com/samzong_lu/training-sample-code.git](https://gitee.com/samzong_lu/training-sample-code.git)
- The code can be found at: `tensorflow/tf-fashion-mnist-sample`


!!! note

    Currently, only a `StorageClass` that supports the `ReadWriteMany` access mode can be used. Use NFS or the recommended [JuiceFS](https://juicefs.com/en/).

### Dataset: Training Data

For this training session, we will use the Fashion-MNIST dataset, which can be found at [https://github.com/zalandoresearch/fashion-mnist.git](https://github.com/zalandoresearch/fashion-mnist.git).

If you're in China, you can use Gitee for a quicker download: [https://gitee.com/samzong_lu/fashion-mnist.git](https://gitee.com/samzong_lu/fashion-mnist.git)


!!! note

    If you don't create the training-data dataset beforehand, the training script downloads the data automatically when it runs. Preparing the dataset in advance speeds up training.
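
The check described in the note can be sketched as follows. This is a minimal illustration, not the logic of the sample training script: the mount path `/home/jovyan/data` and the helper name `dataset_is_premounted` are assumptions made for the example, so substitute the path you actually enter when associating the dataset.

```python
from pathlib import Path

# Hypothetical mount path for the training-data dataset (an assumption);
# use the path you entered when associating the dataset.
DATA_MOUNT = Path("/home/jovyan/data")


def dataset_is_premounted(mount: Path = DATA_MOUNT) -> bool:
    """Return True if the mount directory exists and contains at least one entry."""
    return mount.is_dir() and any(mount.iterdir())


if __name__ == "__main__":
    if dataset_is_premounted():
        print(f"Using pre-mounted data in {DATA_MOUNT}")
    else:
        print("No pre-mounted data found; expect a download during training")
```

Running this in a Notebook terminal before training gives a quick indication of whether the run will start from local data or spend time downloading first.
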

### Dataset: Empty Dataset

AI Lab allows you to use `PVC` as the data source type for datasets. Create an empty PVC and bind it to a dataset; this empty dataset can then serve as the output dataset for later training jobs, storing models and logs.


## Environment Dependency: TensorFlow

When running the script, you'll need the `TensorFlow` Python library. You can use AI Lab's environment dependency management feature to download and prepare the necessary Python libraries in advance, eliminating the need for image builds.

> Check out the [Environment Dependency](./dataset/environments.md) guide to add a `CONDA` environment.

```yaml
name: tensorflow
channels:
  - defaults
  - conda-forge
dependencies:
  - python=3.12
  - tensorflow
prefix: /opt/conda/envs/tensorflow
```

!!! note

    After the environment is successfully set up, you only need to mount this environment to the Notebook or training jobs, using the base image provided by AI Lab.

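
Once the environment is mounted, one way to confirm that the interpreter can see the library is a short import check. This is a sketch under the assumption that you run it with the Python interpreter of the mounted environment; the helper name `library_available` is made up for the example.

```python
import importlib.util


def library_available(module_name: str) -> bool:
    """Return True if the named top-level module can be found by this interpreter."""
    return importlib.util.find_spec(module_name) is not None


if __name__ == "__main__":
    if library_available("tensorflow"):
        print("tensorflow is available in this environment")
    else:
        print("tensorflow not found; check that the environment is mounted")
```

The check only locates the module without importing it, so it returns quickly even for a large package.
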

## Using a Notebook to Debug Your Script

Prepare your development environment by clicking **Notebooks** in the navigation bar, then click **Create**.

- Associate the [three datasets](#preparing-your-dataset) you prepared earlier and fill in a mount path for each. Make sure to configure the empty dataset in the output dataset location. The commands in this guide imply that the training code is mounted at `/home/jovyan/code` and the output dataset at `/home/jovyan/model`.
- Select and bind the [environment dependency package](#environment-dependency-tensorflow).

Wait for the Notebook to be created, then click the access link in the list to enter the Notebook.

![Enter Notebook](../images/baize-05.png)

!!! note

    The script uses TensorFlow; if you forgot to associate the dependency library, you can temporarily install it with `pip install tensorflow`.

In the Notebook terminal, run the following command to start the training job:

```shell
python /home/jovyan/code/tensorflow/tf-fashion-mnist-sample/train.py
```
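
If the command fails with a "file not found" error, the code dataset is probably mounted at a different path. A small pre-flight check such as the sketch below can make that explicit; the helper name `build_command` is hypothetical, and the script path is the one used in the command above.

```python
import sys
from pathlib import Path

# Script path taken from the command in this guide.
TRAIN_SCRIPT = Path("/home/jovyan/code/tensorflow/tf-fashion-mnist-sample/train.py")


def build_command(script: Path) -> list[str]:
    """Return the argv list that runs the script with the current interpreter."""
    if not script.is_file():
        raise FileNotFoundError(f"Training script not found: {script}")
    return [sys.executable, str(script)]


if __name__ == "__main__":
    try:
        print("Would run:", " ".join(build_command(TRAIN_SCRIPT)))
    except FileNotFoundError as err:
        # Most likely the code dataset is mounted somewhere else.
        print(err)
```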

## Creating a Training Job

1. Click on **Job Center** -> **Training Jobs** in the navigation bar to create a standalone `TensorFlow` job.
2. Fill in the basic parameters and click **Next**.
3. In the job resource configuration, correctly set up the job resources and click **Next**.

    - **Image:** If you prepared the environment dependency package earlier, you can use the default image. Otherwise, make sure the image includes the `TensorFlow` Python library.
    - **Shell:** Use `bash`.
    - **Enable Command:**

        ```bash
        python /home/jovyan/code/tensorflow/tf-fashion-mnist-sample/train.py
        ```

4. In the advanced configuration, enable **Job Analysis (TensorBoard)**, and click **OK**.

    !!! note

        Logs will be saved in the output dataset at `/home/jovyan/model/train/logs/`.


5. Return to the training job list and wait for the status to change to **Success**. Click on the **┇** icon on the right side of the list to view details, clone jobs, update priority, view logs, and delete jobs, among other options.

6. Once the job is successfully created, click on **Job Analysis** in the left navigation bar to check the job status and fine-tune your training.

![View Job](../images/baize-07.png)
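
To confirm that the job wrote TensorBoard logs to the location named in the note under step 4, you can list the event files from a Notebook that mounts the same output dataset. This is a minimal sketch: it assumes the output dataset is mounted at `/home/jovyan/model`, relies on TensorBoard's standard `events.out.tfevents.*` file naming, and uses a hypothetical helper name `find_event_files`.

```python
from pathlib import Path

# Log directory named in the note under step 4.
LOG_DIR = Path("/home/jovyan/model/train/logs")


def find_event_files(log_dir: Path) -> list[Path]:
    """Return TensorBoard event files found anywhere under log_dir, sorted by path."""
    if not log_dir.is_dir():
        return []
    return sorted(log_dir.rglob("events.out.tfevents.*"))


if __name__ == "__main__":
    files = find_event_files(LOG_DIR)
    print(f"Found {len(files)} event file(s) under {LOG_DIR}")
    for path in files:
        print(" ", path)
```

An empty result usually means the job has not written logs yet or that the output dataset is mounted at a different path.
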
113 changes: 113 additions & 0 deletions docs/end-user/baize/quick-start.md
@@ -0,0 +1,113 @@
# 快速入门

本文提供了简单的操作手册以便用户使用 AI Lab 进行数据集、Notebook、任务训练的整个开发、训练流程。

## 准备数据集

点击 **数据管理** -> **数据集** ,选择 **创建** 按钮,分别创建以下三个数据集。

### 数据集:训练代码

- 代码数据源:[https://github.com/samzong/training-sample-code.git](https://github.com/samzong/training-sample-code.git),主要是一个简单的 Tensorflow 代码。
- 如果是中国境内的用户,可以使用 Gitee 加速:[https://gitee.com/samzong_lu/training-sample-code.git](https://gitee.com/samzong_lu/training-sample-code.git)
- 代码路径为 `tensorflow/tf-fashion-mnist-sample`

![训练代码的数据集](../images/baize-01.png)

!!! note

目前仅支持读写模式为 `ReadWriteMany` 的 `StorageClass`,请使用 NFS 或者推荐的 [JuiceFS](https://juicefs.com/zh-cn/)。

### 数据集:训练数据

本次训练使用的数据为 [https://github.com/zalandoresearch/fashion-mnist.git](https://github.com/zalandoresearch/fashion-mnist.git)
这是 Fashion-MNIST 数据集。

如果是中国境内的用户,可以使用 Gitee 加速:[https://gitee.com/samzong_lu/fashion-mnist.git](https://gitee.com/samzong_lu/fashion-mnist.git)

![训练数据的数据集](../images/baize-02.png)

!!! note

如果未创建训练数据的数据集,通过训练脚本也会自动下载;提前准备训练数据可以提高训练速度。

### 数据集:空数据集

AI Lab 支持将 `PVC` 作为数据集的数据源类型,所以创建一个空 PVC 绑定到数据集后,可将空数据集作为存放后续训练任务的输出数据集,存放模型和日志。

![空数据集](../images/baize-03.png)

## 环境依赖: tensorflow

脚本在运行时,需要依赖 `Tensorflow` 的 Python 库,可以使用 AI Lab 的环境依赖管理功能,提前将需要的 Python 库下载和准备完成,无需依赖镜像构建

> 参考[环境依赖](./dataset/environments.md) 的操作方式,添加一个 `CONDA` 环境.
```yaml
name: tensorflow
channels:
- defaults
- conda-forge
dependencies:
- python=3.12
- tensorflow
prefix: /opt/conda/envs/tensorflow
```
![创建环境依赖](../images/baize-08.png)
!!! note
等待环境预热成功后,只需要将此环境挂载到 Notebook、训练任务中,使用 AI Lab 提供的基础镜像就可以
## 使用 Notebook 调试脚本
准备开发环境,点击导航栏的 **Notebooks** ,点击 **创建** 。
- 将[准备好的三个数据集](#_2)进行关联,挂载路径请参照下图填写,注意将需要使用的空数据集在 输出数据集位置配置
![挂载路径](../images/baize-06.png)
- 选择并绑定[环境依赖包](#tensorflow)
等待 Notebook 创建成功,点击列表中的访问地址,进入 Notebook。并在 Notebook 的终端中执行以下命令进行任务训练。
![进入 notebook](../images/baize-05.png)
!!! note
脚本使用 Tensorflow,如果忘记关联依赖库,也可以临时用 `pip install tensorflow` 安装。

```shell
python /home/jovyan/code/tensorflow/tf-fashion-mnist-sample/train.py
```

## 创建训练任务

1. 点击导航栏的 **任务中心** -> **训练任务** ,创建一个 `Tensorflow` 单机任务
1. 先填写基本参数后,点击 **下一步**
1. 在任务资源配置中,正确配置任务资源后,点击 **下一步**

- 镜像:如果前序环境依赖包准备好了,使用默认镜像即可; 如果未准备,要确认镜像内有 `tensorflow` 的 Python 库
- shell:使用 `bash` 即可
- 启用命令:

```bash
python /home/jovyan/code/tensorflow/tf-fashion-mnist-sample/train.py
```

1. 在高级配置中,启用 **任务分析(Tensorboard)** ,点击 **确定** 。

!!! note

日志所在位置为输出数据集的 `/home/jovyan/model/train/logs/`

![高级配置](../images/enable-analy.png)

1. 返回训练任务列表,等到状态变为 **成功** 。点击列表右侧的 **┇** ,可以查看详情、克隆任务、更新优先级、查看日志和删除等操作。

![提交训练任务](../images/othera.png)

1. 成功创建任务后,在左侧导航栏点击 **任务分析** ,可以查看任务状态并对任务训练进行调优。

![查看任务](../images/baize-07.png)
1 change: 1 addition & 0 deletions navigation.yml
@@ -85,6 +85,7 @@ nav:
- 集群和命名空间授权: end-user/kpanda/permissions/cluster-ns-auth.md
- 增加容器管理内置权限点: end-user/kpanda/permissions/custom-kpanda-role.md
- 算法开发:
- 快速入门: end-user/quick-start.md
- 创建 AI 工作负载: end-user/share/workload.md
- 使用 Notebook: end-user/share/notebook.md
- 创建训练任务: