[Example] Add preformer for precipitation nowcasting #976

EricKing19 · 2024-08-19T01:59:49Z

PR types

Others

PR changes

Others

Describe

add Preformer model for precipitation nowcasting
add docs for Preformer
add examples for Preformer

paddle-bot · 2024-08-19T01:59:54Z

Thanks for your contribution!

CLAassistant · 2024-08-19T01:59:55Z

All committers have signed the CLA.

HydrogenSulfate

Thanks for submitting the PR. There are a few small issues; please take a look.

HydrogenSulfate · 2024-08-20T06:38:17Z

docs/zh/examples/preformer.md

+
+    ``` sh
+    # model training
+    python examples/preformer/train.py


Suggested change

-    python examples/preformer/train.py
+    python train.py

HydrogenSulfate · 2024-08-20T06:38:25Z

docs/zh/examples/preformer.md

+
+    ``` sh
+    # model evaluation
+    python examples/preformer/train.py mode=eval


Suggested change

-    python examples/preformer/train.py mode=eval
+    python train.py mode=eval

HydrogenSulfate · 2024-08-20T06:38:45Z

examples/preformer/train.py

I suggest renaming this file to main.py.

HydrogenSulfate · 2024-08-20T06:39:19Z

examples/preformer/train.py

+    # set random seed for reproducibility
+    ppsci.utils.misc.set_random_seed(cfg.seed)
+    # initialize logger
+    logger.init_logger("ppsci", osp.join(cfg.output_dir, "train.log"), "info")
+


HydrogenSulfate · 2024-08-20T06:42:37Z

examples/preformer/train.py

+                "num_replicas": NUM_GPUS_PER_NODE,
+                "rank": dist.get_rank() % NUM_GPUS_PER_NODE,


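These two parameters should not be needed, and PaddleScience has no logic that handles them; by default they are set automatically from the number of devices configured in the environment.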

HydrogenSulfate · 2024-08-20T06:54:11Z

ppsci/data/dataset/era5sq_dataset.py

+            mon = str("0") + mon
+        day = str(self.time_table[idxs].timetuple().tm_mday)
+        if len(day) == 1:
+            day = str("0") + day


Can str("0") be written simply as "0"? The same applies below.
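A minimal sketch of the padding point, assuming `time_table` holds `datetime` objects (suggested by the `.timetuple()` call in the diff). The helper name `date_parts` is hypothetical and not part of the PR, and padding the hour the same way is also an assumption.

```python
from datetime import datetime


def date_parts(ts: datetime) -> tuple[str, str, str, str]:
    """Return zero-padded (year, month, day, hour) strings for a timestamp."""
    # The "02d" format spec pads to two digits, so the manual
    # `if len(mon) == 1: mon = "0" + mon` checks are no longer needed.
    return (f"{ts.year:04d}", f"{ts.month:02d}", f"{ts.day:02d}", f"{ts.hour:02d}")


year, mon, day, hour = date_parts(datetime(2018, 1, 5, 3))
print(year, mon, day, hour)  # 2018 01 05 03
```

Formatting the integers directly also avoids the intermediate `str(...)` conversions in the original code.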

HydrogenSulfate · 2024-08-20T06:54:48Z

ppsci/data/dataset/era5sq_dataset.py

+        r_data = np.load(
+            os.path.join(self.file_path, year, "r_" + year + mon + day + hour + ".npy")
+        )
+        t_data = np.load(
+            os.path.join(self.file_path, year, "t_" + year + mon + day + hour + ".npy")
+        )
+        u_data = np.load(
+            os.path.join(self.file_path, year, "u_" + year + mon + day + hour + ".npy")
+        )
+        v_data = np.load(
+            os.path.join(self.file_path, year, "v_" + year + mon + day + hour + ".npy")
+        )


The string concatenation can be simplified with f-strings.
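One possible shape for that simplification, as a sketch only. The helper name `variable_path` and the `"data"` root are hypothetical; the file naming follows the four `np.load` calls in the diff above.

```python
import os


def variable_path(file_path: str, var: str, year: str, mon: str, day: str, hour: str) -> str:
    """Build the .npy path for one variable, mirroring the layout in the diff."""
    # Same result as:
    # os.path.join(file_path, year, var + "_" + year + mon + day + hour + ".npy")
    return os.path.join(file_path, year, f"{var}_{year}{mon}{day}{hour}.npy")


# One comprehension replaces the four near-identical path expressions.
paths = {
    var: variable_path("data", var, "2018", "01", "05", "03")
    for var in ("r", "t", "u", "v")
}
print(paths["r"])
```

Each path could then be passed to `np.load` in a loop, which keeps the variable list in one place.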

HydrogenSulfate · 2024-08-20T06:55:45Z

examples/preformer/conf/train.yaml

+hydra:
+  run:
+    # dynamic output directory according to running time and override name
+    dir: outputs_preformer
+  job:
+    name: ${mode} # name of logfile
+    chdir: false # keep current working directory unchanged
+    config:
+      override_dirname:
+        exclude_keys:
+          - TRAIN.checkpoint_path
+          - TRAIN.trained_model_path
+          - EVAL.trained_model_path
+          - mode
+          - output_dir
+          - log_freq
+  sweep:
+    # output directory for multirun
+    dir: ${hydra.run.dir}
+    subdir: ./
+


Suggested change

-hydra:
-  run:
-    # dynamic output directory according to running time and override name
-    dir: outputs_preformer
-  job:
-    name: ${mode} # name of logfile
-    chdir: false # keep current working directory unchanged
-    config:
-      override_dirname:
-        exclude_keys:
-          - TRAIN.checkpoint_path
-          - TRAIN.trained_model_path
-          - EVAL.trained_model_path
-          - mode
-          - output_dir
-          - log_freq
-  sweep:
-    # output directory for multirun
-    dir: ${hydra.run.dir}
-    subdir: ./
+defaults:
+  - ppsci_default
+  - TRAIN: train_default
+  - TRAIN/ema: ema_default
+  - TRAIN/swa: swa_default
+  - EVAL: eval_default
+  - INFER: infer_default
+  - hydra/job/config/override_dirname/exclude_keys: exclude_keys_default
+  - _self_
+hydra:
+  run:
+    # dynamic output directory according to running time and override name
+    dir: outputs_preformer
+  job:
+    name: ${mode} # name of logfile
+    chdir: false # keep current working directory unchanged
+  sweep:
+    # output directory for multirun
+    dir: ${hydra.run.dir}
+    subdir: ./

HydrogenSulfate · 2024-08-20T06:56:07Z

examples/preformer/conf/train.yaml

+
+# model settings
+MODEL:
+  afno:


With a single model, the afno level can be removed.

HydrogenSulfate · 2024-08-20T07:04:21Z

examples/preformer/conf/train.yaml

+  afno:
+    input_keys: ["input"]
+    output_keys: ["output"]
+    shape_in: [6, 12, IMG_H, IMG_W]


Suggested change

-    shape_in: [6, 12, IMG_H, IMG_W]
+    shape_in:
+      - 6
+      - 12
+      - ${IMG_H}
+      - ${IMG_W}
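The reason for this suggestion can be sketched in a few lines: in YAML, a bare `IMG_H` inside a list is just the string `"IMG_H"`, while `${IMG_H}` is an interpolation that the config library resolves to the value of the top-level key. The resolver below is a toy stand-in written for illustration, not OmegaConf's or Hydra's implementation, and the numeric values of `IMG_H` and `IMG_W` are made up.

```python
import re

# Toy config mirroring the keys in the diff; the numeric values are made up.
cfg = {
    "IMG_H": 192,
    "IMG_W": 256,
    "shape_literal": [6, 12, "IMG_H", "IMG_W"],       # what the original list parses to
    "shape_interp": [6, 12, "${IMG_H}", "${IMG_W}"],  # the suggested form
}

_PATTERN = re.compile(r"^\$\{(\w+)\}$")


def resolve(value, root):
    """Replace a whole-string "${KEY}" with root[KEY]; leave everything else as is."""
    if isinstance(value, list):
        return [resolve(v, root) for v in value]
    if isinstance(value, str):
        match = _PATTERN.match(value)
        if match:
            return root[match.group(1)]
    return value


print(resolve(cfg["shape_literal"], cfg))  # [6, 12, 'IMG_H', 'IMG_W']
print(resolve(cfg["shape_interp"], cfg))   # [6, 12, 192, 256]
```

With the literal form the model would receive strings where it expects integers, which is presumably what the suggestion guards against.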

HydrogenSulfate · 2024-08-20T11:54:07Z

@EricKing19 I have changed the title; the original "merge code of upstream" was not appropriate.

liaoxin2 · 2024-08-27T02:55:54Z

docs/zh/examples/preformer.md

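+This example uses the preprocessed PEMSD4 and PEMSD8 datasets. PEMSD4 is traffic data from the San Francisco Bay Area, recorded by 307 sensors on 29 roads from January to February 2018. PEMSD8 is traffic data collected by 170 detectors on 8 roads in San Bernardino from July to August 2016.
+
+Both datasets are stored as N x T x 1 matrices holding the traffic flow for each node and time step, where N is the number of traffic nodes and T is the length of the time series. Each dataset is split 7:2:1 into training, validation, and test sets. The mean and standard deviation of the flow data are precomputed for the subsequent normalization.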


This example is about precipitation, but this dataset appears to be traffic data; the dataset description does not match the code.

HydrogenSulfate · 2024-10-15T16:58:32Z

docs/zh/examples/preformer.md

+Before training or evaluation, please download the dataset files
+
+Before evaluation, please download a pretrained model or train one
+


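Could you briefly describe how to prepare the dataset? For example, how to download it and how the files are organized after extraction?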

HydrogenSulfate · 2024-10-15T16:58:56Z

docs/zh/examples/preformer.md

+=== "Model training command"
+
+    ``` sh
+    # model training


Delete this comment; the tab label above already says this is the model training command.

HydrogenSulfate · 2024-10-15T16:59:00Z

docs/zh/examples/preformer.md

+=== "Model evaluation command"
+
+    ``` sh
+    # model evaluation


Same as above; delete this comment line.

HydrogenSulfate · 2024-10-15T17:00:47Z

docs/zh/examples/preformer.md

+
+    ``` sh
+    # model evaluation
+    python train.py mode=eval


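Please provide your trained model weights here (the .pdparams file is enough). We will upload it to BCE so that the pretrained model URL can be specified directly in the command; the weights are then downloaded and loaded automatically before evaluation, with no extra manual download.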

HydrogenSulfate · 2024-10-15T17:05:21Z

docs/zh/examples/preformer.md

+#### 3.2.6 Model export
+
+By setting the `eval_during_train` and `eval_freq` parameters of `ppsci.solver.Solver`, the model parameters that perform best on the validation set are saved automatically.
+
+``` py linenums="100" title="examples/preformer/train.py"
+--8<--
+examples/preformer/train.py:158:158
+--8<--
+```
+


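The model export section does not need to appear in the document; please delete it.

Please add the model export and inference functions, def export and def inference, to examples/preformer/main.py, following this reference: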

PaddleScience/examples/allen_cahn/allen_cahn_piratenet.py

Lines 235 to 269 in 83f6739

def export(cfg: DictConfig):
    # set model
    model = ppsci.arch.PirateNet(**cfg.MODEL)

    # initialize solver
    solver = ppsci.solver.Solver(model, cfg=cfg)
    # export model
    from paddle.static import InputSpec

    input_spec = [
        {key: InputSpec([None, 1], "float32", name=key) for key in model.input_keys},
    ]
    solver.export(input_spec, cfg.INFER.export_path, with_onnx=False)


def inference(cfg: DictConfig):
    from deploy.python_infer import pinn_predictor

    predictor = pinn_predictor.PINNPredictor(cfg)
    data = sio.loadmat(cfg.DATA_PATH)
    u_ref = data["usol"].astype(dtype)  # (nt, nx)
    t_star = data["t"].flatten().astype(dtype)  # [nt, ]
    x_star = data["x"].flatten().astype(dtype)  # [nx, ]
    tx_star = misc.cartesian_product(t_star, x_star).astype(dtype)

    input_dict = {"t": tx_star[:, 0:1], "x": tx_star[:, 1:2]}
    output_dict = predictor.predict(input_dict, cfg.INFER.batch_size)
    # mapping data to cfg.INFER.output_keys
    output_dict = {
        store_key: output_dict[infer_key]
        for store_key, infer_key in zip(cfg.MODEL.output_keys, output_dict.keys())
    }
    u_pred = output_dict["u"].reshape([len(t_star), len(x_star)])

    plot(t_star, x_star, u_ref, u_pred, cfg.output_dir)

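Please add the model export and inference commands after the `=== "模型评估命令"` ("Model evaluation command") tab at the beginning of the document.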
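For context, adding `export` and `inference` usually means the script's entry point has to route to them based on the `mode` override. The following is a minimal sketch of such a dispatch, not the PR's actual `main.py`: the four handlers are hypothetical stubs, and the mode names `export` and `infer` are assumptions modelled on the `mode=eval` override that appears in this thread.

```python
from types import SimpleNamespace


# Hypothetical stubs standing in for the real entry points in main.py.
def train(cfg):
    return "training"


def evaluate(cfg):
    return "evaluating"


def export(cfg):
    return "exporting"


def inference(cfg):
    return "running inference"


# Map each value of the `mode` override to its handler.
_HANDLERS = {"train": train, "eval": evaluate, "export": export, "infer": inference}


def main(cfg):
    """Dispatch on cfg.mode, rejecting unknown modes with a clear error."""
    try:
        handler = _HANDLERS[cfg.mode]
    except KeyError:
        raise ValueError(
            f"cfg.mode should be one of {sorted(_HANDLERS)}, but got '{cfg.mode}'"
        ) from None
    return handler(cfg)


print(main(SimpleNamespace(mode="export")))  # exporting
```

In the real script `cfg` would come from Hydra rather than a `SimpleNamespace`; the stand-in only keeps the sketch self-contained.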

HydrogenSulfate · 2024-10-15T17:19:31Z

ppsci/arch/preformer.py

+        return latent
+
+
+class Mid_Xnet(nn.Layer):


I suggest renaming Mid_Xnet to MidXNet, which follows the naming conventions better.

HydrogenSulfate · 2024-10-15T17:19:46Z

ppsci/arch/preformer.py

+    def forward(self, hid, enc1=None):
+        for i in range(0, len(self.dec)):
+            hid = self.dec[i](hid)
+        # Y = self.dec[-1](torch.cat([hid, enc1], dim=1))


Can this commented-out line be deleted?

HydrogenSulfate · 2024-10-15T17:20:07Z

ppsci/data/dataset/era5sq_dataset.py

+        for m in range(self.sq_length):
+            x.append(self.load_data(global_idx + m))
+        for n in range(self.sq_length):
+            # y.append(self.load_data(global_idx+n))


Can this commented-out line be deleted?

HydrogenSulfate · 2024-10-15T17:20:56Z

ppsci/data/dataset/era5sq_dataset.py

+            # y.append(self.load_data(global_idx+n))
+            y.append(self.precipitation["tp"][global_idx + self.sq_length + n])
+        # x = self.Normalize(x)
+        x, y = self.RandomCrop(x, y)


Should self.RandomCrop be self._random_crop?

HydrogenSulfate · 2024-10-15T17:21:37Z

ppsci/data/dataset/era5sq_dataset.py

+    def _random_crop(self, x, y):
+        if isinstance(self.size, numbers.Number):
+            self.size = (int(self.size), int(self.size))
+        th, tw = self.size
+        h, w = y[0].shape[-2], y[0].shape[-1]
+        x1 = random.randint(0, w - tw)
+        y1 = random.randint(0, h - th)
+
+        for i in range(len(x)):
+            x[i] = self.crop(x[i], y1, x1, y1 + th, x1 + tw)
+        for i in range(len(y)):
+            y[i] = self.crop(y[i], y1, x1, y1 + th, x1 + tw)
+
+        return x, y
+
+    def crop(self, im, x_start, y_start, x_end, y_end):
+        if len(im.shape) == 3:
+            return im[:, x_start:x_end, y_start:y_end]
+        else:
+            return im[x_start:x_end, y_start:y_end]


Non-public methods should be prefixed with an underscore:

Suggested change

     def _random_crop(self, x, y):
         if isinstance(self.size, numbers.Number):
             self.size = (int(self.size), int(self.size))
         th, tw = self.size
         h, w = y[0].shape[-2], y[0].shape[-1]
         x1 = random.randint(0, w - tw)
         y1 = random.randint(0, h - th)
 
         for i in range(len(x)):
-            x[i] = self.crop(x[i], y1, x1, y1 + th, x1 + tw)
+            x[i] = self._crop(x[i], y1, x1, y1 + th, x1 + tw)
         for i in range(len(y)):
-            y[i] = self.crop(y[i], y1, x1, y1 + th, x1 + tw)
+            y[i] = self._crop(y[i], y1, x1, y1 + th, x1 + tw)
 
         return x, y
 
-    def crop(self, im, x_start, y_start, x_end, y_end):
+    def _crop(self, im, x_start, y_start, x_end, y_end):
         if len(im.shape) == 3:
             return im[:, x_start:x_end, y_start:y_end]
         else:
             return im[x_start:x_end, y_start:y_end]
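As a side note on the crop logic itself, here is a standalone sketch of the same idea: one random window is drawn and then applied to every input and target frame so they stay aligned. It is a simplification and not the PR's code: it works on 2-D nested lists instead of NumPy arrays, does not handle the 3-D case, returns new lists instead of modifying the inputs in place, and the `rng` parameter is added here only to make the sketch testable.

```python
import random


def crop(im, top, left, bottom, right):
    """Crop a 2-D nested list to rows [top, bottom) and columns [left, right)."""
    return [row[left:right] for row in im[top:bottom]]


def random_crop(x, y, size, rng=random):
    """Apply one shared random window of `size` (height, width) to all frames."""
    th, tw = size
    h, w = len(y[0]), len(y[0][0])
    left = rng.randint(0, w - tw)  # randint is inclusive at both ends
    top = rng.randint(0, h - th)
    x = [crop(frame, top, left, top + th, left + tw) for frame in x]
    y = [crop(frame, top, left, top + th, left + tw) for frame in y]
    return x, y


frame = [[r * 10 + c for c in range(6)] for r in range(4)]  # 4 rows x 6 columns
xs, ys = random_crop([frame, frame], [frame], (2, 3), random.Random(0))
print(len(xs[0]), len(xs[0][0]))  # 2 3
```

The sketch names the offsets `top` and `left` because in the original `crop` the parameters called `x_start` and `y_start` actually receive `y1` and `x1`; the slicing is correct (rows first, then columns), but the names are easy to misread.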

HydrogenSulfate · 2024-10-16T04:15:32Z

@EricKing19 Please also resolve the merge conflicts while you are at it.

merge code of upstream

b7e0216

paddle-bot bot added the contributor label Aug 19, 2024

HydrogenSulfate requested changes Aug 20, 2024

HydrogenSulfate changed the title ~~merge code of upstream~~ [Example] Add preformer for precipitation nowcasting Aug 20, 2024

luotao1 self-assigned this Aug 21, 2024

liaoxin2 reviewed Aug 27, 2024

EricKing19 added 6 commits October 15, 2024 10:39

Update preformer.md

9092f86

Update preformer.md

038b3fc

Update train.yaml

52adb6a

Update and rename train.py to main.py

c110ae4

Update era5sq_dataset.py

88eda6a

Update era5sq_dataset.py

b99da76

HydrogenSulfate requested changes Oct 16, 2024
