respect train/eval mode in traced network #1217

sebffischer · 2024-12-10T10:37:45Z

Further issues:

When a script function is read, it is currently loaded as a module; I don't think this should be the case.
We should probably use a separate CompilationUnit per compiled module.

.github/workflows/lantern.yaml

sebffischer · 2024-12-16T09:37:56Z

Note that this may also fix https://github.com/mlverse/torch/pull/633/files, but I need to check again.

dfalbel · 2024-12-19T18:21:37Z

.github/workflows/main.yaml

@@ -27,7 +27,7 @@ jobs:
        config:

          - {os: macOS, r_version: release, version: cpu-intel, runner: macos-13}
-          - {os: macOS, r_version: release, version: cpu-m1, runner: macos-13}
+          - {os: macOS, r_version: release, version: cpu-m1, runner: macos-latest}


Actually, I was looking into why some tests keep failing, and it seems that GitHub runners, even though they run on arm64 images, cannot run MPS: this API can't be accessed by VMs under macOS, so it requires real self-hosted machines.

Thanks, fixed!

I think something might be broken with the custom macOS runner; at least, it's taking forever to start the job.

sebffischer · 2025-01-06T17:37:20Z

@dfalbel I think something is broken with the macOS runner. I don't think that this PR should behave differently on different operating systems, however.

dfalbel · 2025-01-06T18:38:32Z

The runner was missing an m1 label, which I just added. It should run on m1 now for this PR.

sebffischer · 2025-01-07T08:11:58Z

> The runner was missing an m1 label, which I just added. It should run on m1 now for this PR.

Thanks, but I still think there is something off with the M1 runner: https://github.com/mlverse/torch/actions/runs/12633265519/job/35213843863

dfalbel

@sebffischer Looks good! I added some comments, let me know what you think!

dfalbel · 2025-01-14T16:31:22Z

tests/testthat/assets/linear.pt

Do we really need this file?

This file already existed and was used in test-script_module.R; I merely updated it.

dfalbel · 2025-01-14T16:38:04Z

R/trace.R

+        should_mangle = TRUE,
+        manage_memory = FALSE
+      )
+      mod$eval()


I wonder if we should add an argument to jit_trace that would keep the old behavior. My two concerns are:

Tracing runs the network twice, which could be problematic for some users.

It doubles the size of the graph, which might be undesired. Maybe a user just wants to trace the forward method in eval mode to export it for deployment.

Yes, makes sense. But then I think calling $train() or $eval() on a network traced with the old behavior should maybe result in an error. What do you think?

And should the default be to respect the train/eval-mode or not?

Actually, I think there should maybe be no default, to force the user to specify this.

OK, I changed my mind again: I think the default TRUE is fine, but I am happy to change it as well.
In which cases do you think running the network twice is problematic?

Added a respect_mode argument that triggers the double/single tracing
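To make the trade-off discussed above concrete, here is a minimal, library-free sketch in Python. It is not the torch implementation: `ToyNet`, `toy_trace`, and the recorded op names are hypothetical stand-ins, and only the `respect_mode` name is taken from this thread. It only illustrates that respecting the mode means running the forward pass twice and keeping two recorded graphs, while the single-trace variant runs it once in the module's current mode.

```python
class ToyNet:
    """Hypothetical stand-in for a module whose forward pass depends on its mode."""

    def __init__(self):
        self.training = True  # mirrors the train/eval flag of a real module
        self.calls = 0        # how many times forward() has actually run

    def forward(self, x, record):
        self.calls += 1
        record("scale")
        y = [2.0 * v for v in x]
        if self.training:
            record("dropout")  # an op that is only recorded in train mode
        return y


def toy_trace(net, example, respect_mode=True):
    """Record one op list per mode (respect_mode=True) or only for the current mode."""
    original = net.training
    modes = (True, False) if respect_mode else (original,)
    graphs = {}
    for mode in modes:
        net.training = mode
        ops = []
        net.forward(example, ops.append)  # each traced mode costs one forward pass
        graphs["train" if mode else "eval"] = ops
    net.training = original  # leave the module in the mode we found it in
    return graphs


net = ToyNet()
net.training = False  # e.g. a user who only wants to export in eval mode

both = toy_trace(net, [1.0, 2.0], respect_mode=True)
print(net.calls, both)

single = toy_trace(net, [1.0, 2.0], respect_mode=False)
print(net.calls, single)
```

In this toy, the first call performs two forward passes and stores two op lists, while the second performs one pass and stores only the eval list. Those are the two costs raised in the review, in miniature.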

respect train/eval mode in traced network

9bc195c

sebffischer mentioned this pull request Dec 10, 2024

Fix/trace mode #1216 (closed)

sebffischer added 7 commits December 10, 2024 10:50

bugfixes

8e1e7e1

remove folder

96344d1

...

92eccb2

fixes

5a3984e

...

533a040

remove unneeded files

7b26a41

hopefully fix CI

7a52f42

dfalbel added the lantern label (use this label if your PR affects lantern so it's built in the CI) Dec 11, 2024

sebffischer added 2 commits December 11, 2024 13:31

trigger CI

c5fcdfb

use github hosted macos runners

b807f48

dfalbel reviewed Dec 11, 2024

.github/workflows/lantern.yaml (outdated, resolved)

sebffischer added 2 commits December 11, 2024 20:42

fix macos runner

5c51452

delete comment

8580b7a

typo

1c101bc

dfalbel reviewed Dec 19, 2024

use custom runners for m1 again

9076a90

dfalbel reviewed Jan 14, 2025

sebffischer added 6 commits January 14, 2025 19:53

Merge branch 'main' into fix/trace-jit

d68cdfd

add respect_mode argument

fb03814

improve test

986d106

remove accidental code

fe76fc3

Merge branch 'main' into fix/trace-jit

635fa15

Merge branch 'main' into fix/trace-jit

8a101fa
