Add registered custom objects inside pickled model file #19867
base: master
Conversation
keras/src/saving/keras_saveable.py
```diff
 return (
     self._unpickle_model,
-    (buf,),
+    (model_buf, custom_objects_buf),
```
This would break existing pickles, no?
Good catch, I had not seen this because I use `cloudpickle` instead of `pickle`.
I confirm that it breaks compatibility if users used the standard `pickle` lib to dump their previous models. Previous models dumped with `cloudpickle` do not have this compatibility issue because the definition of the unpickling callable is stored directly inside the file (not just a reference).
Handling compatibility issues will probably not be straightforward. Not sure it is worth the effort.
The compatibility issue could be handled by doing something like:
```python
@classmethod
def _unpickle_model(cls, model_buf, *args):
    import pickle

    import keras.src.saving.saving_lib as saving_lib

    # pickle is not safe regardless of what you do.
    if len(args) == 0:
        # Old pickles: only the model buffer was stored.
        return saving_lib._load_model_from_fileobj(
            model_buf,
            custom_objects=None,
            compile=True,
            safe_mode=False,
        )
    else:
        # New pickles: a second buffer holds the pickled custom objects.
        custom_objects_buf = args[0]
        custom_objects = pickle.load(custom_objects_buf)
        return saving_lib._load_model_from_fileobj(
            model_buf,
            custom_objects=custom_objects,
            compile=True,
            safe_mode=False,
        )
```
But after experimenting with `pickle` instead of `cloudpickle`, I do not think this PR adds value if we stick with the standard `pickle`, because pickling the custom object only returns a reference in that case (meaning we still need the object definition to be available at inference time).
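A quick stdlib-only demonstration of this point: the standard `pickle` serializes a custom callable by reference (module plus qualified name), so its definition must still be importable at load time. The function name below is illustrative, not from the PR.

```python
import pickle

def custom_activation(x):  # hypothetical custom object
    return x * 2

payload = pickle.dumps(custom_activation)

# The payload stores the qualified name, not the function body.
assert b"custom_activation" in payload

# If the definition disappears (e.g. a fresh inference session), loading fails.
del custom_activation
try:
    pickle.loads(payload)
    loaded = True
except AttributeError:
    loaded = False
assert loaded is False
```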
Two options:
- We pickle the custom objects with `cloudpickle` only if `cloudpickle` is importable (without explicitly adding it as a dependency). Otherwise, we do not pickle the custom objects.
- We close the PR now.
> We pickle the custom objects with `cloudpickle` only if `cloudpickle` is importable (without explicitly adding it as a dependency). Otherwise, we do not pickle the custom objects.
Maybe let's try that? For backwards compat support, if we have multiple arguments we might want to use a dict instead of a tuple for better readability and maintainability.
New version with the following changes:
- Use `cloudpickle` in the `__reduce__()` method but keep `pickle` in the unpickling class method, so that `cloudpickle` is only needed in the training environment (possible because `cloudpickle.load` is just an alias of `pickle.load`)
- Use a dict instead of a tuple for the arguments of the unpickling function
- Better isolate tests by adding a pytest fixture that cleans the custom objects global dict before each test runs (failing tests in the previous commit were caused by custom objects registered, but not accessible, by other tests)
- Add a test running in a fake `__main__` module (so that cloudpickle can serialize the objects) and delete the custom object variable before loading the pickle, to verify that its definition is completely stored inside the dumped pickle
Todo:
- Ensure backward compatibility by checking the type of the argument (`dict` vs not `dict`) in the unpickling function
- Not sure where to add `cloudpickle` as a test-only dependency. If I add it in https://github.com/keras-team/keras/blob/master/requirements-common.txt, it would also be installed in production.
Hi @mthiboust Can you please resolve the conflicts? Thank you!
Hello @gbaned, sure, I can dedicate some time to finalize this in the coming days. Adding complete tests would require … Which one of those 2 options should I choose?
Hi @fchollet Can you please assist on the above comment from @mthiboust. Thank you!
See the discussion in #19832
Context
Currently, keras serialization does not store the definition of custom objects (see here). Thus, pickled models are not self-sufficient if they contain such objects and you want to load them in a separate vanilla session and call `model.predict()`.

In practice, if you have 2 isolated codebases for training and inference, you always need to update the inference one along with the training one when using a custom object. This does not allow you to experiment quickly with new models if your inference codebase has long CI/CD pipelines. Storing the definition inside the pickle file would allow training to be decoupled from inference.
Suggested change
Modify the `__reduce__()` method of `KerasSaveable` to store the definition of registered custom keras objects inside the pickle file when a keras model is serialized via pickle-like libraries (e.g. `pickle`, `cloudpickle`, `dill`). Registered custom keras objects are pickled with the standard `pickle` to avoid extra dependencies (but inheriting `pickle` limitations, such as lambda functions; this is a tradeoff).

Add a simple test with a registered custom layer. This test would ideally load the pickled model in a new subprocess to completely verify the wanted behavior, but that would complexify the test. Or maybe there is another way to test this? I can investigate this point if you want.