ENH: Implement "Pydantic protocol" #53999

Open
1 of 3 tasks
Kludex opened this issue Jul 4, 2023 · 12 comments
Labels
Enhancement Needs Discussion Requires discussion from core team before further action Typing type annotations, mypy/pyright type checking

Comments

@Kludex

Kludex commented Jul 4, 2023

Feature Type

  • Adding new functionality to pandas

  • Changing existing functionality in pandas

  • Removing existing functionality in pandas

Problem Description

I wish I could use Pydantic with pandas without needing customization or an external library.

Feature Description

The idea would be to implement the __get_pydantic_core_schema__ mentioned here on pd.Series, pd.DataFrame, and others.

Alternative Solutions

There's a PR in pydantic-extra-types to include a custom type for pd.Series.

Additional Context

We have more information in the migration guide.

@Kludex Kludex added Enhancement Needs Triage Issue that has not been reviewed by a pandas team member labels Jul 4, 2023
@rhshadrach rhshadrach added Typing type annotations, mypy/pyright type checking Needs Discussion Requires discussion from core team before further action and removed Needs Triage Issue that has not been reviewed by a pandas team member labels Jul 5, 2023
@rhshadrach
Member

cc @Dr-Irv

@Dr-Irv
Contributor

Dr-Irv commented Jul 5, 2023

@Kludex You wrote:

I wish I could use Pydantic with pandas without needing customization or an external library.

Can you give a short example of what things look like today if you try to use Pydantic with pandas, i.e., the nature of the customization that is required?

@Kludex
Author

Kludex commented Jul 6, 2023

Yep!

I'll use pd.Series for the examples below, but it applies to pd.DataFrame as well.

Right now, if you want to have a pd.Series field in Pydantic, you'd do something like:

import numpy as np
import pandas as pd
from pydantic import BaseModel, ConfigDict

arr = [1, 2, 3, np.nan, 6, 8]
series = pd.Series(arr)

class Model(BaseModel):
    model_config = ConfigDict(arbitrary_types_allowed=True)

    series: pd.Series

The arbitrary_types_allowed setting lets you create models whose fields have "arbitrary types" (types that don't implement the "Pydantic protocol").

The problem is that this limits what Pydantic can do: I can't generate the JSON schema (which matters to FastAPI users, for example), nor populate the field from lists or dicts.
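To make the limitation concrete, here is a minimal sketch of both failures, assuming pandas and Pydantic V2 are installed. The flag names `list_accepted` and `schema_generated` are only illustrative.

```python
import pandas as pd
from pydantic import BaseModel, ConfigDict, ValidationError
from pydantic.errors import PydanticInvalidForJsonSchema


class Model(BaseModel):
    model_config = ConfigDict(arbitrary_types_allowed=True)

    series: pd.Series


# An actual pd.Series passes, because arbitrary types are only
# checked with isinstance().
model = Model(series=pd.Series([1, 2, 3]))

# A plain list is rejected: nothing coerces it into a pd.Series.
try:
    Model(series=[1, 2, 3])
    list_accepted = True
except ValidationError:
    list_accepted = False

# JSON schema generation fails for the arbitrary type.
try:
    Model.model_json_schema()
    schema_generated = True
except PydanticInvalidForJsonSchema:
    schema_generated = False

print(list_accepted, schema_generated)
```

Both flags should come out false, which is the gap the proposed hook is meant to close.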

Implementing __get_pydantic_core_schema__ would look something like this (a bit simplified):

import numpy as np
import pandas as pd
from pydantic import GetCoreSchemaHandler, BaseModel
from pydantic_core import core_schema
from typing import Any

arr = [1, 2, 3, np.nan, 6, 8]
series = pd.Series(arr)


class Series(pd.Series):
    @classmethod
    def __get_pydantic_core_schema__(
        cls, __source: type[Any], __handler: GetCoreSchemaHandler
    ) -> core_schema.CoreSchema:
        return core_schema.no_info_before_validator_function(
            pd.Series, core_schema.list_schema()
        )


class Model(BaseModel):
    series: Series

model = Model(series=series)
model = Model(series=arr)  # You can now use the `arr`
model = Model(series={"a": 1, "b": 2})  # Or the dict... Or any other type that constructs a pd.Series

model.model_json_schema()  # You are also able to generate the JSON schema

You can see more about this on Creating custom classes using __get_pydantic_core_schema__ on the Pydantic documentation.

Hope it's clear.

@MarcoGorelli
Member

if you're willing to submit a PR, this seems nice to have!

@Kludex
Author

Kludex commented Jul 6, 2023

if you're willing to submit a PR, this seems nice to have!

I am! Thanks! 🙏

@MarcoGorelli
Member

cool - note that pydantic can't be a runtime dependency of pandas, so you may need to use import_optional_dependency - check other examples of this in the code
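For readers unfamiliar with the pattern, here is a rough standalone sketch of deferring an optional import until the hook is actually called. It uses only the standard library and does not reproduce pandas' own import_optional_dependency helper. The names `optional_import` and `Container` are hypothetical, and the returned is-instance schema is just a placeholder.

```python
import importlib
from types import ModuleType
from typing import Any, Optional


def optional_import(name: str) -> Optional[ModuleType]:
    """Return the named module, or None when it is not installed."""
    try:
        return importlib.import_module(name)
    except ImportError:
        return None


class Container:
    """Hypothetical stand-in for a class such as pd.Series."""

    @classmethod
    def __get_pydantic_core_schema__(cls, source: Any, handler: Any) -> Any:
        # Imported lazily, so importing this module never requires Pydantic.
        core_schema = optional_import("pydantic_core.core_schema")
        if core_schema is None:
            raise ImportError("pydantic is required to build a core schema")
        # Placeholder schema: accept only existing instances of the class.
        return core_schema.is_instance_schema(cls)


# Importing and using Container works whether or not Pydantic is present.
container = Container()
```

Since only Pydantic calls this hook, the import cost and the failure mode are confined to users who already have Pydantic installed.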

@mroeschke
Member

Is the __get_pydantic_core_schema__ you provided above "general" enough to fit all users' use case for pydantic? Do other pydata libraries like numpy have __get_pydantic_core_schema__ on their objects?

@Kludex
Author

Kludex commented Jul 7, 2023

Is the __get_pydantic_core_schema__ you provided above "general" enough to fit all users' use case for pydantic?

No, it's not. I'll push a PR soon.

Do other pydata libraries like numpy have __get_pydantic_core_schema__ on their objects?

They don't need it for this to be implemented in pandas, but I'll also open an issue on numpy.

@MarcoGorelli
Member

They don't need it for this to be implemented in pandas,

Sure, but it would just be good to see if there's a precedent for it. Has any other library accepted this yet?

@Kludex
Author

Kludex commented Jul 7, 2023

They don't need it for this to be implemented in pandas,

Sure, but it would just be good to see if there's a precedent for it. Has any other library accepted this yet?

Pydantic V2 was released a week ago, and pandas is the first repository where I've opened an issue about this, so I don't think any other library has accepted this yet.

@MarcoGorelli
Member

Is the __get_pydantic_core_schema__ you provided above "general" enough to fit all users' use case for pydantic?

No, it's not.

could you clarify please?

@Kludex
Author

Kludex commented Jul 8, 2023

Is the __get_pydantic_core_schema__ you provided above "general" enough to fit all users' use case for pydantic?

No, it's not.

could you clarify please?

Yes... What I mean is that the one I provided was just an example of what it would look like... 😅

The one implemented on https://github.com/pandas-dev/pandas/pull/54034/files is "general" enough to fit all users' use case.
