OpenSearch indexing performance issue with huge dataframe #3048

Open
guanlinzhang-db opened this issue Dec 22, 2024 · 1 comment
Labels
bug Something isn't working

Comments

@guanlinzhang-db

Describe the bug

Let's say we have a DataFrame with 200,000+ rows. When I use the method below to ingest it into OpenSearch, the call just hangs and never finishes.

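```python
%%time
# client: an existing OpenSearch connection, e.g. opened with wr.opensearch.connect(...)
wr.opensearch.index_df(client, df=df, index="guanlinz_index", use_threads=False, id_keys=["Name"], bulk_size=200, enable_refresh_interval=False)
```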

How to Reproduce

Generate a huge DataFrame:

```python
import numpy as np
import pandas as pd

# Generate a pandas DataFrame with 200,000 rows
num_rows = 200000
data = {
    "id": range(1, num_rows + 1),
    "value": np.random.random(size=num_rows),
    "category": np.random.choice(["A", "B", "C"], size=num_rows),
    # One timestamp per second; lowercase "s" because the "S" alias is deprecated in pandas >= 2.2
    "timestamp": pd.date_range(start="2022-01-01", periods=num_rows, freq="s"),
}

df = pd.DataFrame(data)

# Display the first few rows of the DataFrame to confirm
df.head()
```
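Before indexing, it can help to check that the columns passed as `id_keys` exist and are unique: the generated frame has an `id` column but no `Name` column. A minimal sketch in plain pandas; `check_id_keys` is a hypothetical helper (not part of awswrangler), and the small `demo` frame stands in for `df` (on the real data you would call `check_id_keys(df, ["id"])`).

```python
import pandas as pd


def check_id_keys(df: pd.DataFrame, id_keys: list[str]) -> None:
    """Hypothetical pre-flight check for the columns that form each document's _id."""
    missing = [key for key in id_keys if key not in df.columns]
    if missing:
        raise KeyError(f"id_keys not found in DataFrame columns: {missing}")
    # Rows with identical id_keys values would share one document _id,
    # so later rows would typically overwrite earlier ones in the index.
    duplicated = int(df.duplicated(subset=id_keys).sum())
    if duplicated:
        raise ValueError(f"{duplicated} rows repeat an id already seen")


demo = pd.DataFrame({"id": [1, 2, 3], "category": ["A", "B", "A"]})
check_id_keys(demo, ["id"])  # passes silently
try:
    check_id_keys(demo, ["Name"])  # there is no "Name" column
except KeyError as err:
    print(err)
```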

Then try to write it to an index:

```python
%%time
# id_keys must name an existing column; the generated frame has "id", not "Name"
wr.opensearch.index_df(client, df=df, index="my_index", use_threads=False, id_keys=["id"], bulk_size=200, enable_refresh_interval=False)
```
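For scale: `bulk_size` is the number of documents sent per `_bulk` request, so `bulk_size=200` turns the 200,000-row frame into 1,000 requests. A minimal sketch of that batching arithmetic in plain pandas; `iter_batches` is a hypothetical helper that only mirrors the idea and is not awswrangler's internal code.

```python
import math

import pandas as pd


def iter_batches(df: pd.DataFrame, bulk_size: int):
    """Hypothetical helper: yield consecutive slices of at most bulk_size rows."""
    for start in range(0, len(df), bulk_size):
        yield df.iloc[start:start + bulk_size]


num_rows, bulk_size = 200_000, 200
print(math.ceil(num_rows / bulk_size))  # 1000 bulk requests

demo = pd.DataFrame({"id": range(1, 1001)})
print([len(batch) for batch in iter_batches(demo, 300)])  # [300, 300, 300, 100]
```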

Expected behavior

No response

Your project

No response

Screenshots

No response

OS

Amazon Linux

Python version

3.11

AWS SDK for pandas version

3.10.1

Additional context

No response

guanlinzhang-db added the bug (Something isn't working) label on Dec 22, 2024
@guanlinzhang-db (Author)

Closing this, since I was using a Modin DataFrame; converting it to a pandas DataFrame solved the issue.
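The fix was to hand `index_df` a plain pandas DataFrame instead of a Modin one. A minimal sketch of a guard that performs that conversion, assuming Modin's `modin.utils.to_pandas` helper; `ensure_pandas` is a hypothetical name, and Modin is only imported when a non-pandas object is passed in.

```python
import pandas as pd


def ensure_pandas(df):
    """Hypothetical helper: return a plain pandas DataFrame before indexing.

    Modin DataFrames are not pandas.DataFrame instances, so anything that is
    not already pandas is converted with modin.utils.to_pandas.
    """
    if isinstance(df, pd.DataFrame):
        return df  # already pandas, nothing to convert
    try:
        from modin.utils import to_pandas  # imported lazily; Modin is optional
    except ImportError as exc:
        raise TypeError(f"expected a pandas DataFrame, got {type(df).__name__}") from exc
    return to_pandas(df)


plain = pd.DataFrame({"id": [1, 2, 3]})
assert ensure_pandas(plain) is plain  # pandas input passes through unchanged
```

In practice you would pass `ensure_pandas(df)` as the `df` argument of `wr.opensearch.index_df`, leaving the other arguments as they are.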
