AWS Permissions required to use awswrangler.athena.to_iceberg function #3042

Open
git4suman opened this issue Dec 12, 2024 · 1 comment
Labels
question Further information is requested

Comments

@git4suman

I am using awswrangler.athena.to_iceberg to write data from a pandas DataFrame to an Iceberg table, and this runs from a Lambda function. What permissions are required to perform this operation, so that I can add them to the Lambda execution role? I believe we need Glue and target S3 bucket access. Can anyone suggest the exact Glue and S3 actions to add to the AWS policy statement? Do we also have to add any Athena-related permissions?

Code Snippet:

import awswrangler as wr

wr.athena.to_iceberg(
    df=df,
    database=GLUE_ICEBERG_DB,
    table=glue_table_name,
    table_location=OUTPUT_FILE_PATH,
    temp_path=TEMP_PATH,
    keep_files=False,
    index=False,
    schema_evolution=True,
    fill_missing_columns_in_df=True,
    partition_cols=['dt', 'ts'],
    additional_table_properties={
        'write_target_data_file_size_bytes': '536870912',
        'write_compression': 'SNAPPY',
    },
)

@git4suman git4suman added the question Further information is requested label Dec 12, 2024
@jaidisido
Contributor

jaidisido commented Dec 12, 2024

I don't have a specific IAM policy to share, but I suggest designing one based on the actions carried out in the API call. You can consult the awswrangler/athena/_write_iceberg.py module to identify them and scope them to your resources. For instance, the code shows that the Athena StartQueryExecution action is also needed.
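As a starting point for that design, here is a minimal sketch of a candidate policy built as a Python dict. Only athena:StartQueryExecution is named in this thread; every other action in the sketch is an assumption about what a write through Athena, Glue, and S3 might need, and should be checked against the module and trimmed. The function name build_policy, the account ID, the bucket, and the workgroup are hypothetical placeholders, and the sketch assumes one bucket holds the table location, the temp path, and the Athena query results.

```python
import json


def build_policy(region, account_id, database, bucket, workgroup="primary"):
    """Return a candidate IAM policy document (a sketch, not a verified minimum)."""
    glue_arn = f"arn:aws:glue:{region}:{account_id}"
    return {
        "Version": "2012-10-17",
        "Statement": [
            {
                "Sid": "AthenaQueries",
                "Effect": "Allow",
                "Action": [
                    "athena:StartQueryExecution",  # named in this thread
                    "athena:GetQueryExecution",    # assumption: polling query status
                    "athena:GetQueryResults",      # assumption
                    "athena:GetWorkGroup",         # assumption
                ],
                "Resource": f"arn:aws:athena:{region}:{account_id}:workgroup/{workgroup}",
            },
            {
                "Sid": "GlueCatalog",
                "Effect": "Allow",
                # Assumption: the target table and any temporary table live in
                # the same database, hence the wildcard on table names.
                "Action": [
                    "glue:GetDatabase",
                    "glue:GetTable",
                    "glue:GetPartitions",
                    "glue:CreateTable",
                    "glue:UpdateTable",
                    "glue:DeleteTable",
                ],
                "Resource": [
                    f"{glue_arn}:catalog",
                    f"{glue_arn}:database/{database}",
                    f"{glue_arn}:table/{database}/*",
                ],
            },
            {
                "Sid": "S3Objects",
                "Effect": "Allow",
                # Assumption: read, write, and cleanup of data and temp files.
                "Action": ["s3:GetObject", "s3:PutObject", "s3:DeleteObject"],
                "Resource": f"arn:aws:s3:::{bucket}/*",
            },
            {
                "Sid": "S3Bucket",
                "Effect": "Allow",
                "Action": ["s3:ListBucket", "s3:GetBucketLocation"],
                "Resource": f"arn:aws:s3:::{bucket}",
            },
        ],
    }


if __name__ == "__main__":
    # All argument values below are hypothetical placeholders.
    policy = build_policy("us-east-1", "123456789012", "my_iceberg_db", "my-bucket")
    print(json.dumps(policy, indent=2))
```

If the Lambda fails with an AccessDenied error, the error message usually names the missing action, which helps refine the list.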

Alternatively, you can check the AWS CloudTrail logs once you run the lambda to identify all the API calls that were made and build your policy based on that information.
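The CloudTrail approach could be sketched roughly as follows with boto3. The helper names candidate_actions and fetch_events are hypothetical, and so is the assumption that the Lambda's calls appear under a Username equal to the role session name (often the function name). Two caveats: lookup_events returns management events only, so S3 object-level calls such as GetObject and PutObject will not show up unless data events are logged to a trail, and CloudTrail event names do not always match IAM action names one to one. Treat the output as candidates to verify.

```python
from collections import defaultdict
from datetime import datetime, timedelta, timezone


def candidate_actions(events):
    """Group CloudTrail events into candidate IAM actions per service prefix."""
    actions = defaultdict(set)
    for event in events:
        source = event.get("EventSource", "")
        name = event.get("EventName")
        if not source or not name:
            continue
        # 'athena.amazonaws.com' -> 'athena' (assumed to match the IAM prefix)
        service = source.split(".")[0]
        actions[service].add(f"{service}:{name}")
    return {service: sorted(names) for service, names in sorted(actions.items())}


def fetch_events(username, hours=1):
    """Fetch recent CloudTrail management events recorded for one user name."""
    import boto3  # imported here so candidate_actions works without boto3

    client = boto3.client("cloudtrail")
    end = datetime.now(timezone.utc)
    pages = client.get_paginator("lookup_events").paginate(
        LookupAttributes=[{"AttributeKey": "Username", "AttributeValue": username}],
        StartTime=end - timedelta(hours=hours),
        EndTime=end,
    )
    return [event for page in pages for event in page["Events"]]
```

For example, candidate_actions(fetch_events("my-lambda-function")) would list the calls seen in the last hour, where "my-lambda-function" is a placeholder. CloudTrail can take several minutes to deliver events, so run this a little while after invoking the Lambda.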
