
Source Amazon Seller Partner: Connector Failed Error due to JSON Schema #362

Open
andrzejdackiewicz opened this issue Sep 16, 2024 · 1 comment

Comments

@andrzejdackiewicz

Connector


source-amazon-seller-partner

Issue


I am using PyAirbyte to run a migration from the Amazon Selling Partner API to BigQuery. Here is the code I am running:
```python
def fetch_pyairbyte_data(aws_environment, region, account_type, app_id,
                         client_secret, refresh_token, project_name,
                         dataset_name, stream):
    import airbyte as ab
    from airbyte.caches.bigquery import BigQueryCache

    source = ab.get_source(
        "source-amazon-seller-partner",
        config={
            "aws_environment": aws_environment,
            "region": region,
            "account_type": account_type,
            "lwa_app_id": app_id,
            "lwa_client_secret": client_secret,
            "refresh_token": refresh_token,
            "replication_start_date": "2024-09-08T00:00:00Z",
            "report_options_list": [
                {
                    "report_name": "GET_VENDOR_SALES_REPORT",
                    "stream_name": "GET_VENDOR_SALES_REPORT",
                    "options_list": [
                        {"option_name": "reportPeriod", "option_value": "DAY"},
                        {"option_name": "distributorView", "option_value": "SOURCING"},
                        {"option_name": "sellingProgram", "option_value": "RETAIL"},
                    ],
                },
            ],
        },
        install_if_missing=True,
    )

    cache = BigQueryCache(
        project_name=project_name,
        dataset_name=dataset_name,
    )

    source.select_streams(stream)

    result = source.read(cache=cache)
    return result
```
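For reference, a hypothetical invocation of the function above (every value below is a placeholder, not a real credential; the config values follow the connector's spec as I understand it):

```python
# Placeholder call; substitute real LWA credentials and GCP identifiers.
fetch_pyairbyte_data(
    aws_environment="PRODUCTION",
    region="US",
    account_type="Vendor",  # a vendor account, since the report is a vendor report
    app_id="amzn1.application-oa2-client.xxxxxxxx",
    client_secret="<LWA client secret>",
    refresh_token="<LWA refresh token>",
    project_name="my-gcp-project",
    dataset_name="amazon_sp_data",
    stream=["GET_VENDOR_SALES_REPORT"],
)
```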

During the run I can see that data is fetched from the source, but loading into BigQuery fails with a JSON parsing error:

```
[2024-09-13, 04:37:48 UTC] {process_utils.py:190} INFO - google.api_core.exceptions.BadRequest: 400 Error while reading data, error message: JSON table encountered too many errors, giving up. Rows: 1; errors: 1. Please look into the errors[] collection for more details.;
reason: invalid, message: Error while reading data, error message: JSON table encountered too many errors, giving up. Rows: 1; errors: 1. Please look into the errors[] collection for more details.;
reason: invalid, message: Error while reading data, error message: JSON processing encountered too many errors, giving up. Rows: 1; errors: 1; max bad: 0; error percent: 0;
reason: invalid, message: Error while reading data, error message: JSON parsing error in row starting at position 0: Couldn't convert value to timestamp: Could not parse '2024-09-09' as a timestamp. Required format is YYYY-MM-DD HH:MM[:SS[.SSSSSS]] or YYYY/MM/DD HH:MM[:SS[.SSSSSS]] Field: startdate; Value: 2024-09-09
```
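The error points to a mismatch between the connector's declared schema and the record values it emits. A minimal sketch of what seems to be happening (the field name comes from the error log; the schema fragment itself is an assumption, not copied from the connector):

```python
# Assumed schema fragment: a "date-time" string, which BigQuery maps to TIMESTAMP.
declared = {"startDate": {"type": ["null", "string"], "format": "date-time"}}

# Actual value emitted by the report, per the error log: a bare date.
record = {"startDate": "2024-09-09"}

# BigQuery's TIMESTAMP parser requires "YYYY-MM-DD HH:MM[:SS[.SSSSSS]]",
# so the bare date is rejected and the load job fails with the BadRequest above.
```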

I suspect the error is in the Airbyte source connector's JSON schema: the field is declared with a `date-time` format when it should be `date`. I am willing to submit a fix for that, but I wanted some confidence that it would actually resolve the issue.
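If that diagnosis is correct, the fix would be a one-line change in the stream's JSON schema, sketched here (the exact property casing and file location in the connector repo are assumptions):

```python
# Declaring the field as a plain date should let the destination map it to DATE
# instead of TIMESTAMP, accepting values like "2024-09-09".
fixed = {"startDate": {"type": ["null", "string"], "format": "date"}}
```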

I ran the same pipeline locally on a Docker deployment of Airbyte and the migration succeeded. So the issue, as I see it, is on the Airbyte side, but it only shows up when using PyAirbyte.
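As a stopgap until the schema is corrected, one possible workaround is to skip `BigQueryCache`, coerce the column, and load the table yourself. A rough sketch, assuming the cached column is named `startdate` as in the error log and that `pandas-gbq` is installed (untested against this connector):

```python
import pandas as pd
import pandas_gbq

# Read into PyAirbyte's default local cache instead of BigQueryCache.
source.select_streams(["GET_VENDOR_SALES_REPORT"])
result = source.read()

# Coerce the bare-date strings into real timestamps (or keep them as dates).
df = result["GET_VENDOR_SALES_REPORT"].to_pandas()
df["startdate"] = pd.to_datetime(df["startdate"])

# Load the corrected frame into BigQuery directly.
pandas_gbq.to_gbq(
    df,
    destination_table=f"{dataset_name}.get_vendor_sales_report",
    project_id=project_name,
    if_exists="append",
)
```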

@andrzejdackiewicz
Author

Version 4.4.1 of the Amazon Seller Partner connector is being used. The same source connector version was used in the Docker Airbyte migration, and it succeeded in writing the data to BigQuery.
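For anyone trying to reproduce this, the connector version can be pinned in PyAirbyte so both environments run the same connector code (I believe `get_source` accepts a `version` argument; treat this as an assumption):

```python
# Pin the connector to the same version used in the Docker deployment.
source = ab.get_source(
    "source-amazon-seller-partner",
    version="4.4.1",
    config=config,  # the same config dict shown above
    install_if_missing=True,
)
```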
