
persistent issue with the fivetran_connector_schema_config issue #310

Open
jakerohald opened this issue May 15, 2024 · 23 comments
Assignees
Labels
bug Something isn't working question Further information is requested

Comments

@jakerohald

I'm currently addressing the challenges posed by the fivetran_terraform_config_schema updates from the 1.1.18 release. Our company manages roughly 700 connections, including numerous database shards, which makes the current fix considerably impractical for our scale.

The recent discussions around the new release noted that the fix was successfully applied to just six connectors. However, with many connections, each hosting several schemas (typically around five), and each schema featuring a substantial number of nested column fields (between 10 and 20), this approach is not viable in its current form.

@jakerohald jakerohald added the bug Something isn't working label May 15, 2024
@beevital
Collaborator

@jakerohald have you tried the schemas_json field with the file-based approach?
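For reference, the file-based approach mentioned here could look like the following (a minimal sketch; the resource name, connector ID, and file path are placeholders, not taken from this thread):

```hcl
# Sketch of the schemas_json + file() approach. The connector ID and
# file path below are placeholders.
resource "fivetran_connector_schema_config" "example" {
  connector_id           = "my_connector_id"
  schema_change_handling = "BLOCK_ALL"

  # Keeping the large schema definition in a separate JSON file keeps
  # the .tf files manageable for connectors with many tables/columns.
  schemas_json = file("${path.module}/schemas/github.json")
}
```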

@beevital
Collaborator

Or are you talking about the overall complexity of the settings format?
It's tied to the current API contract, so we have a lot of restrictions.
What exact problems are you facing now? Could you please share an example of an impractical case?

@beevital beevital added question Further information is requested and removed bug Something isn't working labels May 15, 2024
@beevital beevital self-assigned this May 15, 2024
@beevital
Collaborator

We are currently thinking about enhancements to the existing approach. Do you have a proposal for how it would be better to define schema settings on large connectors (with several huge schemas)?

@jakerohald
Author

Hey @beevital - I have tried the schemas_config.

Happy to share what I mean by 'impractical case'.

When I say impractical, I mean the following: we have multiple DBs, each of which is sharded, so we split the connector. Taking one example: one of our SQL Server databases has 8 shards, each containing 6 schemas, which contain 8, 6, 1, 1, 1 and 12 tables respectively, and each table contains between 6 and 20 fields. This doesn't seem to scale (reflected by the fact that the fivetran_connector_schema_config state refresh takes an impractical amount of time for a single schema).

Thanks.

@beevital
Collaborator

beevital commented May 16, 2024

fivetran_connector_schema_config state refresh takes an impractical amount of time for a single schema

Could you share some logs so I can investigate this on the API side?

It should not reload the schema on each refresh, and it should work pretty fast after the first apply.

@beevital
Collaborator

Specifically, I'm interested in the connection_id (connector_id).

@jakerohald
Author

The issue mentioned above has been resolved. The problem was caused by the schema configuration in the state file, which had been deployed using the old schema method. To fix it, the old schema configuration had to be removed and then reapplied, as running a 'refresh' with the old method would cause it to hang. It is worth noting in the documentation that if resources were deployed using the old method, they should be destroyed first before reapplying with the new method.

However, we have discovered another issue that is worth mentioning here. Although the example uses the GitHub connector, it applies to other connectors we tested as well.

For instance, we write the output to a JSON file for the GitHub connector as follows (note: some fields have been removed for simplicity in this demonstration):

{
  "github": {
    "enabled": true,
    "tables": {
      "deployment": {
        "columns": {
          "createdAt": {
            "enabled": true
          },
          "description": {
            "enabled": true
          }
        },
        "enabled": true,
        "history_mode": false
      }
    }
  }
}

We then deploy our resource using the 'BLOCK_ALL' schema change handling and it produces the following apply plan:

 ~ id                     = "ihe_theorists" -> (known after apply)
  ~ schemas_json           = jsonencode(
      ~ {
          ~ github = {
              ~ enabled = "true" -> true (note: we never deployed using "true" as a string - indicating a potential parsing issue?)
              ~ tables  = {
                  ~ deployment = {
                      + columns      = {
                          + createdAt   = {
                              + enabled = true
                            }
                          + description = {
                              + enabled = true
                            }
                        }
                      ~ enabled      = "true" -> true
                      + history_mode = false
                    }
                }
            }
        }
    )
    # (2 unchanged attributes hidden)

    # (1 unchanged block hidden)
}

--
When I actually apply the change, the error we get is:

│ When applying changes to fivetran_connector_schema_config.contract_schema["github"], provider "provider[\"registry.terraform.io/fivetran/fivetran\"]"
│ produced an unexpected new value: .schemas_json: was cty.StringVal("{\n "github": {\n "enabled": true,\n "tables": {\n
│ "deployment": {\n "enabled": true,\n "history_mode": false,\n "columns": {\n "createdAt": {\n "enabled":
│ true\n },\n "description": {\n "enabled": true\n }\n }\n }\n }\n }\n}"), but now
│ cty.StringVal("{"github":{"enabled":"true","tables":{"deployment":{"enabled":"true"}}}}")

We are running the following terraform versions:

Terraform v1.8.2
on darwin_arm64
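One defensive pattern for the "true" -> true churn shown in the plan above (an assumption on my part, not an official fix) is to round-trip the file through jsondecode/jsonencode so Terraform always writes canonically typed JSON:

```hcl
# Hypothetical workaround: decode and re-encode the JSON file so that
# booleans reach the state as true/false rather than "true"/"false".
locals {
  github_schema = jsondecode(file("${path.module}/schemas/github.json"))
}

resource "fivetran_connector_schema_config" "contract_schema" {
  connector_id           = "my_connector_id"
  schema_change_handling = "BLOCK_ALL"
  schemas_json           = jsonencode(local.github_schema)
}
```

Note this only normalises the encoding on the Terraform side; it would not stop the API response from dropping parts of the config.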

@beevital
Collaborator

Yeah, understood - the github connector returns columns as well, but the original config doesn't contain them.
Have you tried the schemas field, which is map-based? It should not produce the same issues as schemas_json, but it is a bit slower.

I'll try to figure out how we could manage schemas_json better.
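A sketch of the map-based schemas field suggested here, mirroring the github example from earlier in the thread (the attribute shape is assumed from the provider docs; verify it against your provider version):

```hcl
resource "fivetran_connector_schema_config" "example" {
  connector_id           = "my_connector_id"
  schema_change_handling = "BLOCK_ALL"

  # Map-based alternative to schemas_json; slower, but the provider can
  # diff it field by field instead of comparing raw JSON strings.
  schemas = {
    "github" = {
      enabled = true
      tables = {
        "deployment" = {
          enabled = true
          columns = {
            "createdAt"   = { enabled = true }
            "description" = { enabled = true }
          }
        }
      }
    }
  }
}
```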

@beevital beevital added the bug Something isn't working label May 21, 2024
@jakerohald
Author

jakerohald commented May 21, 2024

@beevital what do you mean by github connector returns columns as well, but the original config doesn't contain it - how does that relate to the new issue outlined above?

@beevital
Collaborator

@jakerohald it's just my thinking about what is happening; I know how it works on the API side and inside the provider.
The API returns a response that contains additional elements, so there's a difference between the expected value and what is actually returned (one contains columns as well).

Also - I'm wondering how you passed "history_mode": false?
That field will simply be ignored by the API. You need to use "sync_mode": "HISTORY" instead.
Please refer to the API docs (the schemas_json field expects JSON compatible with the API payload).
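Assuming sync_mode is a table-level property as described in the payload docs, the earlier JSON would become something like this (a sketch, not verified against the API):

```json
{
  "github": {
    "enabled": true,
    "tables": {
      "deployment": {
        "enabled": true,
        "sync_mode": "HISTORY",
        "columns": {
          "createdAt": { "enabled": true },
          "description": { "enabled": true }
        }
      }
    }
  }
}
```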

@beevital
Collaborator

See https://fivetran.com/docs/rest-api/connectors#payloadparameters_6

@jakerohald
Author

As you said, this won't affect the deployment, but we went ahead and fixed it up.

We have narrowed down the source of the issue when we deploy the resource:

{
  "github_XXX": {
    "enabled": true,
    "sync_mode": "HISTORY",
    "tables": {
      "deployment": {
        "enabled": true,
        "columns": {
          "createdAt": {
            "enabled": true/false  # this line breaks it. Leaving this line blank lets the resource deploy successfully - although leaving it blank does nothing for the column itself (no default behaviour)
          }
        }
      },
      "team": {
        "enabled": true
      }
    }
  }
}

Happy to hear your subsequent thoughts.

@beevital
Collaborator

As I said, it's a bug: I have to update the way we compare the value of schemas_json stored in state with the one we receive in the API response after applying.

@jakerohald
Author

Hi @beevital, any update on the timeline for a possible fix? Thanks very much :)

@beevital
Collaborator

@jakerohald I'm not sure whether the issues you're facing were actually fixed in v1.2.0 - could you please try it and report back on the provider behaviour?

@jakerohald
Author

Sure. @beevital it works with the old schema config block, but I get the same error as above for schemas and schemas_json.

@beevital
Collaborator

Okay, I will take a deeper look into it.

@jakerohald
Author

Hi @beevital, just wanted to check whether any thought has been given to this. I just tested with BLOCK_ALL on provider version 1.2.6 and no fix is in - same problem.

Was wondering if this can be looked at again as it would help unblock all resources created by BLOCK_ALL.

@beevital
Collaborator

Nope, unfortunately I had no chance to dedicate any time to it =(
We will try to prioritise this issue and investigate it soon.

@jakerohald
Author

Thanks very much!

@jakerohald
Author

@fivetran-jovanmanojlovic can I clarify if the intended fix for this was in version 1.2.8?

If so, I have tested it and it still produces some problems on apply; happy to share logs if that's the case.

@jakerohald
Author

@beevital is there any update re the comment above? Thanks!

@peyyero

peyyero commented Sep 12, 2024

Hi @fivetran-jovanmanojlovic,

I noticed that this issue has been assigned to you and am inquiring about the progress made in resolving it. My colleague Jake has been in contact with @beevital about this, but we would like clarification on its priority.

This specific matter has been causing significant challenges for us, and I am hopeful that it can be prioritised accordingly.

I am available to offer further information if needed.

Thank you.
