Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

implement choice of delimiter for seed files #1122

Merged
merged 12 commits into from
May 10, 2024

Conversation

salimmoulouel
Copy link
Contributor

@salimmoulouel salimmoulouel commented Feb 26, 2024

resolves #1119

Problem

CSV seed work only with ',' delimiter.

Solution

This PR solves the problems by adding parameter of delimiter to the macro which build the table from the csv file

Checklist

  • I have read the contributing guide and understand what's expected of me
  • I have run this code in development and it appears to resolve the stated issue
  • This PR includes tests, or tests are not required/relevant for this PR

Copy link

cla-bot bot commented Feb 26, 2024

Thank you for your pull request and welcome to our community. We could not parse the GitHub identity of the following contributors: salim_moulouel.
This is most likely caused by a git client misconfiguration; please make sure to:

  1. check if your git client is configured with an email to sign commits git config --list | grep email
  2. If not, set it up using git config --global user.email [email protected]
  3. Make sure that the git commit email is configured in your GitHub account settings, see https://github.com/settings/emails

Copy link

cla-bot bot commented Feb 26, 2024

Thanks for your pull request, and welcome to our community! We require contributors to sign our Contributor License Agreement and we don't seem to have your signature on file. Check out this article for more information on why we have a CLA.

In order for us to review and merge your code, please submit the Individual Contributor License Agreement form attached above above. If you have questions about the CLA, or if you believe you've received this message in error, please reach out through a comment on this PR.

CLA has not been signed by users: @salimmoulouel

@dbeatty10
Copy link
Contributor

Thank you for opening this @salimmoulouel !

Please do the following:

  1. Sign our Contributor License Agreement. Check out this article for more detail.
  2. Add a CHANGELOG entry

@dbeatty10
Copy link
Contributor

Looking good so far @salimmoulouel 👍

A key thing we'll need is a functional test to make sure the change works.

Thankfully, we already have a relevant test defined here -- we just need to inherit and run it within dbt-bigquery!

Here's how to do it:

  1. Add the following import to the top of this file:
from dbt.tests.adapter.simple_seed.test_seed import BaseSeedWithUniqueDelimiter
  1. Add this new code right under here:
class TestBigQuerySeedWithUniqueDelimiter(BaseSeedWithUniqueDelimiter):
    pass

Here's instructions how to run the test locally to make sure it breaks before your change but works once it is included.

@salimmoulouel
Copy link
Contributor Author

Looking good so far @salimmoulouel 👍

A key thing we'll need is a functional test to make sure the change works.

Thankfully, we already have a relevant test defined here -- we just need to inherit and run it within dbt-bigquery!

Here's how to do it:

1. Add the following import to the top of [this file](https://github.com/dbt-labs/dbt-bigquery/blob/5501cd34b12965e0380952bdbb65fd52f49b49f5/tests/functional/adapter/test_simple_seed.py):
from dbt.tests.adapter.simple_seed.test_seed import BaseSeedWithUniqueDelimiter
2. Add this new code right under [here](https://github.com/dbt-labs/dbt-bigquery/blob/5501cd34b12965e0380952bdbb65fd52f49b49f5/tests/functional/adapter/test_simple_seed.py#L157-L158):
class TestBigQuerySeedWithUniqueDelimiter(BaseSeedWithUniqueDelimiter):
    pass

Here's instructions how to run the test locally to make sure it breaks before your change but works once it is included.

when i use

class TestBigQuerySeedWithUniqueDelimiter(BaseSeedWithUniqueDelimiter):
    pass

i get the following error

=========================================================================== ERRORS ============================================================================
____________________________________ ERROR at setup of TestBigQuerySeedWithUniqueDelimiter.test_seed_with_unique_delimiter ____________________________________

self = <dbt.adapters.bigquery.connections.BigQueryConnectionManager object at 0x10ad4a710>
sql = "\ncreate table test17090508078913404098_test_simple_seed.seed_expected (\nseed_id INTEGER,\nfirst_name TEXT,\nemail T...50','2007-09-14 13:03:34'),\n    (500,'Paula','[email protected]','123.27.47.249','2003-10-30 21:19:20');\n"

    @contextmanager
    def exception_handler(self, sql):
        try:
>           yield

/Users/carrefour/Desktop/work/forking/dbt-bigquery/dbt/adapters/bigquery/connections.py:247: 
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _
/Users/carrefour/Desktop/work/forking/dbt-bigquery/dbt/adapters/bigquery/connections.py:755: in _retry_and_handle
    return retry.retry_target(
/opt/homebrew/lib/python3.11/site-packages/google/api_core/retry/retry_unary.py:153: in retry_target
    _retry_error_helper(
/opt/homebrew/lib/python3.11/site-packages/google/api_core/retry/retry_base.py:212: in _retry_error_helper
    raise final_exc from source_exc
/opt/homebrew/lib/python3.11/site-packages/google/api_core/retry/retry_unary.py:144: in retry_target
    result = target()
/Users/carrefour/Desktop/work/forking/dbt-bigquery/dbt/adapters/bigquery/connections.py:483: in fn
    return self._query_and_results(
/Users/carrefour/Desktop/work/forking/dbt-bigquery/dbt/adapters/bigquery/connections.py:741: in _query_and_results
    iterator = query_job.result(max_results=limit, timeout=job_execution_timeout)
/opt/homebrew/lib/python3.11/site-packages/google/cloud/bigquery/job/query.py:1595: in result
    do_get_result()
/opt/homebrew/lib/python3.11/site-packages/google/api_core/retry/retry_unary.py:293: in retry_wrapped_func
    return retry_target(
/opt/homebrew/lib/python3.11/site-packages/google/api_core/retry/retry_unary.py:153: in retry_target
    _retry_error_helper(
/opt/homebrew/lib/python3.11/site-packages/google/api_core/retry/retry_base.py:212: in _retry_error_helper
    raise final_exc from source_exc
/opt/homebrew/lib/python3.11/site-packages/google/api_core/retry/retry_unary.py:144: in retry_target
    result = target()
/opt/homebrew/lib/python3.11/site-packages/google/cloud/bigquery/job/query.py:1584: in do_get_result
    super(QueryJob, self).result(retry=retry, timeout=timeout)
/opt/homebrew/lib/python3.11/site-packages/google/cloud/bigquery/job/base.py:971: in result
    return super(_AsyncJob, self).result(timeout=timeout, **kwargs)
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _

self = QueryJob<project=vg1np-apps-bdi-dev-70, location=US, id=9585a460-e7ca-49f3-8fba-88bff45be442>, timeout = None, retry = None, polling = None

    def result(self, timeout=_DEFAULT_VALUE, retry=None, polling=None):
        """Get the result of the operation.
    
        This method will poll for operation status periodically, blocking if
        necessary. If you just want to make sure that this method does not block
        for more than X seconds and you do not care about the nitty-gritty of
        how this method operates, just call it with ``result(timeout=X)``. The
        other parameters are for advanced use only.
    
        Every call to this method is controlled by the following three
        parameters, each of which has a specific, distinct role, even though all three
        may look very similar: ``timeout``, ``retry`` and ``polling``. In most
        cases users do not need to specify any custom values for any of these
        parameters and may simply rely on default ones instead.
    
        If you choose to specify custom parameters, please make sure you've
        read the documentation below carefully.
    
        First, please check :class:`google.api_core.retry.Retry`
        class documentation for the proper definition of timeout and deadline
        terms and for the definition the three different types of timeouts.
        This class operates in terms of Retry Timeout and Polling Timeout. It
        does not let customizing RPC timeout and the user is expected to rely on
        default behavior for it.
    
        The roles of each argument of this method are as follows:
    
        ``timeout`` (int): (Optional) The Polling Timeout as defined in
        :class:`google.api_core.retry.Retry`. If the operation does not complete
        within this timeout an exception will be thrown. This parameter affects
        neither Retry Timeout nor RPC Timeout.
    
        ``retry`` (google.api_core.retry.Retry): (Optional) How to retry the
        polling RPC. The ``retry.timeout`` property of this parameter is the
        Retry Timeout as defined in :class:`google.api_core.retry.Retry`.
        This parameter defines ONLY how the polling RPC call is retried
        (i.e. what to do if the RPC we used for polling returned an error). It
        does NOT define how the polling is done (i.e. how frequently and for
        how long to call the polling RPC); use the ``polling`` parameter for that.
        If a polling RPC throws and error and retrying it fails, the whole
        future fails with the corresponding exception. If you want to tune which
        server response error codes are not fatal for operation polling, use this
        parameter to control that (``retry.predicate`` in particular).
    
        ``polling`` (google.api_core.retry.Retry): (Optional) How often and
        for how long to call the polling RPC periodically (i.e. what to do if
        a polling rpc returned successfully but its returned result indicates
        that the long running operation is not completed yet, so we need to
        check it again at some point in future). This parameter does NOT define
        how to retry each individual polling RPC in case of an error; use the
        ``retry`` parameter for that. The ``polling.timeout`` of this parameter
        is Polling Timeout as defined in as defined in
        :class:`google.api_core.retry.Retry`.
    
        For each of the arguments, there are also default values in place, which
        will be used if a user does not specify their own. The default values
        for the three parameters are not to be confused with the default values
        for the corresponding arguments in this method (those serve as "not set"
        markers for the resolution logic).
    
        If ``timeout`` is provided (i.e.``timeout is not _DEFAULT VALUE``; note
        the ``None`` value means "infinite timeout"), it will be used to control
        the actual Polling Timeout. Otherwise, the ``polling.timeout`` value
        will be used instead (see below for how the ``polling`` config itself
        gets resolved). In other words, this parameter  effectively overrides
        the ``polling.timeout`` value if specified. This is so to preserve
        backward compatibility.
    
        If ``retry`` is provided (i.e. ``retry is not None``) it will be used to
        control retry behavior for the polling RPC and the ``retry.timeout``
        will determine the Retry Timeout. If not provided, the
        polling RPC will be called with whichever default retry config was
        specified for the polling RPC at the moment of the construction of the
        polling RPC's client. For example, if the polling RPC is
        ``operations_client.get_operation()``, the ``retry`` parameter will be
        controlling its retry behavior (not polling  behavior) and, if not
        specified, that specific method (``operations_client.get_operation()``)
        will be retried according to the default retry config provided during
        creation of ``operations_client`` client instead. This argument exists
        mainly for backward compatibility; users are very unlikely to ever need
        to set this parameter explicitly.
    
        If ``polling`` is provided (i.e. ``polling is not None``), it will be used
        to control the overall polling behavior and ``polling.timeout`` will
        control Polling Timeout unless it is overridden by ``timeout`` parameter
        as described above. If not provided, the``polling`` parameter specified
        during construction of this future (the ``polling`` argument in the
        constructor) will be used instead. Note: since the ``timeout`` argument may
        override ``polling.timeout`` value, this parameter should be viewed as
        coupled with the ``timeout`` parameter as described above.
    
        Args:
            timeout (int): (Optional) How long (in seconds) to wait for the
                operation to complete. If None, wait indefinitely.
            retry (google.api_core.retry.Retry): (Optional) How to retry the
                polling RPC. This defines ONLY how the polling RPC call is
                retried (i.e. what to do if the RPC we used for polling returned
                an error). It does  NOT define how the polling is done (i.e. how
                frequently and for how long to call the polling RPC).
            polling (google.api_core.retry.Retry): (Optional) How often and
                for how long to call polling RPC periodically. This parameter
                does NOT define how to retry each individual polling RPC call
                (use the ``retry`` parameter for that).
    
        Returns:
            google.protobuf.Message: The Operation's result.
    
        Raises:
            google.api_core.GoogleAPICallError: If the operation errors or if
                the timeout is reached before the operation completes.
        """
    
        self._blocking_poll(timeout=timeout, retry=retry, polling=polling)
    
        if self._exception is not None:
            # pylint: disable=raising-bad-type
            # Pylint doesn't recognize that this is valid in this case.
>           raise self._exception
E           google.api_core.exceptions.BadRequest: 400 Syntax error: Expected ")" or "," but got identifier "WITHOUT" at [7:20]; reason: invalidQuery, location: query, message: Syntax error: Expected ")" or "," but got identifier "WITHOUT" at [7:20]
E           
E           Location: US
E           Job ID: 9585a460-e7ca-49f3-8fba-88bff45be442

/opt/homebrew/lib/python3.11/site-packages/google/api_core/future/polling.py:261: BadRequest

During handling of the above exception, another exception occurred:

self = <test_simple_seed.TestBigQuerySeedWithUniqueDelimiter object at 0x10ad12790>, project = <dbt.tests.fixtures.project.TestProjInfo object at 0x15fcc9990>

    @pytest.fixture(scope="class", autouse=True)
    def setUp(self, project):
        """Create table for ensuring seeds and models used in tests build correctly"""
        print("yiwen")
    
>       project.run_sql(seeds.seeds__expected_sql)

/opt/homebrew/lib/python3.11/site-packages/dbt/tests/adapter/simple_seed/test_seed.py:181: 
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _
/opt/homebrew/lib/python3.11/site-packages/dbt/tests/fixtures/project.py:451: in run_sql
    return run_sql_with_adapter(self.adapter, sql, fetch=fetch)
/opt/homebrew/lib/python3.11/site-packages/dbt/tests/util.py:320: in run_sql_with_adapter
    return adapter.run_sql_for_tests(sql, fetch, conn)
/Users/carrefour/Desktop/work/forking/dbt-bigquery/dbt/adapters/bigquery/impl.py:889: in run_sql_for_tests
    _, res = self.execute(sql, fetch=do_fetch)
/opt/homebrew/lib/python3.11/site-packages/dbt/adapters/base/impl.py:345: in execute
    return self.connections.execute(sql=sql, auto_begin=auto_begin, fetch=fetch, limit=limit)
/Users/carrefour/Desktop/work/forking/dbt-bigquery/dbt/adapters/bigquery/connections.py:501: in execute
    query_job, iterator = self.raw_execute(sql, limit=limit)
/Users/carrefour/Desktop/work/forking/dbt-bigquery/dbt/adapters/bigquery/connections.py:492: in raw_execute
    query_job, iterator = self._retry_and_handle(msg=sql, conn=conn, fn=fn)
/Users/carrefour/Desktop/work/forking/dbt-bigquery/dbt/adapters/bigquery/connections.py:754: in _retry_and_handle
    with self.exception_handler(msg):
/opt/homebrew/Cellar/[email protected]/3.11.7_1/Frameworks/Python.framework/Versions/3.11/lib/python3.11/contextlib.py:158: in __exit__
    self.gen.throw(typ, value, traceback)
/Users/carrefour/Desktop/work/forking/dbt-bigquery/dbt/adapters/bigquery/connections.py:251: in exception_handler
    self.handle_error(e, message)
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _

cls = <class 'dbt.adapters.bigquery.connections.BigQueryConnectionManager'>
error = BadRequest('Syntax error: Expected ")" or "," but got identifier "WITHOUT" at [7:20]; reason: invalidQuery, location: query, message: Syntax error: Expected ")" or "," but got identifier "WITHOUT" at [7:20]')
message = 'Bad request while running query'

    @classmethod
    def handle_error(cls, error, message):
        error_msg = "\n".join([item["message"] for item in error.errors])
        if hasattr(error, "query_job"):
            logger.error(
                cls._bq_job_link(
                    error.query_job.location, error.query_job.project, error.query_job.job_id
                )
            )
>       raise DbtDatabaseError(error_msg)
E       dbt_common.exceptions.base.DbtDatabaseError: Database Error
E         Syntax error: Expected ")" or "," but got identifier "WITHOUT" at [7:20]

/Users/carrefour/Desktop/work/forking/dbt-bigquery/dbt/adapters/bigquery/connections.py:239: DbtDatabaseError
-------------------------------------------------------------------- Captured stdout setup --------------------------------------------------------------------

=== Test project_root: /private/var/folders/b0/b8zrlfqx5q7bc2fbgc9_16zr0000gp/T/pytest-of-carrefour/pytest-46/project0
yiwen
{"data": {"args": [], "base_msg": "https://console.cloud.google.com/bigquery?project=vg1np-apps-bdi-dev-70&j=bq:US:9585a460-e7ca-49f3-8fba-88bff45be442&page=queryresults", "exc_info": "", "name": "BigQuery"}, "info": {"category": "", "code": "E004", "extra": {}, "invocation_id": "5f37f5e4-8feb-4e3f-961f-03ff105cd19d", "level": "error", "msg": "BigQuery adapter: https://console.cloud.google.com/bigquery?project=vg1np-apps-bdi-dev-70&j=bq:US:9585a460-e7ca-49f3-8fba-88bff45be442&page=queryresults", "name": "AdapterEventError", "pid": 71280, "thread": "MainThread", "ts": "2024-02-27T16:20:14.131980Z"}}
====================================================================== warnings summary =======================================================================
tests/functional/adapter/test_simple_seed.py::TestBigQuerySeedWithUniqueDelimiter::test_seed_with_unique_delimiter
tests/functional/adapter/test_simple_seed.py::TestBigQuerySeedWithUniqueDelimiter::test_seed_with_unique_delimiter
  /opt/homebrew/lib/python3.11/site-packages/google/auth/_default.py:76: UserWarning: Your application has authenticated using end user credentials from Google Cloud SDK without a quota project. You might receive a "quota exceeded" or "API not enabled" error. See the following page for troubleshooting: https://cloud.google.com/docs/authentication/adc-troubleshooting/user-creds. 
    warnings.warn(_CLOUD_SDK_CREDENTIALS_WARNING)

-- Docs: https://docs.pytest.org/en/stable/how-to/capture-warnings.html
=================================================================== short test summary info ===================================================================
ERROR tests/functional/adapter/test_simple_seed.py::TestBigQuerySeedWithUniqueDelimiter::test_seed_with_unique_delimiter - dbt_common.exceptions.base.DbtDatabaseError: Database Error
================================================================ 2 warnings, 1 error in 8.00s =================================================================

to get arround this problem i inherited the TestSimpleSeedConfigs and i modified the delimiter of the csv used in the seed, i get an error before applying change and it disappears after the changes

@dbeatty10 dbeatty10 added the ready_for_review Externally contributed PR has functional approval, ready for code review from Core engineering label Feb 27, 2024
@salimmoulouel
Copy link
Contributor Author

@dbeatty10 i didn't get any news since my last commit, is the pull request okay ?

@salimmoulouel
Copy link
Contributor Author

Hope you're doing well! I wanted to chat about some merge requests I've submitted to dbt. It's been over Two months since I sent them in, and I haven't heard a peep back. Any chance you could give me the lowdown on what's up with them?

I get that running an open-source project can be hectic, and I totally respect the hustle. But it's a bit frustrating not knowing where my contributions stand. I've put some real effort into these requests and would love to see them get some love.

If there's anything I need to tweak or if my requests aren't quite hitting the mark, I'm all ears. Just looking for some clarity so I can get back in the game!

Thanks a bunch for your time, and looking forward to hearing from you soon.

@salimmoulouel salimmoulouel requested a review from a team as a code owner May 9, 2024 23:34
Copy link
Contributor

@colin-rogers-dbt colin-rogers-dbt left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Just a quick tweak to the change log and this is good to merge

.changes/unreleased/Fixes-20240226-233024.yaml Outdated Show resolved Hide resolved
@colin-rogers-dbt colin-rogers-dbt enabled auto-merge (squash) May 9, 2024 23:37
@colin-rogers-dbt
Copy link
Contributor

@salimmoulouel Sorry for the delay LGTM, can you just update the code formatting? (just need to run pre-commit )

@colin-rogers-dbt colin-rogers-dbt merged commit 0d37ba6 into dbt-labs:main May 10, 2024
13 checks passed
@dbeatty10
Copy link
Contributor

@salimmoulouel Sorry for the delay LGTM, can you just update the code formatting? (just need to run pre-commit )

I went ahead and applied the formatting with the following, and it auto-merged once it passed all the CI checks 👍 ✅

gh pr checkout 1122
pre-commit run --all-files --show-diff-on-failure
git status .
git add dbt/adapters/bigquery/impl.py
git commit -m 'Apply formatting from pre-commit'
git push

@colin-rogers-dbt colin-rogers-dbt added the community This PR is from a community member label Jun 10, 2024
@owenprough-sift
Copy link

owenprough-sift commented Aug 23, 2024

@dbeatty10 I see this code in main (because it merged on May 10), but don't see its code changes/release notes in any release published since May 10. When will this PR's fix be released?

At least one person is awaiting it: https://getdbt.slack.com/archives/CBSQTAPLG/p1724405788161219

colin-rogers-dbt pushed a commit that referenced this pull request Oct 21, 2024
* implement choice of delimiter for seed files

* adding change log entry

* implementation of test of TestBigQuerySeedWithUniqueDelimiter

* Update dbt/adapters/bigquery/impl.py

Co-authored-by: Doug Beatty <[email protected]>

* Update dbt/include/bigquery/macros/materializations/seed.sql

Co-authored-by: Doug Beatty <[email protected]>

* Update dbt/include/bigquery/macros/materializations/seed.sql

Co-authored-by: Doug Beatty <[email protected]>

* Update dbt/adapters/bigquery/impl.py

Co-authored-by: Doug Beatty <[email protected]>

* Update .changes/unreleased/Fixes-20240226-233024.yaml

---------

Co-authored-by: Doug Beatty <[email protected]>
Co-authored-by: Colin Rogers <[email protected]>
(cherry picked from commit 0d37ba6)
colin-rogers-dbt added a commit that referenced this pull request Oct 22, 2024
* implement choice of delimiter for seed files

* adding change log entry

* implementation of test of TestBigQuerySeedWithUniqueDelimiter

* Update dbt/adapters/bigquery/impl.py

Co-authored-by: Doug Beatty <[email protected]>

* Update dbt/include/bigquery/macros/materializations/seed.sql

Co-authored-by: Doug Beatty <[email protected]>

* Update dbt/include/bigquery/macros/materializations/seed.sql

Co-authored-by: Doug Beatty <[email protected]>

* Update dbt/adapters/bigquery/impl.py

Co-authored-by: Doug Beatty <[email protected]>

* Update .changes/unreleased/Fixes-20240226-233024.yaml

---------

Co-authored-by: Doug Beatty <[email protected]>
Co-authored-by: Colin Rogers <[email protected]>
(cherry picked from commit 0d37ba6)

Co-authored-by: salimmoulouel <[email protected]>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
backport 1.8.latest cla:yes community This PR is from a community member ready_for_review Externally contributed PR has functional approval, ready for code review from Core engineering
Projects
None yet
Development

Successfully merging this pull request may close these issues.

delimiters in properties.yml doesn't work correctly
6 participants