feat: implement InMemoryCatalog as a subclass of SqlCatalog #1140

hussein-awala · 2024-09-05T22:37:53Z

closes: #1110

This PR implements a new catalog, InMemoryCatalog, as a subclass of SqlCatalog backed by an in-memory SQLite database.
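For readers who want the shape of the idea without opening the diff, here is a minimal sketch using only the standard library. `ToySqlCatalog` and `ToyInMemoryCatalog` are hypothetical stand-ins invented for illustration; they are not the PyIceberg classes and do not reflect the actual implementation in this PR. The only point is that the subclass inherits all behaviour and merely pins the storage location to memory.

```python
import sqlite3


class ToySqlCatalog:
    """Hypothetical stand-in for a SQL-backed catalog; not the PyIceberg class."""

    def __init__(self, name: str, uri: str) -> None:
        self.name = name
        # `uri` is handed straight to sqlite3; ":memory:" opens a private in-memory database.
        self._conn = sqlite3.connect(uri)
        self._conn.execute("CREATE TABLE IF NOT EXISTS namespaces (name TEXT PRIMARY KEY)")

    def create_namespace(self, namespace: str) -> None:
        with self._conn:  # commits on success, rolls back on error
            self._conn.execute("INSERT INTO namespaces (name) VALUES (?)", (namespace,))

    def list_namespaces(self) -> list:
        rows = self._conn.execute("SELECT name FROM namespaces ORDER BY name")
        return [row[0] for row in rows]


class ToyInMemoryCatalog(ToySqlCatalog):
    """Same behaviour as the parent; only the storage location is fixed to memory."""

    def __init__(self, name: str) -> None:
        super().__init__(name, uri=":memory:")


catalog = ToyInMemoryCatalog("default")
catalog.create_namespace("com.organization")
print(catalog.list_namespaces())
```

Because every feature lives in the parent class, anything added there later is picked up by the in-memory variant without extra work.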

tests/catalog/test_base.py

hussein-awala · 2024-09-06T19:43:10Z

@kevinjqliu I applied what you suggested in the comment above, could you recheck it now?

kevinjqliu

Added some nit comments. Thanks for working on this! The in-memory catalog is used in a bunch of places in the tests, so changing it has cascading effects.

kevinjqliu · 2024-09-06T22:53:35Z

pyiceberg/catalog/memory.py

+    This is useful for test, demo, and playground but not in production as data is not persisted.
+    """
+
+    def __init__(self, name: str, warehouse: str = "file:///tmp/warehouse", **kwargs: str) -> None:


nit: let's use something like /tmp/iceberg/warehouse to avoid conflicting with other tmp directories. Also, I'm not sure this works when the warehouse directory has not been created yet.
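One way to sidestep the second question is to create the directory up front. The sketch below is a hypothetical helper (`ensure_local_warehouse` is not part of PyIceberg, and nothing in this thread says the catalog does this); it assumes a POSIX-style local path or `file://` URI and uses only the standard library.

```python
import os
import tempfile
from urllib.parse import urlparse
from urllib.request import url2pathname


def ensure_local_warehouse(warehouse: str) -> str:
    """Create the directory behind a local warehouse location if it is missing.

    Accepts either a plain path or a file:// URI and returns the local path.
    """
    parsed = urlparse(warehouse)
    if parsed.scheme not in ("", "file"):
        raise ValueError(f"Not a local warehouse location: {warehouse}")
    path = url2pathname(parsed.path) if parsed.scheme == "file" else warehouse
    os.makedirs(path, exist_ok=True)  # no error if the directory already exists
    return path


# Use a scratch directory so the example does not touch a real warehouse.
base = tempfile.mkdtemp()
warehouse_path = ensure_local_warehouse(f"file://{os.path.join(base, 'iceberg', 'warehouse')}")
print(os.path.isdir(warehouse_path))
```

Calling the helper twice is harmless because `exist_ok=True` makes directory creation idempotent.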

kevinjqliu · 2024-09-06T23:09:42Z

tests/catalog/test_memory.py

nit: I'd like to keep this as test_base because I want to parameterize all tests to make sure all the catalogs have the same behavior (see #813)

kevinjqliu · 2024-09-06T23:13:02Z

tests/catalog/test_memory.py

+DROP_NOT_EXISTING_NAMESPACE_ERROR = "Namespace does not exist: \\('com', 'organization', 'department'\\)"
+NO_SUCH_NAMESPACE_ERROR = "Namespace com.organization.department does not exists"


nit, merge these two

This should be done in a separate PR

kevinjqliu · 2024-09-07T19:24:44Z

tests/cli/test_console.py

@@ -806,7 +831,7 @@ def test_json_properties_get_table_does_not_exist(catalog: InMemoryCatalog) -> N
    runner = CliRunner()
    result = runner.invoke(run, ["--output=json", "properties", "get", "table", "doesnotexist"])
    assert result.exit_code == 1
-    assert result.output == """{"type": "NoSuchTableError", "message": "Table does not exist: ('doesnotexist',)"}\n"""
+    assert result.output == """{"type": "ValueError", "message": "Empty namespace identifier"}\n"""


nit: shouldn't this be NoSuchTableError? Maybe the namespace needs to be created first.
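A plausible reading of the changed assertion, not confirmed anywhere in this thread: the identifier `doesnotexist` has a single part, so once the table name is split off the namespace is empty, and a namespace check fails before any table lookup happens. The toy functions below are hypothetical and only mimic that control flow; they are not the PyIceberg code.

```python
from typing import Tuple


def split_identifier(identifier: str) -> Tuple[Tuple[str, ...], str]:
    """Split "ns1.ns2.table" into (("ns1", "ns2"), "table")."""
    parts = tuple(identifier.split("."))
    return parts[:-1], parts[-1]


def namespace_to_string(namespace: Tuple[str, ...]) -> str:
    """Toy check that reuses the error message from the diff above."""
    if len(namespace) == 0:
        raise ValueError("Empty namespace identifier")
    return ".".join(namespace)


def describe_lookup(identifier: str) -> str:
    """Report what a lookup would do, or the error raised before it starts."""
    namespace, table = split_identifier(identifier)
    try:
        return f"look up table {table!r} in namespace {namespace_to_string(namespace)!r}"
    except ValueError as exc:
        return f"ValueError: {exc}"


print(describe_lookup("doesnotexist"))
print(describe_lookup("default.doesnotexist"))
```

Under that assumption, a test that wants a table-level error would need a namespaced identifier such as `default.doesnotexist`.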

kevinjqliu · 2024-09-07T19:27:47Z

@Fokko wdyt of this change? I remember we had past discussions on adding a "new" catalog implementation.

Fokko · 2024-10-28T14:35:50Z

@hussein-awala Thanks for working on this 🚀 @kevinjqliu Regarding the new catalogs, my main concern was a proliferation of new catalogs, and that they would lack maintenance. I do like this change for two reasons:

1. It moves the InMemoryCatalog out of the test-specific code. We want the catalog used in the tests to be part of the main codebase; otherwise we're testing a catalog that's not part of the normal code path.
2. It merges the InMemoryCatalog into the SqlCatalog. This way, when new features are released, such as support for views or multi-table transactions, we have fewer places where we need to implement them.

I'm positive about this change. The only consideration I could make is that we hide the SqlCatalog behind the InMemoryCatalog. Maybe it is interesting for folks to know that they can easily switch to a persistent catalog. What are your thoughts?

kevinjqliu · 2024-10-28T17:03:05Z

> I'm positive about this change. The only consideration I could make is that we hide the SqlCatalog behind the InMemoryCatalog. Maybe it is interesting for folks to know that they can easily switch to a persistent catalog. What are your thoughts?

I think it would be good to document the InMemoryCatalog, perhaps in the catalog section of the configuration page.
We can mention that it uses the SqlCatalog under the hood, and that users should pick another catalog implementation if they need to persist the catalog metadata.
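To make the persistence point concrete, here is a small standard-library sketch contrasting an in-memory SQLite database with a file-backed one. It uses `sqlite3` directly rather than any PyIceberg API, and the `namespaces` table and helper names are made up for the example.

```python
import os
import sqlite3
import tempfile


def record_namespace(database: str, namespace: str) -> None:
    """Open a connection, store one namespace, and close the connection again."""
    conn = sqlite3.connect(database)
    try:
        conn.execute("CREATE TABLE IF NOT EXISTS namespaces (name TEXT)")
        conn.execute("INSERT INTO namespaces (name) VALUES (?)", (namespace,))
        conn.commit()
    finally:
        conn.close()


def stored_namespaces(database: str) -> list:
    """Open a fresh connection and read back whatever survived."""
    conn = sqlite3.connect(database)
    try:
        conn.execute("CREATE TABLE IF NOT EXISTS namespaces (name TEXT)")
        return [row[0] for row in conn.execute("SELECT name FROM namespaces")]
    finally:
        conn.close()


file_db = os.path.join(tempfile.mkdtemp(), "catalog.db")

record_namespace(":memory:", "default")
record_namespace(file_db, "default")

print(stored_namespaces(":memory:"))  # the in-memory database vanished with its connection
print(stored_namespaces(file_db))     # the file-backed database kept the row
```

Only the database location differs between the two calls, which is the same kind of switch a user would make when moving from the in-memory catalog to a persistent one.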

kevinjqliu · 2025-02-01T00:11:56Z

hey @hussein-awala would you like to make the above changes on docs? This PR is almost ready!

hussein-awala · 2025-02-07T20:45:42Z

> hey @hussein-awala would you like to make the above changes on docs? This PR is almost ready!

yes, I will make it ready ASAP

kevinjqliu

Thanks for the PR! And the great docs.

I like that we can replace the old implementation in the tests, but I'm on the fence about whether we should expose/advertise this as a catalog type. It is useful in certain situations and for testing, but I'm not sure how much value there is in allowing users to do

load_catalog()

and

catalog:
  default:
    type: in-memory
    warehouse: /tmp/pyiceberg/warehouse

WDYT @Fokko ?

Fokko · 2025-02-10T10:27:07Z

Or just:

catalog = load_catalog('default', **{'type': 'in-memory', 'warehouse': '/tmp/pyiceberg/warehouse'})

I agree that this catalog impl is mostly focused on testing/demonstration. If you use a Jupyter notebook, you end up with a fresh catalog each time you restart the kernel (no need to clean up any old stuff lingering around).

kevinjqliu · 2025-02-10T17:18:28Z

Thanks for the contribution @hussein-awala and thanks for the review @Fokko

CI is failing on the main branch (https://github.com/apache/iceberg-python/actions/runs/13247077313/job/36976096982). This was caused by a merge conflict after #1140: `InMemoryCatalog` no longer creates the namespace automatically.

hussein-awala mentioned this pull request Sep 5, 2024

Remove InMemoryCatalog from the test-codebase #1110

Closed

kevinjqliu reviewed Sep 6, 2024

hussein-awala force-pushed the remove_InMemoryCatalog branch from 21d6d2f to aa6efc6 Compare September 6, 2024 19:35

hussein-awala changed the title ~~replace InMemoryCatalog by a subclass of SqlCatalog with SQLite in-memory~~ feat: implement InMemoryCatalog as a subclass of SqlCatalog Sep 6, 2024

hussein-awala force-pushed the remove_InMemoryCatalog branch from aa6efc6 to 6089ca2 Compare September 6, 2024 19:37

feat: implement InMemoryCatalog as a subclass of SqlCatalog

b6af81a

hussein-awala force-pushed the remove_InMemoryCatalog branch from 6089ca2 to b6af81a Compare September 6, 2024 19:38

kevinjqliu reviewed Sep 7, 2024

kevinjqliu mentioned this pull request Sep 11, 2024

[BUG] Catalog.list_tables() inconsistency between docstring and signature #1163

Closed

kevinjqliu added this to the PyIceberg 0.9.0 release milestone Oct 30, 2024

hussein-awala added 2 commits February 7, 2025 21:50

Merge branch 'main' into remove_InMemoryCatalog

cf2dbfa

apply suggestions

7f274a0

hussein-awala force-pushed the remove_InMemoryCatalog branch from fa2321e to 7f274a0 Compare February 7, 2025 21:26

kevinjqliu reviewed Feb 8, 2025

Fokko approved these changes Feb 10, 2025

kevinjqliu merged commit 17e9110 into apache:main Feb 10, 2025
8 checks passed

kevinjqliu mentioned this pull request Feb 10, 2025

Fix ci #1638

Merged
