feat: implement InMemoryCatalog as a subclass of SqlCatalog #1140
@kevinjqliu I applied what you suggested in the comment above, could you recheck it now?
added some nit comments. Thanks for working on this! The in-memory catalog is used at a bunch of places in the tests, so changing it has cascading effects
pyiceberg/catalog/memory.py
Outdated
This is useful for test, demo, and playground but not in production as data is not persisted.
"""

def __init__(self, name: str, warehouse: str = "file:///tmp/warehouse", **kwargs: str) -> None:
nit: let's use something like /tmp/iceberg/warehouse to not conflict with other tmp directories. Also I'm not sure if this works when the warehouse directory is not created yet.
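To make the "directory is not created yet" concern concrete, here is a minimal sketch of creating the warehouse directory up front. The helper name `ensure_warehouse_dir` is hypothetical and is not part of PyIceberg; it only assumes the warehouse is a local path or a `file://` URI.

```python
import os
import tempfile
from urllib.parse import urlparse


def ensure_warehouse_dir(warehouse: str) -> str:
    """Create the local warehouse directory if it is missing.

    Hypothetical helper for illustration only, not PyIceberg's API.
    Accepts a plain path or a file:// URI and returns the local path.
    """
    parsed = urlparse(warehouse)
    if parsed.scheme not in ("", "file"):
        raise ValueError(f"Only local warehouses are handled here: {warehouse}")
    path = parsed.path
    # exist_ok=True makes the call safe when the directory already exists.
    os.makedirs(path, exist_ok=True)
    return path


# Usage: a nested directory under a fresh temp dir does not exist yet.
base = tempfile.mkdtemp()
target = os.path.join(base, "iceberg", "warehouse")
created = ensure_warehouse_dir("file://" + target)
```

Whether the catalog itself should do this, or whether the file IO layer already creates parent directories on write, is exactly the open question in the comment above.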
tests/catalog/test_memory.py
Outdated
nit: i'd like to keep this as test_base because I want to parameterize all tests to make sure all the catalogs have the same behaviors (see #813)
tests/catalog/test_memory.py
Outdated
DROP_NOT_EXISTING_NAMESPACE_ERROR = "Namespace does not exist: \\('com', 'organization', 'department'\\)"
NO_SUCH_NAMESPACE_ERROR = "Namespace com.organization.department does not exists"
nit, merge these two
This should be done in a separate PR
tests/cli/test_console.py
Outdated
@@ -806,7 +831,7 @@ def test_json_properties_get_table_does_not_exist(catalog: InMemoryCatalog) -> N
 runner = CliRunner()
 result = runner.invoke(run, ["--output=json", "properties", "get", "table", "doesnotexist"])
 assert result.exit_code == 1
-assert result.output == """{"type": "NoSuchTableError", "message": "Table does not exist: ('doesnotexist',)"}\n"""
+assert result.output == """{"type": "ValueError", "message": "Empty namespace identifier"}\n"""
nit: shouldn't this be NoSuchTableError? Maybe the namespace needs to be created first
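One plausible reading of the changed assertion, sketched with hypothetical helpers (`split_identifier`, `load_table`, and the stand-in `NoSuchTableError` are illustrative, not PyIceberg's actual code): if the identifier is validated before the lookup, a single-part name such as `doesnotexist` has no namespace left once the table name is split off, so a `ValueError` is raised before any table lookup can fail.

```python
from typing import Dict, Tuple


class NoSuchTableError(Exception):
    """Stand-in for the catalog's table-not-found error (illustrative only)."""


def split_identifier(identifier: str) -> Tuple[Tuple[str, ...], str]:
    # Hypothetical helper: the last part is the table name, the rest the namespace.
    parts = tuple(identifier.split("."))
    namespace, table_name = parts[:-1], parts[-1]
    if not namespace:
        # Validation runs first, so the lookup below is never reached.
        raise ValueError("Empty namespace identifier")
    return namespace, table_name


def load_table(tables: Dict[Tuple[str, ...], object], identifier: str) -> object:
    namespace, table_name = split_identifier(identifier)
    key = namespace + (table_name,)
    if key not in tables:
        raise NoSuchTableError(f"Table does not exist: {key}")
    return tables[key]
```

Under this assumption, passing a namespaced identifier such as `default.doesnotexist` in the test would reach the lookup and surface `NoSuchTableError` again.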
@Fokko wdyt of this change? i remember we had past discussions on adding a "new" catalog implementation
@hussein-awala Thanks for working on this 🚀 @kevinjqliu Regarding the new catalogs, my main concern was a proliferation of new catalogs, and that they would lack maintenance. I do like this change for two reasons:
I'm positive about this change. The only consideration I could make is that we hide the
I think it would be good to document the InMemoryCatalog, perhaps in the catalog section of the configuration page.
hey @hussein-awala would you like to make the above changes on docs? This PR is almost ready!
yes, I will make it ready ASAP
Thanks for the PR! And the great docs.
I like that we can replace the old implementation in tests, but I'm on the fence about whether we should expose/advertise this as a catalog type. It is useful for certain situations and for testing, but I'm not sure how much value there is in allowing users to do
load_catalog()
and
catalog:
  default:
    type: in-memory
    warehouse: /tmp/pyiceberg/warehouse
WDYT @Fokko ?
Or just: catalog = load_catalog("default", **{"type": "in-memory", "warehouse": "/tmp/pyiceberg/warehouse"}). I agree that this catalog impl is mostly focused on testing/demonstration. If you use a Jupyter notebook, you end up with a fresh catalog each time you restart the kernel (no need to clean up any old stuff lingering around).
Thanks for the contribution @hussein-awala and thanks for the review @Fokko
closes: #1110
This PR implements a new catalog, InMemoryCatalog, as a subclass of SqlCatalog backed by an in-memory SQLite database.
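The design can be illustrated with a toy sketch. This is not the PR's code: `ToySqlCatalog` and `ToyInMemoryCatalog` are hypothetical classes built on the standard-library `sqlite3` module, and the single `namespaces` table is an assumption made for brevity. The point is only that the subclass pins the database to `:memory:` and inherits everything else.

```python
import sqlite3
from typing import List


class ToySqlCatalog:
    """Toy SQL-backed catalog (illustrative stand-in, not PyIceberg's SqlCatalog)."""

    def __init__(self, name: str, uri: str) -> None:
        self.name = name
        self._conn = sqlite3.connect(uri)
        self._conn.execute(
            "CREATE TABLE IF NOT EXISTS namespaces ("
            "catalog_name TEXT, namespace TEXT, "
            "PRIMARY KEY (catalog_name, namespace))"
        )

    def create_namespace(self, namespace: str) -> None:
        # The connection context manager commits on success, rolls back on error.
        with self._conn:
            self._conn.execute(
                "INSERT INTO namespaces VALUES (?, ?)", (self.name, namespace)
            )

    def list_namespaces(self) -> List[str]:
        rows = self._conn.execute(
            "SELECT namespace FROM namespaces WHERE catalog_name = ? ORDER BY namespace",
            (self.name,),
        )
        return [row[0] for row in rows]


class ToyInMemoryCatalog(ToySqlCatalog):
    """Same behavior, but the database lives only as long as the connection."""

    def __init__(self, name: str, warehouse: str = "file:///tmp/iceberg/warehouse") -> None:
        super().__init__(name, uri=":memory:")
        self.warehouse = warehouse


first = ToyInMemoryCatalog("default")
first.create_namespace("com")
second = ToyInMemoryCatalog("default")  # a new connection means a fresh, empty catalog
```

Because each `:memory:` connection gets its own private database, every new instance starts empty, which matches the "fresh catalog after a kernel restart" behavior discussed in the thread.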