
Cannot install on Databricks 11.3 LTS (Spark 3.3.0) #480

Open
lotsahelp opened this issue Jan 9, 2023 · 2 comments

Comments

@lotsahelp

I'm trying the azure-cosmos-spark_3-3_2-12 (v4.15.0) connector from Maven, and it never finishes installing. I have also tried downloading the jar from Maven and installing it manually; the upload/install takes a few minutes, but I'm left with the message below each time I try to call Cosmos. Switching back to Databricks 10.4 LTS with the azure-cosmos-spark_3-2_2-12 connector works fine.

---------------------------------------------------------------------------
AnalysisException                         Traceback (most recent call last)
<command-725919012830118> in <cell line: 2>()
      1 ##CREATE Container and Database
----> 2 spark.sql(f'CREATE DATABASE IF NOT EXISTS cosmosCatalog.{cosmosDatabaseName};')
      3 
      4 spark.sql(
      5     f"""

/databricks/spark/python/pyspark/instrumentation_utils.py in wrapper(*args, **kwargs)
     46             start = time.perf_counter()
     47             try:
---> 48                 res = func(*args, **kwargs)
     49                 logger.log_success(
     50                     module_name, class_name, function_name, time.perf_counter() - start, signature

/databricks/spark/python/pyspark/sql/session.py in sql(self, sqlQuery, **kwargs)
   1117             sqlQuery = formatter.format(sqlQuery, **kwargs)
   1118         try:
-> 1119             return DataFrame(self._jsparkSession.sql(sqlQuery), self)
   1120         finally:
   1121             if len(kwargs) > 0:

/databricks/spark/python/lib/py4j-0.10.9.5-src.zip/py4j/java_gateway.py in __call__(self, *args)
   1319 
   1320         answer = self.gateway_client.send_command(command)
-> 1321         return_value = get_return_value(
   1322             answer, self.gateway_client, self.target_id, self.name)
   1323 

/databricks/spark/python/pyspark/sql/utils.py in deco(*a, **kw)
    200                 # Hide where the exception came from that shows a non-Pythonic
    201                 # JVM exception message.
--> 202                 raise converted from None
    203             else:
    204                 raise

AnalysisException: Catalog 'cosmoscatalog' not found
@FabianMeiswinkel (Member) commented Jan 9, 2023

The error "Catalog 'cosmoscatalog' not found" indicates that the Spark catalog with identifier "cosmoscatalog" has not been configured.

This can be done by adding the following entries to the Spark config:

# Register the Cosmos DB catalog implementation under the name "cosmosCatalog"
spark.conf.set("spark.sql.catalog.cosmosCatalog", "com.azure.cosmos.spark.CosmosCatalog")
# Cosmos DB account endpoint URI (placeholder left empty here)
spark.conf.set("spark.sql.catalog.cosmosCatalog.spark.cosmos.accountEndpoint", "")
# Cosmos DB account key (placeholder left empty here)
spark.conf.set("spark.sql.catalog.cosmosCatalog.spark.cosmos.accountKey", "")
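With these settings applied in the same Spark session (and the endpoint/key placeholders filled in), the catalog can then be referenced from SQL, as in the stack trace above:

# "myDatabase" is a hypothetical database name, used here only for illustration
cosmosDatabaseName = "myDatabase"
spark.sql(f"CREATE DATABASE IF NOT EXISTS cosmosCatalog.{cosmosDatabaseName};")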

From the behavior you describe, it might be that the Spark 3.2 cluster has these Spark config settings defined in the cluster config (so they are applied at start-up), while the Spark 3.3 cluster doesn't have them?
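For reference, a minimal sketch of what those entries would look like in the Databricks cluster's Spark config (one whitespace-separated key/value pair per line; the endpoint and key values below are placeholders, not real credentials):

spark.sql.catalog.cosmosCatalog com.azure.cosmos.spark.CosmosCatalog
spark.sql.catalog.cosmosCatalog.spark.cosmos.accountEndpoint https://<your-account>.documents.azure.com:443/
spark.sql.catalog.cosmosCatalog.spark.cosmos.accountKey <your-account-key>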

Thanks,
Fabian

@lotsahelp (Author)

@FabianMeiswinkel those three lines are in the cell above.
