Upgrade this lib to be compatible with Spark Connect / DB Connect #255

MrPowers · 2024-03-06T13:17:51Z

Expected Behavior

This library should work the same way with Spark Connect.

Current Behavior

This library uses sparkSession.sparkContext, which doesn't work with Spark Connect. Here is an example:

From dbldatagen/dbldatagen/data_generator.py, line 251 at commit debb29f:

if sparkSession.sparkContext is not None:

This might actually work, because the exception would be caught, but it illustrates the problem.
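A minimal sketch of the guarded check that the snippet above depends on, assuming that on Spark Connect reading the sparkContext property raises an exception instead of returning None (so a bare "is not None" test never runs). The names spark_context_or_none, ClassicSession and ConnectSession are hypothetical stand-ins for illustration, not part of dbldatagen or PySpark, and they let the sketch run without a Spark installation.

```python
from typing import Any, Optional


def spark_context_or_none(session: Any) -> Optional[Any]:
    """Return session.sparkContext, or None when it cannot be accessed.

    Assumption: on Spark Connect, reading sparkContext raises rather than
    returning None, so the exception is converted into a None result here.
    """
    try:
        return session.sparkContext
    except Exception:  # broad on purpose: the exact exception type varies by version
        return None


# Hypothetical stand-in for a classic session that exposes a SparkContext.
class ClassicSession:
    sparkContext = "fake-spark-context"


# Hypothetical stand-in for a Spark Connect session.
class ConnectSession:
    @property
    def sparkContext(self):
        raise NotImplementedError("sparkContext is not available with Spark Connect")


if __name__ == "__main__":
    print(spark_context_or_none(ClassicSession()))  # fake-spark-context
    print(spark_context_or_none(ConnectSession()))  # None
```

Catching the exception at the point of access keeps the rest of the code free to treat "no SparkContext" as an ordinary None value.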

Steps to Reproduce (for bugs)

Run the test suite with Spark Connect enabled and fix all issues.


ronanstokes-db · 2024-03-15T18:14:15Z

We recently released an update that handles situations where the SparkContext is not available for querying values such as the default parallelism. This should address the issue.

In general, the way to safeguard against this is to explicitly specify the number of partitions requested when generating the specification for your dataset. This will avoid the query against the sparkContext.

While we have not tested against Spark Connect, we have tested against other environments where no sparkContext is available.
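A minimal sketch of why an explicit partition count avoids the problem (in dbldatagen this is the partitions argument of DataGenerator): when the caller supplies a value, sparkContext is never read; only when it is omitted does the code fall back to defaultParallelism, and then to a fixed default. The helper resolve_partitions, the DEFAULT_PARTITIONS value and the session classes are hypothetical illustrations, not dbldatagen's actual implementation.

```python
from typing import Any, Optional

DEFAULT_PARTITIONS = 4  # hypothetical fallback, not a dbldatagen constant


def resolve_partitions(session: Any, partitions: Optional[int] = None) -> int:
    """Pick a partition count, reading sparkContext only as a last resort."""
    if partitions is not None:
        # Explicit request: no query against sparkContext at all.
        return partitions
    try:
        return session.sparkContext.defaultParallelism
    except Exception:
        # No usable SparkContext (assumed Spark Connect behaviour): use the fallback.
        return DEFAULT_PARTITIONS


class ClassicContext:
    """Hypothetical stand-in for a SparkContext."""
    defaultParallelism = 16


class ClassicSession:
    """Hypothetical stand-in for a classic SparkSession."""
    sparkContext = ClassicContext()


class ConnectSession:
    """Hypothetical stand-in that records whether sparkContext was read."""

    def __init__(self) -> None:
        self.context_accessed = False

    @property
    def sparkContext(self):
        self.context_accessed = True
        raise NotImplementedError("sparkContext is not available with Spark Connect")


if __name__ == "__main__":
    session = ConnectSession()
    print(resolve_partitions(session, partitions=8), session.context_accessed)  # 8 False
    print(resolve_partitions(session), session.context_accessed)  # 4 True
    print(resolve_partitions(ClassicSession()))  # 16
```

The ordering matters: checking the explicit value first means environments without a SparkContext never hit the failing property at all.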

ronanstokes-db self-assigned this Mar 15, 2024

ronanstokes-db changed the title from "Upgrade this lib to be compatible with Spark Connect" to "Upgrade this lib to be compatible with Spark Connect / DB Connect" Jul 16, 2024

ronanstokes-db added the P1 label Jul 16, 2024
