Favoring standard libraries over external dependencies, especially in specific contexts like Databricks, is a best practice in software development.
There are several reasons why this approach is encouraged:
- Standard libraries are typically well-vetted, thoroughly tested, and maintained by the official maintainers of the programming language or platform. This ensures a higher level of stability and reliability.
- External dependencies, especially lesser-known or unmaintained ones, can introduce bugs, security vulnerabilities, or compatibility issues that can be challenging to resolve. Adding external dependencies increases the complexity of your codebase.
- Each dependency may have its own set of dependencies, potentially leading to a complex web of dependencies that can be difficult to manage. This complexity can lead to maintenance challenges, increased risk, and longer build times.
- External dependencies can pose security risks. If a library or package has known security vulnerabilities and is widely used, it becomes an attractive target for attackers. Minimizing external dependencies reduces the potential attack surface and makes it easier to keep your code secure.
- Relying on standard libraries enhances code portability. It ensures your code can run on different platforms and environments without being tightly coupled to specific external dependencies. This is particularly important in settings like Databricks, where you may need to run your code on different clusters or setups.
- External dependencies may have their own versioning schemes and compatibility issues. When using standard libraries, you have more control over versioning and can avoid conflicts between different dependencies in your project.
- Fewer external dependencies mean faster build and deployment times. Downloading, installing, and managing external packages can slow down these processes, especially in large-scale projects or distributed computing environments like Databricks.
- External dependencies can be abandoned or go unmaintained over time. This can lead to situations where your project relies on outdated or unsupported code. When you depend on standard libraries, you have confidence that the core functionality you rely on will continue to be maintained and improved.
While minimizing external dependencies is essential, exceptions can be made case-by-case. There are situations where external dependencies are justified, such as when a well-established and actively maintained library provides significant benefits, like time savings, performance improvements, or specialized functionality unavailable in standard libraries.
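As a small illustration of the points above, the sketch below parses a JSON payload and normalizes its timestamp using only the standard library (`json` and `datetime`) rather than third-party parsing packages. The function name `parse_event` and the `created_at` field are hypothetical and chosen only for this example.

```python
import json
from datetime import datetime, timezone


def parse_event(raw: str) -> dict:
    """Parse a JSON event and normalize its timestamp to UTC (standard library only)."""
    event = json.loads(raw)
    # datetime.fromisoformat understands "YYYY-MM-DDTHH:MM:SS+HH:MM" offsets
    created = datetime.fromisoformat(event["created_at"])
    event["created_at"] = created.astimezone(timezone.utc)
    return event
```

If a future requirement goes beyond what the standard library covers (for example, parsing free-form date strings), that would be one of the case-by-case exceptions mentioned above.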
See https://mypy.readthedocs.io/en/stable/cheat_sheet_py3.html for more details
- Add assert ... is not None if it's in the body of a method. Example:
# error: Argument 1 to "delete" of "DashboardWidgetsAPI" has incompatible type "str | None"; expected "str"
self._ws.dashboard_widgets.delete(widget.id)
after:
assert widget.id is not None
self._ws.dashboard_widgets.delete(widget.id)
- Add ... | None if it's in the dataclass. Example: cloud: str = None -> cloud: str | None = None
- Add .as_posix() to convert Path to str.
- Add a valid default value for the dictionary return. Example:
def viz_type(self) -> str:
    return self.viz.get("type", None)
after:
def viz_type(self) -> str:
    return self.viz.get("type", "UNKNOWN")
This section provides a step-by-step guide to set up and start working on the project. These steps will help you set up your project environment and dependencies for efficient development.
Go through the prerequisites and clone the DQX GitHub repo.
To begin, install Hatch, which is our build tool.
On macOS, this is achieved using the following:
brew install hatch
Run the following command to create the default environment and install development dependencies, assuming you've already cloned the GitHub repo.
make dev
Before every commit, apply consistent formatting to the code, as we want our codebase to look consistent:
make fmt
Before every commit, run the automated bug detector (make lint) and unit tests (make test) to ensure that the automated pull request checks pass before your code is reviewed by others:
make lint
make test
Configure authentication to the Databricks workspace for integration testing by configuring credentials.
If you want to run the tests from an IDE, you must set up a .env or ~/.databricks/debug-env.json file (see instructions).
Set up the required environment variables for executing integration tests and code coverage:
export DATABRICKS_HOST=https://<workspace-url>
export DATABRICKS_CLUSTER_ID=<cluster-id>
# set either service principal credentials
export DATABRICKS_CLIENT_ID=<client-id>
export DATABRICKS_CLIENT_SECRET=<client-secret>
# or PAT token
export DATABRICKS_TOKEN=<pat-token>
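Before running the integration tests, it can help to confirm that the variables above are actually set. The following is a minimal sketch, not part of the project: the helper name `missing_integration_env` is hypothetical, and it assumes that either a PAT token or a complete service principal pair is sufficient, as the comments in the exports suggest.

```python
import os
from collections.abc import Mapping


def missing_integration_env(env: Mapping[str, str]) -> list[str]:
    """Return the names of variables that still need to be set."""
    required = ("DATABRICKS_HOST", "DATABRICKS_CLUSTER_ID")
    missing = [name for name in required if not env.get(name)]
    # either a service principal (id + secret) or a PAT token is expected
    has_service_principal = bool(
        env.get("DATABRICKS_CLIENT_ID") and env.get("DATABRICKS_CLIENT_SECRET")
    )
    has_pat = bool(env.get("DATABRICKS_TOKEN"))
    if not (has_service_principal or has_pat):
        missing.append("DATABRICKS_TOKEN or DATABRICKS_CLIENT_ID + DATABRICKS_CLIENT_SECRET")
    return missing


if __name__ == "__main__":
    problems = missing_integration_env(os.environ)
    print("ok" if not problems else "missing: " + ", ".join(problems))
```

Taking the environment as a parameter rather than reading `os.environ` directly keeps the check easy to exercise with a plain dictionary.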
Run integration tests with the following command:
make integration
Calculate test coverage and display the report in HTML:
make coverage
Once you clone the repo locally and install the Databricks CLI, you can run labs CLI commands.
As with other Databricks CLI commands, you can specify the profile to use with --profile.
Authenticate your current machine to your Databricks Workspace:
databricks auth login --host <WORKSPACE_HOST>
Show info about the project:
databricks labs show .
Install DQX:
databricks labs install .
Show current installation username:
databricks labs dqx me
Uninstall DQX:
databricks labs uninstall dqx
If you're interested in contributing, please reach out to us or open an issue to discuss your ideas. To contribute, you need to be added as a writer to the repository. Please note that we currently do not accept external contributors.
Here are the example steps to submit your first contribution:
- Make a branch in the dqx repo
- git clone
- git checkout main (or gcm if you're using ohmyzsh).
- git pull (or gl if you're using ohmyzsh).
- git checkout -b FEATURENAME (or gcb FEATURENAME if you're using ohmyzsh).
- .. do the work
- make fmt
- make lint
- .. fix if any
- make test and make integration, optionally make coverage to get a test coverage report
- .. fix if any issues
- git commit -S -a -m "message". Make sure to enter a meaningful commit message title. You need to sign commits with your GPG key (hence the -S option). To set up a GPG key in your GitHub account, follow these instructions. You can configure Git to sign all commits with your GPG key by default: git config --global commit.gpgsign true
- git push origin FEATURENAME
- Go to the GitHub UI and create a PR. Alternatively, use gh pr create (if you have GitHub CLI installed). Use a meaningful pull request title because it'll appear in the release notes. Use Resolves #NUMBER in the pull request description to automatically link it to an existing issue.
If you encounter any package dependency errors after git pull, run make clean.