Implement a SQL server speaking the presto protocol (#56)
* Implement a SQL server speaking the presto protocol

This server can be used to run dask-sql as a standalone application.
For example, it can run inside a dask cluster
and answer SQL queries from external clients.

* Stylefix

* Fixes to docker build setup
nils-braun authored Oct 13, 2020
1 parent a33dbfb commit 71ada13
Showing 17 changed files with 505 additions and 475 deletions.
1 change: 0 additions & 1 deletion .coveragerc
@@ -1,6 +1,5 @@
[run]
omit = tests/*
dask_sql/server/*
branch = True

[report]
14 changes: 14 additions & 0 deletions .github/workflows/deploy.yml
@@ -41,3 +41,17 @@ jobs:
      run: |
        python setup.py sdist bdist_wheel
        twine upload dist/*
  push_to_registry:
    name: Push Docker image to Docker Hub
    runs-on: ubuntu-latest
    steps:
      - name: Check out the repo
        uses: actions/checkout@v2
      - name: Push to Docker Hub
        uses: docker/build-push-action@v1
        with:
          username: ${{ secrets.DOCKER_USERNAME }}
          password: ${{ secrets.DOCKER_PASSWORD }}
          repository: nbraun/dask-sql
          tag_with_ref: true
27 changes: 27 additions & 0 deletions Dockerfile
@@ -0,0 +1,27 @@
# Dockerfile for dask-sql running the SQL server
# For more information, see https://dask-sql.readthedocs.io/.
FROM continuumio/miniconda3:4.8.2
LABEL author "Nils Braun <[email protected]>"

# Install dependencies for dask-sql
COPY conda.yaml /opt/dask_sql/
RUN /opt/conda/bin/conda install \
    --file /opt/dask_sql/conda.yaml \
    -c conda-forge

# Build the java libraries
COPY setup.py /opt/dask_sql/
COPY .git /opt/dask_sql/.git
COPY planner /opt/dask_sql/planner
RUN cd /opt/dask_sql/ \
    && python setup.py java

# Install the python library
COPY dask_sql /opt/dask_sql/dask_sql
RUN cd /opt/dask_sql/ \
    && pip install -e .

# Set the script to execute
EXPOSE 8080
ENV JAVA_HOME /opt/conda
ENTRYPOINT [ "/opt/conda/bin/python", "/opt/dask_sql/dask_sql/server/app.py" ]
30 changes: 21 additions & 9 deletions README.md
@@ -17,6 +17,12 @@ Some ideas for this project are coming from the very great [blazingSQL](https://

Read more in the [documentation](https://dask-sql.readthedocs.io/en/latest/).

You can try out `dask-sql` quickly by using the docker command

    docker run --rm -it -p 8080:8080 nbraun/dask-sql

See the information on the SQL server at the end of this page.

---

**NOTE**
@@ -145,20 +151,24 @@ After the translation to a relational algebra is done (using `RelationalAlgebraG
## SQL Server

`dask-sql` comes with a small test implementation for a SQL server.
-Instead of rebuilding a full ODBC driver, we re-use the [postgreSQL wire protocol](https://www.postgresql.org/docs/9.3/protocol-flow.html).
-It is - so far - just a proof of concept
+Instead of rebuilding a full ODBC driver, we re-use the [presto wire protocol](https://github.com/prestodb/presto/wiki/HTTP-Protocol).
+It is - so far - only the start of the development and is missing important concepts, such as
+authentication.

-You can test the sql postgres server by running
+You can test the SQL presto server by running

-    python dask_sql/server/handler.py
+    python dask_sql/server/app.py

-in one terminal. This will spin up a server on port 9876
-that looks similar to a normal postgres database to any postgres client
-(except that you can only do queries, no database creation etc.)
+or by using the created docker image

-You can test this for example with the default postgres client:
+    docker run --rm -it -p 8080:8080 nbraun/dask-sql

-    psql -h localhost -p 9876
+in one terminal. This will spin up a server on port 8080 (by default)
+that looks similar to a normal presto database to any presto client.

+You can test this, for example, with the default [presto client](https://prestosql.io/docs/current/installation/cli.html):

+    presto --server localhost:8080

Now you can fire simple SQL queries (as no data is loaded by default):

@@ -167,3 +177,5 @@ Now you can fire simple SQL queries (as no data is loaded by default):
--------
2
(1 row)

You can find more information in the [documentation](https://dask-sql.readthedocs.io/en/latest/pages/server.html).
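The `/v1/statement` flow described above can also be exercised from Python without the presto CLI. The sketch below uses only the standard library and follows the general Presto HTTP protocol: POST the SQL as the request body, then keep following `nextUri` until it is absent, collecting the `data` rows and raising if an `error` object appears. The helper names `build_statement_request`, `collect_rows`, `fetch_json` and `run_query` are hypothetical, the `X-Presto-User` header is an assumption borrowed from the Presto protocol (the dask-sql server may simply ignore it), and this diff does not show whether the server paginates via `nextUri` or returns everything in the first response.

```python
import json
import urllib.request
from typing import Any, Callable, Dict, List


def build_statement_request(
    sql: str, host: str = "localhost", port: int = 8080
) -> urllib.request.Request:
    """Build (but do not send) a POST request carrying the SQL as the raw body."""
    return urllib.request.Request(
        url=f"http://{host}:{port}/v1/statement",
        data=sql.encode("utf-8"),
        # Assumption: header expected by real Presto servers, possibly ignored here.
        headers={"X-Presto-User": "dask"},
        method="POST",
    )


def collect_rows(
    first: Dict[str, Any], fetch: Callable[[str], Dict[str, Any]]
) -> List[list]:
    """Follow nextUri links, gathering all data rows; raise on an error payload."""
    rows: List[list] = []
    page = first
    while True:
        if page.get("error"):
            raise RuntimeError(page["error"].get("message", "query failed"))
        rows.extend(page.get("data", []))
        next_uri = page.get("nextUri")
        if not next_uri:
            return rows
        page = fetch(next_uri)


def fetch_json(uri: str) -> Dict[str, Any]:
    """GET a follow-up page from a live server and decode its JSON body."""
    with urllib.request.urlopen(uri) as response:
        return json.loads(response.read().decode("utf-8"))


def run_query(sql: str, host: str = "localhost", port: int = 8080) -> List[list]:
    """Send one statement to a running server and return all rows."""
    with urllib.request.urlopen(build_statement_request(sql, host, port)) as response:
        first = json.loads(response.read().decode("utf-8"))
    return collect_rows(first, fetch_json)
```

With a server started as described above, `run_query("SELECT 1 + 1")` should return the result rows as a list of lists. Passing `fetch` in as a parameter keeps the pagination logic testable without a network connection.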
2 changes: 2 additions & 0 deletions conda.yaml
@@ -6,3 +6,5 @@ maven>=3.6.0
pytest>=6.0.1
pytest-cov>=2.10.1
sphinx>=3.2.1
fastapi>=0.61.1
uvicorn>=0.11.3
1 change: 1 addition & 0 deletions dask_sql/__init__.py
@@ -1 +1,2 @@
from .context import Context
from .server.app import run_server
2 changes: 1 addition & 1 deletion dask_sql/context.py
@@ -293,7 +293,7 @@ def _prepare_schema(self):
        schema = DaskSchema(self.schema_name)

        if not self.tables:  # pragma: no cover
-            logger.warn("No tables are registered.")
+            logger.warning("No tables are registered.")

        for name, dc in self.tables.items():
            table = DaskTable(name)
108 changes: 108 additions & 0 deletions dask_sql/server/app.py
@@ -0,0 +1,108 @@
from argparse import ArgumentParser

from dask_sql.server.responses import DataResults, QueryResults, ErrorResults
from fastapi import FastAPI, Request
import uvicorn

from dask_sql import Context

app = FastAPI()


@app.get("/v1/empty")
async def empty(request: Request):
    """
    Helper endpoint returning an empty
    result.
    """
    return QueryResults(request=request)


@app.post("/v1/statement")
async def query(request: Request):
    """
    Main endpoint returning query results
    in the Presto wire format.
    """
    try:
        sql = (await request.body()).decode().strip()
        df = request.app.c.sql(sql)

        return DataResults(df, request=request)
    except Exception as e:
        return ErrorResults(e, request=request)


def run_server(
    context: Context = None, host: str = "0.0.0.0", port: int = 8080
):  # pragma: no cover
    """
    Run an HTTP server for answering SQL queries using ``dask-sql``.
    It uses the `Presto Wire Protocol <https://github.com/prestodb/presto/wiki/HTTP-Protocol>`_
    for communication.
    This means it has a single POST endpoint ``/v1/statement``, which answers
    SQL queries (sent as a string in the request body) with the output as JSON
    (in the format described in the documentation linked above).
    Every SQL expression that ``dask-sql`` understands can be used here.

    Note:
        The presto protocol also includes some statistics on the query
        in the response.
        These statistics are currently only filled with placeholder variables.

    Args:
        context (:obj:`dask_sql.Context`): If set, use this context instead of an empty one.
        host (:obj:`str`): The host interface to listen on (defaults to all interfaces)
        port (:obj:`int`): The port to listen on (defaults to 8080)

    Example:
        It is possible to run an SQL server by using the CLI script in ``dask_sql.server.app``
        or by calling this function directly in your user code:

        .. code-block:: python

            from dask_sql import Context, run_server

            # Create your pre-filled context
            c = Context()
            ...

            run_server(context=c)

        After starting the server, it is possible to send queries to it, e.g. with the
        `presto CLI <https://prestosql.io/docs/current/installation/cli.html>`_
        or via sqlalchemy (e.g. using the `PyHive <https://github.com/dropbox/PyHive#sqlalchemy>`_ package):

        .. code-block:: python

            from sqlalchemy.engine import create_engine
            engine = create_engine('presto://localhost:8080/')

            import pandas as pd
            pd.read_sql_query("SELECT 1 + 1", con=engine)

        Of course, it is also possible to call the usual ``CREATE TABLE``
        commands.
    """
    if context is None:
        context = Context()

    app.c = context

    uvicorn.run(app, host=host, port=port)


if __name__ == "__main__":
    parser = ArgumentParser()
    parser.add_argument(
        "--host",
        default="0.0.0.0",
        help="The host interface to listen on (defaults to all interfaces)",
    )
    parser.add_argument(
        "--port",
        default=8080,
        type=int,
        help="The port to listen on (defaults to 8080)",
    )

    args = parser.parse_args()

    run_server(host=args.host, port=args.port)
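argparse hands back option values as strings unless a `type` is given, while `run_server` annotates `port` as `int`. A minimal sketch of the same two CLI options, isolated so the parsing can be checked without starting uvicorn, might look like this; `build_parser` is a hypothetical helper name and not part of dask-sql.

```python
from argparse import ArgumentParser


def build_parser() -> ArgumentParser:
    """Hypothetical helper mirroring the --host/--port options of dask_sql/server/app.py."""
    parser = ArgumentParser()
    parser.add_argument(
        "--host",
        default="0.0.0.0",
        help="The host interface to listen on (defaults to all interfaces)",
    )
    parser.add_argument(
        "--port",
        default=8080,
        # Convert the command-line string (e.g. "9000") into an int;
        # the non-string default 8080 is left untouched by argparse.
        type=int,
        help="The port to listen on (defaults to 8080)",
    )
    return parser
```

For example, `build_parser().parse_args(["--port", "9000"]).port` yields the integer `9000`, and omitting the flag falls back to `8080`.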
98 changes: 0 additions & 98 deletions dask_sql/server/handler.py

This file was deleted.
