-
Notifications
You must be signed in to change notification settings - Fork 0
/
spark.Rmd
36 lines (27 loc) · 1.19 KB
/
spark.Rmd
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
---
title: "Working with other data science platforms"
author: "Steph Locke"
date: "29 November 2018"
output: html_document
---
```{r setup, include=FALSE}
knitr::opts_chunk$set(echo = TRUE)
```
We can work consume data from many different sources in Power BI, but as well as R and Python, some of the most useful to us are the data science platforms. We're able to consume Spark, Databricks, and Hadoop using Power BI.
- [HDInsight based Spark](https://docs.microsoft.com/en-us/azure/hdinsight/spark/apache-spark-use-bi-tools)
- [databricks](https://docs.databricks.com/user-guide/bi/power-bi.html)
## Working with databricks
Once we have a Spark or databricks cluster we can use the Spark connector to talk to a cluster.
```
https://westeurope.azuredatabricks.net:443/sql/protocolv1/o/7856645940864675/simples
```
We need a user name and password - these can be tokens for ease.
```
token
dapif31cca4bf07da43c52eca8f0a119c653
```
This can work in *direct query* mode so data stays on the databricks cluster.
## Exercise
1. Import the flights data from the Databricks cluster
2. Use (or construct!) a model that predicts flight arrival delays and use this to score the data from Spark
3. Visualise the results