BigQuery Connector

libraryDependencies += "com.github.music-of-the-ainur" %% "bigquery-almaren" % "0.0.6-$SPARK_VERSION"

To run in spark-shell:

spark-shell --packages "com.github.music-of-the-ainur:bigquery-almaren_2.12:0.0.6-$SPARK_VERSION,com.github.music-of-the-ainur:almaren-framework_2.12:0.9.3-$SPARK_VERSION"

The BigQuery Connector is built on https://github.com/GoogleCloudDataproc/spark-bigquery-connector; see that project for more details.

For example, with Spark 3.1 in local mode:

spark-shell --master "local[*]" --packages "com.github.music-of-the-ainur:almaren-framework_2.12:0.9.3-3.1,com.github.music-of-the-ainur:bigquery-almaren_2.12:0.0.5-3.1"

Source and Target

Source

Parameters

Parameters	Description
-------------	-------------
table	The BigQuery table, which lives in a dataset, in the format [[project:]dataset.]table

Options	Description
-------------	-------------
parentProject	The Google Cloud Project ID of the table. Google Cloud organizes resources in a hierarchy, much like a file system.
project	The Google Cloud Project ID of the table. A project organizes all your Google Cloud resources; for example, all of your Cloud Storage buckets and objects, along with the user permissions for accessing them, reside in a project.
dataset	A dataset is contained within a specific project. Datasets are top-level containers used to organize and control access to your tables and views.
query	A Standard SQL SELECT query. Table names must be enclosed in backticks (grave accents).

Example 1

import com.github.music.of.the.ainur.almaren.Almaren
import com.github.music.of.the.ainur.almaren.bigquery.BigQuery.BigQueryImplicit
import com.github.music.of.the.ainur.almaren.builder.Core.Implicit

val almaren = Almaren("App Name")

spark.conf.set("gcpAccessToken","token")

val df =  almaren
         .builder
         .sourceBigQuery("dataset.table",Map("parentProject"->"project_name","project"->"project_name"))
         .batch

df.show(false)
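
The table argument follows the format [[project:]dataset.]table described above. As a minimal sketch of how the optional parts combine, the helper below assembles such an identifier. `tableId` is a hypothetical name introduced here for illustration; it is not part of the connector or of Almaren.

```scala
// Hypothetical helper (not part of the connector): builds "[[project:]dataset.]table".
def tableId(table: String,
            dataset: Option[String] = None,
            project: Option[String] = None): String = {
  // A project prefix is only meaningful when a dataset is also given.
  require(project.isEmpty || dataset.isDefined, "project requires dataset")
  val projectPart = project.map(p => s"$p:").getOrElse("")
  val datasetPart = dataset.map(d => s"$projectPart$d.").getOrElse("")
  datasetPart + table
}

val short     = tableId("table")
val qualified = tableId("table", Some("dataset"))
val full      = tableId("table", Some("dataset"), Some("project_name"))
```

The `qualified` value has the same shape as the "dataset.table" string passed to sourceBigQuery in Example 1.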

You can run any Standard SQL SELECT query on BigQuery and fetch its results directly into a Spark DataFrame.
To use this feature, the following configurations MUST be set:

viewsEnabled must be set to true.
materializationDataset must be set to a dataset where the GCP user has table creation permission.

Example 2

import com.github.music.of.the.ainur.almaren.Almaren
import com.github.music.of.the.ainur.almaren.bigquery.BigQuery.BigQueryImplicit
import com.github.music.of.the.ainur.almaren.builder.Core.Implicit

val almaren = Almaren("App Name")

spark.conf.set("gcpAccessToken","token")
spark.conf.set("viewsEnabled","true")
spark.conf.set("materializationDataset","<dataset>")

val df =  almaren
         .builder
         .sourceBigQuery("query",Map("parentProject"->"project_name","project"->"project_name"))
         .batch

df.show(false)
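
In Example 2 the literal "query" stands in for a real Standard SQL SELECT statement. The sketch below shows one way such a string might be built with the table name in backticks; the project, dataset, and table names are placeholders assumed for illustration.

```scala
// Placeholder identifiers, assumed for illustration only.
val project = "project_name"
val dataset = "dataset"
val table   = "table"

// The fully qualified table name is wrapped in backticks (grave accents).
val query = s"SELECT * FROM `$project.$dataset.$table` LIMIT 10"
```

The resulting `query` value would take the place of "query" as the first argument to sourceBigQuery.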

Target

Parameters

Parameters	Description
-------------	-------------
table	The BigQuery table, which lives in a dataset, in the format [[project:]dataset.]table

Options	Description
-------------	-------------
parentProject	The Google Cloud Project ID of the table. Google Cloud organizes resources in a hierarchy, much like a file system.
project	The Google Cloud Project ID of the table. A project organizes all your Google Cloud resources; for example, all of your Cloud Storage buckets and objects, along with the user permissions for accessing them, reside in a project.
dataset	A dataset is contained within a specific project. Datasets are top-level containers used to organize and control access to your tables and views.
temporaryGcsBucket	The GCS bucket that temporarily holds the data before it is loaded into BigQuery. Required unless set in the Spark configuration (spark.conf.set(...)).

Example

import com.github.music.of.the.ainur.almaren.Almaren
import com.github.music.of.the.ainur.almaren.bigquery.BigQuery.BigQueryImplicit
import com.github.music.of.the.ainur.almaren.builder.Core.Implicit
import org.apache.spark.sql.SaveMode

val almaren = Almaren("App Name")

spark.conf.set("gcpAccessToken","token")

almaren.builder
    .sourceSql("""SELECT sha2(concat_ws("",array(*)),256) as id,*,current_timestamp from deputies""")
    .coalesce(30)
    .targetBigQuery("dataset.table",Map("parentProject"->"project_name","project"->"project_name","temporaryGcsBucket"->"bucket"),SaveMode.Overwrite)
    .batch
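
As the temporaryGcsBucket description notes, the bucket can be set in the Spark configuration instead of the options map. A minimal sketch of that variant follows, assuming it runs in a spark-shell session where `spark` is defined and a `deputies` table is available as in the example above; the bucket, project, and table names are placeholders, and SaveMode.Append is chosen here only for illustration.

```scala
import com.github.music.of.the.ainur.almaren.Almaren
import com.github.music.of.the.ainur.almaren.bigquery.BigQuery.BigQueryImplicit
import com.github.music.of.the.ainur.almaren.builder.Core.Implicit
import org.apache.spark.sql.SaveMode

val almaren = Almaren("App Name")

spark.conf.set("gcpAccessToken","token")
// Set the staging bucket once for the session instead of per write.
spark.conf.set("temporaryGcsBucket","bucket")

almaren.builder
    .sourceSql("SELECT * FROM deputies")
    // No temporaryGcsBucket entry in the options map; it comes from the Spark configuration.
    .targetBigQuery("dataset.table",Map("parentProject"->"project_name","project"->"project_name"),SaveMode.Append)
    .batch
```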
