Commit
Merge pull request #17 from ChinthapallyAkanksha/spark-3.3
Updated Quenya-Dsl to support special characters for spark-3.3 branch
Showing 11 changed files with 245 additions and 78 deletions.
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -1,28 +1,24 @@ | ||
# Quenya DSL | ||
# Quenya-DSL | ||
|
||
[![Build Status](https://github.com/music-of-the-ainur/quenya-dsl/actions/workflows/quneys-dsl-githubactions.yml/badge.svg)](https://github.com/music-of-the-ainur/quenya-dsl/actions/workflows/quneys-dsl-githubactions.yml) | ||
|
||
Adding Quenya DSL dependency to your sbt build: | ||
Adding Quenya-DSL dependency to your sbt build: | ||
|
||
``` | ||
libraryDependencies += "com.github.music-of-the-ainur" %% "quenya-dsl" % "1.2.2-3.3" | ||
libraryDependencies += "com.github.music-of-the-ainur" %% "quenya-dsl" % "1.2.2-$SPARK_VERSION" | ||
``` | ||
|
||
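The `$SPARK_VERSION` placeholder in the dependency line stands for the Spark release line the artifact was built for; the artifact table in this README uses suffixes such as `3.3` and `2.4`. A minimal `build.sbt` sketch, assuming Spark 3.3 (`sparkBinaryVersion` is a hypothetical name chosen for illustration, not something the project defines):

```scala
// build.sbt (sketch). `sparkBinaryVersion` is a hypothetical helper value,
// not part of Quenya-DSL; set it to the Spark line you actually run.
val sparkBinaryVersion = "3.3"

// `%%` appends the Scala binary version (for example _2.12) to the artifact name.
libraryDependencies += "com.github.music-of-the-ainur" %% "quenya-dsl" % s"1.2.2-$sparkBinaryVersion"
```

Keeping the suffix in one value makes it harder for the Spark version and the Quenya-DSL artifact to drift apart when the build is upgraded.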
To run in spark-shell: | ||
|
||
``` | ||
spark-shell --packages "com.github.music-of-the-ainur:quenya-dsl_2.13:1.2.2-3.3" | ||
spark-shell --packages "com.github.music-of-the-ainur:quenya-dsl_2.12:1.2.0-$SPARK_VERSION" | ||
``` | ||
|
||
### Connector Usage | ||
|
||
#### Maven / Ivy Package Usage | ||
The connector is also available from the | ||
[Maven Central](https://mvnrepository.com/artifact/com.github.music-of-the-ainur) | ||
repository. It can be used using the `--packages` option or the | ||
`spark.jars.packages` configuration property. Use the following value | ||
Quenya-DSL is available from the [Maven Central](https://mvnrepository.com/artifact/com.github.music-of-the-ainur) | ||
repository. | ||
|
||
| version | Connector Artifact | | ||
| versions | Connector Artifact | | ||
|----------------------------|-----------------------------------------------------------| | ||
| Spark 3.3.x and scala 2.13 | `com.github.music-of-the-ainur:quenya-dsl_2.13:1.2.2-3.3` | | ||
| Spark 3.3.x and scala 2.12 | `com.github.music-of-the-ainur:quenya-dsl_2.12:1.2.2-3.3` | | ||
|
@@ -32,7 +28,7 @@ repository. It can be used using the `--packages` option or the | |
| Spark 2.4.x and scala 2.11 | `com.github.music-of-the-ainur:quenya-dsl_2.11:1.2.2-2.4` | | ||
|
||
## Introduction | ||
Quenya DSL (Domain Specific Language) is a language that simplifies the task of parsing complex semi-structured data. | ||
Quenya-DSL (Domain Specific Language) is a language that simplifies the task of parsing complex semi-structured data. | ||
|
||
```scala | ||
|
||
|
@@ -155,7 +151,7 @@ Output: | |
|
||
## DSL Generator | ||
|
||
You can generate a DSL based on a DataFrame: | ||
You can generate the DSL from an existing DataFrame: | ||
|
||
```scala | ||
import com.github.music.of.the.ainur.quenya.QuenyaDSL | ||
|
@@ -165,6 +161,18 @@ val quenyaDsl = QuenyaDSL | |
quenyaDsl.printDsl(df) | ||
``` | ||
|
||
### getDsl | ||
You can generate a DSL from a DataFrame and assign it to a variable: | ||
|
||
```scala | ||
import com.github.music.of.the.ainur.quenya.QuenyaDSL | ||
|
||
val df:DataFrame = ... | ||
val quenyaDsl = QuenyaDSL | ||
val dsl = quenyaDsl.getDsl(df) | ||
``` | ||
|
||
|
||
json: | ||
``` | ||
{ | ||
|
@@ -201,6 +209,50 @@ weapon@weapon | |
|
||
You can _alias_ fields by their fully qualified names with ```printDsl(df,true)```; turn this on when field names conflict. | ||
|
||
## How to Handle Special Characters | ||
|
||
|
||
|
||
Wrap names in literal backticks **``** to handle special characters such as spaces, semicolons, hyphens, and colons. | ||
Example: | ||
|
||
|
||
|
||
json: | ||
``` | ||
{ | ||
"name":{ | ||
"name One":"Mithrandir", | ||
"Last-Name":"Olórin", | ||
"nick:Names":[ | ||
"Gandalf the Grey", | ||
"Gandalf the White" | ||
] | ||
}, | ||
"race":"Maiar", | ||
"age":"immortal", | ||
"weapon;name":[ | ||
"Glamdring", | ||
"Narya", | ||
"Wizard Staff" | ||
] | ||
} | ||
``` | ||
|
||
|
||
|
||
DSL: | ||
``` | ||
age$age:StringType | ||
`name.Last-Name`$`Last-Name`:StringType | ||
`name.name One`$`name-One`:StringType | ||
`name.nick:Names`@`nick:Names` | ||
`nick:Names`$`nick:Names`:StringType | ||
race$race:StringType | ||
`weapon;name`@`weapon;name` | ||
`weapon;name`$`weapon_name`:StringType | ||
``` | ||
|
||
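The special-character example above can be tried end to end with a small Spark program. This is a sketch under stated assumptions, not the project's own test: it assumes Spark and Quenya-DSL are on the classpath, runs Spark in local mode, and uses only `getDsl` from this README plus standard Spark calls. The object name `SpecialCharactersSketch` is hypothetical.

```scala
import org.apache.spark.sql.{DataFrame, SparkSession}
import com.github.music.of.the.ainur.quenya.QuenyaDSL

// Hypothetical driver object, for illustration only.
object SpecialCharactersSketch {
  def main(args: Array[String]): Unit = {
    // Local session so the sketch runs without a cluster.
    val spark = SparkSession.builder()
      .master("local[1]")
      .appName("quenya-special-characters")
      .getOrCreate()
    import spark.implicits._

    // Same document as the README example: field names contain a space,
    // a hyphen, a colon, and a semicolon.
    val json =
      """{"name":{"name One":"Mithrandir","Last-Name":"Olórin","nick:Names":["Gandalf the Grey","Gandalf the White"]},"race":"Maiar","age":"immortal","weapon;name":["Glamdring","Narya","Wizard Staff"]}"""

    // Spark infers the nested schema from the JSON string.
    val df: DataFrame = spark.read.json(Seq(json).toDS())

    // Generate the DSL from the DataFrame, as in the getDsl section.
    val quenyaDsl = QuenyaDSL
    val dsl = quenyaDsl.getDsl(df)

    // Fields with special characters are expected to come back wrapped in backticks.
    println(dsl)

    spark.stop()
  }
}
```

The generated text should resemble the DSL listing above, although aliases in that listing (for example `weapon_name`) may have been edited by hand.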
## Backus–Naur form | ||
|
||
``` | ||
|
@@ -216,14 +268,6 @@ You can _alias_ using the fully qualified name using ```printDsl(df,true)```, yo | |
| DoubleType | FloatType | ByteType | IntegerType | LongType | ShortType | ||
``` | ||
|
||
## Requirements | ||
|
||
| Software | Version | | ||
|--------------|-----------| | ||
| Java | 8 | | ||
| Scala | 2.11/2.12 | | ||
| Apache Spark | 2.4 | | ||
|
||
## Author | ||
Daniel Mantovani [[email protected]](mailto:[email protected]) | ||
|
||
|
Original file line number | Diff line number | Diff line change |
---|---|---|
|
@@ -32,6 +32,12 @@ ThisBuild / developers := List( | |
name = "Daniel Mantovani", | ||
email = "[email protected]", | ||
url = url("https://github.com/music-of-the-ainur") | ||
), | ||
Developer( | ||
id = "ChinthapallyAkanksha", | ||
name = "Akanksha Chinthapally", | ||
email = "[email protected]", | ||
url = url("https://github.com/music-of-the-ainur") | ||
) | ||
) | ||
|
||
|
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -1,10 +1,10 @@ | ||
age,LastName,nameOne,nickNames,race,weapon | ||
3500,Olórin,Mithrandir,Gandalf the Grey,Maiar,Glamdring | ||
3500,Olórin,Mithrandir,Gandalf the Grey,Maiar,Narya | ||
3500,Olórin,Mithrandir,Gandalf the Grey,Maiar,Wizard Staff | ||
3500,Olórin,Mithrandir,Gandalf the Grey,Maiar,Glamdring | ||
3500,Olórin,Mithrandir,Gandalf the Grey,Maiar,Narya | ||
3500,Olórin,Mithrandir,Gandalf the Grey,Maiar,Wizard Staff | ||
4500,"",Ilmarë,"",Ainur,Powers of the Ainur | ||
3500,"",Morgoth,Bauglir,Ainur,Powers of the Ainur | ||
3500,"",Morgoth,Bauglir,Ainur,Grond | ||
3500,"",Morgoth,Bauglir,Ainur,Mace | ||
3500,"",Morgoth,Bauglir,Ainur,Sword | ||
3500,"",Manwë,"King of Arda,",Ainur,Powers of the Ainur | ||
3500,"",Manwë,"King of Arda,",Ainur,Powers of the Ainur |
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,50 @@ | ||
{ | ||
"Coffee": { | ||
"sub region": [ | ||
{ | ||
"id": 1, | ||
"full name": "John Doe" | ||
}, | ||
{ | ||
"id": 2, | ||
"name": "Don Joeh" | ||
} | ||
], | ||
"country": { | ||
"id": 2, | ||
"company": "ACME" | ||
} | ||
}, | ||
"brewing": { | ||
"sub-region": [ | ||
{ | ||
"id": 1, | ||
"name": "John Doe" | ||
}, | ||
{ | ||
"id": 2, | ||
"name": "Don Joeh" | ||
} | ||
], | ||
"world:country": { | ||
"id": 2, | ||
"company": "ACME" | ||
} | ||
}, | ||
"brewing2": { | ||
"sub;region": [ | ||
{ | ||
"id": 1, | ||
"name": "John Doe" | ||
}, | ||
{ | ||
"id": 2, | ||
"name": "Don Joeh" | ||
} | ||
], | ||
"world;country": { | ||
"id": 2, | ||
"company": "ACME" | ||
} | ||
} | ||
} |
Binary file not shown.
Binary file added
BIN
+40 Bytes
...plexData.parquet/.part-00000-f0e17416-ae5a-4295-9514-c1c7010257da-c000.snappy.parquet.crc
Binary file not shown.
Empty file.
Binary file added
BIN
+3.65 KB
...a/complexData.parquet/part-00000-f0e17416-ae5a-4295-9514-c1c7010257da-c000.snappy.parquet
Binary file not shown.