Baseline IRs for translating ADF pipelines to workflows #1235
This needs to be refactored so as to avoid Option fields as much as possible.
dependsOn: Seq[ActivityDependency],
description: Option[String],
linkedServiceName: Option[LinkedServiceReference],
name: Option[String],
onInactiveMarkAs: Option[OnInactiveMarkAs],
policy: Option[ActivityPolicy],
state: Option[ActivityState],
activityType: Option[String],
activityProperties: Option[ActivityProperties],
userProperties: Seq[UserProperty])
Case classes like this one, where all fields are Seq or Option, almost always indicate a design problem. As a rule of thumb, one should ask "what would be the meaning of Activity(Seq(), None, None, None, None, None, None, None, None, Seq())?".
Option in a case class field should only be used when there is a meaningful situation where the field could be absent.
For example, is it possible for an Activity to have no state? Note that this isn't exactly the same question as "can an Activity be neither Active nor Inactive?". If the answer to the latter question is "yes", then we should add, for example, a case object Unknown extends ActivityState and define the field as state: ActivityState = Unknown.
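A minimal sketch of the rule of thumb above, assuming ActivityState is a sealed trait with Active and Inactive members. The wrapper object, the trimmed-down Activity fields, and the isRunnable helper are hypothetical and only serve the illustration:

```scala
// Hypothetical sketch: ActivityStateSketch, isRunnable and the reduced field
// list of Activity are assumptions made for illustration, not the PR's code.
object ActivityStateSketch {
  sealed trait ActivityState
  case object Active extends ActivityState
  case object Inactive extends ActivityState
  // Explicit "not known" value, used instead of Option[ActivityState] = None.
  case object Unknown extends ActivityState

  // Required fields carry no Option. description stays optional on the
  // assumption that an activity without a description is meaningful.
  case class Activity(
      name: String,
      state: ActivityState = Unknown,
      description: Option[String] = None)

  // Matching on a sealed trait lets the compiler warn about unhandled states.
  def isRunnable(activity: Activity): Boolean = activity.state match {
    case Active => true
    case Inactive | Unknown => false
  }
}
```

With this shape, a value such as Activity("copy") always has a name and a state, so callers never need to unwrap an Option to read either one.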
I've made the name, activityType, activityProperties, and state fields required. Every activity should have at minimum a name, type, set of type-specific properties, and state.
Many of the other fields are configurations which will be optionally returned by the API client.
case class LibraryDefinition(
    jarFilePath: Option[String],
    eggFilePath: Option[String],
    whlFilePath: Option[String],
    mavenSpecification: Option[MavenSpecification],
    pyPiSpecification: Option[PyPiSpecification],
    cranSpecification: Option[CranSpecification])
Following the comment above, here it is quite clear that we have three mutually exclusive cases mashed together.
We should rather define it as
sealed trait LibraryDefinition
case class MavenLibrary(jarFilePath: String, mavenSpecification: MavenSpecification) extends LibraryDefinition
case class PyPiLibrary(eggFilePath: String, pyPiSpecification: PyPiSpecification) extends LibraryDefinition
case class CranLibrary(whlFilePath: String, cranSpecification: CranSpecification) extends LibraryDefinition
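To illustrate what the sealed hierarchy buys, here is a small sketch of consuming it with an exhaustive pattern match. The specification case classes are placeholders with invented fields (coordinates, packageName), and describe is a hypothetical helper. None of them come from the PR:

```scala
// Hypothetical sketch: the *Specification fields and describe are assumptions.
object LibraryDefinitionSketch {
  // Placeholder specification types standing in for the real ones.
  case class MavenSpecification(coordinates: String)
  case class PyPiSpecification(packageName: String)
  case class CranSpecification(packageName: String)

  sealed trait LibraryDefinition
  case class MavenLibrary(jarFilePath: String, mavenSpecification: MavenSpecification)
      extends LibraryDefinition
  case class PyPiLibrary(eggFilePath: String, pyPiSpecification: PyPiSpecification)
      extends LibraryDefinition
  case class CranLibrary(whlFilePath: String, cranSpecification: CranSpecification)
      extends LibraryDefinition

  // Each case exposes only the fields that exist for that kind of library,
  // so there is no combination of six Options to validate at runtime.
  def describe(library: LibraryDefinition): String = library match {
    case MavenLibrary(path, spec) => s"maven:${spec.coordinates} ($path)"
    case PyPiLibrary(path, spec)  => s"pypi:${spec.packageName} ($path)"
    case CranLibrary(path, spec)  => s"cran:${spec.packageName} ($path)"
  }
}
```

Because LibraryDefinition is sealed, adding a fourth kind of library later makes the compiler flag every match that does not yet handle it.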
This is done.
abstract class ActivityProperties(name: Option[String]) extends PipelineNode {
  override def children: Seq[PipelineNode] = Seq()
}
Suggested change:
trait ActivityProperties extends PipelineNode {
  def name: String
}
This is done. I will note that there are no common fields across ActivityProperties of different types. Each activity type will have a unique set of properties. Is it OK to use a trait in this way?
case class DatabricksNotebookActivity(
    name: Option[String],
    baseParameters: Map[String, String],
    libraries: Seq[LibraryDefinition],
    notebookPath: Option[String])
    extends ActivityProperties(name) {
  override def children: Seq[PipelineNode] = super.children ++ libraries
}
Suggested change:
case class DatabricksNotebookActivity(
    override val name: String,
    baseParameters: Map[String, String],
    libraries: Seq[LibraryDefinition],
    notebookPath: Option[String])
    extends ActivityProperties {
  override def children: Seq[PipelineNode] = libraries
}
This is done.
  override def children: Seq[PipelineNode] = Seq() ++ dependsOn ++ linkedServiceName ++
    policy ++ userProperties
}
If we are sure that linkedServiceName, policy and userProperties are optional, the suggested change is:
  override def children: Seq[PipelineNode] = dependsOn ++ linkedServiceName.toSeq ++
    policy.toSeq ++ userProperties.toSeq
}
otherwise, we should go for
  override def children: Seq[PipelineNode] = dependsOn ++ Seq(linkedServiceName, policy) ++ userProperties
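The optional-field variant can be sketched as a self-contained example. It assumes that each referenced type is a PipelineNode and that userProperties stays a Seq, as in the original Activity definition. The stand-in PipelineNode trait and the fields of the node classes are invented for illustration:

```scala
// Hypothetical sketch: PipelineNode and the fields of the node classes below
// are simplified stand-ins, not the definitions from the PR.
object ChildrenSketch {
  trait PipelineNode { def children: Seq[PipelineNode] = Seq.empty }
  case class ActivityDependency(activity: String) extends PipelineNode
  case class LinkedServiceReference(referenceName: String) extends PipelineNode
  case class ActivityPolicy(retry: Int) extends PipelineNode
  case class UserProperty(name: String, value: String) extends PipelineNode

  case class Activity(
      dependsOn: Seq[ActivityDependency],
      linkedServiceName: Option[LinkedServiceReference],
      policy: Option[ActivityPolicy],
      userProperties: Seq[UserProperty])
      extends PipelineNode {
    // Option.toSeq yields zero or one element, so absent fields simply
    // contribute nothing; Seq fields are concatenated as they are.
    override def children: Seq[PipelineNode] =
      dependsOn ++ linkedServiceName.toSeq ++ policy.toSeq ++ userProperties
  }
}
```

Writing .toSeq explicitly, rather than relying on the implicit Option-to-Iterable conversion, keeps it visible at the call site which fields may be absent.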
This is done. I made fields required and used the last pattern.
Summary
This PR adds intermediate representations of ADF pipelines, activities, and linked services from the APIs. This is a required baseline for translating ADF pipelines to Databricks workflows.
Details
orchestrators.adf package
PipelineNode for any nodes coming from ADF
Notes
children method? (e.g. in DatabricksNotebookActivity)