unified architecture v0.1 #418

pralabhkumar · 2018-08-17T07:21:55Z

This is a pull request for the unified architecture.

varunsaxena · 2018-08-20T04:28:38Z

app/com/linkedin/drelephant/AutoTuner.java

    try {
-      AutoTuningMetricsController.init();
+     /* AutoTuningMetricsController.init();


Nit: Remove the commented code

varunsaxena · 2018-08-20T04:31:28Z

app/com/linkedin/drelephant/tuning/AutoTuningAPIHelper.java

@@ -18,6 +18,8 @@

 import com.fasterxml.jackson.databind.ObjectMapper;
 import com.linkedin.drelephant.ElephantContext;
+import com.linkedin.drelephant.tuning.obt.AutoTuningOptimizeManager;


Are these two imports required? There isn't any other code added in the file.

Yes, they are required. AutoTuningOptimizeManager is being refactored.

Agreed. I did not see an associated code change, hence the comment. The file has been moved to a different package, so the change is required.

varunsaxena · 2018-08-20T05:08:53Z

app/com/linkedin/drelephant/tuning/AbstractAlgorithmManager.java

+import org.apache.log4j.Logger;
+
+
+public abstract class AbstractAlgorithmManager implements Manager {


Rename Algorithm => Tuning/TuningType? We use the term tuning type to differentiate between HBT and OBT; OBT will then have further algorithms. Should we keep the code consistent with that terminology?

varunsaxena · 2018-08-20T05:11:55Z

app/com/linkedin/drelephant/tuning/obt/AlgorithmManagerOBT.java

- */
-
-package com.linkedin.drelephant.tuning;
+package com.linkedin.drelephant.tuning.obt;


File header should not be removed

varunsaxena · 2018-08-20T05:14:40Z

app/com/linkedin/drelephant/tuning/obt/AlgorithmManagerOBT.java

 import org.apache.commons.io.FileUtils;
 import org.apache.log4j.Logger;
 import play.libs.Json;
 import org.apache.hadoop.conf.Configuration;

+public class AlgorithmManagerOBT extends AbstractAlgorithmManager {


The abstraction with regard to the algorithm isn't clean here: this class contains PSO code. This class is meant for OBT, whereas PSO code is algorithm-specific. Code related to PSO/IPSO should go in a separate class, with a further abstraction for each specific optimization algorithm implementation. This class could then be the driver that invokes the algorithm methods, which each implementation provides.
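A minimal sketch of the split suggested above, assuming a hypothetical `OptimizationAlgorithm` interface: the OBT class stays the driver and delegates algorithm-specific work to PSO or IPSO implementations. `OptimizationAlgorithm`, `PsoAlgorithm`, `IpsoAlgorithm`, `ObtDriver`, the `"IPSO"` identifier, and the use of `String` in place of the real job and population types are all illustrative, not taken from the PR.

```java
import java.util.ArrayList;
import java.util.List;

/** Algorithm-specific behaviour (PSO, IPSO, ...); hypothetical interface. */
interface OptimizationAlgorithm {
  /** Identifier of the algorithm, e.g. the value the OBT queries currently hard-code as PSO. */
  String name();

  /** Generates the next parameter set for one job; a String stands in for the real types. */
  String generateParamSet(String currentPopulation);
}

class PsoAlgorithm implements OptimizationAlgorithm {
  public String name() {
    return "PSO";
  }

  public String generateParamSet(String currentPopulation) {
    // Placeholder for the PSO-specific work (e.g. invoking the PSO script).
    return "pso:" + currentPopulation;
  }
}

class IpsoAlgorithm implements OptimizationAlgorithm {
  public String name() {
    return "IPSO"; // hypothetical identifier
  }

  public String generateParamSet(String currentPopulation) {
    // Placeholder for the IPSO-specific work.
    return "ipso:" + currentPopulation;
  }
}

/** OBT driver: owns tuning-type logic and delegates algorithm-specific steps. */
class ObtDriver {
  private final OptimizationAlgorithm algorithm;

  ObtDriver(OptimizationAlgorithm algorithm) {
    this.algorithm = algorithm;
  }

  List<String> generateForJobs(List<String> populations) {
    List<String> paramSets = new ArrayList<>();
    for (String population : populations) {
      // Job selection and DB updates would stay here, independent of PSO vs IPSO.
      paramSets.add(algorithm.generateParamSet(population));
    }
    return paramSets;
  }

  String algorithmName() {
    return algorithm.name();
  }
}
```

With this shape, supporting a new optimization algorithm means adding one implementation rather than editing the OBT class.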

varunsaxena · 2018-08-20T05:20:56Z

app/com/linkedin/drelephant/tuning/Flow.java

+  public void createAlgorithmManagersPipeline() {
+    List<Manager> algorithmManagers = new ArrayList<Manager>();
+    algorithmManagers.add(new AlgorithmManagerHBT(new com.linkedin.drelephant.tuning.engine.MRExecutionEngine()));
+    algorithmManagers.add(new AlgorithmManagerHBT(new com.linkedin.drelephant.tuning.engine.SparkExecutionEngine()));


The class is already imported. Is the full package name needed?

Yes, changed.

varunsaxena · 2018-08-20T05:31:38Z

app/com/linkedin/drelephant/tuning/AbstractBaselineManager.java

+   */
+  protected Boolean updateDataBase(List<TuningJobDefinition> tuningJobDefinitions) {
+    for (TuningJobDefinition tuningJobDefinition : tuningJobDefinitions) {
+      tuningJobDefinition.update();


What if an exception is thrown during update? If the job is picked up again in the next run, then returning true from this method and checking it in the caller is inconsequential.

Yes, agreed. I return true for the sake of completeness in most of the methods. I am now adding exception handling in all those cases.

varunsaxena · 2018-08-20T05:32:05Z

app/com/linkedin/drelephant/tuning/AbstractBaselineManager.java

+      }
+    }
+    AutoTuningMetricsController.setBaselineComputeWaitJobs(baselineComputeWaitJobs);
+    return true;


true is always returned. Any need for this method to return boolean?

Again, if there is an exception, it should return false. Changed the code.
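A minimal sketch of the agreed change, assuming a hypothetical `Updatable` stand-in for the Ebean model `TuningJobDefinition` (whose `update()` can fail at runtime): each failure is logged, the loop continues, and the return value reports whether every update succeeded instead of always being `true`. `java.util.logging` is used only to keep the example self-contained; the PR itself uses log4j.

```java
import java.util.List;
import java.util.logging.Level;
import java.util.logging.Logger;

/** Hypothetical stand-in for an Ebean model such as TuningJobDefinition. */
interface Updatable {
  void update();
}

class BaselineUpdater {
  private static final Logger LOGGER = Logger.getLogger(BaselineUpdater.class.getName());

  /**
   * Persists every definition. Returns false if any update failed, so the caller
   * can tell whether something needs to be picked up again in the next run.
   */
  static boolean updateDataBase(List<? extends Updatable> definitions) {
    boolean allSucceeded = true;
    for (Updatable definition : definitions) {
      try {
        definition.update();
      } catch (RuntimeException e) {
        // Keep going so that one failing row does not block the others.
        LOGGER.log(Level.WARNING, "Failed to update tuning job definition", e);
        allSucceeded = false;
      }
    }
    return allSucceeded;
  }
}
```

Returning a primitive `boolean` here also avoids boxing the result.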

varunsaxena · 2018-08-20T05:51:30Z

app/com/linkedin/drelephant/tuning/obt/AlgorithmManagerOBT.java

+    TUNING_SCRIPT_PATH = PSO_DIR_PATH + "/pso_param_generation.py";
+    this._executionEngine=executionEngine;
+    logger.info("Tuning script path: " + TUNING_SCRIPT_PATH);
+    logger.info("Python path: " + PYTHON_PATH);


Is there any need to log the Python path?

varunsaxena · 2018-08-20T05:58:45Z

app/com/linkedin/drelephant/tuning/obt/FitnessManagerOBT.java

+            Expr.eq(TuningJobExecutionParamSet.TABLE.jobExecution + '.' + JobExecution.TABLE.executionState,
+                JobExecution.ExecutionState.CANCELLED))
+        .isNull(TuningJobExecutionParamSet.TABLE.jobExecution + '.' + JobExecution.TABLE.resourceUsage)
+        .eq(TuningJobDefinition.TABLE.tuningAlgorithm, TuningAlgorithm.OptimizationAlgo.PSO.name())


This is optimization-algorithm-specific code in the OBT class. We should abstract the optimization algorithm out into a separate class, perhaps one class that handles everything based on the algorithm. Updating the DB can stay in the OBT class.

I didn't understand. Let's discuss.

As discussed, I am talking specifically about TuningAlgorithm.OptimizationAlgo.PSO.name() being present in the OBT class. The algorithm class should be abstracted out, with PSO/IPSO implementations of it. In my comments, consider any reference to "algorithm" to mean PSO/IPSO; I will refer to HBT/OBT as the tuning type.

varunsaxena · 2018-08-20T06:07:18Z

app/com/linkedin/drelephant/tuning/AbstractAlgorithmManager.java

+
+
+public abstract class AbstractAlgorithmManager implements Manager {
+  protected final String JSON_CURRENT_POPULATION_KEY = "current_population";


As parameter generation for each job is independent, we can have a thread pool here, as well as in the baseline manager and the fitness compute manager, to parallelize execution as much as possible.

Yes, we are planning to have a thread pool for each manager. That will be in the next version.
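A minimal sketch of per-manager parallelism, assuming each job's work is independent: jobs are submitted to a fixed-size `ExecutorService` and the manager waits for all results. `ParallelManager`, the `processJob` function, and the pool size are hypothetical placeholders, not part of the PR.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.function.Function;

/** Runs one manager's per-job work (e.g. param generation) on a thread pool. */
class ParallelManager<J, R> {
  private final ExecutorService pool;
  private final Function<J, R> processJob;

  ParallelManager(int threads, Function<J, R> processJob) {
    this.pool = Executors.newFixedThreadPool(threads);
    this.processJob = processJob;
  }

  /** Processes all jobs in parallel and returns the results in input order. */
  List<R> executeAll(List<J> jobs) throws InterruptedException, ExecutionException {
    List<Future<R>> futures = new ArrayList<>();
    for (J job : jobs) {
      futures.add(pool.submit(() -> processJob.apply(job)));
    }
    List<R> results = new ArrayList<>();
    for (Future<R> future : futures) {
      // Blocks until this job finishes; rethrows its failure wrapped in ExecutionException.
      results.add(future.get());
    }
    return results;
  }

  void shutdown() {
    pool.shutdown();
  }
}
```

The same wrapper could serve the baseline and fitness managers, since only the per-job function differs.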

varunsaxena · 2018-08-20T06:12:15Z

app/com/linkedin/drelephant/tuning/obt/BaselineManagerOBT.java

+  public BaselineManagerOBT() {
+    NUM_JOBS_FOR_BASELINE_DEFAULT = 30;
+    Configuration configuration = ElephantContext.instance().getAutoTuningConf();
+    _numJobsForBaseline =


Are we expecting this configuration to be different for OBT and HBT? If yes, this config should be named differently. If no, it should be read in base class.

Done. The only thing that can differ is NUM_JOBS_FOR_BASELINE_DEFAULT.
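A minimal sketch of the agreed split: the base class reads `baseline.execution.count` in one place, and each subclass supplies only its default. `java.util.Properties` stands in for Hadoop's `Configuration`; the class names are hypothetical, and the default of 10 in `ExampleBaselineManager` is an invented value used only to show an override.

```java
import java.util.Properties;

/** Reads the shared baseline setting once; subclasses only supply the default. */
abstract class BaselineManagerBase {
  static final String BASELINE_EXECUTION_COUNT = "baseline.execution.count";

  protected final int numJobsForBaseline;

  protected BaselineManagerBase(Properties conf, int defaultNumJobsForBaseline) {
    String raw = conf.getProperty(BASELINE_EXECUTION_COUNT);
    // Fall back to the subclass-specific default when the key is not configured.
    this.numJobsForBaseline = raw == null ? defaultNumJobsForBaseline : Integer.parseInt(raw.trim());
  }
}

class DefaultBaselineManager extends BaselineManagerBase {
  DefaultBaselineManager(Properties conf) {
    super(conf, 30); // same value as NUM_JOBS_FOR_BASELINE_DEFAULT in the diff
  }
}

class ExampleBaselineManager extends BaselineManagerBase {
  ExampleBaselineManager(Properties conf) {
    super(conf, 10); // hypothetical: a tuning type with a different default
  }
}
```

Passing the default through the constructor, rather than reassigning a protected field, keeps the configuration lookup in the base class and avoids calling an overridable method from its constructor.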

varunsaxena · 2018-08-20T06:17:14Z

app/com/linkedin/drelephant/tuning/AbstractAlgorithmManager.java

+public abstract class AbstractAlgorithmManager implements Manager {
+  protected final String JSON_CURRENT_POPULATION_KEY = "current_population";
+  private final Logger logger = Logger.getLogger(getClass());
+  protected abstract List<JobTuningInfo> detectJobsForParameterGeneration();


Move the method declarations below the variable declarations.

varunsaxena · 2018-08-20T06:50:37Z

app/com/linkedin/drelephant/tuning/AbstractAlgorithmManager.java

+  }
+
+
+  protected abstract JobTuningInfo generateParamSet(JobTuningInfo jobTuningInfo);


As these abstract methods will have to be implemented, write detailed Javadoc explaining what each abstract method is supposed to do. This is good for maintainability.
This applies to each abstract method in all the abstract classes added as part of this PR.

varunsaxena · 2018-08-20T06:53:15Z

app/com/linkedin/drelephant/tuning/Flow.java

+  public void createJobStatusManagersPipeline() {
+    List<Manager> jobStatusManagers = new ArrayList<Manager>();
+    jobStatusManagers.add(new AzkabanJobStatusManager());
+    //jobStatusManagers.add(new JobStatusManagerOBT());


Nit: Remove this line

varunsaxena · 2018-08-20T06:58:53Z

app/com/linkedin/drelephant/tuning/Flow.java

+
+
+public class Flow {
+  Map<String, List<Manager>> pipelines = null;


Any specific reason for adding a map here? We are just starting a thread for each type.

Nope, my bad. Changed this to a List of Lists.
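A minimal sketch of `Flow` holding a list of pipelines instead of a map, assuming each pipeline runs its managers in order on its own thread. The single pass per thread and the `Manager` shape with a primitive `boolean` are simplifications for illustration; the PR's actual loop and scheduling interval are not shown in this hunk.

```java
import java.util.ArrayList;
import java.util.List;

interface Manager {
  boolean execute();
}

class Flow {
  // One inner list per pipeline (e.g. job status, baseline, fitness, param generation).
  private final List<List<Manager>> pipelines = new ArrayList<>();

  void addPipeline(List<Manager> managers) {
    pipelines.add(managers);
  }

  /** Starts one thread per pipeline; each thread runs its managers once, in order. */
  List<Thread> start() {
    List<Thread> threads = new ArrayList<>();
    for (List<Manager> pipeline : pipelines) {
      Thread thread = new Thread(() -> {
        for (Manager manager : pipeline) {
          manager.execute();
        }
      });
      thread.start();
      threads.add(thread);
    }
    return threads;
  }
}
```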

varunsaxena · 2018-08-20T10:17:02Z

app/com/linkedin/drelephant/tuning/Flow.java

+  public void createAlgorithmManagersPipeline() {
+    List<Manager> algorithmManagers = new ArrayList<Manager>();
+    algorithmManagers.add(new AlgorithmManagerHBT(new com.linkedin.drelephant.tuning.engine.MRExecutionEngine()));
+    algorithmManagers.add(new AlgorithmManagerHBT(new com.linkedin.drelephant.tuning.engine.SparkExecutionEngine()));


How about invoking the execution engine (Spark/MR) inside AlgorithmManagerOBT/HBT? This can be specific to the tuning implementation. Such an abstraction would mean we need not implement every execution type for a particular tuning type, and we could also avoid showing it in the UI for a particular job type.
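A minimal sketch of that idea, assuming each tuning-type manager declares the execution engines it supports and loops over them internally, so `Flow` no longer pairs tuning types with engines. `ExecutionEngineType`, `TuningTypeManager`, `HbtManager`, and `ObtManager` are hypothetical names, and OBT supporting only MapReduce is an assumption made purely for illustration.

```java
import java.util.ArrayList;
import java.util.EnumSet;
import java.util.List;
import java.util.Set;

enum ExecutionEngineType { MAPREDUCE, SPARK }

/** A tuning-type manager (HBT, OBT, ...) that iterates over the engines it supports. */
abstract class TuningTypeManager {
  /** Engines this tuning type can tune; others are never touched (and could be hidden in the UI). */
  abstract Set<ExecutionEngineType> supportedEngines();

  /** Engine-specific work for one step, e.g. parameter generation. */
  protected abstract void runFor(ExecutionEngineType engine);

  final boolean supports(ExecutionEngineType engine) {
    return supportedEngines().contains(engine);
  }

  /** Runs the step once per supported engine and returns the engines processed. */
  final List<ExecutionEngineType> execute() {
    List<ExecutionEngineType> processed = new ArrayList<>();
    for (ExecutionEngineType engine : supportedEngines()) {
      runFor(engine);
      processed.add(engine);
    }
    return processed;
  }
}

class HbtManager extends TuningTypeManager {
  Set<ExecutionEngineType> supportedEngines() {
    return EnumSet.of(ExecutionEngineType.MAPREDUCE, ExecutionEngineType.SPARK);
  }

  protected void runFor(ExecutionEngineType engine) {
    // Placeholder: HBT logic for the given engine.
  }
}

class ObtManager extends TuningTypeManager {
  // Hypothetical: this sketch assumes OBT supports MapReduce only.
  Set<ExecutionEngineType> supportedEngines() {
    return EnumSet.of(ExecutionEngineType.MAPREDUCE);
  }

  protected void runFor(ExecutionEngineType engine) {
    // Placeholder: OBT logic for the given engine.
  }
}
```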

varunsaxena · 2018-08-20T10:23:11Z

test/rest/RestAPITest.java

@@ -1014,6 +1022,7 @@ private void populateTestData() {
  private void populateAutoTuningTestData1() {
    try {
      initAutoTuningDB1();
+      //initDB();


Nit: Remove this line

varunsaxena · 2018-08-20T10:23:21Z

test/rest/RestAPITest.java

@@ -163,8 +164,11 @@ public void run() {
        jobSuggestedParamSet.paramSetState = ParamSetStatus.EXECUTED;
        jobSuggestedParamSet.update();

-        FitnessComputeUtil fitnessComputeUtil = new FitnessComputeUtil();
-        fitnessComputeUtil.updateFitness();
+        /*FitnessComputeUtil fitnessComputeUtil = new FitnessComputeUtil();


Nit: Remove this code, and any other commented-out code elsewhere.

varunsaxena · 2018-08-20T10:25:10Z

app/com/linkedin/drelephant/tuning/obt/FitnessManagerOBT.java

+            Expr.eq(TuningJobExecutionParamSet.TABLE.jobExecution + '.' + JobExecution.TABLE.executionState,
+                JobExecution.ExecutionState.CANCELLED))
+        .isNull(TuningJobExecutionParamSet.TABLE.jobExecution + '.' + JobExecution.TABLE.resourceUsage)
+        .eq(TuningJobDefinition.TABLE.tuningAlgorithm, TuningAlgorithm.OptimizationAlgo.PSO.name())


As discussed, I am talking specifically about TuningAlgorithm.OptimizationAlgo.PSO.name() being present in the OBT class. The algorithm class should be abstracted out, with PSO/IPSO implementations of it. In my comments, consider any reference to "algorithm" to mean PSO/IPSO; I will refer to HBT/OBT as the tuning type.

varunsaxena · 2018-08-20T10:39:49Z

app/com/linkedin/drelephant/tuning/Manager.java

+  /*
+   Use to execute the logic of all the managers .
+   */
+  Boolean execute();


Unnecessary autoboxing: does this need to return Boolean rather than boolean? The same comment applies elsewhere as well.

varunsaxena · 2018-08-20T10:41:33Z

app/com/linkedin/drelephant/tuning/hbt/AlgorithmManagerHBT.java

+import java.util.List;
+
+
+public class AlgorithmManagerHBT extends AbstractAlgorithmManager{


Nit: Add a space before the brace, and spaces before and after "=". Checkstyle would catch these. We can scan through the code and fix them.

varunsaxena · 2018-08-20T10:48:33Z

app/com/linkedin/drelephant/tuning/AbstractJobStatusManager.java

+import org.apache.log4j.Logger;
+
+
+public abstract class AbstractJobStatusManager implements Manager {


Could we support more than one scheduler in the future? For instance, we use AppWorx as well. If so, should we store the scheduler type in TuningJobExecutionParamSet and have this class call the scheduler implementation based on that type?

Yes, we have planned for it. This may be in the next version. Currently, auto-tuning supports only Azkaban.
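A minimal sketch of scheduler dispatch, assuming a hypothetical scheduler-type field on each execution param set and one client registered per scheduler. `SchedulerType`, `SchedulerClient`, `ParamSet`, and `JobStatusDispatcher` are illustrative names; the AppWorx entry only mirrors the example in the question above.

```java
import java.util.EnumMap;
import java.util.List;
import java.util.Map;

enum SchedulerType { AZKABAN, APPWORX }

interface SchedulerClient {
  /** Returns true if the given execution has completed on this scheduler. */
  boolean isCompleted(String executionId);
}

/** Hypothetical stand-in for TuningJobExecutionParamSet with a scheduler-type column. */
class ParamSet {
  final String executionId;
  final SchedulerType schedulerType;

  ParamSet(String executionId, SchedulerType schedulerType) {
    this.executionId = executionId;
    this.schedulerType = schedulerType;
  }
}

class JobStatusDispatcher {
  private final Map<SchedulerType, SchedulerClient> clients = new EnumMap<>(SchedulerType.class);

  void register(SchedulerType type, SchedulerClient client) {
    clients.put(type, client);
  }

  /** Counts completed executions, routing each param set to its scheduler's client. */
  int countCompleted(List<ParamSet> inProgress) {
    int completed = 0;
    for (ParamSet paramSet : inProgress) {
      SchedulerClient client = clients.get(paramSet.schedulerType);
      if (client == null) {
        throw new IllegalStateException("No client registered for " + paramSet.schedulerType);
      }
      if (client.isCompleted(paramSet.executionId)) {
        completed++;
      }
    }
    return completed;
  }
}
```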

varunsaxena · 2018-08-20T11:14:57Z

app/com/linkedin/drelephant/tuning/Flow.java

+  Map<String, List<Manager>> pipelines = null;
+
+  public Flow() {
+    pipelines = new HashMap<String, List<Manager>>();


We can think of Flow as four steps running in parallel:

1. Baseline
2. Job status from the scheduler
3. Fitness Manager
4. Parameter generation

Instead of putting everything in Flow, which is primarily a driver class, should we have one interface for the tuning type, with an implementation per tuning type? Each implementation would then provide the tuning-type-specific implementations for baseline, fitness computation, and parameter generation. This interface would be the point of reference whenever we add a new tuning type. We can do the same for the scheduler in the job status manager. For instance:

    interface TuningType {
      Baseline getBaselineImpl();
      FitnessCompute getFitnessComputeImpl();
      ParamGeneration getParamGenerationImpl();
    }

Also, as discussed (and as you said you have already planned), we can add a thread pool at each step. Just capturing it here as a comment.
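A slightly fuller sketch of the proposed `TuningType` interface, assuming every step can be modelled as one `TuningStep` and that the driver runs all steps of every registered tuning type on a shared thread pool. Only the three getter names come from the comment above; `TuningStep`, `HbtTuningType`, `TuningDriver`, the one-minute timeout, and the single run per call are hypothetical.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;

/** One step of the tuning flow (baseline, fitness compute, or param generation). */
interface TuningStep {
  boolean execute();
}

interface TuningType {
  TuningStep getBaselineImpl();
  TuningStep getFitnessComputeImpl();
  TuningStep getParamGenerationImpl();
}

/** Hypothetical HBT implementation; real steps would wrap the HBT managers. */
class HbtTuningType implements TuningType {
  public TuningStep getBaselineImpl() { return () -> true; }
  public TuningStep getFitnessComputeImpl() { return () -> true; }
  public TuningStep getParamGenerationImpl() { return () -> true; }
}

/** Driver: adding a tuning type only means registering another TuningType. */
class TuningDriver {
  private final List<TuningType> tuningTypes = new ArrayList<>();

  void register(TuningType type) {
    tuningTypes.add(type);
  }

  /** Runs every step of every tuning type once on a pool; returns false on timeout. */
  boolean runOnce(int threads) throws InterruptedException {
    ExecutorService pool = Executors.newFixedThreadPool(threads);
    for (TuningType type : tuningTypes) {
      pool.execute(() -> type.getBaselineImpl().execute());
      pool.execute(() -> type.getFitnessComputeImpl().execute());
      pool.execute(() -> type.getParamGenerationImpl().execute());
    }
    pool.shutdown();
    return pool.awaitTermination(1, TimeUnit.MINUTES);
  }
}
```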

varunsaxena · 2018-08-20T11:27:03Z

test/com/linkedin/drelephant/tuning/IPSOManagerTestRunner.java

@@ -40,21 +43,22 @@ public void run() {
    JobSuggestedParamSet jobSuggestedParamSet =
        JobSuggestedParamSet.find.where().eq("fitness_job_execution_id", 1541).findUnique();
    JobExecution jobExecution = JobExecution.find.byId(1541L);
-    AutoTuningOptimizeManager optimizeManager = checkIPSOManager(tuningAlgorithm);
+    com.linkedin.drelephant.tuning.obt.AutoTuningOptimizeManager optimizeManager = checkIPSOManager(tuningAlgorithm);


Nit: The full package name is not required. This occurs in other places too; fix it wherever applicable.

varunsaxena · 2018-08-20T11:57:27Z

app/com/linkedin/drelephant/tuning/obt/FitnessManagerOBT.java

+        Utils.getNonNegativeLong(configuration, IGNORE_EXECUTION_WAIT_INTERVAL, 2 * 60 * AutoTuner.ONE_MIN);
+
+    // #executions after which tuning will stop even if parameters don't converge
+    maxTuningExecutions = Utils.getNonNegativeInt(configuration, MAX_TUNING_EXECUTIONS, 39);


Suggestion: The default values can be made constants
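A minimal sketch of the suggestion, with the defaults pulled out into named constants. Only the values `2 * 60 * ONE_MIN` and `39` come from the diff; the key strings, `ONE_MIN_MS` being 60,000 ms, and the `Map`-based `nonNegativeLong` helper (a stand-in for `Utils.getNonNegativeLong`/`getNonNegativeInt`, whose exact behaviour on negative input is assumed here) are all hypothetical.

```java
import java.util.Map;

class FitnessConfig {
  static final long ONE_MIN_MS = 60 * 1000L; // assumed equivalent of AutoTuner.ONE_MIN

  // Hypothetical key strings; the real values of these constants are not shown in the diff.
  static final String IGNORE_EXECUTION_WAIT_INTERVAL = "example.ignore.execution.wait.interval";
  static final String MAX_TUNING_EXECUTIONS = "example.max.tuning.executions";

  // Defaults as named constants instead of inline literals.
  static final long DEFAULT_IGNORE_EXECUTION_WAIT_INTERVAL_MS = 2 * 60 * ONE_MIN_MS;
  static final int DEFAULT_MAX_TUNING_EXECUTIONS = 39;

  final long ignoreExecutionWaitInterval;
  final int maxTuningExecutions;

  FitnessConfig(Map<String, String> conf) {
    ignoreExecutionWaitInterval =
        nonNegativeLong(conf, IGNORE_EXECUTION_WAIT_INTERVAL, DEFAULT_IGNORE_EXECUTION_WAIT_INTERVAL_MS);
    maxTuningExecutions = (int) nonNegativeLong(conf, MAX_TUNING_EXECUTIONS, DEFAULT_MAX_TUNING_EXECUTIONS);
  }

  /** Stand-in lookup: uses the default when the key is unset or negative (assumed behaviour). */
  private static long nonNegativeLong(Map<String, String> conf, String key, long defaultValue) {
    String raw = conf.get(key);
    if (raw == null) {
      return defaultValue;
    }
    long value = Long.parseLong(raw.trim());
    return value < 0 ? defaultValue : value;
  }
}
```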

varunsaxena · 2018-08-20T12:07:51Z

app/com/linkedin/drelephant/tuning/AutoTuningAPIHelper.java

@@ -18,6 +18,8 @@

 import com.fasterxml.jackson.databind.ObjectMapper;
 import com.linkedin.drelephant.ElephantContext;
+import com.linkedin.drelephant.tuning.obt.AutoTuningOptimizeManager;
+import com.linkedin.drelephant.tuning.obt.OptimizationAlgoFactory;


Don't we have to change the getCurrentRunParameters method? The tuning type input would now come from the UI, and TuningJobDefinition is also set from this method.

varunsaxena · 2018-08-20T12:10:42Z

app/com/linkedin/drelephant/tuning/AutoTuningAPIHelper.java

@@ -18,6 +18,8 @@

 import com.fasterxml.jackson.databind.ObjectMapper;
 import com.linkedin.drelephant.ElephantContext;
+import com.linkedin.drelephant.tuning.obt.AutoTuningOptimizeManager;
+import com.linkedin.drelephant.tuning.obt.OptimizationAlgoFactory;
 import com.linkedin.drelephant.util.Utils;


How do we set penalty now?

varunsaxena · 2018-08-20T12:19:28Z

app/com/linkedin/drelephant/tuning/AbstractBaselineManager.java

+  protected Boolean calculateBaseLine(List<TuningJobDefinition> tuningJobDefinitions) {
+    for (TuningJobDefinition tuningJobDefinition : tuningJobDefinitions) {
+      try {
+        logger.info("Computing and updating baseline metric values for job: " + tuningJobDefinition.job.jobName);


tuning_job_definition has a column named tuningAlgorithm, which is meant for PSO/IPSO. With only two tuning types (HBT/OBT) as of now, we could use this table for HBT when tuningAlgorithm is NULL. Ideally, though, we should have a column for the tuning type to make the code extensible to new tuning types in the future (say, ML-based). The same goes for the other tables as well.
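A minimal sketch of how a dedicated tuning-type column could coexist with existing rows, assuming a hypothetical nullable `tuning_type` column and the fallback described above (a NULL `tuningAlgorithm` means HBT). The `TuningType` enum and its `resolve` method are illustrative names, not from the PR.

```java
/** Tuning types; a future ML-based type would be a new constant here. */
enum TuningType {
  HBT, OBT;

  /**
   * Resolves the tuning type of a tuning_job_definition row.
   * Prefers the explicit (hypothetical) tuning_type column; for legacy rows where it
   * is NULL, a NULL tuningAlgorithm is treated as HBT and any non-NULL value as OBT.
   */
  static TuningType resolve(String tuningTypeColumn, String tuningAlgorithmColumn) {
    if (tuningTypeColumn != null) {
      return TuningType.valueOf(tuningTypeColumn);
    }
    return tuningAlgorithmColumn == null ? HBT : OBT;
  }
}
```

Once the column is populated for all rows, the fallback branch can be dropped and `tuningAlgorithm` goes back to meaning only PSO/IPSO.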

varunsaxena · 2018-08-20T12:20:21Z

app/com/linkedin/drelephant/tuning/AbstractBaselineManager.java

+   * @return List of jobs whose baseline needs to be added
+   */
+
+  protected abstract List<TuningJobDefinition> detectJobsForBaseLineComputation();


Add a javadoc here

varunsaxena · 2018-08-20T12:20:38Z

app/com/linkedin/drelephant/tuning/AbstractJobStatusManager.java

+
+public abstract class AbstractJobStatusManager implements Manager {
+  private final Logger logger = Logger.getLogger(getClass());
+  protected abstract Boolean analyzeCompletedJobsExecution(List<TuningJobExecutionParamSet> inProgressExecutionParamSet);


Add a javadoc

akshayrai

@pralabhkumar , it would be great if you can provide a link to the design doc or mention what is the goal of this unified architecture in the description.

akshayrai · 2018-08-29T22:24:32Z

app/com/linkedin/drelephant/tuning/AbstractBaselineManager.java

+  protected final String BASELINE_EXECUTION_COUNT = "baseline.execution.count";
+  protected Integer NUM_JOBS_FOR_BASELINE_DEFAULT = 30;
+  protected String baseLineCalculationSQL =
+      "SELECT AVG(resource_used) AS resource_used, AVG(execution_time) AS execution_time FROM "


You could use Ebean rather than hard-coding the SQL statements here. You may refer to other parts of Dr. Elephant for examples.

varunsaxena · 2018-09-03T05:29:50Z

Moving the review over to #430.

pralabhkumar added 2 commits August 17, 2018 12:50

unified architecture v0.1

a9b1828

refactoring for unified architecture

bcce807

pralabhkumar requested review from varunsaxena and mkumar1984 August 17, 2018 08:48

varunsaxena reviewed Aug 20, 2018

varunsaxena requested changes Aug 20, 2018

pralabhkumar added 3 commits August 22, 2018 20:05

refactoring tune in code for unified architecture

c77bfb2

Changes in unified architecture for PSO & IPSO

8acff29

HBT integration with unified architecture

f94a2b3

akshayrai reviewed Aug 30, 2018

pralabhkumar added 3 commits August 30, 2018 22:12

AutoTuningAPIHelper change for HBT

4f76985

AutoTuningApiHelper change for HBT

2715115

Changes to unified architecutre

27c6bbf

varunsaxena mentioned this pull request Oct 19, 2018

Tuning #430

Closed
