minor text fixes #21

Open · wants to merge 7 commits into master
34 changes: 15 additions & 19 deletions docs/config.md
# SIMOORG - CONFIG FILES
This document provides a quick overview of the various configuration files currently used by Simoorg. Simoorg expects the path to the config directory as its first console argument, and a standard Simoorg config directory should have the following structure:
```
configs/
    ...
```
Next we will go through each of these configurations in detail.


## API CONFIG api.yaml
This is our main API config file; it needs to be passed to both the Moirai process and the API server. It is a YAML file, mainly used to store the input named-pipe location for the Moirai process. It may hold more config items in the future as the API functionality is extended.


```
moirai_input_fifo: '/tmp/moirai.fifo'

```

## FATE BOOKS fate_books/*
A Fate Book is a collection of configurations used to describe the failures to be induced against your service. Each service should have a unique Fate Book associated with it. Upon starting up, Simoorg scans the configs/fate_books subdirectory for files with a .yaml extension. Each qualified file is treated as a Fate Book and used to instantiate observers that watch for and execute failures based on the conditions defined in that Fate Book.
Fate Books are human readable and can be edited using a conventional editor.

### Fate Book Format
The format of the Fate Books is YAML, chosen for its simplicity while still being capable of formally describing nested objects in a human-readable form.

### Fate Book Contents
Each service that needs to receive failure commands from the Failure Inducer has to have a Fate Book associated with it. Below is a sample Fate Book for an example service (called test-service):

```yaml
...
failures:
  ...
```

### Fate Book Sections
Next we take a closer look at the various sections of the Fate Book.


#### service:
Required : Yes
Default: None
The value of the service key is used to uniquely identify the service specified in that Fate Book. Simoorg enforces that no two Fate Books can have the same value for the service key.
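
For illustration, this section can be as small as a single line naming the example service used throughout this document:

```yaml
service: test-service
```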

#### topology:
Required : Yes
All values related to the topology plugin should be stored under this section. We expect only two keys under this section, as follows:

topology_plugin :
The name of the topology plugin should be the same as the plugin class (please check the plugins document).
topology_config :
Any plugin-specific values should be added to this section. Simoorg expects the config to be contained inside the main config directory, and the path provided here is relative to the config root.
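
A minimal sketch of the whole section, assuming the StaticTopology plugin and the sample config path shown later in this document:

```yaml
topology:
  topology_plugin: StaticTopology
  topology_config: plugins/topology/static/topo.yaml  # relative to the config root
```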

#### logger
Required : Yes

Contains the logging-related information; we expect it to contain the following keys:
...
This key is used to enable console logging
log_level :
Simoorg expects the value for this key to be "WARNING", "INFO", "VERBOSE" or "DEBUG"
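
A hedged sketch of this section (the console-logging key name is an assumption for illustration):

```yaml
logger:
  console_logging: True  # key name assumed
  log_level: INFO
```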

#### healthcheck
Required : Yes

In this section we list all of our health-check-related configs. The various keys we expect in this section are as follows:
...
script :
Depends on what plugin you use. In the case of DefaultHealthCheck this is the absolute path to the health check script.
plugin_config :
Place to specify any plugin-specific configurations. Currently this is None for the default health check plugin.
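
A hedged sketch of this section, assuming the default health check plugin (the plugin key name and the script path are illustrative assumptions):

```yaml
healthcheck:
  plugin: DefaultHealthCheck               # key name assumed
  script: /usr/local/bin/my_healthcheck.sh # hypothetical script path
  plugin_config: None
```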

#### destiny
Required : Yes

This section is responsible for listing all the scheduler-specific information. We expect the following keys to be present under the destiny section:
Key name | Description | Mandatory | Default |
--- | --- | --- | --- |
scheduler_plugin | the name of the scheduler plugin | Yes | None |

Please check the plugins document to better understand the plugin names. In addition to the keys listed above, the "scheduler_plugin" key can also contain any plugin-specific config. The failure name given in "scheduler_plugin"->failures->"failure_name" should have a valid failure definition in the failures section of the Fate Book.
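
A hedged sketch of this section, following the arrow notation above; the exact nesting and any plugin-specific options are assumptions:

```yaml
destiny:
  scheduler_plugin:
    NonDeterministicScheduler:        # nesting assumed from scheduler_plugin->failures->failure_name
      failures:
        - failure_name: graceful_stop # must match a definition in the failures section
```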

#### failures
This section includes a list of failure definitions, and each item in the list should contain the following keys:

Key name | Description | Mandatory | Default |
--- | --- | --- | --- |
restor_handler->args | The args passed to the handler during failure revert | Yes | None |
wait_seconds | The wait in seconds between failure induction and failure revert | Yes | None |
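
As a hedged illustration of a single failure definition (only the two keys shown above are confirmed; the remaining names are assumptions):

```yaml
failures:
  - name: graceful_stop      # key name assumed
    restor_handler:
      args: ['test-service'] # args passed to the handler during revert
    wait_seconds: 120        # wait between induction and revert
```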


### Plugin Configs
These are config files that may be specific to some plugin. Since these configs are closely related to the plugins, we will mainly be covering configs for the plugins that are shipped out of the box.

#### Handler Configs

For any handler plugin (let's assume the handler name is test_handler), we expect the config to be located at the path config/plugins/handler/test_handler/test_handler.yaml; the config contents depend greatly on the specific handler. The ShellScriptHandler plugin file, for example, looks like this:

```
host_key_path: ~/.ssh/known_hosts
```

#### Topology Configs
The location of the topology plugin config is usually provided under the topology section of the Fate Book. Again, the content of this configuration file depends heavily on the specific plugin, but here are two sample configuration files for the StaticTopology and KafkaTopology plugins respectively. In StaticTopology we list all the servers present in the service under the key node.
```
# file: configs/plugins/topology/static/topo.yaml
node:
  ...

# file: configs/plugins/topology/kafka/kafka_topo.yaml  (second sample; path assumed)
kafka_host_resolution:
  node_type_1:                      # hypothetical node type name
    LEADER: {Topic: "Topic1"}

```



22 changes: 11 additions & 11 deletions docs/design.md
# SIMOORG - HIGH LEVEL DESIGN
This document describes the high-level design of Simoorg: LinkedIn's Failure Inducing Framework. The rationale behind developing Simoorg is to have a simple yet powerful and extensible failure inducing framework. Simoorg is written in Python - LinkedIn's lingua franca for solving operational challenges.


Key points of Simoorg are:
* Comprehensive logging to help SREs and developers to get valuable insights about how their application of choice reacts to failures.
* Support of heterogeneous infrastructure by introducing flexible execution handlers. New execution handlers are easy to plug in with minimal efforts.

## From a bird's eye view

Simoorg's main job is to induce and revert failures against a service of your choice. The failures are induced based on the scheduler plugin type you wish to use. Simoorg comes with a non-deterministic scheduler configured, which generates failures at random times. Although the failures are generated at random times, you can still set a few limitations, such as the total run duration and the min/max gap between failures. Each failure is followed by a revert, ensuring that the cluster we operate against is brought back to a clean state. Simoorg logs important metrics like the failure name, impact and the time of the impact to help SREs and developers reason about the fault tolerance of their application of choice.

In the subsequent paragraphs we will cover the important components and talk about how they interact with each other:

* Moirai
* Atropos
* Scheduler
* Handler
* Journal
* Logger
* HealthCheck
* Topology
* Api Server

### Moirai

Moirai is a single-threaded process that monitors and manages individual Atropos instances using standard UNIX IPC mechanisms and Python queues. It also provides entry points for the Api Server to retrieve information about the various services being tested. Moirai takes the configs directory path as an input argument and bootstraps the framework by reading the configuration files in the configs directory. The configs directory contains:

* the main API config (api.yaml)
* a Fate Book per service (fate_books/*)
* plugin-specific configs

Here each Atropos instance can communicate specific information to Moirai with the help of Python queues.
![High level Design](/docs/images/high_level.jpg)


### Atropos

Upon initialization, each Atropos instance reads one [Fate Book](/docs/config.md) and, depending on the destiny defined in the Fate Book, sleeps until the requirements are met. Once the requirements are met, Atropos induces a random failure, waits for the specified interval and reverts it to bring the cluster back to a clean state. There are two types of requirements to be met before inducing a failure:

Each Atropos instance has its own instance of a Scheduler, which is in charge of generating the failure plan.
Apart from the Scheduler, each Atropos instance has its own instance of a Handler, Logger and Journal. The high level diagram reflecting Atropos and its components is as follows:
![Atropos Components](/docs/images/atr1.png)

### Scheduler

A Scheduler generates a failure plan and keeps track of time. Currently Simoorg ships only with a Non-deterministic scheduler. The Non-deterministic scheduler randomly generates dispatch times and associates them with random failures. We refer to this sequence of timestamp and failures internally as a Plan. Once generated, the Plan is passed to Atropos.
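
As a concrete illustration of this contract, the plugins document describes a Plan as a list of single-item dictionaries mapping a failure name to its trigger time; a hypothetical Plan could look like this (names and times are illustrative only):

```python
# Hypothetical Plan handed to Atropos: each entry maps one failure name
# to its trigger time (epoch seconds).
plan = [
    {"graceful_stop": 1467106000},
    {"simulate_full_gc": 1467109600},
]
```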


### Handler

Each failure definition should have a handler associated with it. A Handler is referred to by its name within a failure definition and is responsible for inducing and reverting failures. The table below lists supported handlers and handlers planned to be available in future:

Handler | Mechanism | Status | ETA |
--- | --- | --- | --- |
AWS | AWS API calls | not supported | TBD |
Rackspace | Rackspace API | not supported | TBD |


### Journal

Each Observer has a separate Journal instance. The Journal is responsible for:

* Keeping track of the internal state of Atropos such as: current impact and impact limits
* Persisting the current state of Atropos to support session resumption
* Resuming state after a crash

### Logger

Each Atropos has a separate Logger instance. The Logger is used to log and store arbitrary messages emitted at various points of Plan execution.

### HealthCheck

Healthcheck is an optional component that allows you to control the damage inflicted on your service. If enabled, Atropos kicks off the healthcheck logic defined in the Fate Book before inducing a failure. The Healthcheck component needs to return success in order for the failure to run; otherwise the Scheduler skips the current failure cycle. This ensures that we are not aggravating any existing issues and lets the cluster fire self-healing routines and recover. If a healthcheck is not defined, failures will be induced as scheduled, assuming the cluster was able to recover.

The best practice is to leverage your current monitoring system to identify the health of your service.

We also ship a simple Kafka HealthCheck out of the box. This plugin considers a cluster to be healthy if the under-replicated partition count is zero for all the nodes in the cluster. The plugin also depends on the Kafka topology config file to get information about the cluster.

### Topology

The Topology component is responsible for identifying and keeping the list of nodes that constitute your service. In most cases this is just a list of servers present in your cluster. The Topology component is also responsible for choosing a random node from the list and handing it over to Atropos. We ship static topology and Kafka topology plugins with our source code.

Another example of a topology is the Kafka topology. It is a custom Topology component that can resolve Kafka-specific node types, such as:
* RANDOM_LEADER - Where the node is a leader for a random topic and a random partition
* LEADER - Where the node is a leader for a specific topic and a specific partition (if you skip the partition it randomly selects a partition)

### Api Server

Simoorg provides a simple API interface based on Flask. The API server communicates with the Moirai process through Linux FIFOs, so it is necessary that the Api Server is started on the same server as the Moirai process. The API endpoints currently supported by our system are:

2 changes: 1 addition & 1 deletion docs/index.md
# SIMOORG
4 changes: 2 additions & 2 deletions docs/low_level.md
# LOW LEVEL FAILURES
Libfiu provides an easy way to induce low-level failures into any POSIX call in your application. To be able to use low-level failures against POSIX calls, we require the application to be started under the control of libfiu. The best practice is to use these failures either on your staging/dev clusters or on select nodes from your production cluster.

Please check the [libfiu website](https://blitiri.com.ar/p/libfiu/) to understand how to build and install libfiu on your servers. Once the libfiu packages are installed, please restart your application under the control of libfiu. You can achieve this using the [fiu-run command](https://blitiri.com.ar/p/libfiu/doc/man-fiu-run.html); the command should look something like the following:
```
fiu-run -x -c $COMMAND
```
12 changes: 6 additions & 6 deletions docs/plugins.md
# How to create a new plugin:
In Simoorg, we have four types of pluggable components, namely Topology, Healthcheck, Scheduler and Handler. Even though we ship a few standard plugins in each category, we understand that they will not meet the requirements of all potential users, so one of our guiding design principles has been to ensure that the system is easily extensible. In this document, we detail the various steps to be taken to create a new plugin.

## Topology
First we start with the topology plugin. Simoorg relies on the topology plugin to retrieve information about the individual nodes of a service. The arguments that are passed to any topology plugin are:
*Args:*
input_file - the config file to be read by the plugin
...

```
kafka_host_resolution:
  node_type_7:
    RANDOM_BROKER: {Topic: "Topic3"}

```
* This class reads the config file and loads it into an in-memory data structure. At the time of failure induction, it returns a random host (broker host name) to the caller method. The selection of this host depends upon the kind of node selected.
Expand All @@ -67,7 +67,7 @@ In the above Kafka Topology plugin example, it is possible to modify the config
Path to KafkaTopology plugin : simoorg.plugins.topology.KafkaTopology.KafkaTopology
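
As a hedged sketch of a custom topology plugin (the class layout and the get_random_node method name are assumptions; only the input_file argument and the random-node behaviour are described in this document):

```python
import random
import yaml


class ExampleStaticTopology(object):
    """Sketch of a topology plugin: loads nodes from a config file and
    hands back a random one at failure-induction time."""

    def __init__(self, input_file):
        # The 'node' key follows the StaticTopology config described in config.md
        with open(input_file) as conf:
            self.nodes = yaml.safe_load(conf)['node']

    def get_random_node(self):  # method name assumed
        return random.choice(self.nodes)
```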


## HealthCheck :
The Healthcheck plugin is responsible for checking the health of the target cluster.
*Args:*
script - Any external script to be used by the plugin
Let's take the example of the *KafkaHealthCheck plugin*:

...

If users want to use a shell script that performs the health check on the target cluster, they can use the DefaultHealthCheck plugin in the Fate Book and pass it the customized shell script. The DefaultHealthCheck plugin, like the KafkaHealthCheck plugin, implements the check() method, which returns true if the target cluster is healthy and false otherwise.
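
A hedged sketch of such a plugin; the check() method and the script argument come from this document, while the zero-exit-status convention is an assumption:

```python
import subprocess


class ExampleHealthCheck(object):
    """Sketch of a healthcheck plugin that wraps an external script."""

    def __init__(self, script, plugin_config=None):
        self.script = script
        self.plugin_config = plugin_config

    def check(self):
        # Treat a zero exit status as "cluster healthy" (assumed convention)
        return subprocess.call([self.script]) == 0
```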

## Scheduler:
The Scheduler plugin is responsible for creating the plans that an Atropos process will follow. A plan, as received by Atropos, should be a list of single-item dictionaries, where each dictionary has the failure name as the key and the trigger time as the value.
*Args:*
destiny_object - A dictionary containing the contents of the plugin key of the destiny section
Let us consider the example of the NonDeterministicScheduler plugin:

...

There are a number of fully implemented methods in BaseScheduler that you can use in your implementation to better access the destiny object.
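
A minimal scheduler sketch following the plan contract described above; the get_plan method name and the key names inside destiny_object are assumptions, and BaseScheduler is not subclassed here for brevity:

```python
import time


class ExampleFixedGapScheduler(object):
    """Sketch of a scheduler plugin: emits a plan of single-item dicts
    {failure_name: trigger_time}."""

    def __init__(self, destiny_object):
        # Key names in destiny_object are assumptions for this sketch
        self.failures = destiny_object.get('failures', [])
        self.gap = destiny_object.get('gap_seconds', 600)

    def get_plan(self):  # method name assumed
        now = time.time()
        return [{name: now + (i + 1) * self.gap}
                for i, name in enumerate(self.failures)]
```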

## Handler
The Handler is the plugin responsible for actually inducing and reverting failures.
*Args:*
config_dir - This is the path to the simoorg config directory
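
A hedged handler sketch: per this document a handler induces and reverts failures and receives the config directory; the method names and signatures below are assumptions:

```python
class ExampleHandler(object):
    """Sketch of a handler plugin that acts on one node at a time."""

    def __init__(self, config_dir):
        self.config_dir = config_dir

    def induce_failure(self, failure_name, node, args):  # name assumed
        print("inducing %s on %s with args %r" % (failure_name, node, args))

    def revert_failure(self, failure_name, node, args):  # name assumed
        print("reverting %s on %s with args %r" % (failure_name, node, args))
```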
13 changes: 6 additions & 7 deletions docs/user_guide.md
# Introduction
This document describes the process of setting up and running Simoorg against an application cluster.
## Installation
The system requirements for Simoorg are as follows:

* OS: any Linux distribution
* Python version: Python 2.6
* Additional Python modules: multiprocessing, yaml, paramiko

Simoorg is currently distributed via pip, so to install the package please run the following command:
```
(sudo) pip install simoorg
```
If you want to work with the latest code, please run the following commands
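A typical flow, assuming the project's GitHub repository at github.com/linkedin/simoorg and the standard setuptools test step, would be:

```
# clone the repo and run the test suite (commands assumed)
git clone https://github.com/linkedin/simoorg.git
cd simoorg
python setup.py test
```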
Once you have confirmed that the tests have passed, you can install the code.
If you are planning to use the SSH handler plugin to induce failures against a specific service cluster, please ensure that the user you are using to run Simoorg has passwordless SSH access to all the nodes in the cluster. You should also ensure that any failure scripts you plan to use are already present on all the nodes in the target service cluster.

## Basic Usage
Simoorg is started using the command *simoorg* which takes the path to your config directory as the only argument. Please check the [config document](/docs/config.md) to better understand the configuration files. The sample config directory packaged with the product can be used to set up your configs.
```
Ex: simoorg ~/configs/
```

## Usage Example
In this section of the document, we describe how to use Simoorg against a Kafka cluster. For this example we will be running three predefined failures (graceful stop, ungraceful stop and simulate full GC) on random nodes in the cluster using the shell script handler plugin. We will be executing the failures in a random manner using the non-deterministic scheduler. We will also be using the Kafka Topology plugin and Kafka HealthCheck plugin. Both of these plugins are packaged with the product and are ready to use out of the box.

Before we start, we need to make sure that all the required failure scripts (the ones required for these failure scenarios are present in the repo under Simoorg/failure_scripts/base/) are present on all the broker nodes in the Kafka cluster. Let's assume that the scripts are present in the location ~/test/failure_scripts/base/ on the Kafka brokers; we will need this path later when we update our configurations.
...

Where ~/kafka_configs/ is the path to your failure inducer configs. You can then start the API server with the following command:
```
gunicorn 'simoorg.Api.MoiraiApiServer:create_app("~/kafka_configs/api.yaml")'
```
Where api.yaml should contain a valid path for the named pipe used by both the API server and Simoorg. Our current implementation of the API relies on the Simoorg process to retrieve all information and does not serve any data once the process is dead. Please check the design doc to better understand the various REST API endpoints.