This repository has been archived by the owner on Jul 19, 2023. It is now read-only.

Welcome to Collectington

Collectington calls 3rd party APIs to gather data/metrics and translates those metrics into a form that Prometheus can monitor.

How does it work?

  • Collectington is a Python 3 application that can run in any environment where it can be installed with the pip command

    pip install collectington

  • Once the service is running, it makes an API call every 60 sec (the interval is configurable via api_call_intervals) to read data (metrics) from 3rd party services. The data must then be translated by the logic you provide so that Prometheus is able to monitor it.

  • Collectington uses Prometheus-Client and automates many steps that would otherwise require a lot of code. Collectington removes the need for:

    • Defining logic for calling APIs
    • Instantiating Prometheus client classes many times
    • Mapping Prometheus metrics to each function using the client
    • Calculating metrics to avoid double counting
    • Caching API call responses to minimize requests
  • All you need to use Collectington is a single json config and a custom API class.
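
To make the caching idea above concrete, here is a minimal standalone sketch of keeping one API response per interval so that several metric functions can share it. It is not Collectington's implementation: IntervalCache, fake_fetch and the clock parameter are hypothetical names used only for illustration.

```python
import time


class IntervalCache:
    """Keep one response per datastore name for `ttl_seconds` (hypothetical helper)."""

    def __init__(self, fetch, ttl_seconds, clock=time.monotonic):
        self._fetch = fetch          # function that performs the real API call
        self._ttl = ttl_seconds      # how long a response stays fresh
        self._clock = clock          # injectable clock, handy for tests
        self._store = {}             # name -> (timestamp, response)

    def get(self, name):
        now = self._clock()
        cached = self._store.get(name)
        if cached is not None and now - cached[0] < self._ttl:
            return cached[1]         # still fresh: no new request is made
        value = self._fetch(name)
        self._store[name] = (now, value)
        return value


calls = []


def fake_fetch(name):
    # Stand-in for a 3rd party API call; records how often it is invoked.
    calls.append(name)
    return {"total": 1000 + len(calls)}


cache = IntervalCache(fake_fetch, ttl_seconds=60)
first = cache.get("splunk_datastore")
second = cache.get("splunk_datastore")  # served from the cache
```

With this shape, two metric methods that read the same datastore within one interval trigger a single request.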

How do I start using Collectington?

  1. Install Collectington using the below command

    pip install collectington

  2. Create your configuration file.

    e.g. config.json

    Here's an example config file:

    {
        "api_call_intervals" : 2,
        "log_level" : "INFO",
        "services" : {
            "splunk" : {
                "service_class" : "SplunkApi",
                "service_module" : "splunk_api",
                "port" : 8000,
                "api_url" : "https://api.victorops.com/api-reporting/v2/incidents",
                "prometheus_metrics_mapping" : {
                    "counter" : [
                    "number_of_incidents"
                    ],
                    "summary" : [
                    "time_taken_to_acknowledge",
                    "time_taken_to_resolve"
                    ]
                },
                "secret_file_path" : "./splunk",
                "api_key" : "",
                "api_id" : ""
            }
        }
    }
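
Before starting the service, it can help to check that a config file contains the fields this README marks as required. The sketch below is not part of Collectington: find_missing_keys and the two constant tuples are hypothetical names, and the lists of required keys are an assumption taken from the field descriptions in this section.

```python
import json

# Required keys as described in this README (an assumption of this sketch,
# not a list exported by Collectington).
REQUIRED_TOP_LEVEL = ("api_call_intervals", "log_level", "services")
REQUIRED_PER_SERVICE = (
    "service_class",
    "service_module",
    "port",
    "api_url",
    "prometheus_metrics_mapping",
)


def find_missing_keys(config):
    """Return dotted paths of required keys that are absent from `config`."""
    missing = [key for key in REQUIRED_TOP_LEVEL if key not in config]
    for name, service in config.get("services", {}).items():
        missing.extend(
            f"services.{name}.{key}"
            for key in REQUIRED_PER_SERVICE
            if key not in service
        )
    return missing


config_text = """
{
    "api_call_intervals": 2,
    "log_level": "INFO",
    "services": {
        "splunk": {
            "service_class": "SplunkApi",
            "service_module": "splunk_api",
            "api_url": "https://api.victorops.com/api-reporting/v2/incidents",
            "prometheus_metrics_mapping": {"counter": ["number_of_incidents"]}
        }
    }
}
"""
# The "port" key was left out on purpose to show a reported problem.
missing = find_missing_keys(json.loads(config_text))
print(missing)
```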
    

    Let's take a closer look at this config file

    • port (required): the port on which you want to run your service. In the example above it is set inside the service entry.
    • api_call_intervals (required): the interval between API calls, in seconds.
    • log_level (required): the level of logging you want. This feature is currently under development.
    • services (required):
      • A dictionary whose keys are the names of your services. Each service entry is itself a dictionary, and its fields are explained in detail below

      • service_class (required): this is the name of your service class that you create to define your metrics.

      • service_module (required): this is the name of your module or filename that contains your service_class

      • api_url (required): this is the API endpoint that you want to get data from.

      • prometheus_metrics_mapping (required):

        • Define prometheus_metrics_mapping as a dictionary

        • Prometheus client includes 4 metric types - Counter, Summary, Histogram, and Gauge:

          • The Counter type is useful for tracking increasing metrics (e.g. number of calls)
          • Gauge should be used when you want to track the exact value of a metric. We've set up collection to report the observe() value of the metric when sending to Prometheus.
          • Summary should be used for recording statistics on aggregated values. If you have many metrics per API_CALL_INTERVALS, you can use the summary type to get the <METRIC>_sum, <METRIC>_count, and <METRIC>_created on Prometheus for free.
          • Histogram can be used similarly to Summary, except that it aggregates differently (it groups observations into buckets).
          • Please refer to the official Prometheus client documentation for additional details.
        • This instantiation step is automated by the Collection Service and can be configured from the config file as below:

          e.g. the number_of_incidents metric will be auto-instantiated as a Counter

          PROMETHEUS_METRICS_MAPPING = {
              'counter': ['number_of_incidents'],
              'summary': ['time_taken_to_acknowledge', 'time_taken_to_resolve']
          }
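
One way to read the mapping above is as a lookup from metric name to metric type. The sketch below inverts the dictionary to build that lookup and rejects a name listed under two types. It is only an illustration: metric_types is a hypothetical helper, not a Collectington function.

```python
def metric_types(mapping):
    """Invert {type: [names]} into {name: type}, rejecting duplicate names."""
    by_name = {}
    for metric_type, names in mapping.items():
        for name in names:
            if name in by_name:
                raise ValueError(
                    f"{name} is listed under both {by_name[name]} and {metric_type}"
                )
            by_name[name] = metric_type
    return by_name


PROMETHEUS_METRICS_MAPPING = {
    "counter": ["number_of_incidents"],
    "summary": ["time_taken_to_acknowledge", "time_taken_to_resolve"],
}

types = metric_types(PROMETHEUS_METRICS_MAPPING)
print(types["number_of_incidents"])  # counter
```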
          
      • The rest of the config fields are completely optional. If you want to add custom config fields and read from them, that is completely fine. The only optional field that is built into Collectington is prometheus_metric_labels. This field is reserved for Collectington's own use.

      • prometheus_metric_labels (optional):

        • This adds labels to the given metrics. Based on the above examples the labels could look like:
        "prometheus_metrics_mapping" : {
                "counter" : [
                "number_of_incidents"
                ],
                "summary" : [
                "time_taken_to_acknowledge",
                "time_taken_to_resolve"
                ]
            }
        
        • Whenever a metric is defined with labels, a couple of changes are needed or else the collection service will raise an error. These are as follows:

          • The user-defined function in your *_api.py that returns a labeled metric (e.g. get_number_of_incidents) needs to return a different data structure: a list of dicts, each with every label as a key plus one extra key that stores the actual metric value.
            • e.g. for a metric that has labels ["team_name", "stack"] the returned value must look like below (even if it has only one element):
            [
                {
                    "team_name": "some_label_value",
                    "stack": "prd",
                    "metric_value": 123
                },
                {
                    "team_name": "some_label_value",
                    "stack": "sbx",
                    "metric_value": 34
                }
            ]
            
            • Notice that this is different from the normal user-defined functions, which return a single numeric value.
          • There should be one dict for each unique combination of label values. In the example above, team_name = some_label_value with stack = prd cannot appear a second time, since that combination already has a metric value.
          • The keys that correspond to the labels must match the label names exactly (case-sensitive).
          • The key for the metric value can have any name, but it must be present.
          • You cannot use the @enable_delta_metric decorator with labeled metrics. Instead, calculate the delta in your function or change the metric type to gauge.
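
The rules above for labeled metrics can be checked mechanically before a value is returned. The following sketch is one possible check, assuming the label names are known as a list. check_labeled_metric is a hypothetical helper and not part of Collectington.

```python
def check_labeled_metric(rows, labels):
    """Return a list of problems found in a labeled metric value."""
    problems = []
    seen = set()
    for index, row in enumerate(rows):
        missing = [label for label in labels if label not in row]
        if missing:
            problems.append(f"row {index}: missing label keys {missing}")
            continue
        # Exactly one key besides the labels should hold the metric value.
        extra = [key for key in row if key not in labels]
        if len(extra) != 1:
            problems.append(f"row {index}: expected one value key, found {extra}")
        # Each combination of label values may appear only once.
        combo = tuple(row[label] for label in labels)
        if combo in seen:
            problems.append(f"row {index}: duplicate label combination {combo}")
        seen.add(combo)
    return problems


rows = [
    {"team_name": "some_label_value", "stack": "prd", "metric_value": 123},
    {"team_name": "some_label_value", "stack": "sbx", "metric_value": 34},
]
ok = check_labeled_metric(rows, ["team_name", "stack"])

# Repeating the first row violates the uniqueness rule.
duplicated = check_labeled_metric(rows + [rows[0]], ["team_name", "stack"])
```

Here ok is an empty list, while duplicated contains a single message about the repeated (some_label_value, prd) combination.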
  3. Create an API Class.

    You will find an example API Class below. Let's take a closer look at this file.

    Imports

    • The first task is to import the collectington library. You need collectington.config in order to read the config properly

    • You also need to import certain classes and functions from collectington.collectington_api

      • enable_delta_metric (optional): This is useful when the metric data you collect is cumulative, as it avoids double counting the metric
      • register_metric_class (required): This registers your own class with Collectington's metrics.
      • register_metric (required): This registers your metric functions so that the main application can call them.

    Class init

    • Certain fields are required to use Collectington.

      def __init__(self):
          super(SplunkApi, self).__init__()
      
          self.config = get_config("config.json")
          self.service_name = "splunk"
          ...
      
    • self.config (required): you have to provide your config path by using the get_config function

    • self.service_name (required): you have to provide your service name, which must match the service name in the config file

    • self.headers (optional): this is needed only if you have to send header information. Collectington uses the requests library, so headers work the same way as in requests.

    • self.params (optional): this is needed only if you have to add params to your API URL. If you need custom params for a particular metric method, you can simply override self.params from that method.

    • self.name_of_datastore (required): this ensures that your API response is cached and that no unnecessary calls are made for every metric

    • The rest of the init attributes will be different for each case. This example uses custom utils functions to get the credentials for its API key.

    Class Metric Methods

    • This is the core of your application. With Collectington you only need to worry about how to call 3rd party APIs and get the metrics. The rest is automated by Collectington. You simply define your metric methods and register them using the imported decorators.

    • Here's how to implement Metrics in detail

      If you decide to add another service - new_api, you must do the following.

      • Create a new module - e.g. new_api.py
        • Create a subclass called NewApi which inherits from the CollectingtonApi class.

        • Apply the @register_metric_class class decorator so that the metrics defined in your class are registered for use.

          @register_metric_class
          class NewApi(CollectingtonApi):
              ...
          
        • CollectingtonApi is an abstract class that includes common implementations that can be used across most services: reading data, caching data, instantiating Prometheus metrics, etc.

        • Since the logic for each metric differs between 3rd party services, you must implement the custom metric logic in the newly created subclass.

        • You may also override common methods if you require custom implementations.

        • You must define and implement all metrics defined in the config

          e.g.

          def get_number_of_incidents(self):
              response = self.get_data_from_store(self.name_of_datastore)
              total_incidents = response["total"]
              return total_incidents
          
        • Once you define a metric, you must register your method using a decorator. This ensures that the name of the metric that you defined in your config is mapped to the correct method in this class.

          e.g.

          @register_metric("time_taken_to_resolve")
          def get_time_taken_to_resolve(self):
              response = self.get_data_from_store(self.name_of_datastore)
              list_of_time_triggered_and_resolved = []
          
              ...
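
How a decorator can map a configured metric name to a method may be easier to see in a small standalone sketch. The code below is not Collectington's implementation of register_metric or register_metric_class. The names register_metric_sketch, register_metric_class_sketch, metric_methods and DemoApi are hypothetical and only illustrate the general pattern.

```python
METRIC_ATTR = "_metric_name"  # hypothetical marker attribute


def register_metric_sketch(name):
    """Tag a method with the metric name it provides."""
    def decorator(func):
        setattr(func, METRIC_ATTR, name)
        return func
    return decorator


def register_metric_class_sketch(cls):
    """Collect every tagged method into a {metric_name: method_name} table."""
    cls.metric_methods = {
        getattr(member, METRIC_ATTR): attr_name
        for attr_name, member in vars(cls).items()
        if callable(member) and hasattr(member, METRIC_ATTR)
    }
    return cls


@register_metric_class_sketch
class DemoApi:
    @register_metric_sketch("time_taken_to_resolve")
    def get_time_taken_to_resolve(self):
        return 42.0  # placeholder value instead of a real API response

    def helper(self):
        # Not decorated, so it is not registered as a metric.
        return None


api = DemoApi()
method_name = DemoApi.metric_methods["time_taken_to_resolve"]
value = getattr(api, method_name)()
```

A main loop could then walk the metric names from the config and call the matching method for each one.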
          
        • (Optional) Use @enable_delta_metric decorator

          • This ensures that the metric you are sending to Prometheus is not double counted.

          • For example, suppose an API returns 1,000, which is the year-to-date (YTD) total number of incidents. If the API is queried again a minute later and returns 1,100, then without @enable_delta_metric you send 1,100 to Prometheus. This results in double counting, and the total number of incidents becomes 2,100.

            e.g.

            @register_metric("number_of_incidents")
            @enable_delta_metric
            def get_number_of_incidents(self):
                response = self.get_data_from_store(self.name_of_datastore)
                total_incidents = response["total"]
                return total_incidents
            
          • @enable_delta_metric will send the difference between the previous data (1,000) and the latest data (1,100) to ensure double counting is avoided.
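
The 1,000 and 1,100 example above can be reproduced with a small standalone decorator. This is a sketch of the general delta technique, not the actual @enable_delta_metric code: delta_metric and FakeApi are hypothetical names, and reporting the full value on the very first call is an assumption of this sketch.

```python
import functools


def delta_metric(func):
    """Report the change since the previous call instead of the raw total."""
    attr = f"_previous_{func.__name__}"

    @functools.wraps(func)
    def wrapper(self):
        current = func(self)
        previous = getattr(self, attr, 0)  # assumption: start from 0
        setattr(self, attr, current)       # remember the total for next time
        return current - previous

    return wrapper


class FakeApi:
    """Stand-in for an API class whose endpoint returns cumulative totals."""

    def __init__(self, totals):
        self._totals = iter(totals)

    @delta_metric
    def get_number_of_incidents(self):
        return next(self._totals)


api = FakeApi([1000, 1100])
first = api.get_number_of_incidents()   # 1000 on the first call
second = api.get_number_of_incidents()  # 100, the increase since the last call
```

Incrementing a Counter by 1,000 and then by 100 gives 1,100, which matches the YTD total reported by the API.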

        • (Optional) Override _update_metric

          • This method determines which Prometheus method is used for each metric. If you need custom behaviour, you can override this method.

            e.g.

            if isinstance(p_instance, Counter):
                # inc is a method from Prometheus client
                p_instance.inc(
                    service_metric_dict[metric]
                )
            
  4. Once the above steps are complete, run the command below to start Collectington. Collectington has its own command line tool, so you don't have to run the service as a Python module

    cton -s <SERVICE_NAME> -c <CONFIG_PATH>

    • Please note that your service name must match the name defined in your config file.

Example Service Usage

  • We have created a working example service that uses the Splunk API. You can find it in the example directory.

  • cd into the example directory and run the command below to see the results.

    cton -s splunk -c ./config.json

  • Please note that this service requires API credentials, which it reads from the splunk file that contains the secrets. To run it locally, you have to create that file in the format below

    API_KEY="<YOUR_API_KEY>"
    API_ID="<YOUR_API_ID>"
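
For reference, a file in this KEY="VALUE" format can be read with a few lines of Python. The sketch below is an illustration only and is not the helper used by the example service: parse_secret_text is a hypothetical function, and skipping blank lines and lines starting with # is an assumption.

```python
def parse_secret_text(text):
    """Parse lines of the form KEY="VALUE" into a dictionary."""
    credentials = {}
    for line in text.splitlines():
        line = line.strip()
        # Skip blanks, comments, and anything that is not KEY=VALUE.
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        credentials[key.strip()] = value.strip().strip('"')
    return credentials


sample = 'API_KEY="abc123"\nAPI_ID="my-id"\n'
creds = parse_secret_text(sample)
print(creds["API_ID"])  # my-id
```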
    
  • This is the example service class.

    ...
    
    from collectington.config import *
    from collectington.collectington_api import (
        CollectingtonApi,
        enable_delta_metric,
        register_metric_class,
        register_metric,
    )
    
    
    @register_metric_class
    class SplunkApi(CollectingtonApi):
        """
        This class is the main class for calling Splunk On-Call API to generate
        custom metrics to be read by Prometheus. This class inherits from an abstract class
        to use common methods.
        The Splunk API returns data in a JSON format which requires iteration to retrieve
        desired data:
            - The Number of Incidents
            - The Number of Incidents Per team
            - Meantime to Acknowledge
            - Meantime to Resolve
        This class includes 3 major processes:
            1. Call an API to retrieve data
            2. Define each metric logic as a method
            3. Implementing abstract methods to be called from the main run
        """
    
        def __init__(self):
            super(SplunkApi, self).__init__()
    
            self.config = get_config("config.json")
            self.service_name = "splunk"
            self.api_url = self.config["services"][self.service_name]["api_url"]
    
            dict_of_credentials = get_credentials_from_secret_file(
                self.config["services"][self.service_name]["secret_file_path"]
            )
    
            self.api_id = dict_of_credentials.get("API_ID", "")
            self.api_key = dict_of_credentials.get("API_KEY", "")
    
            self.headers = {
                "X-VO-Api-Key": self.api_key,
                "X-VO-Api-Id": self.api_id,
            }
    
            self.params = {"startedAfter": get_iso_timestamp_x_min_ago(1)}
            self.name_of_datastore = "splunk_datastore"
    
        ...
    
        @register_metric("number_of_incidents")
        @enable_delta_metric
        def get_number_of_incidents(self):
            self.params = {}  # override params to get all time total figure
    
            response = self.get_data_from_store(self.name_of_datastore)
            total_incidents = response["total"]
    
            return total_incidents
    
        ...
    
    

Development & Contribution

Since this is an open source project, anyone is welcome to contribute to Collectington.

To do development work on this project, make sure you follow these steps.

  1. Set up a local virtualenv and add tests for each service.

  2. Since we want to ensure that new features still work when installed with the command below,

    pip install collectington

    it is recommended that you create a local library to test installation and run the service.

  3. You can run the below command to build a whl file

    python setup.py sdist bdist_wheel

  4. Once the whl file is created, you can run pip install <PATH_TO_YOUR_WHL_FILE> to install the package locally.

Testing

To run all the tests, activate the virtual environment and run python -m unittest discover