# Creating a Benchmark
Note that you do not need to define a benchmark just to run one. Theodolite comes with a set of benchmarks that are ready to be executed. See the fundamental concepts page to learn more about our distinction between benchmarks and executions.
A typical benchmark looks like this:
```yaml
apiVersion: theodolite.rocks/v1beta1
kind: benchmark
metadata:
  name: example-benchmark
spec:
  sut:
    resources:
      - configMap:
          name: "example-configmap"
          files:
            - "uc1-kstreams-deployment.yaml"
  loadGenerator:
    resources:
      - configMap:
          name: "example-configmap"
          files:
            - uc1-load-generator-service.yaml
            - uc1-load-generator-deployment.yaml
  resourceTypes:
    - typeName: "Instances"
      patchers:
        - type: "ReplicaPatcher"
          resource: "uc1-kstreams-deployment.yaml"
  loadTypes:
    - typeName: "NumSensors"
      patchers:
        - type: "EnvVarPatcher"
          resource: "uc1-load-generator-deployment.yaml"
          properties:
            variableName: "NUM_SENSORS"
            container: "workload-generator"
        - type: "NumSensorsLoadGeneratorReplicaPatcher"
          resource: "uc1-load-generator-deployment.yaml"
          properties:
            loadGenMaxRecords: "150000"
  slos:
    - name: "lag trend"
      sloType: "lag trend"
      prometheusUrl: "http://prometheus-operated:9090"
      offset: 0
      properties:
        threshold: 3000
        externalSloUrl: "http://localhost:80/evaluate-slope"
        warmup: 60 # in seconds
  kafkaConfig:
    bootstrapServer: "theodolite-kafka-kafka-bootstrap:9092"
    topics:
      - name: "input"
        numPartitions: 40
        replicationFactor: 1
      - name: "theodolite-.*"
        removeOnly: True
```
## System under Test (SUT), Load Generator and Infrastructure

In Theodolite, the system under test (SUT), the load generator, as well as additional infrastructure (e.g., a middleware) are described by Kubernetes resource files. All resources defined for the SUT and the load generator are started and stopped for each SLO experiment, with SUT resources being started before the load generator. Infrastructure resources are kept alive throughout the entire duration of a benchmark run. They avoid time-consuming recreation of software components like middlewares, but should be used with caution so that earlier SLO experiments do not influence later ones.
## Resources

The recommended way to link Kubernetes resource files from a Benchmark is to bundle them in one or multiple ConfigMaps and refer to that ConfigMap from `sut.resources`, `loadGenerator.resources` or `infrastructure.resources`.

Note: Theodolite requires that each resource file contains only a single resource (i.e., YAML document).
To create a ConfigMap from all the Kubernetes resources in a directory, run:

```sh
kubectl create configmap <configmap-name> --from-file=<path-to-resource-dir>
```
Then add an item such as the following one to the `resources` list of the `sut`, `loadGenerator` or `infrastructure` fields:

```yaml
configMap:
  name: example-configmap
  files:
    - example-deployment.yaml
    - example-service.yaml
```
## Actions

Sometimes it is not sufficient to just define resources that are created and deleted when running a benchmark. Instead, it might be necessary to define certain actions that are executed before running or after stopping the benchmark. Theodolite supports actions, which can run before (`beforeActions`) or after (`afterActions`) all `sut`, `loadGenerator` or `infrastructure` resources are deployed.
Theodolite provides two types of actions:
### Exec Actions

Theodolite allows executing commands on running pods, similar to `kubectl exec` or Kubernetes' container lifecycle handlers.
For example, the following actions will create a file in a pod with label `app: logger` before the SUT is started and delete it after the SUT is stopped:

```yaml
sut:
  resources: # ...
  beforeActions:
    - exec:
        selector:
          pod:
            matchLabels:
              app: logger
          container: logger # optional
        command: ["touch", "file-used-by-logger.txt"]
        timeoutSeconds: 90
  afterActions:
    - exec:
        selector:
          pod:
            matchLabels:
              app: logger
          container: logger # optional
        command: ["rm", "file-used-by-logger.txt"]
        timeoutSeconds: 90
```
Theodolite checks if all referenced pods are available for the specified actions. That means these pods must either be defined in `infrastructure` or already deployed in the cluster. If not all referenced pods are available, the benchmark will not be set as `Ready`. Consequently, an action cannot be executed on a pod that is defined as an SUT or load generator resource.
Note: Exec actions should be used sparingly. While it is possible to define entire benchmarks imperatively as actions, it is considered better practice to define as much as possible using declarative, native Kubernetes resource files.
### Delete Actions

Sometimes it is required to delete Kubernetes resources before or after running a benchmark. This is typically the case for resources that are automatically created while running a benchmark. For example, Kafka Streams creates internal Kafka topics. When using the Strimzi Kafka operator, we can delete these topics by deleting the corresponding Kafka topic resource.

As shown in the following example, delete actions select the resources to be deleted by specifying their `apiVersion`, `kind` and a regular expression (`nameRegex`) for their name.

```yaml
sut:
  resources: # ...
  beforeActions:
    - delete:
        selector:
          apiVersion: kafka.strimzi.io/v1beta2
          kind: KafkaTopic
          nameRegex: ^some-internal-topic-.*
```
## Load and Resource Types
Benchmarks need to specify at least one supported load and resource type for which scalability can be benchmarked.
Load and resource types are described by a name (used for reference from an Execution) and a list of patchers. Patchers can be seen as functions, which take a value as input and modify a Kubernetes resource in a patcher-specific way. Examples of patchers are the ReplicaPatcher, which modifies the replica specification of a deployment, or the EnvVarPatcher, which modifies an environment variable. See the patcher API reference for an overview of available patchers.
If a benchmark is executed by an Execution, these patchers are used to configure the SUT and load generator according to the load and resource values set in that Execution.
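For illustration, an Execution could then reference the types defined in the benchmark above by their names. The following is only a sketch with made-up load and resource values; see the Execution documentation for the authoritative format:

```yaml
apiVersion: theodolite.rocks/v1beta1
kind: execution
metadata:
  name: example-execution
spec:
  benchmark: "example-benchmark"
  load:
    loadType: "NumSensors"      # references loadTypes[].typeName
    loadValues: [25000, 50000]  # hypothetical load values
  resources:
    resourceType: "Instances"   # references resourceTypes[].typeName
    resourceValues: [1, 2, 4]   # hypothetical resource values
```

For each tested combination, Theodolite applies the corresponding patchers with these values to the linked resource files.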
## Service Level Objectives (SLOs)

SLOs provide a way to quantify whether a certain load intensity can be handled by a certain amount of provisioned resources. In Theodolite, SLOs are evaluated by requesting monitoring data from Prometheus and analyzing it in a benchmark-specific way. An Execution must define at least one SLO to be checked.
A good choice to get started is defining an SLO of type `generic`:

```yaml
- name: droppedRecords
  sloType: generic
  prometheusUrl: "http://prometheus-operated:9090"
  offset: 0
  properties:
    externalSloUrl: "http://localhost:8082"
    promQLQuery: "sum by(job) (kafka_streams_stream_task_metrics_dropped_records_total>=0)"
    warmup: 60 # in seconds
    queryAggregation: max
    repetitionAggregation: median
    operator: lte
    threshold: 1000
```
All you have to do is define a PromQL query describing which metrics should be requested (`promQLQuery`) and how the resulting time series should be evaluated. With `queryAggregation` you specify how the resulting time series is aggregated to a single value, and `repetitionAggregation` describes how the results of multiple repetitions are aggregated. Possible values are `mean`, `median`, `mode`, `sum`, `count`, `max`, `min`, `std`, `var`, `skew`, `kurt`, `first` and `last`, as well as percentiles such as `p99` or `p99.9`. The result of aggregating all repetitions is checked against `threshold`. This check is performed using an `operator`, which describes that the result must be “less than” (`lt`), “less than or equal” (`lte`), “greater than” (`gt`) or “greater than or equal” (`gte`) to the threshold.
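Conceptually, this two-stage aggregation and comparison can be sketched as follows. This is an illustrative Python sketch, not Theodolite's actual implementation (which operates on Prometheus query results), and only a subset of the aggregation functions is shown:

```python
import statistics

# A subset of the supported aggregation functions, for illustration.
AGGREGATIONS = {
    "max": max,
    "min": min,
    "sum": sum,
    "count": len,
    "mean": statistics.mean,
    "median": statistics.median,
}

OPERATORS = {
    "lt": lambda result, threshold: result < threshold,
    "lte": lambda result, threshold: result <= threshold,
    "gt": lambda result, threshold: result > threshold,
    "gte": lambda result, threshold: result >= threshold,
}

def check_slo(repetitions, query_aggregation, repetition_aggregation,
              operator, threshold):
    """Check an SLO over the time series of multiple repetitions.

    Each repetition's time series is first reduced to a single value
    (queryAggregation), then the per-repetition values are reduced to one
    result (repetitionAggregation), which is compared against the threshold.
    """
    per_repetition = [AGGREGATIONS[query_aggregation](series)
                      for series in repetitions]
    result = AGGREGATIONS[repetition_aggregation](per_repetition)
    return OPERATORS[operator](result, threshold)
```

For example, with `queryAggregation: max`, `repetitionAggregation: median`, `operator: lte` and `threshold: 1000`, two repetitions with time series `[0, 5, 3]` and `[1, 2, 0]` yield per-repetition maxima of 5 and 2, a median of 3.5, and thus a fulfilled SLO.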
If you do not want a static threshold, you can also define it relative to the tested load with `thresholdRelToLoad` or relative to the tested resource value with `thresholdRelToResources`. For example, setting `thresholdRelToLoad: 0.01` means that in each experiment, the threshold is 1% of the generated load.
Even more complex thresholds can be defined with `thresholdFromExpression`. This field accepts a mathematical expression with two variables, `L` and `R`, for the load and resources, respectively. The previous example with a threshold of 1% of the generated load can thus also be defined with `thresholdFromExpression: 0.01*L`. For further details on allowed expressions, see the documentation of the underlying exp4j library.
In case you need to evaluate monitoring data in a more flexible fashion, you can also change the value of `externalSloUrl` to your custom SLO checker. Have a look at the source code of the generic SLO checker to get started.
## Kafka Configuration

Theodolite can automatically create and remove Kafka topics for each SLO experiment if a `kafkaConfig` is set. `bootstrapServer` needs to point to your Kafka cluster, and `topics` configures the list of Kafka topics to be created and removed. For each topic, you configure its name, the number of partitions and the replication factor. With the `removeOnly: True` property, you can also instruct Theodolite to only remove topics and not create them. This is useful when benchmarking SUTs that create topics on their own (e.g., Kafka Streams and Samza applications). For those topics, wildcards are also allowed in the topic name and, of course, no partition count or replication factor may be provided.
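For instance, a remove-only entry for the internal topics of a Kafka Streams application could look like this (the wildcard name is taken from the benchmark example above):

```yaml
topics:
  - name: "theodolite-.*" # wildcard name; matching topics are only removed
    removeOnly: True
```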