Sizing guide for your cluster

Introduction

When deploying the Kubescape/ARMO Platform in Kubernetes environments, it's important to tailor the configuration to your cluster's specific needs and scale to ensure optimal resource utilization. The default settings provided in the Helm chart are designed for smaller-scale deployments, typically supporting up to 20-40 worker nodes on average cloud Virtual Machines equipped with 4-8 vCPUs. However, as your Kubernetes cluster's size and complexity grow, you will need to adjust the configuration so that Kubescape has enough resources to operate in the cluster.

Kubescape components scale along different dimensions of the cluster, influenced by both its size and the activity within it:

  • Node-agent: This component's resource usage is directly influenced by the size and specifications of the individual nodes; nodes with more CPU and memory increase the resources the node-agent requires.
  • Storage component: The demands on the storage component scale with the number of resources within the cluster, such as Pods, Nodes, and Services; its needs grow as the cluster expands.
  • KubeVuln: This component is sensitive to the size of the largest container images used within the cluster; larger images increase KubeVuln's memory requirements.

This guide shares information on scaling the Kubescape/ARMO Platform across various cluster sizes and configurations. We aim to provide clear guidelines on adjusting your deployment parameters to align with your specific operational requirements and the unique characteristics of your Kubernetes environment.

That said, the optimal sizing of these components depends on more parameters than are mentioned here. We kept this guide simple so that it is easy to adopt at the beginning of a deployment. We suggest monitoring these components over time and making adjustments as needed.

Node-agent

The node-agent consumes on average ~1-5% of a Kubernetes node's CPU capacity, and it should not exceed ~10% even in extreme cases.

Therefore, the suggested CPU allocations are:

  • Request: 2.5% of the node's total CPU
  • Limit: 10% of the node's total CPU

For example, a machine with 4 vCPUs (4000m of CPU capacity) should have a 100m request and a 400m limit.

From a memory perspective, around 2.5% of the node's memory should be allocated by default as the request, and 10% should be the limit.

Therefore, the suggested memory allocations are:

  • Request: 2.5% of the node's memory
  • Limit: 10% of the node's memory

For example, if a node has 8 GB of RAM, then at least 200 MB should be allocated as the request and 800 MB as the limit.
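
Putting the CPU and memory suggestions together for such a node (4 vCPUs, 8 GB RAM), a Helm values override could look like the sketch below. The nodeAgent.resources key path reflects the kubescape-operator chart and is an assumption to verify against the values.yaml of your chart version.

nodeAgent:
  resources:
    requests:
      cpu: 100m       # 2.5% of 4000m
      memory: 200Mi   # ~2.5% of 8 GB
    limits:
      cpu: 400m       # 10% of 4000m
      memory: 800Mi   # ~10% of 8 GB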

These numbers assume that the runtimeDetection feature is turned on and nodeSbomGeneration is turned off.

  • When runtimeDetection is off, memory and CPU consumption is generally at least 25% lower.
  • When nodeSbomGeneration is on, the node-agent uses more memory for short periods during image scans; therefore, we suggest leaving the requests in place but raising the memory limit by 200 MB, as sketched below.
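
For instance, with nodeSbomGeneration enabled on the same 4 vCPU / 8 GB node, only the memory limit changes. The capabilities flag names and the nodeAgent.resources key path follow the kubescape-operator chart and should be checked against your chart version; the numbers are simply the suggestions above.

capabilities:
  runtimeDetection: enable
  nodeSbomGeneration: enable   # node-agent builds SBOMs during image scans
nodeAgent:
  resources:
    requests:
      memory: 200Mi            # requests stay in place
    limits:
      memory: 1000Mi           # 800Mi limit raised by ~200 MB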

Storage

As explained before, the storage component is mainly affected by the number of resources in the cluster. There is a direct connection between the number of resources, the activity of the workloads, and the memory requirements of this component. It is hard to give exact numbers, but we suggest the following technique as a rule of thumb.

To get the number of resources in the cluster, use this command:

kubectl get all -A --no-headers | wc -l

Then set the memory request to 0.2 x [number of resources] MB and the limit to 0.8 x [number of resources] MB.

For example, for a cluster with 1600 resources, you'd get:

  • A request of 320 MB
  • A limit of 1280 MB
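
Expressed as a Helm values override for that 1600-resource example, this could look like the following sketch; the storage.resources key path is assumed from the kubescape-operator chart, so check it against your chart version.

storage:
  resources:
    requests:
      memory: 320Mi    # 0.2 x 1600 resources
    limits:
      memory: 1280Mi   # 0.8 x 1600 resources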

If the storage driver supports directIO, the memory usage (both requests and limits) can be halved, as shown in the example after the driver list below.

A few examples of storage drivers supporting directIO:

  • Local volumes
  • AWS EBS
  • GCE Persistent Disks
  • Azure Disk
  • OpenEBS LocalPV
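
With one of these drivers backing the storage component, the same 1600-resource example could be halved, for instance:

storage:
  resources:
    requests:
      memory: 160Mi    # 320Mi / 2 with directIO
    limits:
      memory: 640Mi    # 1280Mi / 2 with directIO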

This component is less sensitive to CPU request and limit settings. However, in big clusters (5000 or more resources), the CPU limit should be raised to prevent CPU throttling, which slows down garbage collection and causes memory ballooning.

KubeVuln

This component has two operational modes:

  • It builds SBOMs from container images and scans them for vulnerabilities
  • It only scans existing SBOMs for vulnerabilities (this happens when nodeSbomGeneration is set to enable and the registry scanning feature is not used)

nodeSbomGeneration is a new feature that is still in the early adopter phase and has yet to be proven stable. It is going to be our preferred and default setting in the near future, because when images are scanned on the Kubernetes nodes, memory consumption per scan is about 90% lower.

If nodeSbomGeneration is not enabled and SBOMs are generated by this component, then (and only then) the following applies.

KubeVuln needs to load images into RAM, in addition to the memory required to list all packages (which is directly related to the SBOM size). As a result, memory should be scaled relative to the size of the largest images in the cluster.

If the largest image size is X, then X + 400 MB should be set as the memory limit.
For example, if the largest image is 1 GB, then KubeVuln should have a memory limit of no less than 1.4 GB.

Ephemeral storage must also be adjusted so that all the uncompressed image layers fit in temporary files.
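
As a closing sketch for the 1 GB largest-image example (with nodeSbomGeneration not enabled), a Helm values override for KubeVuln could look like this. The kubevuln.resources key path and the ephemeral-storage figure are illustrative assumptions; size the ephemeral storage to the uncompressed layers of your own largest images.

kubevuln:
  resources:
    limits:
      memory: 1400Mi            # largest image (1 GB) + 400 MB
      ephemeral-storage: 5Gi    # hypothetical figure: must fit all uncompressed layers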