Sizing guide for your cluster

Introduction

When deploying the Kubescape/ARMO Platform in Kubernetes environments, it's important to tailor the configuration to your cluster's specific needs and scale to ensure optimal resource utilization. The default settings provided in the Helm chart are designed for smaller-scale deployments, typically supporting up to 20-40 worker nodes on average cloud Virtual Machines equipped with 4-8 vCPUs. However, as your Kubernetes cluster's size and complexity grow, you will need to adjust the configuration so that Kubescape has enough resources to operate in the cluster.

Kubescape components scale along different dimensions of the cluster, influenced by both its size and the activity within it:

  • Node-agent: This component's resource usage is directly influenced by the size and specifications of the individual nodes; nodes with more CPU and memory increase the resources the node-agent requires.
  • Storage component: The demands on the storage component scale with the number of resources within the cluster, such as Pods, Nodes, and Services; its needs grow as the cluster expands.
  • KubeVuln: This component is sensitive to the size of the largest container images used within the cluster; larger images increase KubeVuln's memory requirements.

This guide shares information on scaling the Kubescape/ARMO Platform across various cluster sizes and configurations. We aim to provide clear guidelines on adjusting your deployment parameters to align with your specific operational requirements and the unique characteristics of your Kubernetes environment.

That said, the optimal sizing of these components depends on more parameters than are mentioned here. We kept this guide simple so that it is easy to adopt at the beginning of a deployment. We suggest monitoring these components over time and making adjustments as needed.

Node-agent

The node-agent consumes on average ~1-5% of a Kubernetes node's CPU capacity, and it should not exceed ~10% even in extreme cases.

Therefore, the suggested CPU allocations are:

  • Request: 2.5% of the node's total CPU
  • Limit: 10% of the node's total CPU

For example, a machine with 4 vCPUs (4000m of CPU capacity) should have a 100m request and a 400m limit.

From a memory perspective, around 2.5% of the node's memory should be allocated by default as the request, and 10% should be the limit.

Therefore, the suggested memory allocations are:

  • Request: 2.5% of the node's memory
  • Limit: 10% of the node's memory

For example, if a node has 8 GB of RAM, then at least 200 MB should be allocated as the request and 800 MB as the limit.
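
Putting the CPU and memory suggestions together for such a node (4 vCPUs, 8 GB RAM), a Helm values override could look like the sketch below. The nodeAgent.resources key path reflects the kubescape-operator chart and is an assumption to verify against the values.yaml of your chart version.

nodeAgent:
  resources:
    requests:
      cpu: 100m       # 2.5% of 4000m
      memory: 200Mi   # ~2.5% of 8 GB
    limits:
      cpu: 400m       # 10% of 4000m
      memory: 800Mi   # ~10% of 8 GB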

These numbers assume that the runtimeDetection feature is turned on and nodeSbomGeneration is turned off.

  • When runtimeDetection is off, memory and CPU consumption is generally at least 25% lower.
  • When nodeSbomGeneration is on, the node-agent uses more memory for short periods during image scans; therefore, we suggest leaving the requests in place but raising the memory limit by 200 MB, as sketched below.
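
For instance, with nodeSbomGeneration enabled on the same 4 vCPU / 8 GB node, only the memory limit changes. The capabilities flag names and the nodeAgent.resources key path follow the kubescape-operator chart and should be checked against your chart version; the numbers are simply the suggestions above.

capabilities:
  runtimeDetection: enable
  nodeSbomGeneration: enable   # node-agent builds SBOMs during image scans
nodeAgent:
  resources:
    requests:
      memory: 200Mi            # requests stay in place
    limits:
      memory: 1000Mi           # 800Mi limit raised by ~200 MB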

Storage

As explained before, the storage component is mainly affected by the number of resources in the cluster. There is a direct connection between the number of resources, the activity of the workloads, and the memory requirements of this component. It is hard to give exact numbers, but we suggest the following technique as a rule of thumb.

To get the number of resources in the cluster, use this command:

kubectl get all -A --no-headers | wc -l

Then set the memory request to 0.2 x [number of resources] MB and the limit to 0.8 x [number of resources] MB.

For example, for a cluster with 1600 resources, you'd get:

  • A request of 320 MB
  • A limit of 1280 MB
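
Expressed as a Helm values override for that 1600-resource example, this could look like the following sketch; the storage.resources key path is assumed from the kubescape-operator chart, so check it against your chart version.

storage:
  resources:
    requests:
      memory: 320Mi    # 0.2 x 1600 resources
    limits:
      memory: 1280Mi   # 0.8 x 1600 resources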

If the storage driver supports directIO, the memory usage (both requests and limits) can be halved, as shown in the example after the driver list below.

A few examples of storage drivers supporting directIO:

  • Local volumes
  • AWS EBS
  • GCE Persistent Disks
  • Azure Disk
  • OpenEBS LocalPV
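
With one of these drivers backing the storage component, the same 1600-resource example could be halved, for instance:

storage:
  resources:
    requests:
      memory: 160Mi    # 320Mi / 2 with directIO
    limits:
      memory: 640Mi    # 1280Mi / 2 with directIO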

This component is less sensitive to CPU request and limit settings. However, in big clusters (5000 or more resources), the CPU limit should be raised to prevent CPU throttling, which slows down garbage collection and causes memory ballooning.

KubeVuln

This component has two operational modes:

  • It builds SBOMs from container images and scans them for vulnerabilities
  • It only scans existing SBOMs for vulnerabilities (this happens when nodeSbomGeneration is set to enable and the registry scanning feature is not used)

nodeSbomGeneration is a new feature that is still in the early adopter phase and has yet to be proven stable. It is going to be our preferred and default setting in the near future, because when images are scanned on the Kubernetes nodes, memory consumption per scan is about 90% lower.

If nodeSbomGeneration is not enabled and SBOMs are generated by this component, then (and only then) the following applies.

KubeVuln needs to load images into RAM, in addition to the memory required to list all packages (which is directly related to the SBOM size). As a result, memory should be scaled relative to the size of the largest images in the cluster.

If the largest image size is X, then X + 400 MB should be set as the memory limit.
For example, if the largest image is 1 GB, then KubeVuln should have a memory limit of no less than 1.4 GB.

Ephemeral storage must also be adjusted so that all the uncompressed image layers fit in temporary files.
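
As a closing sketch for the 1 GB largest-image example (with nodeSbomGeneration not enabled), a Helm values override for KubeVuln could look like this. The kubevuln.resources key path and the ephemeral-storage figure are illustrative assumptions; size the ephemeral storage to the uncompressed layers of your own largest images.

kubevuln:
  resources:
    limits:
      memory: 1400Mi            # largest image (1 GB) + 400 MB
      ephemeral-storage: 5Gi    # hypothetical figure: must fit all uncompressed layers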