Privacy Dynamics Self-hosted

Workload Auto-scaling

Background

Application workloads for anonymizing data can vary significantly based on the size of the source data and the anonymization configuration. These workloads run on an internal system that breaks them up into discrete tasks and runs each task on a worker. Processing large datasets, or datasets with advanced Quasi-Identifier (QID) treatment, requires a large amount of memory to complete, which means Kubernetes nodes must be provisioned with the appropriate amount of memory.

In a default installation, all of the application workloads run on a pair of nodes that run 24x7; however, it is possible to configure a second tier of nodes dedicated only to the larger tasks. If enabled, this auto-scaling second tier can be scaled to zero when there are no large tasks to complete.

The task server

A task server is built into the software; it provisions and monitors data anonymization tasks. Each time a project is run, the task server takes over and handles data processing for each dataset within the project. Each step of the anonymization process is handled by special-purpose tasks. There are several different types of tasks within the system, but they can roughly be broken down into data load, data treatment, and data write.

When a project run starts, the application breaks the workload into tasks, which are initially categorized as "big" or "small" based on the type of task. If a task has been run before, the application also takes into account how much memory it used previously. For example, batched reading and writing of data is categorized as "small," while treating columns in a table with millions of rows is categorized as "big."

There is a big task queue and a small task queue. After the application categorizes the tasks, it places them on the appropriate queue. Big workers pull tasks from the big queue and small workers pull tasks from the small queue.

Big tasks run on a dedicated worker pod, whereas small tasks run on shared worker pods that can run a few tasks concurrently.

Enabling auto-scaling

In a default installation, all of the pods, including the big worker pod, remain active regardless of whether any tasks are queued or running. This means the Kubernetes nodes will be running even when they are underutilized. With the auto-scaling feature enabled, big worker pods are terminated when their jobs complete. When paired with a node-scaling solution such as Karpenter, the nodes those pods run on can also be terminated, reducing node costs.
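If you use Karpenter as the node-scaling solution, a dedicated node pool for the big workers might look roughly like the sketch below. This is a minimal, hypothetical example: the pool name, resource limits, architecture requirement, and EC2NodeClass reference are placeholders, and the field names follow Karpenter's v1 NodePool API (older versions differ slightly). The "big-worker-only" taint it applies is explained in the list that follows.

```yaml
# Hypothetical Karpenter NodePool for the big-worker tier.
# Names, limits, and the nodeClassRef are placeholders.
apiVersion: karpenter.sh/v1
kind: NodePool
metadata:
  name: big-workers
spec:
  template:
    spec:
      # Reserve these nodes for the big worker pods (taint described below).
      taints:
        - key: big-worker-only
          effect: NoSchedule
      requirements:
        - key: kubernetes.io/arch
          operator: In
          values: ["amd64"]
      nodeClassRef:
        group: karpenter.k8s.aws
        kind: EC2NodeClass
        name: default
  # Upper bound on what Karpenter may provision for this pool; there is no
  # minimum, so the pool scales to zero when it has no big tasks to run.
  limits:
    cpu: "64"
    memory: 256Gi
  disruption:
    # Remove nodes shortly after their big worker pods finish.
    consolidationPolicy: WhenEmpty
    consolidateAfter: 1m
```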

To enable this scale-to-zero setup, create two tiers of nodes:

  • A smaller pool of nodes that runs all the time.
    • This pool will run the user interface, the backend API, the small workers, the queue of work, and the observability components (including the pod scaler).
  • An auto-scaling pool of larger nodes.
    • This pool can scale to zero.
    • The pool should be dedicated to the big workers. The big worker pods tolerate a NoSchedule taint called "big-worker-only"; the other pods do not, so applying this taint to the nodes in the pool dedicates it to the big worker pods (see the example after this list).
    • Since it is a dedicated pool, its nodes should have slightly more CPU/memory than you have configured the workers to use in the Privacy Dynamics configuration. The extra capacity accounts for node overhead.
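For reference, the taint and toleration pair behind this setup looks roughly like the following. The taint is the piece you configure on the nodes in the big-worker pool (for example in a managed node group or in a Karpenter NodePool template, as sketched above); the toleration is shown only to illustrate what the big worker pods already carry, since the application configures that side itself. Treat the key name in your installation's configuration as authoritative.

```yaml
# Taint applied to every node in the big-worker pool (administrator-configured):
taints:
  - key: big-worker-only
    effect: NoSchedule
---
# Matching toleration carried by the big worker pods (illustrative only;
# set by the application, not by the administrator):
tolerations:
  - key: big-worker-only
    operator: Exists
    effect: NoSchedule
```

When sizing this pool, choose an instance type whose allocatable CPU and memory sit slightly above the big worker's configured resources, so the kubelet and other node overhead still fit alongside the worker pod.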