Analyzing SimKube 1.0: How well does it work?
Ok, this post is the last part of my 3-part series on SimKube 1.0. Over the last couple of weeks, we walked through some background about SimKube, the Kubernetes Cluster Autoscaler (KCA) and Karpenter, and then performed an in-depth analysis of the two autoscalers based on simulation data. Today, we’re going to take a look at SimKube itself: its performance, some lessons that I learned from this set of experiments, and next steps in the Kubernetes simulation world. If you want to go back and read the previous posts from this series first (recommended), you can find them here:
- Part 1 - Background and Motivation
- Part 2 - Running the Simulations
- Part 3 (this post) - Analysis and Follow-up
As before, I’ll note that all of the raw data for these experiments is publicly available for download here if you’re interested in trying it out on your own!
Who watches the Simulator?
One of the selling points of SimKube is that you can (supposedly) run simulations of multi-thousand-node Kubernetes clusters on your laptop. But all of the experiments from last week ran on an AWS c6i.8xlarge instance with 32 vCPUs and 64GB of RAM! I dunno about you, but my laptop doesn’t have 32 vCPUs and 64GB of RAM, so what gives?
As I hinted at briefly last week, it all comes down to metrics. SimKube itself actually ran fine, even with my largest simulations. I’m not going to include the graphs here because they’re not that interesting, but both the simulation controller and the simulation driver pods used a tiny fraction of the host’s available CPU and about 20MB of memory. The KWOK controller1 similarly used around half a core and 60-100MB of RAM; we had a maximum of about 100 nodes in the Karpenter experiment, and more like 1000 nodes in the CA experiment2. If we assume that this resource consumption scales linearly3, then you’re looking at maybe half a gig of memory for a 5000-node cluster, which is the largest size supported by “official” Kubernetes. (Again, I don’t think these graphs are particularly interesting, but you can see them in the Jupyter notebooks for the experiments.)
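To make that back-of-the-envelope number concrete, here’s a quick sketch of the linear extrapolation. The baseline figures (roughly 100MB of RAM at around 1000 nodes, from the CA experiment) are rough readings, not precise measurements, so treat the output as an order-of-magnitude estimate:

```python
# Rough linear extrapolation of KWOK controller memory usage.
# Baseline: ~100MB of RAM at ~1000 nodes (approximate figures from the
# CA experiment; these are eyeballed from graphs, not exact constants).
BASELINE_NODES = 1000
BASELINE_MEM_MB = 100


def extrapolate_memory_mb(nodes: int) -> float:
    """Estimate memory usage, assuming it scales linearly with node count."""
    return BASELINE_MEM_MB * nodes / BASELINE_NODES


# 5000 nodes is the largest "officially" supported Kubernetes cluster size.
print(extrapolate_memory_mb(5000))  # 500.0 -> about half a gig
```

Of course, linear scaling is itself an assumption; a real capacity plan would want measurements at a few cluster sizes to check it.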
Just for funsies, we can also take a look at the resource utilization of the Kubernetes control plane (apiserver, controller-manager, and scheduler). We’re not taxing Kubernetes itself particularly hard with this experiment, but this can give you an idea of the control plane’s resource utilization for a middling-sized cluster: