Safeguarding Kubernetes with Defense in Depth

These days, security is top of mind for everyone involved in IT. It seems increasingly difficult to string together more than a few days without a major incident. If it’s not a massive data breach, it’s a major flaw in popular software. A couple of weeks ago, the Marriott breach was revealed, and just last week we witnessed the first major security vulnerability tied to Kubernetes.

It’s not just that attackers are becoming more sophisticated. A developer-centric approach to IT, while empowering, changes the attack surface and unintentionally exposes threat vectors through the very tools relied on to produce applications.

Security must be multi-layered

The days when perimeter security was sufficient are well behind us. While securing the perimeter is still necessary, today’s security landscape requires multiple layers of detection and remediation. This approach is commonly referred to as defense in depth.

In many ways, cybersecurity can and should be approached in a manner that is similar to that of a modern-day hotel security system. The hotel might be surrounded by a wall and the entrance monitored with a gate and security guards, but the security does not end there. As guests check in, their identities are verified through both an ID and a credit card before the hotel issues a room key. Inside the hotel, hallways are monitored with cameras and staff. And once inside the room, guests can utilize the safe with a personal passcode to keep their belongings secure.

This defense in depth approach to security means that would-be bad actors are not in the clear once they have breached the outer perimeter. Instead, they have to continue to stay undetected while trying to gain further access.

This example of the multi-layer security measures offered by hotels can be compared to enterprise IT security. Firewalls, microsegmentation, application inspection, threat intelligence and automated remediation all provide layers of security for applications and data.

Defense in depth protection to isolate chaos

If you have not seen the latest Kubernetes issue, it is tracked as CVE-2018-1002105.

The Kubernetes privilege escalation flaw allows would-be attackers to get full administrator access on any compute node within a cluster. Even worse, this can all be done from a Kubernetes API server, which means that the attack originates from endpoints that are authenticated (and hence trusted) with Kubernetes’ TLS credentials. This renders authentication-based security approaches ineffective. Also, the attack won’t generate easy-to-identify logs, making detection difficult and remediation challenging.

The community has responded quickly with patches to Kubernetes. But to patch the “hole”, there must be an upgrade to the orchestration software. The challenge is that it is not always easy to gain access to and apply patches to production systems:

For enterprises that rely on a particular month or season for a disproportionate share of their retail business, taking production systems down can be a daunting task. This is why so many companies impose change freezes from November through the new year.
For teams working with upstream versions of the code, the burden is even greater.
Upgrading to recent versions of Kubernetes means additional requirements with regard to TLS and namespace controls. These, in turn, have an impact on development and deployment cycles.

In the best of scenarios, systems are vulnerable until they are patched. In the worst cases, that window is long enough to compromise the cluster and, effectively, the enterprise.

Moving beyond authentication-based security

The entire premise of defense in depth is that if one measure is ineffective, there are others to backstop the enterprise. In this case, if auth-based controls within a Kubernetes cluster are compromised, it is useful to leverage host-based firewalling. While the Kubernetes/OpenShift administrators finalize their remediation strategy, the security administration teams need another way to respond.

Our solution to this vulnerability relies on tagging the API Server with the version it is currently running, then creating policies based on that tag to tear down TCP connections from endpoints to which the server has sent back an error response.
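The version-to-tag step can be sketched in a few lines of Python. This is an illustration rather than Contrail's implementation: the tag names are hypothetical, and the version table assumes the upstream fixes shipped in v1.10.11, v1.11.5, v1.12.3 and v1.13.0, with no fix for older minor lines. Vendor builds that backport patches under their own version numbers would need their own table.

```python
import re

# Assumed first patched release per upstream minor line for this flaw.
# Minor lines older than 1.10 are treated as always affected.
FIXED_PATCH = {(1, 10): 11, (1, 11): 5, (1, 12): 3}


def parse_version(version):
    """Turn 'v1.11.4' into (1, 11, 4); raise ValueError on anything else."""
    match = re.match(r"^v?(\d+)\.(\d+)\.(\d+)", version)
    if match is None:
        raise ValueError(f"unrecognized version: {version!r}")
    return tuple(int(part) for part in match.groups())


def is_affected(version):
    """Return True when the reported version predates the fix."""
    major, minor, patch = parse_version(version)
    if (major, minor) in FIXED_PATCH:
        return patch < FIXED_PATCH[(major, minor)]
    # 1.13.0 and later are assumed fixed; earlier minor lines are not.
    return (major, minor) < (1, 10)


def tag_for(version):
    """Map an API Server version to a (hypothetical) policy tag."""
    return "k8s-api=vulnerable" if is_affected(version) else "k8s-api=patched"
```

Once a cluster is upgraded, recomputing the tag moves the API Server out of scope of the blocking policy without anyone editing the policy itself.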

Juniper® Contrail® Security brings together distributed L4 enforcement via Contrail’s vRouter node (which performs forwarding, enforcement and reporting) and distributed L7 enforcement via the Juniper Networks® cSRX Container Firewall.

First Response

The solution can be deployed in a staged manner. The first response can be implemented immediately after the vulnerability is disclosed, giving the Kubernetes administrators time to strategize the most effective method of patching and upgrading the now-vulnerable environment. The first response blocks all remote invocations against clusters that are running the affected versions and have not yet been patched (these are identified by the appropriate tag) until a finer-grained blocking approach is implemented.
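As a rough sketch of how such a coarse, tag-driven rule behaves, the Python below evaluates a single flow. Everything in it is assumed for illustration: the `Flow` type, the tag name, the port number, and the idea that something upstream has already classified the client as remote. It is not Contrail policy syntax.

```python
from dataclasses import dataclass

VULNERABLE_TAG = "k8s-api=vulnerable"  # hypothetical tag name
API_SERVER_PORT = 6443                 # assumed API Server port


@dataclass(frozen=True)
class Flow:
    src_is_remote: bool   # classification of the client is assumed to exist
    dst_tags: frozenset   # tags attached to the destination workload
    dst_port: int


def first_response_verdict(flow):
    """Deny remote calls to a tagged, unpatched API Server; allow the rest."""
    targets_api = flow.dst_port == API_SERVER_PORT
    unpatched = VULNERABLE_TAG in flow.dst_tags
    if flow.src_is_remote and targets_api and unpatched:
        return "deny"
    return "allow"
```

The rule is deliberately blunt: it trades availability of remote access for immediate containment, which is why it is only meant to hold until the finer-grained inspection is in place.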

Developer <-- https --> K8s API Server <-- https --> K8s Node

This first response approach is captured in this demo video:

Deeper Defense

With the first response defense in place protecting the Kubernetes clusters, a finer-grained and more robust secondary defense mechanism must be deployed. This secondary mechanism leverages the cSRX’s Intrusion Detection and Prevention (IDP) policy to inspect HTTP headers and status codes, detecting a request for the connection to be upgraded to a WebSocket and a corresponding error response from the backend. It tears down exactly those problematic connections where the upgrade request is met with an error response.

Developer <-- https --> cSRX NGFW <-- https --> K8s API Server <-- https --> K8s Node

This deeper defense approach is captured in this demo video:

Contrail Security enables the expression of security intent using tags that identify attributes of the workload to be protected. In this instance, based on the version of Kubernetes and whether the patch has been applied or not, appropriate tags are attached to the API Server.

A tag-based policy dictates that traffic from clients to affected Kubernetes clusters be directed via the distributed containerized next-gen firewall, the cSRX, so the traffic may be inspected for the HTTP/1.1 payloads to detect the WebSocket protocol handshake.

The cSRX is programmed to look for the problematic WebSocket handshake via the presence of “Connection: Upgrade” and “Upgrade: WebSocket” within HTTP requests directed at the API Server. The cSRX is also configured to check the HTTP status code in the response relayed by the API Server back to the client: a successful upgrade returns 101 (Switching Protocols), so an upgrade request that is answered with an error status instead identifies a connection to tear down.
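The inspection logic can be sketched in Python as follows. This is not the cSRX IDP signature itself, only the decision it encodes, and the function names are made up for the example. It also assumes that any status other than 101 counts as the error case, and that the inspecting device can see the request headers and the response status in the clear.

```python
def header_tokens(headers, name):
    """Return the lower-cased, comma-separated tokens of one header."""
    for key, value in headers.items():
        if key.lower() == name:
            return {token.strip().lower() for token in value.split(",")}
    return set()


def is_websocket_upgrade(request_headers):
    """True when the request asks to upgrade the connection to WebSocket."""
    return (
        "upgrade" in header_tokens(request_headers, "connection")
        and "websocket" in header_tokens(request_headers, "upgrade")
    )


def should_tear_down(request_headers, response_status):
    """True when an upgrade request was answered with anything but 101."""
    return is_websocket_upgrade(request_headers) and response_status != 101
```

For example, `should_tear_down({"Connection": "Upgrade", "Upgrade": "WebSocket"}, 403)` is true, while the same request answered with 101 is left alone. Header names and values are compared case-insensitively because HTTP treats them that way.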

Separation of responsibilities

Security is a shared responsibility. While being able to secure an application is critical, a defense in depth approach also creates a separation of responsibilities. By leveraging host-based firewalling, enterprises can leave the Kubernetes administration teams to determine the best path for rolling out patches across their environment, alleviating the pressure.

Reducing the scope of work allows teams to stay focused on a narrow set of activities, which is particularly important during fast-moving stretches immediately following a vulnerability disclosure.

Ultimately, security is going to be an “and” proposition. There must be multiple means of detecting and remediating threats, starting at the perimeter and extending all the way into the applications themselves. The network plays a central role, providing a natural means of identifying threats and containing issues. In this case, Contrail Enterprise Multicloud becomes a powerful backstop for a particularly problematic issue.
