Avoiding a Faux-PAW: Ditching my Beloved Jump Host

by Jonathan Lent

Background

The Information Security Office (ISO) has created a set of Minimum Security Standards (MinSec), broken down with a matrix of applicability depending on the risk classification of a given system (low, medium, or high) [1]. Of these standards for high-risk systems, one requires the use of a dedicated admin workstation for administration (also known as a Privileged Access Workstation (PAW), or Personal Bastion Host (PBH)) [2]. Unlike many of the MinSec requirements, this standard doesn’t hinge so much on a technical implementation detail; rather, it requires a simple set of technical changes (firewall rules), along with a drastic change in daily workflows for folks like me.

Before I go into why shared jump hosts can be a source of risk, it’s important to keep an open mind and reiterate why these setups can be useful. To be clear, by “jump host”, I mean a hardened server that is used to gain access to other resources. Without going into any implementation-specific details, jump hosts are often on the short list of machines (or the sole source) that can connect to various ports on endpoints. Ideally, the machine would be locked down to restrict authentication: not only a carefully scrutinized list of users, but also additional tweaks requiring specific protocols for access (e.g. Kerberos, rather than password authentication). This is indeed more secure than some of the alternatives (e.g. world or domain-wide access to individual endpoints). With proper planning, these hosts can be hardened to eliminate the potential for many of the issues outlined below. However, the weakest link in computer security often lies with the human element, and that’s one aspect of the shared model that cannot be ignored.

Stubborn person that I often am, I always found that this was the one MinSec requirement that stood out to me when I reviewed the standards. For better or worse, I am a creature of habit, and someone who has really settled into the use of a jump host. Defending my stubborn workflows for a moment, I want to emphasize that I tried to avoid ever having a shell session on the jump host; in my workflows, I set up tunnels (through the use of SSH ProxyCommand, SOCKS5 proxies, proxychains, and the like). Thus, I pass through it, rather than land on it and do any of my work there. That being said, it’s really hard to shake habits that are heavily ingrained. In helping myself prepare for these changes, I wanted to get a good answer to the question, “Why are jump hosts so bad?” In full disclosure, a keen observer may notice that I make a lot of assumptions with some of the points I make in this article (e.g. I’m assuming that a jump host in question allows interactive shell sessions for its legitimate users). My goal in writing this article is to share my experience on this front, including some Stanford-specific scenarios, in hopes of convincing myself and others that outsourcing the security concerns to a hardened personal privileged access workstation and dedicated VPN, managed by the Information Security Office, is the right way to go.
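
For readers unfamiliar with the pass-through pattern described above, here is a minimal sketch of how it might be configured with OpenSSH. It is an illustration, not my actual configuration: the host names and the function name write_jump_config are made up, and the SOCKS port (1080) is an arbitrary choice. ProxyCommand with “ssh -W” relays the connection through the jump host without starting a shell there, and DynamicForward opens a local SOCKS proxy that tools such as proxychains can use.

```shell
#!/bin/sh
# Sketch: emit an ssh_config fragment that passes THROUGH a jump host
# instead of opening a shell on it. Host names are hypothetical.
write_jump_config() {
    jump_host=$1          # e.g. jump.example.edu (assumed name)
    endpoint_pattern=$2   # e.g. 'endpoint*.example.edu' (assumed pattern)
    cat <<EOF
# Endpoints: relay the TCP stream via the jump host (no shell is started there).
Host $endpoint_pattern
    ProxyCommand ssh -W %h:%p $jump_host

# Jump host: optionally open a SOCKS proxy on localhost:1080 when connected.
Host $jump_host
    DynamicForward 1080
EOF
}

write_jump_config jump.example.edu 'endpoint*.example.edu'
```

Saved to a file, the fragment could be used as “ssh -F that_file endpoint1.example.edu”. Note that this only changes where the shell lands; the jump host still carries the traffic, so the concerns discussed below still apply.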

Security Considerations with Shared Jump Hosts

As the existence of the MinSec standard and the title of this article suggest, however, there are significant security considerations with a setup like this. There have been many peer-reviewed studies published on this topic, as well as many back-and-forth blog articles on the open internet; here are the general drawbacks that stand out to me:

  • Significant Target – these hosts are convenient for legitimate users, but an absolute goldmine for attackers. If a would-be attacker is able to monitor traffic to your endpoints, it’s going to be fairly easy for them to identify a single jump host or a set of them. This drawback is substantially more dangerous if connections to the jump host do not require a VPN connection first (i.e. the host is open to the world). I’d like to believe that all jump hosts are protected in this way, but even at Stanford I know of at least two that are not.
  • One-access policy – endpoints typically allow access from jump hosts per-port, with little additional security aside from allowed users and authentication methods. That is, if Jane Doe can reach port 22 on endpoint 1, so can John Smith if coming from the same source. If other ports are open, and exploits are discovered in the wild, it would be possible to circumvent endpoint protections and use the jump host for nefarious purposes. This gets especially muddy when jump hosts are shared between departments and organizations (and yes, this happens in multiple instances here at Stanford).
  • Wide-area Network Access – in a scenario where the endpoints have no wide-area network access of their own, it may be necessary or convenient to allow wide-area network (internet) access from the jump host, so that external materials can be gathered and securely copied to the endpoint. However, this access on the jump host also makes it more convenient and/or possible for an attacker or ill-advised user to pull malware or other dangerous content directly onto the privileged jump host. Even in the absence of this access, it’d still likely be possible to get the material to the jump host and then to the destination. However, the convenience of WAN access on the jump host makes this a lot easier.
  • Single Choke Point – if a single jump host is the only one allowed to make port 22 (SSH) connections to endpoints, what happens if the jump host becomes overloaded or otherwise goes down? Sure, you could use an out-of-band management connection or physically visit a datacenter, but that’s not practical for situations requiring reconfigurations to many servers. Similarly, what happens if a user on a jump host accidentally gets the jump host blacklisted (e.g. in /etc/hosts.deny) on the endpoint due to malformed or otherwise repeatedly invalid authentication attempts? The easy answer, “add more jump hosts”, increases your attack surface. Many advantages that a keen systems administrator may cite in defending the use of a single jump host become moot or invalid when another is introduced.

Again, those are the ones that stick out to me, but I’ll leave discovery of the other general disadvantages as an exercise for the reader. At this point, it’s important to point out the preconditions required for many of these disadvantages to matter in practice. Let’s assume that a jump host in question has no VPN connection requirement, but does require two-factor authentication for human users. In this scenario, exploitation of the above disadvantages will require an insider threat, social engineering, remote privilege escalation, or local privilege escalation (as a result of a successful insider threat, social engineering, or simple password guessing).

Anatomy of an Attack

With the disadvantages of jump hosts discussed in some detail, I’d like to provide at least one concrete example that is especially relevant within our University. For this first scenario, let’s assume that a nefarious user has gained root access to a system. Gaining this access may be the result of one of the following:

  • Gained physical access to the datacenter housing the jump host (or access to the virtual infrastructure housing the jump host, allowing console control), and has either guessed the root password or used a password provided through a compromised employee
  • Gained access to the system using remote privilege escalation (e.g. by way of a kernel bug). For the sake of example, let’s assume that two-factor authentication was not required for root logins and the attacker otherwise had a clear network route
  • Gained access to a local user on the system (or maybe was the user themselves), and exploited a method of local privilege escalation (e.g., by way of a kernel bug)

With root access, what is the attacker to do next? The obvious answer is to compromise the system. They may install keyloggers, backdoors, replace system files and binaries with malware, etc. The sky’s the limit, really. Configuration management may come along and fix some things, and detection technologies may report the incident after the fact, but there’s a good chance that something the attacker does will persist even in the presence of other safeguards.

But that’s “too easy”. Let’s assume that Kerberos authentication is required for connections to the endpoints, and the same technique used to gain root on the jump host cannot be used on the endpoints. Also, let’s assume that the actual jump host users rely upon a home directory in AFS. The attacker could simply “su” to the user, though they would not get an AFS token (so, just an unprivileged account on a system). So what if they wanted to inflict harm or otherwise jeopardize a user’s account…? They might know that the Stanford Information Security Office recommends in their MinSec documents that “Logins with SUNet credentials via Kerberos [are] recommended.” [1]. If you simply “su” as a user, you won’t get a Kerberos ticket, since you did not go through the proper channels. Thus, as an attacker, I’d first look for valid existing Kerberos caches on the system to take over. While an end user may define their cache location to be anywhere on the system, the default for the MIT Kerberos client is for them to go into /tmp/krb5cc_[uid] [3]. Here’s an example of what an attacker might see:

jumpHost:/root# ls /tmp | grep krb5cc
krb5cc_123456_eHkc0TnRUc
krb5cc_234567_b6M1nS3c4U
krb5cc_1202
krb5cc_root_johnsmith
krb5cc_root_janedoe

In this listing, there are five Kerberos ticket caches. Depending on the intent of the attacker, a single valid cache might be enough to achieve their goals. The attacker can try to pick up and use a cache in one of two ways: either by running “kswitch -c /tmp/krb5cc_some_UID”, or by setting the KRB5CCNAME environment variable in the user’s session:

jumpHost:/root# su -l johnsmith

johnsmith@jumpHost:~$ klist
klist: No credentials cache found (ticket cache FILE:/tmp/krb5cc_123456)

johnsmith@jumpHost:~$ export KRB5CCNAME=/tmp/krb5cc_123456_eHkc0TnRUc

johnsmith@jumpHost:~$ klist
Ticket cache: FILE:/tmp/krb5cc_123456_eHkc0TnRUc
Default principal: johnsmith@stanford.edu

Valid starting     Expires            Service principal
04/13/17 11:09:19  04/14/17 12:09:16  krbtgt/stanford.edu@stanford.edu
renew until 04/20/17 09:04:16
04/13/17 11:12:45  04/14/17 12:09:16  afs/ir.stanford.edu@stanford.edu
renew until 04/20/17 09:04:16

If that was our attacker’s first attempt, they got really lucky: they’ve gotten hold of a live Kerberos ticket cache that’s fresh and primed for use. They didn’t have to crawl the file system searching in non-standard locations, either, though that would not be especially complicated as a superuser (especially if locate/updatedb are up-to-date). As an added bonus, since they did this after assuming the UID of the user, they also have an AFS token ready to go (or can simply run ‘aklog’ and get a new one). Here is what they might do in AFS:

johnsmith@jumpHost:~$ pwd
/afs/ir/users/j/o/johnsmith

johnsmith@jumpHost:~$ fs getcalleraccess
Callers access to . is rlidwka

johnsmith@jumpHost:~$ pts membership johnsmith
Groups johnsmith (id: 123456) is a member of:
  workgroup:workgroup1
  workgroup:workgroup2
  website1-admins
  website2-admins
  department1-admins
  department2-admins

johnsmith@jumpHost:~$ pts membership department1-admins
Members of department1-admins (id: -876543) are:
  janedoe
  johnsmith
  VIP-1
  VIP-2

On that front, the attacker can now cause some environment-specific damage. They can make changes to the user’s home directory (e.g. shell scripts, configuration files), and then change POSIX attributes after the fact to mask the modification times. From there, they may also have enough context to crawl through the AFS cell and cause some broader damage. In the case of the PTS group “website1-admins”, maybe they infer that this user has administrative access to web space somewhere, and can find it and add malicious content. This AFS example was provided only because the attacker had the opportunity. This is not a risk isolated to jump hosts; any generic resource with AFS available could afford the same opportunity if compromised in this manner. I just wanted to point it out.
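
The timestamp masking mentioned above is easy to do, but it is not always invisible. The sketch below is my own illustration rather than any Stanford tooling. It assumes GNU coreutils stat and a local POSIX filesystem, where backdating a file’s modification time (mtime) still updates its inode change time (ctime); AFS timestamp semantics may differ and are not covered here. The function name ctime_mtime_gap is hypothetical.

```shell
#!/bin/sh
# Sketch: report how many seconds a file's ctime is ahead of its mtime.
# Assumes GNU stat (-c %Y = mtime, %Z = ctime, both in epoch seconds).
ctime_mtime_gap() {
    mtime=$(stat -c %Y "$1") || return 1
    ctime=$(stat -c %Z "$1") || return 1
    echo $((ctime - mtime))
}

# Demonstration on a scratch file: backdate the mtime, then measure the gap.
scratch=$(mktemp)
touch -t 201704131109.19 "$scratch"   # sets atime and mtime to 2017-04-13 11:09:19
echo "gap in seconds: $(ctime_mtime_gap "$scratch")"
rm -f "$scratch"
```

A large gap is only a hint, not proof of tampering: ordinary operations such as chmod or a rename also update ctime without touching mtime.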

Keep Calm and Grab the World by the PAW

However, back to the heart of the example, and the most important part: the attacker now has an active Kerberos ticket, for a user that is otherwise authorized to be on the jump host. A quick look through the user’s history might reveal where they’ve logged in using this ticket. The ‘klist’ output itself may show service tickets for specific hosts as well. The attacker would then try to remotely authenticate to the endpoints using this Kerberos identity, and will be successful wherever the legitimate user has access. If two-factor is not required or can be circumvented (e.g. the attacker has control of the user’s phone), and root access is granted to the user via sudo, the attacker may have just gotten exactly what they need.
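
Because this whole scenario hinges on ticket caches sitting in plain files, it can be worth knowing how many are lying around on a shared host. The following is a minimal sketch of such an inventory, not an ISO-provided tool. It assumes GNU find (for -printf), and the function name list_ccaches is made up for illustration.

```shell
#!/bin/sh
# Sketch: inventory file-based Kerberos credential caches in a directory.
# Prints one line per cache: owner<TAB>octal mode<TAB>path.
# Assumes GNU find; the krb5cc_* pattern matches the MIT default naming.
list_ccaches() {
    dir=${1:-/tmp}
    find "$dir" -maxdepth 1 -type f -name 'krb5cc_*' \
        -printf '%u\t%m\t%p\n' 2>/dev/null | sort
}

list_ccaches /tmp
```

Caches kept in the non-file cache types that MIT Kerberos supports (such as KEYRING or KCM) will not appear in this listing, and an attacker with root can read file-based caches regardless of their permissions, so this is an awareness check rather than a defense.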

Throughout that example, I’ve been dancing around many assumptions. For one, two-factor is implemented on many hosts at Stanford, and it was configured on many more after the MinSec standards were published. Well, what if the attacker does not have your SUNet password, but has one of your devices (which you have not yet noticed is missing)? Obviously, if an attacker has you at their disposal, only existing high-level blocks will stand in their way.

The ISO-developed Privileged Access Workstations are not perfect, but they address many of the risks associated with jump hosts. These devices are provisioned on carefully selected hardware, complete with hardware and software security safeguards including automatic patching and granular blacklisting configurations and abilities. The additional use of a restricted, segmented VPN eliminates the need for a jump host once existing firewall rules are reconfigured. That being said, as a sysadmin I’m at the mercy of my PAW working at any given moment, as well as the PAW-specific VPN being highly available. I’m optimistic that both of these will be true in the months and years to come. While the transition to using PAWs for connecting to hosts will take a bit to get used to, I’m optimistic that it will be livable, and that it will make me feel a bit more secure about how I administer systems moving forward.

References and Further Reading