HASSH is a technique developed by some clever engineers at Salesforce to fingerprint SSH implementations. If you want to know the technical specifics of how this works, the GitHub repo and their blog post about it do a very good job of explaining the concept. Summing it up briefly, a HASSH is an MD5 hash of the algorithm lists that a client or server advertises during SSH session negotiation.
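To make that concrete, here is a minimal sketch of the hashing step. Going by the HASSH project's description, the key exchange, encryption, MAC, and compression algorithm lists are joined with semicolons and hashed with MD5. The function name and the shortened algorithm lists below are made up for illustration.

```python
import hashlib


def hassh_from_lists(kex, enc, mac, cmp):
    """Join each list with commas, join the four lists with semicolons, then MD5."""
    fields = [",".join(algos) for algos in (kex, enc, mac, cmp)]
    return hashlib.md5(";".join(fields).encode("ascii")).hexdigest()


# Hypothetical, shortened algorithm lists; real implementations advertise more.
example = hassh_from_lists(
    ["curve25519-sha256", "diffie-hellman-group14-sha256"],
    ["chacha20-poly1305@openssh.com", "aes256-ctr"],
    ["hmac-sha2-256"],
    ["none", "zlib@openssh.com"],
)
print(example)
```

Because the order of the algorithms is part of the hashed string, two implementations that support the same algorithms in a different preference order produce different fingerprints.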
Some practical uses for HASSH are:
- In controlled network segments, you should be able to identify all HASSH signatures for the legitimate software in play fairly easily. Unknown HASSHes should be alerted upon and investigated; something atypical is happening in your environment.
- Identifying HASSHes of common hacking tools. You can feasibly generate detections for tools such as Hydra, Metasploit, or Ncrack.
- Network cartography; if you are in a large environment with multiple SSH server implementations in play, you can use HASSH to map out how many Cisco devices you have running SSH, how many are running OpenSSH, and so on.
- Honeypot detection during offensive operations.
- Gaining insight into your users’ software choices.
One day, I wondered if someone had already written an NSE script for Nmap to fingerprint SSH servers. After a quick Google search I discovered that someone had. Looking a bit closer into this NSE script, it did in fact calculate HASSHes accurately, but did not have a fingerprint database. I reached out to the author of this script on Twitter and talked about the project a bit. A few days later, I issued a PR to their project to add a database lookup feature. He suggested that I write a blog post about the process, so here it is.
First, I used masscan to scan the internet for hosts with port 22 open. This is something I do not recommend you do unless you know what you are doing. You can get banned by your service provider for doing this. Many networks will threaten or even take legal action against you for scanning them.
Using the logs from masscan, I wrote a terrible Python script to connect to each of these hosts, identify as an SSH client, and capture the responses from the server. I wrote yet another crude Python script that sniffed for these responses and calculated HASSH values as they came in, correlating them with the banner the server was sending out. I logged these in a flat text file like so:
cb422ee1335b60da9df923c0553c12f8 SSH-2.0 OpenSSH blahblahblah
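My original scripts were cruder than this, but the core of the calculation can be sketched as follows. This assumes you already have the raw SSH_MSG_KEXINIT payload from the server (the packet contents after the length and padding-length fields), laid out as in RFC 4253: a message type byte, a 16-byte cookie, then ten length-prefixed name-lists. The function names are hypothetical, and build_kexinit exists only to fabricate a test payload.

```python
import hashlib
import struct

SSH_MSG_KEXINIT = 20  # message number from RFC 4253


def parse_kexinit(payload):
    """Return the ten name-lists of an SSH_MSG_KEXINIT payload as strings."""
    if not payload or payload[0] != SSH_MSG_KEXINIT:
        raise ValueError("not an SSH_MSG_KEXINIT payload")
    offset = 1 + 16  # skip the message type byte and the 16-byte cookie
    name_lists = []
    for _ in range(10):
        (length,) = struct.unpack_from(">I", payload, offset)
        offset += 4
        name_lists.append(payload[offset:offset + length].decode("ascii"))
        offset += length
    return name_lists


def hassh_server(payload):
    """Server-side HASSH: kex, encryption s2c, MAC s2c, compression s2c."""
    lists = parse_kexinit(payload)
    fields = (lists[0], lists[3], lists[5], lists[7])
    return hashlib.md5(";".join(fields).encode("ascii")).hexdigest()


def log_line(payload, banner):
    """Format one flat-file entry: the HASSH, a space, then the banner."""
    return f"{hassh_server(payload)} {banner}"


def build_kexinit(name_lists):
    """Test helper (hypothetical): assemble a payload from ten name-list strings."""
    body = bytes([SSH_MSG_KEXINIT]) + bytes(16)
    for name_list in name_lists:
        raw = name_list.encode("ascii")
        body += struct.pack(">I", len(raw)) + raw
    # first_kex_packet_follows (boolean) and the reserved uint32
    return body + b"\x00" + struct.pack(">I", 0)
```

A client-side HASSH would use the client-to-server lists instead (indexes 2, 4, and 6 in place of 3, 5, and 7).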
As this scan was taking place, I wrote another Python script that tallied the HASSH values and their corresponding banners. After working through 20% of my list of hosts running SSH servers, the growth in the number of HASSH values observed more than 10 times started slowing down. At 33% of the way through, this number ceased to grow at all. At 40%, I stopped scanning because I didn’t think I’d get any more results if I worked through the entire list.
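The tallying and the "has it plateaued yet" check can be sketched roughly like this. It assumes the flat-file format shown earlier (HASSH, a space, then the banner); the function names and the threshold parameter are my own for this example.

```python
from collections import Counter


def tally(lines):
    """Count how often each (HASSH, banner) pair appears in the log lines."""
    counts = Counter()
    for line in lines:
        line = line.strip()
        if not line:
            continue
        hassh, _, banner = line.partition(" ")
        counts[(hassh, banner)] += 1
    return counts


def frequent_hassh_count(counts, threshold=10):
    """Number of distinct HASSH values observed more than `threshold` times."""
    per_hassh = Counter()
    for (hassh, _banner), n in counts.items():
        per_hassh[hassh] += n
    return sum(1 for n in per_hassh.values() if n > threshold)
```

Running frequent_hassh_count periodically while the scan progresses gives the number I was watching: once it stops growing, further scanning is mostly adding rare fingerprints.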
I ended up with a list with entries like so:
123 cb422ee1335b60da9df923c0553c12f8 SSH-2.0 OpenSSH blahblahblah
These entries contain the number of times a HASSH:banner pair was observed, the HASSH, and the banner. Writing one last quick Python script, I combined duplicate HASSH values and their respective banners in a manner that I could easily use in an NSE script. I settled on “HASSH <space> <banner1 | banner2 | …>”
Sometimes, there would be several dozen banners mapped to a HASSH. I opted to only keep the top five occurring banners, and show the percentage of time they occurred in my data set to provide a rough “confidence” value; if you scan a host and it gives you “SSH-2.0 OpenSSH blahblah” 99% of the time and the HASSH matches this, that’s probably what it is.
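Here is a rough sketch of that last step, assuming input lines in the "count HASSH banner" form shown above. The exact way the percentage is rendered, "banner (NN%)", is a format invented for this example and not necessarily what ended up in the real database file.

```python
from collections import defaultdict


def build_database(tally_lines, top_n=5):
    """Collapse 'count HASSH banner' lines into one line per HASSH."""
    by_hassh = defaultdict(lambda: defaultdict(int))
    for line in tally_lines:
        parts = line.strip().split(" ", 2)
        if len(parts) != 3:
            continue  # skip blank or malformed lines
        count, hassh, banner = parts
        by_hassh[hassh][banner] += int(count)

    db_lines = []
    for hassh, banners in sorted(by_hassh.items()):
        total = sum(banners.values())
        # Most frequent first; ties broken alphabetically for stable output.
        top = sorted(banners.items(), key=lambda kv: (-kv[1], kv[0]))[:top_n]
        rendered = " | ".join(f"{b} ({100 * n / total:.0f}%)" for b, n in top)
        db_lines.append(f"{hassh} {rendered}")
    return db_lines
```

The percentages here are relative to every observation of the HASSH, not just the top five, so they may add up to less than 100% when a fingerprint has a long tail of banners.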
Finally, I made the addition to 0x4D31’s script to search this database for the values it calculated along with a few minor enhancements and sent the PR on GitHub.
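The actual lookup lives in the NSE script and is written in Lua, so the following is only a Python illustration of the idea, not the code from the PR. It assumes the “HASSH <space> <banner1 | banner2 | …>” format described above, and the function names are hypothetical.

```python
def load_database(lines):
    """Parse 'HASSH banner1 | banner2 | ...' lines into a dict keyed by HASSH."""
    db = {}
    for line in lines:
        hassh, _, banners = line.strip().partition(" ")
        if hassh and banners:
            db[hassh.lower()] = [b.strip() for b in banners.split("|")]
    return db


def lookup(db, hassh):
    """Return the known banners for a HASSH, or an empty list if unknown."""
    return db.get(hassh.lower(), [])
```

An empty result is itself useful information: it means the fingerprint was not common enough to make it into the database.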
Room For Improvement
I could not come up with an intuitive way to include the number of occurrences of each banner in the NSE script’s output. This would be useful for showing how often a HASSH matched a particular banner. If I had a hundred thousand HASSH:banner matches, one could say with even higher confidence that an endpoint is running that particular SSH implementation, or make a better judgement call as to what it might be if the administrators have changed the banner string.
This was a very fun project to work on. I was able to find hundreds of ssh-honeypot and cowrie instances on the internet. Hopefully, this data helps someone in some way or another.
Shortly after my contributions to this project, I was really pleased to see 0x4D31 give a presentation on HASSH, JA3, and other methods of fingerprinting encrypted traffic at Kawaiicon. Here is a link to the slide deck of this presentation and here is a link to the video.