Recently, I have been making efforts to learn more about reverse engineering malware. Here are some of the notes that I’ve taken to reinforce my learning. I am not a complete novice at working with malware, but I am by no means an expert. You have been warned.
What is Reverse Engineering?
Reverse engineering is the practice of examining an object with the intent of understanding how it works or recreating it. Reverse engineering can apply to software, hardware, machinery, architecture, or any artificial object.
This page focuses on software reverse engineering, specifically the reverse engineering of malware.
Why Reverse Engineer Malware?
Reverse engineering malware is often performed with the intent of strengthening security systems.
New malware and variants of older malware are developed every day. Security teams often catalog and analyze these new samples as they are discovered.
Cataloging analyzed samples helps speed up the incident response process. If incident responders are able to identify the tools used in an intrusion and already have an idea of how these tools work, they will likely have experience, notes, and playbooks to deal with the malware.
The data extracted from malware samples can be used to improve existing security detection mechanisms or provide insight and opportunities to develop brand-new ways to detect malicious activity.
By comparing newly observed samples with previously observed ones, analysts may be able to attribute a sample to a known actor or campaign. Attribution is often based on code reuse or metadata shared between samples.
Attribution can help determine the appropriate actions to take when malicious activity is detected. The appropriate response for some run-of-the-mill adware is much different than if an implant used by a foreign intelligence agency is discovered running on one of your hosts.
You can start reversing and analyzing malware with a fairly basic skillset, but if you want to get serious as a reverse engineer, you will want to improve your overall knowledge of contemporary computer systems and software development, and obtain domain-specific knowledge related to reverse engineering.
I do not believe that reversing is a beginner-friendly field, but I also believe that you can teach nearly any security analyst or systems administrator basic triage and analysis techniques. Never underestimate the power of simply running strings against a sample or reading it with a hex editor.
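To show how little is needed for this kind of triage, here is a minimal Python sketch of what a strings-style tool does: it scans raw bytes for runs of printable ASCII. The function name `extract_strings` and the minimum length of 4 are my own choices for illustration, and real tools also handle wide (UTF-16) strings, which this sketch does not.

```python
import re


def extract_strings(data: bytes, min_len: int = 4) -> list[str]:
    """Return runs of printable ASCII that are at least min_len bytes long."""
    # Printable ASCII runs from 0x20 (space) through 0x7e (~).
    pattern = re.compile(rb"[\x20-\x7e]{%d,}" % min_len)
    return [match.decode("ascii") for match in pattern.findall(data)]


if __name__ == "__main__":
    # Made-up bytes standing in for a sample; not taken from real malware.
    blob = b"\x00\x01MZ\x90\x00http://example.test/a\x00\xffabc\x00cmd.exe /c whoami\x00"
    for s in extract_strings(blob):
        print(s)
```

Short runs such as "MZ" and "abc" fall below the minimum length and are skipped, which keeps the output focused on URLs, commands, and similar artifacts.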
In the last year, I have seen malware for Windows, Linux, Mac, Android, iOS, FreeBSD, Solaris, and lesser-known embedded systems. There is definitely malware that impacts other operating systems out there. If a device connects to a network or has anything of value on it, there is likely malware that targets it.
If you are trying to get a job reversing, it makes sense to spend some time learning about Windows because of its prevalence within corporate networks. There is simply more Windows malware out there, and more learning material geared towards Windows, at this point in time.
It may make sense to specialize in less popular systems as well. There aren’t as many specialists who know how to reverse engineer mobile applications as there are who can reverse Windows software. Jobs for these specialized roles aren’t as plentiful, but you may face less competition and have an advantage when applying, compared to roles primarily focused on reversing Windows malware.
You should be familiar with the languages commonly used on your chosen platforms because these are the languages malware authors will most likely use to develop malware.
The good news is that once you learn a language or two to a reasonable degree, you will probably be able to make sense of malware written in other languages.
If I were to recommend languages to start with, I’d suggest Python, C, and x86 assembler because they cover the most ground with systems today and there is an abundance of learning material out there for these.
I feel that it makes a lot of sense to actually write malware when learning. Pros Versus Joes, CCDC, and other similar CTFs provide safe and legal environments to develop malware and detection software.
There are many algorithms and data structures utilized by malware that you should become familiar with. Malware often uses encoding, encryption, and hashing algorithms to obfuscate its true intent and make analysis more difficult. It may also use data structures such as linked lists, protobufs, hash tables, maps, and so on.
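As one small example of the encoding tricks mentioned above, single-byte XOR is about the simplest obfuscation you will run into. The sketch below is my own illustration, not a technique tied to any particular family: it tries all 256 keys and keeps the results that contain a marker you expect to see in the decoded data. The names `xor_bytes` and `brute_force_xor` and the sample URL are made up.

```python
def xor_bytes(data: bytes, key: int) -> bytes:
    """XOR every byte of data with a single-byte key (0-255)."""
    return bytes(b ^ key for b in data)


def brute_force_xor(data: bytes, marker: bytes) -> list[tuple[int, bytes]]:
    """Try all 256 single-byte keys and keep outputs that contain marker."""
    hits = []
    for key in range(256):
        decoded = xor_bytes(data, key)
        if marker in decoded:
            hits.append((key, decoded))
    return hits


if __name__ == "__main__":
    # Hypothetical "config" string, encoded the way a simple sample might.
    encoded = xor_bytes(b"http://c2.example.test/gate", 0x5A)
    for key, decoded in brute_force_xor(encoded, b"http"):
        print(hex(key), decoded)
```

Because XOR is its own inverse, the same function encodes and decodes. Real samples often use multi-byte keys or proper ciphers, so treat this as a first thing to try rather than a general solution.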
There are also a lot of patterns used by malware. Techniques such as process injection, parent process id spoofing, and keylogging use specific sets of API calls. Recognizing these can be a quick tell on what a sample’s intent is.
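A rough way to act on this is to compare a sample’s imported function names against small sets of APIs associated with a technique. The sketch below assumes you already have the import list from some other tool. The Windows API names are real, but the groupings in `API_PATTERNS` are illustrative and far from exhaustive, and the function name `match_patterns` is my own.

```python
# Illustrative groupings only; real samples vary and may resolve APIs dynamically.
API_PATTERNS = {
    "process injection": {
        "OpenProcess", "VirtualAllocEx", "WriteProcessMemory", "CreateRemoteThread",
    },
    "parent process id spoofing": {
        "InitializeProcThreadAttributeList", "UpdateProcThreadAttribute", "CreateProcessW",
    },
    "keylogging (hook-based)": {
        "SetWindowsHookExW", "CallNextHookEx",
    },
}


def match_patterns(imports: set[str]) -> list[str]:
    """Return the names of patterns whose full API set appears in imports."""
    return sorted(
        name for name, apis in API_PATTERNS.items() if apis <= imports
    )


if __name__ == "__main__":
    # Hypothetical import list, as if pulled from a sample's import table.
    sample_imports = {
        "OpenProcess", "VirtualAllocEx", "WriteProcessMemory",
        "CreateRemoteThread", "CreateFileW",
    }
    print(match_patterns(sample_imports))
```

A match is only a hint. Legitimate software such as debuggers uses the same calls, and malware that resolves APIs at runtime will show none of them in its import table.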
Being exposed to these structures, patterns, and algorithms speeds up the analysis process.
A lot of malware communicates over networks. As such, it makes sense to be knowledgeable about networking.
Often, the only practical way certain malware families can be detected is by analyzing network traffic. Devices such as printers, routers, switches, and cameras can be infected with malware. These devices often have no commercial antivirus or EDR software that will work on them. Despite the lack of endpoint security solutions for these devices, NIDS, firewalls, and proxies will still work and are plenty effective.
You don’t have to hold multiple CCIE certifications and read every single RFC in existence to analyze malware, but it does help to know some of the basics:
- IP addresses
- Common ports
- RFC 1918 (private IPv4 address ranges) and RFC 3927 (IPv4 link-local addresses)
- A basic understanding of DNS
- Common protocols: TCP, UDP, ICMP, HTTP, TLS, SSH, RDP, SMB, FTP, IRC, …
- Packet Captures
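To make the RFC 1918 and RFC 3927 items concrete, here is a small sketch using Python’s standard `ipaddress` module to sort addresses into private, link-local, and everything else. The function name `classify` and the label strings are my own; the network ranges are the ones defined in those RFCs.

```python
import ipaddress

# Private IPv4 ranges defined by RFC 1918.
RFC1918_NETWORKS = [
    ipaddress.ip_network("10.0.0.0/8"),
    ipaddress.ip_network("172.16.0.0/12"),
    ipaddress.ip_network("192.168.0.0/16"),
]
# IPv4 link-local range defined by RFC 3927.
RFC3927_NETWORK = ipaddress.ip_network("169.254.0.0/16")


def classify(addr: str) -> str:
    """Label an address as RFC 1918 private, RFC 3927 link-local, or other."""
    ip = ipaddress.ip_address(addr)
    if any(ip in net for net in RFC1918_NETWORKS):
        return "private (RFC 1918)"
    if ip in RFC3927_NETWORK:
        return "link-local (RFC 3927)"
    return "other"


if __name__ == "__main__":
    for addr in ("10.1.2.3", "172.32.0.1", "169.254.10.20", "8.8.8.8"):
        print(addr, "->", classify(addr))
```

This matters during triage because an address pulled out of a sample that falls in one of these ranges is not reachable over the Internet, which changes how you interpret it.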
It is helpful to know how to fingerprint malware with network traffic:
- Port scans
- IP addresses
- Anomalous traffic
Network traffic analysis provides many exciting opportunities to detect malware and should not be overlooked.
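As a rough illustration of the port scan item, the sketch below flags source hosts that touch an unusually large number of distinct ports on a single destination. It assumes you already have flow records as `(src_ip, dst_ip, dst_port)` tuples from some other tool. The function name `find_port_scanners` and the threshold of 20 ports are arbitrary choices of mine, not an established rule.

```python
from collections import defaultdict


def find_port_scanners(flows, min_ports: int = 20) -> dict:
    """Map (src, dst) pairs to their distinct port count when it reaches min_ports.

    flows is an iterable of (src_ip, dst_ip, dst_port) tuples.
    """
    ports_seen = defaultdict(set)
    for src, dst, port in flows:
        ports_seen[(src, dst)].add(port)
    return {
        pair: len(ports)
        for pair, ports in ports_seen.items()
        if len(ports) >= min_ports
    }


if __name__ == "__main__":
    # Made-up flows: one host sweeping ports, one making ordinary HTTPS requests.
    flows = [("10.0.0.5", "10.0.0.9", p) for p in range(1, 101)]
    flows += [("10.0.0.7", "10.0.0.9", 443)] * 50
    print(find_port_scanners(flows))
```

Counting distinct ports rather than total connections is what separates the scanner from the busy but ordinary client in this example. Slow scans spread over hours would need a time window, which this sketch leaves out.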
Malware often targets features and vulnerabilities within common software products or uses these as a medium to spread.
Examples include abusing macros within Microsoft Office formats, abusing PDF features, or targeting vulnerabilities within the software used to read/play file formats.
Malware can also target software flaws and misconfiguration within client or server software:
- Database servers: MS-SQL, MySQL, MongoDB, etc
- Web Servers and Web Applications: Apache, IIS, WordPress, Drupal, etc
- Chat clients: Discord, Teams, Slack, etc
- Development, continuous integration, and devops tools: Jenkins, Gitlab, SaltStack, Confluence, etc
It is impossible to be an expert in all of these tools, but having a basic understanding of them and their intended use helps.
How Malware Works
It helps to know how malware works and what it intends to achieve. There are several types of malware that share common characteristics such as RATs, credential stealers, keyloggers, ransomware, rootkits, and sniffers.
Depending on its type, malware commonly uses certain techniques that are uncommon in legitimate software:
- Process injection
- Process enumeration
The MITRE ATT&CK framework has done a good job classifying techniques employed by malware and families of malware that use each technique.
SEKTOR7 offers courses on Windows malware development.
Reversing utilizes several types of tools. You may be able to solve the same problem with different kinds of tools. Being versed with a variety of tools can make the reversing process much easier.
Some of the types of tools used by reverse engineers are:
- Virtual Machines
- File viewers and parsers
- Antivirus engines
- Scripting languages
- Data transformation tools
- Classification tools
- Reporting and presentation tools
- Network traffic analysis tools
Luckily, the availability and quality of learning materials related to reversing have improved tremendously in the time I’ve been studying security. There has been no better time than now to get into reversing.
There are several books, blogs, YouTube channels, courses, CTFs, Crackmes, and sources of samples that are of high quality.
Many researchers publish writeups of malware, CTF puzzles, and crackmes that they have worked on. These writeups can be excellent learning aids, offering a glimpse into the methods used by other reverse engineers.
Obtaining and Managing Samples
Obtaining samples has always been a struggle for me. There are a lot of sites that host samples with the intent of making them available for researchers. These sites are great, but often don’t let you search their corpus of samples or may not have a specific sample you want.
Lenny Zeltser maintains a list of free resources to obtain samples: https://zeltser.com/malware-sample-sources/. With some effort, you may find what you need in the resources listed there.
Alternatively, if you are targeting specific campaigns, worms, or general miscreant behavior, you may want to set up honeypots.
Some sites offer live feeds to newly reported samples. With a bit of scripting, you can utilize these to automate downloading, categorizing, and storing new samples.
Collecting samples and storing them long-term provides some interesting opportunities. A large corpus of malware is handy for developing detection rules, machine learning, and discovering relationships between different malware strains. Storing this much data in a meaningful way can be a difficult and manual process.
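One small piece of this that is easy to automate is naming and deduplicating stored samples. The sketch below is one possible layout, not a standard: it names each file after its SHA-256 digest and shards the corpus by the first two hex characters so no single directory grows too large. The function name `store_sample` and the directory layout are my own assumptions.

```python
import hashlib
from pathlib import Path


def store_sample(data: bytes, corpus_dir: Path) -> Path:
    """Write data under corpus_dir, named by its SHA-256, and return the path."""
    digest = hashlib.sha256(data).hexdigest()
    # Shard by the first two hex characters, e.g. corpus/e3/e3b0c442...
    dest = corpus_dir / digest[:2] / digest
    dest.parent.mkdir(parents=True, exist_ok=True)
    if not dest.exists():
        # Identical content always maps to the same path, so duplicates are skipped.
        dest.write_bytes(data)
    return dest


if __name__ == "__main__":
    import tempfile

    with tempfile.TemporaryDirectory() as tmp:
        path = store_sample(b"not really malware", Path(tmp))
        print(path.relative_to(tmp))
```

Naming by hash also makes it easy to look a sample up later, since most reports and sharing sites reference samples by their hashes. Metadata such as source, date, and family would still need to live somewhere else, for example in a database keyed by the same digest.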
Deception, Anti-Analysis, and Anti-Forensics
Malware authors often want to disguise their malware so that it blends in with legitimate activity. Doing this increases the chances of their malware surviving on a system. They may rename processes, use “living off the land” techniques, or place rootkits to deceive administrators and responders.
They may also employ anti-analysis and anti-forensics techniques to make analysis more difficult if their malware is discovered.
They may use obfuscation, packers and other protectors, “fileless” techniques, geofencing, domain generation algorithms (DGAs), detection and evasion of analysis environments and sandboxes, and a number of other tricks.
These techniques are intended to make analysis difficult. As such, some malware samples are extremely difficult to analyze and may require the use of tools and techniques that are not within your current resources and abilities.
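One cheap check that can hint at packing or encryption is byte entropy. Compressed or encrypted data looks close to random, so its Shannon entropy approaches 8 bits per byte, while plain code and text usually score lower. The sketch below is a generic entropy calculation and the function name `shannon_entropy` is my own. Where to draw the line between "normal" and "suspicious" is a judgment call, so I have left out any threshold.

```python
import math
from collections import Counter


def shannon_entropy(data: bytes) -> float:
    """Return the Shannon entropy of data in bits per byte (0.0 to 8.0)."""
    if not data:
        return 0.0
    total = len(data)
    counts = Counter(data)
    return -sum(
        (count / total) * math.log2(count / total) for count in counts.values()
    )


if __name__ == "__main__":
    print(shannon_entropy(b"A" * 1024))           # one repeated byte value
    print(shannon_entropy(bytes(range(256)) * 4))  # every byte value equally often
```

High entropy on its own proves nothing, since legitimate installers and media files are also compressed. It is most useful for deciding which sections of a file deserve a closer look.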
Amount to Learn
It is impossible to know every language, every malware technique, and every tool out there. Picking what to spend time on when there are so many choices can be difficult. There is no set blueprint or baseline for reverse engineering.