REVIEW: Malware Data Science – Attack Detection and Attribution

As an Amazon Associate, I earn from qualifying purchases.

This is a review of Malware Data Science: Attack Detection and Attribution by Joshua Saxe and Hillary Sanders.


I read this book because I am interested in detecting malware. It was easy to read: I wasn't Googling every few words the way I do with most machine learning and data science material. I skimmed through it in a weekend and was able to apply several of its concepts on my own.

Unfortunately, the examples are written for Python 2.7, which is no longer supported. I opted to write my own code to explore these concepts rather than rely on the book's code. Even so, the book's code was clear and concise. Porting the examples to Python 3 looks trivial, if any changes are needed at all, but I haven't actually tried it, so don't take my word for it.

Before reading this book, I had already used most of the libraries and tools it covers. Even so, I learned new techniques for examining malware and made a list of topics I'd like to explore further. I appreciated the authors' suggestions for further study: the Python packages introduced in the book (in depth), linear algebra, probability theory, statistics, graph analytics, and multivariable calculus. All of these topics can be learned online for free.

If you are new to data science or machine learning, this book provides an excellent introduction to these topics.

What’s Included

This book is 231 pages long and has 12 chapters. A virtual machine and zip file with code and datasets are provided at

I did not use the virtual machine or code, but I did use the datasets.

Prerequisite Knowledge

This book may be difficult to read if you don’t have basic knowledge of Python, computer security, and data science.


Here are brief summaries of each chapter. Hopefully they will help you decide whether this book is right for you.

Basic Static Malware Analysis

Chapter 1 covers the basics of malware analysis. It briefly introduces the structure of a Portable Executable (PE) file and shows how to parse PE files with the pefile Python library. This chapter also covers some basic strategies for identifying malware using strings and for examining PE file resources with wrestool and icotool.
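Since I didn't use the book's Python 2.7 code, here is my own minimal Python 3 sketch of the strings idea, not the authors' implementation. It pulls runs of printable ASCII out of a file's raw bytes, roughly like the Unix strings utility (which defaults to a minimum length of 4). The function name extract_strings and the 5-character minimum are my own choices, and it ignores UTF-16 strings, which are common in Windows binaries.

```python
import re


def extract_strings(data: bytes, min_len: int = 5) -> list:
    """Return runs of printable ASCII (space through tilde) at least min_len bytes long.

    Hypothetical helper for illustration; UTF-16 ("wide") strings are not handled.
    """
    # Build a bytes regex such as rb"[\x20-\x7e]{5,}".
    pattern = re.compile(rb"[\x20-\x7e]{%d,}" % min_len)
    return [match.group().decode("ascii") for match in pattern.finditer(data)]
```

To use it on a sample, read the file in binary mode and pass the bytes in, e.g. extract_strings(open(path, "rb").read()), then look for things like URLs, IP addresses, registry keys, or suspicious API names in the output.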

Beyond Basic Static Malware Analysis: x86 Disassembly

Every malware book has an obligatory x86 crash course. This chapter covers basic x86 assembly and touches on some anti-analysis techniques. It includes an example of disassembling code with the Capstone disassembly framework.

A Brief Introduction to Dynamic Analysis

This chapter provides a very basic overview of using dynamic analysis against malware. The authors suggest reading Practical Malware Analysis: The Hands-On Guide to Dissecting Malicious Software for more information on this subject.

Identifying Attack Campaigns Using Malware Networks

This chapter is a crash course in graphs using the networkx library and Graphviz.
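The core idea behind malware networks, as I understand it, is that samples sharing attributes (for example, the same callback domain or IP address) are probably related. Below is my own dependency-free sketch of that idea, not the book's networkx code: it projects the sample-to-attribute relationships onto the samples and returns connected components as candidate campaigns. The function name find_clusters and the example data are hypothetical.

```python
from collections import defaultdict


def find_clusters(sample_attrs: dict) -> list:
    """Group samples that are connected through shared attributes.

    sample_attrs maps a sample name to a set of attributes (hypothetical input format).
    """
    # Index samples by attribute (the sample/attribute bipartite graph).
    by_attr = defaultdict(set)
    for sample, attrs in sample_attrs.items():
        for attr in attrs:
            by_attr[attr].add(sample)

    # Project onto samples: two samples are adjacent if they share any attribute.
    neighbors = defaultdict(set)
    for samples in by_attr.values():
        for sample in samples:
            neighbors[sample] |= samples - {sample}

    # Connected components via iterative depth-first search.
    seen = set()
    clusters = []
    for start in sample_attrs:
        if start in seen:
            continue
        component = set()
        stack = [start]
        while stack:
            node = stack.pop()
            if node in seen:
                continue
            seen.add(node)
            component.add(node)
            stack.extend(neighbors[node] - seen)
        clusters.append(component)
    return clusters
```

With networkx you would build the same graph with nx.Graph() and get the components from nx.connected_components; the hand-rolled version just makes the steps visible.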

Shared Code Analysis

Shared code analysis measures how similar two malware samples are, which can help uncover samples that are likely related. I learned about the Jaccard index and MinHash in this chapter.
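Here is my own short sketch of both ideas, not the authors' code. Each sample is represented as a set of features (for example, its printable strings). The Jaccard index is the size of the intersection divided by the size of the union. MinHash approximates it cheaply: for each of many hash functions, keep only the minimum hash over the set, and the fraction of positions where two signatures agree estimates the Jaccard index. Salting SHA-256 with a seed to simulate independent hash functions is my own assumption, as are all the function names.

```python
import hashlib


def jaccard(a: set, b: set) -> float:
    """Exact Jaccard index: intersection size / union size (0.0 for two empty sets, by convention)."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def _seeded_hash(seed: int, item: str) -> int:
    # Simulate a family of hash functions by salting one hash with a seed.
    digest = hashlib.sha256(f"{seed}:{item}".encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big")


def minhash_signature(items: set, num_hashes: int = 128) -> list:
    """For each hash function, keep the minimum hash over the (non-empty) set."""
    return [min(_seeded_hash(seed, item) for item in items) for seed in range(num_hashes)]


def estimate_jaccard(sig_a: list, sig_b: list) -> float:
    # Two sets share a minimum under a random hash with probability equal to
    # their Jaccard index, so the fraction of matching slots estimates it.
    matches = sum(1 for x, y in zip(sig_a, sig_b) if x == y)
    return matches / len(sig_a)
```

For example, the sets {s0..s99} and {s50..s149} overlap in 50 of 150 items, so jaccard returns exactly 1/3 and estimate_jaccard on 256-slot signatures lands near it. The payoff is that fixed-size signatures can be stored and compared instead of every sample's full feature set.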

Understanding Machine Learning-Based Malware Detectors

This chapter is an introduction to machine learning. I appreciated the authors' emphasis on how much time goes into gathering samples to test new detection schemes, because that is precisely what causes me the most grief when I try to develop this kind of thing.

Evaluating Malware Detection Systems

Machine learning isn't perfect. Some malware won't be detected (false negatives), and some benign software will be classified as malware (false positives). This chapter covers these errors and other ways to measure how well your detectors work.
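A minimal sketch of the two rates people usually start with, written by me rather than taken from the book: the true positive rate (the share of malware that was detected) and the false positive rate (the share of benign files wrongly flagged). Labels and predictions are assumed to be 1 for malware and 0 for benign; the function name is hypothetical.

```python
def detection_rates(labels, predictions) -> dict:
    """Count confusion-matrix cells and derive TPR and FPR (1 = malware, 0 = benign)."""
    tp = fp = tn = fn = 0
    for actual, predicted in zip(labels, predictions):
        if actual and predicted:
            tp += 1  # malware correctly detected
        elif actual:
            fn += 1  # malware missed (false negative)
        elif predicted:
            fp += 1  # benign file flagged (false positive)
        else:
            tn += 1  # benign file correctly ignored
    tpr = tp / (tp + fn) if tp + fn else 0.0
    fpr = fp / (fp + tn) if fp + tn else 0.0
    return {"tp": tp, "fp": fp, "tn": tn, "fn": fn, "tpr": tpr, "fpr": fpr}
```

Most detectors output a score rather than a yes/no answer; sweeping the decision threshold and plotting TPR against FPR at each setting gives a ROC curve.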

Building Machine Learning Detectors

This chapter provides a walkthrough of building a detector using scikit-learn.
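One step any scikit-learn detector needs is turning each sample's variable-length set of features (such as its strings) into a fixed-length numeric vector, since the classifiers expect rows of numbers. Below is my own stdlib sketch of the hashing trick for that step; it is not the book's code, and the bucket count and function name are my assumptions. In practice you would probably reach for scikit-learn's own FeatureHasher in sklearn.feature_extraction.

```python
import hashlib


def hash_features(features, n_buckets: int = 1024) -> list:
    """Map a variable-length collection of string features to a fixed-length count vector."""
    vector = [0] * n_buckets
    for feature in set(features):  # count each distinct feature once
        digest = hashlib.sha256(feature.encode("utf-8")).digest()
        # Different features may collide in the same bucket; that's the price of a fixed size.
        vector[int.from_bytes(digest[:4], "little") % n_buckets] += 1
    return vector
```

Stacking one such vector per sample gives a matrix X that, together with a list of 0/1 labels y, can be passed to a classifier's fit(X, y), for example scikit-learn's RandomForestClassifier.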

Visualizing Malware Trends

Humans tend to process data better with visual aids. This chapter shows examples of visualizing datasets to recognize patterns and trends, using pandas, matplotlib, and seaborn.

Deep Learning Basics

This chapter provides an overview of the basic concepts of neural networks.
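To make the basic idea concrete, here is my own tiny forward pass in plain Python, not an example from the book: each neuron computes a weighted sum of its inputs plus a bias and passes it through an activation function (ReLU in the hidden layer, sigmoid at the output so the result reads as a probability of "malware"). The weights below are made up; in a real network they are learned during training.

```python
import math


def sigmoid(x: float) -> float:
    # Numerically stable logistic function: squashes any real number into (0, 1).
    if x >= 0:
        return 1.0 / (1.0 + math.exp(-x))
    z = math.exp(x)
    return z / (1.0 + z)


def relu(x: float) -> float:
    return max(0.0, x)


def dense(inputs, weights, biases, activation):
    """One fully connected layer: weights holds one row of input weights per neuron."""
    return [activation(sum(w * x for w, x in zip(row, inputs)) + b)
            for row, b in zip(weights, biases)]


def forward(features, hidden_w, hidden_b, out_w, out_b) -> float:
    """Two-layer network: ReLU hidden layer followed by a single sigmoid output neuron."""
    hidden = dense(features, hidden_w, hidden_b, relu)
    return dense(hidden, out_w, out_b, sigmoid)[0]
```

For instance, forward([1.0, 2.0], [[1.0, -1.0], [0.5, 0.5]], [0.0, 0.0], [[1.0, 1.0]], [-1.5]) gives hidden activations 0.0 and 1.5, an output pre-activation of 0.0, and therefore a score of 0.5.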

Building a Neural Network Malware Detector With Keras

Self-explanatory. This chapter provides an example of building a malware detector with Keras.

Becoming a Data Scientist

This chapter gives advice to readers who want to pursue this topic further or get into data science as a profession.

Further Reading

Mining of Massive Datasets

Practical Malware Analysis: The Hands-On Guide to Dissecting Malicious Software

Attribution of Advanced Persistent Threats: How to Identify the Actors Behind Cyber-Espionage

Intelligence-Driven Incident Response: Outwitting the Adversary
