As an Amazon Associate, I earn from qualifying purchases.
This is a review of Malware Data Science: Attack Detection and Attribution by Joshua Saxe and Hillary Sanders.
Here is a sample chapter: https://www.malwaredatascience.com/sample-chapter
I read this book because I am interested in detecting malware. It was easy to read. I was not Googling every few words, as I do with other machine learning and data science material. I skimmed through it in a weekend and was able to apply several of its concepts on my own.
Unfortunately, the examples are written for Python 2.7, which is no longer supported. I opted to write my own code to explore these concepts rather than rely on the book's code. Even so, the book's code was clear and concise. Porting the examples to Python 3 looked trivial, if any changes are needed at all. However, I have not actually done this, so don't take my word for it.
Before reading this book, I had already used most of the libraries and tools it covers. Even so, I learned new techniques for examining malware and made a list of topics I'd like to explore more deeply. I appreciated the authors' suggestion to study the Python packages introduced in the book in depth. They also recommend studying linear algebra, probability theory, statistics, graph analytics, and multivariable calculus. All of these topics can be learned online for free.
If you are new to data science or machine learning, this book provides an excellent introduction to these topics.
What’s Included
This book is 231 pages long and has 12 chapters. A virtual machine and zip file with code and datasets are provided at https://www.malwaredatascience.com/
I did not use the virtual machine or code, but I did use the datasets.
Prerequisite Knowledge
This book may be difficult to read if you don’t have basic knowledge of Python, computer security, and data science.
Chapters
Here are brief summaries of each chapter. Hopefully they will help you decide whether this book will benefit you.
Basic Static Malware Analysis
Chapter 1 covers the basics of malware analysis. It briefly describes the structure of a Portable Executable (PE) file and shows how to parse PE files with the pefile Python library. The chapter also covers some basic strategies for identifying malware using strings and for examining PE file resources with wrestool and icotool.
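The strings technique boils down to pulling runs of printable characters out of a binary. Below is a minimal stdlib approximation of the Unix `strings` tool. It is my own sketch, not the book's code. It handles ASCII only, the minimum length of 5 is an arbitrary assumption, and the sample bytes are made up.

```python
import re

# A run of at least 5 printable ASCII bytes (space through tilde). The minimum
# length is an arbitrary choice here; GNU strings defaults to 4.
PRINTABLE_RUN = re.compile(rb"[\x20-\x7e]{5,}")


def extract_strings(data: bytes) -> list[str]:
    """Return every printable ASCII run found in a binary blob, in order."""
    return [m.group().decode("ascii") for m in PRINTABLE_RUN.finditer(data)]


# Made-up bytes standing in for part of a PE file.
sample = b"MZ\x90\x00\x03\x00http://example.com/payload\x00\xffGetProcAddress\x00ab\x00"
print(extract_strings(sample))  # ['http://example.com/payload', 'GetProcAddress']
```

URLs and imported API names like these are typical of what you look for when triaging a sample by its strings.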
Beyond Basic Static Malware Analysis: x86 Disassembly
Every malware book has an obligatory x86 crash course. This chapter covers basic x86 assembly and touches on some anti-analysis techniques. It also provides an example of disassembling code with Capstone.
A Brief Introduction to Dynamic Analysis
This chapter provides a very basic overview of dynamic malware analysis. The authors suggest reading Practical Malware Analysis: The Hands-On Guide to Dissecting Malicious Software for more information on this subject.
Identifying Attack Campaigns Using Malware Networks
This chapter is a crash course in network graphs, built with the networkx library and visualized with Graphviz.
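The idea behind malware networks, as I understand it, is to connect samples through attributes they share, such as callback domains, and then look for clusters. The book uses networkx for this. The stdlib sketch below only illustrates the idea. It projects a sample-to-attribute (bipartite) graph onto the samples, then groups connected components as candidate campaigns. The observations and function names are hypothetical.

```python
from collections import defaultdict
from itertools import combinations

# Hypothetical observations: the callback domains each sample was seen contacting.
observations = {
    "sample_a": {"evil.example", "c2.example"},
    "sample_b": {"c2.example"},
    "sample_c": {"other.example"},
    "sample_d": {"other.example", "mirror.example"},
}


def project_samples(obs: dict[str, set[str]]) -> set[tuple[str, str]]:
    """Project the sample/attribute bipartite graph onto samples:
    two samples get an edge if they share at least one attribute."""
    samples_by_attr = defaultdict(set)
    for sample, attrs in obs.items():
        for attr in attrs:
            samples_by_attr[attr].add(sample)
    edges = set()
    for samples in samples_by_attr.values():
        edges.update(combinations(sorted(samples), 2))
    return edges


def clusters(nodes, edges) -> list[list[str]]:
    """Group samples into connected components (candidate campaigns) with union-find."""
    parent = {n: n for n in nodes}

    def find(n):
        while parent[n] != n:
            parent[n] = parent[parent[n]]  # path halving keeps trees shallow
            n = parent[n]
        return n

    for a, b in edges:
        parent[find(a)] = find(b)
    groups = defaultdict(list)
    for n in nodes:
        groups[find(n)].append(n)
    return sorted(sorted(g) for g in groups.values())


edges = project_samples(observations)
print(sorted(edges))                  # [('sample_a', 'sample_b'), ('sample_c', 'sample_d')]
print(clusters(observations, edges))  # [['sample_a', 'sample_b'], ['sample_c', 'sample_d']]
```

In practice a graph library handles the projection, layout and visualization. The hand-rolled version just shows that "shared infrastructure" edges are all it takes to surface a cluster.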
Shared Code Analysis
Shared code analysis estimates how similar two malware samples are to each other, which can help you discover samples that may be related. This chapter introduced me to the Jaccard index and MinHash.
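The Jaccard index of two feature sets (for example, strings extracted from two samples) is the size of their intersection divided by the size of their union. MinHash approximates it by keeping only the minimum hash value per hash function. The fraction of matching minimums estimates the Jaccard index from fixed-size signatures. This is my own sketch, not the book's implementation. Seeded BLAKE2b standing in for a hash-function family, the 128-slot signature, and the feature sets are all assumptions.

```python
import hashlib


def jaccard(a: set[str], b: set[str]) -> float:
    """Exact Jaccard index: |A ∩ B| / |A ∪ B|."""
    if not a and not b:
        return 1.0  # convention: two empty sets are treated as identical
    return len(a & b) / len(a | b)


def _hash(seed: int, feature: str) -> int:
    """One member of a family of hash functions, selected by seed (64-bit BLAKE2b)."""
    digest = hashlib.blake2b(f"{seed}:{feature}".encode(), digest_size=8).digest()
    return int.from_bytes(digest, "big")


def minhash_signature(features: set[str], num_hashes: int = 128) -> list[int]:
    """For each hash function, keep the minimum hash value over all features."""
    if not features:
        raise ValueError("MinHash needs at least one feature")
    return [min(_hash(seed, f) for f in features) for seed in range(num_hashes)]


def estimate_jaccard(sig_a: list[int], sig_b: list[int]) -> float:
    """The fraction of signature slots that agree approximates the Jaccard index."""
    return sum(x == y for x, y in zip(sig_a, sig_b)) / len(sig_a)


# Hypothetical string features from two related samples.
sample_1 = {"GetProcAddress", "LoadLibraryA", "http://c2.example", "cmd.exe"}
sample_2 = {"GetProcAddress", "LoadLibraryA", "http://c2.example", "powershell.exe"}

print(jaccard(sample_1, sample_2))  # 0.6
print(estimate_jaccard(minhash_signature(sample_1), minhash_signature(sample_2)))
```

The estimate will be close to 0.6 but not exact. The payoff is that signatures have a fixed size no matter how many features a sample has, which makes comparing many samples much cheaper.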
Understanding Machine Learning-Based Malware Detectors
This chapter is an introduction to machine learning. I appreciated the emphasis on how much time it takes to gather samples for testing new detection schemes, because that is precisely what causes me the most grief when I try to develop detectors.
Evaluating Malware Detection Systems
Machine learning isn't perfect. Some malware will go undetected (false negatives), and some benign software will be classified as malware (false positives). This chapter covers these errors, along with other ways to measure how well your detector works.
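Here is a small illustration of how those error types turn into numbers, using the standard confusion-matrix definitions. The function and labels are hypothetical, and I'm not claiming this mirrors the book's code.

```python
def _ratio(num: int, den: int) -> float:
    """Safe division: returns 0.0 when the denominator is zero."""
    return num / den if den else 0.0


def detection_metrics(y_true: list[int], y_pred: list[int]) -> dict[str, float]:
    """Summarize detector errors. Labels: 1 = malware, 0 = benign."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)  # malware flagged
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)  # malware missed (false negative)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)  # benign flagged (false positive)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)  # benign passed
    return {
        "true_positive_rate": _ratio(tp, tp + fn),   # share of malware caught
        "false_positive_rate": _ratio(fp, fp + tn),  # share of benign files flagged
        "precision": _ratio(tp, tp + fp),            # share of alerts that are real malware
    }


# Made-up labels: 4 malware samples, 6 benign ones.
y_true = [1, 1, 1, 1, 0, 0, 0, 0, 0, 0]
y_pred = [1, 1, 1, 0, 0, 0, 0, 0, 1, 0]
print(detection_metrics(y_true, y_pred))
```

Here the detector catches 3 of 4 malware samples and flags 1 of 6 benign files. Even a low false positive rate can mean a lot of noise when benign files vastly outnumber malware.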
Building Machine Learning Detectors
This chapter provides a walkthrough of building a detector using scikit-learn.
Visualizing Malware Trends
Humans tend to process data better with visual aids. This chapter shows examples of visualizing datasets to recognize patterns and trends, using pandas, matplotlib, and seaborn.
Deep Learning Basics
This chapter provides an overview of the basic concepts of neural networks.
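The basic building block of a neural network is easy to show without any framework. Below is my own minimal sketch of a forward pass through standard sigmoid neurons, with hand-picked, untrained weights. It is not the book's code. Real networks learn their weights during training rather than having them wired by hand.

```python
import math


def sigmoid(z: float) -> float:
    """Squash any real number into (0, 1)."""
    return 1.0 / (1.0 + math.exp(-z))


def neuron(inputs: list[float], weights: list[float], bias: float) -> float:
    """One artificial neuron: weighted sum of inputs plus a bias, passed through a sigmoid."""
    return sigmoid(sum(x * w for x, w in zip(inputs, weights)) + bias)


def dense_layer(inputs: list[float], weight_rows: list[list[float]], biases: list[float]) -> list[float]:
    """A fully connected layer is several neurons reading the same inputs."""
    return [neuron(inputs, w, b) for w, b in zip(weight_rows, biases)]


# Hand-picked (untrained) weights for a tiny 2-input, 2-hidden, 1-output network.
features = [1.0, 0.0]
hidden = dense_layer(features, [[2.0, -1.0], [-1.0, 3.0]], [-2.0, 0.5])
score = neuron(hidden, [1.5, -0.5], 0.1)
print(hidden[0])  # 0.5, because 2.0*1.0 + (-1.0)*0.0 - 2.0 == 0
print(score)      # some value in (0, 1); meaningless until the weights are trained
```

In a malware detector, the final output would be read as how likely the input is to be malicious, once the weights have been trained on labeled samples.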
Building a Neural Network Malware Detector With Keras
Self-explanatory. This chapter walks through building a malware detector with Keras.
Becoming a Data Scientist
This chapter gives advice to readers who want to pursue this topic further or get into data science as a profession.
Further Reading
Practical Malware Analysis: The Hands-On Guide to Dissecting Malicious Software
Attribution of Advanced Persistent Threats: How to Identify the Actors Behind Cyber-Espionage
Intelligence-Driven Incident Response: Outwitting the Adversary