Need an account to vote? Register to attend at gsec.hitb.org/sg2018/
Deadline is 30th June 2018!

The Cassandra Curse: Believe me, You Will Need Machine Learning. Stratosphere IPS for Malware Traffic Detection

Sebastian Garcia

Machine learning has divided the opinions of security analysts like nothing else in recent years. Is it good? Is it snake oil? Is it worth the price? Despite the advances made in recent years in using machine learning (ML) to detect malware traffic, there is no clear evidence of how these tools may have improved our detection capabilities. Do we really need ML? Among the questions to answer are: (1) how ML tools compare to traditional methods, (2) whether any malware has changed its behavior because of ML tools, and (3) how security specialists can use ML tools in their networks for everyday work.

More importantly, there are not many free-software IDSs that use ML and that people can use and adapt. This work explores these questions and presents new advances in slips, the Stratosphere Linux IPS tool. Slips is a free-software tool that combines several ML methods with real-time analysis of network traffic. Designed for everyday use on large amounts of traffic, slips allows researchers to improve and speed up their network analysis. Slips combines one network behavior representation model and three different ML algorithms: a Markov chain algorithm, a neural network algorithm, and a statistical algorithm for false-positive reduction.
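
To make the Markov chain idea concrete, here is a minimal sketch of how a first-order Markov chain can score a sequence of behavior symbols. It is not the slips implementation: the three-letter alphabet, the toy training strings, and the function names (`train_markov_chain`, `avg_log_likelihood`, `classify`) are all assumptions made for illustration. The only idea taken from the abstract is that network behavior is first encoded in some representation and then evaluated with a Markov chain.

```python
from collections import defaultdict
import math


def train_markov_chain(sequences, alphabet, smoothing=1.0):
    """Estimate first-order transition probabilities with additive smoothing."""
    counts = defaultdict(lambda: defaultdict(float))
    for seq in sequences:
        for prev, curr in zip(seq, seq[1:]):
            counts[prev][curr] += 1.0
    model = {}
    for prev in alphabet:
        # Smoothing keeps unseen transitions at a small non-zero probability.
        total = sum(counts[prev].values()) + smoothing * len(alphabet)
        model[prev] = {
            curr: (counts[prev][curr] + smoothing) / total for curr in alphabet
        }
    return model


def avg_log_likelihood(model, seq):
    """Average log-probability per transition of seq under the model."""
    transitions = list(zip(seq, seq[1:]))
    if not transitions:
        return float("-inf")
    return sum(math.log(model[p][c]) for p, c in transitions) / len(transitions)


# Hypothetical behavior alphabet: each letter stands for one kind of flow.
ALPHABET = "abc"

# Hypothetical toy training data, one string per observed connection.
malicious_model = train_markov_chain(["aaabaaab", "aabaaaba"], ALPHABET)
normal_model = train_markov_chain(["abcabcab", "cabcabca"], ALPHABET)


def classify(seq):
    """Label a sequence by whichever model explains it better."""
    mal = avg_log_likelihood(malicious_model, seq)
    nor = avg_log_likelihood(normal_model, seq)
    return "malicious" if mal > nor else "normal"


print(classify("aaabaaab"))
print(classify("abcabcabc"))
```

Comparing average log-likelihoods rather than raw probabilities keeps the score independent of sequence length, which matters when connections are observed for different amounts of time.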

One of the difficulties in knowing whether ML tools help is that most ML implementations are private. This hides their inner workings even from the customers who bought them. Another difficulty is that network traffic is continually changing, which makes any comparison very difficult. What is more, ML tools are constantly changing and adapting to the target network, so any comparison is highly dependent on time.

To tackle these questions, we compared slips against all the state-of-the-art methods that we could execute. The comparison took the date into account, so each tool was updated to the moment when the attack took place. The tools were compared on a very large dataset of real malware traffic: more than 300 malware executions were run over months for this analysis. More importantly, a large amount of normal traffic was captured and manually verified. Obtaining verified normal traffic is more difficult than obtaining malware traffic. Finally, we generated special mixed datasets where a normal user is infected after some days and continues to act normally alongside the malware. These unique datasets have never been used or published before.
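
The role of verified normal traffic in such a comparison can be illustrated with a small sketch. The function name `detection_rates`, the label strings, and the toy data below are assumptions for illustration only; they are not results or code from this work. The point is that a false positive rate can only be computed over flows that are known to be normal, which is why manually verified normal traffic is needed alongside the malware captures.

```python
def detection_rates(labels, predictions):
    """Return true positive rate and false positive rate for one tool.

    labels and predictions are equal-length lists of "malware" or "normal".
    """
    tp = fp = tn = fn = 0
    for truth, pred in zip(labels, predictions):
        if truth == "malware":
            if pred == "malware":
                tp += 1
            else:
                fn += 1
        else:
            if pred == "malware":
                fp += 1
            else:
                tn += 1
    # Guard against datasets that lack one of the two classes.
    tpr = tp / (tp + fn) if (tp + fn) else float("nan")
    fpr = fp / (fp + tn) if (fp + tn) else float("nan")
    return {"tpr": tpr, "fpr": fpr}


# Hypothetical ground truth: four verified normal flows, four malware flows.
labels = ["normal"] * 4 + ["malware"] * 4

# Hypothetical output of one detection tool on the same flows.
predictions = ["normal", "normal", "malware", "normal",
               "malware", "malware", "normal", "malware"]

rates = detection_rates(labels, predictions)
print(rates)
```

In a mixed dataset where an infected user keeps behaving normally, both rates are measured on the same host, so a tool cannot look good simply by flagging everything that host does.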

We believe that ML tools are mandatory for dealing with an increasingly large amount of traffic in real time. ML is used not only to obtain better detections, but also to deal with a number of attacks that no human analyst could ever process. Moreover, our comparison of human analysts with traditional detection tools and ML tools shows that ML can improve detection in the network and make analysts more effective.


Sebastian is a malware researcher and security teacher who has extensive experience in machine learning applied to network traffic. He created the Stratosphere IPS project, a machine learning-based, free-software IPS to protect civil society. He likes to analyze network patterns and attacks with machine learning. As a researcher in the AIC group of the Czech Technical University in Prague, he believes that free software and machine learning tools can help better protect users from abuse of their digital rights. He has taught in several countries and universities and has worked on penetration testing for both corporations and governments. He was lucky enough to speak at Ekoparty, DeepSec, Hacktivity, Botconf, HackLu, InBot, SecuritySessions, ECAI, CitizenLab, ArgenCor, Free Software Foundation Europe, VirusBulletin, BSides Vienna, HITB Singapore, CACIC, etc. As a co-founder of the MatesLab hackspace, he is a free-software advocate who has worked on honeypots, malware detection, distributed scanning (dnmap), keystroke dynamics, Bluetooth analysis, privacy protection, intruder detection, robotics, microphone detection with SDR (Salamandra), and biohacking.