3-DAY TRAINING 3 – Making & Breaking Machine Learning Systems « HITB GSEC

Early Bird (< 30th April): SGD2999

Normal (> 1st June): SGD3899

Seats Available: 8

Overview

Making & Breaking Machine Learning Systems is a fast-paced session on machine learning from the infosec professional’s point of view. The class is designed to give students a hands-on introduction to machine learning concepts and systems, as well as to making and breaking security applications powered by machine learning.

The lab session is designed with security use-cases in mind, since using machine learning in security is very different from using it in other situations. Students will get first-hand experience cleaning data, implementing machine learning security programs, and performing penetration tests of these systems.

Each attendee will be provided with a comprehensive virtual machine programming environment that is preconfigured for the tasks in the class, as well as for any future machine learning experimentation and development. This environment consists of the most essential machine learning libraries and programming environments, and is friendly even to machine learning novices.

At the end of the class, students will be put through a CTF challenge that will test the machine learning development and exploitation skills that they have learned over the course in a realistic environment.

Key learning objectives:

Familiarizing yourself with popular machine learning algorithms and how to adapt these for different problems
How to clean and sanitize data using powerful data processing libraries in Python
How to build a spam classifier and online anomaly detection system in Python
How to do performance evaluations of machine learning classifiers
Examples for using machine learning in intrusion detection, botnet detection, phishing detection, web vulnerability analysis, malware classification, and behavioural analysis
Perform tuning of machine learning systems to improve classification/detection results
Perform security evaluations and penetration tests on machine learning systems
- Fuzzing machine learning classifiers
How to avoid vulnerabilities in machine learning system and algorithm design
How to use Apache Spark to design scalable and distributed real-time machine learning systems
Write your own machine learning captcha solver

Who should attend:

Security Professionals
Web Application Pentesters
Software/application developers
People interested in starting to use machine learning for security

Hardware/Software requirements:

Latest version of VirtualBox installed
Administrative access on your laptop with external USB allowed
At least 20 GB free hard disk space
At least 4 GB RAM (the more the better)

Agenda:

Day 1

Introduction to machine learning
Hands-on guided exploration of Python machine learning libraries:
- Data wrangling using NumPy and pandas
- Scikit-learn’s functions and capabilities
- Data visualization using Matplotlib/Seaborn

Walkthrough of the most commonly used machine learning algorithms (with quick hands-on examples/visualizations for select algorithms)
- Supervised learning algorithms
  - Linear/logistic regression
  - Support Vector Machines
  - Decision trees/Random forests
- Unsupervised learning algorithms
  - Hierarchical/k-Means clustering
- Semi-supervised learning
2-hour example: Building (and bypassing) an email spam filter with scikit-learn

Day 2

Loading data efficiently
Using a labeled email/spam corpus training and test set, extract salient features to build a word model of spam
Model tuning, cross-validation, and evaluation process
With complete knowledge of the system, manually craft a piece of spam to bypass the filter
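
As a rough illustration of the steps above (not the course's lab material, which uses scikit-learn), the following dependency-free Python sketch builds a word-count model of spam with naive Bayes, then bypasses it by padding a spam message with words the model associates with legitimate mail. The toy corpus, the function names, and both messages are invented for this example.

```python
import math
from collections import Counter

# Toy labeled corpus, invented for illustration (not the course dataset).
TRAIN = [
    ("win free money now", "spam"),
    ("free prize claim now", "spam"),
    ("cheap pills free shipping", "spam"),
    ("meeting agenda for monday", "ham"),
    ("please review the project report", "ham"),
    ("lunch on monday with the team", "ham"),
]

def train(corpus):
    """Build a bag-of-words model: per-class word counts and document counts."""
    counts = {"spam": Counter(), "ham": Counter()}
    docs = Counter()
    for text, label in corpus:
        counts[label].update(text.split())
        docs[label] += 1
    vocab = set(counts["spam"]) | set(counts["ham"])
    return counts, docs, vocab

def spam_score(text, model):
    """Log-odds of spam vs. ham (multinomial naive Bayes, add-one smoothing)."""
    counts, docs, vocab = model
    spam_total = sum(counts["spam"].values()) + len(vocab)
    ham_total = sum(counts["ham"].values()) + len(vocab)
    score = math.log(docs["spam"] / docs["ham"])
    for word in text.split():
        if word not in vocab:
            continue  # words never seen in training carry no evidence
        p_spam = (counts["spam"][word] + 1) / spam_total
        p_ham = (counts["ham"][word] + 1) / ham_total
        score += math.log(p_spam / p_ham)
    return score

def is_spam(text, model):
    """Classify as spam when the log-odds are positive."""
    return spam_score(text, model) > 0

model = train(TRAIN)

original = "free money now"
# With full knowledge of the model, pad the message with ham-like words.
evasive = original + " monday the team meeting project report"

print(is_spam(original, model))  # True
print(is_spam(evasive, model))   # False
```

The padded message keeps its spam payload but picks up enough ham evidence to flip the decision, an approach sometimes called a good-word attack. A filter trained on a realistic corpus would need more padding, but the principle is the same.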

Lecture on application of machine learning in the security/abuse space
- Spam, fraud, malware, phishing, and intrusion detection short examples
- Principles behind selecting the best machine learning models for different use-cases
- Considerations when using machine learning in adversarial/malicious networks
- Using Keras/TensorFlow for anomaly detection with convolutional neural networks
- Choosing the appropriate model for different types of problems – efficacy comparison of different machine learning techniques for solving the anomaly detection problem, and other considerations to weigh
2-hour example: Building a simple network intrusion detection system with 2 different machine learning models
- Importance of understanding the data and the threat model before designing a solution for the problem
- Model tuning, cross-validation, and evaluation process
- Guided comparisons of the performance characteristics for each implementation
- Visualizing and presenting the data for ease of analysis by security operations professionals
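
When comparing two intrusion detection models, accuracy alone can mislead because attacks are usually a small fraction of the traffic. The sketch below, in plain Python, computes accuracy, precision, recall, and false positive rate from true and predicted labels. The label vectors and the two "models" are made up for illustration and are not results from the class.

```python
def confusion_counts(y_true, y_pred):
    """Return (tp, fp, tn, fn), treating label 1 as 'attack' and 0 as 'benign'."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)
    return tp, fp, tn, fn

def evaluate(y_true, y_pred):
    """Summarize a binary detector; ratios with an empty denominator become 0.0."""
    tp, fp, tn, fn = confusion_counts(y_true, y_pred)
    return {
        "accuracy": (tp + tn) / len(y_true),
        "precision": tp / (tp + fp) if tp + fp else 0.0,
        "recall": tp / (tp + fn) if tp + fn else 0.0,
        "false_positive_rate": fp / (fp + tn) if fp + tn else 0.0,
    }

# Hypothetical traffic: 2 attacks among 100 connections.
y_true = [1] * 2 + [0] * 98

# Model A never raises an alert; model B catches both attacks with 5 false alarms.
pred_a = [0] * 100
pred_b = [1] * 2 + [1] * 5 + [0] * 93

report_a = evaluate(y_true, pred_a)
report_b = evaluate(y_true, pred_b)

for name, report in (("A", report_a), ("B", report_b)):
    print(name, {k: round(v, 3) for k, v in report.items()})
```

Model A scores higher on accuracy (0.98 against 0.95) while detecting nothing, which is why recall and false positive rate are the more useful numbers to put in front of an operations team.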

Day 3

Streaming pipelines for machine learning using Apache Spark MLlib (PySpark)
- Overview of Apache Spark
  - General architecture
  - Distributed, scalable machine learning deployments with Spark
- Guided example of a streaming architecture for network anomaly detection using reinforcement learning on Spark
Evaluating the security of machine learning systems
- Techniques and guided example of fuzzing a classifier and regressor to find blind spots in the model
- Evaluation of an intelligent learning system architecture that is resilient to model poisoning by an adversary
Machine Learning CTF challenge – captcha bypass challenges (using captcha character classification starter code provided)
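
One way to picture the classifier fuzzing item in the Day 3 agenda is a mutation loop against a black-box model: start from a sample the model flags, randomly change one feature at a time, and keep every mutant that is still an attack but is no longer detected. The sketch below assumes a toy scoring detector, an invented ground-truth check, and made-up feature names and ranges. It covers only the classifier case and is not the technique taught in the class.

```python
import random

# Value ranges the fuzzer may explore for each feature (assumed for this toy).
BOUNDS = {"failed_logins": (0, 50), "bytes_out_kb": (0, 1000), "off_hours": (0, 1)}

def toy_detector(sample):
    """Hypothetical stand-in for a trained model; True means 'flag as malicious'."""
    score = (20 * sample["failed_logins"]
             + sample["bytes_out_kb"]
             + 250 * sample["off_hours"])
    return score >= 1000

def still_an_attack(sample):
    """Hypothetical oracle: the exfiltration counts as an attack at 300 KB or more."""
    return sample["bytes_out_kb"] >= 300

def fuzz(classifier, oracle, seed_sample, trials=500, rng=None):
    """Mutate one random feature per trial; collect undetected attack samples."""
    rng = rng or random.Random(0)  # fixed seed keeps runs repeatable
    evasions = []
    for _ in range(trials):
        mutant = dict(seed_sample)
        feature = rng.choice(sorted(BOUNDS))
        low, high = BOUNDS[feature]
        mutant[feature] = rng.randint(low, high)
        if oracle(mutant) and not classifier(mutant):
            evasions.append((feature, mutant))
    return evasions

# A sample the detector flags: score = 200 + 600 + 250 = 1050.
seed_sample = {"failed_logins": 10, "bytes_out_kb": 600, "off_hours": 1}

evasions = fuzz(toy_detector, still_an_attack, seed_sample)
blind_spot_features = sorted({feature for feature, _ in evasions})
print(len(evasions), "evasions via", blind_spot_features)
```

Each collected mutant differs from the flagged seed in a single feature, so the list points directly at the inputs an attacker could adjust to slip under the threshold. Against a real model the oracle is the hard part, since the mutation must not break the attack itself.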

TRAINING

Location: TRAINING ROOMS
Date: August 21, 2017
Time: 9:00 am - 6:00 pm

Clarence Chio

Anto Joseph