Enhancing the Detection of Unknown Malware using Active Learning

Nir Nissim


The daily emergence of new malware poses a significant challenge to anti-virus vendors, because anti-virus tools that rely on manually crafted signatures can identify only known malware instances and their relatively similar variants. To identify new and unknown malware and update their signature repositories, anti-virus vendors must collect new, suspicious files every day; these files are then analyzed manually by information security experts, who label them as malicious or benign. Analyzing suspected files is time-consuming, and it is impossible to analyze all of them manually. Consequently, anti-virus vendors use machine learning algorithms and heuristics to reduce the number of suspect files that must be inspected manually. These techniques, however, lack an essential element: they cannot be updated daily. In this talk I will introduce a solution for this updatability gap. I will present an active learning (AL) framework and introduce two new AL methods that help anti-virus vendors focus their analytical efforts by acquiring the files that are most probably malicious. These new AL methods are designed specifically for acquiring new malware. A comparison of our methods with existing high-performance AL methods and with random selection, the naïve baseline, indicates that the AL methods outperformed random selection on all performance measures. Our AL methods also outperformed the existing AL method in two respects, both related to the number of new malware files acquired daily, the core measure in this study. In particular, while the existing AL method showed a decrease in the number of new malware files acquired over ten days, our AL methods showed a daily increase in that number. Both results point toward increased efficiency that may assist anti-virus vendors.

This framework has proven efficient in several cyber-security domains, including PC malware (executables), PDF and MS Office files (non-executables), and Android mobile applications. Recently we extended the framework's capabilities to provide solutions in additional domains. We have also adapted it to the biomedical domain, where we successfully enhanced a classification model used to prioritize phenotypes (diagnoses) by severity.
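
As a rough illustration of the daily acquisition loop described in the abstract, the sketch below contrasts two ways of spending a fixed daily labeling budget: acquiring the files that a model scores as most probably malicious, and random selection. It is not the method from the papers listed below. The nearest-centroid scorer, the synthetic two-feature "files", and all function names and parameters (run, pool_size, budget, malware_rate) are assumptions made only for this example.

```python
import random


def centroid(rows):
    """Mean vector of a non-empty list of equal-length feature vectors."""
    return [sum(col) / len(rows) for col in zip(*rows)]


def sq_dist(a, b):
    """Squared Euclidean distance between two feature vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def malicious_score(x, mal_centroid, ben_centroid):
    """Higher means closer to known malware than to known benign files."""
    return sq_dist(x, ben_centroid) - sq_dist(x, mal_centroid)


def acquire_most_likely_malicious(pool, labeled, budget):
    """Indices of the `budget` pool files scored most malware-like."""
    mal_c = centroid([x for x, y in labeled if y == 1])
    ben_c = centroid([x for x, y in labeled if y == 0])
    ranked = sorted(
        range(len(pool)),
        key=lambda i: malicious_score(pool[i][0], mal_c, ben_c),
        reverse=True,
    )
    return ranked[:budget]


def acquire_random(pool, budget, rng):
    """Naive baseline: indices of `budget` files chosen uniformly at random."""
    return rng.sample(range(len(pool)), budget)


def make_files(n, rng, malware_rate=0.1):
    """Synthetic (features, label) pairs; label 1 = malware, 0 = benign."""
    files = []
    for _ in range(n):
        label = 1 if rng.random() < malware_rate else 0
        center = 2.0 if label else 0.0
        files.append(([rng.gauss(center, 1.0), rng.gauss(center, 1.0)], label))
    return files


def run(days, pool_size, budget, use_scores, seed=0):
    """Simulate daily acquisition; return malware found on each day."""
    data_rng = random.Random(seed)      # generates the daily pools
    pick_rng = random.Random(seed + 1)  # used only by random selection
    labeled = [([2.0, 2.0], 1), ([0.0, 0.0], 0)]  # one known file per class
    found_per_day = []
    for _ in range(days):
        pool = make_files(pool_size, data_rng)
        if use_scores:
            chosen = acquire_most_likely_malicious(pool, labeled, budget)
        else:
            chosen = acquire_random(pool, budget, pick_rng)
        # The "expert" reveals true labels only for the acquired files,
        # which then update the labeled set used on the following day.
        acquired = [pool[i] for i in chosen]
        labeled.extend(acquired)
        found_per_day.append(sum(y for _, y in acquired))
    return found_per_day
```

Calling run(10, 200, 20, True) and run(10, 200, 20, False) with the same seed yields per-day malware counts for the two strategies on identical synthetic pools. Any difference observed there reflects only this toy setup and says nothing about the results reported in the talk.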

The presentation will cover the following recent papers of mine:

N. Nissim, R. Moskovitch, L. Rokach, Y. Elovici, Novel Active Learning Methods for Enhanced PC Malware Detection in Windows OS, Expert Systems with Applications, http://dx.doi.org/10.1016/j.eswa. 2014.

Nir Nissim, Aviad Cohen, Chanan Glezer, Yuval Elovici, Detection of malicious PDF files and directions for enhancements: A state-of-the art survey, Computers & Security, Volume 48, February 2015, Pages 246-266, ISSN 0167-4048, http://dx.doi.org/10.1016/j.cose.2014.10.014. http://www.sciencedirect.com/science/article/pii/S0167404814001606

N. Nissim, Aviad Cohen, R. Moskovitch, Y. Elovici, ALPD: Active Learning Framework for Enhancing the Detection of Malicious PDF Files, The first joint International and European conference on Intelligence and Security Informatics (IEEE ISI & EISIC), The Hague, the Netherlands (2014).

Nir Nissim, Robert Moskovitch, Lior Rokach, Yuval Elovici, "Detecting Unknown Computer Worm Activity via Support Vector Machines and Active Learning", Pattern Analysis and Applications (2012) 15:459–475

Nir Nissim, Mary Regina Boland, Robert Moskovitch, Nicholas P Tatonetti, Yuval Elovici, Yuval Shahar, George Hripcsak, "An Active Learning Framework for Efficient Condition Severity Classification". Conference of Artificial Intelligence in Medicine 2015 (AIME-15).

Time and Place

Friday, May 15, 2:00pm
Gates 463