Pros and cons of AI and machine learning in antivirus software
When it comes to antivirus software, some vendors call machine learning a silver bullet against malware, but how much truth is there in these claims? In this article, we look at how machine learning is used in antivirus software and whether it really is the perfect security solution.
How does machine learning work?
In the antivirus industry, machine learning is generally used to improve the detection capabilities of a product. While traditional detection technology relies on coding rules to detect malicious patterns, machine learning algorithms create a mathematical model based on data samples to predict whether a file is “good” or “bad”.
Simply put, this means using an algorithm to analyze observable data points from two manually created data sets: one containing only malicious files and one containing only non-malicious files.
The algorithm then develops rules that allow it to distinguish good files from bad files, without being told which patterns or data points to look for. A data point is any unit of information linked to a file, including the internal structure of the file, the compiler used, the text resources compiled into the file, and much more.
The algorithm continues to calculate and optimize its model until it arrives at an accurate detection system that classifies as few good programs as bad, and as few bad programs as good, as possible. It develops the model by adjusting the weight, or importance, of each data point. With each iteration, the model gets slightly better at telling malicious and non-malicious files apart.
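The weight-adjusting loop described above can be sketched in a few lines of Python. This is only a toy illustration, not how any particular antivirus product works: the two data points (entropy_score and suspicious_import_ratio), the six samples, and the choice of logistic regression with gradient descent are all assumptions made for the example.

```python
import math

# Hypothetical toy data set: each sample is
# ((entropy_score, suspicious_import_ratio), label), with both data points
# scaled to 0..1. Label 1 = malicious, 0 = non-malicious.
SAMPLES = [
    ((0.9, 0.8), 1),
    ((0.8, 0.9), 1),
    ((0.7, 0.7), 1),
    ((0.2, 0.1), 0),
    ((0.3, 0.2), 0),
    ((0.1, 0.3), 0),
]


def predict(weights, bias, features):
    """Return the model's estimated probability that a file is malicious."""
    z = bias + sum(w * x for w, x in zip(weights, features))
    return 1.0 / (1.0 + math.exp(-z))


def train(samples, iterations=2000, learning_rate=0.5):
    """Adjust the weight of each data point a little on every iteration."""
    weights = [0.0] * len(samples[0][0])
    bias = 0.0
    for _ in range(iterations):
        for features, label in samples:
            # Positive error: the model was too suspicious; negative: too lenient.
            error = predict(weights, bias, features) - label
            weights = [w - learning_rate * error * x
                       for w, x in zip(weights, features)]
            bias -= learning_rate * error
    return weights, bias


weights, bias = train(SAMPLES)
print("learned weights:", weights)
print("unseen suspicious file:", predict(weights, bias, (0.85, 0.75)))
print("unseen harmless file:", predict(weights, bias, (0.15, 0.2)))
```

Nobody tells the model which of the two data points matters more; the weights fall out of the training data. Real products work with far more data points and far larger data sets.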
Machine learning can help detect new malware
Machine learning enables antivirus software to identify new threats without relying on signatures. In the past, antivirus software relied heavily on signatures, or fingerprints, comparing each file against a huge database of known malware.
The main drawback is that signature-based scanners can only detect malware that has been seen before. This is a pretty big blind spot, as hundreds of thousands of new malware variants are created every day.
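To make this blind spot concrete, here is a minimal sketch of fingerprint matching. It assumes the simplest possible kind of signature, an exact SHA-256 hash of the file; real signature engines also use byte patterns and other techniques. The payload and database are made-up stand-ins.

```python
import hashlib


def fingerprint(data: bytes) -> str:
    """Return the SHA-256 hash of a file's contents as a hex string."""
    return hashlib.sha256(data).hexdigest()


# Hypothetical stand-in for a malware sample that has been seen before.
KNOWN_MALWARE = b"pretend-malicious-payload-v1"

# Hypothetical signature database containing only that one sample.
SIGNATURE_DB = {fingerprint(KNOWN_MALWARE)}


def is_known_malware(data: bytes) -> bool:
    """Flag a file only if its exact fingerprint is in the database."""
    return fingerprint(data) in SIGNATURE_DB


# A "new variant": the same payload with a single extra byte appended.
variant = KNOWN_MALWARE + b"\x00"

print(is_known_malware(KNOWN_MALWARE))  # True
print(is_known_malware(variant))        # False
```

A one-byte change produces a completely different fingerprint, so the variant slips through until its own signature is added to the database.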
Machine learning, on the other hand, can be trained to recognize the signs of good and bad files, and can therefore spot malicious patterns and detect malware whether it has been seen before or not.
One of the main weaknesses of machine learning is that it doesn’t understand the implications of the model it creates; it simply builds it. It uses the most efficient, mathematically proven method to process data and make decisions.
As already mentioned, the algorithm is fed millions of data points without anyone explicitly saying which data points are indicators of malware. The machine learning model must discover this for itself.
The result is that no one can really know which data points the machine learning model treats as signs of a threat. It could be a single data point or a specific combination of 20 data points. A motivated attacker could work out how the model uses these parameters to identify a threat and use that knowledge to their advantage. Editing one particular, seemingly irrelevant data point in a malicious file may be enough for the model to consider the malware safe, which undermines the entire model.
To resolve the problem, the vendor must add the manipulated file to the training data set and recalculate the entire model, which can take days or weeks. Unfortunately, that still wouldn’t solve the underlying problem: even after the model has been rebuilt, it would only be a matter of time before the attacker found another data point, or combination of data points, that could fool the machine learning system.
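The following sketch shows how such an evasion might look against a deliberately tiny model. Everything in it is hypothetical: the three data points, their weights, and the threshold are invented for illustration, and a real model would be far larger. The attacker in this example never sees the weights; they only observe whether a file is flagged.

```python
# Hypothetical weights a model might have learned during training.
# The negative weight means that, in the imagined training data, files with
# version information were almost always harmless.
WEIGHTS = {
    "high_entropy": 2.0,
    "writes_to_startup": 1.5,
    "has_version_info": -3.0,
}
BIAS = -1.0
THRESHOLD = 0.0


def score(features):
    """Weighted sum of the data points; higher means more suspicious."""
    return BIAS + sum(WEIGHTS[name] * value for name, value in features.items())


def is_flagged(features):
    """The only thing the attacker can observe: flagged or not."""
    return score(features) > THRESHOLD


def find_evasions(features):
    """Flip one data point at a time and report which flips avoid detection."""
    evasions = []
    for name, value in features.items():
        candidate = dict(features, **{name: 1 - value})
        if not is_flagged(candidate):
            evasions.append(name)
    return evasions


malware = {"high_entropy": 1, "writes_to_startup": 1, "has_version_info": 0}

print(is_flagged(malware))     # True
print(find_evasions(malware))  # ['has_version_info']
```

Adding harmless-looking version information does not change what the file does, yet it is enough to push this toy model's score below the threshold.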
A multi-level approach to cybersecurity
Machine learning is a powerful technology that could play an increasingly important role in the world of cybersecurity in the coming years. However, as mentioned above, it has its shortcomings and limitations. If you use antivirus software that relies solely on artificial intelligence or machine learning, you may be vulnerable to malware and other threats.
Solutions that combine several protection technologies are likely to offer better security than a product based entirely on artificial intelligence. For example, Emsisoft uses the power of artificial intelligence and machine learning alongside other protection technologies, such as behavior analysis and signature checks. These systems work together to double- and triple-check each other’s results to give you the best possible protection against malware.
A layered security approach helps you avoid putting all of your eggs in one basket and maximizes the likelihood that malware will be stopped before it infects your system.
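As a rough sketch of the idea, the code below runs a file through three independent layers and reports every layer that objects. It is not a description of how Emsisoft or any other product is built: the three checks are simplistic stand-ins, and all names, markers, and thresholds are assumptions made for the example.

```python
import hashlib

# Hypothetical signature database with a single known sample.
SIGNATURE_DB = {hashlib.sha256(b"known-bad-sample").hexdigest()}


def signature_check(data: bytes) -> bool:
    """Layer 1: exact fingerprint match against known malware."""
    return hashlib.sha256(data).hexdigest() in SIGNATURE_DB


def ml_check(data: bytes) -> bool:
    """Layer 2: stand-in for a trained model.

    Flags files in which more than half the bytes are non-printable.
    """
    if not data:
        return False
    non_printable = sum(1 for b in data if not 32 <= b < 127)
    return non_printable / len(data) > 0.5


def behavior_check(data: bytes) -> bool:
    """Layer 3: stand-in for behavior analysis, using a made-up marker."""
    return b"encrypt_all_files" in data


LAYERS = {
    "signature": signature_check,
    "machine_learning": ml_check,
    "behavior": behavior_check,
}


def scan(data: bytes) -> list:
    """Return the names of all layers that flag the file (empty if clean)."""
    return [name for name, check in LAYERS.items() if check(data)]


print(scan(b"hello world"))                 # []
print(scan(b"known-bad-sample"))            # ['signature']
print(scan(b"call encrypt_all_files now"))  # ['behavior']
print(scan(bytes(range(200, 256)) * 4))     # ['machine_learning']
```

Each sample here is caught by only one layer, which is the point: a file that slips past one check can still be stopped by another.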