Spotlight on security: The problem with false alarms
False positives (FPs, also known as false alarms) occur when an antivirus program incorrectly identifies a harmless, legitimate program as malicious. A false positive can have very serious consequences: in some cases, it will not be possible to run a legitimate program at all, because the security software blocks it.
This is not only frustrating for the user, but also damaging for the developer of the program that has been blocked, as nobody will trust a program that is immediately flagged as malicious, or rendered useless, by antivirus software. It can also happen that the user will waste time trying to clean their computer, even though there is no actual infection. This can be a major problem. In the worst-case scenario, a computer can be rendered unusable if the false positive is a system file that is needed to make the operating system work properly. This is fortunately quite rare, but does happen occasionally.
In April 2010, a major AV vendor released a malware-definitions file that caused the Windows system file svchost.exe in Windows XP SP3 to be incorrectly detected as malicious. Affected users found that their computers went into an endless reboot cycle. Whilst the vendor reacted quickly to replace the faulty definitions file, the case illustrated well how problematic a false positive can be. Other vendors have had similar problems too. There have been similar cases where a false positive has removed a system file and rendered the system inoperative, and others where a specific program or feature, such as the Google Chrome browser or Pegasus Mail, has been deleted. In one instance, an antivirus program even detected its own update feature as malicious, rendering itself unable to update.
False positives arise because of the constant cat-and-mouse game between antivirus vendors and malware authors, in which each side constantly tries to stay ahead of the other. Initially, some 25 years ago, antivirus programs relied on specific virus definitions to identify known malicious programs. This meant that there were few false positives, because malware programs were clearly defined. However, virus writers started developing new methods to get around this simple protection technology. Polymorphic viruses disguised themselves by changing part of their own code – this allowed them to escape detection by basic signature-based antimalware engines, whilst keeping their functionality intact. Antivirus vendors responded with generic malware-detection algorithms that identify multiple threats using a single malicious-code definition; these effectively ignore the changed parts of a polymorphic virus and match the common threat code instead. As malware authors developed malware that avoided existing generic definitions, AV manufacturers improved their heuristics in order to identify previously unseen viruses by their similarity to known malware; heuristic analysis has been supported by artificial intelligence (AI) and machine learning (ML) for many years. Further methods of blocking new malware were then developed to keep pace with the cybercriminals. These include behavioural detection, which identifies and stops potentially malicious behaviour when a program file is executed; file reputation, which checks e.g. how often a file has been downloaded or installed on other systems, and whether there are any reports of it being malicious; and URL blockers, which prevent file downloads from known malware-serving sites.
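The difference between exact and generic signatures can be sketched in a few lines of Python. The byte patterns below are purely illustrative (real engine signatures are far more sophisticated), but they show the principle: an exact signature only matches byte-identical code, while a generic definition with wildcard bytes still matches a polymorphic variant whose mutated bytes have changed.

```python
import re

# Hypothetical "signatures" for illustration only – not real malware patterns.
EXACT_SIGNATURE = b"\xde\xad\xbe\xef\x01\x02"                      # matches one known sample
GENERIC_SIGNATURE = re.compile(b"\xde\xad\xbe\xef..", re.DOTALL)   # wildcard bytes (.) tolerate mutation

def scan_exact(data: bytes) -> bool:
    """Classic signature scan: flags only byte-identical code."""
    return EXACT_SIGNATURE in data

def scan_generic(data: bytes) -> bool:
    """Generic scan: ignores the mutated bytes but still matches
    the common threat code shared by all variants."""
    return GENERIC_SIGNATURE.search(data) is not None

original = b"\x00\xde\xad\xbe\xef\x01\x02\x00"
mutated  = b"\x00\xde\xad\xbe\xef\x99\x88\x00"   # polymorphic variant: trailing bytes changed
```

Here `scan_exact` misses the mutated variant, while `scan_generic` catches both – but the looser a generic definition is made, the greater the risk that it also matches legitimate code, which is exactly how false positives creep in.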
Unfortunately, many of the methods developed for identifying unknown malware are not perfect, and can lead to false positives. For example, many legitimate programs integrate themselves into the operating system in a way that resembles malware. Encryption programs and system-restore functions, for instance, run the risk of being labelled as malware by over-zealous behaviour-blockers. AV products that block everything they have never (or only rarely) seen before, or anything not on their whitelist, can be effective at blocking malware, but at the cost of high FP rates and the consequent usability nuisances.
Because it is relatively easy for AV programs to reach high malware-detection rates by blocking any unknown programs, effective testing of antivirus software must include a test for false positives, to ensure that users are being protected against malicious programs, as opposed to having all their uncommon programs blocked. Principle 4 of the AMTSO Fundamental Principles of Testing states that “The effectiveness and performance of anti-malware products must be measured in a balanced way”. It specifically mentions that an antivirus program that identifies a high percentage of malware, but also has a high number of false positives, is not necessarily better than one which identifies fewer viruses, but also has fewer false positives. This takes into account the problems that a false positive can cause, described above.
Since August 2008, AV-Comparatives has included a false-positives test in its public tests, in order to ensure that AV programs do not reach a high malware-protection rate at the expense of a large number of FPs. In our detailed FP reports, users can see which applications produced FPs at the time of testing, which detection name was encountered, and how prevalent the observed FPs are.
Below you can find some insights (as of June 2018) into the large clean-files set, which we use to check for false alarms as part of the Consumer Malware Protection Test. When we build our clean sets for false-alarm testing, we take several factors into consideration, and our decisions and assessments are often also based on internal research and field studies that we carry out in co-operation with the University of Innsbruck. Our clean set is constantly updated: new files are added, and obsolete files are removed. One source of clean files is the systems of real-life, everyday users who have agreed to share their data with us (we work closely with some computer repair shops). Further sources are common program distributions, DVDs from computer magazines, and programs found on software download sites.
Blue figures in the graph below show the distribution of files in our clean set according to age. The orange figures show the distribution of files in the same age categories on average user systems. Users might be surprised to hear that even newly installed and fully updated systems contain many files that are several years old; many commonly used programs re-use files from earlier versions that were originally developed some years ago. The distribution below shows that in general our clean-files set leans more towards newer files than the distribution in the field.
The blue figures in the second graph show the prevalence of the files used in our clean set. Orange figures show the prevalence of the same files amongst a significant sample of real-world users. The distribution shows that in general, our clean set leans a little more towards prevalent files than does real-life distribution in the field.
Additionally, over a third of the PE (Portable Executable) files in the test set have a valid digital signature.
FPs are usually encountered in the “very low” prevalence category (fewer than 100 current users). Naturally, there are some AV vendors that would like us to use a clean set consisting entirely of very prevalent, very well-known, digitally-signed files that they already have on their whitelists and which do not generate false alarms.
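A prevalence banding like the one referred to above can be sketched as a simple classifier. Only the "very low" threshold (fewer than 100 current users) is taken from the text; the other cut-offs are illustrative assumptions, not AV-Comparatives' actual bands.

```python
def prevalence_category(user_count: int) -> str:
    """Assign a file to a prevalence band based on how many current
    users have it. Only the <100 "very low" threshold comes from the
    report; the remaining cut-offs are hypothetical."""
    if user_count < 100:
        return "very low"      # where most FPs are encountered
    if user_count < 10_000:
        return "low"
    if user_count < 1_000_000:
        return "medium"
    return "high"
```

In practice, a clean set made up only of "high"-prevalence, whitelisted files would tell users nothing about how a product treats the niche and newly released software where false alarms actually occur.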
App developers, and users who like to try out newer software in beta stages or immediately after release, know very well that AV programs can cause false alarms with such apps, making them frustrating to use. There is also the danger that if users see a lot of false positives, they will eventually start ignoring the warnings from their AV program, and so may not take a genuine alert seriously.
File reputation systems are useful in protecting computer systems against malware, but have potential to create false alarms too. If such a protection feature blocks or warns about any file that the vendor has not seen or investigated and whitelisted yet, false positives – and thus irritated users – are inevitable.
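The core of a file-reputation lookup can be illustrated with a short Python sketch. The database, verdicts, and the file contents are all hypothetical; the point is simply that a hash-based lookup inevitably returns "unknown" for any file the vendor has never seen, and a product that blocks on "unknown" will block brand-new legitimate software too.

```python
import hashlib

def file_reputation(data: bytes, db: dict) -> str:
    """Look up a file by its SHA-256 hash in a (hypothetical) cloud
    reputation database. Files the vendor has never seen come back
    as "unknown" – blocking those outright stops new malware, but
    also flags every freshly released legitimate program."""
    digest = hashlib.sha256(data).hexdigest()
    record = db.get(digest)
    return record["verdict"] if record else "unknown"

# Illustrative database containing one widely seen, whitelisted file.
popular_file = b"hello"
REPUTATION_DB = {
    hashlib.sha256(popular_file).hexdigest(): {"verdict": "clean", "users": 1_500_000},
}
```

For example, `file_reputation(popular_file, REPUTATION_DB)` returns `"clean"`, while a freshly compiled developer build, absent from the database, returns `"unknown"` and risks being treated as malicious.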
We advise users to take false positives into account when choosing security software. A vendor might quote a very high detection rate for their product in an independent test, but without an accompanying figure for false positives, a user cannot be sure that the product will be trouble-free.
The graph below shows the average number of FPs encountered in our public FP tests (as part of the Malware Detection/Protection Tests); averaged out over all the tests, there were 22 FPs per tested product.
It often happens that if one product mistakenly raises a false alarm on a legitimate file, other products start to detect the same file as well, as they copy detections from each other (a snowball effect). We are pleased to note that the online scanning service VirusTotal is helping to reduce the number of false positives by allowing developers to share their files with AV vendors.