False Alarm Test March 2024
Appendix to the Malware Protection Test
Release date | 2024-04-16
Revision date | 2024-04-11
Test Period | March 2024
Online with cloud connectivity | ✓
Update allowed | ✓
False Alarm Test included | ✓
Platform/OS | Microsoft Windows
Introduction
In AV testing, it is important to measure not only detection capabilities but also reliability. One aspect of reliability is the ability to recognize clean files as such and not to produce false alarms (false positives). No product is immune to false positives (FPs), but some produce more than others. False-positive tests measure which products do best in this respect, i.e. distinguish clean from malicious files, despite their context.

There is no complete collection of all legitimate files in existence, so no “ultimate” FP test can be done. What can reasonably be done is to create and use an independently collected set of clean files. If, using such a set, one product has e.g. 15 FPs and another only 2, it is likely that the first product is more prone to FPs than the other. This does not mean that the product with 2 FPs has no more than 2 FPs globally; it is the relative number that matters.

In our view, antivirus products should not generate false alarms on any clean files, irrespective of the number of users affected. While some antivirus vendors may downplay the risk of false alarms and exaggerate the risk of malware, we do not base product ratings solely on the supposed prevalence of false alarms. We currently tolerate a certain number of false alarms (currently 10) within our clean set before penalizing scores. Products that produce a higher number of false alarms are also more likely to trigger false alarms on more prevalent files or in other sets of clean files. The prevalence data we provide for clean files is purely informational; the listed prevalence may vary within the report, depending on factors such as which file/version triggered the false alarm or how many files of the same kind were affected.

There can be disparities in the number of false positives produced by two different programs using the same detection engine. For instance, Vendor A may license its detection engine to Vendor B, yet Vendor A’s product may exhibit more or fewer false positives than Vendor B’s product. Such discrepancies can stem from various factors, including differences in internal settings, additional or different secondary engines/signatures/whitelist databases/cloud services/quality assurance, and potential delays in making signatures available to third-party products.
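The tolerance rule mentioned above can be summarized in a few lines. The following is a hedged sketch, not AV-Comparatives’ actual scoring code: it only checks whether a product exceeds the stated tolerance of 10 false alarms in the clean set; the size of any penalty beyond that threshold is not specified in this report, so the helper name and behaviour are assumptions for illustration.

```python
# Hedged illustration of the stated tolerance rule: up to 10 false alarms
# in the clean set are tolerated before a product's score is penalized.
FP_TOLERANCE = 10  # threshold stated in the introduction


def exceeds_fp_tolerance(false_alarm_count: int, tolerance: int = FP_TOLERANCE) -> bool:
    """Return True if the product has more false alarms than are tolerated."""
    return false_alarm_count > tolerance


# Example: a product with 15 FPs exceeds the tolerance; one with 2 FPs does not.
assert exceeds_fp_tolerance(15) and not exceeds_fp_tolerance(2)
```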
Tested Products
Test Procedure
In order to give the user more information about the false alarms, we try to rate their prevalence. Files which are digitally signed are considered more important. Therefore, a file with the lowest prevalence level (Level 1) and a valid digital signature is upgraded to the next level (e.g. prevalence Level 2). Extinct files which, according to several telemetry sources, had zero prevalence were provided to the vendors so that they can fix them, but were removed from the set and not counted as false alarms.
The prevalence is rated in five colour-coded levels:
Level | Presumed number of affected users | Comments
---|---|---
1 | Probably fewer than a hundred users | Individual cases, old or rarely used files, very low prevalence
2 | Probably several hundreds of users | The initial distribution of such files was probably much higher, but current usage on actual systems is lower (despite the files still being present); this is why even well-known software may now have a prevalence of only some hundreds or thousands of users.
3 | Probably several thousands of users |
4 | Probably several tens of thousands (or more) of users |
5 | Probably several hundreds of thousands or millions of users | Such cases are less likely to be seen in a false alarm test done at a specific point in time, as such files are usually either whitelisted or would be noticed and fixed very quickly.
Most false alarms will (hopefully) fall into the first two levels most of the time.
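As a minimal sketch of the rating rules described above (the function and parameter names are hypothetical, not part of AV-Comparatives’ tooling): a Level-1 file with a valid digital signature is upgraded by one level, and extinct files with zero prevalence according to telemetry are removed from the set rather than counted.

```python
# Illustrative sketch of the prevalence rules described above.
PREVALENCE_LABELS = {
    1: "probably fewer than a hundred users",
    2: "probably several hundreds of users",
    3: "probably several thousands of users",
    4: "probably several tens of thousands (or more) of users",
    5: "probably several hundreds of thousands or millions of users",
}


def reported_prevalence(level: int, digitally_signed: bool,
                        zero_telemetry_prevalence: bool = False) -> int | None:
    """Return the prevalence level used in the report, or None if the file
    is excluded (extinct file reported to the vendor but not counted)."""
    if zero_telemetry_prevalence:
        return None
    if level == 1 and digitally_signed:
        return 2  # signed low-prevalence files are upgraded to the next level
    return level


# Example: a rarely used but validly signed tool is reported as Level 2.
assert reported_prevalence(1, digitally_signed=True) == 2
```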
False positives (FPs) are a critical measurement for assessing antivirus quality. Moreover, such testing is necessary to prevent vendors from optimizing products solely to perform well in tests. Hence, false alarms are assessed and tested in the same manner as malware tests. A single FP reported by a customer can trigger a significant amount of engineering and support work to resolve the issue, and an FP can sometimes even result in data loss or system unavailability. Even seemingly insignificant FPs (or FPs on older applications) warrant attention, because they may still indicate underlying issues in the product that could potentially cause FPs on more significant files. Below, you will find information about the false alarms observed in our independent set of clean files. Entries highlighted in red denote false alarms on files that were digitally signed.
Testcases
All listed false alarms were encountered at the time of testing. False alarms caused by unencrypted data blocks in antivirus-related files were not counted. If a product had several false alarms belonging to the same application, they are counted here as only one false alarm. Cracks, keygens, and other highly questionable tools, as well as FPs on files distributed/shared primarily by the vendors themselves (which may number in the thousands) or by other non-independent sources, are not counted here as false positives.
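The counting rules above can be expressed as a short sketch. The field names and helper below are assumptions made for illustration only: several false alarms belonging to the same application count as a single false alarm per product, and excluded cases (cracks/keygens, files from non-independent sources, AV-related unencrypted data blocks) are not counted at all.

```python
# Hedged sketch of the counting rules stated above, using assumed field names.
from dataclasses import dataclass


@dataclass(frozen=True)
class FalseAlarmReport:
    product: str
    application: str        # application the falsely flagged file belongs to
    excluded: bool = False  # crack/keygen, non-independent source, AV data block, ...


def count_false_alarms(reports: list[FalseAlarmReport]) -> dict[str, int]:
    """Count at most one false alarm per (product, application) pair."""
    counted: set[tuple[str, str]] = set()
    for report in reports:
        if not report.excluded:
            counted.add((report.product, report.application))
    totals: dict[str, int] = {}
    for product, _application in counted:
        totals[product] = totals.get(product, 0) + 1
    return totals


# Example: two FPs on files of the same application count once for that product.
example = [
    FalseAlarmReport("Product X", "AppFoo"),
    FalseAlarmReport("Product X", "AppFoo"),
    FalseAlarmReport("Product X", "AppBar", excluded=True),
]
assert count_false_alarms(example) == {"Product X": 1}
```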
Sometimes, a few vendors attempt to dispute why some clean or non-malicious software/files are blocked or detected. Explanations may include:

- the software being unknown or too new and still awaiting whitelisting;
- detection of non-current/old versions because a newer version of the software is available;
- limited usage within their own userbase;
- the complete absence of user reports about the false positive (thus suggesting, to them, that the false positive does not exist);
- bugs in the clean software (e.g. an application crashing under certain circumstances);
- errors or missing information in the End User License Agreement making the software illegal in some countries (such as a missing/unclear disclosure of data transmission);
- subjective user-interface usability issues (e.g. no option to close the program from the system tray);
- the software being available only in specific languages (e.g. Chinese);
- the assumption that the file must be malware because other vendors detect it according to a multiscanning service (copycat behaviour that we unfortunately observe increasingly often);
- issues with unrelated software from the same vendor/distributor many years ago.

If such rules were applied consistently, almost every clean piece of software would be flagged as malware at some point. Dispute reasons like these often lack validity and are therefore rejected. Antivirus products could enhance user control and understanding by offering options such as filtering based on language or EULA validity, and by providing clear explanations for detections rather than a blanket classification as malware. This would empower users to manage and understand detection reasons more effectively.

Ultimately, it is not about which specific file is misclassified, but that it is misclassified at all. Achieving a high malware-detection score is easy if it is done with lax signatures/heuristics at the expense of false positives. Although we list the prevalence of the affected files, the same detection rules that cause FPs on some rare files can just as well cause a major FP incident if the detection signatures/heuristics are not properly fixed/adapted.
Test Results
The detection names presented were primarily obtained from pre-execution scan logs, where available. If a threat was blocked during or after execution, or if no clear detection name was identified, we indicate “Blocked” in the “Detected as” column.
Copyright and Disclaimer
This publication is Copyright © 2024 by AV-Comparatives ®. Any use of the results, etc. in whole or in part, is ONLY permitted after the explicit written agreement of the management board of AV-Comparatives prior to any publication. AV-Comparatives and its testers cannot be held liable for any damage or loss, which might occur as result of, or in connection with, the use of the information provided in this paper. We take every possible care to ensure the correctness of the basic data, but a liability for the correctness of the test results cannot be taken by any representative of AV-Comparatives. We do not give any guarantee of the correctness, completeness, or suitability for a specific purpose of any of the information/content provided at any given time. No one else involved in creating, producing or delivering test results shall be liable for any indirect, special or consequential damage, or loss of profits, arising out of, or related to, the use or inability to use, the services provided by the website, test documents or any related data.
For more information about AV-Comparatives and the testing methodologies, please visit our website.
AV-Comparatives
(April 2024)