File Detection Test February 2011
of Malicious Software including false alarm test
|Test Period||February 2011|
|Number of Testcases||403543|
|Online with cloud connectivity|
|False Alarm Test included|
The File Detection Test is one of the most deterministic factors in evaluating the effectiveness of an anti-virus engine. These test reports are released twice a year and include a false alarm test. For further details please refer to the methodology documents as well as the information provided on our website. In this test, the following 20 up-to-date Security Products were tested using 403543 prevalent malware samples.
- PC Tools Spyware Doctor with AV 8.0Build: 188.8.131.524
Each test system runs Microsoft Windows XP SP3 with the respective security product installed, which was last updated on the 10th of February 2011. The malware sets were frozen on the 22nd of February 2011. All products had Internet/cloud access during the test and were tested using default settings. To ensure that all file recognition capabilities were used, we enabled scanning of all files, scanning of archives and scanning for PUA in all products.
On each test system the malware set is scanned. The detections made by the security product are noted and analysed. Although no samples were executed during this test, we considered cases where malware would be recognized on-access, but not on-demand. The test is thus called File Detection Test (as opposed to the earlier On-Demand Tests), as on-access scanning is taken into consideration.
Please note: Several products make use of cloud technologies, which require an active internet connection. Our test is performed using an active internet connection. Although we no longer show the baseline detection rates without cloud, and instead show only the results with active cloud, users should be aware that detection rates may in a few cases be lower if the scan is performed while offline. The cloud should be considered an additional benefit/feature to increase detection rates (as well as response times and false alarm suppression), not a full replacement for local offline detections. Vendors should make sure that users are warned if connectivity to the cloud is lost, e.g. during a scan, as this may considerably affect the protection provided and make, for example, the initiated scan useless. We have seen that products which rely heavily on the cloud may perform better in detecting PE malware, while scoring lower in detecting malware in non-PE formats, such as that present in the “other malware/viruses” category. AMTSO has a rudimentary test to verify the proper functionality of cloud-supported products.
The test-set was built by consulting telemetry data, with the aim of including prevalent malware samples from the weeks and months prior to the test date that are or were endangering users in the field. It consisted of 403543 samples. Furthermore, the distribution of families in the test-set was weighted by family prevalence, based on Microsoft’s global telemetry data. This means that the more prevalent a malware family is, the more samples from that family are included in the test-set.
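The prevalence weighting described above can be illustrated with a minimal sketch. It assumes simple proportional allocation with largest-remainder rounding; the function name, the family names and the prevalence figures are hypothetical and are not taken from the actual test-set construction.

```python
def allocate_samples(prevalence, total_samples):
    """Split total_samples across families in proportion to their prevalence.

    Assumption: plain proportional allocation. The largest-remainder method
    is used so that the per-family counts sum exactly to total_samples.
    """
    total_weight = sum(prevalence.values())
    exact = {fam: total_samples * w / total_weight for fam, w in prevalence.items()}
    counts = {fam: int(share) for fam, share in exact.items()}
    leftover = total_samples - sum(counts.values())
    # Give the remaining samples to the families with the largest fractional parts.
    by_remainder = sorted(exact, key=lambda fam: exact[fam] - counts[fam], reverse=True)
    for fam in by_remainder[:leftover]:
        counts[fam] += 1
    return counts


# Hypothetical prevalence figures, for illustration only.
hypothetical_prevalence = {"family_a": 50, "family_b": 30, "family_c": 20}
quota = allocate_samples(hypothetical_prevalence, 1000)
```

With these made-up weights, the most prevalent family receives the largest share of the 1000 sample slots.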
Hierarchical Cluster Analysis
This dendrogram shows the results of the cluster analysis. It indicates at what level of similarity the clusters are joined. The red line drawn across the dendrogram defines the level of similarity. Each intersection indicates a group.
As announced in previous reports (and already applied in the Whole-Product Dynamic Test of 2010), this year the awards for this test are given as follows: The total detection rates (with two decimal places) are grouped by the testers after looking at the clusters built with the hierarchical clustering method. The false alarms are taken into account as usual (and may be applied more strictly in future for “very many” and “crazy many”), but we are considering changing the FP rating.
By using clusters, there are no longer fixed thresholds to reach, as the thresholds change based on the various results. The testers may group the clusters rationally rather than rely solely on the clusters, so that if, for example, all products were to score badly in future, they would not receive high rankings anyway.
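To make the idea of grouping detection rates by clusters more concrete, here is a minimal sketch. It assumes single-linkage clustering on one-dimensional data, which reduces to splitting the sorted rates wherever the gap between neighbours exceeds a cut distance. The report does not state which linkage or cut level was used, so the function, the cut distance and all product names and rates below are hypothetical.

```python
def cluster_rates(rates, cut_distance):
    """Group products whose sorted detection rates are chained by small gaps.

    Assumption: single-linkage clustering. In one dimension, cutting a
    single-linkage dendrogram at cut_distance gives the same groups as
    splitting the sorted values at every gap larger than cut_distance.
    """
    ordered = sorted(rates.items(), key=lambda item: item[1], reverse=True)
    clusters = []
    for name, rate in ordered:
        # Join the current cluster if this rate is close to its last member.
        if clusters and clusters[-1][-1][1] - rate <= cut_distance:
            clusters[-1].append((name, rate))
        else:
            clusters.append([(name, rate)])
    return [[name for name, _ in cluster] for cluster in clusters]


# Hypothetical detection rates in percent, for illustration only.
hypothetical_rates = {
    "product_a": 99.8,
    "product_b": 99.5,
    "product_c": 97.9,
    "product_d": 97.6,
    "product_e": 93.0,
}
groups = cluster_rates(hypothetical_rates, cut_distance=1.0)
```

Because the groups depend on the gaps between results rather than on fixed percentages, the same detection rate can land in a different group from one test to the next.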
Users who prefer the old award system can apply the rating system based on fixed percentages themselves, but should keep in mind that the test-set is just a subset and not an absolute set, so fluctuations of single detection rates in different tests should not be overrated. Users can look at the numbers to compare the detection rates of different products within a specific test over a set of malware.
(given by the testers after consulting statistical methods)
|Very few (0-2 FPs)|
|Few (3-15 FPs)|
|Many (16-100 FPs)|
|Very many (101-500 FPs)|
|Crazy many (over 500 FPs)|
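The false alarm ranges listed above can be expressed as a small lookup. This is only an illustrative sketch; the function name is hypothetical, and the thresholds are copied from the ranges above.

```python
def false_alarm_rating(fp_count):
    """Map a false positive count to the rating ranges listed in this report."""
    if fp_count < 0:
        raise ValueError("fp_count must not be negative")
    if fp_count <= 2:
        return "Very few"
    if fp_count <= 15:
        return "Few"
    if fp_count <= 100:
        return "Many"
    if fp_count <= 500:
        return "Very many"
    return "Crazy many"


# Example: a product with 12 false alarms falls into the "Few" range.
rating_for_12 = false_alarm_rating(12)
```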
As nowadays Windows viruses, Macro viruses and scripts are only a small group compared to the prevalent number of Trojans, backdoors, worms, etc., those subgroups are no longer listed separately. They are now counted in the group “other malware/viruses”, together with Rootkits, Exploits, etc.
The test-set used contained 403543 recent/prevalent samples from the last few weeks/months. We estimate the remaining error margin on the final percentages to be below 0.2%.
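As a rough illustration of how a total detection rate with two decimal places relates to the number of missed samples, consider the sketch below. The function name and the missed-sample count in the example are hypothetical. Reading the error margin as a share of the test-set is likewise an assumption made only to give a sense of scale.

```python
TEST_SET_SIZE = 403543  # number of samples stated in this report


def detection_rate(missed, total):
    """Return the detection rate in percent, rounded to two decimal places."""
    if not 0 <= missed <= total:
        raise ValueError("missed must be between 0 and total")
    return round(100 * (total - missed) / total, 2)


# Hypothetical example: 25 missed samples out of 1000.
example_rate = detection_rate(missed=25, total=1000)

# Assumption: if a 0.2% margin is read as a share of the whole test-set,
# it corresponds to roughly this many samples.
margin_in_samples = round(TEST_SET_SIZE * 0.2 / 100)
```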
Total detection rates (clustered in groups)
Please consider also the false alarm rates when looking at the file detection rates below.
Graph of missed samples (lower is better)
The graph below shows the test results against “out-of-box” Malware detection provided by Microsoft Defender, highlighted as the baseline.
The results of our on-demand tests are usually applicable also for the on-access scanner (if configured the same way), but not for on-execution protection technologies (like HIPS, behaviour blockers, etc.).
A good detection rate is still one of the most important, deterministic and reliable features of an Anti-Virus product. Additionally, most products provide at least some kind of HIPS, behaviour-based, reputation-based or other functionality to block (or at least warn about the possibility of) malicious actions, e.g. during the execution of malware, when all other on-access and on-demand detection/protection mechanisms have failed.
Please do not miss the second part of the report (it will be published in a few months) containing the retrospective test, which evaluates how good products are at detecting new/unknown malware.
Even if we deliver various tests and show different aspects of Anti-Virus software, users are advised to evaluate the software by themselves and form their own opinion about it. Test data or reviews just provide guidance on some aspects that users cannot evaluate by themselves. We suggest and encourage readers to also research other independent test results provided by various well-known and established independent testing organizations, in order to get a better overview of the detection and protection capabilities of the various products over different test scenarios and various test-sets.
Scanning Speed Test
Anti-Virus products have different scanning speeds for various reasons. Scanning speed has to be weighed against how reliable the detection rate of an Anti-Virus product is. It also depends on whether the product uses code emulation, whether it queries cloud data, whether it performs a deep heuristic analysis and an active rootkit scan, how deep and thorough its unpacking and unarchiving support is, whether it runs additional security scans, whether it really scans all file types (or uses e.g. whitelists in the cloud), etc.
Most products have technologies to decrease scan times on subsequent scans by skipping previously scanned files. As we want to know the scan speed (when files are really scanned for malware) and not the file-skipping speed, those fingerprinting technologies are disabled and not taken into account here. In our opinion some products should inform users more clearly about performance-optimized scans and then let users decide whether they prefer a short performance-optimized scan (which does not re-check all files, with the potential risk of overlooking infected files!) or a full-security scan.
The following graph shows the throughput rate in MB/sec (higher is faster) of the various Anti-Virus products when scanning (on-demand), with the highest settings, our whole set of clean files (used for the false alarm testing). The scanning throughput rate will vary based on the set of clean files, the settings and the hardware used.
The average scanning throughput rate (scanning speed) is calculated as the size of the clean-set in MB divided by the time needed to finish the scan in seconds. The scanning throughput rate of this test cannot be compared with future tests or with other tests, as it varies with the set of files, the hardware used, etc. The scanning speed tests were done under Windows XP SP3, on identical machines with Intel Core 2 Duo E8300/2.83GHz, 2GB RAM and SATA II disks. In 2012 we will probably no longer provide the on-demand scanning speed test inside the on-demand detection report.
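The throughput calculation described above amounts to a single division. The sketch below shows it with a hypothetical clean-set size and scan time; neither figure comes from this test.

```python
def scan_throughput_mb_per_sec(clean_set_mb, scan_seconds):
    """Average scanning throughput: clean-set size in MB divided by scan time in seconds."""
    if scan_seconds <= 0:
        raise ValueError("scan_seconds must be positive")
    return clean_set_mb / scan_seconds


# Hypothetical figures: a 30000 MB clean-set scanned in 2400 seconds.
example_throughput = scan_throughput_mb_per_sec(clean_set_mb=30000, scan_seconds=2400)
```

Because both inputs depend on the file set and the hardware, a value computed this way is only meaningful for comparing products within the same test run.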
False Positive (False Alarm) Test Result
In order to better evaluate the quality of the detection capabilities of anti-virus products (i.e. how well they distinguish good files from malicious files), we also provide a false alarm test. False alarms can sometimes cause as much trouble as a real infection. Please consider the false alarm rate when looking at the detection rates, as a product which is prone to false alarms achieves higher scores more easily. All discovered false alarms were reported and sent to the respective Anti-Virus vendors and should by now have been fixed.
|1.||McAfee||0||very few FPs|
|3.||Bitdefender, eScan, F-Secure||3||few FPs|
|8.||Kaspersky Lab, Trustport||12||few FPs|
|11.||G DATA, Panda||18||many FPs|
|15.||Qihoo||104||very many FPs|
The graph below shows the number of false alarms found in our set of clean files by the tested Anti-Virus products.
Details about the discovered false alarms (including their assumed prevalence) can be seen in a separate report available at: http://www.av-comparatives.org/wp-content/uploads/2012/04/avc_fps_201102_en.pdf
A product that is successful at detecting a high percentage of malicious files but suffers from false alarms is not necessarily better than a product which detects fewer malicious files but generates fewer false alarms.
The following chart shows the combined file detection rates and false alarms.
Award levels reached in this File Detection Test
AV-Comparatives provides a 3-level ranking system (STANDARD, ADVANCED and ADVANCED+). As this report also contains the raw detection rates and not only the awards, expert users who, for example, do not care about false alarms can rely on the detection score alone if they want to.
* these products got lower awards due to false alarms
Information about additional third-party engines/signatures used inside the products: eScan, F-Secure and Qihoo 360 are based on the Bitdefender engine. G DATA is based on the Avast and Bitdefender engines. PC Tools uses the signatures of Symantec. Trustport is based on the AVG and Bitdefender engines. Webroot is based on the Sophos engine.
Even if we deliver various tests and show different aspects of anti-virus software, users are advised to evaluate the software by themselves and form their own opinions about them. Test data or reviews just provide guidance on some aspects that users cannot evaluate by themselves. We encourage readers to additionally consult other independent test results provided by various well-known and established independent testing organizations, in order to get a better overview about the detection and protection capabilities of the various products over different test scenarios and various test-sets. A list of various reputable testing labs can be found on our website.
Nowadays, almost all products run with the highest protection settings by default (at least during on-demand/scheduled scans); some, however, may automatically switch to the highest settings once infections begin to be detected. Because of this, and in order to ensure comparable results, we tested all products with the highest settings unless explicitly advised otherwise by the security vendors. Vendors may advise this because they prefer the highest settings not to be used due to a high number of false alarms, because the highest settings have a performance impact, or because they plan to change or remove the setting in the near future. Below are some notes about the settings used (scan all files etc. is always enabled), e.g. where the settings are not set to the highest by default:
- Avast, AVIRA, Kaspersky, Symantec: asked to be tested with heuristics set to high/advanced. For this reason, we recommend that users also consider setting the heuristics to high/advanced.
- F-Secure, Sophos: asked to get tested and awarded based on their default settings (i.e. without using their advanced heuristics / suspicious detections setting).
- AVG, AVIRA: asked us not to enable/count the informational warnings about packers as detections. So, we did not count them as detections (neither on the malware set, nor on the clean set).
AV-Comparatives prefers to test with default settings. As most products run with the highest settings by default (or switch to the highest settings automatically when malware is found, making it impossible to test against various malware with “default” settings), we also set the few remaining products to the highest settings (or left them at lower settings) in agreement with the respective vendors, in order to get comparable results. We kindly ask vendors to provide stronger settings by default, i.e. to set their default settings to the highest levels of detection; this would make particular sense for scheduled scans and scans initiated by the user. We also kindly ask vendors to remove paranoid settings from the user interface that are too high to ever be of any benefit to normal users. As some vendors decided to take part in our tests using the stronger settings (although they know that these settings will also be applied in, and affect, other tests such as the false alarm test, performance test, etc.), it appears that the better option is to use the stronger settings by default, which is why we recommend that users consider using those settings too.
Copyright and Disclaimer
This publication is Copyright © 2011 by AV-Comparatives ®. Any use of the results, etc. in whole or in part, is ONLY permitted after the explicit written agreement of the management board of AV-Comparatives prior to any publication. AV-Comparatives and its testers cannot be held liable for any damage or loss, which might occur as result of, or in connection with, the use of the information provided in this paper. We take every possible care to ensure the correctness of the basic data, but a liability for the correctness of the test results cannot be taken by any representative of AV-Comparatives. We do not give any guarantee of the correctness, completeness, or suitability for a specific purpose of any of the information/content provided at any given time. No one else involved in creating, producing or delivering test results shall be liable for any indirect, special or consequential damage, or loss of profits, arising out of, or related to, the use or inability to use, the services provided by the website, test documents or any related data.
For more information about AV-Comparatives and the testing methodologies, please visit our website.