File Detection Test August 2011

Date	August 2011
Language	English
Last Revision	September 27th 2011

File Detection Test of Malicious Software, including false alarm test

Release date	2011-09-28
Revision date	2011-09-27
Test Period	August 2011
Number of Testcases	206043
Online with cloud connectivity
Update allowed
False Alarm Test included
Platform/OS	Microsoft Windows

Introduction

The File Detection Test assesses one of the most deterministic factors in evaluating the effectiveness of an anti-virus engine. These test reports are released twice a year and include a false alarm test. For further details, please refer to the methodology documents and the information provided on our website. In this test, the following 20 up-to-date security products were tested using 206043 prevalent malware samples.

Detailed Report

Tested Products

Test Procedure

Each test system runs Microsoft Windows XP SP3 with the respective security product installed, which was last updated on the 1st of August 2011. The malware sets were frozen on the 12th of August 2011. All products had Internet/cloud access during the test and were tested using default settings. To ensure that all file recognition capabilities were used, we enabled scanning of all files, scanning of archives and scanning for PUA (potentially unwanted applications) in all products.

On each test system the malware set is scanned. The detections made by the security product are noted and analysed. Although no samples were executed during this test, we considered cases where malware would be recognized on-access, but not on-demand. The test is thus called File Detection Test (as opposed to the earlier On-Demand Tests), as on-access scanning is taken into consideration.

Please note: Several products make use of cloud technologies, which require an active Internet connection. Our tests are performed using an active Internet connection. Users should be aware that detection rates may in some cases be drastically lower if the scan is performed while offline (or when the cloud service is unreachable for various reasons). The cloud should be considered as an additional benefit/feature to increase detection rates (as well as response times and false alarm suppression), and not as a full replacement for local offline detections. Vendors should make sure that users are appropriately warned in the event that the connectivity to the cloud is lost, which may considerably affect the protection provided, and e.g. make an initiated scan useless. While in our test we check whether the cloud services of the respective security vendors are reachable, users should be aware that being online does not necessarily mean that the cloud service of the products they use is reachable/working properly. In fact, sometimes products with cloud functionality have various network issues due to which no cloud security is provided, but the user is not warned. AMTSO has a rudimentary test to verify the proper functionality of cloud-supported products.

Testcases

The test-set was built by consulting telemetry data, with the aim of including prevalent malware samples from the weeks/months prior to the test date that are or were endangering users in the field. It consisted of 206043 samples. Furthermore, the distribution of families in the test-set was weighted by family prevalence, based on Microsoft’s global telemetry data. This means that the more prevalent a malware family is, the more samples from that family are included in the test-set.
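The prevalence weighting described above can be illustrated with a minimal sketch. It assumes a simple proportional split using the largest-remainder method; the report does not state the exact procedure used, and the family names and prevalence shares below are hypothetical, since the underlying telemetry figures are not part of this report.

```python
def allocate_samples(prevalence, total):
    """Split `total` samples across families in proportion to prevalence.

    Uses the largest-remainder method so the counts sum exactly to `total`.
    This is an illustrative assumption, not the testers' documented method.
    """
    weight_sum = sum(prevalence.values())
    exact = {fam: total * w / weight_sum for fam, w in prevalence.items()}
    counts = {fam: int(x) for fam, x in exact.items()}
    leftover = total - sum(counts.values())
    # Hand the remaining samples to the families with the largest fractional parts.
    by_remainder = sorted(exact, key=lambda fam: exact[fam] - counts[fam], reverse=True)
    for fam in by_remainder[:leftover]:
        counts[fam] += 1
    return counts


# Hypothetical prevalence shares (not taken from the report).
shares = {"FamilyA": 50, "FamilyB": 30, "FamilyC": 15, "FamilyD": 5}
allocation = allocate_samples(shares, 206043)
print(allocation)
```

With these made-up shares, the most prevalent family contributes roughly half of the 206043 samples, which is the effect the weighting is meant to achieve.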

Hierarchical Cluster Analysis

This dendrogram shows the results of the cluster analysis. It indicates the level of similarity at which the clusters are joined. The red line drawn across it marks the chosen level of similarity. Each intersection indicates a group.
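To make the idea of cutting a dendrogram at a level of similarity more concrete, here is a rough sketch of agglomerative clustering on one-dimensional detection rates. The linkage (single linkage), the distance measure (absolute difference in percentage points) and the cut level of 1.5 are all assumptions chosen for illustration; the report does not specify them. The grouping it produces is therefore not expected to match the published groups, which also reflect the testers' judgment.

```python
def cluster_rates(rates, cut):
    """Agglomerative single-linkage clustering of 1-D values.

    Repeatedly merges the two closest clusters and stops once the smallest
    gap between neighbouring clusters exceeds `cut` (the dendrogram cut level).
    """
    clusters = [[r] for r in sorted(rates, reverse=True)]
    while len(clusters) > 1:
        # In sorted 1-D data the closest pair of clusters is always adjacent.
        gaps = [clusters[i][-1] - clusters[i + 1][0] for i in range(len(clusters) - 1)]
        i = min(range(len(gaps)), key=gaps.__getitem__)
        if gaps[i] > cut:
            break
        clusters[i:i + 2] = [clusters[i] + clusters[i + 1]]
    return clusters


# Distinct detection rates (in percent) listed in this report.
rates = [99.7, 99.6, 99.5, 99.3, 98.5, 98.4, 98.3, 97.3,
         96.8, 96.6, 95.7, 95.1, 94.2, 92.3, 88.4, 85.6]
groups = cluster_rates(rates, cut=1.5)  # cut level is an arbitrary assumption
for number, group in enumerate(groups, start=1):
    print(number, group)
```

Raising or lowering `cut` corresponds to moving the line up or down the dendrogram: a higher cut yields fewer, broader groups, and a lower cut yields more, narrower ones.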

Ranking System

The malware detection rates are grouped by the testers after looking at the clusters built with the hierarchical clustering method. However, the testers do not stick rigidly to this in cases where it would not make sense. For example, in a scenario where all products achieve low detection rates, the highest-scoring ones will not necessarily receive the highest possible award.

	Detection rate clusters/groups (assigned by the testers after consulting statistical methods)
False alarms	4	3	2	1
Very few (0-2 FPs) or few (3-15 FPs)	TESTED	STANDARD	ADVANCED	ADVANCED+
Many (16-100 FPs)	TESTED	TESTED	STANDARD	ADVANCED
Very many (101-500 FPs)	TESTED	TESTED	STANDARD	STANDARD
Crazy many (over 500 FPs)	TESTED	TESTED	TESTED	TESTED
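
The award matrix above can be read as a simple lookup. The following sketch encodes it directly; the function and variable names are made up for illustration, and the result is only the default outcome, since the testers state that they do not apply the scheme rigidly.

```python
# Awards per false-alarm category, ordered by detection cluster 4, 3, 2, 1
# (same column order as the table).
AWARDS = {
    "few":        ("TESTED", "STANDARD", "ADVANCED", "ADVANCED+"),
    "many":       ("TESTED", "TESTED", "STANDARD", "ADVANCED"),
    "very many":  ("TESTED", "TESTED", "STANDARD", "STANDARD"),
    "crazy many": ("TESTED", "TESTED", "TESTED", "TESTED"),
}


def fp_category(false_positives):
    """Map a false-alarm count to the row of the award table."""
    if false_positives <= 15:
        return "few"  # covers both "very few" (0-2) and "few" (3-15)
    if false_positives <= 100:
        return "many"
    if false_positives <= 500:
        return "very many"
    return "crazy many"


def award(cluster, false_positives):
    """Return the default award for a detection cluster (1 = best, 4 = worst)."""
    if cluster not in (1, 2, 3, 4):
        raise ValueError("cluster must be 1, 2, 3 or 4")
    return AWARDS[fp_category(false_positives)][4 - cluster]


print(award(1, 14))  # ADVANCED+
print(award(1, 59))  # ADVANCED
print(award(3, 45))  # TESTED
```

For example, a product in the best detection cluster drops from ADVANCED+ to ADVANCED once it exceeds 15 false alarms.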

Test Results

The test-set used contained 206043 recent/prevalent samples from the last few weeks/months. We estimate the remaining error margin on the final percentages to be below 0.2%.

During the test, Sophos and Webroot (which also uses Sophos technology) scored lower detection rates due to issues with their cloud technology. The results shown include estimated cloud detections (but are listed out of competition).

Total detection rates (clustered in groups)

Please consider also the false alarm rates when looking at the file detection rates below.

1.	G DATA	99.7%
2.	Trustport	99.6%
3.	Avira, Qihoo	99.5%
4.	Panda	99.3%
5.	eScan, F-Secure	98.5%
6.	Bitdefender	98.4%
7.	Kaspersky	98.3%
8.	Avast, ESET	97.3%
9.	McAfee	96.8%
10.	Trend Micro	96.6%
11.	AVG	95.7%
12.	Symantec	95.1%

13.	Sophos, Webroot	94.2%
14.	Microsoft	92.3%

15.	PC Tools	88.4%

16.	K7	85.6%

We observed that a few vendors are potentially trying to game the tests to get higher scores. Such practices include, for example, strongly disputing malicious files as “clean” or “potentially wanted software”. Some try to dispute every malicious file that their own product does not detect as “unimportant/non-prevalent”, even if other telemetry data shows otherwise. Some vendors also claim that their cloud should have detected all the samples that, according to us, were not detected. Often we could prove that even with a current cloud/product the samples are not detected, or could not have been detected at the time. In our opinion, vendors which rely on the cloud should make sure that their products are always able to send/receive cloud data, and should warn the user if their cloud is offline/unreachable. If a vendor’s cloud is down or unreachable at the time of testing, it is the fault of the product/vendor and not of the user or the test, as long as the test is done with an enabled Internet connection. The results reflect the detection rate provided at that time. If certain clouds require a perfectly stable and ultra-fast Internet connection, this should be made clear in the system requirements. Otherwise, vendors should provide local clouds to home users, as some already do for corporate customers with strict privacy policies.
Furthermore, some vendors that see themselves scoring low in a test try to get their results removed for marketing reasons. However, we do not allow vendors to withdraw from tests, as we want to provide results to our readers. We may in future consider ways to solve these problems too.

Graph of missed samples (lower is better)

Percentages refer only to the test-set used. Even though it is just a subset of all malware, it is important to look at the number of missed samples.
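
As a rough illustration of what the percentages mean in absolute terms, the number of missed samples can be estimated from a detection rate and the size of the test-set. This is only an approximation: the published rates are rounded to one decimal place, so with 206043 samples each estimate may be off by roughly a hundred samples. The function name is hypothetical.

```python
TOTAL_SAMPLES = 206043  # size of the test-set stated in this report


def missed_samples(detection_rate_percent, total=TOTAL_SAMPLES):
    """Estimate how many samples were missed at a given detection rate."""
    return round(total * (100 - detection_rate_percent) / 100)


print(missed_samples(99.7))  # 618
print(missed_samples(85.6))  # 29670
```

The gap between the highest and lowest listed rates (99.7% and 85.6%) thus corresponds to a difference of about 29000 missed samples.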

The results of our on-demand tests are usually applicable also for the on-access scanner (if configured the same way), but not for on-execution protection technologies (like HIPS, behaviour blockers, etc.).

A good detection rate is still one of the most important, deterministic and reliable features of an Anti-Virus product. Additionally, most products provide at least some kind of HIPS, behaviour-based, reputation-based or other functionality to block (or at least warn about the possibility of) malicious actions, e.g. during the execution of malware, when all other on-access and on-demand detection/protection mechanisms have failed.

Please do not miss the second part of the report (to be published in a few months), which contains the retrospective test. It evaluates how good products are at detecting completely new/unknown malware (using the on-demand/on-access scanner with local heuristics and generic signatures, without cloud).

Even if we deliver various tests and show different aspects of Anti-Virus software, users are advised to evaluate the software by themselves and form their own opinions about it. Test data or reviews just provide guidance on some aspects that users cannot evaluate by themselves. We encourage readers to also research other independent test results provided by various well-known and established independent testing organizations, in order to get a better overview of the detection and protection capabilities of the various products across different test scenarios and test-sets.

On-Demand Scanning Speed Test

Anti-Virus products have different scanning speeds for various reasons. When comparing them, the following has to be taken into account: how reliable the detection rate of the product is; whether it uses code emulation; whether it queries cloud data; whether it performs a deep heuristic scan analysis and an active rootkit scan; how deep and thorough its unpacking and unarchiving support is; whether it performs additional security scans; whether it really scans all file types (or uses e.g. whitelists in the cloud); and so on. Most products have technologies to decrease scan times on subsequent scans by skipping previously scanned files. As we want to know the scan speed (when files are really scanned for malware) and not the speed of skipping files, those fingerprinting technologies are disabled and not taken into account here. In our opinion, some products should inform users more clearly about performance-optimized scans and then let users decide whether they prefer a short performance-optimized scan (which does not re-check all files, with the potential risk of overlooking infected files!) or a full-security scan.
The following graph shows the throughput rate in MB/sec (higher is faster) of the various Anti-Virus products when scanning our whole set of clean files (the set used for the false alarm testing) on-demand with highest settings. The scanning throughput rate will vary based on the set of clean files, the settings and the hardware used.

The average scanning throughput rate (scanning speed) is calculated as the size of the clean-set in MB divided by the time needed to finish the scan in seconds. The scanning throughput rate of this test cannot be compared with future tests or with other tests, as it varies with the set of files, the hardware used, etc. The scanning speed tests were done under Windows XP SP3, on identical machines with an Intel Core 2 Duo E8300 (2.83 GHz), 2 GB of RAM and SATA II disks.
In 2012 we will no longer provide the on-demand scanning speed test.
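
The throughput calculation described above amounts to a single division. A minimal sketch follows; the clean-set size and scan time are hypothetical values, as the report does not list them.

```python
def throughput_mb_per_sec(clean_set_mb, scan_seconds):
    """Average scanning throughput: clean-set size in MB over scan time in seconds."""
    if scan_seconds <= 0:
        raise ValueError("scan time must be positive")
    return clean_set_mb / scan_seconds


# Hypothetical figures: a 30000 MB clean-set scanned in 2500 seconds.
print(throughput_mb_per_sec(30000, 2500))  # 12.0
```

Because both inputs depend on the file set and the hardware, the resulting figure is only meaningful for comparing products within the same test run.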

False Positive (False Alarm) Test Result

In order to better evaluate the quality of the detection capabilities (distinguishing good files from malicious files) of anti-virus products, we provide a false alarm test. False alarms can sometimes cause as much trouble as a real infection. Please consider the false alarm rate when looking at the detection rates, as a product which is prone to false alarms achieves higher detection scores more easily. All discovered false alarms were reported/sent to the respective Anti-Virus vendors and have been fixed.

1.	McAfee	0	very few FPs
2.	Kaspersky, Microsoft, Panda	1
3.	ESET	3

4.	F-Secure, Trend Micro	6	few FPs
5.	Bitdefender	8
6.	Avast	10
7.	Avira	11
8.	G DATA	14

9.	Sophos, Webroot	16	many FPs
10.	K7	23
11.	Qihoo	25
12.	eScan	29
13.	PC Tools	45
14.	AVG	51
15.	Symantec	57
16.	Trustport	59

Details about the discovered false alarms (including their assumed prevalence) can be seen in a separate report available at: http://www.av-comparatives.org/wp-content/uploads/2012/04/avc_fps_201108_en.pdf

Summary Result

A product that is successful at detecting a high percentage of malicious files but suffers from false alarms is not necessarily better than a product which detects fewer malicious files but generates fewer false alarms.

The following chart shows the combined file detection rates and false alarms.

During the test, Sophos and Webroot (which also uses Sophos technology) scored lower detection rates due to issues with their cloud technology. Further investigation together with the vendor could not determine the root cause of the issue, and the results shown include cloud detections. As we are unable to fully verify them independently (the cloud cannot be reverted to its earlier state), the results of Webroot and Sophos are shown “out of competition” and are only a rough estimate. Sophos and Webroot had a detection rate of ~94.2%. Although the affected vendors are in good standing, we have to handle their cloud issue this way to avoid potential misuse or gaming by other vendors in future tests.

Award levels reached in this File Detection Test

AV-Comparatives provides ranking awards. As this report also contains the raw detection rates and not only the awards, expert users that e.g. do not care about false alarms can rely on that score alone if they want to. The awards are not only based on detection rates – also false positives found in our set of clean files are considered.

* these products got lower awards due to false alarms

Notes

Information about additional third-party engines/signatures used inside the products: eScan, F-Secure and Qihoo 360 are based on the Bitdefender engine. G DATA is based on the Avast and Bitdefender engines. PC Tools is using the signatures of Symantec. Trustport is based on the AVG and Bitdefender engines. Webroot is based on the Sophos engine.

Avast, AVG, AVIRA and Panda wanted to participate in the tests with their free product version.

Even if we deliver various tests and show different aspects of anti-virus software, users are advised to evaluate the software by themselves and form their own opinions about them. Test data or reviews just provide guidance on some aspects that users cannot evaluate by themselves. We encourage readers to additionally consult other independent test results provided by various well-known and established independent testing organizations, in order to get a better overview about the detection and protection capabilities of the various products over different test scenarios and various test-sets. A list of various reputable testing labs can be found on our website.

Nowadays, almost all products run with the highest protection settings by default (at least during on-demand / scheduled scans); some, however, may automatically switch to the highest settings once infection detections begin to occur. Due to this, and in order to ensure comparable results, we tested all products with the highest settings unless explicitly advised otherwise by the security vendors. Vendors may advise this because they prefer the highest settings not to be used due to a high number of false alarms, because the highest settings have a performance impact, or because they are planning to change/remove the setting in the near future. Below are some notes about the settings used (scan all files etc. is always enabled), e.g. where the settings are not set to the highest by default:

Avast, AVIRA, Kaspersky, Symantec: asked to be tested with heuristics set to high/advanced. For this reason, we recommend that users also consider setting the heuristics to high/advanced.
F-Secure, Sophos: asked to be tested and awarded based on their default settings (i.e. without using their advanced heuristics / suspicious detections setting).
AVG, AVIRA: asked us not to enable/count the informational warnings about packers as detections. We therefore did not count them as detections (neither on the malware set nor on the clean set).

AV-Comparatives prefers to test with default settings. As most products run with the highest settings by default (or switch to the highest settings automatically when malware is found, making it impossible to test against various malware with “default” settings), in order to get comparable results we also set the few remaining products to the highest settings (or left them at lower settings) in agreement with the respective vendors. We kindly ask vendors to provide stronger settings by default, i.e. to set their default settings to the highest levels of detection; this would make more sense especially for scheduled scans or scans initiated by the user. We also kindly ask vendors to remove paranoid settings from the user interface that are too high to ever be of any benefit to normal users. As some vendors decided to take part in our tests using the stronger settings, it appears that the better option would be to use the stronger settings by default, which is why we recommend that users consider using those settings too.

Copyright and Disclaimer

This publication is Copyright © 2011 by AV-Comparatives ®. Any use of the results, etc. in whole or in part, is ONLY permitted after the explicit written agreement of the management board of AV-Comparatives prior to any publication. AV-Comparatives and its testers cannot be held liable for any damage or loss, which might occur as result of, or in connection with, the use of the information provided in this paper. We take every possible care to ensure the correctness of the basic data, but a liability for the correctness of the test results cannot be taken by any representative of AV-Comparatives. We do not give any guarantee of the correctness, completeness, or suitability for a specific purpose of any of the information/content provided at any given time. No one else involved in creating, producing or delivering test results shall be liable for any indirect, special or consequential damage, or loss of profits, arising out of, or related to, the use or inability to use, the services provided by the website, test documents or any related data.

For more information about AV-Comparatives and the testing methodologies, please visit our website.

AV-Comparatives
(September 2011)