Real-World Protection Test March-June 2011

Date	June 2011
Language	English
Last Revision	August 19th 2011

Release date	2011-08-20
Revision date	2011-08-19
Test Period	March - June 2011
Number of Testcases	2480
Online with cloud connectivity
Update allowed
False Alarm Test included
Platform/OS	Microsoft Windows
Methodology	Click here

Tested Products
Test Results
Summary Result
Awards

Introduction

The threat posed by malicious software is growing day by day. Not only is the number of malware programs increasing, also the very nature of the threats is changing rapidly. The way in which harmful code gets onto computers is changing from simple file-based methods to distribution via the Internet. Malware is increasingly infecting PCs through e.g. users deceived into visiting infected web pages, installing rogue/malicious software or opening emails with malicious attachments.

The scope of protection offered by antivirus programs is extended by the inclusion of e.g. URL-blockers, content filtering, anti-phishing measures and user-friendly behaviour-blockers. If these features are perfectly coordinated with the signature-based and heuristic detection, the protection provided against threats increases.

In spite of these new technologies, it remains very important that the signature-based and heuristic detection abilities of antivirus programs continue to be tested. It is precisely because of the new threats that signature/heuristic detection methods are becoming ever more important too. The growing frequency of zero-day attacks means that there is an increasing risk of malware infection. If this is not intercepted by “conventional” or “non-conventional” methods, the computer will be infected, and it is only by using an on-demand scan with signature and heuristic-based detection that the malware can be found (and hopefully removed). The additional protection technologies also offer no means of checking existing data stores for already-infected files, which can be found on the file servers of many companies. Those new security layers should be understood as an addition to good detection rates, not as replacement.

In this test all features of the product contribute protection, not only one part (like signatures/ heuristic file scanning). So the protection provided should be higher than in testing only parts of the product. We would recommend that all parts of a product should be high in detection, not only single components (e.g. URL blocking protects only while browsing the web, but not against malware introduced by other means or already present on the system).

The Whole-Product Dynamic “Real-World” Protection test is a joint project of AV-Comparatives and the University of Innsbruck’s Faculty of Computer Science and Quality Engineering. It is partially funded by the Republic of Austria.

For this test, we normally use the Internet security suite, as any protection features that prevent the system from being compromised can be used. However, a vendor can choose to enter their basic antivirus product instead, if they prefer. The main versions of the products tested in each monthly test run are shown below:

Tested Products

Test Procedure

Testing dozens of antivirus products with 100 URLs each per day is a lot of work which cannot be done manually (as it would be thousands of websites to visit and in parallel), so it is necessary to use some sort of automation. This automation has been developed jointly by the Institute of Computer Science of the University of Innsbruck and AV-Comparatives.

Over the year we had to introduce several changes into the automated systems to prevent some AV vendors trying to “game” the system, as well as update/rewrite our tools due to unannounced changes in the security products, which made it harder to create automated systems. We kindly ask vendors to inform us in advance in the event of product changes which can affect automated testing systems.

Lab Setup

The entire test is performed on real workstations. We do not use any kind of virtualization. Each workstation has its own internet connection with its own external IP. We have special agreements with several providers (failover clustering and no traffic blocking) to ensure a stable internet connection. The tests are performed using a live internet connection. We took the necessary precautions (with specially configured firewalls, etc.) not to harm other computers (i.e. not to cause outbreaks).

Every antivirus program to be tested is installed on its own test computer (please note that the term “antivirus program” as used here may also mean a full Internet Security Suite). All computers are connected to the Internet (details below). The system is frozen, with the operating system and antivirus program installed.

Hardware and Software

For this test we used identical workstations, an IBM Bladecenter and network attached storage (NAS).

	Vendor	Type	CPU	RAM	Hard Disk
Workstations	Fujitsu	E3521 E85+	Intel Core 2 Duo	4 GB	80 GB
Blade Center	IBM	E Chassis	–	–	–
Blades	IBM	LS20	AMD Dual Opteron	8 GB	76 GB
NAS	QNAP	TS-859U-RP	Atom Dual Core	1 GB	16 TB Raid 6

The tests are performed under Windows XP SP3 with no further updates. Some further installed vulnerable software includes:

Vendor	Product	Version	Vendor	Product	Version
Adobe	Flash Player ActiveX	10.1	Microsoft	Internet Explorer	7
Adobe	Flash Player Plug-In	10	Microsoft	Office Professional	2003
Adobe	Acrobat Reader	8.0	Microsoft	.NET Framework	3.5
			Sun	Java	6.0.140

Settings

We use every security suite with its default (out-of-the-box) settings. If user interactions are required, we will choose the default option. Our whole-product dynamic test aims to simulate real-world conditions as experienced every day by users. Therefore, if there is no predefined action, we will always use the same action where we consider the warning/message to be very clear and definitive. If the message leaves it up to the user, we will mark it as such, and if the message is very vague, misleading or even suggests trusting e.g. the malicious file/URL/behaviour, we will consider it to be a miss, as the ordinary user would. This year we will be stricter with required user decisions/interactions than last year. We consider “protection” to mean that the system is not compromised. This means that the malware is not running (or is removed/terminated) and there are no significant/malicious system changes. An outbound-firewall alert about a running malware process, which asks whether or not to block traffic form the users´ workstation to the internet is too little, too late and not considered by us to be protection.

Preparation for every testing day

Every morning, any available antivirus software updates are downloaded and installed, and a new base image is made for that day. This ensures that even in the case the antivirus would not finish a bigger update during the day, it would at least use the updates of the morning, as would happen to the user in the real world.

Testing Cycle for each malicious URL

Before browsing to each new malicious URL/test-case we update the programs/signatures. New major product versions (i.e. the first digit of the build number is different) are installed once a month, which is why in each monthly report we only give the product main version number. Our test software starts monitoring the PC, so that any changes made by the malware will be recorded. Furthermore, the detection algorithms check whether the antivirus program detects the malware. After each test case the machine is reverted to its clean state.

Protection

Security products should protect the userâ€™s PC. It is not very important at which stage the protection takes place. This can either be while browsing to the website (e.g. protection through URL Blocker), while an exploit tries to run or while the file is being downloaded/created or while the malware is executed (either by the exploit or by the user). After the malware is executed (if not blocked before), we wait several minutes for malicious actions and also to give e.g. behaviour-blockers time to react and remedy actions performed by the malware. If the malware is not detected and the system is indeed infected/compromised, the process goes to “Malware Not Detecte”. If a user interaction is required and it is up to the user to decide if something is malicious, and in the case of the worst user decision the system gets compromised, we rate this as “user-dependent”. Due to that, the yellow bars in the results graph can be interpreted either as protected or not protected (it´s up to the user).

Due the dynamic nature of the test, i.e. mimicking real-world conditions, and because of the way several different technologies (such as cloud scanners, reputation services, etc.) work, it is a matter of fact that such tests cannot be repeated or replicated in the way that e.g. static detection rate tests can. Anyway, we try to log as much as reasonably possible to prove our findings and results. Vendors are invited to provide useful logs inside their products which can provide the additional proof/data they want in case of disputes. Vendors were given one to two weeks’ time after each testing month to dispute our conclusion about the compromised cases, so that we could recheck if there were maybe some problems in the automation or with our analysis of the results.

In the case of cloud products, we will only consider the results that the products had at the time of testing; sometimes the cloud services provided by the security vendors are down due to faults or maintenance downtime by the vendors, but this is often not disclosed/communicated to the users by the vendors. This is also a reason why products relying too much on cloud services (and not making use of local heuristics etc.) can be risky, as in such cases the security provided by the products can decrease significantly. Cloud signatures/detection/reputation should be implemented in the products to complement the other protection features (like local real-time scan detection and heuristics, behaviour blockers, etc.) and not replace them completely, as e.g. offline cloud services would mean the PCs being exposed to higher risks.

Testcases

We are not focusing on zero-day exploits/malware (although it is possible that they are also present in the URL pool), but mainly on current, visible and relevant malicious websites/malware that are currently out there and problematic to the ordinary users. We are trying to include about 30-50% URLs pointing directly to malware (for example, if the user is tricked by social-engineering into follow links in spam mails or websites, or if the user is tricked into installing some Trojan or other rogue software). The rest/bigger part were exploits/drive-by downloads. These usually seem to be well covered by security products.

We use our own crawling system to search continuously for malicious sites and extract malicious URLs (including spammed malicious links). We also research manually for malicious URLs. If our in-house crawler does not find enough valid malicious URLs on one day, we have contracted some external researchers to provide additional malicious URLs exclusively to AV-Comparatives. Although we have access to URLs shared between vendors and other public sources, we refrain from using these for the tests.

In this kind of testing, it is very important to use enough test cases. If an insufficient number of sam-ples are used in comparative tests, differences in results may not indicate actual differences among the tested products. In fact, we consider even in our tests (with thousands of test-cases) products in the same protection cluster to be more or less equally good; as long as they do not wrongly block clean files/sites more than the industry average.

Ranking System

Ranking system	Protection score Cluster 4	Protection score Cluster 3	Protection score Cluster 2	Protection score Cluster 1
< ∅ FPs	Tested	Standard	Advanced	Advanced+
> ∅ FPs	Tested	Tested	Standard	Advanced

Test Results

Below you can see an overview of the individual testing months:

August 2011 – 403 test cases

September 2011 – 478 test cases

October 2011 – 393 test cases

November 2011 – 332 test cases

We purposely do not give exact numbers for the individual months in this report, to avoid the minor differences of a few cases being misused to state that one product is better than the other in a given month and with a given test-set size. We provide the total numbers in the overall reports, where the size of the test-set is bigger, and differences that are more significant may be observed.

False Positive (False Alarm) Test Result

The false alarm test in the Whole-Product Dynamic test consists of two parts: wrongly blocked domains (while browsing) and wrongly blocked files (while downloading/installing). It is necessary to test both scenarios because testing only one of the two above cases could penalize products which focus mainly on one type of protection method, either e.g. URL/reputation-filtering or e.g. on-access / behaviour / reputation-based file protection. As announced in the previous report of 2010, we are now taking into account wrongly blocked domains/files when awarding the products.

Wrongly blocked domains (while browsing)

We used around two thousand randomly chosen popular domains listed in the Alexa Top One Million sites. Blocked non-malicious domains/URLs were counted as FPs. The wrongly blocked domains have been reported to the respective vendors for review and should now no longer be blocked. By blocking whole domains, the security products are not only risking causing distrust in their warnings, but also eventually causing potential financial damage (beside the damage on website reputation) to the domain owners, including loss of e.g. advertisement revenue. Due to this, we strongly recommend vendors to block whole domains only in the case where the domain’s sole purpose is to carry/deliver malicious code, and to otherwise block just the malicious pages (as long as they are indeed malicious). Products which tend to block URLs based e.g. on reputation may be more prone to this and score also higher in protection tests, as they may block many unpopular/new websites.

Wrongly blocked files (while downloading/installing)

We used over a hundred different applications listed either as top downloads or as new/recommended downloads from about 16 different popular download portals. The applications were downloaded from the websites (if original developer site was given, we used that source instead of the download portal host), saved to disk and installed to see if they get blocked at any stage of this procedure. Additionally, we included a few files whose status as malware had been disputed over the past months of the Dynamic Test. It is noteworthy that the status of some files we initially counted as missed malware was disputed by several vendors, i.e. they regarded the files as clean/legitimate software. Also, the clean status of the very same files was disputed by other vendors (and even by the same vendors who had initially insisted that they were clean) when we counted them as false positives (in one case, additional files were recategorized by a vendor after the deadline/preview to come closer to the mean value). For the indisputable clean ones, some vendors tried to argue that they should not be counted as false positives, because although the files are clean, and their product blocks them (along with most other clean files from the same domains), only very few of their monitored users have downloaded those files or have those files on their machines. In fact, other telemetry/prevalence data shows varying/higher numbers of downloads by users. As it would be easy to get a high protection score by blocking everything that is rare or unknown, wrongly blocked files/sites have to be taken into account too. The user-base argument cannot be independently verified, and any vendor could claim that false alarms or missed malware is not affecting their user base.

The duty of security products is to protect against malicious sites/files, not to censor or limit the access only to well-known popular applications and websites. If the user deliberately chooses a high security setting, which warns that it may block some legitimate sites or files, then this may be considered acceptable. However, we do not regard it to be acceptable as a default setting, where the user has not been warned. None of the products blocked extremely popular applications. As the test is done at points in time and FPs on very popular software/websites are usually noticed and fixed within a few hours, it would be surprising to encounter FPs on very popular applications. Due to this, FP tests which are done e.g. only on very popular applications, or which use only the top 50 files from whitelisted/monitored download portals would be a waste of time and resources. Users do not care whether they are infected by malware which affects only them, just as they do not care if the FP count affects only them.

While it is preferable that FPs do not affect many users, it should be the goal to avoid having any FPs and to protect against any malicious files, no matter how many users are affected or targeted. Prevalence of FPs based on user-base data is of interest for internal testing of AV vendors, but for the ordinary user it is important to know how accurately its product distinguishes between clean and malicious files. Although some vendors gave some of the FPs a very low prevalence rating among their user base, the real mean value (according to other clouds/telemetry data with different user-bases) of the highest reported current prevalence for the encountered file FPs are in the several thousands; the median value is 1000.

The below table shows the numbers of wrongly blocked domains/files:

	Wrongly blocked clean domains/files (blocked / user-dependent[1])	Wrongly blocked score [lower is better]
AVG, Kaspersky Lab	– / – (-)	–
Lavasoft	1 / – (1)	1
Baidu, ESET	2 / – (2)	2
Bitdefender	3 / – (3)	3
Emsisoft	1 / 9 (10)	5,5
Avast, Avira, McAfee	6 / – (6)	6
Sophos	9 / – (9)	9
Tencent	11 / – (11)	11
	average (16)	average (15)
Panda	17 / – (17)	17
Fortinet	19 / – (19)	19
Quick Heal	23 / 1 (24)	23,5
eScan	25 / – (25)	25
Microsoft	29 / – (29)	29
BullGuard, ThreatTrack Vipre	30 / – (30)	30
F-Secure	19 / 47 (66)	42,5
Trend Micro	49 / – (49)	49

[1] Although user dependent cases are extremely annoying (esp. on clean files) for the user, they were counted only as half for the “wrongly blocked rate” (like for the protection rate).

To determine which products have to be downgraded in our award scheme due to the rate of wrongly blocked sites/files, we consulted and backed up our difficult decision by using a clustering method, and by looking at the average scores. Some vendors may consider relatively new files coming from websites that they e.g. do not yet have in their whitelist as being suspicious, esp. if their cloud has no or not much info about those files. This works well for blocking malicious files and increasing protection, but unfortunately it also leads to a higher degree of wrongly blocked clean files, until the popularity/reputation/information about the files improves. The following products (with above average FPs) had to be downgraded: Symantec, PC Tools and Webroot.

Summary Result

Test period: March– June 2011 (2480 Test cases)

The graph below shows the overall protection rate (all samples), including the minimum and maximum protection rates for the individual months.

	Blocked	User dependent	Compromised	Protection Rate [Blocked % + (User dependent % / 2)]	Cluster
Symantec	2458	11	11	99.3%	1
F-Secure	2459	4	17	99.2%	1
Bitdefender	2457	-	23	99.1%	1
G DATA	2453	-	27	98.9%	1
Trend Micro	2446	-	34	98.6%	1
Panda	2445	-	35	98.6%	1
ESET	2436	-	44	98.2%	1
Kaspersky	2424	23	33	98.2%	1
Avast	2407	33	40	97.7%	1

AVIRA	2402	-	78	96.9%	2
Qihoo	2383	31	66	96.7%	2

Sophos	2370	-	110	95.6%	3
AVG	2358	6	116	95.2%	3
Webroot	2348	1	131	94.7%	3
PC Tools	2262	165	53	94.5%	3

McAfee	2320	-	160	93.5%	4
K7	2276	17	187	92.1%	4

Award levels reached in this Real-World Protection Test

We use a ranking system (Tested, STANDARD, ADVANCED and ADVANCED+) to summarize the results (protection and “accuracy”). Overviews of levels reached in previous main tests can be found on our website. The awards are decided and given by the testers based on the observed test results (after consulting statistical models).
The following certification levels are for the results reached in the Whole-Product Dynamic Test:

* these products got lower awards due to false alarms

Notes

In the second half of 2011, we will include (at the request of the respective vendors) “Avast Antivirus Free”, “Panda Cloud Antivirus” and “McAfee Total Protection” instead their respective Internet Security versions.

While “McAfee Internet Security” does include URL protection, it does not contain a proactive solution that actually blocks access to malicious sites. Rather it only warns users of potentially malicious sites through its browser traffic light system. However, McAfee does offer the additional SiteAdvisor Live component that can be purchased through the SiteAdvisor ‘option’ button. Alternatively, “McAfee Total Protection” includes McAfee SiteAdvisor Live by default. McAfee SiteAdvisor Live does have the capabilities to proactively block malicious URLs.

Copyright and Disclaimer

This publication is Copyright © 2011 by AV-Comparatives ®. Any use of the results, etc. in whole or in part, is ONLY permitted after the explicit written agreement of the management board of AV-Comparatives prior to any publication. AV-Comparatives and its testers cannot be held liable for any damage or loss, which might occur as result of, or in connection with, the use of the information provided in this paper. We take every possible care to ensure the correctness of the basic data, but a liability for the correctness of the test results cannot be taken by any representative of AV-Comparatives. We do not give any guarantee of the correctness, completeness, or suitability for a specific purpose of any of the information/content provided at any given time. No one else involved in creating, producing or delivering test results shall be liable for any indirect, special or consequential damage, or loss of profits, arising out of, or related to, the use or inability to use, the services provided by the website, test documents or any related data.

For more information about AV-Comparatives and the testing methodologies, please visit our website.

AV-Comparatives
(August 2011)