Existing anti-scraping measures, such as ‘Captcha’ challenges or dedicated account logins, can prevent access by automated ‘bots’ and little else. They are a waste of time. ‘Captcha’ is bypassed or solved using OCR tools (e.g. www.captchabrotherhood.com), while human sweatshops (such as ‘DeathByCaptcha’) use cheap third-world labour to solve and bypass your anti-scraping processes in just a few seconds.
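The workflow these solving services expose is broadly the same: the scraper uploads the CAPTCHA image, receives a task ID, and polls until a human or OCR engine returns the answer. A minimal sketch in Python illustrates the pattern; the transport callables are left abstract because the actual endpoints and API of any real service are assumptions here:

```python
import time

def solve_captcha(image_bytes, submit, poll, timeout=60, interval=3):
    """Submit a CAPTCHA image to a solving service and poll for the answer.

    `submit` and `poll` stand in for the service's HTTP calls (hypothetical):
    `submit` uploads the image and returns a task ID; `poll` returns the
    solved text, or None while a human/OCR worker is still solving it.
    """
    task_id = submit(image_bytes)
    deadline = time.time() + timeout
    while time.time() < deadline:
        answer = poll(task_id)
        if answer is not None:
            return answer          # solved, typically within seconds
        time.sleep(interval)       # wait before asking again
    raise TimeoutError("solver did not respond in time")
```

From the scraper's point of view the whole anti-bot barrier collapses into one blocking function call.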
Screen scraping is the cornerstone upon which hundreds of online travel agents (e.g. Expedia, Lastminute, Opodo, BravoFly, OnTheBeach, LowCostHolidays, etc.) depend, but it is also employed by price comparison sites (PriceRunner, GoCompare, MoneySupermarket…), content aggregators (Yodlee), drop shippers (Ebay, Zavvi, Hut Group), news aggregators (Google, Huffington Post…), affiliate marketeers (NetMovers, Home.co.uk…), data resellers (Profitero), and indexing ‘bots’ (Google, Yahoo).
Aside from the serious legal, administrative, technical and logistical problems for business, consumers are constantly defrauded, copyright and IPR breaches are rife, and first-tier service and goods providers lose billions. Computer fraud and identity theft are endemic across gambling, entertainment and auction websites; social networking sites are unable to prevent anonymous cyber-bullying and other computer-related crime; and goods and service providers can lose over 20% of their entire annual revenue – a figure that significantly affects sales, branding, customer relationships and, ultimately, share price.
Can Scrapers be Stopped?
A commercial website receiving 15 million hits a day cannot expect a human to examine over ten thousand hits a minute (roughly 175 hits per second!) – but Data Portcullis can do this.
And our competitors?
Our competitors rely on server traffic monitoring, such as IP address blocking – an outmoded manual process that is childishly simple to bypass. Wilson et al. (2010) state that tracking users by IP address, TCP port number or SSL session ID is a ‘fundamentally flawed’ method of detecting website attackers.
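Why is per-IP tracking so easy to bypass? A scraper simply rotates its requests through a pool of proxy addresses so that no single IP ever crosses a naive per-IP block threshold. A toy simulation makes the point; the threshold, pool size and addresses are illustrative assumptions, not real figures:

```python
from collections import Counter
from itertools import cycle

BLOCK_THRESHOLD = 100  # assumed naive rule: ban any IP after 100 requests

def run_scrape(total_requests, proxy_pool):
    """Round-robin requests through the proxy pool and return the set of
    IPs that a per-IP blocker would actually ban."""
    seen = Counter()
    rotation = cycle(proxy_pool)
    for _ in range(total_requests):
        seen[next(rotation)] += 1
    return {ip for ip, n in seen.items() if n >= BLOCK_THRESHOLD}

# 50,000 requests spread across 1,000 proxies: 50 requests per IP,
# so every address stays comfortably below the threshold.
pool = [f"10.0.{i // 256}.{i % 256}" for i in range(1000)]
blocked = run_scrape(50_000, pool)
```

The blocker sees a thousand apparently unremarkable visitors, while a single scraper from one IP making the same 50,000 requests would be banned almost immediately.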
Our competitors “…perform a constant analysis of your website’s traffic and alerts our expert IT engineers whenever there’s an incident that looks like a scraping attack. Our operators evaluate the alert to see whether it’s a real attack or not” (CompuTrad 2013). In other words, their detection systems are not automated: human operators simply do their best to find attackers, and results typically arrive long after the attackers have gone.
This is an inefficient and poor use of human resources, and it would struggle to detect (for example) a website being attacked a thousand times an hour from 250 different IP addresses – i.e. just once every 15 minutes per IP.
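The arithmetic behind that example shows why such a distributed attack slips under any per-IP threshold:

```python
attacks_per_hour = 1000
distinct_ips = 250

requests_per_ip_per_hour = attacks_per_hour / distinct_ips    # 4 per IP per hour
minutes_between_requests = 60 / requests_per_ip_per_hour      # one every 15 minutes

# Each individual IP looks like a casual visitor clicking once every
# quarter of an hour - indistinguishable, per IP, from legitimate traffic.
```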