Other Industries Involved in Screen Scraping
Thomas (2013) shows that 25% of traffic to the Ryanair PLC website (www.ryanair.com), some 3.75 million hits a day, is generated by screen-scrapers, leading to an estimated €900,000 per annum in legal and administrative costs and millions of euros per annum in lost revenue from potentially 5.5 million passengers.
Ryanair’s difficulties are not specific to one company or industry. In American Airlines v Farechase Inc (2003), one of the first cases of its kind, American Airlines won the right to prevent itinerary harvesting. Ryanair has so far prosecuted 36 online travel agents (OTAs) across Europe (TravelMole, 2012) and, in 2008, cancelled thousands of passenger tickets booked via third-party websites (Wall Street Journal, 2008).
Kaplan (2009) states that “in a recent 30-day period, more than 75,000 sites ‘reused’ a newspaper article without sharing ad revenue with the original source. Of those sites, which the FSC [Fair Syndication Consortium] describes as ‘unlicensed’ to use the content, there were 112,000 ‘near-exact’ copies of newspaper content”.
Kaplan goes on to show that Google is responsible for over half of the unlicensed newspaper articles and, combined with Yahoo, accounts for over 75% of unlicensed article duplication.
Griffiths (2011) cites ‘NetMovers.co.uk’ as an online real estate aggregator that lists properties scraped from high street agents but presents them under the ‘Netmovers’ brand with its own contact details, thereby gaining first option on ‘ancillary’ services such as mortgages or removals, and even selling the lead itself back to the primary estate agent.
Major property aggregator ‘www.home.co.uk’ scrapes over 30 estate agents (Advisory, 2013) and uses premium rate telephone numbers as its contact method. Don Lawby, CEO of Century 21, stated: “I am opposed to anybody taking, just independently, scraping data or removing data without permission… We have spent millions of dollars and an exorbitant amount of effort to get that data on to our sites” (Clareity, 2013). 93% of realtor executives showed interest in a solution to the scraping problem.
According to Hadfield (2006), ‘Moneysupermarket’ extracts data by screen-scraping, which puts pressure on companies’ front-end systems. Esure’s IT Director Mark Foulsham confirms: “This places very high loads on us. We get huge CPU spikes if we get requests from Money-supermarket”. Insurance aggregators stand accused of providing misleading and false insurance policy information and prices, and face a detailed regulatory review by the Financial Conduct Authority (FCA) in 2013 (Dale, 2013).
Data aggregation can lead to abuses of various global data protection and financial services acts, because aggregators harvest and store financial, medical and personal data that is specifically legislated for and regulated by approved governing bodies. Account aggregators themselves are not covered by such protective regulation.
According to Mugavero (2000), legislation generally forbids recognised financial institutions from sharing customers’ non-public personal financial information with non-affiliated parties. Account aggregators, however, use their scraped data for targeted product cross-selling, personal advertising and other practices because they fit into a niche not properly governed by existing regulatory schemes, thus exposing customers to the risk of their data being held and disseminated by unlicensed organisations.
The retail industry uses screen-scraping to resell goods at a higher price. ‘Wiseguy Tickets Inc.’ scraped 1.5 million event tickets from ‘TicketMaster’ and sold them at inflated prices to brokers (making $25 million in the process), who then resold them at higher mark-ups to the public (Mary Pat, 2010).
‘Drop shipping’, widely employed by ‘Ebay’ users (Ebay, 2013) and ‘The Hut Group’ (www.zavvi.com), allows sellers to resell products at inflated prices with few overheads, little risk, no products to store, no posting and packaging and no obligation to the end purchaser.
Large retailers (Amazon, Walmart, Vodafone, Tesco, etc.) have their websites scraped by price comparison sites such as Pricerunner, Shopzilla, Moneysupermarket or Zavvi. Dedicated screen-scraping companies such as ‘Profitero’ (www.profitero.com) provide “retail intelligence technology” that monitors 50 million products across 4,000 retailers around the world entirely by screen-scraping.
According to Profitero, subscribers can monitor competitors’ prices and stock levels in real time and receive alerts when a competitor increases or decreases an item’s price, runs a promotion or changes its stock levels, so that they can position their own goods accordingly. Profitero’s entire business model is dependent upon screen-scraping.
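The alerting behaviour described above can be illustrated with a minimal sketch. It is not Profitero's implementation, which is not public in the sources cited here; it only assumes that a monitoring service holds two successive snapshots of scraped prices and compares them. The names `PriceAlert` and `detect_price_changes`, and the sample products and prices, are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class PriceAlert:
    """One detected price movement for a single product (hypothetical type)."""
    product: str
    old_price: float
    new_price: float

    @property
    def direction(self) -> str:
        return "increase" if self.new_price > self.old_price else "decrease"


def detect_price_changes(previous: dict[str, float],
                         current: dict[str, float]) -> list[PriceAlert]:
    """Compare two scraped price snapshots; return one alert per changed product."""
    alerts = []
    for product, new_price in current.items():
        old_price = previous.get(product)
        # Skip products with no earlier observation or an unchanged price.
        if old_price is None or old_price == new_price:
            continue
        alerts.append(PriceAlert(product, old_price, new_price))
    return alerts


# Illustrative data only: two snapshots of a competitor's scraped prices.
previous = {"kettle": 24.99, "toaster": 19.99, "blender": 34.50}
current = {"kettle": 22.49, "toaster": 19.99, "blender": 36.00}

alerts = detect_price_changes(previous, current)
for alert in alerts:
    print(f"{alert.product}: {alert.direction} "
          f"{alert.old_price:.2f} -> {alert.new_price:.2f}")
```

The sketch shows why such a service depends on repeated scraping: each alert needs at least two observations of the same page, so the target site is visited again for every product on every monitoring cycle.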
Existing ‘traditional’ anti-scraping techniques rely heavily on IP blocking, a method that is easily circumvented by proxies and fake IP addresses. These same evasion tactics cannot circumvent DATA PORTCULLIS.
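The weakness of IP blocking can be shown with a short sketch. It assumes the simplest form of the technique, in which any address exceeding a fixed request count within a time window is blocked. The function name `blocked_ips`, the threshold of 50 and the addresses (taken from the ranges reserved for documentation) are illustrative assumptions, not details of any product named in this report.

```python
from collections import Counter


def blocked_ips(request_log: list[str], max_requests_per_ip: int) -> set[str]:
    """Return the addresses that exceed the per-IP request threshold."""
    counts = Counter(request_log)
    return {ip for ip, n in counts.items() if n > max_requests_per_ip}


THRESHOLD = 50  # hypothetical limit per address per time window

# A scraper sending 500 requests from a single address.
single_source = ["203.0.113.7"] * 500

# The same 500 requests rotated across 100 proxy addresses (5 requests each).
proxied = [f"198.51.100.{i % 100}" for i in range(500)]

print(blocked_ips(single_source, THRESHOLD))  # the single address is blocked
print(blocked_ips(proxied, THRESHOLD))        # no address reaches the threshold
```

Both logs carry the same total load, yet only the first trips the filter. Because the block decision is keyed on the source address alone, spreading requests across enough proxies keeps every address under the limit.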