How web scraping works and how to limit your risks: Internet Scambusters #998
Web scraping, or crawling, is a huge data mining operation that both crooks and legitimate operators use to get hold of every personal detail about you that they can.
It's mostly legal, but it can cost you your privacy, to say nothing of the billions of dollars in revenue that businesses lose to it.
This week's issue explores how web scraping works and the limited weapons at your disposal to protect yourself.
Let's get started…
For Sale: Your Personal Profile Via Web Scraping
If you or anyone else has ever posted a single fact about you online, even if it's just the color of your hair, you're almost certainly a victim of web scraping.
This is the tactic used by scammers, hackers, and, most commonly, data brokers to build a profile of you that can then be used for crime or sold to retailers and others who want to sell you stuff.
Automated "bots" or "crawlers" constantly search web pages, especially on social media and online forums, for names and information -- and there's plenty of it to find. Then they drop the data into spreadsheets, gradually building up a detailed profile of you and your family.
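To make that less abstract, here is a minimal sketch in Python of the "read a page, drop the data into a spreadsheet" step. Everything in it is made up for illustration: the sample page, the names in it, the `profile`, `name`, `city`, and `hair` labels, and the `ProfileParser` and `scrape_to_csv` helpers. It reads a hard-coded snippet rather than a live website, and real crawlers are far more elaborate, but the basic idea is likely similar.

```python
import csv
import io
from html.parser import HTMLParser

# Made-up sample page. A real crawler would download pages instead.
SAMPLE_PAGE = """
<html><body>
  <div class="profile">
    <span class="name">Jane Example</span>
    <span class="city">Springfield</span>
    <span class="hair">brown</span>
  </div>
  <div class="profile">
    <span class="name">John Sample</span>
    <span class="city">Shelbyville</span>
    <span class="hair">black</span>
  </div>
</body></html>
"""

# Hypothetical field names the bot looks for on each profile.
FIELDS = ("name", "city", "hair")


class ProfileParser(HTMLParser):
    """Collects one dictionary of details per profile block on the page."""

    def __init__(self):
        super().__init__()
        self.rows = []        # finished profiles
        self._current = None  # profile being read right now
        self._field = None    # field whose text we are inside

    def handle_starttag(self, tag, attrs):
        css_class = dict(attrs).get("class")
        if tag == "div" and css_class == "profile":
            self._current = {}
        elif tag == "span" and css_class in FIELDS and self._current is not None:
            self._field = css_class

    def handle_data(self, data):
        # Only keep text that sits inside one of the fields we want.
        if self._field is not None:
            self._current[self._field] = data.strip()

    def handle_endtag(self, tag):
        if tag == "span":
            self._field = None
        elif tag == "div" and self._current is not None:
            self.rows.append(self._current)
            self._current = None


def scrape_to_csv(html):
    """Return the profiles found in the page as spreadsheet-style CSV text."""
    parser = ProfileParser()
    parser.feed(html)
    parser.close()
    out = io.StringIO()
    writer = csv.DictWriter(out, fieldnames=FIELDS)
    writer.writeheader()
    writer.writerows(parser.rows)
    return out.getvalue()


if __name__ == "__main__":
    print(scrape_to_csv(SAMPLE_PAGE))
```

Each profile on the page becomes one row in the spreadsheet. Run the same routine across thousands of pages and you can see how quickly a detailed picture builds up.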
It's perfectly legal if the information these people harvest is publicly available. Price comparison sites, for example, use crawlers to gather their info, which, as we know, can be extremely useful.
But there are other much more dubious scraping activities. For instance, even though only your friends may be able to see your Facebook posts (if you've used the right security settings), anyone else who visits your page can usually see certain information posted in the "Intro" section.
Unless you've set your privacy settings to the maximum, people who aren't your friends may also be able to see photos of your followers, details of places you've visited, movies you've seen, books you've read, and your updated profile photos, plus the full details of anything you've posted as "public."
Scrapers can work out a lot more about you by setting up quiz and survey pages. When you answer or "like" one, it gives them not only your name (so they can find your personal page) but also an insight into your personality, political views, and a lot more.
Get the picture? The web scrapers certainly do.
And it's not just about Facebook. The site is a good example of the risks you face but the scraping bots are everywhere.
Furthermore, scraping apps, which can be bought online, can also dig behind entire websites and read all the underlying computer code, potentially letting their users copy and reproduce a page completely. If the code is not protected, scammers can then use it to create fake sign-on pages to phish for personal information.
Malicious scrapers also can steal copyrighted blogs, pictures and text, completely ignoring any restrictions imposed by site owners. These activities are, of course, illegal.
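One common way site owners post such restrictions is a small text file called robots.txt, which lists the parts of a site that bots are asked to stay out of. We're assuming that convention here purely as an illustration; the article's sources don't name it. The sketch below, in Python, shows the check a well-behaved crawler could run before fetching a page. The sample rules, the example.com addresses, the bot name, and the `allowed_paths` helper are all made up. Keep in mind that the file is only a request, so a malicious scraper can simply skip this step.

```python
from urllib.robotparser import RobotFileParser

# Made-up rules a site owner might publish (an assumption for illustration).
SAMPLE_ROBOTS_TXT = """\
User-agent: *
Disallow: /members/
Disallow: /ads/
"""


def allowed_paths(robots_txt, user_agent, urls):
    """Map each URL to True if the rules permit this bot to fetch it."""
    parser = RobotFileParser()
    # Parse the rules from text instead of downloading them from a site.
    parser.parse(robots_txt.splitlines())
    return {url: parser.can_fetch(user_agent, url) for url in urls}


if __name__ == "__main__":
    results = allowed_paths(
        SAMPLE_ROBOTS_TXT,
        "ExampleBot",  # hypothetical crawler name
        [
            "https://example.com/about",
            "https://example.com/members/jane",
        ],
    )
    for url, ok in results.items():
        print(url, "allowed" if ok else "off limits")
```

With these sample rules, the "about" page is allowed and the members page is off limits. A polite crawler honors that answer; the malicious ones described above never ask.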
You May Be Helping
Just to make things worse, you may actually be helping scammers and spammers by allowing them to use your computer for their scraping activities. This happens when your device is pulled into a network of compromised computers (a "botnet") after you unknowingly download malware.
Online classified service Craigslist reports that botnets crawl and steal the text from millions of its ads. The data, which may include users' contact information, can be reformatted and then offered for sale. Craigslist alleges harvesters have been charging up to $20,000 a month for their stolen data.
Web scraping is a particular threat to online retailers and similar businesses. Estimates suggest scraping costs them about $70 billion per year in lost revenue worldwide.
One scraping software and service seller boasts that the practice "has never been so easy." This seller actually claims to tailor its scrapers to specific business sectors, such as real estate, or to big internet operators like Google Search, TikTok, and Instagram.
Web Scraping Protection for Consumers
But what can we consumers do to protect ourselves?
The first and most important thing to remember is that every single word or picture you ever post, any likes and preferences you show, any "friend" you follow and any social media group you belong to, is potentially vulnerable to scrapers and hackers. Think before you post, "like," share, or answer questions.
Second, use privacy settings to the fullest wherever they're available. Opting, for instance, to allow only friends and followers to see your Facebook posts is not enough. Run a full Facebook privacy check and do the same with every site you regularly visit.
Other actions you can take include:
- Changing your profile name if you currently use your real one on social media sites.
- Using up-to-date security software to block botnet malware.
- Blocking "friend" requests from people you don't know.
- Considering software and services that offer to remove information about you on the internet.
- Learning what the sites that you use do to protect against data harvesting.
Sadly, when researching this topic, we discovered that most search results focus on how to scrape, and how to defeat sites' attempts to block scraping, rather than on how to avoid being scraped.
There is very little information available for consumers, and since experts say there's technically no way to stop web scraping, it's up to you to protect yourself.
This Week's Scam Alerts
Say 'No' to USBs: Don't be tempted to insert that mysterious USB drive that arrived in the mail. It's the latest trick being used by scammers to get you to install malware and ransomware on your PC. USBs are as cheap as dirt these days, so crooks send out malware-laden ones in the thousands, hoping that curiosity will prompt recipients to try to see what's on them. But you won't, will you?
$80 Million Haul: Crooks are netting an estimated $80 million per month globally from fake surveys and giveaways by impersonating well-known brands. Using ads, text messages, social media, and on-screen pop-ups, they lure victims to cloned and malicious sites, then take their victims through a series of pages during which they gather information for identity theft or data harvesting, while pretending a big prize is just one more click away. It never is.
Time to conclude for today -- have a great week!