We bloggers generally pride ourselves on uniqueness and creativity. We get a rush when we see others linking to our posts and reading our feeds, since it usually means that they find value in what we have to say. Unfortunately, not everyone who reads your blog does so for legitimate reasons. Some unscrupulous individuals in the blogosphere are only out to scrape your content for their own websites, ripping off your material and claiming it as their own.
Should I feel flattered?
After all, imitation is the sincerest form of flattery, isn’t it? Generally speaking, no, you shouldn’t be happy about it. Scrapers have little respect for your content except insofar as it can make money for them. To use an analogy, they’re not celebrity impersonators; they’re guys in trench coats selling fake Rolex watches.
How do I stop it from happening?
Short of putting your blog behind a password barrier (which I don’t recommend), there’s really no defense against someone scraping your website. Basically, if they can read it, they can steal it, and there’s no sense in keeping everyone from reading your blog when 99% of your visitors don’t have any malicious intent.
There are some ways of hindering (but not stopping) scrapers. For example, if you’re technically savvy enough to understand server logs, you can deny access to your site based on IP, domain, or irregular user agent as recommended in the most recent Whiteboard Friday at SEOmoz. Note, however, that techniques like this can still be bypassed by clever scrapers.
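If you want to see what that log review might look like in practice, here is a minimal Python sketch. It assumes your server writes the common "combined" access log format, and the `SUSPICIOUS_AGENTS` list and the `max_requests` threshold are made-up starting points of mine, not anything official. Tune both against your own traffic before blocking anyone.

```python
import re
from collections import Counter

# Combined Log Format: IP, identity, user, [time], "request", status, size,
# "referer", "user agent". Lines in any other shape are skipped.
LOG_LINE = re.compile(
    r'^(?P<ip>\S+) \S+ \S+ \[[^\]]*\] "[^"]*" \d{3} \S+ "[^"]*" "(?P<agent>[^"]*)"'
)

# Hypothetical markers of non-browser clients; adjust to what your logs show.
SUSPICIOUS_AGENTS = ("curl", "wget", "python-requests", "libwww-perl")


def find_suspects(lines, max_requests=100):
    """Return {ip: reason} for IPs with odd user agents or too many requests."""
    hits = Counter()
    suspects = {}
    for line in lines:
        match = LOG_LINE.match(line)
        if match is None:
            continue  # not in the expected format
        ip = match.group("ip")
        agent = match.group("agent").strip().lower()
        hits[ip] += 1
        # An empty agent, a bare "-", or a known tool name counts as irregular.
        if agent in ("", "-") or any(m in agent for m in SUSPICIOUS_AGENTS):
            suspects[ip] = "irregular user agent"
    for ip, count in hits.items():
        if count > max_requests:
            # Keep the earlier reason if the IP was already flagged.
            suspects.setdefault(ip, "high request volume")
    return suspects
```

You could feed it a log with something like `find_suspects(open("access.log"))` (the filename is just an example) and then review the flagged IPs by hand before adding them to your server's deny rules. Legitimate feed readers and search engine crawlers can look a lot like scrapers at first glance.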
How do I find scraper sites?
To catch scrapers in the act, just take a reasonably popular post on your site that’s a few weeks old. Find a short sentence in the post that seems unique, put quotes around it, and plug it into Google. Whatever comes up that isn’t your site is potentially a scraper. If you don’t find anything at first, try it with different posts and sentences until you’re satisfied that nobody’s scraping you.
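If you plan to run this check on a lot of posts, a few lines of Python can build the search links for you. This is only a sketch: the function name and the optional `own_domain` parameter are my own inventions, and it relies on Google's `-site:` operator to hide results from your own blog.

```python
from urllib.parse import urlencode


def exact_phrase_search_url(sentence, own_domain=None):
    """Build a Google search URL for an exact-phrase match on `sentence`."""
    # Wrap the sentence in quotes, dropping any straight double quotes inside
    # it so they cannot end the exact-phrase match early.
    query = '"' + sentence.strip().replace('"', "") + '"'
    if own_domain:
        # Exclude your own site so only potential copies show up.
        query += f" -site:{own_domain}"
    return "https://www.google.com/search?" + urlencode({"q": query})


print(exact_phrase_search_url(
    "guys in trench coats selling fake Rolex watches", "example.com"
))
```

Open the printed link in your browser and look through the results yourself. The script only saves you the typing.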
Edit: Since writing this, several people have brought CopyScape to my attention as a useful tool for finding scrapers. It’s a search-like service that crawls a page on your site and tells you where others might have copied it. Although it requires a paid membership to see full results, I’d say it’s pretty handy.
What should I do when I find them?
The first thing you should do is attempt to contact the scraper, requesting that they cease and desist their activities. This seems to have worked well for Maki over at Dosh Dosh, and it’s definitely the polite way of handling things (whether or not scrapers deserve the courtesy). If you don’t receive a positive response, it’s advisable to continue with a more aggressive approach.
There are several ways to make sure that scrapers get their comeuppance. For starters, if their site uses AdSense, you can report them to Google for a violation of the AdSense terms and conditions. The same may apply to other affiliate ad programs. If it results in their account being suspended, they won’t be making money off your content or anyone else’s.
It’s also possible to tell the scraper’s web host about their illegitimate dealings. Just plug their domain name into any Whois lookup to get the contact info. Forget shutting down their income; if the host takes action, you may be able to shut down their entire site.
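Any web-based Whois lookup will do the job, but if you would rather script it, here is a rough Python sketch of the same idea. Whois is a plain-text protocol on TCP port 43, so the standard library is enough. The default server (`whois.iana.org`) is an assumption on my part: it usually only points you to the registry for the domain's extension, so you may need to pass that registry's or the registrar's Whois server to get the full record. Both function names are mine.

```python
import re
import socket

# Loose pattern for e-mail addresses; good enough for skimming a Whois reply.
EMAIL = re.compile(r"[\w.+-]+@[\w-]+(?:\.[\w-]+)+")


def query_whois(domain, server="whois.iana.org", timeout=10):
    """Send a plain Whois query (TCP port 43) and return the raw text reply."""
    with socket.create_connection((server, 43), timeout=timeout) as conn:
        conn.sendall(domain.encode("ascii") + b"\r\n")
        chunks = []
        while True:
            data = conn.recv(4096)
            if not data:  # the server closes the connection when it is done
                break
            chunks.append(data)
    return b"".join(chunks).decode("utf-8", errors="replace")


def contact_emails(whois_text):
    """Return the unique e-mail addresses found in a Whois reply, in order."""
    seen = []
    for address in EMAIL.findall(whois_text):
        if address.lower() not in (a.lower() for a in seen):
            seen.append(address)
    return seen
```

Something like `contact_emails(query_whois("example.com"))` would then give you a short list of addresses to try. Reply formats differ from one server to the next, so read the raw text too in case the contact you need isn't listed as an e-mail address.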
Blogs hosted on WordPress.com can also be reported for scraping.
Should I take legal action?
Copyright is tricky business on the World Wide Web, mainly because one country’s laws may not be respected across borders. If the scraper’s actions have been very damaging, you may want to consider legal action. Then again, there’s rarely a lot of harm done, so it’s probably best to forgo a lawsuit. (Note: Since I’m not a legal professional, this is just my opinion. Go consult a lawyer if you want real legal advice.)
If you are considering mounting a lawsuit against a scraper, it may be worthwhile to review the Digital Millennium Copyright Act (in the US) or the European Union Copyright Directive (in Europe).
But aren’t they hurting my search engine rankings?
A few years ago, the answer might have been yes. Times have changed, though, and search engines have wised up to the practice of scraping. Chances are good that the scraper’s version of your content will be viewed as what it is, a copy. You’ll get the credit for your originality; they’ll get flagged as potential spam, eventually being penalized or delisted entirely.
Note that there’s a fine line between syndication and scraping. Some sites may echo your posts as a means of highlighting content around the web that they view as important. In fact, that’s one of the main reasons that RSS exists. Generally speaking, if the website in question gives you proper credit and features your content alongside other posts on a similar theme, its intent may not be malicious. If, however, the website is basically a copy of yours with no credit or links back to you, it’s probably a scraper.
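The "links back to yours" part is easy to check by machine. As a rough sketch, the Python below takes the HTML of a suspect page (which you would have to download yourself) and reports whether any link in it points at your domain. The class and function names are mine, and a missing link is only a hint, not proof: the credit might be in plain text, and a scraper might leave your links in by accident.

```python
from html.parser import HTMLParser
from urllib.parse import urlparse


class LinkCollector(HTMLParser):
    """Collect the host names of every <a href="..."> on a page."""

    def __init__(self):
        super().__init__()
        self.hosts = set()

    def handle_starttag(self, tag, attrs):
        if tag != "a":
            return
        for name, value in attrs:
            if name == "href" and value:
                host = urlparse(value).hostname  # None for relative links
                if host:
                    self.hosts.add(host.lower())


def links_back_to(page_html, own_domain):
    """True if the page links to own_domain or one of its subdomains."""
    collector = LinkCollector()
    collector.feed(page_html)
    own = own_domain.lower()
    return any(h == own or h.endswith("." + own) for h in collector.hosts)
```

A call like `links_back_to(html_of_suspect_page, "example.com")` returning `False` is your cue to take a closer look at the page by hand.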