Wednesday, June 11, 2008

One of my posts was scraped

A few days ago I wrote a post about Einstein's connection to math. When I went to Technorati yesterday, I saw that someone had linked to this post. When I followed the link, I came to a blog that is clearly used for scraping. From Wikipedia: "A scraper site is a website that copies all of its content from other websites using web scraping. No part of a scraper site is original."

I am not putting a link to it in this post. If you want to visit it, go to Technorati and look for blogs that link to Math Pages, or go to the post about Einstein - the link is at the bottom of the page (for now).

The blog in question didn't copy all of the post - just a small paragraph, a quote from Einstein (it was clearly copied by some program). The post there links back to Math Pages, but my name is not mentioned. Such blogs usually use auto-generated names - I appear there as "snakeman11689".
I have nothing against getting links, but such blogs have a bad reputation. From what I have heard, getting a lot of links from scraper sites may hurt my blog's page rank.

For now only one of my posts has appeared there, but it is probably not the end of the story. For some reason Blogger does not notify me about new backlinks, so it is not simple to discover them.
I am thinking about contacting the owner of this blog and asking him not to scrape my blog. It would also be nice if he used my name on the post he scraped and not an auto-generated one. What do you think I should do?

Update: Well, only an hour has passed since I published this post and I already need to update it. I just found out that my post Learning Math was also scraped. It is a different blog this time, but I think it is the same person. I tried to use whois to find information about him, but without success. If I manage to find anything interesting, I will put it here.

Update 2: Correction - three of my posts were scraped. I noticed the third only now, but it was scraped about a week ago. By the way, the domains of all three scraper blogs are of the form "something.net" - without even a www at the beginning of the URL.

Update 3: I was told that from reading my post it is not clear whether scraping is a good or a bad thing. To clarify this point - it is a bad thing. Such blogs are set up to generate money, and they also function as link farms. But this is not why they are a problem. Search engines (especially Google) have a reputation for not liking spam, and this is how such blogs are viewed by search engines. It takes some time for them to get classified as spam, but when it happens it means problems for those they took content from. This is because when they take even a short paragraph from your post, they also leave a backlink on your blog. To Google it looks like you are linking to them, and linking to sites that are classified as spam can have negative results (page rank reduction, for example). If only a post or two were picked up by such a site, I doubt there would be any effect. However, three of my posts were scraped during one week - and I doubt it will stop.
To clarify it a bit further - I am reacting to this in such a way mainly because I am currently really annoyed by spam. I get 1-2 spam emails every day at an email address I have to keep (I got it from my ISP), one person regularly sends me chain letters, and just yesterday I was contacted by some spammer who tried to sell me his "best method for getting money online".

Update 4: I found the email addresses of the owners of the blogs in question, and I sent them an email asking them to credit me as the author of the content they scraped and not to post anything taken from my blog again. One of the emails bounced back, and I got a message that another has still not been delivered.
