Scraper No Scraping - Idea for Fighting Plagiarism for Google or Microsoft R&D

Thanks to Barry Campbell for highlighting this very disturbing article by Lee Gomes in today’s WSJ, titled “Writer Creates ‘Original Content’ But Is in for a Surprise.”

It talks about how shady firms are paying writers to plagiarize from multiple websites. The plagiarism, or scraping, is done to produce search-engine-optimized content for better rankings. Gomes points out that search engines are responsible for inciting this riot, just like a TV cameraman in a crowd. The search engines are being fooled, but are they getting richer from it anyway? Ultimately the real content originators and the advertisers are the ones paying the price. It’s a problem, and it needs to be stopped or curtailed.

Here’s an idea for combating plagiarism online. Invent some kind of license tag or beacon that identifies a content originator and tracks future iterations of their text. When content is published with this beacon, it pings the search engines and says, here’s some new content by a trusted source. Any new text published on the web that uses all or part of that text would then be reported to an inbox managed by the Content Originator (CO) of the text. The inbox would allow the CO to self-police their content and report alleged cases of plagiarism back to the engine. In cases where there is a possible violation, there should be some kind of escalation procedure to notify the alleged offender to either dispute the claim or remove the offending content. The stick could be some kind of penalty for the domain or IP address. Hey, Google PR Department, if you’re listening to blogs, this might be just the thing to make us COs happy and possibly take the sting out of the whole caving in to China thing.
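
To make the idea a little more concrete, here is a rough Python sketch of the engine side of it. Everything in it is my own assumption for illustration: the BeaconRegistry class, the register and check methods, the five-word "shingle" fingerprints, and the 30% overlap threshold are all made up, not anything a search engine actually offers. It only shows the shape of the thing: an originator pings the registry with new content, and any later text that reuses enough of it lands in the originator's inbox.

```python
import hashlib
import re
from collections import defaultdict


def shingles(text, size=5):
    """Return the set of hashed word n-grams ("shingles") found in text."""
    words = re.findall(r"[a-z0-9']+", text.lower())
    if len(words) < size:
        return set()
    return {
        hashlib.sha1(" ".join(words[i:i + size]).encode("utf-8")).hexdigest()
        for i in range(len(words) - size + 1)
    }


class BeaconRegistry:
    """Hypothetical engine-side registry of content originators' text."""

    def __init__(self, threshold=0.3):
        # Assumed cutoff: fraction of an original's shingles that must
        # reappear in new text before the originator is notified.
        self.threshold = threshold
        self.originals = {}               # beacon_id -> (owner, shingle set)
        self.inboxes = defaultdict(list)  # owner -> list of reuse reports

    def register(self, owner, url, text):
        """The 'ping': record new content published by an originator."""
        beacon_id = hashlib.sha1(url.encode("utf-8")).hexdigest()[:12]
        self.originals[beacon_id] = (owner, shingles(text))
        return beacon_id

    def check(self, url, text):
        """Compare newly published text against every registered original."""
        candidate = shingles(text)
        reports = []
        for beacon_id, (owner, original) in self.originals.items():
            if not original:
                continue  # original too short to fingerprint
            overlap = len(original & candidate) / len(original)
            if overlap >= self.threshold:
                report = {
                    "beacon": beacon_id,
                    "url": url,
                    "overlap": round(overlap, 2),
                }
                self.inboxes[owner].append(report)  # deliver to the CO's inbox
                reports.append(report)
        return reports
```

A CO would call register once per post, the engine would call check on each newly crawled page, and the CO would read registry.inboxes[owner] to decide which reports to escalate. The dispute and penalty steps are left out on purpose, since those are policy questions rather than code.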

My daughter watches Dora the Explorer. Dora and Boots say “Swiper no swiping” to Swiper the Fox. Most of the time the fox goes off with his tail between his legs saying “oh man”. Too bad we can’t just say “Scraper no scraping” to the scrapers.

Filed under: Fighting Spam

Posted by Stephen Turcotte on March 7, 2006 8:44 PM

Comments

As I've said before elsewhere, such a system already exists. Much of it is still in the works, but take a look at this article on ESBNs (now called ESNs on Numly.com). Though it doesn't yet have all of the tracking functions you requested, most of them are in the works, and one can already do a search for the unique ESN to locate other uses of a work.

http://www.plagiarismtoday.com/?p=166

And yet another service exists for tracking reuse of RSS feeds. Feedburner has redone their stat system to identify "Uncommon Uses" of a work.

http://www.plagiarismtoday.com/?p=183

Very handy stuff.

Maybe this will help you out some.

PS: Sorry about linking to my own site. I just know that I can trust my own articles to be at least somewhat accurate. I hate spreading misinformation.

Posted by: Jonathan Bailey | March 9, 2006 11:17 PM
