DonalOBrien.net
Comment Spam Prevention
11 September 2008 14:13

As promised, here's a brief description of how I deal with comment spam. I actually use two different methods. I forget which sites I originally found them on, so if anyone recognizes their method here, let me know and I'll be sure to give credit where it's due.

The first method is to parse the comment and assign points to it. If a comment ends up with a negative spam score, it is marked as spam and is not displayed to regular readers. I regularly review these to see whether anything has been wrongly flagged as spam (and, for that matter, whether any spam has slipped through unflagged). However, if the score is below -20, the comment is deemed definite spam and is discarded outright. I won't reveal my exact rules, but here is an outline of some of them:

  • Every URL in the comment costs a point (-1).
    • However, having fewer than a certain number of URLs earns a bonus.
  • A comment shorter than a certain number of characters is penalized.
  • Every occurrence of a stop word (e.g. porn, viagra, casino) is penalized.
  • If the comment contains nothing but links, it loses a lot of points.
  • If the comment is empty, it is spam.
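
The rules and thresholds above can be sketched as a simple scoring function. This is a minimal Python sketch under stated assumptions, not the actual rule set, which the post keeps private: apart from the -20 cutoff and the example stop words, every weight and limit (FEW_URLS_LIMIT, MIN_LENGTH, the penalty sizes, EMPTY_SCORE) is a hypothetical value chosen for illustration, and the URL pattern is a deliberately crude assumption.

```python
import re

# All weights and limits are hypothetical; only the -20 cutoff and the
# stop-word examples come from the post.
URL_PATTERN = re.compile(r"https?://\S+", re.IGNORECASE)
STOP_WORDS = ("porn", "viagra", "casino")
FEW_URLS_LIMIT = 2        # fewer URLs than this earns a bonus
FEW_URLS_BONUS = 2
MIN_LENGTH = 20           # shorter comments (in characters) are penalized
SHORT_PENALTY = 1
STOP_WORD_PENALTY = 1
LINKS_ONLY_PENALTY = 20
EMPTY_SCORE = -100        # empty comments count as spam outright
DISCARD_BELOW = -20       # from the post: below -20 is definite spam


def spam_score(comment: str) -> int:
    """Start at zero and add or subtract points rule by rule."""
    text = comment.strip()
    if not text:
        return EMPTY_SCORE

    urls = URL_PATTERN.findall(text)
    score = -len(urls)                      # every URL costs a point
    if len(urls) < FEW_URLS_LIMIT:
        score += FEW_URLS_BONUS             # ...but only a few URLs is a plus

    if len(text) < MIN_LENGTH:
        score -= SHORT_PENALTY

    lowered = text.lower()
    for word in STOP_WORDS:                 # every occurrence is penalized
        score -= STOP_WORD_PENALTY * lowered.count(word)

    # If removing the URLs leaves nothing, the comment was only links.
    if urls and not URL_PATTERN.sub("", text).strip():
        score -= LINKS_ONLY_PENALTY

    return score


def classify(score: int) -> str:
    """Negative scores are held as spam; below the cutoff they are discarded."""
    if score < DISCARD_BELOW:
        return "discard"
    if score < 0:
        return "spam"
    return "publish"


for sample in ("Thanks for the clear write-up, very helpful.",
               "cheap viagra casino",
               "http://a.example http://b.example http://c.example"):
    score = spam_score(sample)
    print(score, classify(score))
```

With these sample values, an ordinary comment with no links comes out slightly positive and is published, a short comment full of stop words dips just below zero and is held for review, and a comment made of nothing but links falls past -20 and is discarded.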

As you can probably tell, it is quite hard to earn positive points. One might expect this to produce a lot of false positives, but I have yet to come across a legitimate comment incorrectly identified as spam.

The second method is much simpler and (surprisingly) effective. The majority of comment spam is left by bots rather than humans. Most bots parse the page's HTML and inspect the elements of the comment form to work out what the HTTP request needs to contain. I have added an extra text field named "email" to the form and hidden it from view with CSS (using the display property). Bots don't apply the CSS when parsing the HTML, so they treat it as just another field that needs to be filled in. A human, on the other hand, never sees it and is (generally) unaware that it exists. So if this field is filled in, the comment is assumed to have been posted by a bot and is discarded automatically, without even going through the scoring process above.
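
A minimal Python sketch of this hidden-field (honeypot) check follows. Only the decoy field name "email" comes from the post; the form markup, the "hp" class, the real address field name "contact", and the function names are hypothetical. The check runs before any scoring, so a tripped honeypot ends processing immediately.

```python
from typing import Mapping

# The decoy field is named "email", as in the post. The real address field
# therefore needs a different name; "contact" below is hypothetical.
HONEYPOT_FIELD = "email"

# Illustrative markup only: CSS hides the decoy, so humans never see it,
# while a bot reading the raw HTML sees an ordinary text input.
FORM_SNIPPET = """
<style>.hp { display: none; }</style>
<form method="post" action="/comment">
  <input type="text" name="contact">
  <input type="text" name="email" class="hp">
  <textarea name="comment"></textarea>
</form>
"""


def is_bot(form: Mapping[str, str]) -> bool:
    """A human leaves the hidden field empty; any content in it means a bot."""
    return bool(form.get(HONEYPOT_FIELD, "").strip())


def handle_submission(form: Mapping[str, str]) -> str:
    # Honeypot first: a tripped honeypot is discarded outright and never
    # reaches the scoring step.
    if is_bot(form):
        return "discarded"
    return "needs scoring"
```

The sketch tests for a non-empty value rather than the field's mere presence, because a browser still submits an input hidden with display: none, just with an empty value.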

Since putting the second method in place, I have received virtually no comment spam. The first method is largely redundant at the moment, but it's handy to have as a backup for the day the bots smarten up (which they will, sooner or later).

Comments
02 January 2009 05:49

All the steps are clearly mentioned. Thanks
29 December 2008 21:40
WTF is this...
16 December 2008 16:10
I typed in currency converter and found this page also.
30 November 2008 16:23
I was wondering how come I was shown this page from the search index even though the page has a different content to what I searched for? Happy Christmas and the new years :)
13 November 2008 17:19
Sounds like good measures taken to prevent comment spam in wordpress Blogs comments postings. I am about to do the same cause the spam received in our comments bugs us all. Thanks!
29 October 2008 22:12
Yes this page does show in under currency converter search :) , but I am glad about it!

I have unit converter pages online, would you please be interested in placing my website's link (code below) on your page with the unit conversion tables? Thank you very much.

<!--CONVERT-TO.COM LINK CODE STARTS HERE-->
<a title="Convert-to.com is related to online calculators and converters." href="http://convert-to.com" target="_top">Convert To dot com</a>
<!--CONVERT-TO.COM LINK CODE ENDS HERE-->
Private
15 October 2008 05:06
Why the hell does this page come up with Google Currency converter in the Google Toolbar?!
14 October 2008 00:06
Dear Sir I Wish My web
your web site link in my web page and my web page link is your web page