DonalOBrien.net
Comment Spam Prevention
11 September 2008 14:13

As promised, here's a brief description of how I manage comment spam. I actually employ two different methods. I forget the original sites I found these on, so if anyone recognizes their method here, let me know and I'll be sure to give credit where it is due.

The first method is to parse the comment and assign points to it. If a comment ends up with a negative spam score, it is marked as spam and will not be displayed to regular readers. I regularly check these to see if anything has been incorrectly identified as spam (or indeed incorrectly let through as legitimate). However, if the score is less than -20, it is deemed definite spam and is discarded outright. I'm not going to explain my exact rules, but here's a basic outline of some of them:

  • Every URL in the comment is a lost point (-1)
    • However, fewer than a certain number of URLs is a plus
  • A comment shorter than a certain number of characters gets a penalty
  • Every occurrence of a stop word (e.g. porn, viagra, casino) is a penalty.
  • If the comment contains nothing but links, it loses a lot of points.
  • If the comment is empty, it is spam.
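
The outline above could be sketched roughly as follows. This is only an illustration, not my actual rule set: every weight and threshold here (MAX_FREE_URLS, MIN_LENGTH, the size of each bonus and penalty) and the function names are made-up assumptions. Only the -20 discard cut-off and the example stop words come from the description above.

```python
import re

# Hypothetical values for illustration only; the real rules are not published.
URL_RE = re.compile(r"https?://\S+", re.IGNORECASE)
STOP_WORDS = ("porn", "viagra", "casino")  # example stop words from the post
MAX_FREE_URLS = 2      # assumed: fewer URLs than this earns a bonus
MIN_LENGTH = 20        # assumed: shorter comments are penalised
DISCARD_BELOW = -20    # from the post: below this, the comment is discarded


def spam_score(comment: str) -> int:
    """Return a score for the comment; negative means spam."""
    text = comment.strip()
    if not text:
        return -1  # an empty comment is spam

    score = 0
    urls = URL_RE.findall(text)
    score -= len(urls)                 # every URL is a lost point
    if len(urls) < MAX_FREE_URLS:
        score += 2                     # assumed bonus for few URLs
    if len(text) < MIN_LENGTH:
        score -= 1                     # assumed penalty for very short comments

    lowered = text.lower()
    for word in STOP_WORDS:
        score -= lowered.count(word)   # one point per stop-word occurrence

    # Nothing left once the URLs are removed: the comment is only links.
    if urls and not URL_RE.sub("", text).strip():
        score -= 10                    # assumed "a lot of points"
    return score


def classify(comment: str) -> str:
    """Map a score to 'ok', 'spam' (held for review) or 'discard'."""
    score = spam_score(comment)
    if score < DISCARD_BELOW:
        return "discard"
    if score < 0:
        return "spam"
    return "ok"
```

With these made-up numbers, an ordinary sentence without links scores slightly positive, while a comment consisting only of links goes negative and is held for review rather than thrown away.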

As you may be able to tell, it is quite hard to get positive points. One might think that this would lead to a lot of false positives. However, I have yet to come across a comment incorrectly identified as spam.

The second method is much simpler and (surprisingly) effective. The majority of comment spam is left by bots rather than humans. Most bots parse the HTML and look at the elements of the comment form to work out what the HTTP request needs to contain. I have added a text field named "email" and hidden it from view with CSS (using the display style). The bots don't apply the CSS when parsing the HTML document, so they think this is another field that needs to be filled in. A human, however, will never see it and won't (in general) be aware of its presence. So, if this field is filled in, it is assumed the comment was posted by a bot and the whole thing is automatically discarded, without even going through the scoring process above.
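
On the server side, the check amounts to something like the sketch below. It assumes the submitted form arrives as a plain dictionary of field names to values; the function names and the "comment" field are hypothetical, and only the hidden field's name, "email", is taken from the description above.

```python
HONEYPOT_FIELD = "email"  # the hidden field; humans never see it, so it should be empty


def is_bot_submission(form: dict) -> bool:
    """Return True if the hidden honeypot field was filled in."""
    value = form.get(HONEYPOT_FIELD) or ""
    return bool(str(value).strip())


def handle_submission(form: dict) -> str:
    """Hypothetical entry point: discard bot posts before any scoring happens."""
    if is_bot_submission(form):
        return "discard"
    return "score"  # hand the comment on to the point-based check
```

A missing or blank field counts as human, so a browser that simply omits the hidden field is not penalised.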

Since putting the second method in place, I have received virtually no comment spam. The first method is fairly redundant at the moment, but it's handy to have it in place as a backup for the day when the bots smarten up (which they will sooner or later).

Comments
Mushtaq
21 June 2009 22:30
THANKS
05 June 2009 00:08
good converter
No one
01 June 2009 05:11
What is this???
28 May 2009 11:36
like the idea of the hidden field for identifying spam.
One quick question though. If someone is using one of the many automatic form fillers that exist for IE and Firefox - wouldn't that also populate this field and cause the post to file?

paddlemom
20 May 2009 08:17
Can you check your temperature converter? I am trying to convert Celsius to Fahrenheit and I get what looks like a hex number rather than a real value. Tks.
12 May 2009 04:10
my neme is sonia
24 April 2009 16:54
Are you no longer blogging? I came back and nothing new in many months :(
17 April 2009 09:18
i want to convert
06 April 2009 21:13
Currency convertor brought me here :(
09 February 2009 05:23
Hi Barry,
The hidden field is supposed to be empty. A human will never see it, but a bot (at least the current crop) doesn't realize that it's hidden (because it's done using CSS). So, if it's filled in, we assume that it's a bot doing the filling and ignore the post.
Barry
15 January 2009 07:12
Donal,

I like the idea of the hidden field for identifying spam.
One quick question though. If someone is using one of the many automatic form fillers that exist for IE and Firefox - wouldn't that also populate this field and cause the post to fail?
02 January 2009 05:49

All the steps are clearly mentioned. Thanks
29 December 2008 21:40
WTF is this...
16 December 2008 16:10
I typed in currency converter and found this page also.
30 November 2008 16:23
I was wondering how come I was shown this page from the search index even though the page has a different content to what I searched for? Happy Christmas and the new years :)
13 November 2008 17:19
Sounds like good measures taken to prevent comment spam in wordpress Blogs comments postings. I am about to do the same cause the spam received in our comments bugs us all. Thanks!
29 October 2008 22:12
Yes this page does show in under currency converter search :) , but I am glad about it!

I have unit converter pages online, would you please be interested in placing my website's link (code below) on your page with the unit conversion tables? Thank you very much.

<!--CONVERT-TO.COM LINK CODE STARTS HERE-->
<a title="Convert-to.com is related to online calculators and converters." href="http://convert-to.com" target="_top">Convert To dot com</a>
<!--CONVERT-TO.COM LINK CODE ENDS HERE-->
Private
15 October 2008 05:06
Why the hell does this page come up with Google Currency converter in the Google Toolbar?!
14 October 2008 00:06
Dear Sir I Wish My web
your web site link in my web page and my web page link is your web page