If You Post It, They Will SPAM It

Most people don't care for spam in any form.
I hope that my Internet-savvy readers know that you should never, ever post a personal email address online in public view. In personal emails or in password-protected forums, sure, post away. Otherwise, posting an email address in plain text is a one-way ticket to SPAM-ville.
So, if you already know this stuff, why am I writing about it? Because, obviously, not everyone does. Over the past year, I’ve been responsible for the design and upkeep of a local church web site. Of course, one of the best (nerdy) perks is being able to analyze all the unique stats that roll in. One very helpful metric, the “search engine terms” metric, as it’s commonly called, shows you the terms or phrases a visitor typed into a search engine to find your site. An interesting trend began to appear after a while, one that I hadn’t seen before. It seems that someone, or something, had come to the site after searching for something such as “church in california @hotmail.com.” At first, I only saw a couple of these, but after a while, these hits began occurring weekly with different phrases: “pastors in california @hotmail.com,” “email contacts of pearsons @hotmail.com,” “prayer 2009 @gmail.co.th,” and so on. After digging into the stats more, I was able to pull the IP addresses of the machines that had landed on the site after those searches. The IP address? 74.125.77.132. I’ll wait a second for the nerds to run a trace.
Weird, huh? That address points squarely at Google. Not all the searches had that address attached to them, though. For example, one search traced back to Togo Telecom, an ISP in Togo.
No doubt some of you already know what this is all about, but just in case, I’ll lay out the details of my theory. The hits are coming from bots that are programmed to harvest email addresses for specific campaigns. Yes, even church pastors and staff get spammed by “religious” organizations with special “services” to sell. The method of querying a couple of keywords plus a popular email provider is actually pretty smart, in a let-someone-else-do-the-heavy-lifting kind of way. The hits from Google are most likely a result of the bot choosing to visit the cached link (a snapshot of the web page as it was indexed) that Google provides for each search result. That way, the coveted email address will still be on the page even if the live site has since changed, just waiting to be added to a list of email addresses for sale. A search engine basically hands the bot a list of pages with email addresses on them, which is much faster than randomly bouncing from site to site hoping to find some.
For example, if I wanted to spam people who are involved with Relay for Life, I would search for “relay for life @yahoo.com,” or if I had a fraudulent operation built on fake Scantron forms, I could search for “school teacher @hotmail.com.”
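If you want to check your own stats for this pattern, here is a rough Python sketch of how it might be done. It assumes you can export referrer URLs from your stats package and that the search phrase arrives in a `q` or `p` query parameter; the function names are my own invention, not part of any stats tool.

```python
import re
from urllib.parse import urlparse, parse_qs

# A bare "@provider.tld" token with nothing in front of the "@" is the
# telltale sign: a real person rarely searches for half an email address.
BARE_DOMAIN = re.compile(r"(?<!\S)@[a-z0-9-]+(?:\.[a-z0-9-]+)+", re.IGNORECASE)


def search_terms(referrer: str) -> str:
    """Return the search phrase carried by a referrer URL, or '' if none."""
    query = parse_qs(urlparse(referrer).query)
    # Assumption: the engine passes the phrase in "q" or "p".
    for key in ("q", "p"):
        if key in query:
            return query[key][0]
    return ""


def looks_like_harvest_query(referrer: str) -> bool:
    """True if the search phrase contains a bare @domain token."""
    return bool(BARE_DOMAIN.search(search_terms(referrer)))


if __name__ == "__main__":
    hit = "http://www.google.com/search?q=church+in+california+%40hotmail.com"
    print(search_terms(hit), looks_like_harvest_query(hit))
```

A full address such as bob@example.com in a search phrase is deliberately not flagged, since the "@" has characters in front of it.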
So, in review: Don’t post any email address online in plain text, unless, of course, you enjoy the extra reading material. Currently, the safest way to allow web site visitors to contact you is to use a temporary “throw-away” address or a form with CAPTCHA verification. Another method I consider safe enough is generating an image that shows the email address without any actual text on the page (don’t use a mailto link either!). Any email image generator will do. Though your email address appears on the page, it isn’t easily read by a bot harvesting email addresses from text. The technology is there, but as far as I know, very few spammers bother with OCR (optical character recognition), since there are still so many good addresses readily available in plain text.
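To see why plain text and mailto links are such easy pickings, here is a small Python sketch that audits a page the way a text-based harvester presumably would. The regular expression is a deliberately loose approximation of an email address, and the sample page and function name are made up for illustration.

```python
import re

# Loose approximation of an email address; good enough for an audit.
EMAIL = re.compile(r"[A-Za-z0-9._%+-]+@[A-Za-z0-9-]+(?:\.[A-Za-z0-9-]+)+")


def exposed_addresses(html: str) -> list[str]:
    """Return unique addresses readable as text in the HTML, in page order."""
    seen = []
    for address in EMAIL.findall(html):
        if address not in seen:
            seen.append(address)
    return seen


if __name__ == "__main__":
    # Hypothetical page: one plain-text address, one mailto link, one image.
    page = """
    <p>Write to pastor@example.org for details.</p>
    <a href="mailto:office@example.org">Email the office</a>
    <img src="/img/contact.png" alt="our contact address">
    """
    print(exposed_addresses(page))
```

Run against the sample page, the plain-text address and the mailto address both turn up, while the address rendered inside the image never appears in the text at all.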
I wonder what would happen if I Googled “looking for unheard of foreign entity to transfer large sums of cash with no assurance of legitimacy @citibank.com.”
SPAM Clogs the Tubes [Akismet Stats]
There’s a lot of controversy, both on the Internet and in the real world, over the massive use of peer-to-peer (P2P) technology and its legitimate and illegitimate uses, and especially over the massive amount of bandwidth P2P communications consume. Frankly, I believe the ISPs are blaming P2P users for using the available bandwidth on the network (regardless of the fact that those users have paid for a certain level of service) and threatening to charge for overages on ridiculously low monthly bandwidth caps. What they should be doing is upgrading their networks, just as any other company has to when it needs to keep up with demand in order to remain a viable competitor!
I’m sorry, I digress. That discussion is a whole ‘nother week of blog postings dedicated to the topic.
However, my topic today is related. Akismet (a WordPress blogger’s best friend against SPAM) keeps detailed stats on the comments it filters. Every comment that is submitted to a blog with Akismet installed gets passed through Akismet and checked to see if it looks like SPAM. If it passes the test, the comment is allowed to post on the blog. If for some reason a comment is caught that isn’t SPAM, or a comment is let through that is SPAM, I can mark it manually, and Akismet will “learn” from its mistake.
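The check-then-correct loop is easier to picture with a toy example. The Python sketch below is only a hypothetical stand-in that counts words: it is not Akismet’s algorithm or its API, and the class and method names are invented for illustration.

```python
from collections import Counter


class ToyCommentFilter:
    """Hypothetical stand-in for a hosted spam filter (not Akismet's method)."""

    def __init__(self) -> None:
        self.spam_words = Counter()  # words seen in comments marked as SPAM
        self.ham_words = Counter()   # words seen in comments marked as HAM

    def is_spam(self, comment: str) -> bool:
        """Flag a comment whose words were seen more often in SPAM than HAM."""
        words = comment.lower().split()
        spam_score = sum(self.spam_words[w] for w in words)
        ham_score = sum(self.ham_words[w] for w in words)
        return spam_score > ham_score

    def mark_spam(self, comment: str) -> None:
        """Manual correction: this comment got through but was SPAM."""
        self.spam_words.update(comment.lower().split())

    def mark_ham(self, comment: str) -> None:
        """Manual correction: this comment was caught but was legitimate."""
        self.ham_words.update(comment.lower().split())


def handle_comment(comment_filter: ToyCommentFilter, comment: str) -> str:
    """Post the comment if it passes the check, otherwise hold it."""
    return "held as spam" if comment_filter.is_spam(comment) else "posted"


if __name__ == "__main__":
    f = ToyCommentFilter()
    print(handle_comment(f, "cheap pills online"))  # nothing learned yet
    f.mark_spam("cheap pills online")               # blogger corrects the miss
    print(handle_comment(f, "cheap pills online"))
```

The first time through, the untrained filter lets the comment post; after the manual correction, the same comment is held.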
Anyway, back to the stats. Listed here, you can see a basic graph outlining the number of HAM comments (legitimate comments) in blue, and the number of SPAM comments in orange. That is an outrageous HAM to SPAM ratio! And that’s only on blogs that Akismet tracks!
My point is… there’s A LOT OF SPAM. I’m sure the bandwidth it takes to transmit all that crap is fairly substantial. There have been studies that try to pinpoint the number, but they all differ. They do, however, agree that it is certainly more taxing than legitimate communications.
The solution? These ISPs should spend more of their resources locking down something that NOBODY likes (SPAM) and less time alienating their customers over crap like DNS-based behavioral tracking and bandwidth caps.
P.S. Akismet is the reason you don’t have to sign up or enter some CAPTCHA or other human-verification code in order to comment here! So yeah! Give it up for Akismet!
Splogs Suck
This site has only been up for about a week, and already a splog, or spam blog, has gotten hold of it.
Splogs are “fake” blogs whose content is usually entirely stolen from legitimate blogs. In this case, I wrote a post last week that cited Wikipedia and contained a tag for Wikipedia. I believe this is how the splog in question ripped off my content. Splogs I’ve had experience with will re-post information from any blog that shows up in a Technorati or Google Blog Search feed with a particular tag or keyword.
The splog will attempt to get my content indexed in search engines to generate traffic to the splog’s site, which is usually inundated with paid advertising links (read: pr0n).
Often, too, the splog will send a pingback to the original blog. Pingbacks (a close cousin of trackbacks) are used to notify a blog that another blog has cited information from a particular post. The pingback will appear as a snippet of the new post in the comments section of the cited post. This is a great feature because it allows easy link trading between bloggers with similar interests, and it gives credit to the original author.
The problem with splogs sending pingbacks is that now the sucker gets a link in my comments to his/her rip-off of my post. Fortunately, Akismet is catching on to these and will file them as spam before they hit my blog.
There’s nothing I can do about the bastard stealing my content (except to change the content, or play the sucker), really, but the silver lining in a spam blog is that I know my site is being searched and indexed!
Splog in question: http://wikipedia.doorwayblogging.info/ (NSFW)

