Spam me! - Music Banter

Post #1 by Seltzer, 07-16-2009, 09:03 AM
Fish in the percolator!
Join Date: Dec 2005
Location: Hobbit Land NZ
Posts: 2,870
Spam me!

I never thought I'd be saying this, but I'd like you to forward me your spam to spamme.repository@gmail.com in its original state if possible (without > symbols and Fwd: prefixes).
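Forwarding matters because a word-based filter learns from every word in a message, including the ones the forwarding process added. As a minimal sketch (Python, with hypothetical helper names `unforward_subject` and `unquote_body`), the two artifacts mentioned in the request can be stripped like this. It does not handle the header block that most mail clients also insert when forwarding.

```python
import re

def unforward_subject(subject: str) -> str:
    """Strip any run of leading "Fwd:" / "Fw:" prefixes (case-insensitive)."""
    return re.sub(r"^(?:\s*fwd?:\s*)+", "", subject, flags=re.IGNORECASE)

def unquote_body(body: str) -> str:
    """Remove leading ">" quote markers (nested or not) from every line."""
    return "\n".join(re.sub(r"^(?:>\s?)+", "", line) for line in body.splitlines())

print(unforward_subject("Fwd: FW: Hey! I finally found you!!!"))  # -> Hey! I finally found you!!!
print(unquote_body("> > Claim your inheritance now"))              # -> Claim your inheritance now
```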

As a personal project, I'm coding a Bayesian spam filter (link here for anyone who's interested). In a nutshell, Bayesian spam filtering classifies e-mails as spam or ham (not spam) based on the words they contain. To take a simplistic example, words like 'inheritance', 'account' and 'rolex' are spammy words which tend to occur in spam e-mails. If a newly received e-mail contains a large number of these spammy words, there is a higher chance it will be chucked in the spambox.
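For anyone wondering how "a large number of spammy words" turns into a decision: one common way to combine per-word spam probabilities is the naive Bayes formula popularised by Paul Graham's "A Plan for Spam", P = prod(p) / (prod(p) + prod(1 - p)), which assumes words occur independently and that spam and ham are equally likely a priori. The sketch below is illustrative only: the word probabilities are invented, and nothing here is taken from the filter linked above.

```python
import math
import re

def combine(probs):
    """Naive Bayes combination of per-word spam probabilities, assuming equal
    prior odds: P = prod(p) / (prod(p) + prod(1 - p)). Each p must lie strictly
    between 0 and 1. Computed in log space so long e-mails don't underflow."""
    if not probs:
        return 0.5  # no known words: no evidence either way
    # diff = log(prod(1 - p)) - log(prod(p)), so P = 1 / (1 + e^diff)
    diff = sum(math.log(1 - p) - math.log(p) for p in probs)
    if diff > 0:  # rearranged form avoids overflow in math.exp
        e = math.exp(-diff)
        return e / (1 + e)
    return 1 / (1 + math.exp(diff))

def spam_probability(text, word_probs):
    """Score an e-mail using only distinct words the filter has seen before."""
    words = set(re.findall(r"[a-z0-9']+", text.lower()))
    return combine([word_probs[w] for w in words if w in word_probs])

# Hypothetical per-word probabilities, as a trained filter might hold them.
WORD_PROBS = {"inheritance": 0.95, "account": 0.80, "rolex": 0.97,
              "meeting": 0.10, "tomorrow": 0.15}

print(round(spam_probability("Claim your inheritance and a free Rolex", WORD_PROBS), 3))  # 0.998
print(round(spam_probability("Meeting tomorrow?", WORD_PROBS), 3))                        # 0.019
```

Words the filter has never seen are simply ignored here; that is a design choice, and some filters instead assign unknown words a fixed, slightly hammy probability.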

To build up this database of words and their associated probabilities, the spam filter needs to go through a learning phase in which it is fed a large number of e-mails, each labelled as spam or ham. After the learning phase, it can (theoretically) be trusted to decide for itself whether an e-mail is spam. I don't actually receive much spam, and what I do receive is mostly of one type, which would make the data statistically biased. What I need is a lot of spam of many different types, and that's where I hope you guys can help out.
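A sketch of what the learning phase could compute, assuming a simple scheme: for each word, compare the fraction of training spam that contains it with the fraction of training ham that contains it, then normalise assuming equal priors. Add-one (Laplace) smoothing keeps every probability strictly between 0 and 1, so a word seen only once cannot single-handedly force a verdict. The function names and the tiny corpus are hypothetical.

```python
import re
from collections import Counter

def tokens(text):
    """Distinct lowercase words in a message (each counted once per message)."""
    return set(re.findall(r"[a-z0-9']+", text.lower()))

def train(spam_msgs, ham_msgs, alpha=1.0):
    """Estimate P(spam | word) for every word seen in the labelled training set."""
    spam_counts, ham_counts = Counter(), Counter()
    for msg in spam_msgs:
        spam_counts.update(tokens(msg))
    for msg in ham_msgs:
        ham_counts.update(tokens(msg))
    n_spam, n_ham = len(spam_msgs), len(ham_msgs)
    probs = {}
    for word in spam_counts.keys() | ham_counts.keys():
        # Smoothed fraction of spam / ham messages that contain this word.
        f_spam = (spam_counts[word] + alpha) / (n_spam + 2 * alpha)
        f_ham = (ham_counts[word] + alpha) / (n_ham + 2 * alpha)
        probs[word] = f_spam / (f_spam + f_ham)
    return probs

probs = train(spam_msgs=["win a free rolex", "claim your inheritance now"],
              ham_msgs=["lunch meeting tomorrow", "see you at the meeting"])
print(round(probs["rolex"], 3), round(probs["meeting"], 3))  # 0.667 0.25
```

This also shows why spam of only one type biases the result: words absent from the training spam get no spam evidence at all, however common they are in other kinds of spam.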

Post #2 by Freebase Dali, 07-16-2009, 03:16 PM
Partying on the inside
Join Date: Mar 2009
Posts: 5,584

Will do.

What I've noticed with most of my spam is that the account name and subject are often typed L1k3 th15 to slip past spam filters.
Is that going to be an issue, or is there some way for a program to detect numbers used in place of letters?

Post #3 by The Unfan, 07-16-2009, 04:43 PM
Account Disabled
Join Date: Dec 2006
Location: Methville
Posts: 2,116

Also sometimes they dodge filters lik.e thi.s.
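One way a filter might undo this particular trick, offered as an assumption rather than how any specific filter works: before splitting text into words, delete punctuation that sits directly between two letters. The character set `.`, `*` and `-` is an arbitrary choice for the example, and `remove_inner_punctuation` is a hypothetical name.

```python
import re

# Hypothetical rule: drop runs of '.', '*' or '-' that sit directly between two letters.
INNER_PUNCT = re.compile(r"(?<=[A-Za-z])[.*-]+(?=[A-Za-z])")

def remove_inner_punctuation(text):
    """Undo obfuscation like "lik.e thi.s" before the text is split into words."""
    return INNER_PUNCT.sub("", text)

print(remove_inner_punctuation("Also sometimes they dodge filters lik.e thi.s."))
# -> Also sometimes they dodge filters like this.
```

The trade-off is that legitimate tokens get merged too ("e.g." becomes "eg.", "example.com" becomes "examplecom", "well-known" becomes "wellknown"). As long as training and classification apply the same normalisation, those merged tokens are simply learned as words in their own right.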

Post #4 by Freebase Dali, 07-16-2009, 06:05 PM
Partying on the inside
Join Date: Mar 2009
Posts: 5,584

They're so clever.
Gotta love the ones that are all:

From: Maria D. Sanchez
Subject: Hey! I finally found you!!!
_______________________________________________
body:

OMG ENLARGE UR PENIS LOL

Post #5 by Seltzer, 07-16-2009, 10:20 PM
Fish in the percolator!
Join Date: Dec 2005
Location: Hobbit Land NZ
Posts: 2,870

Deliberate misspelling is a common spam technique which most spam filters account for. I imagine that when a filter encounters an unfamiliar word (one not in its database), it cycles through every form the word could take, given the possible forms of each character, and checks that none of the results matches a known word before confirming it as a new word and adding it.
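The idea of cycling through every possible form of an unfamiliar word can be sketched directly. The lookalike table below is a hypothetical, deliberately small one; a real filter would need a larger table. The number of candidates multiplies with each substitutable character (three options per '1' here), so this brute-force approach only suits short words.

```python
from itertools import product

# Hypothetical lookalike table: which letters each character might be standing in for.
LOOKALIKES = {"0": "o", "1": "il", "3": "e", "4": "a", "5": "s",
              "7": "t", "@": "a", "$": "s"}

def candidate_forms(word):
    """Yield every spelling obtained by optionally swapping each character
    for one of its lookalike letters (the original character stays an option)."""
    options = [ch + LOOKALIKES.get(ch, "") for ch in word.lower()]
    for combo in product(*options):
        yield "".join(combo)

def resolve(word, vocabulary):
    """Known words that `word` could be an obfuscated spelling of (empty if none)."""
    return sorted({form for form in candidate_forms(word) if form in vocabulary})

vocabulary = {"like", "this", "rolex"}
print(resolve("L1k3", vocabulary))   # ['like']
print(resolve("th15", vocabulary))   # ['this']
print(resolve("r0l3x", vocabulary))  # ['rolex']
```

A non-empty result suggests the word is an obfuscated known word; an empty result is the point at which, in the description above, the word would be confirmed as new and added to the database.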

And the spamming technique of using an innocent subject line, possibly followed by some innocent text, with the spam message after that, is insidious for two reasons. The first is that people will often open the e-mail on the strength of the legitimate-looking subject line alone. The second is Bayesian poisoning, which fools some spam filters: spammers insert legitimate-looking paragraphs containing non-spammy words and follow them with the spam content. Since Bayesian spam filters tend to consider the spamminess of every word in the e-mail, the legitimate paragraphs lower its overall spamminess and can let the spam slip through the filter. There are simple ways of preventing common Bayesian poisoning, though.
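The post doesn't spell out which simple defences it means. One possibility, sketched here as an illustrative assumption rather than the author's method, targets exactly the pattern described: score each paragraph separately and let the spammiest one decide, so innocent padding cannot dilute the spam that follows it. The word probabilities are invented for the example, and `combine` is the equal-priors naive Bayes formula P = prod(p) / (prod(p) + prod(1 - p)).

```python
import math
import re

def combine(probs):
    """P = prod(p) / (prod(p) + prod(1 - p)), equal priors, computed in log space."""
    if not probs:
        return 0.5
    diff = sum(math.log(1 - p) - math.log(p) for p in probs)
    if diff > 0:
        e = math.exp(-diff)
        return e / (1 + e)
    return 1 / (1 + math.exp(diff))

def score(text, word_probs):
    """Spam probability of a piece of text, from its distinct known words."""
    words = set(re.findall(r"[a-z0-9']+", text.lower()))
    return combine([word_probs[w] for w in words if w in word_probs])

def score_by_paragraph(text, word_probs):
    """Score each blank-line-separated paragraph on its own; keep the worst."""
    paragraphs = [p for p in re.split(r"\n\s*\n", text) if p.strip()]
    return max((score(p, word_probs) for p in paragraphs), default=0.5)

# Hypothetical word probabilities.
WORD_PROBS = {"meeting": 0.10, "tomorrow": 0.15, "agenda": 0.10,
              "project": 0.20, "cheap": 0.90, "rolex": 0.97}

email = "Meeting tomorrow, agenda attached for the project.\n\nCheap Rolex!"
print(round(score(email, WORD_PROBS), 2))               # whole message: 0.14
print(round(score_by_paragraph(email, WORD_PROBS), 2))  # worst paragraph: 1.0
```

The catch is the reverse case: a genuine e-mail with one spammy-sounding paragraph now gets flagged more easily, so this trades fewer missed spams for more false positives.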

A very common spamming technique I see nowadays is image spam. The idea is that the spam message is contained within an image, so text-based spam filters cannot process it. But Gmail uses optical character recognition (the technology scanners use to turn printed pages into text) to extract text from those images, and it probably treats that text quite suspiciously.

Post #6 by Seltzer, 07-23-2009, 06:29 AM
Fish in the percolator!
Join Date: Dec 2005
Location: Hobbit Land NZ
Posts: 2,870

So does anyone else have some delectable spam for me? If you'd like to forward it, the address is spamme.repository@gmail.com
© 2003-2024 Advameg, Inc.