[SA-exim] Greylisting algorithms after end of DATA
Magnus Holmgren
2007-01-13 13:37:07 UTC
Traditional greylisting combines the remote host, envelope sender, and
envelope recipient into a triplet and checks, after each RCPT command,
whether that triplet has been seen before (not too long ago, but also at
least some time ago). (Correct me if I'm wrong.) The advantage is that it
saves bandwidth, since the message is deferred before DATA is ever sent.
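
To make the triplet check concrete, here is a rough Python sketch. The two
time limits and the names check_rcpt and first_seen are made up for
illustration; real implementations keep the state on disk and pick their own
limits.

```python
import time

MIN_DELAY = 5 * 60       # assumed: a retry sooner than this is still deferred
MAX_AGE = 4 * 60 * 60    # assumed: a first attempt older than this is forgotten

# (host, sender, recipient) -> time of the first attempt
first_seen = {}


def check_rcpt(host, sender, recipient, now=None):
    """Return "accept" or "defer" for one RCPT command."""
    now = time.time() if now is None else now
    triplet = (host, sender, recipient)
    seen = first_seen.get(triplet)
    if seen is None or now - seen > MAX_AGE:
        # Never seen, or seen too long ago: start the clock (again).
        first_seen[triplet] = now
        return "defer"
    if now - seen < MIN_DELAY:
        # Seen, but the retry came too quickly.
        return "defer"
    return "accept"
```

For example, a first attempt at time 0 is deferred, a retry at 60 seconds is
still deferred, and a retry at 600 seconds is accepted.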

Running SpamAssassin after end of DATA but before accepting the mail gives the
advantage that greylisting can be applied only to grey mail - the delaying of
clearly non-spam mail can be avoided. It also means that e.g. the Message-ID
can be considered when determining whether we have seen the message before.
In fact, nothing prevents us from using an arbitrary set of header fields
(such as Subject, Message-ID, From) in constructing the key, if it gives
better confidence in what we want to know: whether the other end retries
after a temporary failure. (We could even accept delivery and whitelist based
on a partial match, say 3 of 4, to better cope with the braindead mail
servers that unfortunately exist.) After we have determined that it does,
there's no reason to greylist further mail. (Well, there might be a reason to
delay mail from new senders at large ESPs like Hotmail, if that means that
URIs in the spam get the time to end up in URIBLs. This is open to
discussion.)
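
A small Python sketch of the partial-match idea, assuming four key fields and
a threshold of three. The field names and the functions matching_fields and
is_retry are only illustrative, not anything SA-Exim provides.

```python
# Illustrative choice of fields; any set of header or envelope values would do.
FIELDS = ("from", "subject", "message-id", "recipients")


def matching_fields(stored, current):
    """Count the fields that are present in the stored record and equal."""
    return sum(
        1
        for field in FIELDS
        if stored.get(field) is not None and stored.get(field) == current.get(field)
    )


def is_retry(stored, current, required=3):
    """Treat the message as a retry if at least `required` fields match."""
    return matching_fields(stored, current) >= required
```

With this, a retry from a server that generates a fresh Message-ID on each
attempt still counts as the same message (3 of 4), while a message that
differs in both Subject and Message-ID does not.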

So, what I suggest for a future SA-Exim version (and to anyone implementing
something similar using only Exim ACLs) is this: For each host (or /24 or /64
network), store a list of records representing messages that host has tried
to deliver. A record contains a timestamp and a key, which could be a hash of
$rh_From:, $rh_Subject:, $recipients (but see below) etc. When a message
matches an existing record, check the timestamp, and if enough time has
passed, replace the whole list with "whitelisted" (if not, do nothing). (Most
of the time, just one message arrives before the host gets whitelisted.)
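
Roughly, in Python (an in-memory sketch only; the names hosts, message_key
and check_message, the SHA-256 hash and the 300-second delay are assumptions
for the sake of the example):

```python
import hashlib

MIN_DELAY = 300              # assumed minimum time between attempts, in seconds
WHITELISTED = "whitelisted"

# network -> WHITELISTED, or a list of (timestamp, key) records
hosts = {}


def message_key(*parts):
    """Hash the chosen values (e.g. From:, Subject:) into one key."""
    h = hashlib.sha256()
    for part in parts:
        h.update(part.encode("utf-8", "replace"))
        h.update(b"\0")      # separator, so ("ab", "c") differs from ("a", "bc")
    return h.hexdigest()


def check_message(network, key, now):
    """Return "accept" or "defer" for a message from the given host/network."""
    state = hosts.get(network)
    if state == WHITELISTED:
        return "accept"
    records = hosts.setdefault(network, [])
    for timestamp, stored_key in records:
        if stored_key == key:
            if now - timestamp >= MIN_DELAY:
                # The host retried after a decent interval: replace the
                # whole list with "whitelisted".
                hosts[network] = WHITELISTED
                return "accept"
            return "defer"   # retried too soon: do nothing
    records.append((now, key))
    return "defer"
```

Once a network is whitelisted, every later message from it is accepted
without looking at the key at all.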

One question to be solved is about $recipients. The envelope recipients have
to be checked since a spammer can send the same spam to many addresses but
with the same From: field. Most often there is only one recipient, and even
otherwise, normally the list is the same from delivery attempt to delivery
attempt, but it could change if one or more recipients were temporarily
rejected on one occasion but not the other. Furthermore, it can't be demanded
that MTAs give the list in the same order each time.
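
The ordering problem at least is easy to sidestep by putting the recipients
into a canonical form before hashing. A Python sketch (recipients_key is a
made-up name, and lowercasing the whole address is a simplification, since
local parts are case-sensitive in principle):

```python
import hashlib


def recipients_key(recipients):
    """Hash a recipient list so that order and duplicates don't matter."""
    canonical = sorted({r.strip().lower() for r in recipients})
    return hashlib.sha256("\n".join(canonical).encode("utf-8")).hexdigest()
```

This does nothing for the other case, where the list itself differs between
attempts because some recipients were temporarily rejected.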

When storing the list of attempted deliveries in a file I'd prefer if the file
didn't have to be rewritten, only appended to. Maybe it can be deemed enough
if one recipient is found in the list of recipients of the first delivery
attempt.
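
A sketch of what I mean, in Python. The one-line-per-attempt, tab-separated
format and the function names are invented here, and it assumes the key and
the addresses contain no tabs or commas.

```python
def append_attempt(path, timestamp, header_key, recipients):
    """Record one delivery attempt; the file is only ever appended to."""
    with open(path, "a", encoding="utf-8") as f:
        f.write(f"{int(timestamp)}\t{header_key}\t{','.join(recipients)}\n")


def find_first_attempt(path, header_key, recipients):
    """Return the timestamp of the earliest recorded attempt that has the
    same header key and shares at least one recipient, or None."""
    wanted = set(recipients)
    try:
        with open(path, encoding="utf-8") as f:
            for line in f:
                timestamp, key, rcpts = line.rstrip("\n").split("\t")
                if key == header_key and wanted & set(rcpts.split(",")):
                    return int(timestamp)
    except FileNotFoundError:
        return None
    return None
```

Because records are appended in arrival order, the first matching line is the
first delivery attempt, so nothing ever needs to be rewritten to find it.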

Comments please!
--
Magnus Holmgren ***@lysator.liu.se
(No Cc of list mail needed, thanks)

"Exim is better at being younger, whereas sendmail is better for
Scrabble (50 point bonus for clearing your rack)" -- Dave Evans
Magnus Holmgren
2007-01-20 22:04:10 UTC
Post by Magnus Holmgren
So, what I suggest for a future SA-Exim version (and to anyone implementing
something similar using only Exim ACLs) is this: For each host (or /24 or
/64 network), store a list of records representing messages that host has
tried to deliver. A record contains a timestamp and a key, which could be a
hash of $rh_From:, $rh_Subject:, $recipients (but see below) etc. When a
message matches an existing record, check the timestamp, and if enough time
has passed, replace the whole list with "whitelisted" (if not, do nothing).
(Most of the time, just one message arrives before the host gets
whitelisted.)
One question to be solved is about $recipients. The envelope recipients
have to be checked since a spammer can send the same spam to many addresses
but with the same From: field. Most often there is only one recipient, and
even otherwise, normally the list is the same from delivery attempt to
delivery attempt, but it could change if one or more recipients were
temporarily rejected on one occasion but not the other. Furthermore, it
can't be demanded that MTAs give the list in the same order each time.
When storing the list of attempted deliveries in a file I'd prefer if the
file didn't have to be rewritten, only appended to. Maybe it can be deemed
enough if one recipient is found in the list of recipients of the first
delivery attempt.
No comments (on this list) so far. One more question: Does anyone use the
Whitelisted count and Query count lines in the tuple files for anything
(debugging, statistics, ...)?
Marc MERLIN
2007-01-22 07:57:26 UTC
Post by Magnus Holmgren
Post by Magnus Holmgren
When storing the list of attempted deliveries in a file I'd prefer if the
file didn't have to be rewritten, only appended to. Maybe it can be deemed
enough if one recipient is found in the list of recipients of the first
delivery attempt.
No comments (on this list) so far. One more question: Does anyone use the
Whitelisted count and Query count lines in the tuple files for anything
(debugging, statistics, ...)?
That's indeed what I put them there for, but I've never personally used them.

Marc
--
"A mouse is a device used to point at the xterm you want to type in" - A.S.R.
Microsoft is to operating systems & security ....
.... what McDonalds is to gourmet cooking
Home page: http://marc.merlins.org/
Chris Lightfoot
2007-01-14 13:22:28 UTC
Post by Magnus Holmgren
In fact, nothing prevents us from using an arbitrary set of header fields
(such as Subject, Message-ID, From) in constructing the key, if it gives
better confidence in what we want to know: whether the other end retries
after a temporary failure. (We could even accept delivery and whitelist based
Identifying messages in this way is not fully reliable,
since anyone can put any Message-ID on any message, and
some messages don't have a Message-ID at all. I don't know
whether that level of sophistication will matter given the
current behaviour of spammers. If it does, then looking at
the kinds of hashes that things like `Vipul's Razor' use is
probably a good idea.
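
As a rough illustration of a content-based key that doesn't depend on the
Message-ID (this is NOT the algorithm Razor uses, just a whitespace- and
case-insensitive hash of the body; body_fingerprint is a made-up name):

```python
import hashlib
import re


def body_fingerprint(body):
    """Hash the body after collapsing whitespace and lowercasing it."""
    text = re.sub(r"\s+", " ", body).strip().lower()
    return hashlib.sha256(text.encode("utf-8")).hexdigest()
```

Two deliveries of the same text then give the same fingerprint even if line
endings or wrapping differ between attempts.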
--
``Odd things, animals.
Dogs look up to you. Cats look down to you.
Only pigs see you as an equal.'' (Churchill)