

Reject mail matching regular expressions

Introduction

sendmail's milter API allows programs to register themselves and get called during mail transactions. Such plugins will see all mails passing through sendmail, including SMTP envelope parameters and mail headers and body. They can cause sendmail to reject messages with permanent or temporary error replies or discard messages silently, based on arbitrary conditions.

milter-regex is a very simple plugin that rejects or discards messages matching regular expressions. It doesn't add much processing overhead, so even a busy mail server can afford to run it.

Inline filtering

Filtering mails 'inline', i.e. while the SMTP transaction is happening, has several advantages compared to post-processing as commonly done using procmail. Messages rejected inline do not have to be stored locally just to get deleted again later. The sender immediately gets an SMTP error code and the receiver doesn't generate any bounce messages (which might get sent to fake sender addresses, and cost bandwidth and queue space).

Furthermore, inline filtering applies to all messages passing through the system. A single filter can reject incoming and outgoing messages to and from all users.

Regular expressions

Spam filters like SpamAssassin can use complex algorithms to detect offending messages, at the cost of consuming considerable resources. Regular expression matching is much simpler and allows rejecting large volumes of unwanted messages at low cost, greatly reducing the load on more complex filters called subsequently. Regular expressions are a commonly known and versatile tool, well suited for quickly matching the most urgent threats.

Motivation

The milter API is relatively new, but several plugins have already been written that filter messages in various ways, some of them using regular expressions in some form. milter-regex does not provide any fundamentally different features. Its main goal is to support both basic and extended regular expressions in a usable way and to stay lean enough to be affordable on busy mail servers. It doesn't change or add headers, and relinquishes resources back to sendmail as early as possible (not reading message bodies when there are no expressions to match the body against). milter-regex runs on OpenBSD and is BSD licensed.

Man page

MILTER-REGEX(8)		    System Manager's Manual	       MILTER-REGEX(8)

NAME
     milter-regex - sendmail milter plugin for regular expression filtering

SYNOPSIS
     milter-regex [-d] [-c config] [-f facility] [-j dirname] [-l loglevel]
		  [-m number] [-p pipe] [-r pid-file] [-t] [-u user]
		  [-G group] [-P mode] [-U user]

DESCRIPTION
     The milter-regex plugin can be used with the milter API of sendmail(8) to
     filter mails using regular expressions matching SMTP envelope parameters
     and mail headers and body.

     The options are as follows:

     -d		Don't detach from controlling terminal and produce verbose
		debug output on stdout.

     -c config	Use the specified configuration file instead of the default,
		/etc/milter-regex.conf.

     -f facility
		Use the specified syslog facility instead of the default,
		daemon.

     -j dirname
		Change root to the specified directory.

     -l loglevel
		Only log messages up to and including the specified level.
		See syslog(3) for the numerical values, e.g. the default
		LOG_INFO=6.

     -m number	Ignore mail body after the specified number of lines.

     -p pipe	Use the specified pipe to interface with sendmail(8).  Default is
		unix:/var/spool/milter-regex/sock.

     -r pid-file
		Write the pid to the specified file. Default is not to write a
		file.

     -t		Test the configuration file and immediately exit with a status
		indicating whether the file is valid.

     -u user	Run as the specified user instead of the default,
		_milter-regex.  When milter-regex is started as root, it calls
		setuid(2) to drop privileges.  The non-privileged user should
		have read access to the configuration file and read-write
		access to the pipe.

     -G group	Set the group ID of the pipe.

     -P mode	Set the permissions of the pipe to the specified mode instead
		of the default, 0600.

     -U user	Set the user ID of the pipe.
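The -l option above takes a numeric syslog(3) priority and logs everything up to and including it. One plausible way to implement that is a mask built with the LOG_UPTO macro and applied with setlogmask(3). The sketch below assumes that approach; the helper names log_mask_for_level and setup_logging are hypothetical and not taken from milter-regex.

```c
#include <syslog.h>

/* Hypothetical helper: build a syslog mask that passes every priority
 * from LOG_EMERG (0) up to and including 'level'. With the default
 * LOG_INFO (6), LOG_DEBUG (7) messages are filtered out. */
int log_mask_for_level(int level)
{
	if (level < LOG_EMERG)
		level = LOG_EMERG;
	if (level > LOG_DEBUG)
		level = LOG_DEBUG;
	return LOG_UPTO(level);
}

/* Hypothetical setup: open the log with the chosen facility
 * (LOG_DAEMON by default, see -f) and apply the mask. */
void setup_logging(int facility, int level)
{
	openlog("milter-regex", LOG_PID | LOG_NDELAY, facility);
	setlogmask(log_mask_for_level(level));
}
```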

SENDMAIL CONFIGURATION
     The plugin needs to be registered in the sendmail(8) configuration, by
     adding the following lines to the .mc file

	   INPUT_MAIL_FILTER(`milter-regex',
		   `S=unix:/var/spool/milter-regex/sock, T=S:30s;R:2m')

     rebuilding /etc/mail/sendmail.cf from the .mc file using m4(1), and
     restarting sendmail(8).

PLUGIN CONFIGURATION
     The configuration file consists of rules that, when matched, cause
     sendmail(8) to reject mails.  Empty lines and lines starting with # are
     ignored, as well as leading whitespace (blanks, tabs).  Trailing
     backslashes can be used to wrap long rules into multiple lines.  Each
     rule starts with one of the following commands:

     reject <message>
	   Subsequent matching rules cause the mail to be rejected with a permanent
	   error consisting of the specified text part.	 The SMTP reply
	   consists of the three-digit code 554 (RFC 2821 "command rejected
	   for policy reasons"), the extended reply code 5.7.1 (RFC 1893
	   "Permanent Failure", "Security or Policy Status", "Delivery not
	   authorized, message refused") and the text part (which defaults to
	   "Command rejected", if not specified).  This is a permanent
	   failure, which causes the sender to remove the message from its
	   queue without trying to retransmit, commonly generating a bounce
	   message to the sender.

     tempfail <message>
	   Subsequent matching rules cause the mail to be rejected with a
	   temporary error consisting of the specified text part.  The SMTP
	   reply consists of the three-digit code 451 (RFC 2821 "Requested
	   action aborted: local error in processing"), the extended reply
	   code 4.7.1 (RFC 1893 "Persistent Transient Failure", "Security or
	   Policy Status", "Delivery not authorized, message refused") and the
	   text part (which defaults to "Please try again later", if not
	   specified).	This is a temporary failure, which causes the sender
	   to keep the message in its queue and try to retransmit it, commonly
	   for several days.

     discard
	   Subsequent matching rules cause the mail to be accepted but then
	   discarded silently.	Note that connect and helo rules should not
	   use discard.

     quarantine <message>
	   Subsequent matching rules cause the mail to be quarantined in
	   sendmail(8).

     accept
	   Subsequent matching rules cause the mail to be accepted without
	   further rule evaluation.  Can be used for whitelist criteria.
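The replies documented for reject and tempfail amount to a small lookup from action to SMTP code, extended code and default text. This is a sketch under assumptions: the enum and function names are hypothetical, and in an actual libmilter plugin the resulting triple would be passed to smfi_setreply() before returning SMFIS_REJECT or SMFIS_TEMPFAIL. That wiring is omitted so the example compiles without libmilter.

```c
#include <stddef.h>

/* Actions a rule can carry (names mirror the configuration keywords). */
enum action { ACTION_REJECT, ACTION_TEMPFAIL, ACTION_DISCARD,
	ACTION_QUARANTINE, ACTION_ACCEPT };

/* SMTP error reply produced by an action. */
struct reply {
	const char *rcode;	/* three-digit SMTP code, e.g. "554" */
	const char *xcode;	/* RFC 1893 extended code, e.g. "5.7.1" */
	const char *text;	/* configured or default message text */
};

/* Fill 'r' for action 'a'; 'msg' is the configured text or NULL.
 * Returns 1 if an SMTP error reply should be sent, 0 otherwise. */
int action_reply(enum action a, const char *msg, struct reply *r)
{
	switch (a) {
	case ACTION_REJECT:	/* permanent failure */
		r->rcode = "554";
		r->xcode = "5.7.1";
		r->text = (msg != NULL && *msg) ? msg : "Command rejected";
		return 1;
	case ACTION_TEMPFAIL:	/* temporary failure, sender retries */
		r->rcode = "451";
		r->xcode = "4.7.1";
		r->text = (msg != NULL && *msg) ? msg : "Please try again later";
		return 1;
	default:
		/* discard, quarantine and accept send no SMTP error */
		r->rcode = r->xcode = r->text = NULL;
		return 0;
	}
}
```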

     A command is followed by one or more expressions, each causing the
     previous command to be executed when matched.  The following expressions
     can be used:

     connect <hostname> <address>
	   Reject the connection if both the sender's hostname and address
	   match the specified regular expressions.  The numerical address is
	   either dotted-quad (IPv4) or coloned-hex (IPv6).  The hostname is
	   the result of a DNS reverse resolution of the numerical address
	   (which sendmail(8) performs independently of the milter plugin).
	   When resolution fails, the hostname contains the numerical address
	   in square brackets.

     helo <name>
	   Reject the connection if the sender-supplied HELO name matches the
	   specified regular expression.  Commonly, the sender supplies his
	   fully qualified hostname as HELO name.

     envfrom <address>
	   Reject the mail if the sender-supplied envelope MAIL FROM address
	   matches the specified regular expression.  Addresses commonly have
	   the form <user@host.doma.in>.

     envrcpt <address>
	   Reject the mail if the sender-supplied envelope RCPT TO address
	   matches the specified regular expression.

     header <name> <value>
	   Reject the mail if a header matches the specified name and value.
	   For instance, the header "Subject: Test" matches name Subject and
	   value Test.

     body <line>
	   Reject the mail if a body line matches the specified regular
	   expression.

     macro <name> <value>
	   Reject the mail if a sendmail macro matches the specified name and
	   value.
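To make the two-argument header term concrete, the sketch below checks a raw header line against a name regex and a value regex with regexec(3); both must match. Splitting at the first colon is an assumption for illustration only, since libmilter's header callback already delivers name and value as separate strings. header_term_matches is a hypothetical name.

```c
#include <regex.h>
#include <string.h>

/* Return 1 if the compiled regex 're' matches 's', 0 otherwise. */
static int re_match(const regex_t *re, const char *s)
{
	return regexec(re, s, 0, NULL, 0) == 0;
}

/* Sketch of "header <name> <value>": split "Name: value" at the first
 * colon, skip blanks after it, and require both regexes to match. */
int header_term_matches(const regex_t *name_re, const regex_t *value_re,
    const char *line)
{
	const char *colon = strchr(line, ':');
	char name[256];
	size_t len;

	if (colon == NULL)
		return 0;		/* not a header line */
	len = (size_t)(colon - line);
	if (len >= sizeof(name))
		return 0;		/* name too long for this sketch */
	memcpy(name, line, len);
	name[len] = '\0';
	colon++;
	while (*colon == ' ' || *colon == '\t')
		colon++;
	return re_match(name_re, name) && re_match(value_re, colon);
}
```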

     The plugin regularly checks the configuration file for modification and
     reloads it automatically.	Signals like SIGHUP will terminate the plugin,
     according to the milter signal handler.  The plugin reacts to any kind of
     error, like syntax errors in the configuration file, by failing open,
     accepting all messages.  When the plugin is not running, sendmail(8) will
     accept all messages.

REGULAR EXPRESSIONS
     The regular expressions used in the configuration rules are enclosed in
     arbitrary delimiters; no further escaping is needed.

     The first character of an argument is taken as the delimiter, and all
     subsequent characters up to the next occurrence of the same delimiter are
     taken literally as the regular expression.  Since the delimiter itself
     cannot be part of the regular expression (no escaping is supported), a
     delimiter must be chosen that doesn't occur in the regular expression
     itself.  Each argument can use a different delimiter; all characters
     except spaces and tabs are valid.

     Two immediately adjacent delimiters form an empty regular expression,
     which always matches and requires no regexec(3) call.  This can be used
     in rules requiring multiple arguments, to match only some arguments.

     See re_format(7) for a detailed description of basic and extended regular
     expressions.

     Optionally, the following flags can be used after the closing delimiter:
     e	  Extended regular expression.	This sets REG_EXTENDED for regcomp(3).
     i	  Ignore upper/lower case.  This sets REG_ICASE.
     n	  Not matching.	 Reverses the matching result, i.e. the mail is
	  rejected if the regular expression does not match.
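Combining the empty-regex shortcut with the three flags gives something like the sketch below. Assumptions: the function name arg_matches is hypothetical, the regex is compiled on every call for brevity (a real plugin would compile once when loading the configuration), and n is assumed to invert an empty regex as well, a case the text above does not cover.

```c
#include <regex.h>
#include <stddef.h>

/* 're' is the text between the delimiters, 'flags' the characters after
 * the closing delimiter: 'e' -> REG_EXTENDED, 'i' -> REG_ICASE,
 * 'n' -> invert the result. An empty regex matches without calling
 * regcomp/regexec. Returns 1 (match), 0 (no match) or -1 (error). */
int arg_matches(const char *re, const char *flags, const char *subject)
{
	int cflags = REG_NOSUB, negate = 0, matched;
	const char *f;
	regex_t compiled;

	for (f = flags; *f != '\0'; f++) {
		switch (*f) {
		case 'e': cflags |= REG_EXTENDED; break;
		case 'i': cflags |= REG_ICASE; break;
		case 'n': negate = 1; break;
		default: return -1;	/* unknown flag */
		}
	}
	if (re[0] == '\0') {
		matched = 1;		/* empty regex: always matches */
	} else {
		if (regcomp(&compiled, re, cflags) != 0)
			return -1;	/* invalid regular expression */
		matched = regexec(&compiled, subject, 0, NULL, 0) == 0;
		regfree(&compiled);
	}
	return negate ? !matched : matched;
}
```

With this, the EXAMPLES rule helo /\./n corresponds to arg_matches("\\.", "n", helo), which is 1 for a HELO name without a dot.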

BOOLEAN EXPRESSIONS
     A rule can consist of either a simple term or more complex expressions.
     A term has the form

     header /From/ /domain/i

     and expressions can be built combining terms with operators "and", "or",
     "not" and parentheses, as in

     header /From/ /domain/i and body /money/
     ( not header /From/ /domain/ ) and ( body /sex/ or body /fast/ )

     Operator precedence should not be relied on; instead, parentheses should
     be used to resolve any ambiguities (ambiguous expressions usually
     produce syntax errors from the parser).

MACROS
     Macros allow storing terms or expressions under a name; $name can then
     be used as a term within other rules, expressions or macro definitions.
     Example:

     friends	     = header /^Received$/ /^from [^ ]*(ork.net|home.com)/e
     attachments     = header ,^Content-Type$, ,multipart/mixed, and \
			 body ,^Content-Type: application/,
     executables     = $attachments and body ,name=".*.(pif|exe|scr)"$,e

     reject "executable attachment from non-friends"
     $executables and not $friends

     Macro names must begin with a letter and may contain alphanumeric
     characters and punctuation characters.  Reserved keywords (like "reject"
     or "header") cannot be used as macro names.  Macros must be defined
     before use: the definition must precede the use in the configuration
     file, which is read from top to bottom.
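The naming and define-before-use rules could be enforced with a simple table filled while the file is read top to bottom. This is only a sketch: the table layout and function names are hypothetical, and the reserved-word list is assumed to be the keywords that appear in the GRAMMAR section.

```c
#include <ctype.h>
#include <stddef.h>
#include <string.h>

#define MAX_MACROS 64

/* Hypothetical macro table: name -> expression text. */
struct macro { char name[64]; const char *expr; };
static struct macro macros[MAX_MACROS];
static int nmacros;

/* Assumed reserved words: the keywords used in the grammar. */
static const char *reserved[] = {
	"reject", "tempfail", "discard", "quarantine", "accept",
	"connect", "helo", "envfrom", "envrcpt", "header", "body",
	"macro", "and", "or", "not", NULL
};

/* Valid: starts with a letter, continues with alphanumeric or
 * punctuation characters, and is not a reserved keyword. */
int valid_macro_name(const char *name)
{
	const char *p;
	int i;

	if (!isalpha((unsigned char)name[0]))
		return 0;
	for (p = name; *p; p++)
		if (!isalnum((unsigned char)*p) && !ispunct((unsigned char)*p))
			return 0;
	for (i = 0; reserved[i] != NULL; i++)
		if (strcmp(name, reserved[i]) == 0)
			return 0;
	return 1;
}

/* Record a definition; returns 0 on success, -1 on error. */
int define_macro(const char *name, const char *expr)
{
	if (!valid_macro_name(name) || nmacros >= MAX_MACROS ||
	    strlen(name) >= sizeof(macros[0].name))
		return -1;
	strcpy(macros[nmacros].name, name);
	macros[nmacros].expr = expr;
	nmacros++;
	return 0;
}

/* Resolve "$name": only macros defined earlier in the file are
 * visible, so a forward reference yields NULL. */
const char *lookup_macro(const char *ref)
{
	int i;

	if (ref[0] != '$')
		return NULL;
	for (i = 0; i < nmacros; i++)
		if (strcmp(macros[i].name, ref + 1) == 0)
			return macros[i].expr;
	return NULL;
}
```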

EVALUATION
     Rules are evaluated in the order specified in the configuration file,
     from top to bottom.  When a rule matches, the corresponding action is
     taken, that is the last action specified before the matching rule.

     The plugin evaluates the rules every time a line of mail (or envelope) is
     received.	As soon as a rule matches, the action is taken immediately,
     possibly before the entire mail is received, even if further lines might
     possibly make other rules match, too.  This means the first rule matching
     chronologically has precedence.

     If evaluation for a line of mail makes two (or more) rules match, the
     rule that comes first in the configuration file has precedence.
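The two precedence rules can be sketched as follows, assuming each rule records the last action that preceded it in the file plus a current result that the caller refreshes after every received line. All types and names here are hypothetical.

```c
#include <stddef.h>

enum action { ACTION_NONE, ACTION_REJECT, ACTION_TEMPFAIL, ACTION_DISCARD,
	ACTION_QUARANTINE, ACTION_ACCEPT };

/* A rule's value: not yet decidable, false, or true. */
enum result { R_UNDEF = -1, R_FALSE = 0, R_TRUE = 1 };

struct rule {
	enum action action;	/* last action line preceding this rule */
	enum result result;	/* value after the latest received line */
};

/* Scan rules in configuration-file order and return the action of the
 * first true rule; earlier rules that are still undefined do not block
 * it. Returns ACTION_NONE while no rule is true yet. */
enum action first_matching_action(const struct rule *rules, size_t n)
{
	size_t i;

	for (i = 0; i < n; i++)
		if (rules[i].result == R_TRUE)
			return rules[i].action;
	return ACTION_NONE;
}
```

Because the caller would invoke first_matching_action after every line and stop as soon as it returns an action, the chronologically first match wins, while the file-order scan breaks ties between rules that become true on the same line.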

     Boolean expressions are short-circuit evaluated, which means "a or b"
     becomes true as soon as one of the terms is true and "a and b" becomes
     false as soon as one of the terms is false, even if the other term is not
     known, possibly because the relevant mail line has not been received yet.
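Short-circuiting over terms whose mail line may not have arrived yet corresponds to Kleene's three-valued logic (true, false, undefined). The sketch below illustrates the semantics described above with hypothetical names; it is not the plugin's internal representation, and treating "not undefined" as undefined is an assumption.

```c
/* A term stays undefined until the line it depends on is received. */
enum tv { TV_UNDEF = -1, TV_FALSE = 0, TV_TRUE = 1 };

/* "a and b": false as soon as either side is false, even if the other
 * is still undefined; true only when both sides are true. */
enum tv tv_and(enum tv a, enum tv b)
{
	if (a == TV_FALSE || b == TV_FALSE)
		return TV_FALSE;
	if (a == TV_TRUE && b == TV_TRUE)
		return TV_TRUE;
	return TV_UNDEF;
}

/* "a or b": true as soon as either side is true. */
enum tv tv_or(enum tv a, enum tv b)
{
	if (a == TV_TRUE || b == TV_TRUE)
		return TV_TRUE;
	if (a == TV_FALSE && b == TV_FALSE)
		return TV_FALSE;
	return TV_UNDEF;
}

/* "not a": stays undefined until a is known. */
enum tv tv_not(enum tv a)
{
	if (a == TV_UNDEF)
		return TV_UNDEF;
	return a == TV_TRUE ? TV_FALSE : TV_TRUE;
}
```

For the rule ( not header /From/ /domain/ ) and ( body /sex/ or body /fast/ ), once a From header matching /domain/ has been seen, the left side is false, so the whole rule is false before any body line arrives.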

EXAMPLES
     # /etc/milter-regex.conf example

     # Accept anything encrypted, just to demonstrate sendmail macros
     accept
     macro /tls_version/ /TLSv/

     tempfail "Sender IP address not resolving"
     connect /\[.*\]/ //

     reject "Malformed HELO (not a domain, no dot)"
     helo /\./n

     reject "Malformed RCPT TO (not an email address, not <.*@.*>)"
     envrcpt /<(.*@.*|Postmaster)>/ein

     reject "HTML mail not accepted"
     # use comma as delimiter here, as / occurs within RE
     header /^Content-type$/i ,^text/html,i
     body ,^Content-type: text/html,i

     # Swen worm
     discard
     header /^(TO|FROM|SUBJECT)$/e //
     header /^Content-type$/i /boundary="Boundary_(ID_/i
     header /^Content-type$/i /boundary="[a-z]*"/
     body ,^Content-type: audio/x-wav; name="[a-z]*\.[a-z]*",i

     # Some nasty spammer
     reject "Business Corp spam, get lost"
     body /^Business Corp. for W.& L. AG/i and \
	     ( body /043.*317.*0285/ or body /0041.43.317.02.85/ )


LOGGING
     milter-regex sends log messages to syslogd(8) using facility daemon
     (unless another facility is set with -f) and, with increasing
     verbosity, level err, notice, info and debug.  The following
     syslog.conf(5) section can be used to log messages to a dedicated file:

     !milter-regex
     daemon.err;daemon.notice	     /var/log/milter-regex

GRAMMAR
     Syntax for milter-regex in BNF:

     file	     = ( rule | macro ) file
     rule	     = action expr-list
     action	     = "reject" msg | "tempfail" msg | "discard" |
		       "quarantine" msg | "accept"
     msg	     = ( '"' | "'" ) string ( '"' | "'" )
     expr-list	     = expr [ expr-list ]
     expr	     = term | term "and" expr | term "or" expr | "not" term
     term	     = '(' expr ')' | "connect" arg arg | "helo" arg |
		       "envfrom" arg | "envrcpt" arg | "header" arg arg |
		       "body" arg | "macro" arg arg | '$' name
     arg	     = del regex del flags
     del	     = '/' | ',' | '-' | ...
     flags	     = [ 'e' ] [ 'i' ] [ 'n' ]
     macro	     = name '=' expr

FILES
     /etc/milter-regex.conf

SEE ALSO
     mailstats(1), regex(3), syslog(3), syslog.conf(5), re_format(7),
     sendmail(8), syslogd(8)

     Simple Mail Transfer Protocol, RFC 2821.

     Enhanced Mail System Status Codes, RFC 1893.

HISTORY
     The first version of milter-regex was written in 2003.  Boolean
     expression evaluation was added in 2004.

AUTHORS
     Daniel Hartmeier <daniel@benzedrine.ch>

OpenBSD 6.1		      September 24, 2003		   OpenBSD 6.1

More examples

If you have interesting rules that work for you, you're very welcome to contribute them.

HELO with your own IP address

From Christopher Kruslicky:
tempfail "Malformed HELO (can't be me)"
helo /^62\.65\.145\.30$/
Some spammers pick your own IP address as HELO, assuming it has a better chance of getting accepted by you than a random IP address (or some potentially non-resolving hostname).

Dynamic host addresses

From Darren Henderson:
# from your examples, tempfailing non-resolving rDNS connections
tempfail "Sender IP address not resolving"
connect /\[.*\]/ //

# reject things that look like they might come from a dynamic address
reject "Looks like a dynamic address"
connect /[0-9][0-9]*\-[0-9][0-9]*\-[0-9][0-9]*/ //
connect /[0-9][0-9]*\.[0-9][0-9]*\.[0-9][0-9]*/ //
connect /[0-9]{12}/e //
So, we reject anything that has three digit groups separated by dashes (e.g. adsl-134-11-333-11.someisp.net). We reject anything that has three or more consecutive numeric subdomains (e.g. dialup.123.45.67.8.someisp.net). And finally, we reject any address that contains a group of 12 digits (e.g. pool123045067003.someisp.net).
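A quick way to check which hostnames the three dynamic-address patterns catch is to run them through regcomp(3) and regexec(3). The sketch uses the example hostnames from the paragraph above; looks_dynamic is a hypothetical name. The backslash before - is dropped here, since - is an ordinary character outside a bracket expression and POSIX leaves "\-" undefined in basic regular expressions.

```c
#include <regex.h>
#include <stddef.h>

/* The three "dynamic address" patterns from the rules above. */
static const struct { const char *re; int cflags; } dynamic_patterns[] = {
	{ "[0-9][0-9]*-[0-9][0-9]*-[0-9][0-9]*", 0 },
	{ "[0-9][0-9]*\\.[0-9][0-9]*\\.[0-9][0-9]*", 0 },
	{ "[0-9]{12}", REG_EXTENDED },	/* the rule's 'e' flag */
};

/* Return 1 if 'hostname' matches any pattern, 0 if none matches,
 * -1 if a pattern fails to compile. */
int looks_dynamic(const char *hostname)
{
	size_t i, n = sizeof(dynamic_patterns) / sizeof(dynamic_patterns[0]);

	for (i = 0; i < n; i++) {
		regex_t re;
		int m;

		if (regcomp(&re, dynamic_patterns[i].re,
		    dynamic_patterns[i].cflags | REG_NOSUB) != 0)
			return -1;
		m = regexec(&re, hostname, 0, NULL, 0) == 0;
		regfree(&re);
		if (m)
			return 1;
	}
	return 0;
}
```

An unresolved host, which appears as a bracketed address such as [192.0.2.1], also matches the second pattern; in this configuration it is tempfailed first by the earlier rule.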

Forged Outlook headers

Analyzing the spam that still gets delivered (and then promptly detected by SpamAssassin), I found that most of it uses fake Outlook headers. So let's add a rule to detect that inline (blatantly stealing rules from SpamAssassin ;).

HAS_MIMEOLE             = header /^X-MimeOLE$/ //
HAS_MSMAIL_PRI          = header /^X-MSMail-Priority$/ //
HAS_X_MAILER            = header /^X-Mailer$/ //
HAS_OUTLOOK_IN_MAILER   = header /^X-Mailer$/ /Microsoft (CDO|Outlook) /e
MISSING_OUTLOOK_NAME    = ( $HAS_MIMEOLE or $HAS_MSMAIL_PRI ) and \
                            $HAS_X_MAILER and not $HAS_OUTLOOK_IN_MAILER
OUTLOOK_MUA             = header /^X-Mailer$/ / Outlook /
OUTLOOK_MSGID_1         = header /^Message-ID$/ \
                            /^<[0-9a-f]{12}\$[0-9a-f]{8}\$[0-9a-f]{8}@>$/e
OUTLOOK_MSGID_2         = header /^Message-ID$/ \
                            /^<[A-Za-z0-9-]{7}[A-Za-z0-9]{20}@hotmail\.com>$/e
IMS_MSGID               = header /^Message-ID$/ \
                            /^<[A-F]{36,40}@>$/e
UNUSABLE_MSGID          = header /^List-Unsubscribe$/ //
FORGED_MUA_OUTLOOK      = $OUTLOOK_MUA and not ( $UNUSABLE_MSGID or \
                            $OUTLOOK_MSGID_1 or $OUTLOOK_MSGID_2 )
MSGID_OE_SPAM_4ZERO     = header /^Message-ID$/ \
                            /<[a-f0-9]{12}\$[a-f0-9]{8}\$0000[a-f0-9]{4}@/e

reject "Forged Outlook headers"
$MISSING_OUTLOOK_NAME or $FORGED_MUA_OUTLOOK or $MSGID_OE_SPAM_4ZERO
Some performance benchmarks would be interesting here. I'm quite sure these rules evaluate much more cheaply inline in milter-regex than in SpamAssassin (Perl) after accepting delivery, or in a milter plugin using spamc. If you measure the maximum number of mails per second each of these can handle on a specific machine, please let me know.
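In the Message-ID patterns above, {n} only acts as a repetition count in an extended regular expression (the e flag, REG_EXTENDED); in a basic regular expression, { is an ordinary character. The sketch below, with the hypothetical helper msgid_matches and a made-up sample Message-ID, compiles the MSGID_OE_SPAM_4ZERO value pattern both ways to show the difference.

```c
#include <regex.h>
#include <stddef.h>

/* Value regex of MSGID_OE_SPAM_4ZERO. "\\$" in C source is the regex
 * "\$", i.e. a literal dollar sign. */
static const char *msgid_re =
    "<[a-f0-9]{12}\\$[a-f0-9]{8}\\$0000[a-f0-9]{4}@";

/* Return 1 if 'msgid' matches when compiled with 'cflags', 0 if not,
 * -1 if the regex does not compile. */
int msgid_matches(const char *msgid, int cflags)
{
	regex_t re;
	int m;

	if (regcomp(&re, msgid_re, cflags | REG_NOSUB) != 0)
		return -1;
	m = regexec(&re, msgid, 0, NULL, 0) == 0;
	regfree(&re);
	return m;
}
```

Compiled with REG_EXTENDED, the sample "<000a1b2c3d4e$1a2b3c4d$0000abcd@example.com>" matches; compiled as a basic regular expression (cflags 0), the pattern instead demands the literal text "{12}" after the first hex digit and does not match.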

Sources

Makefiles for GNU/Linux and Solaris are included, but might need some tweaking. If they don't work for you, please try to fix them and send me corrections. Some patches for building under Linux are also available (not supported by me).

History

3.0: April 23rd, 2022

Takao Abe added GeoIP filtering criteria; you can find his version on github.com/milter-regex.

2.7: December 12th, 2019

Add -t option to test the configuration file and exit with a status, suggested by Ralph Seichter.

2.6: April 26th, 2019

Treat socket file name without prefix like local file, from Takao Abe. Make pid file writable only by root, from Ralph Seichter.

2.5: April 18th, 2019

Add -r option to write pid file. Based on FreeBSD port patches.

2.4: March 2nd, 2019

Add -f option to set syslog facility. Patch from Takao Abe.

2.3: January 28th, 2019

Bug fix: for actions followed by multiple expressions (rather than a single, arbitrarily complex expression), when several expressions became defined at the same sequence point but with different values (e.g. one true, another false), the action might not be taken when it should have been, depending on the order of the expressions.

This affects all prior versions since 1.0. As a workaround, use only a single expression per action (duplicating action lines where needed), or combine multiple expressions into a single expression per action using 'or'.

Report and testing by JCA.

2.2: September 25, 2018

Add -U, -G, and -P options to set pipe user, group, and permissions. Suggested and tested by Ralph Seichter.

2.1: September 26, 2017

Default maximum log level to 6 (LOG_INFO), i.e. exclude LOG_DEBUG.

2.0: November 25, 2013

Add -l option to specify maximum log level.

1.9: November 21, 2011

Add -j option to chroot. Improve building on various platforms. Fix some typos in documentation and example config.

1.8: August 12, 2010

Log symbolic host name together with numeric IP address.

1.7: August 4, 2007

Support filtering sendmail macros, like {auth_type}.

1.6: June 6, 2005

Support sendmail quarantine action. Requires non-ancient sendmail (>= 8.13) and libmilter, as shipped by default with recent *BSD releases.
More fixes for the state machine, dealing with multi-message connections.

1.5: March 19, 2004

Fix logic errors in dealing with multi-message connections (SMTP RSET, HELO or MAIL FROM resetting SMTP state). Add cb_abort callback.

1.4: March 13, 2004

Some performance improvements, abort rule evaluation immediately when no further rules can possibly match. Compile without -Werror, as some ports generate warnings.

1.3: March 8, 2004

Two bugfixes related to RCPT TO: rule evaluation (DSN options and multiple recipients would match incorrectly), umask(0177) for pipe, fix for Solaris daemon() implementation. Improved logging (From:, To: and Subject: headers, when available).

1.2: February 27, 2004

Some logging improvements and small fixes. Adds Makefiles for GNU/Linux and Solaris. Thanks to everyone who helped me solve the build problems.

1.1: February 25, 2004

Support macro definition/expansion.

1.0: February 24, 2004

Now supports boolean expressions, so multiple regular expressions can be combined using and, or, not and parentheses.

Note that the new parser now requires quotes around reject/tempfail messages. If you get syntax errors with your existing configuration file, missing quotes are a likely cause. Otherwise, rulesets are backwards compatible with pre-1.0 versions.

0.1: September 24, 2003

First version.


Last updated on Sat Apr 23 18:46:10 2022 by daniel@benzedrine.ch.