The Less-Wrong IRC Regex

First off, regex alone can’t split arbitrary IRC messages into separate parts ready to be consumed and evaluated by other code. The restrictions placed upon it (in particular, no recursive matches) make that impossible, even though most other IRC message-splitting regexes try to do just that. You will need other code to post-process the matches this regex produces. It does work, though!


Here is the regex to match arbitrary IRC messages:

^(?:@([^\r\n ]*) +|())(?::([^\r\n ]+) +|())([^\r\n ]+)(?: +([^:\r\n ]+[^\r\n ]*(?: +[^:\r\n ]+[^\r\n ]*)*)|())?(?: +:([^\r\n]*)| +())?[\r\n]*$

Here is an online example of this regex matching IRC messages, and here is the raw regex and our sample IRC messages in a gist.


This regex splits messages into groups differently than most other IRC regexes, and the match it produces does need to be processed before being used. Below, I explain the point and use of each matched group so you can use this effectively.

You need to split messages on newlines before you pass them to this regex to be evaluated. Keep in mind that empty messages should be skipped as per the RFCs.
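A minimal sketch of that pre-processing step in Python. The function name `split_messages` is hypothetical, and the sketch assumes the text you hand it contains only complete lines; a real client reading from a socket would also need to buffer a partial final line until the rest arrives.

```python
from typing import List


def split_messages(data: str) -> List[str]:
    """Split text received from the server into individual IRC lines."""
    # Treat both \r and \n as line terminators, so \r\n, \n and \r all work.
    lines = data.replace("\r", "\n").split("\n")
    # Empty messages are skipped rather than passed on to the regex.
    return [line for line in lines if line]
```

Each returned string can then be passed to the regex on its own.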


Explained Regex

This is the regex split up into parts to be explained. Newlines and comments are added, and sections are indented.

# start of the line
^

# tags group
(?:
    # match the block of tags as specified in the message-tags spec
    @([^\r\n ]*)
    # strip trailing space characters here, because of how source and
    # verb matching works
     +

    # if no tags, produce an empty group
    |()
)

# source group
(?:
    # source starts with a colon and then a string
    :([^\r\n ]+)
    # strip trailing space characters here, because of how verb
    # matching works
     +

    # if no source, produce an empty group
    |()
)

# verb group
(
    # match the first string after the source group
    [^\r\n ]+
)

# regular params group
(?:
    # match block of spaces here instead of inside the group
    # because this makes the output group cleaner
     +
    # match the list of params
    (
        # ensure this isn't a trailing by not matching colon
        # for first character of parameter
        [^:\r\n ]+[^\r\n ]*
        
        # match subsequent parameters
        (?:
            # match spaces separating parameters
             +
            # match parameter value, same as above
            [^:\r\n ]+[^\r\n ]*
        )*
    )

    # if no regular params, produce an empty group
    |()
)? # optional match

# trailing params group
(?:
    # match block of spaces
     +
    # colon marks the start of a trailing parameter
    :
    # match trailing param value, anything other than new message marker
    ([^\r\n]*)

    # if server has spaces at end of param list without a
    # trailing colon, match it as a trailing parameter anyway.
    # this goes AGAINST THE RFC, but it has been seen in the wild
    # and makes sense to do
    | +()

    # explicitly do not produce an empty group here if no trailing
    # parameter exists
)? # optional match

# match trailing \r and \n characters
[\r\n]*

# end of the line
$

Matched Groups

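Here are the matched groups for each message. Every group except the verb is written in the regex as a pair of alternative captures (one with content, one empty), so an engine that numbers every capture reports nine of them. The five logical groups are: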

  1. IRCv3 Message Tags
  2. Source
  3. Verb
  4. Parameters
  5. Trailing Parameter (optional)
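As a rough illustration, here is one way to collapse the regex's captures into these five logical groups in Python. This is a sketch, not a reference implementation: the names `IRC_LINE` and `match_groups` are hypothetical, and it relies on Python's `re` numbering every capture (1 and 2 for tags, 3 and 4 for source, 5 for verb, 6 and 7 for parameters, 8 and 9 for the trailing parameter) and returning `None` for captures that did not take part in the match.

```python
import re

# The raw regex from above, split across three string literals for width only.
IRC_LINE = re.compile(
    r"^(?:@([^\r\n ]*) +|())(?::([^\r\n ]+) +|())([^\r\n ]+)"
    r"(?: +([^:\r\n ]+[^\r\n ]*(?: +[^:\r\n ]+[^\r\n ]*)*)|())?"
    r"(?: +:([^\r\n]*)| +())?[\r\n]*$"
)


def match_groups(line):
    """Return (tags, source, verb, params, trailing), or None if no match."""
    m = IRC_LINE.match(line)
    if m is None:
        return None
    tags = m.group(1) or ""      # empty string when the message has no tags
    source = m.group(3) or ""    # empty string when the message has no source
    verb = m.group(5)
    params = m.group(6) or ""    # unsplit blob of regular parameters
    # Capture 8 is a colon-prefixed trailing parameter, capture 9 the
    # "bare trailing spaces" case. If neither took part, trailing stays None.
    trailing = m.group(8) if m.group(8) is not None else m.group(9)
    return tags, source, verb, params, trailing
```

Here `trailing` being `None` stands for "the group does not exist", which is distinct from an empty trailing parameter (`""`).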

1. IRCv3 Message Tags

This group matches the IRCv3 Message Tags, if any, that are sent with the message. These tags must be split and evaluated by the parser after receiving them, as specified in the message-tags specification.
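A sketch of that post-processing in Python, assuming the escaping rules as I understand them from the message-tags specification (`\:` for a semicolon, `\s` for a space, `\\` for a backslash, `\r` and `\n` for CR and LF, with an unknown escape keeping just the escaped character and a lone trailing backslash dropped). The names `parse_tags` and `unescape_tag_value` are hypothetical; check the spec before relying on the edge cases.

```python
_TAG_ESCAPES = {":": ";", "s": " ", "\\": "\\", "r": "\r", "n": "\n"}


def unescape_tag_value(value):
    """Undo message-tags escaping in a single tag value."""
    out = []
    chars = iter(value)
    for ch in chars:
        if ch != "\\":
            out.append(ch)
            continue
        nxt = next(chars, None)  # consume the character after the backslash
        if nxt is None:
            break  # lone trailing backslash: drop it
        out.append(_TAG_ESCAPES.get(nxt, nxt))
    return "".join(out)


def parse_tags(raw):
    """Turn the raw tags group into a dict of tag name -> unescaped value."""
    tags = {}
    for item in raw.split(";"):
        if not item:
            continue  # an empty group or stray ';' yields no tag
        key, _, value = item.partition("=")
        # A tag with no '=' is stored with an empty value.
        tags[key] = unescape_tag_value(value)
    return tags
```

An empty tags group simply produces an empty dict, so callers do not need a special case for messages without tags.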

2. Source

This group matches the source of the message, e.g. example.com or dan!~d@localhost.

3. Verb

This group matches the verb of the message, e.g. PRIVMSG, 005, or MODE.

4. Parameters

This group matches all the regular parameters of the message. The parser MUST split these parameters itself, as regex alone cannot split these properly.

The blob of regular parameters is captured as-is from the server. To get the actual parameters, split it on runs of one or more space characters. If the group is an empty string, there are no regular parameters and it should be skipped.

For example, here is a list of messages and the contents of this group for them, as well as the actual split list of parameters:

Example 1

   msg: ":coolguy EX #example 32 abcd"
 group: "#example 32 abcd"
params: ["#example", "32", "abcd"]

Example 2

   msg: ":coolguy EX #exam   43 acdc"
 group: "#exam   43 acdc"
params: ["#exam", "43", "acdc"]

Example 3

   msg: ":coolguy PRIVMSG :Hiya!"
 group: ""
params (regular params only): []
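The splitting shown in these examples could be sketched in Python like this. The helper name `split_params` is hypothetical; the only behaviour assumed is the one described above, namely splitting on runs of space characters and treating an empty group as "no parameters".

```python
def split_params(blob):
    """Split the regular-params group into a list of parameters."""
    # An empty group means the message has no regular parameters.
    if not blob:
        return []
    # Split on the space character only; runs of several spaces leave
    # empty pieces behind, which are filtered out.
    return [param for param in blob.split(" ") if param]
```

Splitting on `" "` rather than calling `split()` with no argument is deliberate: the latter would also split on tabs and other whitespace, which the regex treats as ordinary parameter characters.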

5. Trailing Parameter (optional)

This group is special in that it may not always exist. If it does exist, it means that the server did have a trailing parameter for this message. If it does not exist, there was no trailing parameter.

If this group exists, it MUST be appended to the parameter list created by evaluating the previous group. It is treated EXACTLY THE SAME as a regular parameter. The reason for this is that these two messages are identical:

  • :example.com PING 12345
  • :example.com PING :12345

If you treat the trailing parameter differently than you treat regular parameters, things will break randomly as client and server software can output the last parameter of any message as either a regular or a trailing param.

Here are a few examples of trailing parameters, and the final created param lists:

Example 1

   msg: ":coolguy EX #example 32 abcd :Hey dude!"
 group: "Hey dude!"
params: ["#example", "32", "abcd", "Hey dude!"]

Example 2

   msg: ":coolguy EX #exam   43 acdc :  "
 group: "  "
params: ["#exam", "43", "acdc", "  "]

Example 3

   msg: ":coolguy EX absd :"
 group: ""
params: ["absd", ""]

Example 4

   msg: ":coolguy EX #exam   43 acdc"
 group: <does not exist>
params: ["#exam", "43", "acdc"]
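Putting the two groups together might look like the following Python sketch. The name `build_params` is hypothetical, and it assumes a missing trailing group is represented as `None` (which is how Python's `re` reports a capture that did not take part in the match), while an empty trailing parameter is the empty string.

```python
def build_params(regular, trailing):
    """Combine the regular-params blob and the trailing group into one list."""
    # Split the regular params on runs of spaces; an empty blob gives [].
    params = [param for param in regular.split(" ") if param]
    # A trailing parameter that exists is appended as-is, even when it is
    # empty or made up only of spaces. None means it does not exist.
    if trailing is not None:
        params.append(trailing)
    return params
```

With this, `build_params("12345", None)` and `build_params("", "12345")` produce the same list, which is exactly the equivalence between the two PING messages above.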

Final Notes

Remember to combine the regular and trailing parameters, and to skip the regular params group entirely if it is an empty string. Remember to post-process the tags section as per the message-tags spec.

If you have any questions about this, feel free to contact me!