Parsing Email Address Lists

by Andy Prevost

Monday December 4 2023

Much of my programming involves dealing with email transportation from mailing lists or lists generated by clients from databases.

That presents a problem. The lists are usually unstructured, meaning that each email address could be in almost any format, including strings and arrays.

Here's an example:

$mailing_list = array(
  'john.doe@domainone.com',
  'johan.doer@domaintwo.com',
  array('Johann Sebastian Bach' => 'johann.bach@composer.com'),
  array('w.a.mozart@rockingpiano.com' => 'Wolfgang Amadeus Mozart'),
  'Ludwig van Beethoven <ludwig@vanbeethoven.com>',
  'I. P. Knightly',
);

Note that the first two email addresses are plain strings. The next two are arrays, and they are not alike: the first maps the name to the address, while the second is malformed, with the address as the key and the name as the value. The fifth is RFC formatted. The sixth is not an email address at all and would normally generate an error.
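To make those shapes concrete, here is a minimal sketch that labels each entry of the example list. The helper entry_shape() is hypothetical, written only for this illustration, and it leans on PHP's filter_var() with FILTER_VALIDATE_EMAIL to decide what counts as an address.

```php
<?php
// Hypothetical helper for illustration: names the shape of one list entry.
function entry_shape($entry): string {
  if (is_array($entry)) {
    $key = key($entry);       // first key of the nested array
    $val = current($entry);   // first value of the nested array
    if (is_string($key) && filter_var($key, FILTER_VALIDATE_EMAIL)) {
      return 'array: address => name (reversed)';
    }
    if (is_string($val) && filter_var($val, FILTER_VALIDATE_EMAIL)) {
      return 'array: name => address';
    }
    return 'not an email';
  }
  if (is_string($entry)) {
    if (strpos($entry, '<') !== false) {
      return 'RFC string';
    }
    if (filter_var($entry, FILTER_VALIDATE_EMAIL)) {
      return 'plain string';
    }
  }
  return 'not an email';
}

$mailing_list = array(
  'john.doe@domainone.com',
  'johan.doer@domaintwo.com',
  array('Johann Sebastian Bach' => 'johann.bach@composer.com'),
  array('w.a.mozart@rockingpiano.com' => 'Wolfgang Amadeus Mozart'),
  'Ludwig van Beethoven <ludwig@vanbeethoven.com>',
  'I. P. Knightly',
);

foreach ($mailing_list as $entry) {
  echo entry_shape($entry), "\n";
}
```

Six entries, five shapes: that is the variety a parser has to cope with.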

There is no in-built PHP function to parse these emails into a "normalized" structure. By "normalized", I mean an intermediate structure that can be further processed into a valid and structured email list.

I decided to write one. 

There is an in-built PHP function that offers a pattern to follow: imap_rfc822_write_address(). Although the name of the function contains rfc822, the result is RFC 2822 compliant. Its signature is: imap_rfc822_write_address(string $mailbox, string $hostname, string $personal): string|false. Using the very first array item ('john.doe@domainone.com') as an example, $mailbox would be 'john.doe', $hostname would be 'domainone.com', and $personal would be empty. Using array('Johann Sebastian Bach' => 'johann.bach@composer.com') as an example, $mailbox would be 'johann.bach', $hostname would be 'composer.com', and $personal would be 'Johann Sebastian Bach'.
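As a rough sketch of how the three parameters fit together, the snippet below splits an address into its mailbox and hostname parts and hands them to imap_rfc822_write_address(). The helper split_address() is hypothetical, made up for this example. The IMAP extension is optional and may not be installed, so the call is guarded.

```php
<?php
// Hypothetical helper for illustration: splits an address into the
// mailbox and hostname parts. Splits on the last '@', because the
// host part cannot contain one.
function split_address(string $email): array {
  $at = strrpos($email, '@');
  if ($at === false) {
    return array('', '');   // no '@' at all: nothing to split
  }
  return array(substr($email, 0, $at), substr($email, $at + 1));
}

list($mailbox, $hostname) = split_address('johann.bach@composer.com');
$personal = 'Johann Sebastian Bach';

// Only call the built-in function when the IMAP extension is loaded.
if (function_exists('imap_rfc822_write_address')) {
  echo imap_rfc822_write_address($mailbox, $hostname, $personal), "\n";
}
```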

The aim, then, is to parse the mailing list into an array containing those same parameters, plus one extra (the full email address) that would make life a bit easier while developing. That resulting array would look like this (using the second example, Johann Sebastian Bach):


[0] => Array
        (
            [email] => johann.bach@composer.com
            [mailbox] => johann.bach
            [hostname] => composer.com
            [personal] => Johann Sebastian Bach
        )

I did end up changing two of the keys; I don't recall why. 'hostname' is now 'host' and 'personal' is now 'name'. It really doesn't matter much what the keys are: they still map directly onto the parameters of the in-built PHP function, if it is ever needed.
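To show how the renamed keys line up, here is a small sketch that turns one normalized entry back into a display string. The helper entry_to_string() is hypothetical and deliberately simple: it assumes the name needs no quoting or escaping, which will not hold for every real-world name.

```php
<?php
// Hypothetical helper for illustration: builds "Name <mailbox@host>"
// from one normalized entry. No quoting of special characters is done.
function entry_to_string(array $entry): string {
  $address = $entry['mailbox'] . '@' . $entry['host'];
  if ($entry['name'] === '') {
    return $address;          // no personal name: bare address
  }
  return $entry['name'] . ' <' . $address . '>';
}

$entry = array(
  'email'   => 'johann.bach@composer.com',
  'mailbox' => 'johann.bach',
  'host'    => 'composer.com',
  'name'    => 'Johann Sebastian Bach',
);

echo entry_to_string($entry), "\n";
// prints: Johann Sebastian Bach <johann.bach@composer.com>

// The same three values are what the in-built function takes, in order:
// imap_rfc822_write_address($entry['mailbox'], $entry['host'], $entry['name'])
```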

Here's the code I came up with. I have done quite a bit of testing of the code and I now have it in use, and stable.

/**
 * by Andy Prevost
 * Accepts a string or array of email names/addresses, in any format, and
 * returns an array standardizing the structure of each address:
 *   - full address, mailbox name, host, personal name
 * @param  string|array $in
 * @return array (structured)
 */
function email_parse2Array($in) {
  // a string becomes a list, split on commas when there are any
  if (is_string($in)) {
    $in = array_map('trim', explode(',', $in));
  }
  $narr = [];
  if (!is_array($in)) {
    return $narr;
  }
  foreach ($in as $key => $val) {
    // unwrap a nested array such as ['Name' => 'address']
    if (is_array($val)) {
      $key = key($val);
      $val = current($val);
    }
    if (!is_string($val)) {
      continue;
    }
    $tmp = '';
    $nam = '';
    if (is_int($key)) {
      if (strpos($val, '<') !== false) {
        // RFC style: Name <mailbox@host>
        $bits = explode('<', $val, 2);
        $tmp  = trim(rtrim(trim($bits[1]), '>'));
        $nam  = trim($bits[0]);
      } else {
        $tmp = trim($val);
      }
    } elseif (filter_var($key, FILTER_VALIDATE_EMAIL)) {
      // reversed pair: address => name
      $tmp = $key;
      $nam = $val;
    } else {
      // expected pair: name => address
      $tmp = $val;
      $nam = $key;
    }
    // keep only entries that hold a valid address; anything else is skipped
    if (filter_var($tmp, FILTER_VALIDATE_EMAIL)) {
      $at     = strrpos($tmp, '@');
      $narr[] = [
        'email'   => $tmp,
        'mailbox' => substr($tmp, 0, $at),
        'host'    => substr($tmp, $at + 1),
        'name'    => $nam,
      ];
    }
  }
  return $narr;
}
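The string handling at the top of the function can be tried in isolation. This sketch runs the same explode-and-trim step on a made-up comma-separated string. It also shows a limitation worth keeping in mind: a comma inside a display name gets split too.

```php
<?php
// A comma-separated string of addresses, with uneven spacing.
$raw = 'john.doe@domainone.com, johan.doer@domaintwo.com ,Ludwig van Beethoven <ludwig@vanbeethoven.com>';

// Split on commas, then trim the whitespace around each piece.
$items = array_map('trim', explode(',', $raw));
print_r($items);

// Limitation: a comma inside the name is treated as a separator as well.
$broken = array_map('trim', explode(',', 'Bach, Johann <johann.bach@composer.com>'));
print_r($broken);
// $broken holds two pieces: 'Bach' and 'Johann <johann.bach@composer.com>'
```

If names in "Last, First" form are likely, passing an array instead of a string avoids the comma split entirely.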

Enjoy!
Andy
