Mail agents in the PostgreSQL community

A few weeks back, I noticed the following tweet from Michael Paquier:

[Embedded tweet]

And my first thought was "that can't be right" (spoiler: turns out it wasn't. But almost.)

The second thought was "hmm, I wonder how that has actually changed over time". And of course, with today being a day off and generally "slow pace" (ahem), what better way to spend it than analyzing the data that we have? The PostgreSQL mailing list archives are, of course, all stored in a PostgreSQL database, so running the analytics is a quick job.

Of course, actually figuring out which MUA is in use by looking at the emails is not trivial. As a first step, one can look at the User-Agent header, except that some MUAs call it X-Mailer instead. Once you have that value, it has to be normalized (the number of different ways Thunderbird alone has indicated its version over the years is large, and then we have to consider things like Icedove that are actually the same program). And finally, there are user agents that simply don't write such a header at all, but can be identified in other ways, such as GMail today and Pine back in the day.
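As a rough illustration of those three steps (header lookup, normalization, fallback fingerprints), here is a minimal Python sketch using the standard library `email` package. It is not the actual PL/Python function used for this analysis: the function name `guess_mua`, the normalization table, and the Message-ID fingerprints used to recognize GMail and Pine are all assumptions for illustration, and a real rule set would be much longer.

```python
import re
from email import message_from_bytes
from email.policy import default

# Hypothetical normalization table: the first matching pattern wins, so the
# Thunderbird/Icedove rule must come before the generic Mozilla one
# (Thunderbird's User-Agent string also contains "Mozilla").
NORMALIZE = [
    (re.compile(r"thunderbird|icedove", re.I), "Thunderbird"),
    (re.compile(r"mutt", re.I), "Mutt"),
    (re.compile(r"evolution", re.I), "Evolution"),
    (re.compile(r"outlook|exchange|hotmail|office365", re.I), "Microsoft"),
    (re.compile(r"apple mail|iphone mail|ipad mail", re.I), "Apple Mail"),
    (re.compile(r"mozilla", re.I), "Mozilla"),
]


def guess_mua(raw: bytes) -> str:
    """Return an approximated, normalized MUA name for one raw email."""
    msg = message_from_bytes(raw, policy=default)

    # Step 1: the explicit header, under either of its common names.
    agent = str(msg.get("User-Agent") or msg.get("X-Mailer") or "").strip()
    if agent:
        # Step 2: collapse known variants into one canonical name.
        for pattern, name in NORMALIZE:
            if pattern.search(agent):
                return name
        # Unrecognized agent: keep the product name, drop version details.
        return re.split(r"[/\s(]", agent, maxsplit=1)[0] or agent

    # Step 3: no agent header at all, so fall back to other fingerprints.
    # These Message-ID patterns are assumed heuristics, not exhaustive rules.
    message_id = str(msg.get("Message-ID") or "").strip()
    if message_id.endswith("@mail.gmail.com>"):
        return "GMail"
    if message_id.startswith("<Pine."):
        return "Pine"
    return "Unknown MUA"
```

Ordering the table so that more specific patterns come first is the simplest way to handle agent strings that mention several products, at the cost of having to keep that order in mind whenever a rule is added.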

Given that we already parse our archives with a Python-based system, I knew that the standard library modules there did a decent job on our archives, so I ended up writing a PL/Python function that returns the (approximated and normalized) user agent, and then ran it across approximately 1.1 million emails in the archives.

It's not a direct comparison with whatever Michael looked at, since this covers all the mailing lists, but the distribution is not a million miles off. And I was mainly interested in how things have changed over time. The end result for the top 10 MUAs (by percent) since 1997 looks like this:

[Graph: MUA usage]

Unfortunately the graph quickly becomes hard to read (there are only so many colors), but here are a few highlights:

  • GMail (blue) is by far the most popular MUA. In 2016 it represented 42% of all emails sent, vs 16.5% for the runner-up, Thunderbird (dark orange). GMail has been on an almost continuously rising curve since 2004.
  • Back in 1997, plain Mozilla was 31% of our email (brighter orange). It rapidly declined and was more or less replaced by Thunderbird around 2004, but Thunderbird never reached the same level of popularity.
  • Our git commit messages (counted as a single MUA, so this is combined for all PostgreSQL projects, not just the core one) were over 9% of the total number of messages in 2016 (the red line that has held steady in recent years).
  • I failed to figure out from headers which MUA Tom Lane uses, but his emails alone account for 7.5% of the messages (bright green), which is enough to skew the stats (so I gave him his own color). From 2008 to 2012, he personally was the third largest MUA.
  • Mutt (combined, dark red line) peaked around 2010. It has fallen since, but has consistently held a position in the top 5.
  • Combined Microsoft platforms (Outlook, Exchange, Hotmail, Office365 etc.; purple line) were popular from about 1999 to 2006, after which they fell below 5%. This pretty much coincides with the rise of GMail.
  • Evolution made a sudden appearance in 2002 and has all but disappeared since 2013.
  • Apple Mail (including iPhone and iPad mail) peaked in 2011 at just over 3%, and has been below 2% in most years. We know there are a lot of Apple users in the community, but clearly they use other MUAs.
  • Yahoo has clearly never been very big in the PostgreSQL community: it only made it into the top 10 in 2013 and 2014, after which it disappeared again, most likely because their mailing-list-unfriendly setup made the two active community contributors using it migrate away.
  • In 1997, Pine was 40% of our email, and the most popular MUA by far. By 2005, it was almost completely gone.
  • Somebody who sent a lot of email must've used ELM in 1998-1999. It went from close to zero to being the third biggest MUA for two years, and then dropped completely off the radar again. Either that, or the headers it used changed enough that they could no longer be parsed.
  • Parsing has become slightly easier over the years, but I also spent more time on the recent years. As a result, in 2000 a full 16% of the emails were from "Unknown MUA", down to about 2.5% in 2011, and that category has been completely gone from the top 10 since then.
  • Our own sites generate about 1% of the mails (bug reports, commitfest status updates, apt repository updates etc). This rate is of course higher on some of the private mailing lists that are used specifically for that, but they are not in the archives and thus not part of this analysis.
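The post doesn't show how the per-year percentages behind the graph were computed. As a sketch of the arithmetic only, assuming one (year, MUA) pair per message, the hypothetical `share_by_year` below keeps the top N MUAs for each year, with each share computed against all messages sent that year. Ranking per year rather than over the whole period is what lets an MUA like Yahoo enter and leave the top 10 from one year to the next.

```python
from collections import Counter, defaultdict


def share_by_year(rows, top_n=10):
    """rows: iterable of (year, mua) pairs, one per message.

    Returns {year: {mua: percent}} for the top_n MUAs of each year, where
    percent is relative to *all* messages that year, not just the top_n.
    """
    counts = defaultdict(Counter)
    for year, mua in rows:
        counts[year][mua] += 1

    result = {}
    for year, counter in sorted(counts.items()):
        total = sum(counter.values())
        # most_common() ranks MUAs by message count within this year only.
        result[year] = {
            mua: round(100.0 * n / total, 1)
            for mua, n in counter.most_common(top_n)
        }
    return result
```

For example, ten messages in one year split 6/3/1 between three MUAs, with `top_n=2`, yield 60.0% and 30.0% for the top two; the remaining 10% simply falls outside the reported set, as smaller MUAs do in the graph.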

So what can we conclude from that? Not too much, I guess, beyond that I made another graph.

Happy New Year to everybody in the community!


Comments

Fun graph. Thanks.

Posted on Jan 9, 2017 at 07:49 by Noah Misch.
