This morning, I set out to improve the performance of the "mailman import21" command. If you have used it in the past, you will know that it is slow. Until now, I never really knew why. Here were my guesses:

  • Too many database calls and sqlite3 being its usual self

    Although I forgot that the import is slow irrespective of the
    database backend. Maybe we are doing way too many queries?

  • Too many string comparisons

    We all know string comparisons are slow, but how slow could they be?

  • Something wasteful being done over and over again.

Here is a rough estimate of the time it takes to import a Mailman 2.1 config.pck for two lists:

    151 members: 58 seconds
    1429 members: 9 minutes

This is quite slow; 9 minutes is a lot. So, I set out to do the usual Python profiling using the standard library's cProfile module, wrapping it only around mailman.utilities.importer._import_roster. If you have run the command, you know that importing the list of members takes the most time, so that function is the obvious place to look.
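For anyone who wants to try the same thing, wrapping cProfile around a single function can look roughly like the sketch below. The helper profile_call and the stand-in slow_sum are made up for illustration; in my case the wrapped function was _import_roster, and how exactly you hook it in is up to you.

```python
import cProfile
import io
import pstats


def profile_call(func, *args, **kwargs):
    """Run func under cProfile; return its result and a text report.

    Hypothetical helper, not part of Mailman.
    """
    profiler = cProfile.Profile()
    result = profiler.runcall(func, *args, **kwargs)

    # Render the ten entries with the highest cumulative time.
    stream = io.StringIO()
    stats = pstats.Stats(profiler, stream=stream)
    stats.sort_stats("cumulative").print_stats(10)
    return result, stream.getvalue()


def slow_sum(n):
    # Stand-in for the function being profiled.
    return sum(i * i for i in range(n))


result, report = profile_call(slow_sum, 10_000)
print(report)
```

Sorting by cumulative time is what makes an expensive callee stand out, since its cost is counted in every caller above it.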

Without even looking at the entire output, the problem was apparent, and it was none of the ones I had guessed:

   ncalls  tottime  percall  cumtime  percall filename:lineno(function)
        1    0.009    0.009   50.692   50.692 /home/maxking/Documents/mm3/core/src/mailman/utilities/importer.py:600(_import_roster)
      151    0.001    0.000   45.691    0.303 /home/maxking/Documents/mm3/core/src/mailman/utilities/passwords.py:35(encrypt)

About 90% of the time (45.7 of 50.7 seconds) is spent in encrypt, hashing the password of each imported member. Well, duh, password hashing is deliberately an expensive operation, and when you do it once per imported member, it is definitely going to be slow.
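To get a feel for how a per-member hashing cost adds up, here is a small standard-library sketch. It uses PBKDF2 from hashlib purely as a stand-in; it is not necessarily the scheme Mailman was configured with, and the iteration counts and salt are arbitrary values chosen for the demo.

```python
import hashlib
import time


def hash_password(password, salt, iterations):
    # PBKDF2-HMAC-SHA256 as a stand-in for a deliberately slow password hash.
    return hashlib.pbkdf2_hmac("sha256", password.encode(), salt, iterations)


def seconds_per_hash(iterations, samples=3):
    # Average wall-clock time of one hash at the given work factor.
    salt = b"fixed-salt-for-demo"  # a real system uses a random salt per password
    start = time.perf_counter()
    for i in range(samples):
        hash_password(f"password-{i}", salt, iterations)
    return (time.perf_counter() - start) / samples


for iterations in (1_000, 100_000):
    per_hash = seconds_per_hash(iterations)
    # Scale the per-hash cost to the size of the larger list above.
    print(f"{iterations:>7} iterations: {per_hash:.4f}s per hash, "
          f"~{per_hash * 1429:.1f}s for 1429 members")
```

The exact numbers depend on the machine, but the point is the multiplication: whatever one hash costs, the import pays it once per member.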

Mailman 3 uses passlib for password hashing, so I set out to figure out whether there is a hashing algorithm that can do this much faster and perhaps has a C library wrapper that we can use to speed things up. I settled on the argon2 hashing scheme with its supporting library, argon2_cffi. Then I changed the config and tried the imports again:

  151 members: 15.884 seconds
  1429 members: 2 minutes 29 seconds

That was a significant improvement over the previous numbers.

Another interesting fact, though, is that user passwords are kind of useless in Mailman 3. In Mailman 2, you had to set up a password per list (or one was auto-generated for you), and you needed it to log in to the web UI. However, in Mailman 3, the passwords in Core's database aren't used for logging in, since the web frontend stores the authentication credentials (social auth or passwords). In fact, users who sign up for the first time on Mailman 3 probably never have a password set in Mailman Core's database.

So, I commented out the code that actually imports the password (src/mailman/utilities/importer.py#L663-664) and the import speed improved even more, obviously:

  151 members: 4 seconds
  1429 members: 57 seconds
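Putting the three sets of numbers side by side as time per member makes the comparison easier to read. This is just arithmetic on the figures reported in this post (the "9 minutes" is a rough figure, taken here as 540 seconds); the labels are my own shorthand.

```python
# Wall-clock import times from this post, in seconds, keyed by member count.
timings = {
    "original config": {151: 58.0, 1429: 9 * 60.0},  # "9 minutes" is approximate
    "argon2": {151: 15.884, 1429: 2 * 60 + 29.0},
    "no password import": {151: 4.0, 1429: 57.0},
}


def per_member_ms(seconds, members):
    # Average cost of importing one member, in milliseconds.
    return 1000.0 * seconds / members


for label, runs in timings.items():
    for members, seconds in runs.items():
        cost = per_member_ms(seconds, members)
        print(f"{label:>20} | {members:>4} members | {cost:6.1f} ms/member")
```

In the first two configurations the per-member cost comes out nearly the same for both lists, which fits the picture of a fixed hashing cost paid once per member.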

I am hoping that I can commit the change with the code commented out, unless I am reminded of a use for the passwords in Core's database. In that case, it might take a bit more work to figure out another way to improve the speed.

Thanks for reading!