"Speed" in Async Web Frameworks


Over the weekend of 2020-06-06/07, Cal Patterson's post "Async Python is not faster" popped up on my radar across multiple link aggregators. I agree that async Python is not "faster," but I do have some thoughts of my own on the subject.

To begin, a few caveats:
a) I am not an expert developer
b) I am not an expert on async
c) I am involved with the Sanic project, a Python async web framework

With those out of the way, I'd like to talk a bit about the value of non-blocking I/O in a very simple way.

Consider this (and yes, it too is slightly unrealistic):
You are hungry. You decide you want to eat five hamburgers to satiate your hunger. To get those hamburgers, you go to the nearest quick-serve restaurant (McSynchronous's, or Micky Sync's) and order them.

This restaurant has one cook, and that cook has the capacity to make only one burger at a time. It is not a great setup, but it does the job. It takes some time to cook each burger patty and assemble the burger, but to make it easy on you, the restaurant will give you each burger as soon as it is ready. Assuming each burger takes two minutes to make and assemble, after ten minutes you will have all of your burgers.

This looks like the following:

(1) order
(2) start burger 
(3) prep burger (two minutes)
(4) deliver burger 
(5) repeat steps 2 through 4 four times

Total time: ten minutes.
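The Micky Sync's flow can be sketched with blocking calls. This is an illustrative script, not anything from the benchmarks below, with the two-minute prep scaled down to 0.2 seconds so it finishes quickly:

```python
import time

PREP_TIME = 0.2  # stand-in for the two-minute prep, scaled down

def prep_burger(n):
    """Prep one burger; blocks until it is done."""
    time.sleep(PREP_TIME)
    return f"burger {n}"

start = time.monotonic()
burgers = [prep_burger(n) for n in range(1, 6)]  # one at a time
elapsed = time.monotonic() - start

print(burgers)
print(f"elapsed: {elapsed:.1f}s")  # roughly 5 * PREP_TIME: fully sequential
```

Each call must finish before the next starts, so the total is the sum of all five prep times.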

Down the street is another joint (this one is called Async & Await Burger) that also has one cook, but this cook can prep multiple burgers at the same time. For this cook it also takes two minutes to prep each burger, but because the cook can work on multiple burgers together, they can be delivered faster. This works because the cook does not have to focus on each burger one at a time, but can instead start a burger, then start the next, returning to each as it finishes.

This looks like the following:

(1) order
(2) start burgers 1-5
(3) prep burgers 1-5 (two minutes each, simultaneously)
(4) deliver burgers

Total time: two minutes.

Finally, let us add a restriction to our A&A Burger scenario: there is only capacity to prep one burger at a time. In this scenario, even though the cook could handle multiple burgers, the capacity limit prevents it, so it will take the cook ten minutes, just like at Micky Sync's.
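That capacity limit can be modeled with an asyncio.Semaphore of size one (an illustrative sketch with the same scaled-down timings): the cook is still async, but only one burger fits on the grill, so the total time collapses back to sequential:

```python
import asyncio
import time

PREP_TIME = 0.2  # stand-in for the two-minute prep, scaled down

async def prep_burger(n, grill):
    """Prep one burger, but only while holding the single grill slot."""
    async with grill:  # only one burger on the grill at a time
        await asyncio.sleep(PREP_TIME)
    return f"burger {n}"

async def main():
    grill = asyncio.Semaphore(1)  # capacity: one burger
    start = time.monotonic()
    burgers = await asyncio.gather(*(prep_burger(n, grill) for n in range(1, 6)))
    return burgers, time.monotonic() - start

burgers, elapsed = asyncio.run(main())
print(f"elapsed: {elapsed:.1f}s")  # roughly 5 * PREP_TIME again
```

The async machinery is all still there, but with nothing to overlap, it buys nothing.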

Imagine, however, that you want just one burger. A&A and Micky Sync's both take the same amount of time to prepare just one burger, so there is no speed advantage at all.

All of this applies to synchronous and asynchronous web frameworks and their "speed" in general. The ability of one process to handle multiple simultaneous requests is the advantage of asynchronous operations. If you place a synchronous or blocking call in the path (our capacity to prep only one burger at a time, remember), the advantage is removed.

I have selected two highly regarded Python frameworks to demonstrate this: falcon, served by gunicorn, and starlette, served by uvicorn. The code is as close to parity between the synchronous and asynchronous versions as I could make it:

falcon:

import falcon
import time

class DefaultResource:
    def on_get(self, req, resp):
        """Handles GET requests"""
        time.sleep(5)
        resp.media = "200 OK"

falconapi = falcon.API()
falconapi.add_route('/', DefaultResource())

to serve:
gunicorn app:falconapi

starlette:

from starlette.applications import Starlette
from starlette.responses import PlainTextResponse
from starlette.routing import Route
import asyncio

async def default(request):
    await asyncio.sleep(5)
    return PlainTextResponse("200 OK")


starletteapi = Starlette(debug=True, routes=[
    Route('/', default),
])

to serve:
uvicorn app:starletteapi

To illustrate this, I used a simple (and old) tool, ab, making 5 requests at the same time as follows:
ab -n 5 -c 5 http://localhost:8000/

Results for falcon:

This is ApacheBench, Version 2.3 <$Revision: 1874286 $>
Copyright 1996 Adam Twiss, Zeus Technology Ltd, http://www.zeustech.net/
Licensed to The Apache Software Foundation, http://www.apache.org/

Benchmarking localhost (be patient).....done


Server Software:        gunicorn/20.0.4
Server Hostname:        localhost
Server Port:            8000

Document Path:          /
Document Length:        8 bytes

Concurrency Level:      5
Time taken for tests:   25.031 seconds
Complete requests:      5
Failed requests:        0
Total transferred:      795 bytes
HTML transferred:       40 bytes
Requests per second:    0.20 [#/sec] (mean)
Time per request:       25031.488 [ms] (mean)
Time per request:       5006.298 [ms] (mean, across all concurrent requests)
Transfer rate:          0.03 [Kbytes/sec] received

Connection Times (ms)
              min  mean[+/-sd] median   max
Connect:        0    0   0.3      1       1
Processing:  5006 11014 6527.0  12515   20025
Waiting:     5005 11013 6527.2  12515   20024
Total:       5007 11014 6526.9  12516   20025
ERROR: The median and mean for the initial connection time are more than twice the standard
       deviation apart. These results are NOT reliable.

Percentage of the requests served within a certain time (ms)
  50%  10013
  66%  15018
  75%  15018
  80%  20025
  90%  20025
  95%  20025
  98%  20025
  99%  20025
 100%  20025 (longest request)

Results for starlette:

This is ApacheBench, Version 2.3 <$Revision: 1874286 $>
Copyright 1996 Adam Twiss, Zeus Technology Ltd, http://www.zeustech.net/
Licensed to The Apache Software Foundation, http://www.apache.org/

Benchmarking localhost (be patient).....done


Server Software:        uvicorn
Server Hostname:        localhost
Server Port:            8000

Document Path:          /
Document Length:        6 bytes

Concurrency Level:      5
Time taken for tests:   10.007 seconds
Complete requests:      5
Failed requests:        0
Total transferred:      695 bytes
HTML transferred:       30 bytes
Requests per second:    0.50 [#/sec] (mean)
Time per request:       10007.417 [ms] (mean)
Time per request:       2001.483 [ms] (mean, across all concurrent requests)
Transfer rate:          0.07 [Kbytes/sec] received

Connection Times (ms)
              min  mean[+/-sd] median   max
Connect:        0    1   0.5      1       1
Processing:  5003 5003   0.2   5003    5004
Waiting:     5002 5003   0.5   5003    5003
Total:       5003 5004   0.6   5004    5005

Percentage of the requests served within a certain time (ms)
  50%   5004
  66%   5005
  75%   5005
  80%   5005
  90%   5005
  95%   5005
  98%   5005
  99%   5005
 100%   5005 (longest request)

Finally, starlette with a blocking call (time.sleep(5)) instead of await asyncio.sleep(5):

This is ApacheBench, Version 2.3 <$Revision: 1874286 $>
Copyright 1996 Adam Twiss, Zeus Technology Ltd, http://www.zeustech.net/
Licensed to The Apache Software Foundation, http://www.apache.org/

Benchmarking localhost (be patient).....done


Server Software:        uvicorn
Server Hostname:        localhost
Server Port:            8000

Document Path:          /
Document Length:        6 bytes

Concurrency Level:      5
Time taken for tests:   25.036 seconds
Complete requests:      5
Failed requests:        0
Total transferred:      695 bytes
HTML transferred:       30 bytes
Requests per second:    0.20 [#/sec] (mean)
Time per request:       25035.827 [ms] (mean)
Time per request:       5007.165 [ms] (mean, across all concurrent requests)
Transfer rate:          0.03 [Kbytes/sec] received

Connection Times (ms)
              min  mean[+/-sd] median   max
Connect:        0    1   0.6      1       2
Processing:  5009 17023 6716.0  20026   20026
Waiting:     5007 17022 6716.5  20026   20026
Total:       5009 17024 6716.6  20028   20028

Percentage of the requests served within a certain time (ms)
  50%  20027
  66%  20028
  75%  20028
  80%  20028
  90%  20028
  95%  20028
  98%  20028
  99%  20028
 100%  20028 (longest request)

As you can see, this result is almost exactly the same as falcon's.

With web APIs, things are rarely this simple. There are other advantages to be found in the maturity of the ecosystem (synchronous libraries have had longer to get things right), and one should never overlook ease of use in making sure things are done correctly. But the general idea is as follows:

Much of the "speed" of async I/O comes from being able to do additional work while other work is in progress but does not need attention. In the simple tests above, that work is serving a basic GET request, but the principle applies everywhere.

Keep in mind also that a single synchronous (blocking) task implemented under an otherwise asynchronous model can eliminate any performance gains.
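A scaled-down sketch of that failure mode, independent of any framework: the same five tasks run once with an awaited sleep and once with a blocking time.sleep, and the blocking version stalls the entire event loop:

```python
import asyncio
import time

SLEEP = 0.2  # scaled-down stand-in for the 5-second handler above

async def good():
    await asyncio.sleep(SLEEP)  # yields control; other tasks can run

async def bad():
    time.sleep(SLEEP)  # blocks the event loop; nothing else runs

async def run(handler):
    start = time.monotonic()
    await asyncio.gather(*(handler() for _ in range(5)))
    return time.monotonic() - start

async_time = asyncio.run(run(good))
blocking_time = asyncio.run(run(bad))
print(f"awaiting: {async_time:.1f}s, blocking: {blocking_time:.1f}s")
# awaiting finishes in about one SLEEP; blocking takes about five
```

One blocking call per task is all it takes to turn the concurrent version into the sequential one.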


Stephen Sadowski

Leader focusing on quality, delivery, technical debt management, and leadership education about DevOps and SRE practices