"Speed" in Async Web Frameworks
Over the weekend of 2020-06-06/07, Cal Patterson's post "Async Python is not faster" popped up on my radar across multiple link aggregators. I agree that async Python is not "faster," but I do have some thoughts of my own on the subject.
To begin I have a few caveats:
a) I am not an expert developer
b) I am not an expert on async
c) I am involved with the Sanic project, a Python async web framework
With those out of the way, I'd like to talk a bit about the value of non-blocking IO in a very simple way.
Consider this (and yes, it too is slightly unrealistic):
You are hungry. You decide you want to eat five hamburgers to satiate your hunger. To get those hamburgers, you go to the nearest quick-serve restaurant (McSynchronous's, or Micky Sync's) and order them.
This restaurant has one cook, and that cook has the capacity to make only one burger at a time. It is not a great setup, but it does the job. It takes some time to cook each burger patty and assemble the burger, but to make it easy on you, the restaurant hands you each burger as soon as it is ready. Assuming each burger takes two minutes to make and assemble, after ten minutes you will have all of your burgers.
This looks like the following:
(1) order
(2) start burger
(3) prep burger (two minutes)
(4) deliver burger
(5) repeat steps 2 through 4 four times
Total time: ten minutes.
Down the street is another joint (this one is called Async & Await Burger) that also has one cook, but this cook has the capability to prep multiple burgers at the same time. For this cook, it also takes two minutes to prep each burger, but because the cook has the advantage of being able to prep multiple burgers together, they can be delivered faster. This works because the cook does not have to focus on one burger at a time; instead, the cook can start a burger, move on to the next, and return to each as it finishes.
This looks like the following:
(1) order
(2) start burgers 1-5
(3) prep burgers 1-5 (two minutes each, simultaneously)
(4) deliver burgers
Total time: two minutes.
Finally, let us add a restriction to our A&A Burger scenario: there is only capacity to prep one burger at a time. Now, even though the cook is capable of juggling multiple burgers, the equipment will not allow it, so the order takes the cook ten minutes, just like at Micky Sync's.
Imagine, however, that you want just one burger. A&A and Micky Sync's both take the same amount of time to prepare just one burger, so there is no speed advantage at all.
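To make the analogy concrete, here is a minimal asyncio sketch of all three scenarios. It is illustrative only: the names (prep_burger, micky_syncs, and so on) are mine, and the two-minute prep time is scaled down to two seconds:

import asyncio
import time

PREP_TIME = 2      # seconds per burger (the "two minutes", scaled down)
BURGER_COUNT = 5

async def prep_burger(n: int) -> str:
    # Simulate the time it takes to prep one burger
    await asyncio.sleep(PREP_TIME)
    return f"burger {n}"

async def micky_syncs() -> None:
    # One burger at a time: each prep must finish before the next starts
    for n in range(1, BURGER_COUNT + 1):
        await prep_burger(n)

async def async_await_burger() -> None:
    # All burgers prepped together: total time is one prep, not five
    await asyncio.gather(*(prep_burger(n) for n in range(1, BURGER_COUNT + 1)))

async def restricted_aa_burger() -> None:
    # A&A Burger with only one prep slot: an able cook, but one-at-a-time
    # capacity degrades back to sequential timing
    slot = asyncio.Semaphore(1)

    async def prep_in_slot(n: int) -> str:
        async with slot:
            return await prep_burger(n)

    await asyncio.gather(*(prep_in_slot(n) for n in range(1, BURGER_COUNT + 1)))

for scenario in (micky_syncs, async_await_burger, restricted_aa_burger):
    start = time.monotonic()
    asyncio.run(scenario())
    print(f"{scenario.__name__}: {time.monotonic() - start:.0f}s")

# micky_syncs: 10s, async_await_burger: 2s, restricted_aa_burger: 10s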
All of this maps directly onto synchronous and asynchronous web frameworks and their "speed." The advantage of asynchronous operations is the ability for one process to handle multiple simultaneous requests. If you make a synchronous or blocking call inside a handler (our capacity to prep only one burger at a time, remember), that advantage is removed.
I have selected two highly regarded Python frameworks to demonstrate this: falcon, served by gunicorn, and starlette, served by uvicorn. I have kept the synchronous and asynchronous versions as close to parity as possible:
falcon:
import falcon
import time

class DefaultResource:
    def on_get(self, req, resp):
        """Handles GET requests"""
        time.sleep(5)
        resp.media = "200 OK"

falconapi = falcon.API()
falconapi.add_route('/', DefaultResource())

To serve: gunicorn app:falconapi
starlette:
from starlette.applications import Starlette
from starlette.responses import PlainTextResponse
from starlette.routing import Route
import asyncio

async def default(request):
    await asyncio.sleep(5)
    return PlainTextResponse("200 OK")

starletteapi = Starlette(debug=True, routes=[
    Route('/', default),
])

To serve: uvicorn app:starletteapi
To illustrate this, I used a simple (and old) tool, ab, making five requests at the same time as follows:

ab -n 5 -c 5 http://localhost:8000/
Results for falcon:
This is ApacheBench, Version 2.3 <$Revision: 1874286 $>
Copyright 1996 Adam Twiss, Zeus Technology Ltd, http://www.zeustech.net/
Licensed to The Apache Software Foundation, http://www.apache.org/
Benchmarking localhost (be patient).....done
Server Software: gunicorn/20.0.4
Server Hostname: localhost
Server Port: 8000
Document Path: /
Document Length: 8 bytes
Concurrency Level: 5
Time taken for tests: 25.031 seconds
Complete requests: 5
Failed requests: 0
Total transferred: 795 bytes
HTML transferred: 40 bytes
Requests per second: 0.20 [#/sec] (mean)
Time per request: 25031.488 [ms] (mean)
Time per request: 5006.298 [ms] (mean, across all concurrent requests)
Transfer rate: 0.03 [Kbytes/sec] received
Connection Times (ms)
min mean[+/-sd] median max
Connect: 0 0 0.3 1 1
Processing: 5006 11014 6527.0 12515 20025
Waiting: 5005 11013 6527.2 12515 20024
Total: 5007 11014 6526.9 12516 20025
ERROR: The median and mean for the initial connection time are more than twice the standard
deviation apart. These results are NOT reliable.
Percentage of the requests served within a certain time (ms)
50% 10013
66% 15018
75% 15018
80% 20025
90% 20025
95% 20025
98% 20025
99% 20025
100% 20025 (longest request)
Results for starlette:
This is ApacheBench, Version 2.3 <$Revision: 1874286 $>
Copyright 1996 Adam Twiss, Zeus Technology Ltd, http://www.zeustech.net/
Licensed to The Apache Software Foundation, http://www.apache.org/
Benchmarking localhost (be patient).....done
Server Software: uvicorn
Server Hostname: localhost
Server Port: 8000
Document Path: /
Document Length: 6 bytes
Concurrency Level: 5
Time taken for tests: 10.007 seconds
Complete requests: 5
Failed requests: 0
Total transferred: 695 bytes
HTML transferred: 30 bytes
Requests per second: 0.50 [#/sec] (mean)
Time per request: 10007.417 [ms] (mean)
Time per request: 2001.483 [ms] (mean, across all concurrent requests)
Transfer rate: 0.07 [Kbytes/sec] received
Connection Times (ms)
min mean[+/-sd] median max
Connect: 0 1 0.5 1 1
Processing: 5003 5003 0.2 5003 5004
Waiting: 5002 5003 0.5 5003 5003
Total: 5003 5004 0.6 5004 5005
Percentage of the requests served within a certain time (ms)
50% 5004
66% 5005
75% 5005
80% 5005
90% 5005
95% 5005
98% 5005
99% 5005
100% 5005 (longest request)
Finally, starlette again, but with a blocking call (time.sleep(5)) in place of await asyncio.sleep(5).
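The handler is otherwise identical to the one above:

import time

async def default(request):
    time.sleep(5)  # blocks the event loop instead of yielding to it
    return PlainTextResponse("200 OK")

The results: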
This is ApacheBench, Version 2.3 <$Revision: 1874286 $>
Copyright 1996 Adam Twiss, Zeus Technology Ltd, http://www.zeustech.net/
Licensed to The Apache Software Foundation, http://www.apache.org/
Benchmarking localhost (be patient).....done
Server Software: uvicorn
Server Hostname: localhost
Server Port: 8000
Document Path: /
Document Length: 6 bytes
Concurrency Level: 5
Time taken for tests: 25.036 seconds
Complete requests: 5
Failed requests: 0
Total transferred: 695 bytes
HTML transferred: 30 bytes
Requests per second: 0.20 [#/sec] (mean)
Time per request: 25035.827 [ms] (mean)
Time per request: 5007.165 [ms] (mean, across all concurrent requests)
Transfer rate: 0.03 [Kbytes/sec] received
Connection Times (ms)
min mean[+/-sd] median max
Connect: 0 1 0.6 1 2
Processing: 5009 17023 6716.0 20026 20026
Waiting: 5007 17022 6716.5 20026 20026
Total: 5009 17024 6716.6 20028 20028
Percentage of the requests served within a certain time (ms)
50% 20027
66% 20028
75% 20028
80% 20028
90% 20028
95% 20028
98% 20028
99% 20028
100% 20028 (longest request)
As you can see, this is almost exactly the same as falcon.
With web APIs, things are rarely this simple. There are other factors to weigh, such as the maturity of the ecosystem (synchronous libraries have had a longer time to get things right), and one should never overlook ease of use in making sure things are done correctly. But the general idea is as follows:
Much of the "speed" of async IO comes from being able to do additional work while other work is in progress but does not need attention. In the simple tests above, that work is serving a basic GET request, but the principle applies everywhere.
Keep in mind also that a single synchronous (blocking) task implemented under an otherwise asynchronous model can eliminate any performance gains.
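If a blocking call truly cannot be avoided, one common mitigation (a sketch under that assumption, not something tested above) is to hand the call off to a thread pool so the event loop stays free. Reusing the starlette handler from earlier:

import asyncio
import time
from starlette.responses import PlainTextResponse

async def default(request):
    # Run the blocking call in a worker thread so the event loop
    # can keep serving other requests while it runs
    loop = asyncio.get_running_loop()
    await loop.run_in_executor(None, time.sleep, 5)
    return PlainTextResponse("200 OK")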