Effect of connection pooling on node.js performance
Node.js is popularly known for its asynchronous, non-blocking and event-driven I/O model and the scalability it can achieve in executing I/O bound operations.
However, a developer must keep in mind that there are some things which can severely limit the scalability of the Node.js application. I have earlier already covered How DNS resolution is a blocking call in node and must be handled to achieve scalability
In this post, I am going to cover connection pooling. I have observed that connection pooling is a critical engineering decision which is easily ignored while developing node.js applications. While interacting with any external resource such as MySQL, PostgreSQL, Redis or MongoDB, each of it requires a connection pool for any sizeable node.js application.
To demonstrate this, I am going to compare two programs, one without pooling and one with pooling and a simple database query.
I have this simple function which makes a call to database.
If I execute this and time it, it takes an average of 1.5 - 2 seconds
Lets, now run multiple of such queries in series first. I slightly change my program to this version:
When I execute this version where I am making 5 calls in series, I get an average of 8 - 9 seconds
Lets make this to parallel now, common sense says that the wall clock time should be much faster in case of parallel.
Now, If I time this verison, where I am making 5 calls in parallel.
I still get the almost same wall time clock. If you closely observe the time output, the bulk of the time is not even spent in our program. Bulk of the time is actually spent in waiting to get the connection back from mysql, since we are re-using the single connection again and again. Thus, there is literally no performance gain inspite of making the calls in “parallel”
To fix this, I am going to change the program to use a connection pool. I am creating a simple pool as follows:
And change our function to use the connection pool:
Again, If I time the single execution time here. I get
Not much has changed, we got 1.756
while using without connection pooling. The benefit of connection pooling is only when we run multiple requests together. So now, lets execute in series of 5 queries as done earlier.
Again, no real benefit. What is happening here is that the second query is executed only after the first query is executed. Hence, this time is perfectly fine. Its just that bulk of time is wasted.
Let’s move this to 5 parallel queries.
This time we have brought it down to 2.311 seconds
from 8.168 seconds
which was the time it took to execute without having a connection pool.