The EventLoop is the secret sauce in any NodeJS-based app.
It provides the ‘magical’ async behaviour and takes away the extra pain involved in explicit thread-based parallelisation. On the flip side, you have to account for the resulting single-threaded JavaScript engine that processes the callbacks from the EventLoop. If you don’t, the traditional style of writing ‘blocking’ code can and will trip you up!
LIBUV has an EventLoop which loops through the queue of events and executes the associated JS callback function (on a single thread, one event at a time).
You can have multiple event sources (Event Emitters in NodeJS land) running in LIBUV on multiple threads (e.g. doing file I/O and socket I/O at the same time) that put events in the queue. But there is always ONE thread for executing JS, so it can only ‘handle’ one of those events at a time (i.e. execute the associated JS callback function).
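To make this concrete, here is a minimal sketch (plain NodeJS, no external modules) showing that a callback queued by the EventLoop can only run once the single JS thread is free:
[codesyntax lang="javascript"]
var start = Date.now()

// A timer event is queued for ~10 ms from now...
setTimeout(function () {
    console.log('timer fired after ' + (Date.now() - start) + ' ms')
}, 10)

// ...but the single JS thread is kept busy here for ~2 seconds,
// so the EventLoop cannot hand the timer callback to it any earlier.
while (Date.now() - start < 2000) {
    // spinning on purpose
}
console.log('synchronous work finished')

// Prints 'synchronous work finished' first, then the timer reports
// roughly 2000 ms instead of 10 ms.
[/codesyntax]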
Keeping this in mind, let us look at a few such ‘natural’ errors where the code looks fine to the untrained eye but does not produce the expected output.
1) Wave bye bye to While Loops with Flags!
A common scenario is a while loop controlled by a flag variable. If you wanted to read from the console until the user types ‘exit’, you would write something like this using blocking functions:
[codesyntax lang="javascript"]
while (command != 'exit')
    // Do something with the command
    command = reader.nextLine()
end while
[/codesyntax]
It will work because the loop will always be blocked till the nextLine() method executes and gives us a valid value for the command or throws an exception.
If you try to do the same in NodeJS using async functions you might be tempted to rewrite it as below. First we register a callback function which will trigger when the enter key is hit on the console. It will accept as a parameter the full line typed on the console. We promptly put this into the global command variable and finish. After setting up the callback, we start an infinite loop waiting for ‘exit’. In case the command is undefined (null) we just loop again (‘burning rubber’, so to speak).
[codesyntax lang="javascript"]
var command = null

// Register a callback function
reader.on('data', function (data) {
    command = data
})

while (command != 'exit')
    if (command != null)
        // Do something with the command
        command = null
    end if
end while
[/codesyntax]
Unfortunately this code will never work. Any guesses what the output will be? If you guessed that it will go into an infinite loop with command always equal to null, you are correct!
The reason is very simple: JS code in NodeJS is processed by a single thread. In this case that single thread is kept busy going round the while loop, so it never gets a chance to handle the console input event by executing the callback, and command stays null forever.
This can be fixed by removing the while loop.
[codesyntax lang="javascript"]
var command = null

// Register a callback function
reader.on('data', function (data) {
    command = data

    if (command == 'exit') {
        process.exit()
    }

    /* Here we can either parse the command and perform the required action
       OR we can emit a custom event which all the available command
       processors listen for but only the target command processor
       responds to */
})
[/codesyntax]
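For reference, here is a hedged, runnable sketch of the same pattern using Node’s built-in readline module (the reader object above is assumed to be some line-oriented input source; readline is one concrete way to get one):
[codesyntax lang="javascript"]
var readline = require('readline')

// Create a line-oriented reader over standard input
var rl = readline.createInterface({ input: process.stdin })

rl.on('line', function (line) {
    var command = line.trim()

    if (command === 'exit') {
        rl.close() // releases stdin so the process can end naturally
        return
    }

    // Do something with the command
    console.log('got command: ' + command)
})
[/codesyntax]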
2) Forget the For Loop (at least the long-running ones)
This next case is a complex one because it is very hard to figure out whether it’s the for loop that’s to blame. The symptoms may not show up all the time and they may not even show up in the output of your app. The symptoms can also change depending on things like the hardware configuration and the configuration of any database servers your code is interacting with.
Let us take a simple example of inserting a fixed-length array of data items into a database. If the insert function is blocking, the following code will work as expected.
[codesyntax lang="javascript"]
for (var i = 0; i < data.length; i++)
    database.insert(data[i])
end for
[/codesyntax]
If the insert function is non-blocking (as it is in NodeJS) we can experience all kinds of weird behaviour depending on the length of the array: incomplete insertions, sporadic exceptions and even runs where everything works as expected!
In the while loop example the JS thread is blocked forever, so no callbacks are processed. In the for loop case the JS thread is blocked only until the loop finishes running. This means that with a non-blocking insert the loop will execute rapidly without waiting for any insert to complete; instead of blocking, each insert operation will generate an event on completion.
This is part of the reason why NodeJS applications can get a lot of work done without resorting to explicit thread management.
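A quick way to see this ordering is with a stand-in for the non-blocking insert (fakeInsert below is a made-up helper built on setImmediate; any async database driver behaves the same way):
[codesyntax lang="javascript"]
// Hypothetical stand-in for a non-blocking database insert
function fakeInsert(item, callback) {
    setImmediate(function () { // completion is signalled later, via the EventLoop
        callback(null, item)
    })
}

var data = ['a', 'b', 'c']

for (var i = 0; i < data.length; i++) {
    fakeInsert(data[i], function (err, item) {
        console.log('inserted ' + item)
    })
}

console.log('loop finished')
// Prints 'loop finished' first; the 'inserted ...' lines only appear
// afterwards, once the JS thread is free to run the callbacks.
[/codesyntax]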
If the array is big enough we can end up flooding the receiver, leading to buffer overflows along the way and dropped inserts. If the array is not that big, the system may behave normally.
The question of how big an array we can safely deal with is also difficult to answer. It changes from case to case, as it depends on the hardware, the configuration of the target database (e.g. buffer sizes) and so on.
The solution involves getting rid of the long-running for loop and using events and callbacks instead. This throttles the insert rate by making the inserts sequential (i.e. making sure the next insert is triggered only when the previous one has completed).
[codesyntax lang="javascript"]
var EventEmitter = require('events').EventEmitter
var event_listener = new EventEmitter()

var count = 0

// Callback function to add the next data item
function insertOnce() {
    if (count > MAX_COUNT) {
        /* Exit by closing any external connections (e.g. database) and
           clearing any timers. Ending the process by force is another
           option but it is not recommended */
        return
    }

    database.insert(data[count], function () {
        // Called once the current data item has been inserted
        event_listener.emit('inserted')
    })

    count++
}

// Call insertOnce on the 'inserted' event
event_listener.on('inserted', insertOnce)

// Start the insertion by doing the first insert manually
insertOnce()
[/codesyntax]
3) Are we done yet?
Blocking is not always a bad thing. It can be used to track progress, because when a function returns you know it has completed its work one way or another.
One way to achieve this in NodeJS is to use a global counter variable that counts down to zero or up to a fixed value. Another way is to set and clear timers in case you are not able to get a count value. This technique works well when you have to monitor the progress of a single stage of an operation (e.g. inserting data into a database as in our example above).
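A rough sketch of the counter idea, building on the ‘inserted’ event from the previous example (names like event_listener and data carry over from that sketch and are assumptions, not a fixed API):
[codesyntax lang="javascript"]
var completed = 0 // global counter for this stage
var TOTAL = data.length

event_listener.on('inserted', function () {
    completed++
    if (completed === TOTAL) {
        // This stage is done; it is now safe to kick off the next one
        console.log('all ' + TOTAL + ' items inserted')
    }
})
[/codesyntax]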
But what if we had multiple stages that we wanted to make sure execute one after the other? For example:
1) Load raw data into database
2) Calculate max/min values
3) Use max/min values to normalise raw data and insert into a new set of tables
There are some disadvantages with the counter/timer approach:
1) Counters and timers add unwanted bulk to your code
2) Global variables are easy to override accidentally especially when using simple names like ‘count’
3) Your code begins to look like a house with permanent scaffolding around it
Furthermore, once you detect that one stage has finished, how do you proceed to the next?
Do you get into callback hell and just start with the next stage there and then, ending up with a single code file with all three stages nested within callbacks (Answer: No!)?
Do you try and break your stages into separate code files and use spawn/exec/fork to execute them (Answer: Yes)?
It is a rather dull answer but it makes sure you don’t have too much scaffolding in any one file.
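A bare-bones sketch of that idea, assuming each stage lives in its own script (stage1_load.js, stage2_minmax.js and stage3_normalise.js are made-up file names) and is run via child_process.fork:
[codesyntax lang="javascript"]
var child_process = require('child_process')

// Hypothetical stage scripts; each one exits when its work is done
var stages = ['stage1_load.js', 'stage2_minmax.js', 'stage3_normalise.js']

function runStage(index) {
    if (index >= stages.length) {
        console.log('all stages complete')
        return
    }

    var child = child_process.fork(stages[index])

    // The 'exit' event tells us this stage has finished,
    // so only then do we start the next one.
    child.on('exit', function (code) {
        if (code !== 0) {
            console.error(stages[index] + ' failed with code ' + code)
            return
        }
        runStage(index + 1)
    })
}

runStage(0)
[/codesyntax]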