Understanding the NodeJS EventLoop

The EventLoop is the secret sauce in any NodeJS based app.

It provides the ‘magical’ async behaviour and takes away the extra pain involved in explicit thread-based parallelisation. On the flip side, you have to account for the resulting single-threaded JavaScript engine that processes the callbacks from the EventLoop. If you don’t, the traditional style of writing ‘blocking’ code can and will trip you up!

LIBUV has an EventLoop which loops through the queue of events and executes the associated JS callback function (on a single thread, one at a time).

You can have multiple event sources (Event Emitters in NodeJS land) running in LIBUV on multiple threads (e.g. doing file I/O and socket I/O at the same time) that put events in the queue. But there is always ONE thread for executing JS, so it can only ‘handle’ one of those events at a time (i.e. execute the associated JS callback function).
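To see this single JS thread in action, here is a minimal sketch (plain NodeJS, no external modules): the timer callback cannot run until the blocking loop below has finished, even though the timer is due immediately.

[codesyntax lang="javascript"]

//The timer is due 'now', but its callback is an event that the
//single JS thread can only handle once it is free again
setTimeout(function () {
    console.log('timer fired'); //printed only AFTER 'loop done'
}, 0);

var start = Date.now();
while (Date.now() - start < 2000) {
    //Busy-wait for ~2 seconds, keeping the JS thread occupied
}
console.log('loop done');

[/codesyntax]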

Keeping this in mind, let us look at a few ‘natural’ errors where the code looks fine to the untrained eye but does not produce the expected output.

1) Wave bye bye to While Loops with Flags!

A common scenario is a while loop controlled by a flag variable. For example, if you wanted to read from the console until the user types ‘exit’, you would write something like this using blocking functions:

[codesyntax lang="php"]

command = reader.nextLine()
while (command != 'exit')
    //Do something with the command
    command = reader.nextLine()
end while

[/codesyntax]

It will work because the loop is always blocked until the nextLine() method returns a valid value for the command or throws an exception.

If you try to do the same in NodeJS using the async functions, you might be tempted to re-write it as below. First we register a callback function which will trigger when the enter key is hit on the console. It receives as a parameter the full line typed on the console. We promptly put this into the global command variable and finish. After setting up the callback, we start an infinite loop waiting for ‘exit’. If the command is null we just loop again (‘burning rubber’, so to speak).

[codesyntax lang="javascript"]

var command = null

//Register a callback function
reader.on('data', function (data) { command = data })

while (command != 'exit') {
    if (command != null) {
        //Do something with the command
        command = null
    }
}

[/codesyntax]

Unfortunately this code will never work. Any guesses what the output will be? If you guessed that it goes into an infinite loop with command always equal to null, you are correct!

The reason is very simple: JS code in NodeJS is processed by a single thread. In this case that single thread is kept busy going through the while loop, so it never gets a chance to handle the console input event by executing the callback. Thus command always stays null.

This can be fixed by removing the while loop and doing all the work inside the callback:

[codesyntax lang="javascript"]

var command = null

//Register a callback function
reader.on('data', function (data) {

    command = data

    if (command == 'exit') {
        process.exit()
    }

    /*
     Here we can either parse the command
     and perform the required action

     OR

     we can emit a custom event which all
     the available command processors listen for
     but only the target command processor responds
    */
})

[/codesyntax]
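As a quick illustration of the second option mentioned in the comment above, here is a minimal sketch of a command ‘bus’ built on NodeJS’s built-in EventEmitter (the commandBus name and the individual commands are made up for the example):

[codesyntax lang="javascript"]

var EventEmitter = require('events').EventEmitter;
var commandBus = new EventEmitter();

//Every command processor listens on the bus...
commandBus.on('command', function (command) {
    //...but only reacts to the command it is responsible for
    if (command === 'status') {
        console.log('System is up');
    }
});

commandBus.on('command', function (command) {
    if (command === 'time') {
        console.log(new Date().toISOString());
    }
});

//Inside the reader callback we would simply emit the typed command:
//commandBus.emit('command', command);

[/codesyntax]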

2) Forget the For Loop (at least long-running ones)

This next case is a complex one because it is very hard to figure out whether it’s the for loop that’s to blame. The symptoms may not show up all the time and they may not even show up in the output of your app. The symptoms can also change depending on things like the hardware configuration and the configuration of any database servers your code is interacting with.

Let us take a simple example of inserting a fixed-length array of data items into a database. If the insert function is blocking, the following code will work as expected.

[codesyntax lang="javascript"]

for (var i = 0; i < data.length; i++) {
    database.insert(data[i])
}

[/codesyntax]

If the insert function is non-blocking (as in NodeJS), we can experience all kinds of weird behaviour depending on the length of the array: incomplete insertions, sporadic exceptions and even runs where everything works as expected!

In the while loop example the JS thread is blocked forever, so no callbacks are processed. In the for loop case the JS thread is blocked only until the loop finishes. This means that with a non-blocking insert the loop will execute rapidly without waiting for each insert to complete. Instead of blocking, each insert operation generates an event on completion.

This is part of the reason why NodeJS applications can get a lot of work done without resorting to explicit thread management.
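To make this concrete, here is a minimal sketch (the error-first callback on database.insert is an assumption for illustration, not a specific driver API) showing that the loop only starts the inserts; all the completion callbacks run later, once the JS thread is free again.

[codesyntax lang="javascript"]

for (var i = 0; i < data.length; i++) {
    //Each call only queues the insert and returns immediately
    database.insert(data[i], function (err) {
        //This callback runs only after the for loop has finished
        if (err) {
            console.error('insert failed', err);
        }
    });
}

console.log('loop finished, inserts still in flight');

[/codesyntax]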

If the array is big enough we can end up flooding the receiver, overflowing buffers along the way and ending up with dropped inserts. If the array is not that big, the system may behave normally.

The question of how big an array we can deal with is also difficult to answer. It changes from case to case, as it depends on the hardware, the configuration of the target database (e.g. buffer sizes) and so on.

The solution involves getting rid of the long-running for loop and using events and callbacks. This throttles the insert rate by making the inserts sequential (i.e. making sure the next insert is triggered only when the previous insert has completed).

[codesyntax lang="javascript"]

var EventEmitter = require('events').EventEmitter
var eventEmitter = new EventEmitter()

var count = 0

//Callback function to insert the next data item
function insertOnce() {

    if (count >= data.length) {
        /*
         All done. Finish up by closing any external connections (e.g. database)
         and clearing any timers. Ending the process by force is another option
         but it is not recommended.
        */
        return
    }

    database.insert(data[count], function () {
        //Called once the current data item has been inserted
        eventEmitter.emit('inserted')
    })

    count++
}

//Call insertOnce on the 'inserted' event
eventEmitter.on('inserted', insertOnce)

//Start the insertion by doing the first insert manually.
insertOnce()

[/codesyntax]

3) Are we done yet?

Blocking is not always a bad thing. It can be used to track progress, because when a function returns you know it has completed its work one way or the other.

One way to achieve this in NodeJS is to use some kind of global counter variable that counts down to zero or up to a fixed value. Another way is to set and clear timers in case you are not able to get a count value. This technique works well when you have to monitor the progress of a single stage of an operation (e.g. inserting data into a database as in our example above).
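Here is a minimal sketch of the counter idea (ignoring, for the moment, the throttling concerns from the previous section, and again assuming an error-first callback on database.insert):

[codesyntax lang="javascript"]

var pending = data.length;

for (var i = 0; i < data.length; i++) {
    database.insert(data[i], function (err) {
        if (err) {
            console.error('insert failed', err);
        }
        pending--;
        if (pending === 0) {
            //Every completion callback has fired, so this stage is done
            console.log('stage complete');
        }
    });
}

[/codesyntax]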

But what if we had multiple stages that we wanted to make sure execute in sequence? For example:

1) Load raw data into database

2) Calculate max/min values

3) Use max/min values to normalise raw data and insert into a new set of tables

There are some disadvantages with the counter/timer approach:

1) Counters and timers add unwanted bulk to your code

2) Global variables are easy to override accidentally especially when using simple names like ‘count’

3) Your code begins to look like a house with permanent scaffolding around it

Furthermore, once you detect that one stage has finished, how do you proceed to the next stage?

Do you get into callback hell and just start the next stage there and then, ending up with a single code file with all three stages nested within callbacks? (Answer: No!)

Do you break your stages into separate code files and use spawn/exec/fork to execute them? (Answer: Yes)

It is a rather dull answer but it makes sure you don’t have too much scaffolding in any one file.
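Here is a minimal sketch of that idea using the built-in child_process module (the stage file names are hypothetical):

[codesyntax lang="javascript"]

var execFile = require('child_process').execFile;

//One file per stage, executed in order
var stages = ['load_raw_data.js', 'calculate_max_min.js', 'normalise_and_insert.js'];

function runStage(index) {
    if (index >= stages.length) {
        console.log('all stages complete');
        return;
    }
    execFile(process.execPath, [stages[index]], function (err, stdout, stderr) {
        if (err) {
            console.error('stage failed:', stages[index], err);
            return;
        }
        //Start the next stage only when the current one has finished
        runStage(index + 1);
    });
}

runStage(0);

[/codesyntax]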

Javascript: Playing with Prototypes – I

The popularity of Javascript (JS) has skyrocketed ever since it made the jump from the browser to the server-side (thank you Node.JS). Therefore a lot of the server-side work previously done in Java and other ‘core’ languages is now done in JS. This has resulted in a lot of Java developers (like me) taking a keen interest in JS.

Things get really weird when you try and map a ‘traditional’ OO language (like Java) to a ‘prototype’ based OO language like JS. Not to mention functions that are really objects and can be passed as parameters.

That is why I thought I would explore prototypes and functions in this post with some examples.

Some concepts:

1) Every function is an object! Let us see, with an example, the way JS treats functions.

[codesyntax lang="javascript" lines="normal"]

function Car(type) {
    this.type = type;
    //New function object is created
    this.getType = function()
    {
        return this.type;
    };
}

//Two new Car objects
var merc = new Car("Merc");
var bmw = new Car("BMW");

/*
 * Functions should be defined once and reused
 * but this proves that the two Car objects
 * have their own instance of the getType function
 */
if(bmw.getType == merc.getType)
{
    console.log(true);
}
else
{
    //Output is false
    console.log(false);
}

[/codesyntax]

The output of the above code is ‘false’, thereby proving that the two functions are actually different ‘objects’.

 

2) Every function (as it is also an object) can have properties and methods. By default each function is created with a ‘prototype’ property which points to a special object that holds properties and methods that should be available to instances of the reference type.

What does this really mean? Let us change the previous example to understand what’s happening: we play with the prototype object and add a function to it which will be available to all the instances.

[codesyntax lang="javascript" lines="normal"]

function Car(type) {
   this.type = type;
}

Car.prototype.getType = function()
{
    return this.type;
}

//Two new Car objects
var merc = new Car("Merc");
var bmw = new Car("BMW");

/*
 * Functions should be defined once and reused
 * This proves that the two Car objects
 * have the same instance of the getType function
 */
if(bmw.getType == merc.getType)
{
    //Output is true
    console.log(true);
}
else
{
    console.log(false);
}

[/codesyntax]

We added the ‘getType’ function to the prototype object of the Car function. This makes it available to all instances of Car. Therefore we can think of the prototype object as the core of a function object. Methods and properties attached to this core are available to all the instances of the function object.

This core object (i.e. the prototype) can be manipulated in different ways to support OO behaviour (e.g. Inheritance).
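For example, here is a minimal sketch of prototype-based inheritance using a made-up Vehicle ‘parent’ for our Car:

[codesyntax lang="javascript" lines="normal"]

function Vehicle(wheels) {
    this.wheels = wheels;
}

Vehicle.prototype.getWheels = function()
{
    return this.wheels;
};

function Car(type) {
    //Call the parent constructor
    Vehicle.call(this, 4);
    this.type = type;
}

//Point Car's prototype at an object whose prototype is Vehicle.prototype
Car.prototype = Object.create(Vehicle.prototype);
Car.prototype.constructor = Car;

var merc = new Car("Merc");

//getWheels is found by walking up the prototype chain to Vehicle.prototype
console.log(merc.getWheels()); //4

[/codesyntax]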

 

3) Methods and properties can be added to either the core (prototype) or the instance. This enables method over-riding, as shown in the example below.

[codesyntax lang="javascript" lines="normal"]

function Car() {
    
}

//Adding a property and function to the prototype
Car.prototype.type = "BLANK";

Car.prototype.getType = function()
{
    return this.type;
}

//Two new Car objects
var merc = new Car();
var bmw = new Car();

//Adding a property and a function to the INSTANCE (merc)
merc.type = "Merc S-Class";
merc.getType = function()
{
    return "I own a "+this.type;
}

//Output
console.log("Merc Type: ", merc.getType());
console.log("BMW Type: ", bmw.getType());
console.log("Merc Object: ",merc);
console.log("BMW Object: ",bmw);

[/codesyntax]

 

The output:

Merc Type:  I own a Merc S-Class

> This shows that the ‘getType’ on the instance is being called.

BMW Type:  BLANK

> This shows that the ‘getType’ on the prototype is being called.

Merc Object:  { type: 'Merc S-Class', getType: [Function] }

> This shows the structure of the ‘merc’ object as printed to the console. We see the property and the function attached to the instance.

BMW Object:  {}

> This shows the structure of the ‘bmw’ object as printed to the console. There are no properties or functions attached to the instance itself.
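In other words, the instance property ‘shadows’ the prototype version rather than replacing it. A quick way to confirm this (continuing with the objects from the example above):

[codesyntax lang="javascript" lines="normal"]

//Remove the function attached to the instance
delete merc.getType;

//The lookup now falls through to Car.prototype.getType
console.log(merc.getType()); //Merc S-Class

[/codesyntax]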