amachwe

Time-Series Dashboards with Grafana and Influx DB

I have been collecting currency data (once every hour) over the last couple of years, using a Node.JS retriever. The data goes into a Mongo DB instance. We store the timestamp and all the currency pairings against the US Dollar (e.g. how many GBPs get you 1 USD).

Some other features we may need:

What if we want different pairings? Say we wanted to have pairings against Gold or Euros?
We may need to normalise the data if we want to compare the movement of different currencies.
We may also need ways of visualising the data (as time series etc.)

To cater to this I decided to bring in Grafana (I did not want to use the ELK stack because I believe if you don’t need text search don’t use elasticSEARCH!). Grafana does not support Mongo out of the box so I had to bring in Influx DB.

Bringing in Influx DB allowed me to integrate painlessly with Grafana (sort of). But as we all know – when you are integrating systems there is a certain minimum amount of pain you have to experience. Juggling things around just shifts that pain, hopefully to a location where you can deal with it efficiently. In my case I moved the pain to the integration between Mongo and Influx.

I had to create a component that pulls out the raw data from Mongo DB, pumps it through a pipeline to filter (to get the pairings of interest – the full data set has 168 currencies in it!), normalise and inject into Influx DB.
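The filter and normalise steps can be sketched roughly as below. This is a minimal illustration, not the actual component: the class and method names are hypothetical, the rates in `main` are made-up numbers, and dividing each series by its first value is only one possible normalisation (the post does not say which one was used). It assumes each Mongo record holds units of a currency per 1 USD, as described above.

```java
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical helper; names and the normalisation choice are assumptions.
final class PairingPipeline {

    // Input: units of each currency per 1 USD (the stored form).
    // Output: units of each selected currency per 1 unit of the target (e.g. XAU).
    static Map<String, Double> rebase(Map<String, Double> perUsd, String target, List<String> keep) {
        double targetPerUsd = perUsd.get(target);
        Map<String, Double> out = new LinkedHashMap<>();
        for (String code : keep) {
            // USD itself is the base of the stored data, so its rate is 1.
            Double rate = "USD".equals(code) ? Double.valueOf(1.0) : perUsd.get(code);
            if (rate != null) {          // silently skip currencies we have no data for
                out.put(code, rate / targetPerUsd);
            }
        }
        return out;
    }

    // Normalise a series by its first value so that every series starts at 1.0.
    static double[] normalise(double[] series) {
        double[] out = new double[series.length];
        for (int i = 0; i < series.length; i++) {
            out[i] = series[i] / series[0];
        }
        return out;
    }

    public static void main(String[] args) {
        // Illustrative numbers only, not real market data.
        Map<String, Double> perUsd = Map.of("GBP", 0.5, "EUR", 0.8, "XAU", 0.001);
        System.out.println(rebase(perUsd, "XAU", List.of("USD", "GBP", "EUR")));
    }
}
```

Starting every series at 1.0 makes currencies with very different absolute rates comparable on one Grafana panel.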

A side note: I used the Influx DB Java API which was REALLY easy to use, this encouraged me to go ahead with the Influx – Mongo integration in Java.

I also wanted the Influx – Mongo integration to be ‘on-demand’ so that we can create different pairings against different targets – such as the big 3 against Gold, major currencies against SDR (International Monetary Fund – Special Drawing Rights) etc. and populate different time-series databases in Influx. So I gave the integration a REST interface using Spark-Java.

The one problem I did encounter with the Grafana – InfluxDB integration was that the ‘easy’ query system did not work. I had to write the query manually, which is pretty straightforward as the InfluxDB documentation is decent.

Results

I was able to get the dashboards to work with normalised time-series of 6 currencies against Gold (XAU): US Dollar (USD), Indian Rupee (INR), Nigerian Naira (NGN), Euro (EUR), British Pound (GBP) and Chinese Yuan (CNY). I discovered something interesting when I was looking at the data around the ‘Brexit’ vote (23/24 June 2016).

The screenshot above is from Grafana, filtered to show EUR-XAU, GBP-XAU, USD-XAU and NGN-XAU. We see a massive dip in the Nigerian Naira before the actual Brexit vote. I was surprised so I googled it, and it seems that a few days before the Brexit vote the Naira was allowed to float against USD (earlier it was pegged to it), which led to a massive devaluation.

Then on Brexit day we see a general trend for the selected currencies to fall against Gold as it suddenly became the asset of choice once the ‘unthinkable’ had happened, with the GBP-XAU registering the largest fall. As you can see the Naira also registers a dip against Gold (XAU).

This also shows how universal Gold is. At one time all currencies were pegged against Gold, now unofficially USD is the currency of trade, but this shows when something unthinkable or truly ‘world-changing’ happens people run back to the safety of Gold.

Revisiting Apache Karaf – Custom Commands

And we are not in Kansas any more!

I wanted to talk about the new-ish Apache Karaf custom command system. Things have been made very easy using Annotations. It took me a while to get all the pieces together as most of the examples out there were using the deprecated Command system.

To create a custom command in Karaf shell we need the following:

Custom Command Class (one per Custom Command)
Entry in the Manifest to indicate that Custom Commands are present (or the correct POM entry if using maven-bundle-plugin and package type of ‘bundle’)

This is a lot simpler than before where multiple configuration settings were required to get a custom command to work.

The Custom Command Class

This is a class that contains the implementation of the command, it also contains the command definition (including the name, scope and arguments). We can also define custom ‘completers’ to allow tabbed command completion. This can be extended to provide state-based command completion (i.e. completion can adapt to what commands have been executed previously in the session).

A new instance of the Custom Command Class is spun up every time the command is executed so it is inherently thread-safe, but we have to make sure any heavy lifting is not done directly by the Custom Command Class. [see here]

Annotations

There are a few important annotations that we need to use to define our own commands:

@Command

Used to Annotate the Custom Command Class – this defines the command (name, scope etc.).

@Service

After the @Command, just before the Custom Command Class definition starts. Ensures there is a standard way of getting a reference to the custom command.

@Argument

Within the Custom Command Class, used to define arguments for your command. This is required only if your command requires command line arguments (obviously!).

@Reference

Another optional annotation – needed if your Custom Command Class requires a reference to other services/beans. The important point to note here is that the Custom Command Class (if you use the auto-magical way of setting it up) needs to have a default no-args constructor. You cannot do custom instantiation (or at least I was not able to find a way – please comment if you know how) by passing in any beans/service refs your command may require to work. These can only be injected via the @Reference annotation. The reason for this is pretty straightforward: we want loose coupling (via interfaces) so that we can swap out the Services without having to change any config/wiring files.

Example

Let us create a simple command which takes in a top level directory location and recursively lists all the files and folders in it.

Now we want to keep the traversal logic separate and expose it as a ‘Service’ from the Custom Command Class because it is a highly re-usable function.

The listing below will declare a command to be used as:

karaf prompt> custom:listdir 'c:\\top_level_dir\\'

[codesyntax lang=”java5″]

package custom.command;

import org.apache.karaf.shell.api.action.Action;
import org.apache.karaf.shell.api.action.Argument;
import org.apache.karaf.shell.api.action.Command;
import org.apache.karaf.shell.api.action.lifecycle.Reference;
import org.apache.karaf.shell.api.action.lifecycle.Service;

@Command(name = "listdir", scope = "custom", description = "list all files and folders in the provided directory")
@Service
public class DirectoryListCommand implements Action
{
	// Inject our directory service to provide the listing.
	@Reference
	DirectoryService service;

	// Command line arguments - just one argument.
	@Argument(index = 0, name = "topLevelDir", required = true, description = "top level directory absolute path")
	String topLevelDirectory = null;

	// Creating a no-args constructor for clarity.
	public DirectoryListCommand()
	{}

	// Logic of the command goes here.
	@Override
	public Object execute() throws Exception
	{
		// Use the directory service we injected to get the
		// listing and print it to the console.

		return "Command executed";
	}
}

[/codesyntax]

The main thing in the above listing is the ‘Action’ interface which provides an ‘execute’ method to contain the logic of the command. We don’t see a ‘BundleContext’ anywhere to help us get the Service References because we use the @Reference annotation and inject what we need.

This has a positive side effect of forcing us to create a Service out of the File/Directory handling functionality, thereby promoting re-use across our application. Otherwise previously we could have used the BundleActivator to initialise commands out of POJOs and register them with Karaf.
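The post does not show the DirectoryService itself, so here is a minimal sketch of what it could look like. The interface, its single `list` method and the implementation class name are all hypothetical; only the `java.nio.file` calls are standard JDK. Registering the implementation as an OSGi service (so that @Reference can find it) is left out.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.List;
import java.util.stream.Collectors;
import java.util.stream.Stream;

// Hypothetical service interface; the real one is not shown in the post.
interface DirectoryService {
    // Returns every file and folder below the given directory, relative to it.
    List<String> list(String topLevelDirectory);
}

// One possible implementation based on Files.walk.
class RecursiveDirectoryService implements DirectoryService {
    @Override
    public List<String> list(String topLevelDirectory) {
        Path root = Paths.get(topLevelDirectory);
        // Files.walk keeps directory handles open, so close the stream when done.
        try (Stream<Path> paths = Files.walk(root)) {
            return paths.filter(p -> !p.equals(root))              // skip the root itself
                        .map(p -> root.relativize(p).toString())
                        .sorted()
                        .collect(Collectors.toList());
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }
}
```

With something like this in place, the body of `execute()` could be as small as `service.list(topLevelDirectory).forEach(System.out::println);`, which keeps the heavy lifting out of the Custom Command Class.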

Declaring the Custom Command

To declare the presence of custom commands you need to add the following header to the MANIFEST.MF file within the bundle:

Karaf-Commands: *

That will work for any command. Yes! it is a generic flag that you need to add in the Manifest. It tells Karaf that there are custom commands in this bundle.

Listing below is from an actual bundle:

[codesyntax lang=”text”]

Bundle-Name: nlp.digester
Bundle-SymbolicName: rd.ml.Digester
Bundle-Version: 0.0.1
Karaf-Commands: *
Export-Package: rd.ml.digester.service

[/codesyntax]

Artificial Neural Networks: Problems with Multiple Hidden Layers

In the last post I described how we work with Multi-layer Perceptron (MLP) model of artificial neural networks. I had also shared my repository on GitHub (https://github.com/amachwe/NeuralNetwork).

I have now added a Single Perceptron (SP) and Multi-class Logistic Regression (MCLR) implementations to it.

The idea is to set the stage for deep learning by showing where these types of ANN models fail and why we need to keep adding more layers.

Single Perceptron:

Let us take a step back from a MLP network to a Single Perceptron to delve a bit deeper into its working.

The Single Perceptron (with one output) acts as a Simple Linear Classifier. In other words, for a two class problem, it finds a single hyper-plane (n-dimensional plane) that separates the inputs based on their class.

The image above describes the basic operation of the Perceptron. It can have N inputs with each input having a corresponding weight. The Perceptron itself has a bias or threshold term. A weighted sum is taken of all the inputs and the bias is added to this value (a linear model). This value is then put through a function f to get the actual output of the Perceptron.

The function f is an activation function with the simplest case being the so-called Step Function:

f(x) = 1 if [ Sum(weight * input) + bias ] > 0

f(x) = -1 if [ Sum(weight*input) + bias ] <= 0

Perceptron might look simple but it is a powerful model. Implement your own version or walk through my example (rd.neuron.neuron.perceptron.Perceptron) and associated tests on GitHub if you are not convinced.

At the same time not all two-class problems are created equal. As mentioned before, the Single Perceptron partitions the input space using a hyper-plane to provide a classification model. But what if no such separation exists?

Single Perceptron – Linear Separation

The image above represents two very simple test cases: the 2 input AND and XOR logic gates. The two inputs (A and B) can take values of 0 or 1. The single output can similarly take the value of 0 or 1. The colourful lines represent the model which should separate out the two classes of output (0 and 1 – the white and black dots) and allow us to classify incoming data.

For AND the training/test data is:

0, 0 -> 0 (white dot)
0, 1 -> 0 (white dot)
1, 0 -> 0 (white dot)
1, 1 -> 1 (black dot)

We can see it is very easy to draw a single straight line (i.e. a linear model with a single neuron) that separates the two classes (white and black dots), the orange line in the figure above.
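To make this concrete, here is a minimal sketch of a single perceptron using the step function defined earlier. The weights (1, 1) and bias (-1.5) are hand-picked for illustration rather than learnt, and the class is not the one from my GitHub repository. An output of +1 stands for the black dot (1) and -1 for the white dot (0).

```java
// Minimal single perceptron with a step activation; weights are hand-picked.
final class AndPerceptron {

    // Step function: 1 if the weighted sum is positive, -1 otherwise.
    static int step(double x) {
        return x > 0 ? 1 : -1;
    }

    // Weighted sum of the inputs plus the bias, passed through the step function.
    static int predict(double[] weights, double bias, double[] inputs) {
        double sum = bias;
        for (int i = 0; i < weights.length; i++) {
            sum += weights[i] * inputs[i];
        }
        return step(sum);
    }

    public static void main(String[] args) {
        double[] w = {1.0, 1.0};
        double bias = -1.5; // the line A + B = 1.5 separates (1,1) from the other points
        double[][] inputs = {{0, 0}, {0, 1}, {1, 0}, {1, 1}};
        for (double[] in : inputs) {
            System.out.println((int) in[0] + ", " + (int) in[1] + " -> " + predict(w, bias, in));
        }
    }
}
```

The line A + B = 1.5 plays the role of the orange line in the figure: only the point (1, 1) lies on its positive side.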

For XOR the training/test data is:

0, 0 -> 0 (white dot)
0, 1 -> 1 (black dot)
1, 0 -> 1 (black dot)
1, 1 -> 0 (white dot)

For the XOR it is clear that no single straight line can be drawn that can separate the two classes. Instead what we need are multiple constructs (see figure above). As the single perceptron can only model using a single linear construct it will be impossible for it to classify this case.

If you run a test with the XOR data you will find that the accuracy comes out to be 50%. That might sound good but it is the exact same accuracy if you were to guess one of the classes constantly and the classes were equally distributed. For the XOR case here, as the 0’s and 1’s are equally distributed if we kept guessing 0 or 1 constantly we would still be right 50% of the time.

Contrast this with a Multi Layer Perceptron, which gives an accuracy of 100%. What is the main difference between a MLP and a Single Perceptron? Obviously the presence of multiple Perceptrons organised in layers! This makes it possible to create models with multiple linear constructs (hyper-planes) which are represented by the blue and green lines in the figure above.

Can you figure out how many units we would need as a minimum for this task? Read on for the answer.

Solving XOR using MLP:

If you used the logic that for the XOR example we need 2 hyper-planes therefore 2 Perceptrons would be required your reasoning would be correct!

Such a MLP network would usually be arranged in a 2 -> 2 -> 1 formation: two input nodes (as there are 2 inputs A and B), a hidden layer with 2 Perceptrons and an aggregation layer with a single Perceptron to provide a single output (as there is just one output). The input layer doesn’t do anything interesting except present the values to both the hidden layer Perceptrons. So the main difference between this MLP and a Single Perceptron is that:

we add 1 more processing unit (in the hidden layer)
we add an aggregation unit (the output layer) to combine the hidden layer outputs into a single variable

If you check the activation of the individual Perceptrons in the hidden layer (i.e. processing layer) of a MLP trained for XOR you will find a pattern for the activation when presented with type 1 Class (A = B – white dot) and when presented with a type 2 Class (A != B – black dot). One possibility for such a MLP is that:

For Class 1 (A = B – white dot): Both the neurons either activate or not (i.e. outputs of the 2 hidden layer Perceptrons are comparable – so either both are high or both are low)
For Class 2 (A != B – black dot): The neurons activate asymmetrically (i.e. there is a clear difference between the outputs of the 2 hidden layer Perceptrons)
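One hand-wired 2 -> 2 -> 1 network that shows exactly this activation pattern is sketched below. The weights are chosen by hand (an OR-like unit and an AND-like unit feeding the output), which is an assumption made for illustration; a trained network may settle on different weights. It uses the same step function as before (+1 / -1).

```java
// Hand-wired 2 -> 2 -> 1 network for XOR; weights are illustrative, not learnt.
final class XorMlp {

    static int step(double x) {
        return x > 0 ? 1 : -1;
    }

    // Returns {h1, h2, out} so the hidden layer activations can be inspected.
    static int[] forward(int a, int b) {
        int h1 = step(a + b - 0.5);      // active if at least one input is 1
        int h2 = step(a + b - 1.5);      // active only if both inputs are 1
        int out = step(h1 - h2 - 1.0);   // active only when h1 and h2 disagree
        return new int[] {h1, h2, out};
    }

    public static void main(String[] args) {
        int[][] inputs = {{0, 0}, {0, 1}, {1, 0}, {1, 1}};
        for (int[] in : inputs) {
            int[] r = forward(in[0], in[1]);
            System.out.println(in[0] + ", " + in[1] + " -> h1=" + r[0] + " h2=" + r[1] + " out=" + r[2]);
        }
    }
}
```

For A = B both hidden units agree (both low for 0,0 and both high for 1,1), while for A != B only the first one is active, so the output unit simply detects the disagreement.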

Conclusions:

Thus there are three takeaways from this post:

a) To classify more complex and real world data which is not linearly separable we need more processing units, these are usually added in the Hidden Layer

b) To feed the processing units (i.e. the Hidden Layer) and to encode the input we utilise an Input Layer which has only one task – to present the input in a consistent way to the hidden layer, it will not learn or change as the network is trained.

c) To work with multiple Hidden Layer units and to encode the output properly we need an aggregation layer to collect output of the Hidden Layer, this aggregation layer is also called an Output Layer

I would again like to bring up the point of input representation and encoding of output:

We have to be careful in choosing the right input and output encoding
For the XOR and other logic gate example we can simply map the number of bits to the number of inputs/outputs but what if we were trying to process handwritten documents – would you have one character per output? How would you organise the inputs given that the handwritten data can be of different length?

In the next post we will start talking about Deep Learning as we have provided two very important reasons for so-called shallow networks to fail.

Artificial Neural Networks: An Introduction

Artificial Neural networks (ANNs) are back in town after a rather long exile to the edges of Artificial Intelligence (AI) product space. Therefore I thought I would do a post on it to provide an introduction.

For a one line intro: An Artificial Neural Network is a Machine Learning paradigm that mimics the structure of the human brain.

Some of the biggest tech companies in the world (e.g. Google, Microsoft and IBM) are investing heavily in ANN research and in creating new AI products such as driver-less cars, language translation software and virtual assistants (e.g. Siri and Cortana).

There are three main reasons for a resurgence in ANNs:

Availability of cheap computing power in form of multi-core CPUs and GPUs which enables machines to process and learn from ‘big-data’ using increasingly sophisticated networks (e.g. deep learning networks)
Problem with using existing Machine Learning methods against high volume data with complex representations (e.g. images, videos and sound) required for novel applications such as driver-less cars and virtual assistants
Availability of free/open source general purpose ANN libraries for major programming languages (e.g. TensorFlow/Theano – Python; DL4J – Java). Earlier you either had to code ANNs from scratch or shell out money for specialised software (e.g. Matlab plugins)

My aim is to provide a trail up to the current state of the art (Deep Learning) over the space of 3-4 posts. To start with, in this post I will talk about the simplest form of ANN (also one of the oldest), called a Multi-Layer Perceptron Neural Network (MLP).

Application Use-Case:

We are going to investigate a supervised learning classification task using simple MLP networks with a single hidden layer, trained using back-propagation.

Simple Multi-Layer Perceptron Network:

The image above describes a simple MLP neural network with 5 neurons in the input layer, 3 in the hidden layer and 2 in the output layer.

Data Set for Training ANNs:

For supervised learning classification tasks we need labelled data sets. Think of it as a set of input – expected output pairs. The input can be an image, video, sound clip, sensor readings etc.; the label(s) can be set of tags, words, classes, expected state etc.

The important thing to understand is that whatever the input, we need to define a representation that optimally describes the features of interest that will help with the classification.

Representation and feature identification is a very important task that machines find difficult to do. For a brain that has developed normally this is a trivial task. Because this is a very important point I want to get into the details (part of my Ph.D. was on this topic as well!).

Let us assume we have a set of grey scale images as the input with labels against them to describe the main subject of the image. To keep it simple let us also assume a one-to-one mapping between images and tags (one tag per image). Now there are several ways of representing these images. One option is to flatten each image into an array where each element represents the grey scale value of a pixel. Another option is to take an average of 2 pixels and take that as an array element. Yet another option is to chop the image into a fixed number of squares and take the average of each. But the one thing to keep in mind is that whatever representation we use, it should not hide features of importance. For example, if there are features that are at the level of individual pixels and we use an averaging representation then we might lose a lot of information.

The labels (if less in number) can be encoded using binary notation otherwise we can use other representations such as word vectors.

To formalise:

If X is a given input at the Input Layer;

Y is the expected output at the Output Layer;

Y’ is the actual output at the Output Layer;

Then our aim is to learn a model (M) such that:

Y’ = M(X) where Error calculated by comparing Y and Y’ is minimised.

One method of calculating Error is (Y’-Y)^2

To calculate the total error for n training examples we just use the Mean Squared Error formula (https://en.wikipedia.org/wiki/Mean_squared_error)
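As a small sketch of that calculation (the class and method names are made up for this example), the Mean Squared Error over n expected/actual pairs is just the average of the squared differences:

```java
// Mean Squared Error: the average of (Y' - Y)^2 over all examples.
final class ErrorMetrics {

    static double meanSquaredError(double[] expected, double[] actual) {
        if (expected.length != actual.length || expected.length == 0) {
            throw new IllegalArgumentException("arrays must be non-empty and of equal length");
        }
        double sum = 0.0;
        for (int i = 0; i < expected.length; i++) {
            double diff = actual[i] - expected[i];
            sum += diff * diff;
        }
        return sum / expected.length;
    }

    public static void main(String[] args) {
        double[] y = {1.0, 0.0, 1.0};         // expected outputs
        double[] yPrime = {0.5, 0.5, 1.0};    // actual outputs
        System.out.println(meanSquaredError(y, yPrime));
    }
}
```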

Working of a Network:

The MLP works on the principle of value propagation through different layers till it is presented as an output at the output layer. For a three layer network the propagation of value is as follows:

Input -> Hidden -> Output -> Actual Output

The propagation of the Error is in reverse.

Error at Output -> Output -> Hidden -> Input

When we propagate the Error back through the network we adjust the weights and biases between the Output-Hidden and Hidden-Input layers. The adjustment is calculated one layer at a time keeping all other layers the same (i.e. updates are applied to the entire network in a single step). This process is called ‘Back-propagation’. The idea is to minimise the Error using ‘gradient descent’, which is sort of like walking through a hilly region but always downhill. What gradient descent does not guarantee is that the lowest point (i.e. Error) you reach will be the Global Minimum – i.e. there are no guarantees that the lowest Error figure you found is the lowest possible Error figure unless the error is zero!
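The ‘always downhill’ idea is easiest to see on a toy problem with a single weight. The sketch below is not back-propagation itself, just plain gradient descent on an assumed error surface E(w) = (w - 3)^2, whose gradient is 2(w - 3); the surface, starting point and learning rate are all picked for illustration.

```java
// Gradient descent on a toy one-dimensional error surface E(w) = (w - 3)^2.
final class GradientDescentDemo {

    // dE/dw for E(w) = (w - 3)^2
    static double gradient(double w) {
        return 2.0 * (w - 3.0);
    }

    // Repeatedly step against the gradient, i.e. downhill.
    static double descend(double start, double learningRate, int steps) {
        double w = start;
        for (int i = 0; i < steps; i++) {
            w -= learningRate * gradient(w);
        }
        return w;
    }

    public static void main(String[] args) {
        // Starting from w = 0 the weight moves towards the minimum at w = 3.
        System.out.println(descend(0.0, 0.1, 100));
    }
}
```

This surface has a single valley, so the walk ends at the global minimum. On the bumpy error surface of a real network the same procedure can get stuck in whichever valley it happens to start near.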

This excellent post describes the process of ‘Back-propagation’ in detail with a worked example: https://mattmazur.com/2015/03/17/a-step-by-step-backpropagation-example/

The one key point of the process is that as we move from Output to the Input layer, tweaking the weights as we perform gradient descent, a chain of interactions is formed (e.g. Input Neuron 1 affects all Hidden Neurons which in turn affect all Output Neurons). This chain becomes more volatile as the number of Hidden Layers increase (e.g. Input Neuron 1 affects all Hidden Layer 1 Neurons which affect all Hidden Layer 2 Neurons … which affect all Hidden Layer M Neurons which affect all the Output Neurons). As we go deeper into the network the effect of individual hidden neurons on the final Error at the output layer becomes small.

This leads to the problem of the ‘Vanishing Gradient’ which limits the use of traditional methods for learning when using ‘deep’ topologies (i.e. more than 1 hidden layer) because this chained adjustment to the weights becomes unstable and for deeper layers the process no longer resembles following a downhill path. The gradient can become insignificant very quickly or it can become very large.
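A rough way to see why the gradient shrinks: the slope of the Sigmoid function is never more than 0.25, and back-propagation multiplies one such slope per layer as the error travels backwards. The sketch below only looks at that activation part of the chain and ignores the weights (which can make things better or worse), so treat it as a simplified illustration.

```java
// Upper bound on the sigmoid-slope part of the back-propagation chain.
final class VanishingGradient {

    static double sigmoid(double x) {
        return 1.0 / (1.0 + Math.exp(-x));
    }

    // Derivative of the sigmoid: s(x) * (1 - s(x)), largest at x = 0.
    static double sigmoidDerivative(double x) {
        double s = sigmoid(x);
        return s * (1.0 - s);
    }

    // Best case product of sigmoid slopes across the given number of layers.
    static double maxChainFactor(int layers) {
        return Math.pow(sigmoidDerivative(0.0), layers);
    }

    public static void main(String[] args) {
        for (int layers = 1; layers <= 5; layers++) {
            System.out.println(layers + " layer(s): " + maxChainFactor(layers));
        }
    }
}
```

Even in the best case the factor drops by three quarters with every extra layer, which is why the layers furthest from the output barely move.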

When training, all training examples are presented one at a time. For each of the examples the network is adjusted (gradient descent). Each loop through the FULL set of training examples is called an epoch.

A problem can arise here if there is a very large number of training examples and their presentation order does not change. This is because the initial examples lead to larger changes in the network. So if the first 10 examples (say) are similar, then the network will be very efficient at classifying that class of cases but will generalise to other classes very poorly.

A variation of this is called stochastic gradient descent where training examples are randomly selected so the danger of premature convergence is reduced.
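In practice the random selection is often done by shuffling the presentation order at the start of every epoch. The sketch below shows just that step; the class name and the fixed seed are assumptions made so the example is repeatable, and the actual weight update is left out.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.Random;

// Produces a fresh random presentation order for the training examples.
final class EpochOrder {

    static List<Integer> shuffledIndices(int exampleCount, Random rng) {
        List<Integer> indices = new ArrayList<>();
        for (int i = 0; i < exampleCount; i++) {
            indices.add(i);
        }
        Collections.shuffle(indices, rng);
        return indices;
    }

    public static void main(String[] args) {
        Random rng = new Random(42); // fixed seed only to make the run repeatable
        for (int epoch = 1; epoch <= 3; epoch++) {
            // Each example is still presented exactly once per epoch, in a new order.
            System.out.println("epoch " + epoch + ": " + shuffledIndices(6, rng));
        }
    }
}
```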

Working of a Single Neuron:

A single neuron in a MLP network works by combining the input it receives through all the connections with the previous layer, weighted by the connection weight; adding an offset (bias) value and putting the result through an activation function.

For each input connection we calculate the weighted value (w*x)
Sum it across all inputs to the neuron (sum(w*x))
Apply bias (sum(w*x)+bias)
Apply activation function and obtain actual output (Output = f( sum(w*x)+bias ))
Present the output value to all the neurons connected to this one in the next layer
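The steps above can be sketched for one neuron as follows. This is a standalone illustration using the Sigmoid activation, not the class from my repository; the weights and inputs in `main` are arbitrary.

```java
// Output of a single neuron: f(sum(w * x) + bias) with a sigmoid activation.
final class Neuron {

    static double sigmoid(double x) {
        return 1.0 / (1.0 + Math.exp(-x));
    }

    static double output(double[] weights, double[] inputs, double bias) {
        double sum = 0.0;
        for (int i = 0; i < weights.length; i++) {
            sum += weights[i] * inputs[i];   // weighted value of each input connection
        }
        return sigmoid(sum + bias);          // apply bias, then the activation function
    }

    public static void main(String[] args) {
        double[] w = {0.4, -0.2, 0.1};
        double[] x = {1.0, 0.5, 2.0};
        System.out.println(output(w, x, 0.3));
    }
}
```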

When we look at the collective interactions between layers the above equations become Matrix Equations. Therefore value propagation is nothing but Matrix multiplications and summations.
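To illustrate, here is the forward step for a whole layer written as a matrix-vector product. Plain Java arrays are used here instead of a matrix library to keep the example self-contained; the layout `weights[j][i]` (weight from input i to neuron j) is an assumption of this sketch.

```java
// Forward propagation through one layer: output = sigmoid(W * input + bias).
final class LayerForward {

    static double sigmoid(double x) {
        return 1.0 / (1.0 + Math.exp(-x));
    }

    // weights[j][i] is the weight on the connection from input i to neuron j.
    static double[] forward(double[][] weights, double[] bias, double[] input) {
        double[] output = new double[weights.length];
        for (int j = 0; j < weights.length; j++) {
            double sum = bias[j];
            for (int i = 0; i < input.length; i++) {
                sum += weights[j][i] * input[i];
            }
            output[j] = sigmoid(sum);
        }
        return output;
    }

    public static void main(String[] args) {
        double[][] hiddenWeights = {{0.1, 0.2, 0.3}, {-0.3, 0.2, 0.1}};
        double[] hiddenBias = {0.0, 0.1};
        double[][] outputWeights = {{0.5, -0.5}};
        double[] outputBias = {0.2};
        // Input -> Hidden -> Output is just the same step applied twice.
        double[] hidden = forward(hiddenWeights, hiddenBias, new double[] {1.0, 0.0, 1.0});
        double[] out = forward(outputWeights, outputBias, hidden);
        System.out.println(out[0]);
    }
}
```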

Activation functions introduce non-linearity into an otherwise linear process (see Step 3 and 4). This allows the network to handle non-trivial problems. Two common activation functions are: Sigmoid Function and Step Function.

More info here: https://en.wikipedia.org/wiki/Activation_function

Implementation:

I wanted to dig deep into the workings of ANNs which is difficult if you use a library like DL4J. So I implemented my own using just JBLAS matrix libraries for the Matrix calculations.

The code can be found here: https://github.com/amachwe/NeuralNetwork

It also has two examples that can be used to evaluate the working.

XOR Gate
1. Has 4 training instances with 2 inputs and a single output, the instances are: {0,0} -> 0; {1,1} -> 0; {1,0} -> 1; {0,1} -> 1;
MNIST Handwritten Numbers
1. Has two sets of instances (single handwritten digits as images of constant size with corresponding labels) – 60k set and 10k set
2. Data can be downloaded here: http://yann.lecun.com/exdb/mnist/

MNIST Example:

The MNIST dataset is one of the most common ‘test’ problems one can find. The data set is both interesting and relevant. It consists of images of hand written numbers with corresponding labels. All the images are 28×28 and each image has a single digit in it.

We use the 10k instances to train and 60k to evaluate. Stochastic Gradient Descent is used to train a MLP with a single hidden layer. The Sigmoid activation function is used throughout.

The input representation is simply a flattened array of pixels with normalised values (between 0 and 1). A 28×28 image results in an array of 784 values. Thus the input layer has 784 neurons.

The output has to be a label value between 0 and 9 (as images have only single digits). We encoded this by having 10 output neurons with each neuron representing one digit label.
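The input and output encodings can be sketched as below. The class is hypothetical and assumes raw pixel values in the range 0 to 255; the predicted label is read back as the index of the most active output neuron.

```java
// Input normalisation and one-neuron-per-digit output encoding for MNIST.
final class MnistEncoding {

    // Scale raw grey scale values (assumed 0..255) to the range 0..1.
    static double[] normalisePixels(int[] pixels) {
        double[] out = new double[pixels.length];
        for (int i = 0; i < pixels.length; i++) {
            out[i] = pixels[i] / 255.0;
        }
        return out;
    }

    // Ten output neurons: the one matching the digit is 1, the rest are 0.
    static double[] oneHot(int digit) {
        double[] out = new double[10];
        out[digit] = 1.0;
        return out;
    }

    // The predicted digit is the index of the most active output neuron.
    static int decode(double[] outputs) {
        int best = 0;
        for (int i = 1; i < outputs.length; i++) {
            if (outputs[i] > outputs[best]) {
                best = i;
            }
        }
        return best;
    }

    public static void main(String[] args) {
        System.out.println(java.util.Arrays.toString(oneHot(3)));
        System.out.println(decode(new double[] {0.1, 0.0, 0.2, 0.9, 0.0, 0.1, 0.0, 0.3, 0.0, 0.1}));
    }
}
```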

That just leaves us with the number of hidden neurons. We can try all kinds of values and measure the accuracy to decide what suits best. In general the performance will improve as we add more hidden units up to a point, after which we will encounter the law of diminishing returns. Also remember that more hidden units means longer training times as the size of our weight matrices explodes.

For 15 hidden units:

a total of 11,760 weights have to be learnt between the input and hidden layer
a total of 150 weights have to be learnt between the hidden and output layer

For 100 hidden units:

a total of 78,400 weights have to be learnt between the input and hidden layer
a total of 1000 weights have to be learnt between the hidden and output layer
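These figures are just the products of the layer sizes (bias terms are not counted). A tiny sketch, with made-up names, for trying other hidden layer sizes:

```java
// Number of weights between two fully connected layers (biases not counted).
final class WeightCount {

    static int between(int fromLayerSize, int toLayerSize) {
        return fromLayerSize * toLayerSize;
    }

    public static void main(String[] args) {
        int inputs = 784;   // 28 x 28 pixels
        int outputs = 10;   // one neuron per digit
        for (int hidden : new int[] {15, 100}) {
            System.out.println(hidden + " hidden units: "
                    + between(inputs, hidden) + " input-hidden weights, "
                    + between(hidden, outputs) + " hidden-output weights");
        }
    }
}
```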

Hidden Units and Performance

The graph above shows what happens to performance as the number of hidden layer units (neurons) are increased. Initially from 15 till about 100 decent performance gains are achieved at the expense of increased processing time. But after 100 units the performance increase slows down dramatically. Fixed learning rate of 0.05 is used. The SGD is based on single example (mini-batch size = 1)

Vanishing Gradient in MNIST:

Remember the problem of vanishing gradient? Let us see if we can highlight its effect using MNIST. The chaining here is not so bad because there is a single hidden layer but still we should expect the outer – hidden layer weights to have on average larger step size when the weights are being adjusted as compared to the inner – hidden layer weights (as the chain goes from output -> hidden -> input). Let us try and visualise this by sampling the delta (adjustment) being made to weights along with which layer they are in and how many training examples have been shown.

Weights update by layer and number of training examples

After collecting millions of samples (remember for a 100 hidden unit network each training instance results in almost 80,000 weight updates so it doesn’t take long to collect millions of samples) of delta weight values in hidden and input layer we can take their average by grouping based on layer and stage of learning to see if there is significant difference in the step sizes.

What we find (see image above) is as expected. The delta weight updates in the outer layer are much higher than in the hidden layer to start with, but they converge rapidly as more training examples are presented. Thus the first 250 training examples have the most effect.

If we had multiple hidden layers, the chances are that delta updates for deeper layers would be negligible (maybe even zero). Thus the adaptation or learning is limited to the outer layer and the hidden layer just before it. This is called shallow learning. As we shall see, to train multiple hidden layers we have to use a divide and rule strategy as compared to our current layer by layer strategy.

Keep this in mind as in our next post we will talk about transitioning from shallow to deep networks and examine the reasons behind this shift.

Bots using Microsoft Bot Platform and Heroku: Customer Life-cycle Management

This post is about using the Microsoft Bot Platform with Heroku to build a bot!

The demo scenario is very simple:

User starts the conversation
Bot asks for an account number
Customer provides an account number or indicates they are not a customer
Bot retrieves details if available for a personalised greeting and asks how can it be of help today
Customer states the problem/reason for contact
Bot uses sentiment analysis to provide the appropriate response

Bots

Bots are nothing but automated programs that carry out some well defined set of tasks. They are old technology (think web-crawlers).

Recent developments such as Facebook/Skype platform APIs being made available for free, easy availability of cloud-computing platforms and the relative sophistication of machine learning as a service have renewed interest in this technology, especially for customer life-cycle management applications.

Three main components of a modern, customer facing bot app are:

Communication Platform (e.g. Facebook Messenger, Web-portal, Skype etc.): the eyes, ears and mouth of the bot
Machine Learning Platform: the brain of the bot
Back end APIs for integration with other systems (e.g. order management): the hands of the bot

Other aspects include giving a proper face to the bot in terms of branding but from a technical perspective above three are complete.

Heroku Setup

Heroku provides various flavours of virtual containers (including a ‘free’ and ‘hobby’ ones) for different types of applications. To be clear: a ‘dyno’ is a lightweight Linux container which runs a single command that you specify.

Another important reason to use Heroku is that it provides a ‘https’ endpoint for your app which makes it more secure. This is very important as most platforms will not allow you to use a plain ‘http’ endpoint (e.g. Facebook Messenger). So unless you are ready to fork out big bucks for proper web-hosting and SSL certificates start out with something like Heroku.

Therefore for a Node.JS dyno you will run something like node <js file name>.

The cool thing about Heroku (in my view) is that it integrates with Git so deploying your code is as simple as ‘git push heroku <branch name to push from>’.

You will need to follow a step by step process to make yourself comfortable with Heroku (including installing the Heroku CLI) here: https://devcenter.heroku.com/start

We will be using a Node.JS flavour of Heroku ‘dynos’.

Heroku has an excellent ‘hello world’ guide here: https://devcenter.heroku.com/articles/getting-started-with-nodejs#introduction

Microsoft Bot Platform

The Microsoft Bot Platform allows you to create, test and publish bots easily. It also provides connectivity to a large number of communication platforms (such as Facebook Messenger). Registration and publishing is FREE at the time of writing.

You can find more information on the Node.js base framework here: http://docs.botframework.com/builder/node/overview/

The dialog framework in the MS Bot Platform is based on REST paths. This is a very important concept to master before you can start building bots.

Architecture

Microsoft provide a publishing platform to register your bot.

Once you have the bot correctly published on a channel (e.g. Web, Skype etc.) messages will be passed on to it via the web-hook.

You need to provide an endpoint (i.e. the web-hook) to a web app in Node.JS which implements the bot dialog framework to publish your bot. This web app is in essence the front door to your ‘bot’.

You can test the bot locally by downloading the Microsoft Bot Framework Emulator.

The demo architecture is outlined below:

Detailed Architecture for the Demo

There are three main components to the above architecture as used for the demo:

Publish the bot in the Bot Registry (Microsoft) for a channel – you will need your Custom Bot application endpoint to complete this step. In the demo I am publishing only to a web-channel, which is the easiest to work with in my opinion. Once registered you will get an application id and secret which you will need to add to the bot app to ‘authorise’ it.
Custom Bot Application (Node.JS) with the embedded bot dialog – the endpoint where the app is deployed needs to be public, a HTTPS endpoint is always better! I have used Heroku to deploy my app which gives me a public HTTPS endpoint to use in the above step.
Machine Learning Services – to provide functionality to make the Bot intelligent, we can have a statically scripted bot with just the embedded dialog but where is the fun in that? For the demo I am using Watson Sentiment Analysis API to detect the users sentiment during the chat.

*One item that I have purposely left out within the Custom Bot app, in the architecture, is the service that provides access to the data which drives the dialog (i.e. Customer Information based on the Account Number). In the demo a dummy service is used that returns hard coded values for Customer Name when queried using an Account Number.

The main custom bot app Javascript file is available below, right click and save-as to download.

Microsoft Bot Demo App

Enjoy!!

Data Analysis: Dengue Disease Prediction

Delhi suffers from an annual dengue epidemic between the months of July and October. It is only the cooler and drier weather at the start of November that stops the mosquitoes that spread this disease.

The year 2015 was a bad year for dengue and all kinds of records were broken. Thankfully due to increased awareness the death toll did not set any records. In fact it was not as high as it could have been (in my view even 1 death is high!).

So I wanted to see if there is a relation between Rainfall and Dengue cases.

I also wanted to see if there is any way of predicting the number of Dengue cases in 2016.

I used the historic data available from http://nvbdcp.gov.in/den-cd.html and MCD (Delhi).

Data

Year, Rainfall, Cases
2006, 618.70, 3340
2007, 601.60, 548
2008, 815.00, 1216
2009, 595.50, 1154
2010, 953.10, 6259
2011, 661.80, 1131
2012, 559.40, 2093
2013, 1531.40, 5574
2014, 778.60, 995
2015, 1123.10, 15836

Rainfall vs Dengue
More rain – more water logging leading to more opportunities for mosquitoes to multiply. Therefore there must be some relationship between Rainfall and the number of Dengue cases. Given the dramatic growth of Delhi over the last five years we restrict going as far back as 2010.

Using the available data for rainfall and dengue cases if we fit a straight line and 2nd degree polynomial we get Diagram 1 below.
Diagram 1: Rainfall vs Dengue Cases.

We see that for a linear model there is a clear trend of a higher number of cases with increasing rainfall. The R-Square value is 0.35 (approx), which is not a good fit, but that is expected given the fluctuations.
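As a rough check on the straight-line fit, here is a minimal sketch in plain JavaScript. It runs an ordinary least-squares fit over the ten (rainfall, cases) rows listed in the Data section and reports the R-Square value. The function name linearFit is my own, and I am assuming the straight line in Diagram 1 was an ordinary least-squares fit over these same rows.

```javascript
// Rows copied from the Data section: [year, rainfall, cases]
var data = [
  [2006, 618.70, 3340],
  [2007, 601.60, 548],
  [2008, 815.00, 1216],
  [2009, 595.50, 1154],
  [2010, 953.10, 6259],
  [2011, 661.80, 1131],
  [2012, 559.40, 2093],
  [2013, 1531.40, 5574],
  [2014, 778.60, 995],
  [2015, 1123.10, 15836]
];

// Ordinary least-squares fit of y = intercept + slope * x.
// Returns the slope, the intercept and the R-Square value.
function linearFit(xs, ys)
{
  var n = xs.length;
  var sumX = 0, sumY = 0;
  for (var i = 0; i < n; i++) {
    sumX += xs[i];
    sumY += ys[i];
  }
  var meanX = sumX / n;
  var meanY = sumY / n;

  var sxx = 0, sxy = 0, syy = 0;
  for (var j = 0; j < n; j++) {
    var dx = xs[j] - meanX;
    var dy = ys[j] - meanY;
    sxx += dx * dx;
    sxy += dx * dy;
    syy += dy * dy;
  }

  var slope = sxy / sxx;
  return {
    slope: slope,
    intercept: meanY - slope * meanX,
    rSquare: (sxy * sxy) / (sxx * syy)
  };
}

var rainfall = data.map(function(row) { return row[1]; });
var cases = data.map(function(row) { return row[2]; });

var fit = linearFit(rainfall, cases);
console.log("Slope:", fit.slope.toFixed(2), "R-Square:", fit.rSquare.toFixed(2));
```

For a straight line the R-Square value is the square of the correlation between rainfall and cases, so a value in this region says that rainfall alone accounts for roughly a third of the year-to-year variation.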

What is more interesting is the 2nd degree polynomial, which gives an R-Square value of 0.94 (approx). This is very good, but it could also point to over-fitting.

Another way of interpreting it is that there is a ‘sweet-spot’ for dengue spreading rapidly. If the rain is below a certain amount there is not enough water around for the dengue vector (mosquito) to breed. If there is too much rain then there is also a lack of ‘still’ water to allow mosquitoes to breed.

The ‘sweet spot’ seems to be rain at a certain level that tapers off, leaving enough ‘stagnant’ water for mosquitoes to breed.

Growth of Dengue over the Years

Diagram 2 shows the growth trend of Dengue over the years. In 1996 the dengue epidemic broke all records. In 2015 it broke all records once again. If we were to plot the number of cases over the years we see that the graph is steadily marching upwards.

If all other factors remain constant we should see about 6000 cases in 2016.

Diagram 2: Dengue growth rate over the years.

This is a very simple analysis of dengue. There are lots of other variables that can be added (for example – growing population, temperature profiles, regional variance). But I wanted to show how even simple analysis can produce some interesting results.

Another important point I wanted to highlight was the lack of easily accessible data on diseases and epidemics. If we had better data then public health initiatives could be better targeted to combat such occurrences.

Quality of Life Reduced Question Set: Bristol Open Data

This visualisation operates upon a reduced set of questions from the Quality of Life indicators. This data has been provided by the Bristol City Council under the open data initiative (https://opendata.bristol.gov.uk/).

Using this view the reduced question set can be examined across all the wards, as the average beta for a particular question across all wards in Bristol.

Click on a question to focus on it and to examine the beta value across all the wards. A count of wards with positive and negative beta values is also shown. These should correspond to the total green/red marks seen.
Then click on a ward to examine the response over time and see the trend line (associated with beta).

Java and Apache Spark were used to generate the CSV data files.

Link: Dashboard

Criteria for beta calculation: minimum three years data should be available.

Reduced Question Set:

[codesyntax lang=”email”]

% respondents who usually buy fairtrade foods
% people in employment
% respondents book tickets online
Liveability indicator
% respondents who have problem from fly posting
% respondents who are carers 50 plus hours per week
% respondents who have chosen locally grown food to tackle climate change
% respondents who were victims of crime and reported  to the police
% respondents who have used the local tip or household recyling centre
% respondents in receipt of a means tested benefit
% respondents who say there is a problem withdrug dealing in their neighbourhood(includes does not apply/don't know)
% respondents who have been discriminated against or harassed because of ethnicity/race
% respondents who sometimes buy or consume locally grown food
% respondents satisfied with the availability of council/housing association housing
% respondents who are overweight and obese
% who agree that a directly elected mayor is improving the leadership of the city
% respondents who think that the appearance of their area has got better in the last two years
% respondents with problem of cars blocking local pavement
% respondents who find it difficult to get by financially
% respondents who feel locally, antisocial behaviour is a problem
% respondents who think shops have got better in the last two years
% respondents who think schools have got worse in the last two years
% respondents who disagree that the council provides value for money
% respondents who don't have the internet at home
% respondents who are underweight
% respondents who use internet banking
% respondents who agree they can influence decisions that affect public services they use
% respondents unable to use the internet
% respondents who apply for Council or Government services online
% respondents satisfied with the local tip or household recycling centre
% respondents who think drug misuse and drug dealing has got worse in the last two years
% respondents whose combined energy bill per quarter is £300-£399
% respondents satisfied that open public land is kept clear of litter and refuse
% respondents with a problem of abandoned supermarket trolleys in their neighbourhood
% respondents who feel they belong to neighbourhood
% respondents who took 3 or more return long haul flights in the past year
% respondents who are satisfied with the state of repair of their home
% respondents who have a car or van available for use by them or members of the household
% respondents who have volunteered for charity or community every month
% respondents who sometimes buy fairtrade foods
% respondents who say street litter is a problem
% respondents satisfied with their job
% respondents with easy access to a doctor
% respondents satisfied that public land is kept clear of litter and refuse
% economically active respondents who are unemployed and available for work
% respondents who did not take any return long haul flights in the past year
% respondents who have been discriminated against or harassed because of sexual orientation
% respondents who use a public computer
% respondents with parking issues
% respondents satisfied with amount of parks and green spaces
% respondents who live in council or housing association accommodation
% respondents who agree that people take responsibility for their children
% respondents who have been a victims of crime in the last 12 months
% respondents who cook at home using fresh and raw ingredients
% respondents who don't use dental services
% respondents who say graffiti is a problem
% respondents who have volunteered for charity or community at least once a month
% respondents with no educational or technical qualifications
% respondents who have problem from fly tipping
% respondents keen to learn the internet
% respondents who use the internet at work
% respondents who think shops have got worse in the last two years
% respondents who feel safe when outside in their neighbourhood after dark
% respondents who live in rented or tied accommodation
% respondents satisfied with the maintenance of footpaths
% respondents who have been discriminated against or harassed because of religion
% respondents who use the internet at home
% respondents who search the internet
% respondents who say discarded needles and syringes are a problem in their neighbourhood
% respondents who have noise from industrial commercial or construction sites
% respondents who smoke
% respondents who think parks and public spaces has got worse in the last two years
% respondents who think the area they live in will be better in five years time
% who feel police and local public services are successfully dealing with issues of crime and anti-social behaviour in their area
% respondents who are obese
% respondents who have someone use the internet for them
% respondents who think their area is a good place to bring up children
% respondents who use NHS dental services
% of respondents who have access to the internet and use it
% respondents who use digital technology to create content
% respondents who took 2 return long haul flights in the past year
% respondents satisfied with cost and availability of housing
% respondents who have their own garden
% respondents who say traffic congestion is a problem in their neighbourhood
% respondents who have been discriminated against or harassed because of gender
% respondents who shop online
% respondents who usually buy or consume locally grown food
% respondents who agree sexual harassment is an issue in Bristol
% respondents who have been discriminated against or harassed because of age
% respondents who say drug dealing is a problem in their neighbourhood
% respondents who think drunk and rowdy people in public places is a problem
% respondents who think job opportunities has got worse in the last two years
% respondents who use the internet when out and about
% respondents satisfied with jobs in the neighbourhood
% respondents who agree ethnic differences are respected in their neighbourhood
% respondents happy using the internet
% respondents who say drug dealing is a serious problem in their neighbourhood.
% respondents who agree that people treat other people with respect in their neighbourhood
% respondents who eat home cooked 4 times a week
% respondents unemployed
% respondents satisfied with the bus service
% respondents who feel safe when outside their neighbourhood during the day
% respondents satisfied with  local tips / household recycling centres
% respondents who say vandalism is a problem in their neighbourhood
% respondents who say personal safety is a problem in their neighbourhood
% who live in owned private or tied accommodation
% respondents who think drug misuse and drug dealing has got got better in the last two years
% respondents who live in households with a smoker
% respondents who think antisocial behaviour has got got better in the last two years
% respondents who think antisocial behaviour has got worse in the last two years
% respondents who code
% respondents who are dissatisfied with the way the council runs things
% respondents who say they have problem with personal safety in their neighbourhood(includes does not apply/don't know)
% respondents who feel safe indoors after dark
% respondents not interested in using the internet
% respondents who own their own homes
% respondents who eat food grown by themselves or by people they know
% respondents satisfied with academic standards of local schools
% respondents satisfied with markets
% respondents who have access to the internet
% respondents who took 1 return short haul flight in the past year
% respondents who have been discriminated against or harassed because of disability
% respondents who are satisfied with the way the council runs things
% respondents who took 1 return long haul flight in the past year
% respondents who say insensitive development is a problem in their neighbourhood
% respondents with easy access to childcare (adult survey)
% respondents satisfied with leisure facilities/services for children under 12 (adult survey)
% respondents satisfied with libraries
% respondents with easy access to training or education
% respondents who did not take any return short haul flights in the past year
% respondents satisfied with neighbourhood
% respondents who feel dog fouling is a problem in local area
% respondents who took 3 or more return short haul flights in the past year
% respondents who have noise from neighbours
% respondents who feel crime has got worse over last 3 years
% respondents who say neglected or derelict buildings or land is a problem in their neighbourhood
% respondents who think their neighbourhood has got worse over the last 2 years
% respondents who think job opportunities has got got better in the last two years
% respondents who say their neighbourhood is getting better
% respondents with noise from fireworks
% respondents who think the police and council succesfully respond to anti-social behaviour
% respondents who can't afford the internet
% respondents who are willing to call themselves disabled
% respondents who live in households with someone who smokes regularly within the home
% respondents whose combined energy bill per quarter is £400 or more
% respondents satisfied with health services
% respondents who agree people from different backgrounds get on well together
% respondents who don't have a garden or allotment
% respondents buying energy efficient light bulbs
% respondents satisfied with general household waste collection
% respondents who think drug use is a problem in their area

[/codesyntax]

Javascript: Playing with Prototypes – II

Let us continue the discussion about Prototypes in Javascript and show the different ways in which inheritance can work. Inheritance is very important because, whether you are trying to extend the jQuery framework or trying to add custom event sources in Node.JS, you will need to extend an existing JS object.

Let us remember the most important mantra in JS – “nearly everything interesting is an object, even functions”

Objects are mutable, primitives (e.g. strings) are NOT!
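A quick sketch of what that means in practice (the variable names are just for illustration):

```javascript
// Objects are mutable: a change made through one reference is visible
// through every other reference to the same object.
var person = { age: 30 };
var alias = person;
alias.age = 31;
console.log(person.age);   // 31

// Strings are immutable: string methods return a new string and leave
// the original untouched.
var label = "student";
var shout = label.toUpperCase();
console.log(label, shout); // student STUDENT
```

This difference matters later on: attributes that hold objects or arrays can end up shared between instances if they live on a prototype.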

Let us first introduce the example. There is a base object: Person which has two properties ‘id’ and ‘age’ and getter/setter methods for these. We want to create a child object: Student, which should inherit the id and age properties from Person and add its own read-only ‘student id’ property.

[codesyntax lang=”javascript”]

/*
Base object: Person
*/
function Person(id)
{
  this.id = 0;
  this.age = 0;
}

/*
Add set/get methods for Age and Id
*/
Person.prototype.setId = function(id)
{

  this.id = id;
};

Person.prototype.getId = function()
{
  return this.id;
};

Person.prototype.setAge = function(age)
{

  this.age = age;
};

Person.prototype.getAge = function()
{
  return this.age;
};


/*
Child object Student which should extend properties and methods from Person
*/
function Student(sid)
{
  this.sid = sid;

  /*
  Constructor for Person (to be safe)
  */
  Person.call(this);
  /*
  Student Id getter
  */
  Student.prototype.getSid = function()
  {
    return this.sid;
  }
}

[/codesyntax]

There are different ways (patterns) of implementing ‘inheritance’ (the Inheritance Methods):

Pattern 1: Student.prototype = Object.create(Student);
Pattern 2: Student.prototype = Object.create(Person.prototype);
Pattern 3: Student.prototype = new Person;

Below is the snippet of code we use to probe what happens in each of the three cases. Two instances of Student are created (s1 and s2). Then we examine the prototypes and assign values to some of the properties.

[codesyntax lang=”javascript”]

<Inheritance Method: one of the three options above>

var s1 = new Student(101);
var s2 = new Student(102);

console.log("S1",s1);
console.log("S2",s2);
console.log("Proto S1",Object.getPrototypeOf(s1));
console.log("Proto S2",Object.getPrototypeOf(s2));
if (Object.getPrototypeOf(s1) == Object.getPrototypeOf(s2)) {
  console.log("Compare prototypes:",true);
}

console.log("\n\n");

s1.setId(1);
s1.setAge(30);
console.log("S1",s1.getAge());

s2.setId(2);

console.log("Compare Id S1:S2",s1.getId(),s2.getId());

s2.setAge(20);
console.log("S2 set age 20");

console.log("S1 age",s1.getAge());
console.log("S2 age",s2.getAge());

[/codesyntax]

Let us look at what happens in each case:

1) Student.prototype = Object.create(Student);

Output:

[codesyntax lang=”php”]

S1: { sid: 101, id: 0, age: 0 }
S2: { sid: 102, id: 0, age: 0 }
Proto S1: { getSid: [Function] }
Proto S2: { getSid: [Function] }
Compare prototypes: true


/Users/azaharmachwe/node_code/NodeTest/thisTest.js:73
s1.setId(1);
^
TypeError: Object object has no method 'setId'
at Object.<anonymous> (/Users/azaharmachwe/node_code/NodeTest/thisTest.js:73:4)
at Module._compile (module.js:456:26)
at Object.Module._extensions..js (module.js:474:10)
at Module.load (module.js:356:32)
at Function.Module._load (module.js:312:12)
at Function.Module.runMain (module.js:497:10)
at startup (node.js:119:16)
at node.js:901:3

[/codesyntax]

The surprising result is that an exception is thrown. It seems there is no method ‘setId’ on the Student instance. This means that inheritance did not work. We can confirm this by looking at the prototype of the S1 and S2 instances. Only the getter for student id defined in the Student object is present. We have not inherited any of the methods from Person.

But if we look at the list of attributes we see ‘id’ and ‘age’ present. So it seems the attributes were acquired somehow.

If we look at the way we define the Person object we actually add the ‘id’ and ‘age’ attributes to the instance (i.e. we use this.id) whereas the accessor methods are added on the prototype. When we create an instance of Student, the Person.call(this) line in the Student constructor runs the Person constructor against the new instance, so the attributes are set correctly at the instance level even though Student.prototype = Object.create(Student) gives us none of the Person methods.

If the Person.call(this) line is removed then you will only see the Student level attribute (‘sid’).

2) Student.prototype = Object.create(Person.prototype);

Output:

[codesyntax lang=”php”]

S1: { sid: 101, id: 0, age: 0 }
S2: { sid: 102, id: 0, age: 0 }
Proto S1: { getSid: [Function] }
Proto S2: { getSid: [Function] }
Compare prototypes: true



S1 30
Compare Id S1:S2 1 2
S2 set age 20
S1 age 30
S2 age 20

[/codesyntax]

No errors this time.

So we see both S1 and S2 instances have the correct attributes (Person + Student), the prototypes for both contain the getter defined in Student, and both have the same prototype. Something more interesting is the fact that we can set ‘age’ and ‘id’ on them as well, showing us that the attribute setters/getters have been inherited from Person.

But why can’t we see the get/set methods for ‘age’ and ‘id’ on the Student prototype? The reason is that with the call to Object.create with the Person.prototype parameter we chain the prototype of Person with that of Student. To see the get/set methods for ‘age’ and ‘id’ that the Student instance is using add the following line to the probe commands:

console.log(">>",Student.prototype.__proto__);

This proves that the object is inheriting these methods at the prototype level and not at the object level. This is the recommended pattern for inheritance.
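To make the chaining concrete, here is a small self-contained sketch of pattern (2). It uses cut-down versions of Person and Student (only the id accessors, with getSid defined once outside the constructor, which is a small departure from the listing above) and checks the chain with Object.getPrototypeOf, the standard equivalent of __proto__.

```javascript
function Person()
{
  this.id = 0;
}
Person.prototype.setId = function(id) { this.id = id; };
Person.prototype.getId = function() { return this.id; };

function Student(sid)
{
  Person.call(this);   // set the Person instance attributes on this Student
  this.sid = sid;
}

// Pattern (2): chain Student.prototype to Person.prototype
Student.prototype = Object.create(Person.prototype);
Student.prototype.constructor = Student;   // restore the constructor reference
Student.prototype.getSid = function() { return this.sid; };

var s1 = new Student(101);
s1.setId(1);

// setId is neither on the instance nor on Student.prototype,
// it is found one link further up the chain on Person.prototype
console.log(s1.hasOwnProperty("setId"));                                    // false
console.log(Student.prototype.hasOwnProperty("setId"));                     // false
console.log(Object.getPrototypeOf(Student.prototype) === Person.prototype); // true
console.log(s1 instanceof Person, s1.getId(), s1.getSid());                 // true 1 101
```

Because the methods are looked up through the chain, anything added to Person.prototype later on is immediately available to every Student instance as well.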

3) Student.prototype = new Person;

This is a method you may see in some examples out there. But this is not the recommended style. The reason is that in this case you are linking the prototype of Student with an instance of Person. Therefore you get all the instance variables of the super-type included in the sub-type.

Output:

[codesyntax lang=”php”]

S1: { sid: 101 }
S2: { sid: 102 }
Proto S1: { id: 0, age: 0, getSid: [Function] }
Proto S2: { id: 0, age: 0, getSid: [Function] }
Compare prototypes: true



S1 30
Compare Id S1:S2 1 2
S2 set age 20
S1 age 30
S2 age 20

[/codesyntax]

Note the presence of ‘id’ and ‘age’ attributes with default values in the prototypes of S1 and S2. If the attributes are array or object type (instead of a primitive type as in this case), we can get all kinds of weird, difficult to debug behaviours. This is the case with frameworks where a base object needs to be extended to add custom functionality. I came across this issue while trying to create a custom Node.JS event source.
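The sketch below shows the kind of surprise this can cause. It uses a hypothetical Base object with an array attribute (not part of the Person example) and, to expose the problem, the pattern (3) child constructor deliberately does not call the parent constructor.

```javascript
// Hypothetical parent with an array attribute
function Base()
{
  this.tags = [];
}
Base.prototype.addTag = function(tag) { this.tags.push(tag); };

// Pattern (3): the prototype is an *instance* of Base, so it owns a 'tags' array
function Child3() {}
Child3.prototype = new Base();

var a = new Child3();
var b = new Child3();
a.addTag("only-for-a");
console.log(b.tags);            // [ 'only-for-a' ] - b sees the tag added to a
console.log(a.tags === b.tags); // true - one array, shared via the prototype

// Pattern (2) plus a call to the parent constructor gives each instance its own array
function Child2()
{
  Base.call(this);
}
Child2.prototype = Object.create(Base.prototype);

var c = new Child2();
var d = new Child2();
c.addTag("only-for-c");
console.log(d.tags);            // [] - d is unaffected
```

With a primitive attribute the problem stays hidden, because assigning through a setter creates a fresh attribute on the instance. With an array or object the shared copy on the prototype is modified in place.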

Wrong way to extend: A Node.JS example

I have seen many Node.JS custom event emitter examples that use pattern number (3). The correct pattern to use is pattern (2). Let us see why.

The code below extends the Node.JS EventEmitter (in the ‘events’ module) to create a custom event emitter. Then two instances of this custom event emitter are created. Different event handling callback functions for the two instances are also defined. This will allow us to clearly identify which instance handled the event.

In the end we cause the custom event to fire on both the instances.

[codesyntax lang=”javascript”]

var ev = require("events");

/*
Create a custom event emitter by extending the Node.JS event emitter
*/
function myeventemitter(id)
{
  this.id = id;
  ev.EventEmitter.call(this);
}
/*
Try different ways of extending
*/

myeventemitter.prototype = new ev.EventEmitter;

myeventemitter.prototype.fire = function()
{
  console.log('\nFire',this.id);
  this.emit('go',this.id);
}

/*
Initialise two instances of the custom event emitter
*/
var myee1 = new myeventemitter("A");
var myee2 = new myeventemitter("B");

/*
Define callbacks on the custom event ('go')
*/
myee1.on('go',function(id)
{
  console.log("My EE1: Go event received from",id);
});

myee2.on('go',function(id)
{
  console.log("My EE2: Go event received from",id);
});

/*
Cause the custom event to fire on both the custom event emitters
*/
myee1.fire();
myee2.fire();

/*
Dump the prototype of our custom event emitter
*/
console.log(myeventemitter.prototype);

[/codesyntax]

Note we are using pattern (3) to extend the EventEmitter:

myeventemitter.prototype = new ev.EventEmitter;

We expect that custom events fired on instance 1 will result in the event handling function on instance 1 being called. The same thing should happen for instance 2. Let us look at the actual output:

[codesyntax lang=”javascript”]

Fire A
My EE1: Go event received from A
My EE2: Go event received from A

Fire B
My EE1: Go event received from B
My EE2: Go event received from B
{ domain: null,
_events: { go: [ [Function], [Function] ] },
_maxListeners: 10,
fire: [Function] }

[/codesyntax]

This looks wrong! When we cause instance 1 to fire its custom event it actually triggers the event handling functions in both the instances! Same happens when we try with instance 2.

The reason, as you may have already guessed, is that when we use pattern (3) we actually attach the object that holds the individual event handling functions to the prototype (variable name: _events). This can be seen in the above output.

Therefore both instances of the custom event emitter will have the same set of event handling functions registered because there is only one such set.

To correct this just switch the extension pattern to (2), i.e. myeventemitter.prototype = Object.create(ev.EventEmitter.prototype); which gives the following output:

[codesyntax lang=”javascript”]

Fire A
My EE1: Go event received from A

Fire B
My EE2: Go event received from B

{ fire: [Function] }

[/codesyntax]

The output now looks correct. Only the instance specific callback function is called and the prototype does not store the event handling functions. Therefore each instance of the custom event emitter has its own set for storing event handling functions.
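For completeness, a sketch of the emitter extended with pattern (2) is given below. The received array is my addition, used only to record which handler ran; everything else follows the earlier listing.

```javascript
var ev = require("events");

function myeventemitter(id)
{
  this.id = id;
  ev.EventEmitter.call(this);   // initialise the per-instance emitter state
}

// Pattern (2): chain to EventEmitter.prototype instead of an EventEmitter instance
myeventemitter.prototype = Object.create(ev.EventEmitter.prototype);

myeventemitter.prototype.fire = function()
{
  this.emit('go', this.id);
};

var myee1 = new myeventemitter("A");
var myee2 = new myeventemitter("B");

// Records "<handler>:<source id>" for every callback that gets invoked
var received = [];
myee1.on('go', function(id) { received.push("EE1:" + id); });
myee2.on('go', function(id) { received.push("EE2:" + id); });

myee1.fire();
myee2.fire();

console.log(received);   // [ 'EE1:A', 'EE2:B' ]
```

Node.JS also ships util.inherits(myeventemitter, ev.EventEmitter), which sets up the same kind of prototype chain.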

Bristol Government: Open Data Initiative

Bristol City Council (BCC) is now publishing some of their data sets online as part of the Open Data initiative.
This is a VERY positive move and I too hope that this leads to the development of ‘new’ solutions to the city’s problems.
More information can be found here: https://opendata.bristol.gov.uk

The Tableau Viz below uses the Quality of Life Indicators data from across Bristol. This is available from the BCC website. The data set has a set of questions (about 540) asked across the different wards in Bristol (about 35) on a yearly basis starting from 2005 till 2013. Obviously data is not available across all the dimensions, for example the question:
“% respondents who travel for shopping by bus” for the Redland ward is available only from 2006-2010.

The raw data from the Open Data website was processed using Apache Spark’s Java Libraries. This was then dumped into a data file which was imported into Tableau.

Link: Dashboard

The heat map below plots the regression slope of the survey results over the years (beta) against the Questions and Wards.
Criteria for beta calculation: minimum three years data should be available.
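As an illustration of the beta calculation, here is a minimal sketch in Javascript. It assumes beta is the ordinary least-squares slope of the yearly values against the year. The function name and the sample figures are made up for the example, and the actual processing was done with the Apache Spark Java libraries.

```javascript
// Returns the regression slope (beta) of value against year,
// or null when fewer than three years of data are available.
function beta(points)
{
  if (points.length < 3) {
    return null;
  }

  var n = points.length;
  var sumYear = 0, sumValue = 0;
  points.forEach(function(p) {
    sumYear += p.year;
    sumValue += p.value;
  });
  var meanYear = sumYear / n;
  var meanValue = sumValue / n;

  var sxx = 0, sxy = 0;
  points.forEach(function(p) {
    sxx += (p.year - meanYear) * (p.year - meanYear);
    sxy += (p.year - meanYear) * (p.value - meanValue);
  });
  return sxy / sxx;
}

// Made-up responses for one question in one ward
var rising = [
  { year: 2006, value: 20 },
  { year: 2007, value: 22 },
  { year: 2008, value: 24 }
];
console.log(beta(rising));              // 2 (percentage points per year)
console.log(beta(rising.slice(0, 2)));  // null (only two years of data)
```

A positive beta means the response to the question is trending upwards over the years for that ward, a negative beta means it is trending downwards.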

Horizontal Web-app Scaling with Nginx and Node.JS

One highly touted advantage of using Node.JS is that it makes applications easy to scale. This is true to an extent especially when it comes to web-apps.

A stateless request-response mechanism lends itself to parallelisation. This is as easy as spinning up another instance of the request handling process on the same or different machine.

Where stateful request-response is required (say to maintain session information) then, to scale up, the ‘state’ must be shared safely across the different instances of the request handling process. This separates out the ‘functional’ aspects of the request handling mechanism from the side-effect related code.

To tie in all the different web-app instances under a single public address and to load-balance across them we need a ‘reverse-proxy’. We will use Nginx for this.

Software needed:

Nginx (v 1.7.10)
Node.JS (v 0.10.12)

First let us setup the Nginx configuration:

[codesyntax lang=”javascript”]

events {
	worker_connections 768;
}

http {

	upstream localhost {
		server 127.0.0.1:18081;
		server 127.0.0.1:18082;
		server 127.0.0.1:18083;
	}
	server {
		listen 80;
		
		location / {
			proxy_pass http://localhost;
		}
	}
}

[/codesyntax]

More info about setting up and running Nginx – http://wiki.nginx.org/CommandLine

This configuration sets up the public address as localhost:80 with three private serving instances on the same machine on ports 18081, 18082 and 18083.

Let us also create a serving process in Node.JS using the Express framework:

[codesyntax lang=”javascript”]

var express = require("express");

var app = express();

var name = process.argv[2];
var PORT = process.argv[3] || 18080;

console.log("Server online: ",name,":",PORT);

app.get("/", function(request,response)
        {
           console.log("You have been served by: ",name,"on",PORT);

           response.write("Served by :"+name+" on "+PORT);
           response.end();
        });

app.listen(PORT);

[/codesyntax]

This takes in server name and port as the arguments.

We will spin up three instances of this serving process on the same machine with the port numbers as in the Nginx config.

If we name the above as server.js then the instances can be spun up as:

node server.js <server_name> <port>

*Make sure you use the correct port (as provided in the Nginx config file).

Then just point your browser to localhost:80 and you should see a response of the form ‘Served by :<server_name> on <port>’.

Press refresh multiple times and you should see your request being served by different instances of the web-app. Nginx by default uses ‘round-robin’ load-balancing, therefore you should see each of the instances being named one after the other (almost!).
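The round-robin rotation can be pictured with the toy sketch below. This is not how Nginx is implemented, just an illustration of taking turns over the three upstream servers from the config above; the function name makeRoundRobin is hypothetical.

```javascript
// Upstream servers as listed in the Nginx config
var upstreams = ["127.0.0.1:18081", "127.0.0.1:18082", "127.0.0.1:18083"];

// Returns a function that hands out the servers in turn, wrapping around at the end
function makeRoundRobin(servers)
{
  var next = 0;
  return function()
  {
    var chosen = servers[next];
    next = (next + 1) % servers.length;
    return chosen;
  };
}

var pick = makeRoundRobin(upstreams);
for (var i = 0; i < 4; i++) {
  console.log("Request", i + 1, "->", pick());
}
// The fourth request goes back to 127.0.0.1:18081
```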

Scaling out is as simple as spinning up a new instance and adding its IP and port to the Nginx configuration and reloading it.