Java: Getting a webpage using java.net

There is an easy way to get a web page as an HTML string. In fact there are two basic ways to do this using the java.net package. Why would we need this?

Some use-cases include:

  • Getting information from a web-page for text aggregation (say from a news site)
  • Creating a ‘page filter’ which pre-processes web-pages to strip out unsafe content

For both examples we will use the java.net.URL and java.net.URLConnection classes from the java.net package.

 

Type 1: Encapsulated Side-effects

The first example opens up the URL connection and obtains an InputStream from it. A BufferedReader wraps the InputStream from the URL which is then read into a StringBuilder. Then we return the String containing the HTML page source (if all goes well).

The code listing:

 

[Image: code listing java.net.1]

A few points to remember:

– When building up the page in a loop, use a StringBuilder instead of the ‘+’ concatenation operator. Strings are immutable, which means every ‘+’ creates a new String object, whereas a StringBuilder appends to an internal buffer.

– Always have proper exception handling and a finally block (or, on Java 7 and later, try-with-resources) which closes the InputStream.

This code encapsulates the ‘side-effect’ of reading from a remote URL and returns the data as a String.
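
A minimal sketch of the approach described above, for readers who want something to paste and try. The class and method names (PageFetcher, fetchPage) are made up for this illustration, UTF-8 is assumed as the page encoding, and returning null on failure is just one possible way to keep the error inside the method.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.InputStream;
import java.io.InputStreamReader;
import java.net.URL;
import java.net.URLConnection;

final class PageFetcher {

    // Returns the page source as a String, or null if anything goes wrong.
    static String fetchPage(String address) {
        InputStream in = null;
        try {
            URLConnection connection = new URL(address).openConnection();
            in = connection.getInputStream();
            // Assumption: the page is UTF-8; a real page may declare another charset.
            BufferedReader reader = new BufferedReader(new InputStreamReader(in, "UTF-8"));
            StringBuilder page = new StringBuilder();
            String line;
            while ((line = reader.readLine()) != null) {
                page.append(line).append('\n');
            }
            return page.toString();
        } catch (IOException e) {
            // The side-effect (bad URL, network failure) stays inside this method.
            return null;
        } finally {
            if (in != null) {
                try {
                    in.close();
                } catch (IOException ignored) {
                    // Nothing useful can be done if close fails.
                }
            }
        }
    }
}
```

The caller only ever sees a String (or null), for example: String html = PageFetcher.fetchPage("http://example.com/");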

Type 2: Exposed Side-effects

The second example is very simple. It is a sub-set of the code shown in the previous section.

In this second example we simply open the URL connection and obtain an InputStream from it which we return to the calling program. The responsibility of using the InputStream to get the page source is left to the calling function. This is especially useful if you want to work directly with an InputStream instead of a String representation. One such example is when using a parser to parse the page source.

The big disadvantage of using this method is that it exposes the side-effect related code to the main application. For example if the Internet connection goes down or the server goes down while the InputStream is being read the calling application will encounter an error and therefore behave unpredictably.

[Image: code listing java.net.2]
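
A minimal sketch of this second style, under the same assumptions as before (the names PageStreams and openPage are invented for the illustration). Note how the IOException is now part of the method signature, so the caller has to deal with it.

```java
import java.io.IOException;
import java.io.InputStream;
import java.net.URL;

final class PageStreams {

    // Opens the connection and hands the raw stream to the caller.
    // The caller must read it, handle any IOException and close it.
    static InputStream openPage(String address) throws IOException {
        return new URL(address).openConnection().getInputStream();
    }
}
```

A caller would typically pass the returned stream straight to a parser and close it in a finally block.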

The way to get the best of both worlds (encapsulated side-effects and providing an InputStream to a calling function) is to use the first example (Type 1) and return a String object, which can then be converted into a ‘byte stream’.
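
One way to do that conversion is sketched below. The class and method names are hypothetical, and UTF-8 is an assumed encoding (StandardCharsets needs Java 7 or later).

```java
import java.io.ByteArrayInputStream;
import java.io.InputStream;
import java.nio.charset.StandardCharsets;

final class StringStreams {

    // Wraps an already-downloaded page in an in-memory stream.
    // Reading from it can no longer fail because of the network.
    static InputStream toStream(String pageSource) {
        return new ByteArrayInputStream(pageSource.getBytes(StandardCharsets.UTF_8));
    }
}
```

The trade-off is memory: the whole page is held as a String before the stream is created.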

Living with ActiveMQ JMS – Nice Features and Weird Errors

Apache ActiveMQ is the bad-boy of all the JMS servers out there. Firstly it is ‘free’. Secondly it is very good. For those who think free does not go with good (like I did) – well, let’s just say life got a bit sweeter for them.

Auto-failover

ActiveMQ provides built-in failover handling. Failover handling is very important for any kind of JMS application. Failover handling means deciding what to do when something bad happens which is outside your control.

For example your application has subscribed to a Topic but the JMS server drops the connection or the network link goes down. Your application is left with a JMS error and a dead connection to deal with.

There are several things you might want to do in that case, such as:

– Try reconnecting after some time and keep on trying till you can reconnect

– Switch over to another JMS server (if present)

We might also need to configure things like how soon to reconnect, which URL to treat as primary (in case of switching).

As ActiveMQ provides this functionality ‘out-of-the-box’ it makes life easier. The way to implement this is very simple as well. We just add the failover settings to the Naming URI in the JMS Connection settings.

A normal JMS Naming URI looks like: tcp://hostname:port (e.g. tcp://localhost:6666)

If we wanted to use the ActiveMQ specific failover we change the JMS Naming URI as:

For a single server auto-reconnect –

failover://(tcp://hostname:port) or failover:(tcp://hostname:port)
*Try both the versions (with and without the //) and see what works in your specific case.

In this case if the connection goes down for some external reason, ActiveMQ will try and reconnect.

For a backup-server auto-switching –

failover:(tcp://primary:61616,tcp://secondary:61616) or 
failover://(tcp://primary:61616,tcp://secondary:61616)
*Again try both the versions (with and without the //) and see what works in your specific case.

In this case if the primary connection goes down then ActiveMQ will try and switch to the secondary. If both servers are active (e.g. for load-balancing) we may want to choose randomly between the two, whereas in a primary-secondary setup we may want to start with the primary and keep the secondary for failover. We can control this with the ‘randomize’ option:

failover:(tcp://primary:61616,tcp://secondary:61616)?randomize=false

Here randomize=false means the primary URI will be tried first.
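
If the broker list comes from configuration, the failover URI can be assembled in code. The helper below is only a sketch: FailoverUris and failoverUri are invented names, and it covers just the ‘randomize’ option discussed above.

```java
final class FailoverUris {

    // Builds a URI of the form failover:(uri1,uri2,...)?randomize=<true|false>
    static String failoverUri(boolean randomize, String... brokerUris) {
        StringBuilder uri = new StringBuilder("failover:(");
        for (int i = 0; i < brokerUris.length; i++) {
            if (i > 0) {
                uri.append(',');
            }
            uri.append(brokerUris[i]);
        }
        return uri.append(")?randomize=").append(randomize).toString();
    }
}
```

With the ActiveMQ client library on the classpath, the resulting string would be passed as the broker URL, e.g. to the ActiveMQConnectionFactory(String brokerURL) constructor: failoverUri(false, "tcp://primary:61616", "tcp://secondary:61616").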

Check the URL in the reference section for more configurations.

[Further Reference: http://activemq.apache.org/failover-transport-reference.html]

Weird Errors

javax.jms.JMSException: Failed to build body from bytes. Reason: java.io.StreamCorruptedException: invalid type code: 09

If you see this error in your application when trying to call the getObject method on an ‘Object Message’ it means something has gone wrong when you tried to de-serialize the object at your end.

This error is thrown when the Java IO library does not recognize the type code it has found in the object data.

This is most often caused by mismatched ActiveMQ libraries between the sender (which serializes the object) and receiver (which de-serializes it).

If you have no control over the sending application then it’s time to hold your head in your hands and cry. You just learned a valuable lesson – if you are using JMS to decouple your applications then ALSO use a neutral message format (like XML) instead of serialized objects.

Efficient Data Load using Java with Oracle

Getting your data from its source (where it is generated) to its destination (where it is used) can be very challenging, especially when you have performance and data-size constraints (how is that for a general statement?).

The standard Extract-Transform-Load sequence explains what is involved in any such Source -> Destination data-transfer at a high level.

We have a data-source (a file, a database, a black-box web-service) from which we need to ‘extract’ data, then we need to ‘transform’ it from source format to destination format (filtering, mapping etc.) and finally ‘load’ it into the destination (a file, a database, a black-box web-service).

In many situations, using a commercial third-party data-load tool or a data-loading component integrated with the destination (e.g. SQL*Loader) is not a viable option. This scenario can be further complicated if the data-load task itself is a big one (say upwards of 500 million records within 24 hrs.).

One example of the above situation is when loading data into a software product using a ‘data loader’ specific to it. Such ‘customized’ data-loaders allow the decoupling of the product’s internal data schema (i.e. the ‘Transform’ and ‘Load’ steps) from the source format (i.e. the ‘Extract’ step).

The source format can then remain fixed (a good thing for the customers/end users) and the internal data schema can be changed down the line (a good thing for product developers/designers), simply by modifying the custom data-loader sitting between the product and the data source.

In this post I will describe some of the issues one can face while designing such a data-loader in Java (1.6 and upwards) for an Oracle (11g R2 and upwards) destination. This is not a comprehensive post on efficient Java or Oracle optimization. This post is based on real-world experience designing and developing such components. I am also going to assume that you have a decent ‘server’ spec’d to run a large Oracle database.

Preparing the Destination 

We prepare the Oracle destination by making sure our database is fully optimized to handle large data-sizes. Below are some of the things that you can do at database creation time:

– Make sure you use BIGFILE table-spaces for large databases. BIGFILE table-spaces provide efficient storage for large databases.

– Make sure you have large enough data-files for TEMP and SYSTEM table-space.

– Make sure the constraints, indexes and primary keys are defined properly as these can have a major impact on performance.

For further information on Oracle database optimization at creation time you can use Google (yes! Google is our friend!).

 

Working with Java and Using JDBC

This is the first step to welcoming the data into your system. We need to extract the data from the source using Java, transform it and then use JDBC to inject it into Oracle (using the product specific schema).

There are two separate interfaces for the Java component here:

1) Between Data Source and the Java Code

2) Between the Java Code and the Data Destination (Oracle)

Between Data Source and Java Code

Let us use a CSV (comma-separated values) format data-file as the data-source. This will add a bit of variety to the example.

Using the ‘BufferedReader’ (java.io) one can easily read gigabyte-size files line by line. This works best if each line in the CSV contains one data row, so that we can read, process and discard one line at a time. Never needing to hold more than a line in memory allows your application to keep a small memory footprint.
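
The read-process-discard loop can be sketched as follows. The names CsvLineReader and RowHandler are made up for this example, and the comma split is deliberately naive: it assumes no quoted fields that contain commas.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.Reader;

final class CsvLineReader {

    // Callback that receives the fields of one data row.
    interface RowHandler {
        void handle(String[] fields);
    }

    // Reads one line at a time, hands it to the handler and forgets it.
    // Returns the number of data rows processed.
    static long readRows(Reader source, RowHandler handler) throws IOException {
        BufferedReader reader = new BufferedReader(source);
        long rows = 0;
        try {
            String line;
            while ((line = reader.readLine()) != null) {
                if (line.isEmpty()) {
                    continue; // ignore blank lines
                }
                // The -1 limit keeps trailing empty fields.
                handler.handle(line.split(",", -1));
                rows++;
            }
        } finally {
            reader.close();
        }
        return rows;
    }
}
```

For a real file the source would be something like new FileReader("data.csv"); only the current line is ever held in memory.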

Between the Java Code and the Destination

The second interface is where things get really interesting: making Java work efficiently with Oracle via JDBC. Here the most important feature for inserting data into the database, the one you cannot do without, is the batched prepared statement. Using Prepared Statements (PS) without batching is like taking two steps forward and ten steps back. In fact using PS without batching can be worse than using normal statements. Therefore always use PSs, batch them together and execute them as a batch (using the executeBatch method).

A point about the Oracle JDBC drivers: make sure the batch size is reasonable (i.e. less than 10K). This is because when using certain versions of the Oracle JDBC driver, if you create a very large batch, the batched insert can fail silently while you are left feeling pleased that you just loaded a large chunk of data in a flash. You will discover the problem only if you check the row count in the database after the load.
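
A minimal sketch of a batched insert using only the standard java.sql API. The table and column names (customer, id, name) and the class name BatchLoader are hypothetical, and error handling is reduced to closing the statement; a real loader would also roll back on failure.

```java
import java.sql.Connection;
import java.sql.PreparedStatement;
import java.sql.SQLException;
import java.util.List;

final class BatchLoader {

    // Hypothetical destination table.
    static final String INSERT_SQL = "INSERT INTO customer (id, name) VALUES (?, ?)";

    // Sends the rows in batches of batchSize and returns how many rows were sent.
    static int load(Connection conn, List<String[]> rows, int batchSize) throws SQLException {
        conn.setAutoCommit(false); // commit once at the end, not per statement
        PreparedStatement ps = conn.prepareStatement(INSERT_SQL);
        int sent = 0;
        try {
            for (String[] row : rows) {
                ps.setString(1, row[0]);
                ps.setString(2, row[1]);
                ps.addBatch();
                if (++sent % batchSize == 0) {
                    ps.executeBatch(); // send a full batch
                }
            }
            if (sent % batchSize != 0) {
                ps.executeBatch(); // flush the final partial batch
            }
            conn.commit();
        } finally {
            ps.close();
        }
        return sent;
    }
}
```

The returned count is what the loader believes it sent; comparing it with a SELECT COUNT(*) on the destination table after the load is a cheap guard against the silent failure described above.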

If the data-load involves sequential updates (i.e. a mix of inserts, updates and deletes), batching can still be used without destroying data integrity. Create separate batches for the insert, update and delete prepared statements and execute them in the following order:

  1. Insert batches
  2. Update batches
  3. Delete batches

One drawback of using batches is that if a statement in the batch fails, it fails the whole batch, which makes it hard to detect exactly which statement failed (another reason to use small batch sizes of ~100).

Constraints and Primary Keys

The Constraints and Primary Keys (CoPs) on the database act as the gatekeepers at the destination. The data-load program is like a truck driving into the destination with a lot of cargo (data).

CoPs can either be disabled while the data-load is carried out or they can remain on. If we disable them during the data-load, then when re-enabling them we can have Oracle check the existing data against them, or ignore the existing data and enforce them only for new operations.

Whether CoPs are enabled or disabled, and whether post-load validation of existing data is carried out, can have a major effect on the total data-load time. We have three main options when it comes to CoPs and data loading:

  1. Obviously the quickest option, in terms of our data-load, is to drive the truck through the gates (CoPs disabled) and dump the cargo (data) at the destination, without stopping for a check at the gate or after unloading (CoPs enabled for future changes but existing data not validated). This is only possible if the contract with the data-source provider puts the full responsibility for data accuracy with the source.
  2. The slowest option will be if the truck is stopped at the gates (CoPs enabled), unloaded and each cargo item examined by the gatekeepers (all the inserts checked for CoPs violations) before being allowed inside the destination.
  3. A compromise between the two (i.e. the middle path) would be to allow the truck to drive into the destination (CoPs disabled), unload it and check the cargo at the time of transferring it to the destination (CoPs enabled after load and existing data validated).

The option chosen depends on the specific problem and the various data-integrity requirements. It might be easier to do the data file validation ‘in memory’ before an expensive data-load process is carried out and then we can use the first option.
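
To make the options concrete, the helper below builds the Oracle statements involved. It is only a sketch: the class name ConstraintSql is invented, and the table and constraint names are whatever your schema uses.

```java
final class ConstraintSql {

    // Statement to switch a constraint off before the load (options 1 and 3).
    static String disable(String table, String constraint) {
        return "ALTER TABLE " + table + " DISABLE CONSTRAINT " + constraint;
    }

    // Statement to switch a constraint back on after the load.
    // validateExisting = false : existing rows are not checked (option 1)
    // validateExisting = true  : existing rows are checked (option 3)
    static String enable(String table, String constraint, boolean validateExisting) {
        return "ALTER TABLE " + table + " ENABLE "
                + (validateExisting ? "VALIDATE" : "NOVALIDATE")
                + " CONSTRAINT " + constraint;
    }
}
```

The generated strings can be run through java.sql.Statement.execute(String). For option 2 nothing needs to be executed, since the constraints simply stay enabled during the load.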

Indexes

We need indexes and primary keys for performing updates and deletes (try a large table update or delete with and without indexes – then thank God for indexes!).

If your data load consists of only inserts and you are loading data into an empty or nearly empty table (w.r.t. amount of data being loaded), it might be a good idea to drop any indexes on it before starting the load.

This is because as data is inserted into a table, any indexes on it are updated as well, which takes additional time. If the table already contains a lot of data compared to the size of the new data being loaded, then the time saved by dropping indexes will be wasted when rebuilding them.

After the data is loaded we need to rebuild any dropped indexes and re-enable CoPs. Be warned that re-building indexes and re-enabling CoPs can be very time consuming and can take a lot of SYSTEM and TEMP space.

Logging

Oracle, being a ‘safe’ database, maintains a ‘redo’ log so that in case of a database failure we can perform recovery and return the database to its original state. This logging can be disabled by using the nologging option, which can lead to a significant performance boost in case of inserts and index creations.

A major drawback of using nologging is that you lose the ability to ‘repeat’ any operations performed while this option is set. When using this option it is very important to take a database backup before and after the load process.

Nologging is something that should be used judiciously and with a lot of planning to handle any side-effects.

Miscellaneous

There are several other exotic techniques for improving large data loads on the Oracle side, such as partitioned tables. But these require more than ‘basic’ changes to the destination database.

Data-loading optimization for ‘big’ data is like a journey without end. I will keep updating this post as I discover new things. Please feel free to share your comments and suggestions with me!

 

 

iProcess Connecting to Action Processor through Proxy and JMS – Part 2

In the first part I explained the problem scenario and outlined the solution; in this post I present the implementation.

We need to create the following components:

1) Local Proxy – which will be the target for the Workspace Browser instead of the Action Processor which is sitting inside the ‘fence’ and therefore not accessible over HTTP.

2) Proxy for JMS – Proxy which puts the http request in a JMS message and gets the response back to the Local Proxy which returns it to the Workspace Browser.

3) JMS Queues – Queues to act like channels through the ‘fence’.

4) Service for JMS – Service to handle the requests sent over JMS inside the ‘fence’ and to send the response back over JMS.

 

I will group the above into three ‘work-packages’:

1) JMS Queues – TIBCO EMS based queues.

2) Proxy and Service for JMS – implemented using BusinessWorks.

3) Local Proxy – implemented using JSP.

 

Creating the TIBCO EMS Queues

Using the EMS administrator create two queues:

1) iPRequestQueue – to put requests from Workspace Browser.

2) iPResponseQueue – to return response from Action Processor.

Command: create queue <queue name>

 

Proxy and Service for JMS

For both the Proxy and Service to work we need to store the session information and refer to it when making HTTP requests to the Action Processor; otherwise we will get a ‘There is no node context associated with this session, a Login is required.’ error.

We use a Shared-Variable resource to store the session information, and to carry it through the ‘fence’ we use the JMS Header field: JMSCorrelationID.

Proxy for JMS:

The logic for the proxy is simple.

1) HTTP Receiver process starter listens to requests from the Workspace Browser.

2) Upon receiving a request it sends the request content to a JMS Queue sender which inserts the request in a JMS message and puts it on the iPRequestQueue.

3) Then we wait for the JMS Message on iPResponseQueue which contains the response.

4) The response data is picked up from the JMS Message and sent as response to the Workspace Browser.

5) If the returned response is a complaint about ‘a Login is required’ then remove any currently held session information in the shared variable (so that we can get a fresh session next time).

In the HTTP Receiver we will need to add two parameters ‘action’ and ‘cachecircumvention’ with ‘action’ as a required parameter. The ‘action’ parameter value will then be sent in the body of the JMS Message through the ‘fence’.

In the HTTP Response we set the following: the response JMS Message’s body goes in as both the ascii and the binary content (convert the text to base64 for the latter); the session information in JMSCorrelationID goes into the Set-Cookie HTTP header; the Content-Type header is “application/xml;charset=utf-8”; Date can be set to the current date; and Content-Length is the length of the ascii content (using the string-length function).

 

Service for JMS:

The logic for the Service sitting inside the fence waiting for requests from the Proxy, over JMS, is as follows:

1) JMS Queue Receiver process starter is waiting for requests on iPRequestQueue.

2) On receiving a message it sends the request from the JMS Message body to the Action Processor using Send HTTP Request activity.

3) A Get Variable activity gets us the session information to use in the request to the Action Processor.

4) The response is then sent to a JMS Queue Sender activity which sends the response out as a JMS Message on iPResponseQueue.

5) If the session information shared variable is blank then we set the session information received in the response.

The Send HTTP Request will also have two parameters: ‘action’ and ‘cachecircumvention’ (optional). We will populate the ‘action’ parameter with the contents from the received JMS Message’s body. The session information will be fetched from the shared variable and put in the Cookie header field of the request. We will also put the contents of the JMS Message’s body in PostData field of RequestActivityInput. Make sure also to populate the Host, Port and Request URI to point to the ActionProcessor.

For example, if your Action Processor is located at http://CoreWebServer1:8080/TIBCOActProc/ActionProcessor.servlet [using the servlet version] then Host = CoreWebServer1, Port = 8080 and RequestURI = /TIBCOActProc/ActionProcessor.servlet. If you expect these values to change, make them into global variables.
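
If you are unsure how an address splits into these three fields, the standard java.net.URI class gives the same breakdown. This snippet is purely illustrative (BusinessWorks itself is configured through the activity fields, not through this code), and the class name is invented.

```java
import java.net.URI;

final class ActionProcessorLocation {

    final String host;
    final int port;
    final String requestUri;

    private ActionProcessorLocation(String host, int port, String requestUri) {
        this.host = host;
        this.port = port;
        this.requestUri = requestUri;
    }

    // Splits an address into the Host, Port and Request URI parts.
    // getPort() returns -1 when the address does not name a port.
    static ActionProcessorLocation parse(String address) {
        URI uri = URI.create(address);
        return new ActionProcessorLocation(uri.getHost(), uri.getPort(), uri.getPath());
    }
}
```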

 

Local Proxy

This was the most important, difficult and frustrating component to create. The reason I am using a local proxy based on JSP and not implementing the functionality in the BW Proxy was given in the first part, but to repeat it here in one line: using a Local Proxy allows us to separate the ‘behavior’ of the Action Processor from the task of sending the message through the ‘fence’.

The source jsp file can be found here.

The logic for the proxy is as follows:

1) Receive the incoming request from the Workspace Browser.

2) Forward the request received from the Workspace Browser, as it is, to the BusinessWorks Proxy.

3) Receive the response from the BusinessWorks Proxy and get any session information from the response.

4) Process the response:

a) Trim the response and remove any newline-carriage returns.

b) Check the type of response and take appropriate action – if the response has HTML then set the content type in the header to “text/html”, if the response is an http address then redirect the response, and if it is normal xml then set the content type to “application/xml”.

c) Set session information in the header.

5) Pass on the response received from the BusinessWorks Proxy, back to the Workspace Browser.
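
Step 4 can be sketched in plain Java as below. This is not the code from the JSP file: the class name, the enum and in particular the detection rules (an address starts with http:// or https://, HTML contains an <html tag) are assumptions made for the illustration.

```java
final class ResponseClassifier {

    enum Kind { HTML, REDIRECT, XML }

    // Step 4a: remove carriage returns and newlines, then trim.
    static String clean(String response) {
        return response.replace("\r", "").replace("\n", "").trim();
    }

    // Step 4b: decide what kind of response we received (assumed rules).
    static Kind classify(String cleaned) {
        String lower = cleaned.toLowerCase();
        if (lower.startsWith("http://") || lower.startsWith("https://")) {
            return Kind.REDIRECT;
        }
        if (lower.contains("<html")) {
            return Kind.HTML;
        }
        return Kind.XML;
    }

    // Content type to set for the response; redirects have none.
    static String contentType(Kind kind) {
        switch (kind) {
            case HTML:
                return "text/html";
            case XML:
                return "application/xml";
            default:
                return null;
        }
    }
}
```

Inside a JSP the result would then drive the standard servlet calls: response.setContentType(...) for HTML and XML, and response.sendRedirect(...) for an address.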

 

Once everything is in place we need to change the Action Processor location in the Workspace Browser config.xml file. This file is located in the <iProcess Workspace Browser Root>/JSXAPPS/ipc/ folder. Open up the XML file and locate the <ActionProcessor> tag. Change the ‘baseUrl’ attribute to point it to the Local Proxy. Start the BW Proxy and Service, iProcess Process Sentinels, Web-Server for the ActionProcessor and the Workspace Browser. Also test whether the Local Proxy is accessible by typing its location into a browser window.

The screenshots given below show the proxy setup in action. The Local Proxy is at: http://glyph:8080/IPR/iprocess.jsp (we would put this location in the ‘baseUrl’). We track the requests in Firefox using Firebug.

In the picture above we can see normal operation of the Workspace Browser. Above we see requests going direct to the ActionProcessor without any proxy.

 

Above we see the same login screen but this time the Workspace Browser is using a proxy. All requests are being sent to the Local Proxy.

 

Above we can see the Workspace Browser showing the Work Queues and Work Items (I have blacked out the queue names and work item information on purpose). Tracking it in Firebug we see requests being sent to the Local Proxy (iprocess.jsp).

 

That’s all folks!

 

 

 

iProcess Connecting to Action Processor through Proxy and JMS – Part 1

The iProcess Workspace Browser is a web-based front-end for iProcess. The Workspace Browser is nothing but a web-application which is available in both asp and servlet versions. It does not connect directly to the iProcess engine though. It sends all requests to the iProcess Action Processor which is also a web-application (again available in a servlet and asp version). The Action Processor forwards the request (via TCP/IP) to the iProcess Objects Server which works with the iProcess Engine and processes the request. This arrangement is shown below (with both the Workspace Browser and Action Processor deployed in the same web-server).

Now this setup is fine in an ideal scenario but in most organizations web-servers are isolated (‘fenced’) from the core infrastructure (such as databases and enterprise servers). Usually the access to the core infrastructure is through a single channel (e.g. through a messaging queue server) with direct TCP/IP connections and port 80 requests from outside blocked. In that case you will need to deploy the Action Processor inside the ‘fence’ with the core infrastructure and set up a proxy system to communicate with the Workspace Browser (which will be sitting outside the ‘fence’). The proxy system will transport the HTTP request over the allowed channel (JMS in this example) and return the response. An example is shown below using JMS.

To implement the above we need to create the following components:

1) Local Proxy – which will be the target for the Workspace Browser instead of the Action Processor which is sitting inside the ‘fence’ and therefore not accessible over HTTP.

2) Proxy for JMS – Proxy which puts the http request in a JMS message and gets the response back to the Local Proxy which returns it to the Workspace Browser.

3) JMS Queues – Queues to act like channels through the ‘fence’.

4) Service for JMS – Service to handle the requests sent over JMS inside the ‘fence’ and to send the response back over JMS.

You might ask why we need a local proxy and why we do not call the BW Proxy directly. The reason is very simple. The BW Proxy and Service should be as uncluttered as possible; ideally their only task is to carry the request through the ‘fence’ and bring out the response. Any processing of the request and response should be done somewhere else (and as we shall see in the example there is a lot of processing required).

As we don’t want to fiddle with the internals of the iProcess Workspace Browser, we simply add a Local Proxy which does the processing of the request and response. Then we set the Workspace Browser to send all Action Processor requests to the Local Proxy. This means that the Local Proxy will ‘behave’ exactly like the Action Processor as far as the Workspace Browser is concerned.

To put it in one line: using a Local Proxy allows us to separate the ‘behavior’ of the Action Processor from the task of sending the message through the ‘fence’.

 

In the example to follow, we have:

1) JSP based Local Proxy (easy to code – no compiling required!).

2) BusinessWorks based Proxy for JMS

3) TIBCO EMS Server based queues.

4) BusinessWorks based Service for JMS

 

In the next part, the example!

 

Deploying to a BusinessWorks Service Container

There are three ‘locations’ or ‘containers’ that a BusinessWorks EAR can be deployed to. These are:

1) BusinessWorks Standalone Service Engine

2) BusinessWorks Service Engine Implementation Type (BWSE-IT) within an ActiveMatrix Node

3) BusinessWorks Service Container (BW-SC)

The first two scenarios do not require any special effort during deployment and usually can be done through the admin interfaces (bw-admin for standalone and amx-admin for BWSE-IT). But if one wishes to deploy an EAR to a Service Container then we need to setup the container and make a change in the Process Archive. This tutorial is for a Windows-based system.

Before we get into all that let us figure out what a BW Service Container (BW-SC) is and why one would want to use it.

A BW-SC is a virtual machine which can host multiple processes and services within individual process engines. Each EAR deployed to a BW-SC gets its own process engine. The number of such process engines that can be hosted by a container depends on the running processes and the deployment configurations. To give an analogy, the load that an electric supply (service container) can take depends not just on the number of devices (i.e. process engines) on it but also on how much electricity each device requires (processes running within each engine).

Keeping in mind the above, when using BW-SC, it becomes even more important to have proper grouping of processes and services within an EAR.

The standard scenario when you would use a BW-SC is for fault-tolerance and load-balancing. In other words, to deploy the same service (fault-tolerance) and required backend processes (load balancing) on multiple containers.  Also Service Containers can be used to group related services together to create a fire-break for a failure-cascade.

The first step to deploying to a BW-SC is to enable the hosting of process engines in a container. The change has to be made in the bwengine.xml file found in the bw/<version>/bin directory. Locate the following entry (or add it if you cannot find it):

<property>
<name>BW Service Container</name>
<option>bw.container.service</option>
<default></default>
<description>Enables BW engine to be hosted within a container</description>
</property>

The second step  is to start a service container to which we can deploy our EARs. Go to the command line and drill down to the  bw/<version>/bin directory. There run the following command:

bwcontainer --deploy <Container Name>

Here the <Container Name> value, supplied by you, will uniquely identify the container when deploying EARs. Make sure  that the container name is recorded properly. In the image below you can see an example of starting a container called Tibco_C1.

Starting Container

 

The third step is to deploy our application to the container (Tibco_C1). Log in to the BusinessWorks Administrator and upload the application EAR. In the image below the test application EAR has been uploaded and awaits deployment.

EAR Uploaded

The fourth step is to point the process archive towards the container we want to deploy to. Click on the Process Archive.par and select the ‘Advanced’ tab. Go down the variable list and locate the bw.container.service variable, which should be blank if you are not already deploying to a container.

Property Set

Type the container name EXACTLY as it was defined during startup. TIBCO will NOT validate the container name so if you set the wrong name you will NOT get a warning, you will just be left scratching your head as to why it didn’t work. In our example we enter ‘Tibco_C1’ in the box (see below).

Property Defined

Save the variable value and click on Deploy. Once the application has been deployed, start the service instance. That is it.

To verify that your application is running on the container, once the service instances enter the ‘Running’ state, go back to the command line and the bin directory containing bwcontainer.exe. There execute the following:

bwcontainer --list

This command will list the process engines running in any active containers on the local machine. The output from our example can be seen below.

Command Line Listing

We can see the process archive we just deployed, running in the Tibco_C1 container.

If you have any other containers they will also show up in the output.

Remember one important point: If a service container goes down, all the deployed applications also go down. These applications have to be re-started manually through the Administrator, after the container has been re-started.

 

Continuing with Java

The continue keyword in the Java programming language provides functionality to skip the current iteration of a loop. Unlike the break keyword, which ‘breaks’ the execution flow out of the loop, continue only skips the rest of the current iteration. This functionality is often replicated using an if block as below:

for(int i=0;i<100;i++) {
     if(i%2==0){
          // Execute the loop only for even values of i otherwise skip the iteration
      }
}

The above piece of code can be re-written using the continue keyword:

for(int i=0;i<100;i++) {
     if(i%2!=0){ 
          continue; //Skip the loop if i is odd
      }

     //Loop continues as normal if i is even
}

 

While this simple example might not bring out the power of the humble continue keyword, imagine if there were several conditions under which the iteration should be skipped. That would mean having a complicated set of nested if-else blocks to control the flow of execution within the iteration. Whereas with continue we just need to check the condition for skipping the iteration. There is no need for the else part.

Another advantage of using continue  is that it allows the control to be transferred to a labelled statement, as we shall see in a bit.

Before going further let me state the two different ways continue can be used:

Form 1:   continue;

Form 2:   continue <label>;

Example:

The scenario: Imagine you need to extract keywords from a text source. This is a multi-step process with the first step being converting the text into tokens (i.e. breaking down the paragraphs and sentences to individual words based on some rules). Then the set of tokens is filtered and common usage words, such as ‘the’, are removed (stop word removal). Following this you may again want to filter the set using a domain specific set of common use words or you may want to ignore the text source all-together (i.e. not process it) if it has certain words in it. For example if you are interested only in processing articles about Apple products you would want to ignore articles which talk about the fruit.

As you might have guessed, the above process requires several iterations over different sets of data. This is where a continue statement would be most effective. The code given below shows how to use continue. Obviously there are several other ways of implementing the desired functionality (including not using continue).

// Stop and ignore word lists
static String stopWordList[] = { "at", "in", "the", "and", "if", "of", "am", "who" };
static String ignoreWordList[] = { "king", "queen" };

// ------------------------------

// Sample strings
String samples[] = { "I am the king of the world!", "For I am the Red Queen", "A man who wasn't there" };

outer: for (String sample : samples) {
    System.out.println("Original String: " + sample + "\n");

    // Create tokens
    String[] tokens = tokenize(sample);
    System.out.println("Unfiltered Tokens:");
    printToken(tokens);

    // Filter tokens on stop words
    ArrayList<String> filteredTokens = new ArrayList<String>();
    for (String token : tokens) {
        if (filterStopWord(token)) {
            continue; // ---------- (1)
        }
        if (filterIgnoreWord(token)) {
            System.out.println("Ignore - " + sample + "\n\n");
            continue outer; // ---------- (2)
        }
        filteredTokens.add(token); // ---------- (3)
    }

    // Print filtered tokens
    System.out.println("Filtered Tokens:");
    printToken(filteredTokens);
    System.out.println("\n");
} // End of outer: for

The full .java file can be found here (right click and save-as).

The logic flow is as follows:

1) Initialise a set of samples (these can come from any source; simple sentences are used for this example).

2) The for loop labelled ‘outer’ iterates through the set of samples.

3) Create unfiltered token set from the sample.

4) Print the token set (unfiltered).

5) Initialise array list to store the filtered set of tokens.

6) Iterate through the unfiltered token set to filter out tokens.

7) Within the iteration, if the current token is a ‘stop word’ then skip the rest of the current inner-loop iteration (using continue – line (1)) as the token should not be added to the filtered set.

8) If the current token is not a ‘stop word’ then the current iteration will continue as normal.

9) Next we check if the token is on the ‘ignore’ list. If it is, we stop processing the sample and move on to the next iteration of the outer for loop (using labelled continue – line (2)).

10) If the token is not on the ignore list then we continue with the current iteration and add the token to the filtered set.

If we run the above program with the required methods in place we will see the following output:

Original String: I am the king of the world!
Unfiltered Tokens:
[I]  [am]  [the]  [king]  [of]  [the]  [world!]
Ignore - I am the king of the world!

Original String: For I am the Red Queen
Unfiltered Tokens:
[For]  [I]  [am]  [the]  [Red]  [Queen]
Ignore - For I am the Red Queen

Original String: A man who wasn't there
Unfiltered Tokens:
[A]  [man]  [who]  [wasn't]  [there]
Filtered Tokens:
[A]  [man]  [wasn't]  [there]

As per the logic, the first two samples are ignored and the third one is processed.
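
For readers who do not have the full .java file to hand, the following is a self-contained sketch of the same flow. The helper methods here (tokenize, filterStopWord, filterIgnoreWord) are my own guesses at what such helpers might look like: whitespace tokenisation and case-insensitive matching are assumptions, not necessarily what the original file does. To keep it short, it collects the results instead of printing each step.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

class KeywordFilterDemo {

    static final List<String> STOP_WORDS =
            Arrays.asList("at", "in", "the", "and", "if", "of", "am", "who");
    static final List<String> IGNORE_WORDS = Arrays.asList("king", "queen");

    // Assumed helper: split the text on runs of whitespace.
    static String[] tokenize(String text) {
        return text.trim().split("\\s+");
    }

    // Assumed helper: case-insensitive stop word check.
    static boolean filterStopWord(String token) {
        return STOP_WORDS.contains(token.toLowerCase());
    }

    // Assumed helper: case-insensitive ignore word check.
    static boolean filterIgnoreWord(String token) {
        return IGNORE_WORDS.contains(token.toLowerCase());
    }

    // Returns the filtered token list of every sample that was not ignored.
    static List<List<String>> process(String[] samples) {
        List<List<String>> results = new ArrayList<>();
        outer:
        for (String sample : samples) {
            List<String> filtered = new ArrayList<>();
            for (String token : tokenize(sample)) {
                if (filterStopWord(token)) {
                    continue;       // (1) skip just this token
                }
                if (filterIgnoreWord(token)) {
                    continue outer; // (2) abandon the whole sample
                }
                filtered.add(token); // (3) keep the token
            }
            results.add(filtered);   // never reached for an ignored sample
        }
        return results;
    }

    public static void main(String[] args) {
        String[] samples = {
            "I am the king of the world!",
            "For I am the Red Queen",
            "A man who wasn't there"
        };
        System.out.println(process(samples)); // prints [[A, man, wasn't, there]]
    }
}
```

Note that the labelled continue also skips the results.add call after the inner loop, which is why ignored samples leave no trace in the result.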

Changing TIBCO Messaging Encoding from ISO8859-1 to UTF-8

If your project has messaging encoding set to UTF-8 but your TIBCO Administrator is set to ISO8859-1 then you will not be able to deploy your project.
The TIBCO recommended encoding to use is UTF-8. But in case you made the mistake of not using UTF-8 (either in your project or in TIBCO Administrator) then you will need to change the encoding.

To change project encoding:
– Click on the root project folder.
– In the configuration panel you will see three tabs [Configuration, Project Settings and Design Time Libraries].
– Click on Project Settings where you will find a drop down titled ‘TIBCO Messaging Encoding’ with two values: ISO8859-1 and UTF-8.
– Select UTF-8 and click Apply and then save your project.

To change it for the TIBCO Administrator:
– Locate the tibcoadmin_.tra file. It should be in: \administrator\domain\\bin
– Open this file in a text-editor and locate the entries ‘tibcoadmin.client.encoding’ and ‘repo.encoding’.
– Change them both to UTF-8 if they are not UTF-8.
– Save the file.
– Restart both the Hawk Agent and the TIBCO Administrator service on the machine.
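
If you want to check the two entries without opening an editor, a small helper along the following lines could do it. This is only a sketch: it assumes the .tra file uses simple key=value lines with # for comments, and the class and method names are hypothetical. Only the two property names come from the steps above.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

class TraEncodingCheck {

    static final List<String> ENCODING_KEYS =
            Arrays.asList("tibcoadmin.client.encoding", "repo.encoding");

    // Returns the encoding entries whose value is not UTF-8.
    // Assumes each setting is a single "key=value" line.
    static List<String> nonUtf8Entries(List<String> traLines) {
        List<String> offenders = new ArrayList<>();
        for (String line : traLines) {
            String trimmed = line.trim();
            int eq = trimmed.indexOf('=');
            if (trimmed.startsWith("#") || eq < 0) {
                continue; // comment or not a key=value line
            }
            String key = trimmed.substring(0, eq).trim();
            String value = trimmed.substring(eq + 1).trim();
            if (!ENCODING_KEYS.contains(key)) {
                continue; // not one of the two encoding entries
            }
            if (!value.equalsIgnoreCase("UTF-8")) {
                offenders.add(key + "=" + value);
            }
        }
        return offenders;
    }

    public static void main(String[] args) {
        List<String> sample = Arrays.asList(
                "# sample lines, not a real .tra file",
                "tibcoadmin.client.encoding=ISO8859-1",
                "repo.encoding=UTF-8");
        System.out.println(nonUtf8Entries(sample));
    }
}
```

In practice you would feed it the lines of the real file (for example via java.nio.file.Files.readAllLines) and change any entry it reports.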

If the Administrator still fails to deploy a UTF-8 encoded project after making the changes above, check the log. If you see the exception ‘com.tibco.infra.repository.OperationFailedException: Can not change encoding from ISO8859-1 to UTF-8 because of other existing connection with different encoding’, restart your Hawk Agent and any other TIBCO related services.

Developing SOAP over JMS Web-Services using TIBCO BusinessWorks and Designer

Version of Designer: 5.5.2.2

Version of TIBCO EMS: 4.4.3

Some time ago I did a post about developing web-services using TIBCO BusinessWorks. In this post I would like to discuss how to develop a web-service which uses JMS as the SOAP transport instead of HTTP. The problem with binding a web-service to a JMS Queue instead of an HTTP transport is that it can be used only in a homogeneous TIBCO environment. In other words, we need to have TIBCO at both ends (client and server) if we are using a web-service bound to a JMS Queue.

This is so because the WSDL representation of the binding is proprietary to TIBCO (more on this later) as there is no agreed standard for binding SOAP to JMS. Although when I was digging around I did find a ‘working draft’ at W3.org for SOAP over JMS  (http://www.w3.org/TR/soapjms/) so something is being done to plug this gap!

Why all this hassle for SOAP over JMS, you ask? Why not stick with good old SOAP over HTTP? Well, simply because the JMS transport is a whole lot more robust and can be scaled up easily without affecting QoS etc.

Introducing the Example

The web-service we are going to create in this example is a relatively simple one. It will take in two integers and return their sum. A fairly simple example but this post is about using SOAP over JMS so that is what we will concentrate on.

The schema for the request and response messages is given below:

<?xml version="1.0" encoding="UTF-8"?>

<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema"
	 xmlns="http://www.tibco.com/schemas/WebServiceTest/Schema/Schema.xsd"
	 targetNamespace="http://www.tibco.com/schemas/WebServiceTest/Schema/Schema.xsd"
	 elementFormDefault="qualified"
	 attributeFormDefault="unqualified">
	<xs:element name="add">                   <!-- Request - two integers a and b to be added -->
		<xs:complexType>
			<xs:sequence>
				<xs:element name="a" type="xs:int"/>
				<xs:element name="b" type="xs:int"/>
			</xs:sequence>
		</xs:complexType>
	</xs:element>
	<xs:element name="result" type="xs:int"/>  <!-- Response - the sum of a and b -->
</xs:schema>
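
As a quick sanity check of the schema outside Designer, the JDK's built-in validation API can be used to confirm which request and response payloads it accepts. The sketch below embeds a compact copy of the schema above as a string; the class name and the sample payloads are my own and are only for illustration.

```java
import java.io.IOException;
import java.io.StringReader;
import javax.xml.XMLConstants;
import javax.xml.transform.stream.StreamSource;
import javax.xml.validation.Schema;
import javax.xml.validation.SchemaFactory;
import org.xml.sax.SAXException;

class AddSchemaCheck {

    static final String NS =
            "http://www.tibco.com/schemas/WebServiceTest/Schema/Schema.xsd";

    // Compact copy of the schema shown above.
    static final String XSD =
            "<xs:schema xmlns:xs=\"http://www.w3.org/2001/XMLSchema\""
            + " xmlns=\"" + NS + "\" targetNamespace=\"" + NS + "\""
            + " elementFormDefault=\"qualified\" attributeFormDefault=\"unqualified\">"
            + "<xs:element name=\"add\"><xs:complexType><xs:sequence>"
            + "<xs:element name=\"a\" type=\"xs:int\"/>"
            + "<xs:element name=\"b\" type=\"xs:int\"/>"
            + "</xs:sequence></xs:complexType></xs:element>"
            + "<xs:element name=\"result\" type=\"xs:int\"/>"
            + "</xs:schema>";

    // Returns true if the XML document is valid against the schema.
    static boolean isValid(String xml) {
        Schema schema;
        try {
            SchemaFactory factory =
                    SchemaFactory.newInstance(XMLConstants.W3C_XML_SCHEMA_NS_URI);
            schema = factory.newSchema(new StreamSource(new StringReader(XSD)));
        } catch (SAXException e) {
            throw new IllegalStateException("Embedded schema did not compile", e);
        }
        try {
            schema.newValidator().validate(new StreamSource(new StringReader(xml)));
            return true;
        } catch (SAXException | IOException e) {
            return false; // malformed or schema-invalid document
        }
    }

    public static void main(String[] args) {
        String request = "<add xmlns=\"" + NS + "\"><a>3</a><b>4</b></add>";
        String badRequest = "<add xmlns=\"" + NS + "\"><a>three</a><b>4</b></add>";
        System.out.println(isValid(request));    // prints true
        System.out.println(isValid(badRequest)); // prints false
    }
}
```

Because elementFormDefault is "qualified", the a and b child elements must be in the schema's target namespace too, which the default xmlns on the add element takes care of.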

The next thing we need to do is to set up the Service Resource. I won't go into the details of how it is done as I have already covered most of the steps in a different post.

I don’t need to go into details because web-services are designed to decouple the operation from the way that operation is accessed (i.e. the binding). As binding = message format + transport, and each operation can have different bindings, the only thing that differs when setting up the Service Resource is the Binding section. Furthermore, as we are still using SOAP as the message format, the only difference you will see in the Service Resource, compared to the SOAP over HTTP configuration, is in the Transport sub-tab (see image below).

 

Service Resource SOAP over JMS



In the Transport sub-tab, if instead of selecting a HTTP connection, a JMS connection is selected in the Transport box (see image above), then you will get options to setup the JMS transport.

Setting up the JMS Transport

Setting up the Transport in case of JMS is a bit more involved than HTTP. For the sake of clarity we will use Queues for our web-service instead of Topics. There are four main things to set up once you have selected a JMS connection in the Transport box. These settings are similar to those in the JMS activities such as JMS Queue Sender.

1) JMS Destination – the queue or topic which will contain the JMS message carrying the SOAP as payload.

2) JMS Destination Type – Queue or Topic (depending on what kind of interaction is required).

3) JMS Message Type – Text or Bytes message – we go for Text in the example so that we can examine the SOAP message being sent over the EMS.

4) Acknowledgement Mode – Auto for the example otherwise all the standard and TIBCO EMS specific options are available for selection.

If you select ‘Topic’ as the JMS Destination Type then you can also decide which of the Operations have a ‘durable subscription’.

That is the only difference in changing from SOAP over HTTP to SOAP over JMS as far as the Service Resource is concerned.

Looking at the WSDL

Once everything is set up, navigate to the WSDL Source tab in the Service Resource configuration to look at the WSDL which has been generated for the web-service.

<?xml version="1.0" encoding="UTF-8"?>
<!--Created by TIBCO WSDL-->
<wsdl:definitions xmlns:wsdl="http://schemas.xmlsoap.org/wsdl/" xmlns:tns="http://xmlns.example.com/1301947961037" xmlns:soap="http://schemas.xmlsoap.org/wsdl/soap12/" xmlns:jms="http://www.tibco.com/namespaces/ws/2004/soap/binding/JMS" xmlns:jndi="http://www.tibco.com/namespaces/ws/2004/soap/apis/jndi" xmlns:ns0="http://www.tibco.com/schemas/WebServiceTest/Schema/Schema.xsd" name="Untitled" targetNamespace="http://xmlns.example.com/1301947961037">
    <wsdl:types>
        <xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema" xmlns="http://www.tibco.com/schemas/WebServiceTest/Schema/Schema.xsd" targetNamespace="http://www.tibco.com/schemas/WebServiceTest/Schema/Schema.xsd" elementFormDefault="qualified" attributeFormDefault="unqualified">
            <xs:element name="add">
                <xs:complexType>
                    <xs:sequence>
                        <xs:element name="a" type="xs:int"/>
                        <xs:element name="b" type="xs:int"/>
                    </xs:sequence>
                </xs:complexType>
            </xs:element>
            <xs:element name="result" type="xs:int"/>
        </xs:schema>
    </wsdl:types>
    <wsdl:service name="JMSAddService">
        <wsdl:port name="AddPortEndpoint1" binding="tns:AddPortEndpoint1Binding">
            <soap:address location=""/>
            <jndi:context>
                <jndi:property name="java.naming.provider.url" type="java.lang.String">tibjmsnaming://localhost:7222</jndi:property>
                <jndi:property name="java.naming.factory.initial" type="java.lang.String">com.tibco.tibjms.naming.TibjmsInitialContextFactory</jndi:property>
            </jndi:context>
            <jms:connectionFactory>QueueConnectionFactory</jms:connectionFactory>
            <jms:targetAddress destination="queue">inQueue</jms:targetAddress>
        </wsdl:port>
    </wsdl:service>
    <wsdl:portType name="AddPort">
        <wsdl:operation name="AddOperation">
            <wsdl:input message="tns:InMessage"/>
            <wsdl:output message="tns:OutMessage"/>
        </wsdl:operation>
    </wsdl:portType>
    <wsdl:binding name="AddPortEndpoint1Binding" type="tns:AddPort">
        <soap:binding style="document" transport="http://www.tibco.com/namespaces/ws/2004/soap/binding/JMS"/>
        <jms:binding messageFormat="Text"/>
        <wsdl:operation name="AddOperation">
            <soap:operation style="document" soapAction="/Connections/JMSAddService.serviceagent/AddPortEndpoint2/AddOperation" soapActionRequired="true"/>
            <wsdl:input>
                <soap:body use="literal" parts="part1"/>
            </wsdl:input>
            <wsdl:output>
                <soap:body use="literal" parts="part1"/>
            </wsdl:output>
        </wsdl:operation>
    </wsdl:binding>
    <wsdl:message name="InMessage">
        <wsdl:part name="part1" element="ns0:add"/>
    </wsdl:message>
    <wsdl:message name="OutMessage">
        <wsdl:part name="part1" element="ns0:result"/>
    </wsdl:message>
</wsdl:definitions>

Let us get back to the issue of the lack of standards for SOAP over JMS and why we need TIBCO at both ends.

For that we need to focus on the Binding and Service elements of the WSDL.

The Service element (see below) is where the method of connecting to the web-service is defined. We find that it contains information about the EMS server (from the Connection resource we set in the Transport box) as well as the queue name we set in the Transport sub-tab.

  <wsdl:service name="JMSAddService">
        <wsdl:port name="AddPortEndpoint1" binding="tns:AddPortEndpoint1Binding">
            <soap:address location=""/>
            <jndi:context>
                <jndi:property name="java.naming.provider.url" type="java.lang.String">tibjmsnaming://localhost:7222</jndi:property>
                <jndi:property name="java.naming.factory.initial" type="java.lang.String">com.tibco.tibjms.naming.TibjmsInitialContextFactory</jndi:property>
            </jndi:context>
            <jms:connectionFactory>QueueConnectionFactory</jms:connectionFactory>
            <jms:targetAddress destination="queue">inQueue</jms:targetAddress>
        </wsdl:port>
    </wsdl:service>
We also find two strange new namespaces being used - jms and jndi. Let us see what these namespace prefixes stand for. Scroll right up to the top of the WSDL and you will see the following entries:
xmlns:jms="http://www.tibco.com/namespaces/ws/2004/soap/binding/JMS" 
xmlns:jndi="http://www.tibco.com/namespaces/ws/2004/soap/apis/jndi" 
These two namespaces have been defined by TIBCO, so they are internal and not 'standardized' in the way other namespaces in the WSDL are, such as xs (xmlns:xs="http://www.w3.org/2001/XMLSchema") for the schema in Types or soap (xmlns:soap="http://schemas.xmlsoap.org/wsdl/soap12/") for SOAP related properties in Binding.
Thus if you are a non-TIBCO client you will have no idea what the jms:targetAddress element means in the WSDL.
Once there is a standard for SOAP over JMS then instead of TIBCO specific namespaces we will see a prefix like soapjms with the definition xmlns:soapjms = "http://www.w3.org/2010/soapjms/" [1] .
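
To illustrate what a non-TIBCO client would have to do by hand, here is a rough sketch that pulls the connection details out of the Service element using the JDK's namespace-aware DOM parser. The element and attribute names are taken from the WSDL above; the class name, method name and result keys are hypothetical, and a real client would still need the TIBCO EMS libraries to actually connect.

```java
import java.io.StringReader;
import java.util.LinkedHashMap;
import java.util.Map;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

class TibcoJmsPortReader {

    static final String JMS_NS = "http://www.tibco.com/namespaces/ws/2004/soap/binding/JMS";
    static final String JNDI_NS = "http://www.tibco.com/namespaces/ws/2004/soap/apis/jndi";

    // Trimmed copy of the Service element from the WSDL above.
    static final String SAMPLE_SERVICE =
            "<wsdl:service name=\"JMSAddService\""
            + " xmlns:wsdl=\"http://schemas.xmlsoap.org/wsdl/\""
            + " xmlns:jms=\"" + JMS_NS + "\" xmlns:jndi=\"" + JNDI_NS + "\">"
            + "<wsdl:port name=\"AddPortEndpoint1\"><jndi:context>"
            + "<jndi:property name=\"java.naming.provider.url\" type=\"java.lang.String\">"
            + "tibjmsnaming://localhost:7222</jndi:property></jndi:context>"
            + "<jms:connectionFactory>QueueConnectionFactory</jms:connectionFactory>"
            + "<jms:targetAddress destination=\"queue\">inQueue</jms:targetAddress>"
            + "</wsdl:port></wsdl:service>";

    // Extracts the TIBCO specific JMS/JNDI settings; absent ones are left out.
    static Map<String, String> read(String wsdl) {
        Document doc;
        try {
            DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
            factory.setNamespaceAware(true); // needed to match on namespace URI
            doc = factory.newDocumentBuilder().parse(new InputSource(new StringReader(wsdl)));
        } catch (Exception e) {
            throw new IllegalArgumentException("Could not parse WSDL", e);
        }
        Map<String, String> settings = new LinkedHashMap<>();
        NodeList props = doc.getElementsByTagNameNS(JNDI_NS, "property");
        for (int i = 0; i < props.getLength(); i++) {
            Element prop = (Element) props.item(i);
            if ("java.naming.provider.url".equals(prop.getAttribute("name"))) {
                settings.put("providerUrl", prop.getTextContent().trim());
            }
        }
        NodeList factories = doc.getElementsByTagNameNS(JMS_NS, "connectionFactory");
        if (factories.getLength() > 0) {
            settings.put("connectionFactory", factories.item(0).getTextContent().trim());
        }
        NodeList targets = doc.getElementsByTagNameNS(JMS_NS, "targetAddress");
        if (targets.getLength() > 0) {
            Element target = (Element) targets.item(0);
            settings.put("destinationType", target.getAttribute("destination"));
            settings.put("destination", target.getTextContent().trim());
        }
        return settings;
    }

    public static void main(String[] args) {
        System.out.println(read(SAMPLE_SERVICE));
    }
}
```

The point is that all of this knowledge (which namespace, which element, what the destination attribute means) is hard-coded in the client, which is exactly what a standard binding would make unnecessary.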
Next we look at the Binding element (see below). Here also we find the TIBCO specific JMS namespace (the jms:binding element) as well as the SOAP over JMS transport definition (the transport attribute of soap:binding).

<wsdl:binding name="AddPortEndpoint1Binding" type="tns:AddPort">
        <soap:binding style="document" transport="http://www.tibco.com/namespaces/ws/2004/soap/binding/JMS"/>
        <jms:binding messageFormat="Text"/>
        <wsdl:operation name="AddOperation">
            <soap:operation style="document" soapAction="/Connections/JMSAddService.serviceagent/AddPortEndpoint2/AddOperation" soapActionRequired="true"/>
            <wsdl:input>
                <soap:body use="literal" parts="part1"/>
            </wsdl:input>
            <wsdl:output>
                <soap:body use="literal" parts="part1"/>
            </wsdl:output>
        </wsdl:operation>
    </wsdl:binding>

Again, once we have a standardized way of binding SOAP to JMS, then instead of the TIBCO specific listing in the transport attribute we will have something like "http://www.w3.org/2010/soapjms/" [1].

If we compare the Service and Binding elements above to those in the same web-service using HTTP instead of JMS, we find that all the namespaces used to define the connection and binding properties are standardized. That is what makes SOAP over HTTP web-services independent of vendors and implementation languages.

Next we test the web-service. Make sure you save the WSDL Source (i.e. the concrete WSDL) so that our test client can use it.

Testing

To test the web-service we will create a client using BusinessWorks. We will use a SOAP Request Reply activity to test the web-service. The images below show how to configure the activity to access the web-service.

SOAP Request Reply JMS Config Main Pane

In the configuration simply select the namespace from the concrete WSDL file we saved for the client. As we are using TIBCO to create the client, everything will be auto-populated once you set the WSDL. Go to the Transport Details tab (see below) and there you will see the JNDI and JMS sub-tabs, which have also been auto-populated from the WSDL. This is so because TIBCO understands the jms and jndi namespaces and knows what to do with the information in the WSDL.

JNDI Sub-tab:

JNDI Sub-tab in Transport Details

 

JMS Sub-tab:
JMS Sub-tab in Transport Details

After loading the WSDL and saving the changes the SOAP Request Reply activity will ask you for an input (the two integers to be added).

Test JMS Input

Save everything and load the relevant processes. On starting the test you should see the Request being fired. If you monitor the relevant queue you will see a message being posted on the queue. The message will be consumed by the web-service and it will return the result back to the queue which in turn will be consumed by the client and you will see the output in the process. As we provided ‘3’ and ‘4’ as the two integers to be added in the input (see image above) the result we get is ‘7’ (see below).

 

<?xml version = "1.0" encoding = "UTF-8"?>
<outputMessage>
	<ns0:result xmlns:SOAP-ENV = "http://www.w3.org/2003/05/soap-envelope" xmlns:ns0 = "http://www.tibco.com/schemas/WebServiceTest/Schema/Schema.xsd">7</ns0:result>
</outputMessage>

If you want to take a look at the actual messages being sent in the JMS Message, stop the server before sending the request (to capture the request) or stop the client after sending the request (to capture the response). The message will then remain in the queue and, as we are using a JMS Message Type of Text, you can view its content by browsing the queue.
That is the end of the tutorial. Let me know if I have made any mistakes or if you have any suggestions.
Thank you for reading!