Importing data - working with input sources
Note
Before you begin working through this article, please ensure you have:
Purpose
To use a knowledge model to process data imported from an input source.
Pre-requisites
To complete this exercise, you must have worked through the Starting the Erudine Behaviour Engine exercise so that it is possible to deploy a knowledge model for querying. If it is not possible to work through that exercise, a completed version can be found in the TrainingComplete folder located in the C:\Program Files\Erudine\Training\Development folder. It is also advised that the Processing data exercise is completed before beginning this exercise. While it is not strictly necessary, that exercise introduces concepts that will be built upon in this exercise.
Theory
In this exercise, you will learn how to use an input source to process large quantities of data through a knowledge model. Input sources may take various forms. For instance, an input source may be an XML file, a database or a JMS queue. The only thing that input sources are guaranteed to have in common is that they serve up situations for processing by a knowledge model. This exercise will use a simple knowledge model that compares a given number with five and classifies it as either 'greater', 'lesser' or 'equal'. An input source will be created that reads numbers from a file containing comma-separated values (a CSV file). This input source will be used to push the contents of the CSV file through the knowledge model, and the results will then be collected.
Actions
Examining the existing knowledge model
This exercise uses the CompareNumberWithFive knowledge model that was used in the Processing data exercise.
Writing code to create an input source
To create code that creates an input source:
- Open Eclipse.
- Expand the src/java folder.
- Expand the exercise.importingData package.
- Open the CompareNumberWithFiveCsvProcessor class.
This Java file contains two classes: the public CompareNumberWithFiveCsvProcessor class and the default access CsvInputSource class. The first uses the second to process data from CSV files through the knowledge model. Before looking at how the input source is used, you will implement it, so concentrate initially on the CsvInputSource class.
This class contains some pre-completed methods so that this exercise can focus on the steps involved in creating an input source.
The method createSituations is used to parse a CSV file, extract the numbers and use those numbers to build up a list of situations for the input source.
The method createSituation is used to create an instance of the Situation class, where each instance has an id unique within the input source and a conceptual graph containing a single concept with type 'Number' and a literal value comprising the given integer.
The getSituations method allows the input source's list of situations to be retrieved. This would typically be used for retrieving situation ids.
The getNextSituationId method is responsible for allocating unique ids to the situations.
- Delete the following lines from the hasNext and next methods of the CsvInputSource class:
    // TODO - Remove the line below when starting this exercise
    return null;
These lines were included in the supplied code to avoid compiler errors.
- Collapse the src/java folder and expand the resources folder.
- Open the CsvInputSource_HasNext.java file.
- Copy the following text from the file into the hasNext method of the CsvInputSource class:
    return processedSituationsIndex < situations.size();
Any input source must implement the IInputSource interface. This interface requires implementing classes to override the hasNext, next and remove methods. These methods are inherited from the standard Java interface Iterator and their implementation must satisfy the same contract.
The hasNext method should return true if the input source has any more elements. In this instance, processedSituationsIndex keeps track of which situations have been processed. If this index is greater than or equal to the number of situations in the input source, then all the situations have been processed. The method therefore returns true only while the index is less than the total number of situations.
- Open the CsvInputSource_Next.java file.
- Copy the following text from the file into the next method of the CsvInputSource class:
    if ( hasNext() ) {
        int endRange;
        if ( processedSituationsIndex + BATCH_SIZE > situations.size() ) {
            endRange = situations.size();
        } else {
            endRange = processedSituationsIndex + BATCH_SIZE;
        }
        // subList excludes endRange, so the next batch starts at endRange
        List<Situation> nextBatch = situations.subList( processedSituationsIndex, endRange );
        processedSituationsIndex = endRange;
        return nextBatch;
    }
    throw new NoSuchElementException();
The first thing that occurs is a check that there are situations yet to be processed. If not, a NoSuchElementException is thrown in accordance with the Iterator contract. Assuming that there are situations remaining, a batch is created from them. This batch is the next BATCH_SIZE situations that have not been processed (or all remaining situations if there are fewer than that). The processedSituationsIndex is then updated to point at the first unprocessed situation and the batch is returned.

Note
The remove method has been intentionally left empty, as it will not be required for the input source.
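The hasNext and next logic described above can be tried out in isolation. The following is a minimal sketch only: it assumes plain integers in place of Situation objects, and the class name BatchingSourceSketch and the BATCH_SIZE value of 2 are hypothetical choices for illustration, not part of the supplied exercise code.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Iterator;
import java.util.List;
import java.util.NoSuchElementException;

// Hypothetical stand-in for CsvInputSource: integers replace Situation objects.
class BatchingSourceSketch implements Iterator<List<Integer>> {

    // Illustrative batch size only; the exercise defines its own BATCH_SIZE.
    private static final int BATCH_SIZE = 2;

    private final List<Integer> items;
    // Index of the first item that has not yet been handed out.
    private int processedIndex = 0;

    BatchingSourceSketch(List<Integer> items) {
        this.items = new ArrayList<>(items);
    }

    @Override
    public boolean hasNext() {
        return processedIndex < items.size();
    }

    @Override
    public List<Integer> next() {
        if (!hasNext()) {
            // Required by the Iterator contract when nothing remains.
            throw new NoSuchElementException();
        }
        int endRange = Math.min(processedIndex + BATCH_SIZE, items.size());
        // subList excludes endRange, so endRange is also the next start index.
        List<Integer> batch = new ArrayList<>(items.subList(processedIndex, endRange));
        processedIndex = endRange;
        return batch;
    }

    @Override
    public void remove() {
        // Intentionally empty, mirroring the exercise.
    }

    public static void main(String[] args) {
        BatchingSourceSketch source =
                new BatchingSourceSketch(Arrays.asList(3, 5, 456, -1, 0));
        while (source.hasNext()) {
            System.out.println(source.next());
        }
    }
}
```

With five items and a batch size of two, the final batch holds a single item, which is the "all remaining situations" case described above.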
Writing code to process data using a knowledge model and an input source
To create code that processes data using a knowledge model and an input source:
- Examine the CompareNumberWithFiveCsvProcessor class.
This class will use the CsvInputSource class to process data from a CSV file with the knowledge model. This class contains some pre-completed methods so that this exercise can focus on the steps involved in processing data with the knowledge model.
The CompareNumberWithFiveCsvProcessor constructor uses the StartEbeCommand class to create an instance of the Behaviour Execution Engine and deploy the CompareNumberWithFive knowledge model. In this exercise, the Behaviour Execution Engine will be used to push data through the knowledge model.
- Delete the following lines from the processData method of the CompareNumberWithFiveCsvProcessor class:
    // TODO - Remove the line below when starting this exercise
    return null;
These lines were included in the supplied code to avoid compiler errors.
- Open the CompareNumberWithFiveCsvProcessor_ProcessData.java file.
- Copy the following lines from the file into the processData method of the CompareNumberWithFiveCsvProcessor class:
    Map<String, String> results = new HashMap<String, String>();
    IExecutor executor = knowledgeModel.getExecutor();
    CsvInputSource csvInputSource = new CsvInputSource( knowledgeModel, csvFile );
The first line creates a new Map in which the results of processing the given CSV file will be stored. Each entry will map a position in the CSV file to the result obtained. For example, if the CSV file contained the following string:
3,5,456
Then the map should contain the following results:
0 -> Lesser
1 -> Equal
2 -> Greater
The second line fetches the executor from the knowledge model. The executor is the object that is responsible for pushing data through a knowledge model. The third line creates an instance of the CsvInputSource that you have just created.
- Copy the following line from the file into the processData method of the CompareNumberWithFiveCsvProcessor class:
IExecutionResult executionResults = executor.executeSynchronously( csvInputSource, INPUT_NODE );
This line uses the executor to process all of the situations available from the input source. The executor allows data to be processed either synchronously or asynchronously. That is, it will either block and wait for a result or simply start the processing and continue. In this case, the results from the processing are required, so the data must be processed synchronously. The INPUT_NODE is the node in the knowledge model where the data should start to be processed, in this case, the Input node.
The executor calls the hasNext and next methods of the CsvInputSource to fetch batches of situations and process them through the knowledge model until no unprocessed situations remain and the hasNext method returns false.
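The difference between blocking for a result and starting the work and continuing can be illustrated with standard Java alone. This is a rough sketch, not the Erudine API: the classify method is a hypothetical stand-in for the knowledge model, and processSynchronously and processAsynchronously are assumed names that only mimic the two styles of execution.

```java
import java.util.Arrays;
import java.util.List;
import java.util.concurrent.CompletableFuture;
import java.util.stream.Collectors;

class SyncVersusAsyncSketch {

    // Hypothetical stand-in for the knowledge model's comparison with five.
    static String classify(int number) {
        if (number > 5) {
            return "Greater";
        }
        if (number < 5) {
            return "Lesser";
        }
        return "Equal";
    }

    // Blocks the caller until every number has been classified.
    static List<String> processSynchronously(List<Integer> numbers) {
        return numbers.stream()
                .map(SyncVersusAsyncSketch::classify)
                .collect(Collectors.toList());
    }

    // Starts the work on another thread and returns immediately.
    static CompletableFuture<List<String>> processAsynchronously(List<Integer> numbers) {
        return CompletableFuture.supplyAsync(() -> processSynchronously(numbers));
    }

    public static void main(String[] args) {
        List<Integer> numbers = Arrays.asList(3, 5, 456);

        // Synchronous: the results are available on the next line.
        System.out.println(processSynchronously(numbers));

        // Asynchronous: other work could run before join() collects the results.
        CompletableFuture<List<String>> pending = processAsynchronously(numbers);
        System.out.println(pending.join());
    }
}
```

Because processData needs the results before it can return, the synchronous style is the natural fit in this exercise.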
Note
The results obtained from the executor are not in the format that you require. In fact, the IExecutionResult returned will rarely provide the results exactly as needed and it will be necessary to process the results acquired.
- Copy the following lines from the file into the processData method of the CompareNumberWithFiveCsvProcessor class:
    for ( Situation situation : csvInputSource.getSituations() ) {
        String situationId = situation.getId();
        String resultNode = executionResults.getSituationResults( situationId )
                .getResults().keySet().iterator().next();
        results.put( situationId, resultNode );
    }
    return results;
The first step in processing the results is to obtain the original list of situations from the input source. These situations can then be correlated with their result. Every situation has an id. In this case, the id corresponds to the position of the situation's value in the original CSV file. This id can be used to find the results for a specific situation.
The results returned by the executor for each situation provide a lot of detail. In this instance, we are only interested in which node the situation ended up in. In other circumstances, for example, you might wish to see the conceptual graph contained in the final node. As we know that a situation can only result in a single node, the first node can be selected as the result node.
The id and the result node are placed in the results map for each situation. The results map is returned when all situations have been processed.
- Press Ctrl + Shift + O to organise the imports.
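The shape of the results map built in this section can be reproduced without the engine. The sketch below is an assumption-laden illustration: the class CsvResultsSketch and its classify method are hypothetical and simply imitate the knowledge model's comparison with five, using the field position as the id.

```java
import java.util.LinkedHashMap;
import java.util.Map;

class CsvResultsSketch {

    // Hypothetical stand-in for the node a situation would end up in.
    static String classify(int number) {
        if (number > 5) {
            return "Greater";
        }
        if (number < 5) {
            return "Lesser";
        }
        return "Equal";
    }

    // Maps each position in the CSV line to its result, as processData does.
    static Map<String, String> processLine(String csvLine) {
        // LinkedHashMap keeps insertion order so the output is easy to read.
        Map<String, String> results = new LinkedHashMap<>();
        String[] fields = csvLine.split(",");
        for (int position = 0; position < fields.length; position++) {
            int number = Integer.parseInt(fields[position].trim());
            results.put(String.valueOf(position), classify(number));
        }
        return results;
    }

    public static void main(String[] args) {
        // Prints {0=Lesser, 1=Equal, 2=Greater}
        System.out.println(processLine("3,5,456"));
    }
}
```

This sketch assumes every field is a valid integer; it does not attempt to handle malformed values.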
Testing the code that processes data using a knowledge model and an input source
Once the code has been created, it is necessary to test it. The most obvious way to test the code is to compare the results achieved from using your class with those you would expect by running the same data through the knowledge model using the Behaviour Authoring Environment. This process can be mechanised using JUnit tests.
To test the code that processes data using a knowledge model and an input source:
- Examine the files CompareNumberWithFiveTest1.csv, CompareNumberWithFiveTest2.csv and CompareNumberWithFiveTest3.csv.
These files will form the inputs for testing the CSV input source. This exercise does not aim to produce a fully functional CSV parser, so the tests check that the classes process the values correctly rather than that every type of CSV file can be handled.
Note
While the first two CSV files contain valid data, the third CSV file contains a number of invalid pieces of data, for example character and zero-length values.
- Collapse the resources folder and expand the src/unit folder.
- Expand the exercise.importingData package.
- Open the CompareNumberWithFiveCsvProcessorTest class to display its contents.
- Right click over the Java code and select Run As > JUnit Test from the context menu.
This runs the tests contained in the CompareNumberWithFiveCsvProcessorTest class against the CompareNumberWithFiveCsvProcessor class that has been created in this exercise.
If the tests are successful, confirming that your code has been correctly implemented, the JUnit bar in the top left of the screen will go green.
- Examine the testProcessDataWithFirstFile and testProcessDataWithSecondFile methods.
These methods test that the processData method returns the same values that would be expected when processing the data contained in the CSV files using the knowledge model through the Behaviour Authoring Environment. The tests check that the comparison works for a range of integers including negative numbers and zero as well as those greater than, less than and equal to five.
- Examine the testProcessDataWithInvalidFile method.
This method tests that the processData method is able to handle the unexpected values contained in the third CSV file. The CompareNumberWithFiveTest3.csv file contains values that are strings, values containing special characters and zero-length values. The test checks that when these invalid values are provided, the situations are left in the Decision node and the knowledge model does not attempt to categorise them incorrectly. The test also checks that valid values interleaved with the invalid ones are handled correctly.
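The expected outcome for a mixture of valid and invalid values can be pictured with a small simulation. This is only a sketch under stated assumptions: in the exercise it is the knowledge model that leaves invalid situations in the Decision node, whereas the hypothetical InvalidValueSketch class below imitates that outcome by catching the parse failure itself.

```java
import java.util.LinkedHashMap;
import java.util.Map;

class InvalidValueSketch {

    // Returns the simulated result node for one CSV field.
    static String classifyField(String field) {
        try {
            int number = Integer.parseInt(field.trim());
            if (number > 5) {
                return "Greater";
            }
            if (number < 5) {
                return "Lesser";
            }
            return "Equal";
        } catch (NumberFormatException e) {
            // Strings, special characters and zero-length values are not
            // categorised; they stay in the (simulated) Decision node.
            return "Decision";
        }
    }

    static Map<String, String> processLine(String csvLine) {
        Map<String, String> results = new LinkedHashMap<>();
        // A limit of -1 keeps trailing zero-length fields.
        String[] fields = csvLine.split(",", -1);
        for (int position = 0; position < fields.length; position++) {
            results.put(String.valueOf(position), classifyField(fields[position]));
        }
        return results;
    }

    public static void main(String[] args) {
        // Prints {0=Greater, 1=Decision, 2=Decision, 3=Equal}
        System.out.println(processLine("7,abc,,5"));
    }
}
```

Valid values interleaved with invalid ones keep their own positions and results, which is the property the test checks.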
The decision to leave invalid situations in the Decision node was made purely for simplicity. The invalid situations could just as easily have been placed in an Error node had that behaviour been introduced into the knowledge model. You may like to implement and test such behaviour.
- Close Eclipse.
Platform: all
EBE Version: 2.4
Category: Development Training Guide
Author: Patrick Peisker