"How to create TOS components" Tutorial : Part 8

Component Creation - Part 8

We previously created an input component and we learnt how to deal with metadata.
We saw how to send records to the outgoing connections and how to set values for the various fields.

This time we will focus first on reading from an incoming connection; then (in the next lesson) we will expand the exercise by applying some simple transformations to the incoming data and sending it out to the outgoing connections.

As we did before, copy your tTutorial1 folder with its content into tTutorial8. The component will start as an output component and will later become a processing component, so we use a "generic" name: "8" is simply the progressive lesson number within the tutorial.
Rename all the files as we did in the previous lessons, so that they belong to the brand new tTutorial8 component. Finally, refresh the component_project in the designer and install the component to the palette, just to make sure everything works.

Clean up the XML descriptor, then add the NB_LINE return parameter as we did in our last component, together with the related Java code in the begin and end sections.
If you want, you can download the starting point here, to make sure we start with exactly the same code.

Output components

Components that receive data (excluding those that multiply records, which are not in the scope of this lesson because there is a specific methodology for them) do not have a loop between the begin and end sections, because the record loop is already handled by the startable component at the beginning of the subjob.
If this is unclear, go back to Part 2 of this tutorial and review how the different components interact when linked together in a subjob.
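To picture this, here is a small plain-Java model of a subjob "tRowGenerator_1 --row1--> tTutorial8_1". It is only a sketch, assuming the usual layout of the generated job code (the downstream component's begin section first, then the begin section of the startable component, which opens the record loop); the component names and the trace strings are made up for illustration, and the real generated code is much longer.

```java
import java.util.ArrayList;
import java.util.List;

// Simplified model (not real Talend output) of the code generated for the subjob
// tRowGenerator_1 --row1--> tTutorial8_1. Names and trace strings are illustrative.
class SubjobNestingSketch {

    // Returns the order in which the component sections run for a given number of rows.
    static List<String> runSubjob(int rowsToGenerate) {
        List<String> trace = new ArrayList<>();

        // begin sections: the downstream component is prepared first (e.g. opens its file)...
        trace.add("tTutorial8_1 begin");
        // ...then the startable component's begin section opens the record loop
        trace.add("tRowGenerator_1 begin");
        for (int i = 0; i < rowsToGenerate; i++) {
            trace.add("tRowGenerator_1 main row " + i);
            // the downstream main section runs once per record, inside a loop it does not own
            trace.add("tTutorial8_1 main row " + i);
        }
        // after the loop: the startable component's end section, then the downstream one
        trace.add("tRowGenerator_1 end");
        trace.add("tTutorial8_1 end");
        return trace;
    }

    public static void main(String[] args) {
        runSubjob(2).forEach(System.out::println);
    }
}
```

In the trace, tTutorial8_1 never loops by itself: its main section simply runs inside the loop owned by tRowGenerator_1.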

    A component that needs to read data from an incoming connection (an output or processing component) cannot be set as startable

So, let's modify the XML descriptor accordingly: in the HEADER section we need to set STARTABLE="false".
Second, since we want the component to be an OUTPUT component only, we need to specify which connections are allowed; this is also done in the XML descriptor, in the CONNECTORS section.
Our component will read data from a connection and write it to a text file in XML format (cool, no?), so it must have one (and only one) input connection, to receive the data, and zero output connections: on the FLOW connector this means MAX_INPUT="1" and MAX_OUTPUT="0".

<?xml version="1.0" encoding="UTF-8"?>

Do we need a SCHEMA_TYPE parameter? No, we don't: our component is not going to have any metadata of its own.
This may sound confusing, but if you think about it, the only metadata involved here is the one defined on the incoming connection, that is, the metadata of the component sending us the records.
To start, let's plan what we want to achieve :
The component will create a text XML file whose root element is <TABLE>; this element will contain two elements: <METADATA> and <DATA>.
<METADATA> will contain a list of fields and <DATA> a list of records.
We can assume that all records and all fields will be processed, with no transformations performed; column names and types will be those defined in the incoming connection.
Does it sound reasonable? I think it should be ok for the exercise.

I assume (and hope) that at this point you know how to read the outputFile variable in the template of the begin javajet section and how to assign it a nice label in the messages.properties file, so go ahead and do what's needed; I'll wait for you here :).
Once that is done, add the following Java output code to the begin section:

int nb_line_<%=cid %> = 0;
// write through an explicit UTF-8 writer, so the file really uses the encoding declared in the header
java.io.Writer fw_<%=cid %> = new java.io.OutputStreamWriter( new java.io.FileOutputStream( <%=outputFile %> ), "UTF-8" );
java.io.PrintWriter pw_<%=cid %> = new java.io.PrintWriter( fw_<%=cid %> );
pw_<%=cid %>.println("<?xml version=\"1.0\" encoding=\"UTF-8\"?>"); // default xml header
pw_<%=cid %>.println("<TABLE>"); // root element
pw_<%=cid %>.println(" <METADATA>");
pw_<%=cid %>.println(" </METADATA>");
pw_<%=cid %>.println(" <DATA>");

In the end section:

pw_<%=cid %>.println(" </DATA>");
pw_<%=cid %>.println("</TABLE>");
pw_<%=cid %>.close(); // flushes and closes the underlying file
globalMap.put("<%=cid %>_NB_LINE",nb_line_<%=cid %>); // publish the NB_LINE return value

In the begin section, the code above creates and opens the output text file and writes the default XML header; in the end section we close the file and store the NB_LINE count in globalMap.
I also added the XML sections we described before, as a skeleton for our output file.
Just to make sure we did not make any typos, let's install the component and test it (you need to specify an output file before running the job).
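If you want to check the skeleton logic outside Talend, the following plain-Java sketch performs what the begin and end sections do, with no records in between. Assumptions: "tTutorial8_1" is a made-up unique name, a local HashMap stands in for Talend's globalMap, and the file is opened with an explicit UTF-8 writer so its bytes match the encoding declared in the XML header (java.io.FileWriter, by contrast, writes in the platform default encoding).

```java
import java.io.IOException;
import java.io.OutputStreamWriter;
import java.io.PrintWriter;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.HashMap;
import java.util.Map;

// Plain-Java sketch of the begin and end sections run back to back (no records yet).
// CID is an assumed unique component name; globalMap is a local stand-in for Talend's map.
class SkeletonSketch {

    static final String CID = "tTutorial8_1";

    // Opens the output file with an explicit UTF-8 encoding, matching the XML header.
    static PrintWriter openUtf8(Path file) throws IOException {
        return new PrintWriter(new OutputStreamWriter(Files.newOutputStream(file), StandardCharsets.UTF_8));
    }

    // Begin section: header plus the opening part of the skeleton; returns the initial NB_LINE.
    static int begin(PrintWriter pw) {
        pw.println("<?xml version=\"1.0\" encoding=\"UTF-8\"?>"); // default xml header
        pw.println("<TABLE>");                                     // root element
        pw.println(" <METADATA>");
        pw.println(" </METADATA>");
        pw.println(" <DATA>");
        return 0;
    }

    // End section: closes the skeleton and the file, then publishes NB_LINE.
    static void end(PrintWriter pw, int nbLine, Map<String, Object> globalMap) {
        pw.println(" </DATA>");
        pw.println("</TABLE>");
        pw.close();
        globalMap.put(CID + "_NB_LINE", nbLine);
    }

    public static void main(String[] args) throws IOException {
        Path file = Files.createTempFile("tTutorial8", ".xml");
        Map<String, Object> globalMap = new HashMap<>();
        PrintWriter pw = openUtf8(file);
        int nbLine = begin(pw);
        // (the main section would run here once per incoming record, incrementing nbLine)
        end(pw, nbLine, globalMap);
        System.out.print(new String(Files.readAllBytes(file), StandardCharsets.UTF_8));
        System.out.println(CID + "_NB_LINE = " + globalMap.get(CID + "_NB_LINE"));
    }
}
```

Splitting begin and end into two methods mirrors the two javajet sections; the record handling we add later in the main section fits between the two calls.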

Noticed anything strange? If everything went well, you got no error message; however, your file was NOT created.
Why? Remember that our component is defined as STARTABLE="false": on its own it does not start a subjob, so its code is not executed. Add a tRowGenerator to your job, define a simple schema for it, connect its row output to tTutorial8 and run the job again. This time your file should be there, containing the XML header.

OK, the next step is to write the METADATA block. We need to do it only once, before outputting the records, so the best place for it is the begin section, where we already write the header and the skeleton.

To retrieve the metadata of the incoming connection, we need to define an "IMetadataTable preMetadata" variable in the JET code and set it by getting the metadata of the first (and only, since we specified MAX_INPUT="1" in the FLOW connector) incoming connection:

IMetadataTable preMetadata = null;
if ((node.getIncomingConnections()!=null)&&(node.getIncomingConnections().size()>0))
 preMetadata = node.getIncomingConnections().get(0).getMetadataTable();
// ...

Place this code between the lines that output <METADATA> and </METADATA>; then we need to loop over all the columns, which you should know how to do by now, since we did it in the previous lesson.

pw_<%=cid %>.println(" <METADATA>");
<%
IMetadataTable preMetadata = null;
if ((node.getIncomingConnections()!=null)&&(node.getIncomingConnections().size()>0))
 preMetadata = node.getIncomingConnections().get(0).getMetadataTable();
List<IMetadataColumn> columns = preMetadata.getListColumns();
for (int i = 0; i < columns.size(); i++) {
 IMetadataColumn column = columns.get(i);
%>
// ....we will place some java output code here
<%
} // end of the template loop over the columns
%>
pw_<%=cid %>.println(" </METADATA>");

For each column in the list (by the way, you need "java.util.List" in the imports) we need to write a COLUMN element in our metadata, and for each column we will write the name, the type and the length.
Note: the IMetadataColumn implementations can provide a great deal of useful information; please refer to the javadoc here. We will not take advantage of all the possible and extremely interesting attributes (e.g. isKey, isNullable, etc.) because this is just an exercise, but feel free to expand the metadata section of the component if you like.

The needed java output code is something like the following :

pw_<%=cid %>.println(" <COLUMN>");
pw_<%=cid %>.println(" <NAME><%=column.getLabel() %></NAME>");
pw_<%=cid %>.println(" <TYPE><%=column.getTalendType() %></TYPE>");
pw_<%=cid %>.println(" <LENGTH><%=column.getLength() %></LENGTH>");
pw_<%=cid %>.println(" </COLUMN>");
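Keep in mind that the loop over the columns runs in the template, at code-generation time: the generated job contains one block of println calls per column, with the labels already filled in. The following sketch mimics that expansion in plain Java; Column is a hypothetical stand-in for IMetadataColumn holding only a label, a Talend type and a length, and the type ids used in main (id_String, id_Integer) are just sample values.

```java
import java.util.ArrayList;
import java.util.List;

// Mimics, in plain Java, what the JET template does at code-generation time:
// the column loop runs in the generator and emits Java source lines with the
// labels already substituted. Column is a hypothetical stand-in for IMetadataColumn.
class MetadataExpansionSketch {

    static final class Column {
        final String label;
        final String talendType;
        final int length;

        Column(String label, String talendType, int length) {
            this.label = label;
            this.talendType = talendType;
            this.length = length;
        }
    }

    // Returns the generated source lines for the METADATA block of component "cid".
    static List<String> expand(String cid, List<Column> columns) {
        String pw = "pw_" + cid;
        List<String> out = new ArrayList<>();
        out.add(pw + ".println(\" <METADATA>\");");
        for (Column column : columns) { // template-side loop: one block per column
            out.add(pw + ".println(\" <COLUMN>\");");
            out.add(pw + ".println(\" <NAME>" + column.label + "</NAME>\");");
            out.add(pw + ".println(\" <TYPE>" + column.talendType + "</TYPE>\");");
            out.add(pw + ".println(\" <LENGTH>" + column.length + "</LENGTH>\");");
            out.add(pw + ".println(\" </COLUMN>\");");
        }
        out.add(pw + ".println(\" </METADATA>\");");
        return out;
    }

    public static void main(String[] args) {
        List<Column> columns = List.of(
                new Column("newColumn", "id_String", 10),   // sample values
                new Column("newColumn2", "id_Integer", 5));
        expand("tTutorial8_1", columns).forEach(System.out::println);
    }
}
```

This is why the generated job has no loop over the columns: by the time the job runs, the loop has already been unrolled into plain println statements.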

Notice that I used the Talend type instead of the Java type; if you check what we did in the previous lesson, you should be able to output the Java type instead, if you wish.
Install the component, set the length for the columns in the tRowGenerator and test it: you should get a beautifully generated XML file with the metadata of the input connection. It was easy, no?

Now we need to deal with the incoming records; in an output or processing component this can only be done in the main section.

    An output or processing component receives records in the main section, which is executed once for each incoming record

The main section is going to be quite simple. This time we will not reuse much from the previous component, because this one is really different:

[ java jet imports ]
<%
CodeGeneratorArgument codeGenArgument = (CodeGeneratorArgument) argument;
INode node = (INode)codeGenArgument.getArgument();
String cid = node.getUniqueName();
IMetadataTable preMetadata = null;
if ((node.getIncomingConnections()!=null)&&(node.getIncomingConnections().size()>0))
 preMetadata = node.getIncomingConnections().get(0).getMetadataTable();
String inRowName = node.getIncomingConnections().get(0).getName();
%>
pw_<%=cid %>.println("  <RECORD>");
<%
List<IMetadataColumn> columns = preMetadata.getListColumns();
for (int i = 0; i < columns.size(); i++) {
 IMetadataColumn column = columns.get(i);
%>
pw_<%=cid %>.println("  <<%=column.getLabel() %>>"+<%=inRowName %>.<%=column.getLabel() %>+"</<%=column.getLabel() %>>");
<%
} // end of the template loop over the columns
%>
pw_<%=cid %>.println("  </RECORD>");
nb_line_<%=cid %>++;

As you can see, we get preMetadata exactly the same way we did in the begin section, and we also get the name of the incoming connection (inRowName).
Then, before looping over the columns, we write (as Java output code) the <RECORD> tag, and after the loop the closing </RECORD> tag; finally nb_line is incremented to count the processed records.
The key line is :

pw_<%=cid %>.println("  <<%=column.getLabel() %>>"+<%=inRowName %>.<%=column.getLabel() %>+"</<%=column.getLabel() %>>");

This line outputs the value of the column field. Let's imagine that the incoming connection is called "row1" and that the columns are called newColumn and newColumn2: in the first iteration of the loop, column.getLabel() returns "newColumn" and inRowName is "row1", so the generated Java output line is:

pw_tTutorial8_1.println("  <newColumn>"+row1.newColumn+"</newColumn>");

Install the updated component and run the job, check that the XML output file is generated correctly, then look at the job's generated Java code to see how the main section of the component has been rendered.
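To see what those rendered lines do at run time, here is a hand-written equivalent for a connection row1 with the two columns of the example. row1Struct is a simplified, hypothetical stand-in for the row class Talend generates, the sample values are made up, and the for loop plays the role of the startable component's record loop.

```java
import java.io.PrintWriter;
import java.io.StringWriter;
import java.util.List;

// Hand-written equivalent of the rendered main section for a connection "row1" with
// columns newColumn and newColumn2. row1Struct is a simplified, hypothetical stand-in
// for the row class Talend generates; the loop plays the startable component's role.
class MainSectionSketch {

    static final class row1Struct {
        String newColumn;
        Integer newColumn2;

        row1Struct(String newColumn, Integer newColumn2) {
            this.newColumn = newColumn;
            this.newColumn2 = newColumn2;
        }
    }

    // Writes one RECORD element per row and returns the NB_LINE count.
    static int writeRecords(List<row1Struct> rows, PrintWriter pw_tTutorial8_1) {
        int nb_line_tTutorial8_1 = 0;
        for (row1Struct row1 : rows) { // record loop owned by the startable component
            // --- tTutorial8_1 main section, as rendered by the template ---
            pw_tTutorial8_1.println("  <RECORD>");
            pw_tTutorial8_1.println("  <newColumn>" + row1.newColumn + "</newColumn>");
            pw_tTutorial8_1.println("  <newColumn2>" + row1.newColumn2 + "</newColumn2>");
            pw_tTutorial8_1.println("  </RECORD>");
            nb_line_tTutorial8_1++;
        }
        return nb_line_tTutorial8_1;
    }

    public static void main(String[] args) {
        StringWriter buffer = new StringWriter();
        PrintWriter pw = new PrintWriter(buffer);
        int nbLine = writeRecords(List.of(new row1Struct("a", 1), new row1Struct("b", 2)), pw);
        pw.flush();
        System.out.print(buffer);
        System.out.println("NB_LINE = " + nbLine);
    }
}
```

Like the template, the sketch writes field values as-is: a null field is printed as the text null, and a value containing < or & would make the resulting XML invalid.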

This ends this lesson; however, we said at the beginning that we would transform this component into a processing one. I will give you a few hints so you can try to perform this task on your own, then we will check it together in the next lesson.
As usual, you can download my version of the component here.

A processing component

Basically, a processing component sits in the middle of a subjob, which means it has both input and output connections.
That's your first hint: update the XML descriptor accordingly.
Also, if you have output connections, you need a SCHEMA_TYPE parameter.
You will not need a loop to output the records; instead, for each record received from the input, you will output one record to the output connection.
Most processing components assume that the input and output schemas are compatible, meaning that the column in position X of the input can be copied into the column in position X of the output connection (which translates to "input and output have the same number of columns, and their columns have the same or compatible types").

What you need to do is:
- Get inRowName (we already have it)
- Get outRowName
- Get the input and output metadata and, for each position, the matching inputColumn and outputColumn
- Set <%=outRowName %>.<%=outputColumn.getLabel() %> = <%=inRowName %>.<%=inputColumn.getLabel() %>
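One possible reading of the last hint, written as a tiny generator in plain Java: given the two row names and the column labels of the input and output metadata, it emits one assignment per position. The row names and labels are hypothetical examples, plain String lists stand in for the IMetadataTable columns, and only the column count is checked here (type compatibility is left to you).

```java
import java.util.ArrayList;
import java.util.List;

// Sketch of the positional copy: for each position X it emits
// "<outRowName>.<outLabel> = <inRowName>.<inLabel>;". String lists stand in for
// the input/output metadata; all names are hypothetical examples.
class PositionalCopySketch {

    static List<String> generateCopy(String inRowName, List<String> inLabels,
                                     String outRowName, List<String> outLabels) {
        // the hint assumes compatible schemas: same number of columns (types are not checked here)
        if (inLabels.size() != outLabels.size()) {
            throw new IllegalArgumentException("input and output schemas have different column counts");
        }
        List<String> lines = new ArrayList<>();
        for (int i = 0; i < outLabels.size(); i++) {
            // column i of the output receives column i of the input
            lines.add(outRowName + "." + outLabels.get(i) + " = " + inRowName + "." + inLabels.get(i) + ";");
        }
        return lines;
    }

    public static void main(String[] args) {
        generateCopy("row1", List.of("newColumn", "newColumn2"),
                     "row2", List.of("name", "age"))
                .forEach(System.out::println);
    }
}
```

In the real component the loop would live in the javajet template, so the generated main section contains only the resulting assignment lines.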

Notice that your output connection will carry one record for each record received from the input. This may look like a limitation of Talend, but it comes from the design (the main section is executed once per record, triggered by the loop of the component that starts the subjob); there is, however, a simple workaround to "silence" the output connection when needed, which we will see in the next lesson.
