"How to create TOS components" Tutorial : Part 6

Component Creation - Part 6

Connectors allow interaction between components. In this tutorial we will focus on "flow" connectors, which transfer data (records).

Components can be linked in different ways : they can be activated by a trigger event, such as "on Subjob Ok" or "on component error", or they can respond to a data flow.
As far as data connections are concerned, a component can have a number of incoming connections and/or a number of outgoing connections.
There are basically three different cases :
1) Input Component : The component has only outgoing connections
2) Output Component : The component has only incoming connections
3) Processing Component : The component has both incoming and outgoing connections

When creating a component we normally know upfront in which of these categories it will fall, so we are normally able to specify which kind of connections should be accepted and if a maximum or minimum number of them is set as a limit.

For this reason, the XML file lets you specify four optional parameters for a data connector : MAX_INPUT, MIN_INPUT, MAX_OUTPUT and MIN_OUTPUT.

    Data connectors are defined in the XML file as ctype="FLOW", and the minimum/maximum number of connections can be specified for both incoming and outgoing connections.
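
To make the three cases and the four limits more concrete, here is a minimal Java sketch. It is not Talend code: the class FlowLimitsSketch, the UNSET marker and both methods are hypothetical names made up for this illustration, and it assumes that a maximum of 0 is how a direction gets forbidden.

```java
/** Hypothetical helper, not part of the Talend API: it only models the limits described above. */
final class FlowLimitsSketch {
    /** Assumed marker for a limit that is not set in the XML file. */
    static final int UNSET = -1;

    /** Classifies a component from the maximum number of data connections it accepts. */
    static String classify(int maxInput, int maxOutput) {
        boolean acceptsInput = maxInput != 0;   // assumption: 0 forbids incoming data connections
        boolean acceptsOutput = maxOutput != 0; // assumption: 0 forbids outgoing data connections
        if (acceptsInput && acceptsOutput) return "Processing Component";
        if (acceptsOutput) return "Input Component";
        if (acceptsInput) return "Output Component";
        return "No data connections";
    }

    /** Checks a connection count against optional minimum and maximum limits. */
    static boolean withinLimits(int count, int min, int max) {
        boolean aboveMin = (min == UNSET) || count >= min;
        boolean belowMax = (max == UNSET) || count <= max;
        return aboveMin && belowMax;
    }

    public static void main(String[] args) {
        System.out.println(classify(0, UNSET));         // only outgoing connections allowed
        System.out.println(classify(UNSET, UNSET));     // no limits in either direction
        System.out.println(withinLimits(2, UNSET, 1));  // two connections against a maximum of one
    }
}
```

In this sketch, a connector with no limits at all is reported as a processing component, simply because nothing prevents it from having connections in both directions.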

Currently the XML file describing tTutorial1 defines one connector :


Having one single connector does not mean we have a single connection. In the XML file we define one connector for each connection type we want the component to have, so our tTutorial1 will accept 0 to n incoming and 0 to n outgoing connections.

In order to test this, you need to :
1) create a new empty job
2) add a tTutorial1 and two tLogRow
3) connect the two tLogRow to the output main connection of tTutorial1.

You should not be able to run the job. That's because we never defined which data should be sent to the output.
Think about it : there is no input, nor did we define a data structure for tTutorial1, so which kind of data should be sent to the output?
Let's assume now that our component will produce the data, acting as an input component. To enable this behavior we need to define its metadata, that is, the schema of the data it will generate.
This can be achieved by adding a SCHEMA_TYPE parameter in the XML file :


    Input components must define a parameter of type FIELD="SCHEMA_TYPE". This defines the output metadata.

The defined schema will be automatically attributed to the outgoing data connections.

Push the updated component to the palette and try again to execute the job. This time it will run without error; however, all records sent to the tLogRow components will be null.

Data for the output will be generated by the Java code of the component, so let's write it.

Our component already has a cycle between the begin and end sections. The actual data output happens in the main section, even if that section does not exist yet.
Basically, each time we enter the end section, data is sent to the output connections.
Before that happens, for each record, we want to load a record buffer (one for each outgoing connection) with values for each field defined in the schema.
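
As a rough picture of what "loading a record buffer" means, here is a small self-contained Java sketch. The class RowBuffer, its two String fields and the method loadBuffers are hypothetical and only assume a schema with two string columns; they are not the classes generated by the studio.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical illustration of record buffers; none of these names come from Talend. */
final class RecordBufferSketch {
    /** Assumed record buffer for a schema with two String columns. */
    static final class RowBuffer {
        String field1;
        String field2;
    }

    /** Fills one buffer per outgoing connection for a single record. */
    static List<RowBuffer> loadBuffers(int outgoingConnections, String value) {
        List<RowBuffer> buffers = new ArrayList<>();
        for (int c = 0; c < outgoingConnections; c++) {
            RowBuffer row = new RowBuffer();
            row.field1 = value; // plain field assignment, one per schema column
            row.field2 = value;
            buffers.add(row);
        }
        return buffers;
    }

    public static void main(String[] args) {
        for (RowBuffer row : loadBuffers(2, "xxxx")) {
            System.out.println(row.field1 + " | " + row.field2);
        }
    }
}
```

The point of the sketch is only that every outgoing connection gets its own buffer, and every field of that buffer has to be assigned before the record is sent.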

I hope you already experimented with the tJavaFlex or tJavaRow components, where you assigned values to the record buffer like row1.field1="xxxx";
In the component we do exactly the same thing, but we need first to discover how many "row" connections we have and how they are (uniquely) named.
This is possible in the template, so assuming we will load the buffers in the main section, we need to retrieve this information there :

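<%@ jet imports="
 org.talend.core.model.process.INode
 org.talend.core.model.process.IConnection
 org.talend.core.model.process.EConnectionType
 org.talend.core.model.metadata.IMetadataTable
 org.talend.core.model.metadata.IMetadataColumn
 org.talend.designer.codegen.config.CodeGeneratorArgument
 java.util.List
" %>
<%
CodeGeneratorArgument codeGenArgument = (CodeGeneratorArgument) argument;
INode node = (INode)codeGenArgument.getArgument();
String cid = node.getUniqueName();
List<IMetadataTable> metadatas = node.getMetadataList();
if ((metadatas != null) && (metadatas.size() > 0)) {//b
 IMetadataTable metadata = metadatas.get(0);
 if (metadata != null) {//a
  List<IMetadataColumn> columns = metadata.getListColumns();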

The new lines of Java JET code are used to get the metadata of the component, defined in the SCHEMA_TYPE parameter.
Notice that theoretically it is possible to have more than one SCHEMA_TYPE parameter, so the method getMetadataList() returns a list of IMetadataTable objects, each of which holds a schema.
In our case we know we have only one, so we extract it from the list with the method metadatas.get(0).
Finally we get the list of columns from the schema (metadata).
Now we need to scan all the outgoing connections :

  List< ? extends IConnection> outConns = node.getOutgoingConnections();
  for (IConnection conn : outConns)
  { //2
   if (conn.getLineStyle().equals(EConnectionType.FLOW_MAIN)||conn.getLineStyle().equals(EConnectionType.FLOW_MERGE))
   { //3
    String outputConnName = conn.getName();

With this last part of code we get the list of outgoing connections and cycle through them, storing the name of each one in the string variable outputConnName. This string will hold the "rowxx" name.
We used the line style to filter only the data type connections, and we accepted the FLOW_MAIN and FLOW_MERGE types. The first one is the common data connection, the second is the data connection used in conjunction with a tUnite component.
Please note that we had to import "org.talend.core.model.process.EConnectionType" in order to perform this selection.
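
If you want to try the filtering logic outside the studio, the following plain Java sketch mimics it. The enum LineStyle and the class Connection are local stand-ins written for this example (they are not EConnectionType and IConnection), and the TRIGGER value is a made-up placeholder for any non-data connection.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

/** Stand-alone imitation of the connection filter; it does not use the Talend classes. */
final class ConnectionFilterSketch {
    /** Local stand-in for EConnectionType; TRIGGER is a hypothetical placeholder value. */
    enum LineStyle { FLOW_MAIN, FLOW_MERGE, TRIGGER }

    /** Local stand-in for IConnection, reduced to the two getters used in the template. */
    static final class Connection {
        private final String name;
        private final LineStyle lineStyle;

        Connection(String name, LineStyle lineStyle) {
            this.name = name;
            this.lineStyle = lineStyle;
        }

        String getName() { return name; }
        LineStyle getLineStyle() { return lineStyle; }
    }

    /** Keeps only the names of the data connections (main and merge flows). */
    static List<String> dataConnectionNames(List<Connection> outConns) {
        List<String> names = new ArrayList<>();
        for (Connection conn : outConns) {
            if (conn.getLineStyle() == LineStyle.FLOW_MAIN
                    || conn.getLineStyle() == LineStyle.FLOW_MERGE) {
                names.add(conn.getName());
            }
        }
        return names;
    }

    public static void main(String[] args) {
        List<Connection> outConns = Arrays.asList(
                new Connection("row1", LineStyle.FLOW_MAIN),
                new Connection("OnSubjobOk1", LineStyle.TRIGGER),
                new Connection("row2", LineStyle.FLOW_MERGE));
        System.out.println(dataConnectionNames(outConns));
    }
}
```

Only the two flow connections survive the filter, which is exactly what we want before writing to the record buffers.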

The reason why there are commented numbers, letters etc. when opening and closing blocks is that, when mixing JET code and Java output code, the source code can become really cumbersome to read. One way to make this task easier is to add short comments to the blocks.

The last bit of information we have to deal with is the columns, which we saved in the columns list variable. We need to cycle through them and finally write to the Java output code. For simplicity we will assume that all the columns in the schema are strings, which will be filled with "I am the main section and myvas is "+myvar_<%=cid %>.

    for (int i = 0; i < columns.size(); i++)
    { //4
     IMetadataColumn column = columns.get(i);
%>
   <%=outputConnName %>.<%=column.getLabel() %> = "I am the main section and myvas is "+myvar_<%=cid %>;
<%  } /*4*/ } /*3*/ } /*2*/ } /*a*/ } /*b*/ %>

Notice that in the for cycle "4" we exit from the JET template and enter the Java output code domain to write one line, then we go back to the JET template.
How many lines will be written? That depends on the two cycles "2" and "4" : there will be one line for each column (4) of each outgoing connection (2).
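
To see the effect of the two cycles without installing anything, here is a plain Java imitation of what the template does. The class MainSectionSketch and its generate method are hypothetical, and the values "tTutorial1_1", "row1", "row2", "field1" and "field2" are only assumed examples of a component name, connection names and column labels.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

/** Plain Java imitation of the template loops; it builds the output lines as strings. */
final class MainSectionSketch {
    /** Emits one assignment line per column of each outgoing data connection. */
    static List<String> generate(String cid, List<String> connectionNames, List<String> columnLabels) {
        List<String> lines = new ArrayList<>();
        for (String outputConnName : connectionNames) { // plays the role of cycle "2"
            for (String label : columnLabels) {         // plays the role of cycle "4"
                lines.add(outputConnName + "." + label
                        + " = \"I am the main section and myvas is \"+myvar_" + cid + ";");
            }
        }
        return lines;
    }

    public static void main(String[] args) {
        List<String> lines = generate("tTutorial1_1",
                Arrays.asList("row1", "row2"),
                Arrays.asList("field1", "field2"));
        for (String line : lines) {
            System.out.println(line);
        }
    }
}
```

With two connections and two columns the sketch produces four assignment lines, one per column of each connection, which matches the count described above.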

Hopefully you will be able to set up the main section of tTutorial1 correctly by copying and pasting the code from this tutorial. Once you are done, try to install the component.
If it did not work, don't worry, I packaged the final result and you can download it here

To test the result, you need the tTutorial1 component connected in output to one or more tLogRow, as we did at the beginning of this lesson. You also need to define one or more STRING fields in its schema.
Again, should you have issues with the job, you can download mine here

After executing the job, check its Java code. You will notice that none of the logic related to the discovery of the connections and of the columns was transferred there : that logic is applied only when compiling the template.
This should also help you understand that in Talend it is not possible to have a dynamic schema, since the schema is defined and retrieved only at compilation level and cannot be modified at runtime.
[Edit : Since version 4.1 some dynamic schema functionality has been added in a few Database components, supported in TIS only; I have not tried it so far]

There are plenty of things still to be discussed about connections and metadata. The exercise we did in this lesson is still a bit far from a real life example; however, it is crucial that you completely understand the different steps, as they will be used in basically all components you will create.
The next lesson will focus in more detail on metadata. We will also see how to pre-define a schema for the component so that we can be sure of which fields are there.
As usual, if anything was not clear or you want to provide some feedback, feel free to contact us using the feedback module.
