PowerUp
"How to create TOS components" Tutorial : Part 9

Component Creation - Part 9

Hello, if you arrived here after lesson 8, I hope you managed to complete your assignment.
You were asked to transform the tTutorial8 component we created together into a processing component, having both input and output connections.
If you did not manage to get it done (shame on you if it was only out of laziness! :) ), don't worry, I am going to do the exercise now because I need to show you a couple of things.

As we need both input and output connections (1 input, n outputs) we need to set the FLOW CONNECTOR parameter accordingly in the XML descriptor :

<CONNECTOR CTYPE="FLOW" MIN_INPUT="1" MAX_INPUT="1" MIN_OUTPUT="1"/>



We basically require at least one output, which will have a schema so we also add :

<PARAMETER NAME="SCHEMA" FIELD="SCHEMA_TYPE" REQUIRED="true" NUM_ROW="1"/>



I am going to install these changes because I want to show you something, so either use the component you modified for the exercise, or modify the one from the previous lesson (which can be downloaded here) by adjusting the XML descriptor as explained above.

Create a new empty project, add the tTutorial8 component, then a tRowGenerator and a tLogRow, but DO NOT connect them just yet.
Define a simple schema for the tRowGenerator, then connect it to tTutorial8.
If you now inspect the schema of tTutorial8, you will see that it inherited by default the schema of its input component (tRowGenerator). Connecting the output of tTutorial8 to the tLogRow will have the same effect on it.
In most cases this is ok as most of the components will not alter the schema within the subjob, so having it filled by default is normally helpful.
This behavior is controlled by a HEADER PARAMETER called SCHEMA_AUTO_PROPAGATE which defaults to "true" if omitted.


    The SCHEMA_AUTO_PROPAGATE header parameter (which defaults to "true") enables a component that contains a schema to inherit by default the schema of the incoming connection


To see the difference, in your XML descriptor file add SCHEMA_AUTO_PROPAGATE="false" in the header, then :
  • Run the job so that the component is reloaded
  • Disconnect tRowGenerator from tTutorial8
  • Clear the schema in the tTutorial8 component
  • Reconnect tRowGenerator to tTutorial8
This time the schema of tTutorial8 should be empty, as visible in the picture here on the side.


You can set SCHEMA_AUTO_PROPAGATE="true" now, as it's normally quite handy.

Now, let's suppose our input and output schemas are identical or at least compatible. In your exercise you were supposed to copy values from the input connection to the output one.
You probably (and hopefully) achieved the task using some Java code in the main section of the component. However, there is another header parameter that can help in the (quite rare) cases where data is not modified at all : DATA_AUTO_PROPAGATE="true".


    The DATA_AUTO_PROPAGATE header parameter instructs the component to copy all the field values received in the incoming connection to the outgoing connections


Try setting it to true and to false, and check whether tLogRow in your sample job receives the values (temporarily disable the Java code that does it if you are using the component from your previous assignment).
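
To picture what DATA_AUTO_PROPAGATE="true" saves you from writing, here is a minimal plain-Java sketch of the field-by-field copy. The struct classes and the id and name columns are hypothetical stand-ins chosen for illustration; this is not the code TOS actually generates.

```java
// Hypothetical stand-ins for the classes generated for an input connection (row1)
// and an output connection (row2) that share the same schema.
class row1Struct {
    public Integer id;
    public String name;
}

class row2Struct {
    public Integer id;
    public String name;
}

class DataPropagateSketch {
    // Field-by-field copy from the incoming row to the outgoing row:
    // the work a component has to do itself when data is not propagated for it.
    static row2Struct copy(row1Struct in) {
        row2Struct out = new row2Struct();
        out.id = in.id;
        out.name = in.name;
        return out;
    }

    public static void main(String[] args) {
        row1Struct row1 = new row1Struct();
        row1.id = 7;
        row1.name = "seven";
        row2Struct row2 = copy(row1);
        System.out.println(row2.id + " " + row2.name);
    }
}
```

With the parameter set to "false", a copy of this kind has to come from the Java code your own template emits.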

Now make sure you have an outgoing connection from tTutorial8, something like this :



In my case the outgoing connection is called "row5".
Please check the java code of the job, and search for row5Struct (substitute row5 with the name of your connection).
You will find that the studio automatically created a class with that name, with all the fields you defined as properties and a few getter/setter methods.
Normally you don't need to care much about those rowxStruct classes. They definitely help you understand how TOS moves data from one component to another, but you almost never need to use them explicitly.

    TOS generates a rowxStruct class for each rowx connection. The rowx object is an instance of the rowxStruct class.
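
As a rough idea of the shape of such a class, here is a simplified sketch for a connection named "row5". The id and name columns are invented for the example, and the class the studio really generates contains more code than this, so treat it as an illustration only.

```java
// Simplified, hypothetical version of a generated struct for a connection "row5".
// Each schema column becomes a public field, with a getter next to it.
class row5Struct {
    public Integer id;
    public String name;

    public Integer getId() {
        return this.id;
    }

    public String getName() {
        return this.name;
    }
}

class Row5StructDemo {
    // Small helper (not part of any generated code) that reads the row through its getters.
    static String describe(row5Struct row) {
        return row.getId() + "|" + row.getName();
    }

    public static void main(String[] args) {
        // The "row5" object is simply an instance of the row5Struct class.
        row5Struct row5 = new row5Struct();
        row5.id = 1;
        row5.name = "first";
        System.out.println(describe(row5));
    }
}
```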


There is, however, at least one case where you need to rebuild the rowx object with a rowx = new rowxStruct(); command: when you have destroyed it and need to recreate it.
Why would you want to destroy it in the first place?
Remember when we said that one limitation of a processing component is that, in the main section, it sends out one and only one record for each record it receives from the incoming connection?
That's generally true unless the outgoing rowx connection object is null (it is "destroyed").

    Setting a rowx object to null means no record will be generated for it


Beware: simply not setting the values of the fields will still generate a record if rowx is a valid rowxStruct; it will just have null values in all the fields.
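
The difference between an empty rowx and a null rowx can be simulated in plain Java. This sketch assumes that the rest of the subjob only handles a row when the object is not null; the downstream method and the value column are hypothetical and only mimic that behaviour.

```java
import java.util.ArrayList;
import java.util.List;

class NullRowDemo {
    // Hypothetical stand-in for a generated struct with a single column.
    static class rowxStruct {
        public String value;
    }

    // Mimics the following component: it only receives a record when rowx is not null.
    static void downstream(rowxStruct rowx, List<String> received) {
        if (rowx != null) {
            received.add(String.valueOf(rowx.value));
        }
    }

    static List<String> run() {
        List<String> received = new ArrayList<>();

        rowxStruct rowx = new rowxStruct(); // valid object, fields never set
        downstream(rowx, received);         // a record is still produced, holding a null value

        rowx = null;                        // "destroyed" object
        downstream(rowx, received);         // nothing is produced

        return received;
    }

    public static void main(String[] args) {
        System.out.println(run().size());
    }
}
```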

So, to use this feature, you need some kind of condition in your component that decides, for each record received from the incoming connection, whether or not to send a record to the outgoing rowx connection.
We are going to implement a simple example : we will output only records whose count (nb_line_<%=cid %>) is even.
To achieve that we need to :
  • 1 - For each record, destroy or build the rowx object
  • 2 - In the input metadata cycle, if we need to output the record, assign the values with rowx.columnZ = rowy.columnZ
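
Before reading the template code, it may help to see the two steps as the plain Java they are meant to produce. This is only a sketch under stated assumptions: row1 and row2 are hypothetical connection names, id is an invented column, the counter is assumed to be incremented once per incoming record, and the final null check stands in for the rest of the subjob.

```java
import java.util.ArrayList;
import java.util.List;

class EvenRecordFilterSketch {
    // Hypothetical stand-ins for the generated classes of the input (row1)
    // and output (row2) connections.
    static class row1Struct { public int id; }
    static class row2Struct { public int id; }

    // Feeds ids 1..records through the filter and returns the ids that reach the output flow.
    static List<Integer> run(int records) {
        List<Integer> received = new ArrayList<>();
        int nb_line = 0;
        row2Struct row2;
        for (int id = 1; id <= records; id++) {
            row1Struct row1 = new row1Struct();
            row1.id = id;
            nb_line++; // assumption: incremented once per incoming record

            // Step 1: build or destroy the output object for this record
            if (nb_line % 2 == 0) {
                row2 = new row2Struct();
            } else {
                row2 = null;
            }

            // Step 2: assign the values only when the record has to be sent out
            if (row2 != null) {
                row2.id = row1.id;
            }

            // Stand-in for the rest of the subjob, which only sees non-null rows
            if (row2 != null) {
                received.add(row2.id);
            }
        }
        return received;
    }

    public static void main(String[] args) {
        System.out.println(run(6));          // prints [2, 4, 6]
        System.out.println(run(100).size()); // prints 50
    }
}
```

In the real component the names row1 and row2 are not hard-coded: the template has to look up the connection names and emit these statements for each outgoing data connection.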




...
String outRowName = "";
List< ? extends IConnection> outConns = node.getOutgoingConnections();
%>
pw_<%=cid %>.println(" <RECORD>");

<%
   for (IConnection conn : outConns)
   { //oc1 Open
    if (conn.getLineStyle().equals(EConnectionType.FLOW_MAIN)||conn.getLineStyle().equals(EConnectionType.FLOW_MERGE))
    { //oc2 Open
     outRowName = conn.getName();
%>

  if (nb_line_<%=cid %> % 2 == 0)
   <%=outRowName %> = new <%=outRowName %>Struct();
  else
   <%=outRowName %> = null;

<%
    } //oc2 Close
   } //oc1 Close

List<IMetadataColumn> columns = preMetadata.getListColumns();

....



We are obviously working in the main section of the component. Notice that in the code above I added two variables : a String to hold the output connection name and a List to cycle through all the existing outgoing connections.
Then, at the beginning of the record, I cycle through all those outgoing connections, filtering only the data ones, I get the name of the "rowx" object and finally, in Java output code (records are counted only in Java output code!), I conditionally create or destroy the rowx object.
Next comes the output part, where we assign values to the fields of the output connection IF the record has to be generated (otherwise we would try to set a property of a NULL object!!).



...
List<IMetadataColumn> columns = preMetadata.getListColumns();

for (int i = 0; i < columns.size(); i++)
{ // column cycle
 IMetadataColumn column = columns.get(i);
%>

 if (nb_line_<%=cid %> % 2 == 0) // or also test if <%=outRowName %> is not null
 { // even
<% 
   for (IConnection conn : outConns)
   { //oc1 Open
    if (conn.getLineStyle().equals(EConnectionType.FLOW_MAIN)||conn.getLineStyle().equals(EConnectionType.FLOW_MERGE))
    { //oc2 Open
     outRowName = conn.getName();
%>

 <%=outRowName %>.<%=column.getLabel() %> = <%=inRowName %>.<%=column.getLabel() %>;

<%
    } //oc2 Close
   } //oc1 Close
%>

 } // even
pw_<%=cid %>.println("  <<%=column.getLabel() %>>"+<%=inRowName %>.<%=column.getLabel() %>+"</<%=column.getLabel() %>>");
<%
} // column cycle
%>

pw_<%=cid %>.println(" </RECORD>");
....




We assumed again that the metadata is identical between the input and the output, so we did not need to cycle through the output metadata; we simply cycled through the various connections and assigned the values to the fields (column names retrieved from the input metadata).
Notice that the record will be written to the output XML file anyway; only the output flow part is conditionally controlled.

If you update your component, install it and try to run it, you will discover that it does not work.
That's because there are some pre-requisites for a component to be able to play with connection objects in that way :

    A processing (or output) component that outputs fewer records than it receives must be declared DATA_AUTO_PROPAGATE="false" and HAS_CONDITIONAL_OUTPUTS="true" in its header


Update the header section of the XML descriptor accordingly, then install the component again and test it; the result should be something like the one visible in the image here on the right.

Notice that there are half as many row2 records as row1 records : 100 rows in row1 and 50 rows in row2.

You can experiment by changing the condition. In my example I just used a simple and quite useless one, but if you want to see a more complex example you can check my tTimeGenerator component, where I implemented this methodology to output years and months only when the next day to be sent to output belonged to a month or a year that had not been generated yet.
It's probably a rare case where you need to "silence" output connections outside of a processing component. There it is needed because I have 3 output connections and they have different aggregation levels (number of records).
Notice that in that component there are also a few things about the schema definition that I have not explained yet; we will get back to them later.


This ends lesson 9. I hope you enjoyed it and that you were able to follow it easily. I realize this is about the point where things start getting a bit "messy" in component creation, so you may need to go over this one a few times to nail the concepts.
When reading JET source code at this point (debugging yours, reverse engineering existing components, following my examples...) it is key to visualize immediately which part belongs to the Java output code and which to the template code.
I tend to comment each block I open and close with the same tag (e.g. // oc1 open - // oc1 close), which helps a lot in figuring out the overall structure of the code.

Feel free to send me some feedback if you have questions or suggestions.

As usual, the code of the component created in this lesson can be downloaded here
