PowerUp
"How to create TOS components" Tutorial : Part 3

Component Creation - Part 3


Welcome to the third part of this tutorial. Today we will finally create our first component. It will be simple, but we will see the main things you have to do each time you start a component project.

Are there specific tools, like an IDE, to help with this task? Uhm, the official answer is: yes, there is the Component Designer, which can help a bit in the process, but personally I am not counting too much on it.
I believe that the component creation process is an area with many improvement opportunities for Talend; maybe we component developers (and you will be one by the end of this tutorial, should you decide to follow it) could help out.
You will judge for yourself, but I'd like to warn you to be ready for a few... ehm... "not completely polished" parts of the process.

Ok, to start, let's just decide where to put your new component, ok? If you followed the previous parts of the tutorial, you learnt that a component txxxx has a directory txxxx stored somewhere on your hard disk, in a path with funny names.
That applies to standard components and to your custom-built components alike, but we will keep the custom ones in a separate folder, so our first task is to create a "Custom Components" folder somewhere handy on your disk.

    Custom components are stored in a separate folder dedicated to them


Now that we have created the folder, we need to tell Talend where to find it, so in the studio we will set a couple of parameters:

In Window -> Preferences -> Talend -> Components
we will paste the full path of the folder we created.

We will also want to use the Component Designer, so we need to paste the same path into another configuration setting, which is found in the same preferences window, under Component Designer.




Done? Good, your IDE is now ready to work on a new component.
The parts (views) that make up the Component Designer are already loaded in the studio (in Eclipse), and are normally linked to a separate perspective. Basically, in this tutorial you will work with two perspectives: 1) Data Integration and 2) Component Designer.
You can switch from one to the other with the small table-like icon in the top right corner; if you click it you will see a list of available perspectives.
If you cannot find this icon, the same list is accessible from the menu "Window->Perspective".

Go ahead and select the Component Designer perspective; a pane should appear (its position may vary according to the way you set up Eclipse) with the list of your custom components.
Don't worry if the box is empty: that's because you have not developed any component yet, but that's going to change soon.

You should, however, be able to see the root folder called "COMPONENT_PROJECT", which is a representation of the "Custom Components" folder you created before.


Your first component

Your first component will be called tTutorial1. That's the first piece of information we need, because we will create a directory with the same name in our Custom Components folder, directly in the file system.
Note: it is possible to do this from the Component Designer instead, but for now we will not use this functionality, because the wizard would automatically create a series of artifacts for us which, for the purpose of this tutorial, we prefer to create manually, one by one.

Once you have created the tTutorial1 folder, select the COMPONENT_PROJECT root folder in the Component Designer and hit F5 (or right-click and select "refresh" from the menu).
Normally your new folder should now appear, recognized as a "component wannabe".

Components, when installed, are placed in the Palette, and from there they become accessible to our Data Integration projects.
If you select the newly appeared tTutorial1 node and right-click it, you will notice a menu item called "Push Components to Palette".

    To install a custom component from the Component Designer, the "Push Components to Palette" command is used. The studio will attempt to install ALL the components visible under the COMPONENT_PROJECT root node.


This is one of the reasons why you really want to keep the Custom Components folder as clean as possible.

We can also try to push our component right now, but it's empty, so nothing will be installed.
This is because a component has some minimal requirements to "exist", way less than what Descartes would expect, not even "compilo ergo sum"... but at a minimum we need to describe our component.
I was never able to find any documentation about which parts of a component are mandatory and which are not, so I ran some tests, adding and removing things, and came up with some empirical rules, which I am about to discuss here.

    Each (Java) component must have an XML descriptor file stored in the component folder; this file must have the same name as the component, plus a _java suffix and the .xml extension


In our case we will create tTutorial1_java.xml. This file is really key in a component: it allows the studio to understand what the component is supposed to do and how it can interact with other components.
The structure of the XML descriptor file is fixed and some parts are mandatory; again, I used empirical tests to find out which ones, and we will start with them.


<?xml version="1.0" encoding="UTF-8"?>
<COMPONENT>
  <HEADER AUTHOR="Me"
    COMPATIBILITY="ALL"
    PLATEFORM="ALL"
    RELEASE_DATE="20110907A"
    SERIAL=""
    STARTABLE="true"
    STATUS="BETA"
    VERSION="0.1">
    <SIGNATURE/>
  </HEADER>
  <FAMILIES>
    <FAMILY>Tutorial</FAMILY>
  </FAMILIES>
  <DOCUMENTATION>
    <URL>http://www.powerupbi.com/talend/componentCreation_3.html</URL>
  </DOCUMENTATION>
  <CONNECTORS>
    <CONNECTOR CTYPE="FLOW"/>
  </CONNECTORS>
  <PARAMETERS>
  </PARAMETERS>
  <CODEGENERATION/>
  <RETURNS>
  </RETURNS>
</COMPONENT>



Copy&paste this code into a text file called tTutorial1_java.xml in your Custom Components/tTutorial1 folder.
If you try to push now... you will see that nothing happens again :)
From what I discovered in my experiments, you need at least one more file to make the component visible in the palette: the messages properties file. As you probably guessed, it is a Java .properties file, and it is mandatory. Initially it will look something like this:


#
#Fri Mar 25 10:00:50 CET 2011
HELP=org.talend.help.tTutorial1
LONG_NAME=Tutorial component 1


Copy&paste this code into another text file in the same directory, save it as tTutorial1_messages.properties, and now push the components to the palette.
In the bottom right part of your application window you should see a message saying "Generation engine initialization in progress". Get used to it, because you are going to see it quite a few times.

Wait for that process to finish; once it is done, the palette should be refreshed. If you cannot see the palette, simply create an empty project (you may need to switch back to the Data Integration perspective, depending on your configuration).
You may get an error message if you mistyped something and your XML is not valid, or some files have the wrong name, etc. If that happens, you will start enjoying component debugging.
Remember when I told you there are a few areas in the process where we can imagine some opportunities for improvement? You just found one.

To be sure that everything worked, the absence of error messages is not enough: you cannot assume you will always get an error message when something is wrong with your component. Sometimes the process will fail silently (sic!).
Finally, if the process worked, your component should be visible in the palette.
See that "<FAMILY>Tutorial</FAMILY>" we defined in the XML descriptor? With that line we asked the studio to place the component in the palette, in a family called "Tutorial", so you should be able to find it there. If you cannot find it, it means something went wrong.
Notice that the XML file also has a parameter in the header called "PLATEFORM". Don't correct it, it is supposed to be spelled that way; I have already wasted enough time with XML files rejected because I wrote "PLATFORM".

Congratulations!! You have just installed your first component!! Ok, now, don't get too excited: if you try to use it in a project, it will generate an ugly null pointer error and most likely will destroy your project as well (if you try, DO NOT save the project afterwards; instead, close and reopen it).

So, what's causing the null pointer error now? What's missing? The answer is visible directly in the palette: your component is missing an icon (which, for a graphical tool, is quite important).

That's something we can fix easily: you can get an icon here and save it as tTutorial1_icon32.png in the component folder. I am sure your icon creation skills are much better than mine, so I will not discuss how to create an icon, ok?

Now, I guess you are in the Data Integration perspective, with a blank project in front of you; if this is not the case, please try to get to that point.
This time we will use a "shortcut" to push the component: press Ctrl + Shift + F3. (If you are a Mac user you probably need to squeeze some fruit and tap your credit card on your iPhone until your iCat purrs as a sign of confirmation. Sorry, I am pretty sure there is a key combination there as well, which might be slightly different from the one I suggested.)

Again, you should see the Generation engine working for you, and finally your component should be visible in the palette with its brand new icon.
This time, if you drag it into a project, nobody is going to complain with error messages; you should even be able to run the project, which will do... nothing. But hey, it's a start, no?

    To be successfully installed in the palette, a component needs at a minimum 1) the (properly formatted) XML descriptor file, 2) the messages properties file and 3) an icon file


We will soon be able to make our component do something useful, but for now we need to analyse the files we have used so far.

The XML descriptor

This file contains a few important sections nested under the "COMPONENT" root element, which makes sense, since we are creating a component after all.
The header can carry some key parameters, but almost none of them is mandatory, so they are not listed in our file. What we did have to provide are values for other, mandatory parameters that have no effect at all on the component (e.g. if you change the name of the author, you are not going to improve or worsen the chances that the component executes properly, nor will it affect it in any way; still, the author is mandatory).

The only parameter we really need to care about here is STARTABLE, which dramatically affects how a component can be used.

To understand this point we need to stop for one second and think about what we want to achieve: what is the task that our component will accomplish?
Imagine a component placed in a project: it will be connected to other components to perform complex tasks within a subjob (see part 2 of the tutorial for details on this). In a subjob, a component might be placed at the beginning, in the middle or at the end of the component chain.

Where our component will be placed depends on the function it has to perform. As an example, if it connects to a database to fetch data from a table or query and then injects this data into a "row" connection, we can say it is an input component, because it brings some data into the job.
Components of this kind are normally located at the beginning of a subjob because they do not receive input from other components; basically, they need to be able to start a subjob.

    A data input component, or anyway a component which will not get data from other components is normally placed at the beginning of a subjob and it MUST have the parameter STARTABLE=true in its XML descriptor
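
To make the rule a bit more concrete, here is a sketch of the header for a hypothetical component meant to sit in the middle or at the end of a chain. Everything is copied from our tTutorial1 descriptor; the only difference is STARTABLE, set to false on the assumption that this component always receives its data from another component and should never start a subjob.

```xml
<!-- Sketch only: header of a hypothetical component that cannot start a subjob. -->
<!-- All values except STARTABLE are the placeholder values used for tTutorial1. -->
<HEADER AUTHOR="Me"
  COMPATIBILITY="ALL"
  PLATEFORM="ALL"
  RELEASE_DATE="20110907A"
  SERIAL=""
  STARTABLE="false"
  STATUS="BETA"
  VERSION="0.1">
  <SIGNATURE/>
</HEADER>
```

For tTutorial1 we keep STARTABLE="true", since for now it has nothing to receive from anybody.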


You probably noticed we also needed to define a connector. Connectors are the (main) means for components to interact with each other, and they can be of different types.
The most common ones are those with CTYPE=FLOW; they are related to data, that is, to the orange "rowx" (row1, row2...) connections that tie components together in a subjob.
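
As a rough sketch, this is how the CONNECTORS section of a hypothetical input component might look. It assumes the MAX_INPUT and MAX_OUTPUT attributes and the trigger connector types (ITERATE, SUBJOB_OK, COMPONENT_OK and so on) that can be found in the descriptors of the standard components; the limits chosen here are just an example, not a requirement.

```xml
<!-- Sketch only: connectors for a hypothetical input component. -->
<CONNECTORS>
  <!-- Data flow: accepts no incoming row, emits one outgoing row -->
  <CONNECTOR CTYPE="FLOW" MAX_INPUT="0" MAX_OUTPUT="1"/>
  <!-- Triggers, as declared by many standard components -->
  <CONNECTOR CTYPE="ITERATE" MAX_INPUT="1" MAX_OUTPUT="1"/>
  <CONNECTOR CTYPE="SUBJOB_OK" MAX_INPUT="1"/>
  <CONNECTOR CTYPE="SUBJOB_ERROR" MAX_INPUT="1"/>
  <CONNECTOR CTYPE="COMPONENT_OK"/>
  <CONNECTOR CTYPE="COMPONENT_ERROR"/>
  <CONNECTOR CTYPE="RUN_IF"/>
</CONNECTORS>
```

Our tTutorial1 gets away with a single bare FLOW connector, which is enough for the studio to accept it.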

We also have a PARAMETERS section, which allows the component to expose some input parameters that can be used to customize the component itself.
Typical examples are the file name and directory, if the component is supposed to read or write a file, or the host, username and password needed to connect to a database, etc.
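
Just to give an idea of the shape of this section, here is a sketch with two hypothetical parameters, FILENAME and HOST. The attribute names (NAME, FIELD, NUM_ROW, REQUIRED) and the DEFAULT element follow what the standard components use in their descriptors; the parameter names and default values are invented for illustration.

```xml
<!-- Sketch only: FILENAME and HOST are hypothetical parameters. -->
<PARAMETERS>
  <!-- A file chooser on the first row of the component settings -->
  <PARAMETER NAME="FILENAME" FIELD="FILE" NUM_ROW="1" REQUIRED="true">
    <DEFAULT>"in.csv"</DEFAULT>
  </PARAMETER>
  <!-- A plain text field on the second row -->
  <PARAMETER NAME="HOST" FIELD="TEXT" NUM_ROW="2" REQUIRED="false">
    <DEFAULT>"localhost"</DEFAULT>
  </PARAMETER>
</PARAMETERS>
```

Each parameter normally also wants a label in the messages properties file, along the lines of FILENAME.NAME=File name (again, an assumption based on how the standard components do it).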

We will not discuss the CODEGENERATION part just yet, so I am skipping it for now; instead, let's look at RETURNS.
Returns are "output" parameters, something that the component will output to be eventually used afterwards. A typical return parameter is NB_LINE, which tells how many data lines (records) have been handled by the component.
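
A RETURNS section declaring NB_LINE could look like the sketch below. The TYPE and AVAILABILITY values are the ones the standard components typically use for this counter; take them as an assumption to verify against a component shipped with your studio.

```xml
<!-- Sketch only: declares NB_LINE as an integer available after the component has run. -->
<RETURNS>
  <RETURN NAME="NB_LINE" TYPE="id_Integer" AVAILABILITY="AFTER"/>
</RETURNS>
```

Declaring a return only announces it to the studio; the component code still has to actually produce the value.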


In this tutorial we did almost everything the "manual" way. I think that was better, to let you "touch" the details of the process, but the Component Designer can obviously help a lot in making this process a bit less tedious.
As a personal preference, I do not use it much; typically I just copy the files from a "template" component and start from there.
You can find some resources specific to the Component Designer on the web. I highly recommend this one: http://bekwam.blogspot.com/2011/03/tutorial-component-designer-in-talend.html. It's from Carl's blog dedicated to Data Integration, where you can find a lot of interesting tutorials.
I will not dig into the usage of the Component Designer, so you may want to check his tutorial for that part instead.

In the next part we will add some parameters and some basic logic to the component.
Hope you enjoyed this tutorial. If you have questions, ideas, complaints, etc., you can contact us using the feedback module on this site.
