In this article, we are going to discuss the need for the Link Partitioner and Link Collector stages in DataStage. We have already covered a DataStage overview, so we assume you are familiar with the basics.
Where Do We Use Link Partitioner and Link Collector?
Link Partitioner – Used for partitioning the data.
Link Collector – Used for collecting the partitioned data back together.
What Does Link Collector Mean?
The Link Collector stage is an active stage that takes up to 64 inputs, allowing you to collect data from these links and route it along a single output link. The stage expects the output link to use the same metadata as the input links.
The Link Collector stage can be used in conjunction with a Link Partitioner stage, letting you take advantage of a multi-processor system and have data processed in parallel: the Link Partitioner stage partitions the data, the partitions are processed in parallel, and the Link Collector stage then collects them together again before writing to a single target. To really understand the benefits, you need to know a bit about how DataStage jobs are run as processes; see “DataStage Jobs and Processes”.
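As a rough analogy in plain Python (illustrative only, not DataStage code; the function and variable names here are ours), the partition/process/collect pattern looks like this:

```python
from concurrent.futures import ThreadPoolExecutor

def transform(partition):
    # Stand-in for whatever per-partition processing the job does.
    return [row * 2 for row in partition]

def run_job(rows, n_links=4):
    # Round-robin partitioning, as a Link Partitioner would do.
    partitions = [rows[i::n_links] for i in range(n_links)]
    # Process each partition concurrently (DataStage would run these
    # as separate processes on a multi-processor system).
    with ThreadPoolExecutor(max_workers=n_links) as pool:
        results = list(pool.map(transform, partitions))
    # Round-robin collection back into a single output stream.
    collected = []
    longest = max((len(p) for p in results), default=0)
    for i in range(longest):
        for part in results:
            if i < len(part):
                collected.append(part[i])
    return collected
```

Because the rows are dealt out and collected in the same round-robin order, `run_job([1, 2, 3, 4, 5, 6, 7, 8])` returns the transformed rows in their original order: `[2, 4, 6, 8, 10, 12, 14, 16]`.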
The Properties tab allows you to specify two properties for the Link Collector stage:
- Collection Algorithm. Use this property to specify the method the stage uses to collect data. Choose from:
- Round-Robin. This is the default method. Using the round-robin method, the stage reads a row from each input link in turn.
- Sort/Merge. Using the sort/merge method the stage reads multiple sorted inputs and writes one sorted output.
- Sort Key. This property is significant only when the Sort/Merge collection algorithm is chosen. It defines the key on which each of the partitioned data sets is already sorted, and on which the merged output will be sorted.
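As a sketch of the two collection algorithms in plain Python (illustrative only; the function names are ours, and each inner list stands for the rows arriving on one input link):

```python
import heapq
from itertools import zip_longest

_SENTINEL = object()  # marks exhausted links of unequal length

def collect_round_robin(inputs):
    # Read a row from each input link in turn until all are exhausted.
    out = []
    for rows in zip_longest(*inputs, fillvalue=_SENTINEL):
        out.extend(r for r in rows if r is not _SENTINEL)
    return out

def collect_sort_merge(inputs, key=None):
    # Each input must already be sorted on the key; the merged
    # output is then sorted on the same key.
    return list(heapq.merge(*inputs, key=key))
```

For example, `collect_round_robin([[1, 3], [2, 4]])` interleaves the links to give `[1, 2, 3, 4]`, while `collect_sort_merge([[1, 4], [2, 3]])` merges two pre-sorted inputs into the sorted output `[1, 2, 3, 4]`.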
What Does Link Partitioner Mean?
The Link Partitioner stage is an active stage that takes one input and distributes partitioned rows across up to 64 output links. The stage expects the output links to use the same metadata as the input link.
Partitioning your data enables you to take advantage of a multi-processor system and have the data processed in parallel. The stage can be used in conjunction with the Link Collector stage to partition data, process it in parallel, and then collect it together again before writing it to a single target. Again, to really understand the benefits you need to know a bit about how DataStage jobs are run as processes; see “DataStage Jobs and Processes”.
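The distribution side can be sketched in the same illustrative Python style (not DataStage syntax; the function name is ours): a round-robin partitioner deals rows out to the output links like cards.

```python
def partition_round_robin(rows, n_links):
    # Deal rows out to n_links output links in turn
    # (the stage supports up to 64 output links).
    links = [[] for _ in range(n_links)]
    for i, row in enumerate(rows):
        links[i % n_links].append(row)
    return links
```

For example, `partition_round_robin([1, 2, 3, 4, 5], 2)` yields `[[1, 3, 5], [2, 4]]`: each link receives roughly an equal share of the rows, which is what balances the load across processors.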
Before-Stage and After-Stage Subroutines
The General tab on the Stage page contains optional fields that allow you to define routines to be executed before or after the stage processes the data.
- Before-stage subroutine and Input Value. These contain the name (and value) of a subroutine that is executed before the stage starts to process any data. For example, you can specify a routine that prepares the data before processing starts.
- After-stage subroutine and Input Value. These contain the name (and value) of a subroutine that is executed after the stage has processed the data. For example, you can specify a routine that sends an electronic message when the stage has finished.
These are the key things you need to know about the Link Partitioner and Link Collector stages in DataStage.