Cullen Programming

JAVA Pipelines

Users Guide

JAVA Pipelines is a JAVA port of CMS Pipelines, the IBM product written by John P. Hartmann for the IBM Virtual Machine (VM) operating system platforms. It attempts to closely emulate a significant amount of the functionality that has been available for years to VM REXX language programmers and CMS users. Each filter's functionality and limitations are documented on that filter's description and use page.

CMS, CP, TSO and Assembly filter stages are not provided because they are specific to the VM and MVS operating systems. Filter stages that are independent of the operating system have ported functions.

JAVA Pipelines will operate on any platform that supports the JAVA Runtime Environment, so JAVA and NetRexx programmers can now enjoy many of the same programmatic facilities and advantages found using Pipelines on the IBM VM platform.

What Is a Pipeline?
JAVA pipelines are like the pipelines used in plumbing. Instead of water flowing through pipes, however, data flows through programs. Data records enter the pipe from a device (such as a disk), flow through the pipeline, and eventually exit to another device (such as a file or console display).

Programs, like pieces of pipe, can be fit together to solve complex text manipulation problems. Each program, or stage, in the pipeline changes or manipulates the data that passes through it. As data flows through the stages it is transformed, step-by-step, into the results you need. The data flows from left to right. JAVA Pipelines lets you combine programs so that the output of one program serves as the input to the next.
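The idea of fitting stages together can be pictured with a minimal JAVA sketch. It is not the JAVA Pipelines implementation: the class PipelineSketch, the Stage type, and the two stages dropEmpty and upperCase are hypothetical names invented for illustration, and the sketch runs the stages one after another rather than concurrently.

```java
import java.util.List;
import java.util.function.UnaryOperator;
import java.util.stream.Collectors;

class PipelineSketch {
    /** Hypothetical stage type: takes the input records, returns the output records. */
    interface Stage extends UnaryOperator<List<String>> {}

    /** Illustrative stage that discards empty records. */
    static Stage dropEmpty() {
        return in -> in.stream().filter(r -> !r.isEmpty()).collect(Collectors.toList());
    }

    /** Illustrative stage that upper-cases every record. */
    static Stage upperCase() {
        return in -> in.stream().map(String::toUpperCase).collect(Collectors.toList());
    }

    /** Feeds the output of each stage to the next one, left to right. */
    static List<String> run(List<String> source, Stage... stages) {
        List<String> records = source;
        for (Stage stage : stages) {
            records = stage.apply(records);
        }
        return records;
    }

    public static void main(String[] args) {
        List<String> out = run(List.of("first", "", "second"), dropEmpty(), upperCase());
        out.forEach(System.out::println);
    }
}
```

Each stage sees only the records written by the stage to its left, which is the property that lets small programs be combined into larger ones.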

Pipe Stages
In a pipeline, the output of one stage is the input to the next. The data itself is in the form of discrete records. Note that it is records that flow through the pipeline, not a continuous stream of bytes. A record is simply a string of characters, perhaps a line of a file or a line entered at the terminal: a string of data terminated by either a newline character or a carriage return/linefeed pair. Imagine a stage as a small factory through which a conveyor moves records. Records enter the stage on the left, and leave on the right.

While within the stage, the records can be modified, discarded, or split apart. Practically any manipulation can happen to them. Precisely what happens depends on the stage that is being used. Many stages write one output record for each input record. Some, however, do not.

The records entering a stage are called its input stream. The records leaving a stage are called its output stream.

Stages can use more than one input stream or output stream. You can use these secondary streams to write complex multistream pipelines.

Execution Characteristics
Since each stage is assigned to run in its own Thread of Execution, the work processed among the stages is performed CONCURRENTLY as records become available to each stage. So while the PIPE is reading the last records on its input stage, it may be writing the first records to its output stage, and processing the middle records in the intermediate stages.

This multi-threading design exploits CPU cycles that might otherwise be wasted while waiting, and it can greatly increase throughput compared with single-threaded operation.
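The one-thread-per-stage idea can be sketched in plain JAVA as follows. This is an illustration under assumptions, not the JAVA Pipelines source: the class ThreadedStagesSketch, the queues between stages, and the EOF marker are all hypothetical, and the real program may pass records between threads differently.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.LinkedBlockingQueue;

class ThreadedStagesSketch {
    // Hypothetical end-of-stream marker, compared by identity so no real record can equal it.
    private static final String EOF = new String("EOF");

    /** Middle stage: upper-cases each record as soon as it arrives, then forwards the marker. */
    private static void copyUpperCased(BlockingQueue<String> in, BlockingQueue<String> out) {
        try {
            for (String r = in.take(); r != EOF; r = in.take()) {
                out.put(r.toUpperCase());
            }
            out.put(EOF);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
    }

    /** Runs a source, a middle stage, and a sink, each in its own thread. */
    static List<String> run(List<String> input) {
        BlockingQueue<String> toMiddle = new LinkedBlockingQueue<>();
        BlockingQueue<String> toSink = new LinkedBlockingQueue<>();
        List<String> output = new ArrayList<>();

        Thread source = new Thread(() -> {
            input.forEach(toMiddle::add);
            toMiddle.add(EOF);
        });
        Thread middle = new Thread(() -> copyUpperCased(toMiddle, toSink));
        Thread sink = new Thread(() -> {
            try {
                for (String r = toSink.take(); r != EOF; r = toSink.take()) {
                    output.add(r);
                }
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
        });

        List<Thread> threads = List.of(source, middle, sink);
        threads.forEach(Thread::start);
        try {
            for (Thread t : threads) {
                t.join();
            }
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
        return output;
    }

    public static void main(String[] args) {
        System.out.println(run(List.of("a", "b", "c")));
    }
}
```

Because every stage blocks only on its own input queue, the sink can already be storing early records while the source is still producing later ones.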

The PIPE Interfaces
JAVA Pipelines can be used from a supplied graphical User Command Interface or invoked from JAVA or NetRexx programs using the Application Programming Interface (API). A primary and, optionally, one secondary input (or output) stream are currently supported for those filters that are described as using them.

The Pipelines Command Interface

Informational messages and error messages will appear in their separate sub-windows on the Command Interface. Informational messages will contain information regarding the execution process. Error messages will contain information about syntax, semantics, input or output errors.

The Command Interface allows you to enter a PIPE of any length. The buttons let you start and stop the pipe execution and clear the command line.

Every pipe entered is archived in the pulldown combobox and also to a disk file named PIPES.log.

Pipe output will appear on the "Terminal Command Screen".

Application Programming Interface for JAVA programmers

pipe apipe = new pipe();
String pipestring = "< test.data | filter1 | filter2 | ... | > output.file";
rc = apipe.apiprocess1(pipestring); // Return code is rc; declare rc with the type that apiprocess1 returns

Application Programming Interface for NetRexx programmers

apipe = pipe()
pipestring = "< test.data | filter1 | filter2 | ... | > output.file"
rc = apipe.apiprocess1(pipestring) --Return Code is rc

Pipe Return Codes

API for NetRexx programmers using STEM input.

LINE = "STEM." --Set Default ID for Rexx index string
loop ix = 1 to 10 --Populate the STEM variables.
LINE[ix] = 'This is record '||ix
end
LINE[0] = ix - 1 --Mark number of index strings in array
apipe = pipe() --Get a new pipe program instance
apiArg = Rexx '' --Rexx index string holds pipeline
apiArg[0] = 4 --Number of stages (index 1 through n)
apiArg[1] = "STEM" --Stage 1 tells pipe that input is a STEM
apiArg[2] = LINE --Name of Rexx index string serving as STEM
apiArg[3] = "CHANGE /A/a/*" --Change all 'A' to lowercase
apiArg[4] = "> OUTPUT.FILE" --Output
rc = apipe.apiprocess2(apiArg) --Return Code is rc

API for NetRexx programmers using STEM output.

OUTLINE = "STEM." --Set Default ID for Rexx output index string
apipe = pipe() --Get a new pipe program instance
apiArg = Rexx '' --Rexx index string holds pipeline
apiArg[0] = 4 --Number of stages (index 1 through n)
apiArg[1] = "LITERAL this is the time for all good men." --Stage 1 input
apiArg[2] = "COUNT WORDS" --Stage 2 tells pipe to count the words
apiArg[3] = "STEM" --Stage 3 tells pipe that output is a STEM
apiArg[4] = OUTLINE --Name of Rexx index string serving as STEM
OUTLINE = apipe.apiprocess2(apiArg) --STEM output will be returned as a result
if OUTLINE < 0 then --Check return code
do
say 'Error, RC='OUTLINE
end

API for NetRexx programmers using STEM input and STEM output.

OUTLINE = "STEM." --Set Default ID for Rexx output index string
LINE = "STEM." --Set Default ID for Rexx input index string
loop ix = 1 to 10 --Populate the STEM variables.
LINE[ix] = 'This is record '||ix
end
LINE[0] = ix - 1 --Mark number of index strings in array
apipe = pipe() --Get a new pipe program instance
apiArg = Rexx '' --Rexx index string holds pipeline
apiArg[0] = 5 --Number of stages (index 1 through n)
apiArg[1] = "STEM" --Stage 1 tells pipe that input is a STEM
apiArg[2] = LINE --Name of Rexx index string serving as STEM
apiArg[3] = "HEXLATE" --Stage 3 tells pipe to translate contents to hex
apiArg[4] = "STEM" --Stage 4 tells pipe that output is a STEM
apiArg[5] = OUTLINE --Name of Rexx index string serving as STEM
OUTLINE = apipe.apiprocess2(apiArg) --STEM output will be returned as a result
if OUTLINE < 0 then --Check return code
do
say 'Error, RC='OUTLINE
end

Programmers who employ the API should ensure that both the PIPELINES.JAR and NetRexxR.JAR files appear in their CLASSPATH when compiling and executing their programs.

The PIPE Command String
To run a pipeline, use the PIPE command string, entered from the PIPEGUI command line. The PIPE command string accepts a single pipeline or multiple pipelines as operands.

In a pipeline, stages are separated by a character called a Stage Separator (the default is the | ):
(OPTIONS) stage_1 | stage_2 | stage_3 | stage_4 | stage_5 | ... | stage_n [ another stream of stages ]

Examples:

(trace) < test.data | locate /LUCKY/ | change /LUCKY/Loser/* | console

< /home/temp/file1.txt | b: concat | > combined.txt ? < /home/temp/file2.txt | b:

Do not place stage separators prior to the first stage or after the last stage.

For the default stage separator, the PIPE command expects the solid vertical bar ( | ), which is X'7C' on ASCII systems and X'4F' on EBCDIC systems. You must determine which key on your terminal generates this character. It is a solid vertical bar (|) on most computer keyboards. Some workstation programs map the solid vertical bar to the split vertical bar. The solid vertical bar is also the "LOGICAL OR" operator in JAVA and NetRexx programs. In a pipeline, it indicates where one stage ends and another one begins.
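How a command string breaks down into pipelines and stages can be pictured with the following JAVA sketch. It is only an illustration: the class CommandSplitSketch is hypothetical, it assumes the default stage separator ( | ) and the default end character ( ? ), and it ignores options in parentheses and any escape character, so it is not the parser that JAVA Pipelines itself uses.

```java
import java.util.ArrayList;
import java.util.List;

class CommandSplitSketch {
    /**
     * Splits a PIPE command string into pipelines (at each '?')
     * and each pipeline into its stages (at each '|').
     */
    static List<List<String>> split(String command) {
        List<List<String>> pipelines = new ArrayList<>();
        for (String segment : command.split("\\?")) {
            List<String> stages = new ArrayList<>();
            for (String stage : segment.split("\\|")) {
                stages.add(stage.trim());
            }
            pipelines.add(stages);
        }
        return pipelines;
    }

    public static void main(String[] args) {
        String command =
            "< /home/temp/file1.txt | b: concat | > combined.txt ? < /home/temp/file2.txt | b:";
        List<List<String>> pipelines = split(command);
        for (int i = 0; i < pipelines.size(); i++) {
            System.out.println("pipeline " + (i + 1) + ": " + pipelines.get(i));
        }
    }
}
```

A naive split like this would break a stage whose argument contains the separator itself, which is the situation the STAGESEP and ESCAPE options are meant to handle.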

Device Drivers
Device drivers are stages that interact with devices or other system resources. The simplest pipelines consist of two device drivers. Data read from one device moves through the pipeline to the other device. For example, the following pipelines read records from a file, pass them through the LOCATE and COUNT filters, and write the result to your terminal or to another file (change test.data to the name of an existing file):

< test.data | LOCATE -mymatch- | COUNT | console

< test.data | LOCATE /mymatch/ | count | > output.file

Filters
A filter reads data records from its input stream, does some work using that data, and writes the results to its output stream. The difference between a filter and a device driver is that a filter does not interact with devices or other system resources, whereas a device driver lets you get data in and out of a pipeline.
The filters are stages that work on data records already in the pipeline. The COUNT stage used in the above example pipeline is a filter. It counts every record that flows into it from its input stream. Then it writes one record containing that count to its output stream. The LOCATE stage is also a filter. It examines the records from its input stream, looking for those that match a specified string. It also provides for Regular Expressions. If the record matches, LOCATE writes the record to its output stream. LOCATE discards records that do not match.
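The behaviour described for LOCATE and COUNT can be approximated in a few lines of JAVA. This is a sketch under assumptions, not the filters' actual code: the class FilterSketch and its method names are hypothetical, the match is done with java.util.regex, and the regular expression dialect of the real LOCATE stage may differ.

```java
import java.util.List;
import java.util.regex.Pattern;
import java.util.stream.Collectors;

class FilterSketch {
    /** Passes on only the records that contain a match for the pattern; the rest are discarded. */
    static List<String> locate(List<String> input, String regex) {
        Pattern pattern = Pattern.compile(regex);
        return input.stream()
                .filter(record -> pattern.matcher(record).find())
                .collect(Collectors.toList());
    }

    /** Consumes every input record and writes a single record holding the count. */
    static List<String> count(List<String> input) {
        return List.of(String.valueOf(input.size()));
    }

    public static void main(String[] args) {
        List<String> records = List.of("mymatch one", "something else", "two mymatch");
        System.out.println(count(locate(records, "mymatch")));
    }
}
```

Note that locate writes at most one output record per input record, while count writes exactly one record no matter how many it reads.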

What Is a Pipeline Stall?

A stall occurs when the stages of a pipeline are all waiting on one another, so that no record can move. The first pipeline below may cause a stall. Placing the ELASTIC stage in the second pipe segment remedies the possible stall.

(example 1) < test.data | a: fanout | b: fanin | console ? a: | b:

(example 2) < test.data | a: fanout | b: fanin | console ? a: | elastic | b:
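One way to picture why example 1 can stall and example 2 does not is the JAVA simulation below. It rests on assumptions that this guide does not state: that FANOUT writes each record to its primary and then its secondary output, that FANIN reads its primary input to the end before touching its secondary input, that a stream hands over one record at a time without buffering, and that ELASTIC behaves as an unbounded buffer. The class StallSketch, the EOF marker, and the timeout used to detect the stall are all hypothetical.

```java
import java.util.List;
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.LinkedBlockingQueue;
import java.util.concurrent.SynchronousQueue;
import java.util.concurrent.TimeUnit;

class StallSketch {
    // Hypothetical end-of-stream marker, compared by identity.
    private static final String EOF = new String("EOF");
    private static final long WAIT_MS = 500;

    /** Drains the primary stream to its end, then the secondary stream (assumed FANIN order). */
    private static void fanin(BlockingQueue<String> primary, BlockingQueue<String> secondary) {
        try {
            for (String r = primary.take(); r != EOF; r = primary.take()) { /* consume */ }
            for (String r = secondary.take(); r != EOF; r = secondary.take()) { /* consume */ }
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
    }

    /** Offers a record to both streams (assumed FANOUT order); false means nobody took it in time. */
    private static boolean writeBoth(String r, BlockingQueue<String> primary,
                                     BlockingQueue<String> secondary) throws InterruptedException {
        return primary.offer(r, WAIT_MS, TimeUnit.MILLISECONDS)
            && secondary.offer(r, WAIT_MS, TimeUnit.MILLISECONDS);
    }

    /** Returns true when the writer could not make progress, i.e. the pipeline stalled. */
    static boolean stalls(List<String> records, boolean withElastic) {
        BlockingQueue<String> primary = new SynchronousQueue<String>();
        // The unbounded queue stands in for an ELASTIC stage on the secondary stream.
        BlockingQueue<String> secondary =
            withElastic ? new LinkedBlockingQueue<String>() : new SynchronousQueue<String>();
        Thread reader = new Thread(() -> fanin(primary, secondary));
        reader.setDaemon(true);
        reader.start();
        try {
            for (String r : records) {
                if (!writeBoth(r, primary, secondary)) {
                    return true;
                }
            }
            return !writeBoth(EOF, primary, secondary);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            return true;
        } finally {
            reader.interrupt();
        }
    }

    public static void main(String[] args) {
        System.out.println("without elastic stalls: " + stalls(List.of("r1", "r2"), false));
        System.out.println("with elastic stalls:    " + stalls(List.of("r1", "r2"), true));
    }
}
```

In the unbuffered case the writer waits for the reader to accept the copy on the secondary stream while the reader waits for the next record on the primary stream; the buffer lets the writer move on.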

Terminology

SOURCE

A file, stem variable, or literal that serves as data input.

SINK

A file or the console to which final output records are directed.

The currently available filter stages are:

ALL
APPEND
BETWEEN
BUFFER
CHANGE
CDELETE
CHOP
CINSERT
CONCAT
COPY
CONSOLE
COVERLAY
COUNT
DROP
DROPNULL
DUPLICATE
ELASTIC
FANIN
FANINANY
FANOUT
FIND
HEXLATE
HOLE
JOIN
LITERAL
PICK
LOCATE
NLOCATE
NOTALL
PAD
PREPEND
REVERSE
SHOWALL
SORT
SPECS
SPLIT
STEM
STRAIN
STRFIND
STRIP
TAKE
TIMESTAMP
Unix2Win
Win2Unix
WHITEOUT
XLATE
< (read file input (source))
> (replace or create file output (sink), or CONSOLE)
>> (write and append to sink file output)
File as primary source
File or CONSOLE as sinks
| (the default Stage Separator)
? (the default Pipe End or Pipe Segment End Character)
(...) (Options: TRACE STAGESEP ENDCHAR ESCAPE)

Pipeline:
stagecmd stagesep stagecmd

Stage:
stagecmd

Label Group:
label:
label stagecmd

Opt A:
(ENDCHAR char)
(ESCAPE)
(STAGESEP char)
(TRACE)

Use the PIPE command or the PIPELINES script to invoke JAVA Pipelines.

Operands

stagesep
is the stage separator character, which separates a stage from a
following stage. By default, the stage separator character is the
character on your terminal with a value of X'7C' on ASCII systems
and X'4F' on EBCDIC systems. (It is the solid
vertical bar on most terminals.) However, you can use the STAGESEP or
SEPARATOR option to assign a different stage separator character. You
cannot specify left parenthesis, right parenthesis, asterisk (*),
period, colon (:), or blank for the stage separator character.

endchar
is the pipeline end character defined by the ENDCHAR option. Use
endchar to separate multiple pipelines on a single PIPE command. You
must specify the ENDCHAR option to use endchar. You cannot specify
left parenthesis, right parenthesis, asterisk (*), period, colon (:),
or blank for endchar.

escape
assigns an escape character, char, that can be used to override the
processing of a character that has special meaning to the PIPE command. These special characters include the stage separator character, the pipeline end character (if defined), and the escape character (if defined). Left parenthesis,
right parenthesis, asterisk (*), period and colon (:) may have a special meaning,
depending on their placement. You must place the escape character IMMEDIATELY
before the character that you do not want treated as a special character. The escape character is specified as a single character, char.
You cannot specify left parenthesis, right parenthesis, asterisk (*), period,
colon (:), or blank for the escape character. There is no default escape character.
You cannot specify the ESCAPE option for an individual stage.

label
is a label that identifies where a stream enters or leaves a particular stage that
has multiple input or output streams.
The first occurrence of a label is called a
label definition. It establishes the potential for intersecting pipelines to be attached at the position
in the pipeline where it is specified. Each subsequent use of the same label is
called a label reference.
Use label references to define additional input and output
streams for the stage. To use a label reference, specify a stage
containing only the label, with no stage command.

A label is a string of up to 8 alphanumeric characters. A label must
be immediately followed by a stream identifier or a colon with no
intervening blanks.

Example: the following reads two files into separate streams. After a record has been read from each, the two records are concatenated and written as a single record to the output file. This continues for all records.

< /home/temp/file1.txt | b: concat | > combined.txt ? < /home/temp/file2.txt | b:
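The record-by-record result described for this example can be sketched in JAVA as follows. The class ConcatSketch is hypothetical and works on in-memory lists instead of files. What the real CONCAT stage does when one file has more records than the other is not stated here, so the sketch simply stops at the shorter input; treat that as an assumption.

```java
import java.util.ArrayList;
import java.util.List;

class ConcatSketch {
    /**
     * Joins record i of the primary stream with record i of the secondary stream.
     * Assumption: processing stops when the shorter stream runs out.
     */
    static List<String> concat(List<String> primary, List<String> secondary) {
        List<String> output = new ArrayList<>();
        int pairs = Math.min(primary.size(), secondary.size());
        for (int i = 0; i < pairs; i++) {
            output.add(primary.get(i) + secondary.get(i));
        }
        return output;
    }

    public static void main(String[] args) {
        List<String> file1 = List.of("alpha ", "beta ");
        List<String> file2 = List.of("one", "two");
        concat(file1, file2).forEach(System.out::println);
    }
}
```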

operands
are any operands valid for the specified built-in stage or
user-written stage.

I/O direction indicators
The <, >, and >> indicators must be delimited by spaces, separating them from any
operands on either side.

Options
You can specify options in two ways on a PIPE command:

1. You can specify options immediately after the command name, PIPE. In this case, the scope of the options is global to the entire PIPE command.

2. You can specify options at the beginning of a stage. In this case, the scope of the options is limited to the stage on which the options are specified. If a label definition is specified, the options must follow the label definition.

The following options cannot be specified at the beginning of a stage:
ENDCHAR, ESCAPE, NAME, STAGESEP, and SEPARATOR.

Options specified on a stage override options specified globally on a PIPE command. You must enclose options in parentheses.

ENDCHAR char
defines the pipeline end character. You can specify the character as a single character, char, or the 2-character hexadecimal representation of a character, hexchar. Do not enclose the hexadecimal representation in quotation marks.

You cannot specify left parenthesis, right parenthesis, asterisk (*), period, colon (:), or blank as the pipeline end character. You cannot specify ENDCHAR as an option for an individual stage.

ESCAPE char
assigns an escape character, char, that can be used to override the processing of a character that has special meaning to the PIPE command. These special characters include the stage separator character, the pipeline end character (if defined), and the escape character (if defined). Left parenthesis, right parenthesis, asterisk
(*), period and colon (:) may have a special meaning, depending on their placement. You must place the escape character immediately before the character that you do not want treated as a special character. The escape character can be specified as a single character, char.

You cannot specify left parenthesis, right parenthesis, asterisk (*), period, colon (:), or blank for the escape character. There is no default escape character. You cannot specify the ESCAPE option for an individual stage.

STAGESEP char
assigns the stage separator character. Use the stage separator character to separate the specification of a stage from a subsequent stage. The character can be specified as a single character, char, or the 2-character hexadecimal representation of a character, hexchar. Do not enclose the hexadecimal representation in quotation marks.

If you change the definition of the stage separator to a character other than the default stage separator, you can use the default stage separator as an argument of a stage.

You cannot specify left parenthesis, right parenthesis, asterisk (*), period, colon (:), or blank as the STAGESEP or SEPARATOR character. You cannot specify STAGESEP or SEPARATOR as an option for an individual stage.

TRACE
displays trace information. TRACE is useful for debugging pipeline application programs. This option can cause a large amount of data to be displayed.

Usage Notes

1. Specifying a stage defines both the primary input and primary output streams for that stage. Using label references defines additional input and output streams for a stage.

2. The stages of a PIPE command can write records up to 2**(7) - 1 bytes in length.

3. If a PIPE command is too long to type conveniently on the command line you could put each stage on a separate line adhering to the JAVA or NetRexx continuation rules.

4. The escape character, stage separator character, and pipeline end character have no effect within the specification of options. These characters take effect only outside the parentheses in which the options are enclosed.

For example, in the following PIPE command, only the second and third occurrences of the @ character are treated as stage separator characters.

(stagesep @) < INPUT FILE @ locate /||/ @ console

This command displays any lines of the file INPUT FILE that contain the character string ||. The first occurrence of the @ character defines @ as the stage separator; because it appears within the specification of the options, it is not itself treated as a stage separator. Because @ replaces the default stage separator, the two | characters can be used as an argument of LOCATE.

5. When specifying the ENDCHAR, ESCAPE, SEPARATOR, or STAGESEP option, you can use the 2-character hexchar form to define the character. However, when the character is subsequently used in the pipeline, a single character must be specified.

(stagesep 6C) < test.file % find ABcd % console

Note that 6C is the hexadecimal value of the percent sign (%) in EBCDIC; in ASCII the percent sign is X'25'.

6. A colon may NOT be used in a file name.
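The separator, escape, and hexchar rules in the usage notes above can be pictured with a small JAVA sketch. It is not the JAVA Pipelines parser: the class EscapeSplitSketch and its methods are hypothetical, the hexchar conversion assumes ASCII/Unicode code points (so 7C is the vertical bar and 25 is the percent sign), and option parsing is left out entirely.

```java
import java.util.ArrayList;
import java.util.List;

class EscapeSplitSketch {
    /** Converts a 2-character hexadecimal form such as "7C" to a character (ASCII assumed). */
    static char hexChar(String hex) {
        return (char) Integer.parseInt(hex, 16);
    }

    /**
     * Splits text into stages at each separator, except where the separator
     * is immediately preceded by the escape character.
     */
    static List<String> splitStages(String text, char separator, char escape) {
        List<String> stages = new ArrayList<>();
        StringBuilder current = new StringBuilder();
        for (int i = 0; i < text.length(); i++) {
            char c = text.charAt(i);
            if (c == escape && i + 1 < text.length()) {
                // Keep the next character literally and drop the escape character itself.
                current.append(text.charAt(++i));
            } else if (c == separator) {
                stages.add(current.toString().trim());
                current.setLength(0);
            } else {
                current.append(c);
            }
        }
        stages.add(current.toString().trim());
        return stages;
    }

    public static void main(String[] args) {
        char separator = hexChar("7C"); // the vertical bar in ASCII
        // '!' plays the role of an escape character chosen with the ESCAPE option.
        System.out.println(splitStages("< test.data | locate /a!|b/ | console", separator, '!'));
    }
}
```

The escaped separator stays inside the LOCATE argument instead of ending the stage, which is the effect that the ESCAPE option describes.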

For more information about using pipelines, see the IBM CMS Pipelines Reference and Pipelines User's Guide at the IBM website.
