Pipelines by tenfiftytwo
Pipelines v2.1
Pipelines is inspired by and based upon CMS Pipelines, an enterprise systems
utility originally designed and developed by IBM.
Pipelines, which executes under the umbrella of ooRexx, allows you to modify
the contents of one or more text or data files quickly and easily. You can
specify that only certain sections of a file are to be changed; you can confine
those changes to a column, word or field range; and you can translate words and
phrases, discard lines or insert new lines of data. You can perform a whole
range of operations on a file or files using only a simple set of commands.
You may find Pipelines useful for data-mining or updating extremely large log
files, or for searching for and replacing values based on simple pattern
matching or on regular expressions, in reverse record order if necessary.
A pipeline can call third-party WIN32 programs and issue system (CMD) and
PowerShell commands, capturing the console output in order to operate on the
data. You can connect multi-purpose pipelines together to quickly construct an
on-the-fly solution to a wide range of transformation problems that might
otherwise consume a great deal of your time.
Pipelines builds on the concept of directing the output of one process
to the input of another, commonly known as pipelining. It is an old idea, and
almost all operating systems support an implementation of it, with varying
degrees of usefulness. In general they support the linear, single-stream model:
if you lay each process out in a straight line, data starts in the first
process and passes into the next, where it is changed in some way, and so on
down the pipeline chain in sequence until it reaches a sink. For example:
    stage1 | stage2 | stage3 | ... | sink
Pipelines extends this mechanism by allowing you to create multi-stream
pipelines, where the topology is no longer horizontal and linear but
two-dimensional: records travel up and down the pipeline chain through
intersecting joints which control the flow of data. Multi-stream pipelines
allow you to select and operate on specific sets of records, routing unselected
records through a joint into and out of other sections of the pipeline.
● Pipelines treats its input data as lines or records,
reading them one at a time from its input and writing them one at a time to its
output. As a result, unless a stage needs the entire input to be loaded into
memory, Pipelines consumes only a fraction of the memory that might otherwise
be required, because only a handful of records are in the pipeline at any one time.
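The record-at-a-time behaviour described above can be modelled outside Pipelines. The following Python sketch is not Pipelines code; it is a minimal illustration, using generators, of how a linear chain of stages can hand on one record at a time so that memory use does not grow with the size of the input. The function names (`source`, `locate`, `change`, `sink`) are hypothetical and only loosely echo stage names.

```python
import io

def source(stream):
    """Yield records one at a time, without the trailing newline."""
    for line in stream:
        yield line.rstrip("\n")

def locate(records, needle):
    """Pass on only the records that contain `needle`."""
    for record in records:
        if needle in record:
            yield record

def change(records, old, new):
    """Replace every occurrence of `old` with `new` in each record."""
    for record in records:
        yield record.replace(old, new)

def sink(records, out):
    """Write each record as it arrives and return how many were written."""
    count = 0
    for record in records:
        out.write(record + "\n")
        count += 1
    return count

# In-memory stand-ins for an input file and an output file.
data = io.StringIO("hello friend\nhello stranger\nfriend, hello again\n")
result = io.StringIO()

# Each stage pulls a single record from the stage before it, so only one
# record is in flight at any moment.
written = sink(change(locate(source(data), "friend"), "hello", "goodbye"), result)
```

Because every stage is a generator, nothing is read until the sink asks for a record, which is the property that keeps the working set small.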
● Pipelines allows you to operate on files of any
size in a single pass, isolating sections of the file without having to
needlessly buffer or sort the data simply in order to maintain the relative record order.
Consider the following simple pipeline which, using only 6 stages, reads the
file myfile.txt and, in a single pass, changes the word hello to goodbye only
in records that contain the word friend.

    Address Rxpipe

    'pipe (endchar ?)',
    '< myfile.txt',
    '| a: locate /friend/',
    '| change /hello/ /goodbye/',
    '| b: faninany',
    '| > myfile.txt',
    '?',
    'a:',
    '| take *',
    '| b:'

    Exit 0
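As a cross-check on what the pipeline above is described as doing, here is a small Python model of the same flow. It is a sketch under assumptions, not Pipelines code: the function name `change_in_matching` and the sample records are made up for illustration, and the two branches of the `if` stand in for the primary and secondary streams that meet again at the merge point.

```python
def change_in_matching(records, needle, old, new):
    """Single pass: change `old` to `new` only in records containing `needle`.

    Records are yielded in their original relative order, whichever branch
    they took.
    """
    for record in records:
        if needle in record:
            # Selected records: the branch where the change is applied.
            yield record.replace(old, new)
        else:
            # Unselected records: routed around the change untouched.
            yield record

sample = ["hello friend", "hello world", "my friend says hello"]
merged = list(change_in_matching(sample, "friend", "hello", "goodbye"))
```

Only the first and third sample records contain friend, so only those have hello changed, and the second record stays in its original position.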
● Pipelines comprises a range of input, output, selection and
transformation stages which provide a number of useful manipulation functions,
including splitting records, stripping characters, joining records, collating,
sorting and more. On the whole, similar operations are performed by a single
stage, which means that you do not have to remember the names of an
unnecessarily lengthy list of stages. For example, to strip characters from a
record, Pipelines provides a single stage called STRIP, which removes
characters from the beginning and/or the end of a record.
● Pipelines also has a number of sub-commands, PEEKTO, READTO and OUTPUT, that can be used
to create ooRexx scripts that work as user-defined stages.
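To make the idea of a user-defined stage more concrete, the sketch below models the usual peek, process, consume loop in Python. It is not the Pipelines API: the `StageIO` class and its methods are hypothetical, and the sketch assumes that PEEKTO inspects the next input record without consuming it, READTO consumes it, and OUTPUT writes a record, as the sub-commands of the same names do in CMS Pipelines.

```python
from collections import deque

class StageIO:
    """Hypothetical stand-in for a stage's input and output streams."""

    def __init__(self, records):
        self._input = deque(records)
        self.written = []

    def peekto(self):
        """Return the next input record without consuming it (None at end)."""
        return self._input[0] if self._input else None

    def readto(self):
        """Consume and return the next input record (None at end)."""
        return self._input.popleft() if self._input else None

    def output(self, record):
        """Write a record to the output stream."""
        self.written.append(record)

def reverse_stage(stage_io):
    """A toy user-defined stage that reverses the text of each record."""
    while True:
        record = stage_io.peekto()      # look at the record, leave it in place
        if record is None:
            break                       # end of input
        stage_io.output(record[::-1])   # write the transformed record
        stage_io.readto()               # only now consume the input record

stage_io = StageIO(["abc", "de"])
reverse_stage(stage_io)
```

Consuming the input record only after the output has been written is the conventional ordering in CMS Pipelines; whether it matters here depends on how this implementation schedules its stages.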
● Pipelines is general purpose. It has not been developed with any particular field
in mind; it is simply a line/record-orientated text processing utility
that is useful for manipulating data. The design of Pipelines is essentially
a compromise between speed and flexibility. A bespoke, dedicated program may
out-perform Pipelines. However, with a dedicated program, each time your
requirements change you must alter the source code (and, if it is compiled,
re-build it as well). This is not a problem when the program is
small or simplistic. But when we start to talk about pattern, field, word
and column selection, recursive sorting, collating, splitting and joining
records from multiple input files, possibly large files, then we have a
different scenario. Pipelines is designed with this
type of processing in mind; it is intended to offer a quick and efficient
processing utility that can help you manipulate data into a format that suits
your needs.
● Pipelines itself is extensible. It includes a Visual Studio/VC++ (VS8 VC++ 9)
stage command API library which contains all the stage initialisation
parsing functions and runtime extraction routines that support the current
set of built-in stage filters. The API allows you to create new stage DLLs
that augment the current built-in set. The API addresses most
of the needs that a stage might reasonably have: console locking and
synchronisation, multi-stream connectivity, multiple column, word and field
isolation, pre-process functionality, character range expansions, input and
output record availability and more. Pipelines ships with a DEBUG and a RELEASE
version of the API library. The Pipelines stage command API uses the Microsoft
Foundation Class (MFC) CString class extensively,
and other MFC-specific classes under the covers, as and when required.
● Pipelines supports third-party non-API WIN32 console applications/modules
through the SHELLEXECUTE stage command. SHELLEXECUTE will load and service
any WIN32 application, reading input records from that process's STDOUT and STDERR
I/O streams and writing them to the SHELLEXECUTE stage's primary and
secondary output streams, respectively.
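The general technique of running a console program and keeping its two output streams apart can be sketched in Python with the standard `subprocess` module. This is an illustration of the idea only and says nothing about how SHELLEXECUTE is implemented; the helper name `run_and_capture` and the child command are made up for the example.

```python
import subprocess
import sys

def run_and_capture(argv):
    """Run a console program; return (stdout records, stderr records)."""
    completed = subprocess.run(argv, capture_output=True, text=True, check=False)
    primary = completed.stdout.splitlines()     # analogous to the primary output stream
    secondary = completed.stderr.splitlines()   # analogous to the secondary output stream
    return primary, secondary

# A tiny child process that writes one record to each of its streams.
child = [
    sys.executable,
    "-c",
    "import sys; print('out record'); print('err record', file=sys.stderr)",
]
primary, secondary = run_and_capture(child)
```

Keeping the two streams separate lets later processing treat normal output and error output differently, which is the point of mapping them to different output streams.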
● Pipelines provides a convenient and easy way to create a new
ooRexx script: simply right-click anywhere on your desktop or within a
folder to access the ‘New->Pipelines file’ option. Selecting this
option will create a very simple skeleton ooRexx file with the extension .REX.
File associations under Windows can be troublesome, especially when you try to
re-name a file by extension; using this method, you can create a new ooRexx
file with the minimum of effort.
You may well find that
ooRexx/Pipelines will help you solve your problem quickly and easily, saving
you time and effort that could be better spent on other tasks. Pipelines is free; there are no evaluation caveats, and you may
download it and use it as you please.
Pipelines is designed and maintained by me and you can download
it here.