Command Pipelines

Monday, May 31, 2010

Command Pipelines

Pipes are easy. The Unix shells provide mechanisms which you can use them to allow you to generate remarkably sophisticated `programs' out of simple components. We call that a pipeline. A pipeline is composed of a data generator, a series of filters, and a data consumer. Often that final stage is as simple as displaying the final output on stdout, and sometimes the first stage is as simple as reading from stdin. I think all shells use the "|" character to separate each stage of a pipeline. So:

data-generator | filter | ... | filter | data-consumer

Each stage of the pipeline runs in parallel, within the limits which the system permits. Hey, look closely, because that last phrase is important. Are you on a uni-processor system because if you are, then obviously only one process runs at a time, although that point is simply nitpicking. But pipes are buffers capable of holding only finite data. A process can write into a pipe until that pipe is full. When the pipe is full the process writing into it blocks until some of the data already in the pipe has been read. Similarly, a process can read from a pipe until that pipe is empty. When it's empty the reading process is blocked until some more data has been written into the pipe.