Input streams—and how to create, use, and detect the end of them—and filtered input streams, which can be nested to great effect
Output streams, which are mostly analogous to (but the inverse of) input streams

You'll also learn about two stream interfaces that make the reading and writing of typed streams much easier (as well as about several utility classes used to access the file system). Let's begin with a little history behind the invention of streams.

A pipe is an uninterpreted stream of bytes that can be used for communicating between programs (or other "forked" copies of your own program) or for reading and writing to arbitrary peripheral devices and files.

One of the early inventions of the UNIX operating system was the pipe. By unifying many disparate ways of communicating into a single metaphor, UNIX paved the way for a whole series of related inventions, culminating in the abstraction known as streams.

A stream is a path of communication between the source of some information and its destination.

This information, an uninterpreted byte stream, can come from any "pipe source," the computer's memory, or even from the Internet. In fact, the source and destination of a stream are completely arbitrary producers and consumers of bytes, respectively. Therein lies the power of the abstraction. You don't need to know about the source of the information when reading from a stream, and you don't need to know about the final destination when writing to one.

General-purpose methods that can read from any source accept a stream argument to specify that source; general methods for writing accept a stream to specify the destination. Arbitrary processors (or filters) of data have two stream arguments. They read from the first, process the data, and write the results to the second. These processors have no idea of either the source or the destination of the data they are processing. Sources and destinations can vary widely: from two memory buffers on the same local computer, to the ELF transmissions to and from a submarine at sea, to the real-time data streams of a NASA probe in deep space.

By decoupling the consuming, processing, or producing of data from the sources and destinations of that data, you can mix and match any combination of them at will as you write your program. In the future, when new, previously nonexistent forms of source or destination (or consumer, processor, or producer) appear, they can be used within the same framework, with no changes to your classes. In addition, new stream abstractions, supporting higher levels of interpretation "on top of" the bytes, can be written completely independently of the underlying transport mechanisms for the bytes themselves.
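To make the idea of a processor with two stream arguments concrete, here is a minimal sketch. The class StreamFilterSketch and its method upperCase() are hypothetical names invented for this illustration; the in-memory streams ByteArrayInputStream and ByteArrayOutputStream merely stand in for an arbitrary source and destination.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;

class StreamFilterSketch {
    // Hypothetical processor: reads bytes from in, upper-cases ASCII letters,
    // and writes the result to out. It knows nothing about either end.
    static void upperCase(InputStream in, OutputStream out) throws IOException {
        int c;
        while ((c = in.read()) != -1) {        // -1 signals the end of the stream
            if (c >= 'a' && c <= 'z') {
                c = c - 'a' + 'A';
            }
            out.write(c);
        }
    }

    public static void main(String[] args) throws IOException {
        // Assumption: memory buffers play the roles of source and destination.
        InputStream source = new ByteArrayInputStream("hello, streams".getBytes("US-ASCII"));
        ByteArrayOutputStream destination = new ByteArrayOutputStream();
        upperCase(source, destination);
        System.out.println(destination.toString("US-ASCII"));   // HELLO, STREAMS
    }
}
```

Because upperCase() sees only the two abstract types, the same method would work unchanged if the source were a file or the destination a network connection.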

The foundations of this stream framework are the two abstract classes, InputStream and OutputStream. If you turn briefly to the diagram for java.io in Appendix B, you'll see that below these classes is a virtual cornucopia of categorized classes, demonstrating the wide range of streams in the system, but also demonstrating an extremely well-designed hierarchy of relationships between these streams, one well worth learning from. Let's begin with the parents and then work our way down this bushy tree.

Input Streams

All the methods you will explore today are declared to throw IOExceptions. This new subclass of Exception conceptually embodies all the possible I/O errors that might occur while using streams. Several subclasses of it define a few, more specific exceptions that can be thrown as well. For now, it is enough to know that you must either catch an IOException, or be in a method that can "pass it along," to be a well-behaved user of streams.

The abstract Class InputStream

InputStream is an abstract class that defines the fundamental ways in which a destination (consumer) reads a stream of bytes from some source. The identity of the source, and the manner of the creation and transport of the bytes, is irrelevant. When using an input stream, you are the destination of those bytes, and that's all you need to know.

read()

The most important method to the consumer of an input stream is the one that reads bytes from the source. This method, read(), comes in many flavors, and each is demonstrated in an example in today's lesson.

Each of these read() methods is defined to "block" (wait) until input becomes available, the end of the stream is reached, or an error occurs. Don't worry about this limitation; because of multithreading, you can do as many other things as you like while this one thread is waiting for input. In fact, it is a common idiom to assign a thread to each stream of input (and for each stream of output) that is solely responsible for reading from it (or writing to it). These input threads might then "hand off" the information to other threads for processing. This naturally overlaps the I/O time of your program with its compute time.

Here's the first form of read():

InputStream  s      = getAnInputStreamFromSomewhere();
byte[]       buffer = new byte[1024];   // any size will do
if (s.read(buffer) != buffer.length)
    System.out.println("I got less than I expected.");

Note: Here, and throughout the rest of today's lesson, assume that either an import java.io.* appears before all the examples or that you mentally prefix all references to java.io classes with the prefix java.io.

This form of read() attempts to fill the entire buffer given. If it cannot (usually due to reaching the end of the input stream), it returns the actual number of bytes that were read into the buffer. After that, any further calls to read() return -1, indicating that you are at the end of the stream. Note that the if statement still works even in this case, because -1 != 1024 (this corresponds to an input stream with no bytes in it at all).

Note: Don't forget that, unlike in C, the -1 case in Java is not used to indicate an error. Any I/O errors throw instances of IOException (which you're not catching yet). You learned on Day 17 that all uses of distinguished values can be replaced by the use of exceptions, and so they should. The -1 in the last example is a bit of a historical anachronism. You'll soon see a better approach to indicating end of the stream using the class DataInputStream.

You can also read into a "slice" of your buffer by specifying the offset into the buffer, and the length desired, as arguments to read():

s.read(buffer, 100, 300);

This example tries to fill in bytes 100 through 399 and behaves otherwise exactly the same as the previous read() method. In fact, in the current release, the default implementation of the former version of read() uses the latter:

public int  read(byte[]  buffer) throws IOException {
    return  read(buffer, 0, buffer.length);
}
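Because either form of read() may hand back fewer bytes than you asked for, code that really needs a full buffer usually loops. The following is a minimal sketch of that loop; the class ReadLoopSketch and the helper fill() are hypothetical names, and a ByteArrayInputStream (described later in this lesson) stands in for a real source.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;

class ReadLoopSketch {
    // Hypothetical helper: keeps reading into the unfilled slice of buffer
    // until the buffer is full or the stream ends. Returns the number of
    // bytes actually stored.
    static int fill(InputStream s, byte[] buffer) throws IOException {
        int total = 0;
        while (total < buffer.length) {
            int n = s.read(buffer, total, buffer.length - total);
            if (n == -1) {
                break;                 // end of stream
            }
            total += n;
        }
        return total;
    }

    public static void main(String[] args) throws IOException {
        InputStream s = new ByteArrayInputStream(new byte[10]);
        byte[] buffer = new byte[4];
        System.out.println(fill(s, buffer));   // 4
        System.out.println(fill(s, buffer));   // 4
        System.out.println(fill(s, buffer));   // 2
        System.out.println(s.read(buffer));    // -1
    }
}
```

The third call returns 2 because only two bytes remain; every read after that reports the end of the stream.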

Finally, you can read in bytes one at a time:

InputStream  s = getAnInputStreamFromSomewhere(); 

byte         b;
int          byteOrMinus1;
while ((byteOrMinus1 = s.read()) != -1) {
     b = (byte) byteOrMinus1;
     . . .    // process the byte b
}
. . .    // reached end of stream

Note: Because of the nature of integer promotion in Java in general, and because in this case the read() method returns an int, using the byte type in your code may be a little frustrating. You'll find yourself constantly having explicitly to cast the result of arithmetic expressions, or of int return values, back to your size. Because read() really should be returning a byte in this case, I feel justified in declaring and using it as such (despite the pain)—it makes the size of the data being read clearer. In cases where you feel the range of a variable is naturally limited to a byte (or a short) rather than an int, please take the time to declare it that way and pay the small price necessary to gain the added clarity. By the way, a lot of the Java class library code simply stores the result of read() in an int. This proves that even the Java team is human—everyone makes style mistakes.

skip()

What if you want to skip over some of the bytes in a stream, or start reading a stream from other than its beginning? A method similar to read() does the trick:

if (s.skip(1024) != 1024)
    System.out.println("I skipped less than I expected.");

This skips over the next 1024 bytes in the input stream. skip() takes and returns a long integer, because streams are not required to be limited to any particular size. The default implementation of skip in this release simply uses read():

public long  skip(long n) throws IOException {
    byte[]  buffer = new byte[(int) n];
    return  read(buffer);
}

Note: This implementation does not support large skips correctly, because its long argument is cast to an int. Subclasses must override this default implementation if they want to handle this more properly. This won't be as easy as you might think, because the current release of the Java system does not allow integer types larger than int to act as array subscripts.
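Since skip() reports how many bytes it actually skipped, a caller that must get past a fixed number of bytes can loop in the same way as with read(). This is only a sketch under stated assumptions: SkipSketch and skipFully() are hypothetical names, and the fallback to read() is one possible way to tell "no progress" apart from "end of stream".

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;

class SkipSketch {
    // Hypothetical helper: skips n bytes unless the stream ends first.
    // Returns the number of bytes actually skipped.
    static long skipFully(InputStream s, long n) throws IOException {
        long skipped = 0;
        while (skipped < n) {
            long k = s.skip(n - skipped);
            if (k > 0) {
                skipped += k;
            } else if (s.read() != -1) {
                skipped++;             // skip() made no progress, but a byte was there
            } else {
                break;                 // end of stream
            }
        }
        return skipped;
    }

    public static void main(String[] args) throws IOException {
        InputStream s = new ByteArrayInputStream(new byte[2000]);
        System.out.println(skipFully(s, 1024));   // 1024
        System.out.println(skipFully(s, 1024));   // 976
    }
}
```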

available()

If for some reason you would like to know how many bytes are in the stream right now, you can ask:

if (s.available() < 1024)
    System.out.println("Too little is available right now.");

This tells you the number of bytes that you can read() without blocking. Because of the abstract nature of the source of these bytes, streams may or may not be able to tell you a reasonable answer to this question. For example, some streams always return 0. Unless you use specific subclasses of InputStream that you know provide a reasonable answer to this question, it's not a good idea to rely upon this method. Remember, multithreading eliminates many of the problems associated with blocking while waiting for a stream to fill again. Thus, one of the strongest rationales for the use of available() goes away.

mark() and reset()

Some streams support the notion of marking a position in the stream, and then later resetting the stream to that position to reread the bytes there. Clearly, the stream would have to "remember" all those bytes, so there is a limitation on how far apart in a stream the mark and its subsequent reset can occur. There's also a method that asks whether or not the stream supports the notion of marking at all. Here's an example:

InputStream  s = getAnInputStreamFromSomewhere();
if (s.markSupported()) {    // does s support the notion?
    . . .        // read the stream for a while
    s.mark(1024);
    . . .        // read less than 1024 more bytes
    s.reset();
    . . .        // we can now re-read those bytes
} else {
    . . .                   // no, perform some alternative
}

When marking a stream, you specify the maximum number of bytes you intend to allow to pass before resetting it. This allows the stream to limit the size of its byte "memory." If this number of bytes goes by and you have not yet reset(), the mark becomes invalid, and attempting to reset() will throw an exception.

Marking and resetting a stream is most valuable when you are attempting to identify the type of the stream (or the next part of the stream), but to do so, you must consume a significant piece of it in the process. Often, this is because you have several black-box parsers that you can hand the stream to, but they will consume some (unknown to you) number of bytes before making up their mind about whether the stream is of their type. Set a large size for the read limit above, and let each parser run until it either throws an error or completes a successful parse. If an error is thrown, reset() and try the next parser.
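Here is a small sketch of that "try each parser, then rewind" technique. Everything named here is an assumption made for illustration: the class MarkResetSketch, the helpers startsWith() and identify(), and the two-byte signatures, which are toy stand-ins for real parsers.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;

class MarkResetSketch {
    // Hypothetical "parser": consumes up to two bytes and reports whether
    // they match the given signature.
    static boolean startsWith(InputStream s, int first, int second) throws IOException {
        return s.read() == first && s.read() == second;
    }

    // Tries each check in turn, rewinding to the mark after every attempt,
    // so the caller gets the stream back at its starting position.
    static String identify(InputStream s) throws IOException {
        if (!s.markSupported()) {
            return "unknown (marking not supported)";
        }
        s.mark(16);                    // read limit: more than any check consumes
        boolean zip = startsWith(s, 'P', 'K');
        s.reset();
        if (zip) {
            return "zip";
        }
        boolean gif = startsWith(s, 'G', 'I');
        s.reset();
        return gif ? "gif" : "unknown";
    }

    public static void main(String[] args) throws IOException {
        InputStream s = new ByteArrayInputStream(new byte[] {'G', 'I', 'F', '8', '9', 'a'});
        System.out.println(identify(s));        // gif
        System.out.println((char) s.read());    // G
    }
}
```

The final read() shows that, after identification, the stream again starts with its first byte.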

close()

Because you don't know what resources an open stream represents, nor how to deal with them properly when you're finished reading the stream, you should (usually) explicitly close down a stream so that it can release these resources. Of course, garbage collection and a finalization method can do this for you, but what if you need to reopen that stream or those resources before they have been freed by this asynchronous process? At best, this is annoying or confusing; at worst, it introduces an unexpected, obscure, and difficult-to-track-down bug. Because you're interacting with the outside world of external resources, it's safer to be explicit about when you're finished using them:

InputStream  s = alwaysMakesANewInputStream();
try {
    . . .     // use s to your heart's content
} finally {
    s.close();
}

Get used to this idiom (using finally); it's a useful way to be sure something (such as closing the stream) always gets done. Of course, you're assuming that the stream is always successfully created. If this is not always the case, and null is sometimes returned instead, here's the correct way to be safe:

InputStream  s = tryToMakeANewInputStream();
if (s != null) {
    try {
        . . .
    } finally {
        s.close();
    }
}
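To convince yourself that finally really does close the stream even when something goes wrong in the middle, you can try a sketch like the following. TrackedStream and useAndFail() are hypothetical names created for this demonstration; the failure is simulated.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;

class CloseSketch {
    // Hypothetical stream that records whether close() was called.
    static class TrackedStream extends ByteArrayInputStream {
        boolean closed = false;

        TrackedStream(byte[] data) {
            super(data);
        }

        @Override
        public void close() throws IOException {
            closed = true;
            super.close();
        }
    }

    // Reads one byte, then fails on purpose; finally still closes the stream.
    static void useAndFail(InputStream s) throws IOException {
        try {
            s.read();
            throw new IOException("simulated failure while reading");
        } finally {
            s.close();
        }
    }

    public static void main(String[] args) {
        TrackedStream s = new TrackedStream(new byte[] {1, 2, 3});
        try {
            useAndFail(s);
        } catch (IOException e) {
            System.out.println("caught: " + e.getMessage());
        }
        System.out.println("closed: " + s.closed);   // closed: true
    }
}
```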

All input streams descend from the abstract class InputStream. All share in common the few methods described so far. Thus, stream s in the previous examples could have been any of the more complex input streams described in the next few sections.

ByteArrayInputStream

The "inverse" of some of the previous examples would be to create an input stream from an array of bytes. This is exactly what ByteArrayInputStream does:

byte[]  buffer = new byte[1024];
fillWithUsefulData(buffer);
InputStream  s = new ByteArrayInputStream(buffer);

Readers of the new stream s see a stream 1024 bytes long, containing the bytes in the array buffer. Just as read() has a form that takes an offset and a length, so does this class's constructor:

InputStream  s = new ByteArrayInputStream(buffer, 100, 300);

Here, the stream is 300 bytes long and consists of bytes 100-399 from the array buffer.

Note: Finally, you've seen your first examples of the creation of a stream. These new streams are attached to the simplest of all possible sources of data, an array of bytes in the memory of the local computer.

ByteArrayInputStreams simply implement the standard set of methods that all input streams do. Here, however, the available() method has a particularly simple job—it returns 1024 and 300, respectively, for the two instances of ByteArrayInputStream you created previously, because it knows exactly how many bytes are available. Finally, calling reset() on a ByteArrayInputStream resets it to the beginning of the stream (buffer), no matter where the mark is set.
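A short sketch makes the behavior of available() on these two streams visible. The class ByteArraySketch and the helper probe() are hypothetical, and the loop that fills the array is only a stand-in for fillWithUsefulData().

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;

class ByteArraySketch {
    // Hypothetical helper: reports available() before a read, the first
    // byte read, and available() after that read.
    static int[] probe(InputStream s) throws IOException {
        int before = s.available();
        int first  = s.read();
        int after  = s.available();
        return new int[] { before, first, after };
    }

    public static void main(String[] args) throws IOException {
        byte[] buffer = new byte[1024];
        for (int i = 0; i < buffer.length; i++) {
            buffer[i] = (byte) (i % 128);      // stand-in for fillWithUsefulData()
        }
        int[] whole = probe(new ByteArrayInputStream(buffer));
        int[] slice = probe(new ByteArrayInputStream(buffer, 100, 300));
        System.out.println(whole[0] + " " + whole[1] + " " + whole[2]);   // 1024 0 1023
        System.out.println(slice[0] + " " + slice[1] + " " + slice[2]);   // 300 100 299
    }
}
```

The first byte of the slice is the byte at index 100 of the array, which confirms that the offset form starts reading there.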

FileInputStream

One of the most common uses of streams, and historically the earliest, is to attach them to files in the file system. Here, for example, is the creation of such an input stream on a UNIX system:

InputStream  s = new FileInputStream("/some/path/and/fileName");

Caution: Applets attempting to open, read, or write streams based on files in the file system can cause security violations (depending on the paranoia level set by the user of the browser). Try to create applets that do not depend on files at all, by using servers to hold shared information. If that's impossible, limit your applet's I/O to a single file or directory to which the user can easily assign file access permission. (Stand-alone Java programs have none of these problems, of course.)

You also can create the stream from a previously opened file descriptor:

FileDescriptor  fd = openInputFileInTraditionalUNIXWays();
InputStream     s  = new FileInputStream(fd);

In either case, because it's based on an actual (finite length) file, the input stream created can implement available() precisely and can skip() like a champ (just as ByteArrayInputStream can, by the way). In addition, FileInputStream knows a few more tricks:

FileInputStream  aFIS = new FileInputStream("aFileName");
FileDescriptor  myFD = aFIS.getFD();
/* aFIS.finalize(); */  // will call close() when automatically called by GC

Tip: To call the new methods, you must declare the stream variable aFIS to be of type FileInputStream, because plain InputStreams don't know about them.

The first is obvious: getFD() returns the file descriptor of the file on which the stream is based. The second, though, is an interesting shortcut that allows you to create FileInputStreams without worrying about closing them later. FileInputStream's implementation of finalize(), a protected method, closes the stream. Unlike in the contrived call in comments, you almost never can nor should call a finalize() method directly. The garbage collector calls it after noticing that the stream is no longer in use, but before actually destroying the stream. Thus, you can go merrily along using the stream, never closing it, and all will be well. The system takes care of closing it (eventually).

You can get away with this because streams based on files tie up very few resources, and these resources cannot be accidentally reused before garbage collection (these were the things worried about in the previous discussion of finalization and close()). Of course, if you were also writing to the file, you would have to be more careful. (Reopening the file too soon after writing might make it appear in an inconsistent state because the finalize()—and thus the close()—might not have happened yet). Just because you don't have to close the stream doesn't mean you might not want to do so anyway. For clarity, or if you don't know precisely what type of an InputStream you were handed, you might choose to call close() yourself.

FilterInputStream

This "abstract" class simply provides a "pass-through" for all the standard methods of InputStream. It holds inside itself another stream, by definition one further "down" the chain of filters, to which it forwards all method calls. It implements nothing new but allows itself to be nested:

InputStream        s  = getAnInputStreamFromSomewhere();
FilterInputStream  s1 = new FilterInputStream(s);
FilterInputStream  s2 = new FilterInputStream(s1);
FilterInputStream  s3 = new FilterInputStream(s2);
... s3.read() ...

Whenever a read is performed on the filtered stream s3, it passes along the request to s2; then s2 does the same to s1, and finally s is asked to provide the bytes. Subclasses of FilterInputStream will, of course, do some nontrivial processing of the bytes as they flow past. The rather verbose form of "chaining" in the previous example can be made more elegant:

s3 = new FilterInputStream(new FilterInputStream(new FilterInputStream(s)));

You should use this idiom in your code whenever you can. It clearly expresses the nesting of chained filters, and can easily be parsed and "read aloud" by starting at the innermost stream s and reading outward—each filter stream applying to the one within—until you reach the outermost stream s3.
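To see what nontrivial filters look like when nested, consider this sketch. Both filter classes, CountingInputStream and UpperCaseInputStream, are hypothetical subclasses written for this illustration; they are not part of java.io.

```java
import java.io.ByteArrayInputStream;
import java.io.FilterInputStream;
import java.io.IOException;
import java.io.InputStream;

class FilterSketch {
    // Hypothetical filter: counts the bytes read through it.
    static class CountingInputStream extends FilterInputStream {
        long count = 0;
        CountingInputStream(InputStream in) { super(in); }
        @Override public int read() throws IOException {
            int c = super.read();
            if (c != -1) count++;
            return c;
        }
        @Override public int read(byte[] b, int off, int len) throws IOException {
            int n = super.read(b, off, len);
            if (n > 0) count += n;
            return n;
        }
    }

    // Hypothetical filter: upper-cases ASCII letters as they flow past.
    static class UpperCaseInputStream extends FilterInputStream {
        UpperCaseInputStream(InputStream in) { super(in); }
        @Override public int read() throws IOException {
            int c = super.read();
            return (c >= 'a' && c <= 'z') ? c - 'a' + 'A' : c;
        }
        @Override public int read(byte[] b, int off, int len) throws IOException {
            int n = super.read(b, off, len);
            for (int i = off; i < off + n; i++) {      // no iterations when n is -1
                if (b[i] >= 'a' && b[i] <= 'z') b[i] = (byte) (b[i] - 'a' + 'A');
            }
            return n;
        }
    }

    // Drains a stream one byte at a time into a String (ASCII assumed).
    static String readAll(InputStream s) throws IOException {
        StringBuilder sb = new StringBuilder();
        int c;
        while ((c = s.read()) != -1) sb.append((char) c);
        return sb.toString();
    }

    public static void main(String[] args) throws IOException {
        CountingInputStream counter =
            new CountingInputStream(new ByteArrayInputStream("nested".getBytes("US-ASCII")));
        InputStream s = new UpperCaseInputStream(counter);
        System.out.println(readAll(s) + " " + counter.count);   // NESTED 6
    }
}
```

The counter is given its own variable because its count field must be read afterward; a filter that is never queried directly can stay anonymous inside the nested expression.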

Note: FilterInputStream is called "abstract," rather than abstract, because it is not actually declared to be abstract. This means that, as useless as they are, you can create instances of FilterInputStream directly. The same will hold for its output stream "brother" class, described later today.

Now let's examine each of the subclasses of FilterInputStream in turn.

BufferedInputStream

This is one of the most valuable of all streams. It implements the full complement of InputStream's methods, but it does so by using a buffered array of bytes that acts as a cache for future reading. This decouples the rate and the size of the "chunks" you're reading from the more regular, larger block sizes in which streams are most efficiently read (from, for example, peripheral devices, files in the file system, or the network). It also allows smart streams to read ahead when they expect that you will want more data soon.

Because the buffering of BufferedInputStream is so valuable, and it's also the only class to handle mark() and reset() properly, you might wish that every input stream could somehow share its valuable capabilities. Normally, because those stream classes do not implement them, you would be out of luck. Fortunately, you already saw a way that filter streams can wrap themselves "around" other streams. Suppose that you would like a buffered FileInputStream that can handle marking and resetting correctly. Et voilà:

InputStream  s = new BufferedInputStream(new FileInputStream("foo"));

You have a buffered input stream based on the file "foo" that can mark() and reset().

Now you can begin to see the power of nesting streams. Any capability provided by a filter input stream (or output stream, as you'll see soon) can be used by any other, basic stream via nesting. Of course, any combination of these capabilities, and in any order, can be as easily accomplished by nesting the filter streams themselves.

DataInputStream

All the methods that instances of this class understand are defined in a separate interface, which both DataInputStream and RandomAccessFile (another class in java.io) implement. This interface is general-purpose enough that you might want to use it yourself in the classes you create. It is called DataInput.

The DataInput Interface

When you begin using streams to any degree, you'll quickly discover that byte streams are not a really helpful format into which to force all data. In particular, the primitive types of the Java language embody a rather nice way of looking at data, but with the streams you've been defining thus far in this book, you could not read data of these types. The DataInput interface specifies a higher-level set of methods that, when used for both reading and writing, can support a more complex, typed stream of data. Here are the set of methods this interface defines:

void  readFully(byte[]  buffer)                           throws IOException;
void  readFully(byte[]  buffer, int  offset, int  length) throws IOException;
int   skipBytes(int n)                                    throws IOException;
boolean  readBoolean()       throws IOException;
byte     readByte()          throws IOException;
int      readUnsignedByte()  throws IOException;
short    readShort()         throws IOException;
int      readUnsignedShort() throws IOException;
char     readChar()          throws IOException;
int      readInt()           throws IOException;
long     readLong()          throws IOException;
float    readFloat()         throws IOException;
double   readDouble()        throws IOException;
String   readLine()          throws IOException;
String   readUTF()           throws IOException;

The first three methods are close relatives of the two forms of read() and of skip() that you've seen previously (readFully() does not return until the whole buffer or slice has been filled). Each of the next ten methods reads in a primitive type, or its unsigned counterpart (useful for using every bit efficiently in a binary stream). The unsigned methods must return an integer of a wider size than you might think; because integers are signed in Java, the unsigned value does not fit in anything smaller. The final two methods read strings of characters from the stream: readLine() reads a line of text terminated by a newline ('\r', '\n', or "\r\n"), treating each byte as one character, and readUTF() reads a Unicode string stored in a modified UTF-8 format, preceded by its length.

Now that you know what the interface that DataInputStream implements looks like, let's see it in action:

DataInputStream  s = new DataInputStream(getNumericInputStream());
long  size = s.readLong();    // the number of items in the stream
while (size-- > 0) {
    if (s.readBoolean()) {    // should I process this item?
        int     anInteger     = s.readInt();
        int     magicBitFlags = s.readUnsignedShort();
        double  aDouble       = s.readDouble();
        if ((magicBitFlags & 0100000) != 0) {
            . . .    // high bit set, do something special
        }
        . . .    // process anInteger and aDouble
    }
}
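If you want to run a loop like this yourself, you need a stream in the matching format. The sketch below produces one in memory with DataOutputStream, the output counterpart of this class, and then decodes it. The class DataStreamSketch, the methods encode() and decode(), and the sample values are all assumptions made for the example.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.DataInputStream;
import java.io.DataOutputStream;
import java.io.IOException;

class DataStreamSketch {
    // Writes an item count, then one full item and one item flagged "skip".
    static byte[] encode() throws IOException {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        DataOutputStream out = new DataOutputStream(bytes);
        out.writeLong(2);              // the number of items in the stream
        out.writeBoolean(true);        // process this item
        out.writeInt(42);
        out.writeShort(0x8001);        // 16 bits with the high bit set
        out.writeDouble(2.5);
        out.writeBoolean(false);       // skip this item (no payload follows)
        out.flush();
        return bytes.toByteArray();
    }

    // Reads the items back in exactly the order they were written.
    static String decode(byte[] data) throws IOException {
        DataInputStream s = new DataInputStream(new ByteArrayInputStream(data));
        StringBuilder result = new StringBuilder();
        long size = s.readLong();
        while (size-- > 0) {
            if (s.readBoolean()) {
                int     anInteger     = s.readInt();
                int     magicBitFlags = s.readUnsignedShort();
                double  aDouble       = s.readDouble();
                boolean highBit       = (magicBitFlags & 0x8000) != 0;
                result.append(anInteger).append(' ').append(magicBitFlags).append(' ')
                      .append(highBit).append(' ').append(aDouble).append('\n');
            } else {
                result.append("skipped\n");
            }
        }
        return result.toString();
    }

    public static void main(String[] args) throws IOException {
        System.out.print(decode(encode()));
        // 42 32769 true 2.5
        // skipped
    }
}
```

Note that readUnsignedShort() returns 32769 as an int; the same 16 bits read with readShort() would come back as a negative number.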

Because the class implements an interface for all its methods, you can also declare the variable using the interface type:

DataInput  d = new DataInputStream(new FileInputStream("anything"));
String     line;
while ((line = d.readLine()) != null) {
    . . .     // process the line
}

The EOFException

One final point about most of DataInputStream's methods: when the end of the stream is reached, they throw an EOFException. This is tremendously useful and, in fact, allows you to rewrite all the kludgy uses of -1 you saw earlier today in a much nicer fashion:

DataInputStream  s = new DataInputStream(getAnInputStreamFromSomewhere());
try {
    while (true) {
        byte  b = (byte) s.readByte();
        . . .    // process the byte b
    }
} catch (EOFException e) {
    . . .    // reached end of stream
}

This works just as well for all but the last two of the read methods of DataInputStream.

Caution: skipBytes() does nothing at all on end of stream, readLine() returns null, and readUTF() might throw a UTFDataFormatException, if it notices the problem at all.
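The same pattern works for the wider types. As a minimal sketch (EofSketch and sumInts() are hypothetical names), the following method adds up every int in a stream and lets EOFException end the loop:

```java
import java.io.ByteArrayInputStream;
import java.io.DataInputStream;
import java.io.EOFException;
import java.io.IOException;
import java.io.InputStream;

class EofSketch {
    // Hypothetical helper: sums 4-byte big-endian ints until the stream ends.
    static int sumInts(InputStream in) throws IOException {
        DataInputStream s = new DataInputStream(in);
        int sum = 0;
        try {
            while (true) {
                sum += s.readInt();
            }
        } catch (EOFException e) {
            // reached end of stream; a trailing partial int is discarded
        }
        return sum;
    }

    public static void main(String[] args) throws IOException {
        byte[] threeInts = { 0, 0, 0, 1,   0, 0, 0, 2,   0, 0, 0, 3 };
        System.out.println(sumInts(new ByteArrayInputStream(threeInts)));   // 6
    }
}
```

Only EOFException is caught here; any other IOException still propagates to the caller, which keeps real I/O errors distinct from a normal end of stream.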

LineNumberInputStream

In an editor or a debugger, line numbering is crucial. To add this valuable capability to your programs, use the filter stream LineNumberInputStream, which keeps track of line numbers as its stream "flows through" it. It's even smart enough to remember a line number and later restore it, during a mark() and reset(). You might use this class as follows:

LineNumberInputStream  aLNIS;
aLNIS = new LineNumberInputStream(new FileInputStream("source"));
DataInputStream  s = new DataInputStream(aLNIS);
String           line;
while ((line = s.readLine()) != null) {
    . . .    // process the line
    System.out.println("Did line number: " + aLNIS.getLineNumber());
}

Here, two filter streams are nested around the FileInputStream actually providing the data—the first to read lines one at a time and the second to keep track of the line numbers of these lines as they go by. You must explicitly name the intermediate filter stream, aLNIS, because if you did not, you couldn't call getLineNumber() later. Note that if you invert the order of the nested streams, reading from the DataInputStream does not cause the LineNumberInputStream to "see" the lines.

You must put any filter streams acting as "monitors" in the middle of the chain and "pull" the data from the outermost filter stream so that the data will pass through each of the monitors in turn. In the same way, buffering should occur as far inside the chain as possible, because it won't be able to do its job properly unless most of the streams that need buffering come after it in the flow. For example, here's a silly order:

new BufferedInputStream(new LineNumberInputStream(
            new DataInputStream(new FileInputStream("foo"))));

and here's a much better order:

new DataInputStream(new LineNumberInputStream(
            new BufferedInputStream(new FileInputStream("foo"))));

LineNumberInputStreams can also be told to setLineNumber(), for those few times when you know more than they do.

PushbackInputStream

The filter stream class PushbackInputStream is commonly used in parsers, to "push back" a single character in the input (after reading it) while trying to determine what to do next—a simplified version of the mark() and reset() utility you learned about earlier. Its only addition to the standard set of InputStream methods is unread(), which as you might guess, pretends that it never read the byte passed in as its argument, and then gives that byte back as the return value of the next read().

The following is a simple implementation of readLine() using this class:

public class  SimpleLineReader {
    private FilterInputStream  s;
    public  SimpleLineReader(InputStream  anIS) {
        s = new DataInputStream(anIS);
    }
    . . .    // other read() methods using stream s
    public String  readLine() throws IOException {
        char[]  buffer = new char[100];
        int     offset = 0;
        int     thisByte;    // an int, so that -1 stays distinct from the byte 0xFF
loop:   while (offset < buffer.length) {
            switch (thisByte = s.read()) {
                case -1:     // end of stream
                    if (offset == 0)
                        return null;
                    break loop;
                case '\n':
                    break loop;
                case '\r':
                    int  nextByte = s.read();
                    if (nextByte != '\n' && nextByte != -1) {
                        if (!(s instanceof PushbackInputStream)) {
                            s = new PushbackInputStream(s);
                        }
                        ((PushbackInputStream) s).unread(nextByte);
                    }
                    break loop;
                default:
                    buffer[offset++] = (char) thisByte;
                    break;
            }
        }
        return String.copyValueOf(buffer, 0, offset);
    }
}

This demonstrates numerous things. For the purpose of this example, readLine() is restricted to reading the first 100 characters of the line. In this respect, it demonstrates how not to write a general-purpose line processor (you should be able to read any size line). It also reminds you how to break out of an outer loop, and how to produce a String from an array of characters (in this case, from a "slice" of the array of characters). This example also includes standard uses of InputStream's read() for reading bytes one at a time, and of determining the end of the stream by testing for the -1 that read() returns there. (The read() method that DataInputStream inherits returns -1 at the end of the stream; only its typed methods, such as readByte(), throw EOFException.)

One of the more unusual aspects of the example is the way PushbackInputStream is used. To be sure that '\n' is ignored following '\r' you have to "look ahead" one character; but if it is not a '\n', you must push back that character. Look at the next two lines as if you didn't know much about the stream s. The general technique used is instructive. First, you see whether s is already an instanceof some kind of PushbackInputStream. If so, you can simply use it. If not, you enclose the current stream (whatever it is) inside a new PushbackInputStream and use this new stream. Now, let's jump back into the context of the example.

The line following wants to call the method unread(). The problem is that s has a "compile-time type" of FilterInputStream, and thus doesn't understand that method. The previous two lines have guaranteed, however, that the run-time type of the stream in s is PushbackInputStream, so you can safely cast it to that type and then safely call unread().

Note: This example was done in an unusual way for demonstration purposes. You could have simply declared a PushbackInputStream variable and always enclosed the DataInputStream in it. (Conversely, SimpleLineReader's constructor could have checked whether its argument was already of the right class, the way PushbackInputStream did, before creating a new DataInputStream.) The interesting thing about this approach of "wrapping a class only when needed" is that it works for any InputStream that you hand it, and it does additional work only if it needs to. Both of these are good general design principles.
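The one-byte lookahead that unread() provides is easier to see in a smaller setting. In the following sketch, a hypothetical tokenizer step named readNumber() (in an equally hypothetical class PushbackSketch) reads a run of digits and pushes back the first byte that does not belong to the number:

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.PushbackInputStream;

class PushbackSketch {
    // Hypothetical tokenizer step: reads a run of ASCII digits and pushes
    // back the first non-digit so that the next reader still sees it.
    static int readNumber(PushbackInputStream s) throws IOException {
        int value = 0;
        int c;
        while ((c = s.read()) != -1) {
            if (c < '0' || c > '9') {
                s.unread(c);           // not ours: pretend it was never read
                break;
            }
            value = value * 10 + (c - '0');
        }
        return value;
    }

    public static void main(String[] args) throws IOException {
        PushbackInputStream s = new PushbackInputStream(
            new ByteArrayInputStream("123+45".getBytes("US-ASCII")));
        System.out.println(readNumber(s));       // 123
        System.out.println((char) s.read());     // +
        System.out.println(readNumber(s));       // 45
    }
}
```

Here the variable is declared as a PushbackInputStream from the start, so no cast is needed before calling unread().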

All the subclasses of FilterInputStream have now been described. It's time to return to the direct subclasses of InputStream.

PipedInputStream

This class, along with its "brother" class PipedOutputStream, is covered later today (they need to be understood and demonstrated together). For now, all you need to know is that together they create a simple, one-way communication conduit between threads: bytes written to the output stream are read from the input stream.

SequenceInputStream

Suppose you have two separate streams, and you would like to make a composite stream that consists of one stream followed by the other (like appending two Strings together). This is exactly what SequenceInputStream was created for:

InputStream  s1 = new FileInputStream("theFirstPart");
InputStream  s2 = new FileInputStream("theRest");
InputStream  s  = new SequenceInputStream(s1, s2);
... s.read() ...   // reads from each stream in turn

You could have "faked" this example by reading each file in turn—but what if you had to hand the composite stream s to some other method that was expecting only a single InputStream? Here's an example (using s) that line-numbers the two previous files with a common numbering scheme:

LineNumberInputStream  aLNIS = new LineNumberInputStream(s);
... aLNIS.getLineNumber() ...

Note: Stringing together streams this way is especially useful when the streams are of unknown length and origin, and were just handed to you by someone else.

What if you want to string together more than two streams? You could try the following:

Vector  v = new Vector();
. . .   // set up all the streams and add each to the Vector
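// elementAt() returns an Object, so each element must be cast back to InputStream
InputStream  s1 = new SequenceInputStream((InputStream) v.elementAt(0), (InputStream) v.elementAt(1));
InputStream  s2 = new SequenceInputStream(s1, (InputStream) v.elementAt(2));
InputStream  s3 = new SequenceInputStream(s2, (InputStream) v.elementAt(3));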
. . .

Note: A Vector is a growable array of objects that can be filled, referenced (with elementAt()) and enumerated.

However, it's much easier to use a different constructor that SequenceInputStream provides:

InputStream  s  = new SequenceInputStream(v.elements());

It takes an enumeration of all the streams you wish to combine and returns a single stream that reads through the data of each in turn.
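
To see the enumeration-based constructor at work without touching the file system, here is a minimal sketch that strings together several in-memory streams. The class name SequenceDemo and its concatenate() method are made up for this illustration, and the typed Vector uses language features newer than the release this chapter describes.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.SequenceInputStream;
import java.util.Vector;

class SequenceDemo {
    // Wraps each part in its own stream, joins them all, and reads the composite to its end.
    static String concatenate(String... parts) {
        Vector<InputStream> v = new Vector<InputStream>();
        for (String part : parts) {
            v.addElement(new ByteArrayInputStream(part.getBytes()));
        }
        InputStream  s = new SequenceInputStream(v.elements());
        StringBuffer result = new StringBuffer();
        try {
            int  byteOrMinus1;
            while ((byteOrMinus1 = s.read()) != -1) {   // -1 only after the LAST stream ends
                result.append((char) byteOrMinus1);
            }
            s.close();
        } catch (IOException e) {
            throw new IllegalStateException("in-memory streams should not fail", e);
        }
        return result.toString();
    }

    public static void main(String[] argv) {
        System.out.println(concatenate("theFirstPart, ", "theMiddle, ", "theRest"));
    }
}
```

The reading loop never learns where one stream stops and the next begins, which is the whole point of the composite.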

StringBufferInputStream

StringBufferInputStream is exactly like ByteArrayInputStream, but instead of being based on a byte array, it's based on an array of characters (a String):

String       buffer = "Now is the time for all good men to come...";
InputStream  s      = new StringBufferInputStream(buffer);

All comments that were made about ByteArrayInputStream apply here as well. (See the earlier section on that class.)

Note: StringBufferInputStream is a bit of a misnomer, because this input stream is actually based on a String. It should really be called StringInputStream.

Output Streams

Output streams are, in almost every case, paired with a "brother" InputStream that you've already learned. If an InputStream performs a certain operation, the "brother" OutputStream performs the inverse operation. You'll see more of what this means soon.

The abstract Class OutputStream

OutputStream is the abstract class that defines the fundamental ways in which a source (producer) writes a stream of bytes to some destination. The identity of the destination, and the manner of the transport and storage of the bytes, is irrelevant. When using an output stream, you are the source of those bytes, and that's all you need to know.

write()

The most important method to the producer of an output stream is the one that writes bytes to the destination. This method, write(), comes in many flavors, each demonstrated in an example below.

Note: Every one of these write() methods is defined to "block" (wait) until all the output requested has been written. You don't need to worry about this limitation—see the note under InputStream's read() method if you don't remember why.

OutputStream  s      = getAnOutputStreamFromSomewhere();
byte[]        buffer = new byte[1024];    // any size will do
fillInData(buffer);    // the data we want to output
s.write(buffer);

You also can write a "slice" of your buffer by specifying the offset into the buffer, and the length desired, as arguments to write():

s.write(buffer, 100, 300);

This writes out bytes 100 through 399 and behaves otherwise exactly the same as the previous write() method. In fact, in the current release, the default implementation of the former version of write() uses the latter:

public void  write(byte[]  buffer) throws IOException {
    write(buffer, 0, buffer.length);
}

Finally, you can write out bytes one at a time:

while (thereAreMoreBytesToOutput()) {
    byte  b = getNextByteForOutput();
    s.write(b);
}

flush()

Because you don't know what an output stream is connected to, you might be required to "flush" your output through some buffered cache to get it to be written (in a timely manner, or at all). OutputStream's version of this method does nothing, but it is expected that subclasses that require flushing (for example, BufferedOutputStream and PrintStream) will override this version to do something nontrivial.

close()

Just like for an InputStream, you should (usually) explicitly close down an OutputStream so that it can release any resources it may have reserved on your behalf. (All the same notes and examples from InputStream's close() method apply here, with the prefix In replaced everywhere by Out.)

All output streams descend from the abstract class OutputStream. All share the previous few methods in common.

ByteArrayOutputStream

The inverse of ByteArrayInputStream, which creates an input stream from an array of bytes, is ByteArrayOutputStream, which directs an output stream into an array of bytes:

OutputStream  s = new ByteArrayOutputStream();
s.write(123);
. . .

The size of the (internal) byte array grows as needed to store a stream of any length. You can provide an initial capacity as an aid to the class, if you like:

OutputStream  s = new ByteArrayOutputStream(1024 * 1024);  // 1 Megabyte

Note: You've just seen your first examples of the creation of an output stream. These new streams were attached to the simplest of all possible destinations of data, an array of bytes in the memory of the local computer.

Once the ByteArrayOutputStream s has been "filled," it can be output to another output stream:

OutputStream           anotherOutputStream = getTheOtherOutputStream(); 
ByteArrayOutputStream  s = new ByteArrayOutputStream();
fillWithUsefulData(s);
s.writeTo(anotherOutputStream);

It also can be extracted as a byte array or converted to a String:

byte[]  buffer              = s.toByteArray();
String  bufferString        = s.toString();
String  bufferUnicodeString = s.toString(upperByteValue);

Note: The last method allows you to "fake" Unicode (16-bit) characters by filling in their lower bytes with ASCII and then specifying a common upper byte (usually 0) to create a Unicode String result.

ByteArrayOutputStreams have two utility methods: one simply returns the current number of bytes stored in the internal byte array, and the other resets the array so that the stream can be rewritten from the beginning:

int  sizeOfMyByteArray = s.size();
s.reset();     // s.size() would now return 0
s.write(123);
. . .
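
The fragments above can be pulled together into one small, runnable sketch. It uses all three flavors of write(), then size(), writeTo(), and reset(). The class name ByteArraySinkDemo and the method exercise() are hypothetical names chosen for this example only.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;

class ByteArraySinkDemo {
    // Returns { size after the writes, size of the copy made by writeTo(), size after reset() }.
    static int[] exercise() {
        ByteArrayOutputStream  s     = new ByteArrayOutputStream();
        ByteArrayOutputStream  other = new ByteArrayOutputStream();
        byte[]                 buffer = new byte[1024];
        for (int i = 0; i < buffer.length; ++i) {
            buffer[i] = (byte) i;            // some recognizable data
        }
        try {
            s.write(buffer);                 // the whole buffer: 1024 bytes
            s.write(buffer, 100, 300);       // a slice: bytes 100 through 399
            s.write(123);                    // a single byte
            int  sizeAfterWrites = s.size(); // 1024 + 300 + 1
            s.writeTo(other);                // copy everything to another stream
            s.reset();                       // start over from the beginning
            return new int[] { sizeAfterWrites, other.size(), s.size() };
        } catch (IOException e) {
            throw new IllegalStateException("in-memory streams should not fail", e);
        }
    }

    public static void main(String[] argv) {
        int[]  sizes = exercise();
        System.out.println(sizes[0] + " " + sizes[1] + " " + sizes[2]);
    }
}
```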

FileOutputStream

One of the most common uses of streams is to attach them to files in the file system. Here, for example, is the creation of such an output stream on a UNIX system:

OutputStream  s = new FileOutputStream("/some/path/and/fileName");

Caution: Applets attempting to open, read, or write streams based on files in the file system can cause security violations. See the note under FileInputStream for more details.

You also can create the stream from a previously opened file descriptor:

FileDescriptor  fd = openOutputFileInTraditionalUNIXWays();
OutputStream    s  = new FileOutputStream(fd);

FileOutputStream is the inverse of FileInputStream, and it knows the same tricks:

FileOutputStream  aFOS = new FileOutputStream("aFileName");
FileDescriptor  myFD = aFOS.getFD();
/* aFOS.finalize(); */  // will call close() when automatically called by GC

Note: To call the new methods, you must declare the stream variable aFOS to be of type FileOutputStream, because plain OutputStreams don't know about them.

The first is obvious: getFD() simply returns the file descriptor for the file on which the stream is based. The second, commented, contrived call to finalize() is there to remind you that you may not have to worry about closing this type of stream—it is done for you automatically. (See the discussion under FileInputStream for more.)

FilterOutputStream

This "abstract" class simply provides a "pass-through" for all the standard methods of OutputStream. It holds inside itself another stream, by definition one further "down" the chain of filters, to which it forwards all method calls. It implements nothing new but allows itself to be nested:

OutputStream        s  = getAnOutputStreamFromSomewhere();
FilterOutputStream  s1 = new FilterOutputStream(s);
FilterOutputStream  s2 = new FilterOutputStream(s1);
FilterOutputStream  s3 = new FilterOutputStream(s2);
... s3.write(123) ...

Whenever a write is performed on the filtered stream s3, it passes along the request to s2. Then s2 does the same to s1, and finally s is asked to output the bytes. Subclasses of FilterOutputStream, of course, do some nontrivial processing of the bytes as they flow past. This chain can be tightly nested; see its "brother" class, FilterInputStream, for more.
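
To make "nontrivial processing" concrete, here is a minimal sketch of a filter you might write yourself. UpperCaseOutputStream is an invented class, not part of java.io. It overrides only the single-byte write() and relies on FilterOutputStream routing its array forms of write() through that method.

```java
import java.io.ByteArrayOutputStream;
import java.io.FilterOutputStream;
import java.io.IOException;
import java.io.OutputStream;

class UpperCaseOutputStream extends FilterOutputStream {
    UpperCaseOutputStream(OutputStream out) {
        super(out);    // "out" is the next stream down the chain
    }

    // Converts ASCII lowercase letters to uppercase and passes everything else through.
    public void write(int b) throws IOException {
        int  low = b & 0xFF;    // only the low 8 bits are meaningful
        if (low >= 'a' && low <= 'z') {
            low = low - 'a' + 'A';
        }
        out.write(low);
    }

    // Helper for the demonstration: pushes text through the filter into a byte array.
    static String shout(String text) {
        ByteArrayOutputStream  sink = new ByteArrayOutputStream();
        OutputStream           s    = new UpperCaseOutputStream(sink);
        try {
            s.write(text.getBytes());
            s.close();
        } catch (IOException e) {
            throw new IllegalStateException("in-memory streams should not fail", e);
        }
        return sink.toString();
    }

    public static void main(String[] argv) {
        System.out.println(shout("streams are filters, too"));
    }
}
```

Because the filter knows nothing about its destination, the same class would work unchanged around a FileOutputStream or any other output stream.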

Now let's examine each of the subclasses of FilterOutputStream in turn.

BufferedOutputStream

BufferedOutputStream is one of the most valuable of all streams. All it does is implement the full complement of OutputStream's methods, but it does so by using a buffered array of bytes that acts as a cache for writing. This decouples the rate and the size of the "chunks" you're writing from the more regular, larger block sizes in which streams are most efficiently written (to peripheral devices, files in the file system, or the network, for example).

BufferedOutputStream is one of two classes in the Java library to implement flush(), which pushes the bytes you've written through the buffer and out the other side. Because buffering is so valuable, you might wish that every output stream could somehow be buffered. Fortunately, you can surround any output stream in such a way as to achieve just that:

OutputStream  s = new BufferedOutputStream(new FileOutputStream("foo"));

You now have a buffered output stream based on the file "foo" that can be flush()ed.
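
You can watch the cache at work by buffering a stream whose contents you can inspect. In this sketch (BufferedFlushDemo is a made-up name) the destination is a ByteArrayOutputStream, and it assumes the default buffer is much larger than the three bytes written.

```java
import java.io.BufferedOutputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.OutputStream;

class BufferedFlushDemo {
    // Returns { bytes at the destination before flush(), bytes at the destination after flush() }.
    static int[] sizesAroundFlush() {
        ByteArrayOutputStream  sink = new ByteArrayOutputStream();
        OutputStream           s    = new BufferedOutputStream(sink);
        try {
            s.write(1);
            s.write(2);
            s.write(3);
            int  before = sink.size();   // the three bytes are still sitting in the buffer
            s.flush();                   // push them through to the destination
            int  after  = sink.size();
            s.close();
            return new int[] { before, after };
        } catch (IOException e) {
            throw new IllegalStateException("in-memory streams should not fail", e);
        }
    }

    public static void main(String[] argv) {
        int[]  sizes = sizesAroundFlush();
        System.out.println("before flush: " + sizes[0] + ", after flush: " + sizes[1]);
    }
}
```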

Just as for filter input streams, any capability provided by a filter output stream can be used by any other basic stream via nesting, and any combination of these capabilities, in any order, can be accomplished just as easily by nesting the filter streams themselves.

DataOutputStream

All the methods that instances of this class understand are defined in a separate interface, which both DataOutputStream and RandomAccessFile implement. This interface is general-purpose enough that you might want to use it yourself in the classes you create. It is called DataOutput.

The DataOutput Interface

In cooperation with its "brother" inverse interface, DataInput, DataOutput provides a higher-level, typed-stream approach to the reading and writing of data. Rather than dealing with bytes, this interface deals with writing the primitive types of the Java language directly:

void  write(int i)                                    throws IOException;
void  write(byte[]  buffer)                           throws IOException;
void  write(byte[]  buffer, int  offset, int  length) throws IOException;
void  writeBoolean(boolean b) throws IOException;
void  writeByte(int i)        throws IOException;
void  writeShort(int i)       throws IOException;
void  writeChar(int i)        throws IOException;
void  writeInt(int i)         throws IOException;
void  writeLong(long l)       throws IOException;
void  writeFloat(float f)     throws IOException;
void  writeDouble(double d)   throws IOException;
void  writeBytes(String s) throws IOException;
void  writeChars(String s) throws IOException;
void  writeUTF(String s)   throws IOException;

Most of these methods have counterparts in the interface DataInput.

The first three methods mirror the three forms of write() you saw previously. Each of the next eight methods writes out a primitive type. The final three methods write out a string of bytes or characters to the stream: the first one as 8-bit bytes; the second, as 16-bit Unicode characters; and the last, in Java's modified UTF-8 encoding of Unicode (readable by DataInput's readUTF()).

Note: The unsigned read methods in DataInput have no counterparts here. You can write out the data they need via DataOutput's signed methods because they accept int arguments and also because they write out the correct number of bits for the unsigned integer of a given size as a side effect of writing out the signed integer of that same size. It is the method that reads this integer that must interpret the sign bit correctly; the writer's job is easy.
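
A short sketch may make the note above easier to believe. The same byte is written twice with writeByte() and then read back once as signed and once as unsigned. UnsignedByteDemo and roundTrip() are names invented for the illustration.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.DataInputStream;
import java.io.DataOutputStream;
import java.io.IOException;

class UnsignedByteDemo {
    // Writes value twice as a single byte; returns { readByte() result, readUnsignedByte() result }.
    static int[] roundTrip(int value) {
        ByteArrayOutputStream  sink = new ByteArrayOutputStream();
        DataOutputStream       out  = new DataOutputStream(sink);
        try {
            out.writeByte(value);    // the writer never cares about the sign
            out.writeByte(value);
            out.close();

            DataInputStream  in = new DataInputStream(
                                      new ByteArrayInputStream(sink.toByteArray()));
            int  signed   = in.readByte();           // the reader interprets the top bit as a sign
            int  unsigned = in.readUnsignedByte();   // the reader treats all 8 bits as magnitude
            in.close();
            return new int[] { signed, unsigned };
        } catch (IOException e) {
            throw new IllegalStateException("in-memory streams should not fail", e);
        }
    }

    public static void main(String[] argv) {
        int[]  r = roundTrip(200);
        System.out.println(r[0] + " " + r[1]);
    }
}
```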

Now that you know what the interface that DataOutputStream implements looks like, let's see it in action:

DataOutputStream  s    = new DataOutputStream(getNumericOutputStream());
long              size = getNumberOfItemsInNumericStream();
s.writeLong(size);
for (int  i = 0;  i < size;  ++i) {
    if (shouldProcessNumber(i)) {
        s.writeBoolean(true);     // should process this item
        s.writeInt(theIntegerForItemNumber(i));
        s.writeShort(theMagicBitFlagsForItemNumber(i));
        s.writeDouble(theDoubleForItemNumber(i));
    } else
        s.writeBoolean(false);
}

This is the exact inverse of the example that was given for DataInput. Together, they form a pair that can communicate a particular array of structured primitive types across any stream (or "transport layer"). Use this pair as a jumping-off point whenever you need to do something similar.

In addition to the interface above, the class itself implements one (self-explanatory) utility method:

int  theNumberOfBytesWrittenSoFar = s.size();
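
Here is one way the writer and a matching reader might be exercised together, entirely in memory. Everything specific in it is an assumption made for the sketch: the class name NumericRecordDemo, the rule that only non-negative values are "processed," and the stand-in values used for the flags and the double.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.DataInputStream;
import java.io.DataOutputStream;
import java.io.IOException;

class NumericRecordDemo {
    // Writes a count followed by one record per item, in the same layout as the example above.
    static byte[] encode(int[] values) {
        ByteArrayOutputStream  sink = new ByteArrayOutputStream();
        DataOutputStream       s    = new DataOutputStream(sink);
        try {
            s.writeLong(values.length);
            for (int i = 0; i < values.length; ++i) {
                if (values[i] >= 0) {               // stand-in for shouldProcessNumber(i)
                    s.writeBoolean(true);
                    s.writeInt(values[i]);
                    s.writeShort(i);                // stand-in for the magic bit flags
                    s.writeDouble(values[i] / 2.0);
                } else
                    s.writeBoolean(false);
            }
            s.close();
        } catch (IOException e) {
            throw new IllegalStateException("in-memory streams should not fail", e);
        }
        return sink.toByteArray();
    }

    // Reads the same layout back and adds up the ints of the processed items.
    static long sumProcessed(byte[] data) {
        DataInputStream  s   = new DataInputStream(new ByteArrayInputStream(data));
        long             sum = 0;
        try {
            long  size = s.readLong();
            for (long i = 0; i < size; ++i) {
                if (s.readBoolean()) {
                    sum += s.readInt();
                    s.readShort();     // the flags are skipped in this sketch
                    s.readDouble();    // so is the double
                }
            }
            s.close();
        } catch (IOException e) {
            throw new IllegalStateException("in-memory streams should not fail", e);
        }
        return sum;
    }

    public static void main(String[] argv) {
        byte[]  data = encode(new int[] { 10, -1, 30 });
        System.out.println(data.length + " bytes, sum " + sumProcessed(data));
    }
}
```

Each processed item costs 15 bytes (1 + 4 + 2 + 8) and each skipped item 1 byte, on top of the 8-byte count, which is also what size() would have reported to the writer.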

Processing a File

One of the most common idioms in file I/O is to open a file, read and process it line-by-line, and output it again to another file. Here's a prototypical example of how that would be done in Java:

// declared as the stream classes: the DataInput/DataOutput interfaces have no close()
DataInputStream   aDI = new DataInputStream(new FileInputStream("source"));
DataOutputStream  aDO = new DataOutputStream(new FileOutputStream("dest"));
String            line;
while ((line = aDI.readLine()) != null) {
    StringBuffer  modifiedLine = new StringBuffer(line);
    . . .      // process modifiedLine in place
    modifiedLine.append('\n');   // readLine() strips the line terminator
    aDO.writeBytes(modifiedLine.toString());
}
aDI.close();
aDO.close();

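If you want to process it byte-by-byte, use this (the loop ends when readByte() throws an EOFException at the end of the input, and the finally clause closes both streams before that exception reaches the caller):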

try {
    while (true) {
        byte  b = (byte) aDI.readByte();
        . . .      // process b in place
        aDO.writeByte(b);
    }
} finally {
    aDI.close();
    aDO.close();
}

Here's a cute two-liner that just copies the file:

try { while (true) aDO.writeByte(aDI.readByte()); }
finally { aDI.close(); aDO.close(); }

Caution: Many of the examples in today's lesson (and the last two) assume that they appear inside a method that has IOException in its throws clause, so they don't have to "worry" about catching those exceptions and handling them more reasonably. Your code should be a little less cavalier.
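
As one possible reading of "less cavalier," the following sketch treats EOFException as the normal end of the copy, still closes both streams in a finally clause, and lets any other IOException reach the caller. The class CarefulCopy, its methods, and the temporary files used by roundTrip() are all invented for the demonstration.

```java
import java.io.BufferedInputStream;
import java.io.BufferedOutputStream;
import java.io.ByteArrayOutputStream;
import java.io.DataInputStream;
import java.io.DataOutputStream;
import java.io.EOFException;
import java.io.File;
import java.io.FileInputStream;
import java.io.FileOutputStream;
import java.io.IOException;

class CarefulCopy {
    // Copies source to dest byte-by-byte and returns the number of bytes copied.
    static long copy(String source, String dest) throws IOException {
        DataInputStream   aDI = new DataInputStream(
                                    new BufferedInputStream(new FileInputStream(source)));
        DataOutputStream  aDO = null;
        long              count = 0;
        try {
            aDO = new DataOutputStream(
                      new BufferedOutputStream(new FileOutputStream(dest)));
            while (true) {
                aDO.writeByte(aDI.readByte());
                ++count;
            }
        } catch (EOFException endOfInput) {
            // the expected way out of the loop; real errors still propagate
        } finally {
            aDI.close();
            if (aDO != null) aDO.close();   // close() also flushes the buffer
        }
        return count;
    }

    // Demonstration helper: writes contents to a temporary file, copies it, reads the copy back.
    static byte[] roundTrip(byte[] contents) {
        try {
            File  source = File.createTempFile("source", ".bin");
            File  dest   = File.createTempFile("dest", ".bin");
            source.deleteOnExit();
            dest.deleteOnExit();
            FileOutputStream  fill = new FileOutputStream(source);
            fill.write(contents);
            fill.close();
            copy(source.getPath(), dest.getPath());
            ByteArrayOutputStream  result = new ByteArrayOutputStream();
            FileInputStream        check  = new FileInputStream(dest);
            int  byteOrMinus1;
            while ((byteOrMinus1 = check.read()) != -1) result.write(byteOrMinus1);
            check.close();
            return result.toByteArray();
        } catch (IOException e) {
            throw new IllegalStateException("temporary file I/O failed", e);
        }
    }

    public static void main(String[] argv) {
        System.out.println(roundTrip("copied".getBytes()).length + " bytes copied");
    }
}
```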

PrintStream

You may not realize it, but you're already intimately familiar with the use of two methods of the PrintStream class. That's because whenever you use these method calls:

System.out.print(. . .)
System.out.println(. . .)

you are actually using a PrintStream instance located in System's class variable out to perform the output. System.err is also a PrintStream, and System.in is an InputStream.

Note: On UNIX systems, these three streams will be attached to standard output, standard error, and standard input, respectively.

PrintStream is uniquely an output stream class (it has no "brother"). Because it is usually attached to a screen output device of some kind, it provides an implementation of flush(). It also provides the familiar close() and write() methods, as well as a plethora of choices for outputting the primitive types and Strings of Java:

public void  write(int b);
public void  write(byte[]  buffer, int  offset, int  length);
public void  flush();
public void  close();
public void  print(Object o);
public void  print(String s);
public void  print(char[]  buffer);
public void  print(char c);
public void  print(int i);
public void  print(long l);
public void  print(float f);
public void  print(double d);
public void  print(boolean b);
public void  println(Object o);
public void  println(String s);
public void  println(char[]  buffer);
public void  println(char c);
public void  println(int i);
public void  println(long l);
public void  println(float f);
public void  println(double d);
public void  println(boolean b);
public void  println();   // output a blank line

PrintStream can also be wrapped around any output stream, just like a filter class:

PrintStream  s = new PrintStream(new FileOutputStream("foo"));
s.println("Here's the first line of text in the file foo.");

If you provide a second argument to the constructor for PrintStream, it is a boolean that specifies whether the stream should auto-flush. If true, a flush() is sent whenever a newline is written, one of the println() methods is called, or (for the array forms of write()) a whole group of bytes has been written.
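
The effect of that second argument is easiest to see when a buffer sits between the PrintStream and a destination you can inspect. This is a sketch under that assumption; AutoFlushDemo is a hypothetical name, and the behavior shown is that of current JDKs, where println() triggers the automatic flush.

```java
import java.io.BufferedOutputStream;
import java.io.ByteArrayOutputStream;
import java.io.PrintStream;

class AutoFlushDemo {
    // Returns how many bytes have reached the destination right after println(),
    // before anyone has called flush() or close() explicitly.
    static int bytesVisibleAfterPrintln(boolean autoFlush) {
        ByteArrayOutputStream  sink = new ByteArrayOutputStream();
        PrintStream            s    = new PrintStream(new BufferedOutputStream(sink), autoFlush);
        s.println("hi");
        int  visible = sink.size();
        s.close();
        return visible;
    }

    public static void main(String[] argv) {
        System.out.println("without auto-flush: " + bytesVisibleAfterPrintln(false));
        System.out.println("with auto-flush:    " + bytesVisibleAfterPrintln(true));
    }
}
```

Without auto-flush, the line stays in the BufferedOutputStream until something flushes it; with auto-flush, it arrives as soon as println() returns.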

Here's a simple example program that operates like the UNIX command cat, taking the standard input, line-by-line, and outputting it to the standard output:

import java.io.*;   // the one time in the chapter we'll say this
public class  Cat {
    public static void  main(String argv[]) {
        DataInput  d = new DataInputStream(System.in);
        String     line;
        try {
            while ((line = d.readLine()) != null)
                System.out.println(line);
        } catch (IOException  ignored) { }
    }
}

PipedOutputStream

Along with PipedInputStream, this pair of classes supports a UNIX-pipe-like connection between two threads, implementing all the careful synchronization that allows this sort of "shared queue" to operate safely. To set up the connection:

PipedInputStream   sIn  = new PipedInputStream();
PipedOutputStream  sOut = new PipedOutputStream(sIn);

One thread writes to sOut, and the other reads from sIn. By setting up two such pairs, the threads can communicate safely in both directions.
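
A minimal sketch of one such pair in use follows. The class PipeDemo and its method are invented for the example. The producer thread closes sOut when it is done, which is how the reading thread comes to see the end of the stream.

```java
import java.io.IOException;
import java.io.PipedInputStream;
import java.io.PipedOutputStream;

class PipeDemo {
    // Sends message through a pipe from a second thread and returns what this thread reads.
    static String sendThroughPipe(final String message) {
        try {
            final PipedInputStream   sIn  = new PipedInputStream();
            final PipedOutputStream  sOut = new PipedOutputStream(sIn);

            Thread  producer = new Thread(new Runnable() {
                public void run() {
                    try {
                        sOut.write(message.getBytes());
                        sOut.close();    // signals end of stream to the reader
                    } catch (IOException ignored) { }
                }
            });
            producer.start();

            StringBuffer  received = new StringBuffer();
            int           byteOrMinus1;
            while ((byteOrMinus1 = sIn.read()) != -1) {   // blocks until data or end of stream
                received.append((char) byteOrMinus1);
            }
            sIn.close();
            producer.join();
            return received.toString();
        } catch (IOException e) {
            throw new IllegalStateException("pipe failed", e);
        } catch (InterruptedException e) {
            throw new IllegalStateException("interrupted while waiting for producer", e);
        }
    }

    public static void main(String[] argv) {
        System.out.println(sendThroughPipe("hello from the other thread"));
    }
}
```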

A RandomAccessFile is created given a file, a filename, or a file descriptor. It combines in one class implementations of the DataInput and DataOutput interfaces, both tuned for "random access" to a file in the file system. In addition to these interfaces, RandomAccessFile provides certain traditional UNIX-like facilities, such as seek()ing to a random point in the file.

Finally, the StreamTokenizer class takes an input stream and produces a sequence of tokens. By overriding its various methods in your own subclasses, you can create powerful lexical parsers.

You can learn more about any and all of these other classes from the full (online) API descriptions in your Java release.

Summary

Today, you learned about the general idea of streams and met input streams based on byte arrays, files, pipes, sequences of other streams, and string buffers, as well as input filters for buffering, typed data, line numbering, and pushing-back characters.

You also met the analogous "brother" output streams for byte arrays, files, and pipes, and output filters for buffering and typed data, and the unique output filter used for printing.

Along the way, you became familiar with the fundamental methods all streams understand (such as read() and write()), as well as the unique methods many streams add to this repertoire. You learned about catching IOExceptions—especially the most useful of them, EOFException.

Finally, the twice-useful DataInput and DataOutput interfaces formed the heart of RandomAccessFile, one of the several utility classes that round out Java's I/O facilities.

Java streams provide a powerful base on which you can build multithreaded, streaming interfaces of the most complex kinds, and the programs (such as HotJava) to interpret them. The higher-level Internet protocols and services of the future that your applets can build upon this base are really limited only by your imagination.

Q&A

Q: In an early read() example, you did something with the variable byteOrMinus1 that seemed a little clumsy. Isn't there a better way? If not, why recommend the cast later?

A: Yes, there is something a little odd about those statements. You might be tempted to try something like this instead:

while ((b = (byte) s.read()) != -1) {
    . . .    // process the byte b
}

The problem with this short-cut occurs when read() returns the value 0xFF (0377). Because the cast to byte sign-extends this value before the test gets executed, it will appear to be identical to the integer value -1 that indicates end of stream. Only saving that value in a separate integer variable, and then casting it later, will accomplish the desired result. The cast to byte is recommended in the note for orthogonal reasons: storing integer values in correctly sized variables is always good style (and besides, read() really should be returning something of byte size here and throwing an exception for end of stream).
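
If you would like to see the failure for yourself, this sketch runs both loops over the same three bytes, the middle one being 0xFF. EndOfStreamDemo is a made-up class, and the streams are declared as ByteArrayInputStream only because that class's read() throws no IOException, which keeps the example short.

```java
import java.io.ByteArrayInputStream;

class EndOfStreamDemo {
    // The tempting short-cut: the cast happens BEFORE the test against -1.
    static int countWithCastInTest(byte[] data) {
        ByteArrayInputStream  s = new ByteArrayInputStream(data);
        int   count = 0;
        byte  b;
        while ((b = (byte) s.read()) != -1) {   // a data byte of 0xFF also equals -1 here
            ++count;
        }
        return count;
    }

    // The recommended form: test the int first, cast afterward.
    static int countWithSeparateInt(byte[] data) {
        ByteArrayInputStream  s = new ByteArrayInputStream(data);
        int  count = 0;
        int  byteOrMinus1;
        while ((byteOrMinus1 = s.read()) != -1) {
            byte  b = (byte) byteOrMinus1;      // safe: end of stream was already ruled out
            ++count;
        }
        return count;
    }

    public static void main(String[] argv) {
        byte[]  data = { 0x41, (byte) 0xFF, 0x42 };
        System.out.println("short-cut saw " + countWithCastInTest(data) + " byte(s)");
        System.out.println("correct loop saw " + countWithSeparateInt(data) + " byte(s)");
    }
}
```

The short-cut stops at the 0xFF byte and silently drops the rest of the stream.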

Q: What input streams in java.io actually implement mark(), reset(), and markSupported()?

A: InputStream itself does—and in their default implementations, markSupported() returns false, mark() does nothing, and reset() throws an exception. The only input stream in the current release that correctly supports marking is BufferedInputStream, which overrides these defaults. LineNumberInputStream actually implements mark() and reset(), but in the current release, it doesn't answer markSupported() correctly, so it looks as if it does not.
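
A small sketch shows both halves of that answer: a bare InputStream subclass reporting that it cannot mark, and a BufferedInputStream rereading bytes after reset(). MarkDemo and its two methods are hypothetical names for this illustration.

```java
import java.io.BufferedInputStream;
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;

class MarkDemo {
    // An InputStream that overrides nothing but read() inherits the default answer.
    static boolean defaultSupportsMark() {
        InputStream  plain = new InputStream() {
            public int read() { return -1; }   // an always-empty stream
        };
        return plain.markSupported();
    }

    // Marks the start, reads n bytes, resets, and reads the same n bytes again.
    // Assumes text holds at least n bytes.
    static String rereadPrefix(String text, int n) {
        InputStream  s = new BufferedInputStream(
                             new ByteArrayInputStream(text.getBytes()));
        StringBuffer seen = new StringBuffer();
        try {
            s.mark(n);                         // remember this spot for up to n bytes
            for (int i = 0; i < n; ++i) seen.append((char) s.read());
            s.reset();                         // jump back to the mark
            for (int i = 0; i < n; ++i) seen.append((char) s.read());
            s.close();
        } catch (IOException e) {
            throw new IllegalStateException("in-memory streams should not fail", e);
        }
        return seen.toString();
    }

    public static void main(String[] argv) {
        System.out.println(defaultSupportsMark());
        System.out.println(rereadPrefix("streams", 3));
    }
}
```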

Q: Why is available() useful, if it sometimes gives the wrong answer?

A: First, for many streams, it gives the right answer. Second, for some network streams, its implementation might be sending a special query to discover some information you couldn't get any other way (for example, the size of a file being transferred by ftp). If you are displaying a "progress bar" for network or file transfers, for example, available() will often give you the total size of the transfer, and when it does not (usually by returning 0), it will be obvious to you (and your users).

Q: What's a good example use of the DataInput/DataOutput pair of interfaces?

A: One common use of such a pair is when objects want to "pickle" themselves for storage or movement over a network. Each object implements read and write methods using these interfaces, effectively converting itself to a stream that can later be reconstituted "on the other end" into a copy of the original object.
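
As a rough sketch of what such "pickling" might look like, here is an invented class, PickledPoint, with a write method that takes any DataOutput and a read method that takes any DataInput. The field layout and the copyViaStream() helper are assumptions made for the example, not a standard protocol.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.DataInput;
import java.io.DataInputStream;
import java.io.DataOutput;
import java.io.DataOutputStream;
import java.io.IOException;

class PickledPoint {
    int     x;
    int     y;
    String  label;

    PickledPoint(int x, int y, String label) {
        this.x = x;
        this.y = y;
        this.label = label;
    }

    // Converts this object to a stream of typed values.
    void write(DataOutput out) throws IOException {
        out.writeInt(x);
        out.writeInt(y);
        out.writeUTF(label);
    }

    // Reconstitutes a copy by reading the values in exactly the order they were written.
    static PickledPoint read(DataInput in) throws IOException {
        int     x     = in.readInt();
        int     y     = in.readInt();
        String  label = in.readUTF();
        return new PickledPoint(x, y, label);
    }

    // Demonstration helper: a byte array stands in for the file or network connection.
    static PickledPoint copyViaStream(PickledPoint original) {
        try {
            ByteArrayOutputStream  sink = new ByteArrayOutputStream();
            DataOutputStream       out  = new DataOutputStream(sink);
            original.write(out);
            out.close();
            return read(new DataInputStream(new ByteArrayInputStream(sink.toByteArray())));
        } catch (IOException e) {
            throw new IllegalStateException("in-memory streams should not fail", e);
        }
    }

    public static void main(String[] argv) {
        PickledPoint  copy = copyViaStream(new PickledPoint(3, -4, "corner"));
        System.out.println(copy.label + " at " + copy.x + ", " + copy.y);
    }
}
```

Because write() and read() are declared against the interfaces, the same two methods would also work with a RandomAccessFile.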