IO Byte Streams

1. Framework

2. ByteArrayInputStream/ByteArrayOutputStream

2.1 ByteArrayInputStream

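ByteArrayInputStream is a byte-array input stream; it extends InputStream.
ByteArrayInputStream holds an internal buffer containing the bytes that can be read from the stream. In essence it is backed by a plain byte array.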

public class ByteArrayInputStream extends InputStream {
    // Byte array that holds the stream's data
    protected byte buf[];
    // Index of the next byte to be read
    protected int pos;
    // Marked position (reset() returns here)
    protected int mark = 0;
    // Index one past the last valid byte in buf (not necessarily buf.length)
    protected int count;
    
    // Constructor: create a stream over the whole of buf
    public ByteArrayInputStream(byte buf[]) {
        this.buf = buf;
        // Start reading at index 0
        this.pos = 0;
        // The readable region ends at buf.length
        this.count = buf.length;
    }

    // Constructor: create a stream over buf that starts reading at offset and reads at most length bytes
    public ByteArrayInputStream(byte buf[], int offset, int length) {
        this.buf = buf;
        this.pos = offset;
        this.count = Math.min(offset + length, buf.length);
        // The initial mark is the starting offset
        this.mark = offset;
    }
}
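
As an illustration of the fields above, the following minimal sketch reads a sub-range of an array and then uses reset() to return to the initial mark. The class name ByteArrayInputDemo and the helper drain() are hypothetical names introduced only for this example.

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;

class ByteArrayInputDemo {
    // Hypothetical helper: reads every remaining byte and returns it as ASCII text.
    static String drain(ByteArrayInputStream in) {
        StringBuilder sb = new StringBuilder();
        int b;
        while ((b = in.read()) != -1) {
            sb.append((char) b);
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        byte[] data = "abcdefgh".getBytes(StandardCharsets.US_ASCII);
        // pos = 2, count = min(2 + 4, 8) = 6, mark = 2
        ByteArrayInputStream in = new ByteArrayInputStream(data, 2, 4);
        System.out.println(in.available()); // 4 (count - pos)
        System.out.println(drain(in));      // cdef
        in.reset();                         // pos goes back to mark (2)
        System.out.println(drain(in));      // cdef
    }
}
```

Because the three-argument constructor sets mark to offset, reset() rewinds to the start of the sub-range rather than to index 0.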

2.2 ByteArrayOutputStream

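ByteArrayOutputStream is a byte-array output stream; it extends OutputStream.
Data written to a ByteArrayOutputStream goes into a byte array. The buffer grows automatically as data is written, and the contents can be retrieved with toByteArray() and toString().
A stream created with the no-argument constructor ByteArrayOutputStream() starts with a 32-byte array.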
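
A short sketch of the automatic growth: more than 32 bytes are written to a default-sized stream and all of them come back from toByteArray(). ByteArrayOutputDemo and fill() are hypothetical names used only here.

```java
import java.io.ByteArrayOutputStream;
import java.nio.charset.StandardCharsets;

class ByteArrayOutputDemo {
    // Hypothetical helper: writes n copies of 'x' and returns the collected bytes.
    static byte[] fill(int n) {
        ByteArrayOutputStream out = new ByteArrayOutputStream(); // default capacity
        for (int i = 0; i < n; i++) {
            out.write('x'); // the internal array is enlarged when it runs out of room
        }
        return out.toByteArray(); // a copy containing exactly the bytes written
    }

    public static void main(String[] args) {
        byte[] bytes = fill(100);
        System.out.println(bytes.length); // 100
        System.out.println(new String(bytes, 0, 5, StandardCharsets.US_ASCII)); // xxxxx
    }
}
```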

3. PipedInputStream/PipedOutputStream

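PipedOutputStream and PipedInputStream are the piped output and input streams. They let threads communicate with each other through a pipe.
PipedOutputStream is implemented on top of PipedInputStream: it holds a reference to a PipedInputStream (the sink field).
The two must be used as a pair, linked with connect().
Writing to a PipedOutputStream actually calls the receive() method of the PipedInputStream. The internal buffer of PipedInputStream is 1024 bytes by default. Once the buffer is full (for example after 1024 bytes are written in one go), the writer calls notifyAll() and then wait(1000) in a loop, so that the reader can drain what was written before more data is written.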

// Writing data to a PipedOutputStream
public void write(byte b[], int off, int len) throws IOException {
    if (sink == null) {	// is there a connected PipedInputStream?
        throw new IOException("Pipe not connected");
    } else if (b == null) {
        throw new NullPointerException();
    } else if ((off < 0) || (off > b.length) || (len < 0) ||
               ((off + len) > b.length) || ((off + len) < 0)) {
        throw new IndexOutOfBoundsException();
    } else if (len == 0) {
        return;
    }
    sink.receive(b, off, len);	// delegate to PipedInputStream.receive()
}

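// PipedInputStream receives a byte array and writes it into its circular buffer
// in - index where the next byte will be written   out - index of the next byte to be read
synchronized void receive(byte b[], int off, int len)  throws IOException {
    checkStateForReceive();	// verify that the pipe is connected and not closed
    writeSide = Thread.currentThread();
    int bytesToTransfer = len; // total number of bytes still to be written
    while (bytesToTransfer > 0) {
        if (in == out)	// buffer is full (an empty buffer has in == -1): wait until the reader frees space
            awaitSpace();
        int nextTransferAmount = 0; // number of bytes to write in this pass
        if (out < in) { // write index is ahead of the read index: free space runs to the end of the buffer
            nextTransferAmount = buffer.length - in;
        } else if (in < out) { // write index is behind the read index
            if (in == -1) { // buffer is empty: reset both indices
                in = out = 0;
                nextTransferAmount = buffer.length - in;
            } else {	// write only up to out; going further would overwrite unread data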
                nextTransferAmount = out - in;
            }
        }
        if (nextTransferAmount > bytesToTransfer)
            nextTransferAmount = bytesToTransfer;
        assert(nextTransferAmount > 0);
        System.arraycopy(b, off, buffer, in, nextTransferAmount); // copy the data into the buffer
        bytesToTransfer -= nextTransferAmount;
        off += nextTransferAmount;
        in += nextTransferAmount;
        if (in >= buffer.length) {
            in = 0;
        }
    }
}

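When PipedInputStream reads data, every read() first checks whether the buffer holds any data (using the in field). If it does not, the reader calls notifyAll() and wait(1000), which releases the lock so that the writer thread can run, and then checks again.

// Read data from the pipe's buffer into the array b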
public synchronized int read(byte b[], int off, int len)  throws IOException {
    if (b == null) {
        throw new NullPointerException();
    } else if (off < 0 || len < 0 || len > b.length - off) {
        throw new IndexOutOfBoundsException();
    } else if (len == 0) {
        return 0;
    }

    /* possibly wait on the first character */
    int c = read();
    if (c < 0) {
        return -1;
    }
    b[off] = (byte) c;
    int rlen = 1; // one byte has already been read
    while ((in >= 0) && (len > 1)) {

        int available;

        if (in > out) {
            available = Math.min((buffer.length - out), (in - out));
        } else {
            available = buffer.length - out; // everything from out to the end of the buffer
        }

        // A byte is read beforehand outside the loop
        if (available > (len - 1)) {
            available = len - 1;
        }
        System.arraycopy(buffer, out, b, off + rlen, available); // copy from the buffer into the caller's array
        out += available;
        rlen += available;
        len -= available;

        if (out >= buffer.length) {
            out = 0;
        }
        if (in == out) {
            /* now empty */
            in = -1;
        }
    }
    return rlen;
}
// Read one byte from the pipe's buffer and return it as an int in the range 0-255
public synchronized int read()  throws IOException {
    if (!connected) {
        throw new IOException("Pipe not connected");
    } else if (closedByReader) {
        throw new IOException("Pipe closed");
    } else if (writeSide != null && !writeSide.isAlive()
               && !closedByWriter && (in < 0)) {
        throw new IOException("Write end dead");
    }

    readSide = Thread.currentThread();
    int trials = 2;
    while (in < 0) {	// loop while the buffer is empty
        if (closedByWriter) {
            /* closed by writer, return EOF */
            return -1;
        }
        if ((writeSide != null) && (!writeSide.isAlive()) && (--trials < 0)) {
            throw new IOException("Pipe broken");
        }
        /* might be a writer waiting */
        notifyAll();
        try {
            wait(1000);
        } catch (InterruptedException ex) {
            throw new java.io.InterruptedIOException();
        }
    }
    int ret = buffer[out++] & 0xFF;
    if (out >= buffer.length) {
        out = 0;
    }
    if (in == out) {
        /* now empty */
        in = -1;
    }

    return ret;
}
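
The writer/reader hand-off described above can be sketched as follows: one thread writes into the pipe while the calling thread reads until end of stream. PipeDemo and roundTrip() are hypothetical names for this example; the message may be longer than the 1024-byte buffer, in which case the writer simply blocks in receive() until the reader makes room.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.PipedInputStream;
import java.io.PipedOutputStream;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;

class PipeDemo {
    // Hypothetical helper: sends message from a writer thread to the calling thread.
    static String roundTrip(String message) throws IOException, InterruptedException {
        PipedInputStream in = new PipedInputStream();    // default buffer size
        PipedOutputStream out = new PipedOutputStream();
        out.connect(in);                                 // the two ends must be connected

        Thread writer = new Thread(() -> {
            // Closing the output end lets the reader see end of stream (-1).
            try (PipedOutputStream o = out) {
                o.write(message.getBytes(StandardCharsets.UTF_8));
            } catch (IOException e) {
                throw new UncheckedIOException(e);
            }
        });
        writer.start();

        ByteArrayOutputStream received = new ByteArrayOutputStream();
        byte[] chunk = new byte[256];
        int n;
        while ((n = in.read(chunk)) != -1) {
            received.write(chunk, 0, n);
        }
        writer.join();
        in.close();
        return new String(received.toByteArray(), StandardCharsets.UTF_8);
    }

    public static void main(String[] args) throws Exception {
        System.out.println(roundTrip("hello through the pipe"));
    }
}
```

Reading and writing happen on different threads on purpose: using both ends from a single thread can deadlock once the buffer fills up.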

4. ObjectInputStream/ObjectOutputStream

ObjectInputStream and ObjectOutputStream support serializing and deserializing primitive data and objects.
Only objects that implement java.io.Serializable or java.io.Externalizable can be read from the stream.
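
A minimal sketch of a serialization round trip through an in-memory byte array. The Point class and the roundTrip() helper are hypothetical and exist only for this example.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.ObjectInputStream;
import java.io.ObjectOutputStream;
import java.io.Serializable;

class ObjectStreamDemo {
    // Hypothetical value class; it must implement Serializable to be written.
    static class Point implements Serializable {
        private static final long serialVersionUID = 1L;
        final int x;
        final int y;
        Point(int x, int y) { this.x = x; this.y = y; }
    }

    // Writes a primitive and an object, then reads both back in the same order.
    static Point roundTrip(Point p) throws IOException, ClassNotFoundException {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        try (ObjectOutputStream oos = new ObjectOutputStream(bytes)) {
            oos.writeInt(42);   // primitive data
            oos.writeObject(p); // object data
        }
        try (ObjectInputStream ois =
                 new ObjectInputStream(new ByteArrayInputStream(bytes.toByteArray()))) {
            int tag = ois.readInt();
            if (tag != 42) {
                throw new IOException("unexpected tag " + tag);
            }
            return (Point) ois.readObject();
        }
    }

    public static void main(String[] args) throws Exception {
        Point copy = roundTrip(new Point(3, 4));
        System.out.println(copy.x + "," + copy.y); // 3,4
    }
}
```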

5. FileInputStream/FileOutputStream

FileInputStream is a file input stream; it obtains input bytes from a file.
FileOutputStream is a file output stream; it writes data to a File or a FileDescriptor.
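
A small sketch that writes bytes to a temporary file with FileOutputStream and reads them back with FileInputStream. FileStreamDemo, writeThenRead(), and the temp-file prefix are hypothetical names chosen for this example.

```java
import java.io.ByteArrayOutputStream;
import java.io.File;
import java.io.FileInputStream;
import java.io.FileOutputStream;
import java.io.IOException;

class FileStreamDemo {
    // Hypothetical helper: writes data to a temp file and returns what is read back.
    static byte[] writeThenRead(byte[] data) throws IOException {
        File file = File.createTempFile("io-demo", ".bin");
        try {
            try (FileOutputStream out = new FileOutputStream(file)) {
                out.write(data);
            }
            ByteArrayOutputStream collected = new ByteArrayOutputStream();
            try (FileInputStream in = new FileInputStream(file)) {
                byte[] chunk = new byte[4096];
                int n;
                while ((n = in.read(chunk)) != -1) {
                    collected.write(chunk, 0, n);
                }
            }
            return collected.toByteArray();
        } finally {
            file.delete(); // clean up the temp file
        }
    }

    public static void main(String[] args) throws IOException {
        System.out.println(writeThenRead(new byte[] {1, 2, 3}).length); // 3
    }
}
```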

6. FileDescriptor

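FileDescriptor is a "file descriptor"; it can represent an open file, an open socket, and so on.
A file cannot be manipulated through a FileDescriptor directly. To operate on the file, create a FileInputStream or FileOutputStream for that FileDescriptor and work through the stream.
in, out, and err are the handles for standard input, standard output, and standard error. They are rarely used directly; Java already wraps them as System.in, System.out, and System.err.

public static final FileDescriptor in = standardStream(0); // descriptor for standard input (usually the keyboard)

public static final FileDescriptor out = standardStream(1); // descriptor for standard output (usually the screen)

public static final FileDescriptor err = standardStream(2); // descriptor for standard error (usually the screen)

The source is as follows: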

private static native long set(int d);

private static FileDescriptor standardStream(int fd) {
    FileDescriptor desc = new FileDescriptor();
    desc.handle = set(fd);
    return desc;
}

As the code shows, in/out/err are simply FileDescriptor objects that differ only in their handle (a long), which is set through set(fd). fd=0 stands for standard input, fd=1 for standard output, and fd=2 for standard error.
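
To illustrate working through a stream created from a FileDescriptor, the sketch below opens a temporary file, obtains its descriptor with getFD(), and attaches a second FileOutputStream to the same descriptor. It also writes one line through FileDescriptor.out. FileDescriptorDemo and writeThroughSharedDescriptor() are hypothetical names, and the example assumes both streams share one file position because they share one descriptor.

```java
import java.io.File;
import java.io.FileDescriptor;
import java.io.FileOutputStream;
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;

class FileDescriptorDemo {
    // Hypothetical helper: two streams write through one shared descriptor.
    static String writeThroughSharedDescriptor() throws IOException {
        File file = File.createTempFile("fd-demo", ".txt");
        try {
            try (FileOutputStream first = new FileOutputStream(file)) {
                FileDescriptor fd = first.getFD();
                FileOutputStream second = new FileOutputStream(fd); // same open file
                first.write('A');
                second.write('B'); // continues at the shared file position
            } // closing first releases the shared descriptor
            return new String(Files.readAllBytes(file.toPath()), StandardCharsets.US_ASCII);
        } finally {
            file.delete();
        }
    }

    public static void main(String[] args) throws IOException {
        System.out.println(writeThroughSharedDescriptor());
        // Standard output reached through its descriptor instead of System.out
        FileOutputStream stdout = new FileOutputStream(FileDescriptor.out);
        stdout.write("hello via FileDescriptor.out\n".getBytes(StandardCharsets.US_ASCII));
        stdout.flush();
    }
}
```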

7. FilterInputStream/FilterOutputStream

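FilterInputStream/FilterOutputStream wrap another input or output stream and add extra functionality to it. Commonly used subclasses are BufferedInputStream/BufferedOutputStream, DataInputStream/DataOutputStream, and PrintStream.
BufferedInputStream/BufferedOutputStream add buffering to an input/output stream; BufferedInputStream also adds mark() and reset() support to the input stream.
DataInputStream/DataOutputStream decorate other streams. DataInputStream "lets an application read primitive Java data types from an underlying input stream in a machine-independent way", and an application can use a DataOutputStream to write data that a DataInputStream later reads.
PrintStream decorates another output stream, adding the ability to print representations of various data values conveniently.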

8. BufferedInputStream/BufferedOutputStream

8.1 BufferedInputStream

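BufferedInputStream is a buffered input stream; it extends FilterInputStream.
It adds functionality to another input stream, namely buffering and support for the mark() and reset() methods.
For example, after creating a BufferedInputStream for some input stream, calls to read() are served from the internal buffer, which is refilled from the underlying stream in batches.

8.1.1 How it works

In essence it is implemented with an internal buffer array. When creating a BufferedInputStream, we pass some input stream to its constructor. BufferedInputStream reads that stream's data in batches, one portion at a time into the buffer; once that portion has been consumed, it reads the next portion from the input stream into the buffer. Why buffer at all? The reason is simple: efficiency. Data in the buffer lives in memory, whereas the original data may live on a hard disk, NAND flash, or another storage medium, and reading from memory is typically at least an order of magnitude faster than reading from such media.

8.1.2 Source analysis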

read()

// Read the next byte
public synchronized int read() throws IOException {
    if (pos >= count) {	// has everything in the buffer been consumed?
        fill();
        if (pos >= count)
            return -1;
    }
    return getBufIfOpen()[pos++] & 0xff;
}

fill()

// Read data from the underlying input stream into the buffer
private void fill() throws IOException {
    byte[] buffer = getBufIfOpen();
    if (markpos < 0)	// no mark set: reuse the whole buffer
        pos = 0;            /* no mark: throw away the buffer */
    else if (pos >= buffer.length)  // no room left in the buffer
        if (markpos > 0) {  /* can throw away early part of the buffer */
            int sz = pos - markpos;
            System.arraycopy(buffer, markpos, buffer, 0, sz);	// move bytes [markpos, pos) to the front of the buffer
            pos = sz;
            markpos = 0;
        } else if (buffer.length >= marklimit) {
            markpos = -1;   /* buffer got too big, invalidate mark */
            pos = 0;        /* drop buffer contents */
        } else if (buffer.length >= MAX_BUFFER_SIZE) {
            throw new OutOfMemoryError("Required array size too large");
        } else {            /* grow buffer */
            int nsz = (pos <= MAX_BUFFER_SIZE - pos) ?
                    pos * 2 : MAX_BUFFER_SIZE;
            if (nsz > marklimit)
                nsz = marklimit;
            // Grow the buffer. While a mark is set the buffer may keep growing, so memory use
            // keeps increasing; marklimit caps this. Once buffer.length >= marklimit the mark is dropped.
            byte nbuf[] = new byte[nsz];
            System.arraycopy(buffer, 0, nbuf, 0, pos);
            if (!bufUpdater.compareAndSet(this, buffer, nbuf)) {
                // Can't replace buf if there was an async close.
                // Note: This would need to be changed if fill()
                // is ever made accessible to multiple threads.
                // But for now, the only way CAS can fail is via close.
                // assert buf == null;
                throw new IOException("Stream closed");
            }
            buffer = nbuf;
        }
    count = pos;
    int n = getInIfOpen().read(buffer, pos, buffer.length - pos); // read up to buffer.length - pos bytes into the buffer, starting at pos
    if (n > 0)
        count = n + pos;
}
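
The interplay of mark(), reset(), and the growing buffer in fill() can be sketched with a deliberately tiny 4-byte buffer. BufferedInputDemo and peekThenRead() are hypothetical names; the example assumes peek is no larger than the amount of data available.

```java
import java.io.BufferedInputStream;
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.nio.charset.StandardCharsets;

class BufferedInputDemo {
    // Hypothetical helper: reads `peek` bytes, rewinds, then reads the whole stream.
    // Returns "<peeked>|<everything>" as ASCII text.
    static String peekThenRead(byte[] data, int peek) throws IOException {
        try (BufferedInputStream in =
                 new BufferedInputStream(new ByteArrayInputStream(data), 4)) {
            StringBuilder sb = new StringBuilder();
            in.mark(peek); // the mark stays valid for at least `peek` bytes
            for (int i = 0; i < peek; i++) {
                sb.append((char) in.read()); // may force fill() to grow the buffer
            }
            in.reset();    // pos returns to markpos
            sb.append('|');
            int b;
            while ((b = in.read()) != -1) {
                sb.append((char) b);
            }
            return sb.toString();
        }
    }

    public static void main(String[] args) throws IOException {
        byte[] data = "abcdefgh".getBytes(StandardCharsets.US_ASCII);
        System.out.println(peekThenRead(data, 6)); // abcdef|abcdefgh
    }
}
```

With a 4-byte buffer and a read limit of 6, the fifth read() finds the buffer full with the mark at index 0, so fill() takes the "grow buffer" branch and caps the new size at marklimit.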

8.2 BufferedOutputStream

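BufferedOutputStream is a buffered output stream; it extends FilterOutputStream.
Its purpose is to add buffering to another output stream.
Closing a BufferedOutputStream first calls flush(), pushing buffered data to the underlying stream. When a single write(byte[], int, int) call carries at least as many bytes as the buffer can hold, the buffer is flushed and the data is written directly to the underlying stream, bypassing the buffer.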
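
A minimal sketch of both behaviours, using a ByteArrayOutputStream as the underlying stream so that the number of bytes that actually reached it can be observed. BufferedOutputDemo and sinkSizes() are hypothetical names, and the 8-byte buffer is chosen only to keep the numbers small.

```java
import java.io.BufferedOutputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;

class BufferedOutputDemo {
    // Hypothetical helper: returns how many bytes had reached the sink after each step.
    static int[] sinkSizes() throws IOException {
        ByteArrayOutputStream sink = new ByteArrayOutputStream();
        BufferedOutputStream out = new BufferedOutputStream(sink, 8); // 8-byte buffer
        int[] sizes = new int[3];

        out.write(new byte[3]);  // fits in the buffer: nothing reaches the sink yet
        sizes[0] = sink.size();

        out.flush();             // buffered bytes are pushed to the sink
        sizes[1] = sink.size();

        out.write(new byte[10]); // at least as large as the buffer: written straight through
        sizes[2] = sink.size();

        out.close();
        return sizes;
    }

    public static void main(String[] args) throws IOException {
        int[] s = sinkSizes();
        System.out.println(s[0] + " " + s[1] + " " + s[2]); // 0 3 13
    }
}
```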

9. DataInputStream/DataOutputStream

9.1 DataInputStream

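DataInputStream decorates another input stream. It "lets an application read primitive Java data types from an underlying input stream in a machine-independent way". An application can use a DataOutputStream to write data that a DataInputStream later reads.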
readUTF()

public final static String readUTF(DataInput in) throws IOException {
    // Read an unsigned short from the data input stream.
    // Note: the first 2 bytes of the encoded string hold the length of the data in bytes.
    int utflen = in.readUnsignedShort();
    byte[] bytearr = null;
    char[] chararr = null;

    if (in instanceof DataInputStream) {
        DataInputStream dis = (DataInputStream)in;
        if (dis.bytearr.length < utflen){
            dis.bytearr = new byte[utflen*2];
            dis.chararr = new char[utflen*2];
        }
        chararr = dis.chararr;
        bytearr = dis.bytearr;
    } else {
        bytearr = new byte[utflen];
        chararr = new char[utflen];
    }

    int c, char2, char3;
    int count = 0;
    int chararr_count=0;

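    // Read utflen bytes from the data input stream into bytearr, starting at index 0.
    // Note that the whole encoded string is read into the byte array in one call.
    in.readFully(bytearr, 0, utflen);

    // Copy the leading single-byte (ASCII) characters from bytearr into chararr.
    // This is a fast path: in the modified UTF-8 used here a char takes 1 to 3 bytes.
    while (count < utflen) {
        // Convert the byte to an unsigned int value
        c = (int) bytearr[count] & 0xff;
        // Single-byte characters never exceed 127, so stop at the first byte above 127.
        if (c > 127) break;
        count++;
        // Store c in chararr
        chararr[chararr_count++]=(char)c;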
    }

    // After the leading single-byte characters, decode the rest.
    while (count < utflen) {
        // Two steps decide how many bytes the next character uses:
        // (01) Convert the byte to an unsigned int.
        //      For example, "11001010" becomes "00000000 00000000 00000000 11001010".
        // (02) Shift that int right by 4 bits (c >> 4) to get the high nibble.
        //      For example, "00000000 00000000 00000000 11001010" becomes "00000000 00000000 00000000 00001100" (12).
        c = (int) bytearr[count] & 0xff;
        switch (c >> 4) {
            // Single-byte character: bytearr[count] has the form "0xxxxxxx",
            // so c >> 4 is in the range 0-7.
            case 0: case 1: case 2: case 3: case 4: case 5: case 6: case 7:
                /* 0xxxxxxx*/
                count++;
                chararr[chararr_count++]=(char)c;
                break;

            // Two-byte character: bytearr[count] is the first byte, "110xxxxx", of "110xxxxx  10xxxxxx",
            // so c >> 4 is 12 or 13.
            case 12: case 13:
                /* 110x xxxx   10xx xxxx*/
                count += 2;
                if (count > utflen)
                    throw new UTFDataFormatException(
                        "malformed input: partial character at end");
                char2 = (int) bytearr[count-1];
                if ((char2 & 0xC0) != 0x80)	// the second byte must have the form 10xx xxxx
                    throw new UTFDataFormatException(
                        "malformed input around byte " + count);
                chararr[chararr_count++]=(char)(((c & 0x1F) << 6) |
                                                (char2 & 0x3F));
                break;

            // Three-byte character: bytearr[count] is the first byte, "1110xxxx", of "1110xxxx  10xxxxxx  10xxxxxx",
            // so c >> 4 is 14.
            case 14:
                /* 1110 xxxx  10xx xxxx  10xx xxxx */
                count += 3;
                if (count > utflen)
                    throw new UTFDataFormatException(
                        "malformed input: partial character at end");
                char2 = (int) bytearr[count-2];
                char3 = (int) bytearr[count-1];
                if (((char2 & 0xC0) != 0x80) || ((char3 & 0xC0) != 0x80))
                    throw new UTFDataFormatException(
                        "malformed input around byte " + (count-1));
                chararr[chararr_count++]=(char)(((c     & 0x0F) << 12) |
                                                ((char2 & 0x3F) << 6)  |
                                                ((char3 & 0x3F) << 0));
                break;

            // Anything else is rejected: a stray continuation byte "10xxxxxx" (c >> 4 is 8-11) or a
            // four-byte lead "1111xxxx" (c >> 4 is 15), which modified UTF-8 does not use.
            default:
                /* 10xx xxxx,  1111 xxxx */
                throw new UTFDataFormatException(
                    "malformed input around byte " + count);
        }
    }
    // The number of chars produced may be less than utflen
    return new String(chararr, 0, chararr_count);
}
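
The bit arithmetic in the two-byte and three-byte branches can be checked by hand with a small sketch. Utf8DecodeDemo, decode2(), and decode3() are hypothetical names; the formulas are copied from the cases above.

```java
class Utf8DecodeDemo {
    // Two-byte form 110xxxxx 10xxxxxx: 5 bits from the first byte, 6 from the second.
    static char decode2(int b1, int b2) {
        return (char) (((b1 & 0x1F) << 6) | (b2 & 0x3F));
    }

    // Three-byte form 1110xxxx 10xxxxxx 10xxxxxx: 4 + 6 + 6 bits.
    static char decode3(int b1, int b2, int b3) {
        return (char) (((b1 & 0x0F) << 12) | ((b2 & 0x3F) << 6) | (b3 & 0x3F));
    }

    public static void main(String[] args) {
        // 0xC3 >> 4 == 12, so the switch takes the two-byte branch.
        System.out.println(Integer.toHexString(decode2(0xC3, 0xA9)));       // e9
        // 0xE4 >> 4 == 14, so the switch takes the three-byte branch.
        System.out.println(Integer.toHexString(decode3(0xE4, 0xB8, 0xAD))); // 4e2d
    }
}
```

For the three-byte case: (0xE4 & 0x0F) << 12 is 0x4000, (0xB8 & 0x3F) << 6 is 0x0E00, and 0xAD & 0x3F is 0x2D, which together give U+4E2D.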

9.2 DataOutputStream

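DataOutputStream decorates another output stream. Used together with DataInputStream, it lets an application write primitive Java data types to the underlying output stream, and read them back, in a machine-independent way.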
writeUTF()

// Write a String to the output stream out in modified UTF-8
static int writeUTF(String str, DataOutput out) throws IOException {
    // Length of the String in chars
    int strlen = str.length();
    int utflen = 0;
    int c, count = 0;

    // In modified UTF-8 each char takes 1 to 3 bytes.
    // Count the encoded length from the value of each char.
    for (int i = 0; i < strlen; i++) {
        c = str.charAt(i);
        if ((c >= 0x0001) && (c <= 0x007F)) {
            utflen++;
        } else if (c > 0x07FF) {
            utflen += 3;
        } else {
            utflen += 2;
        }
    }

    if (utflen > 65535)
        throw new UTFDataFormatException(
        "encoded string too long: " + utflen + " bytes");

    // Reuse or allocate the byte array bytearr
    byte[] bytearr = null;
    if (out instanceof DataOutputStream) {
        DataOutputStream dos = (DataOutputStream)out;
        if(dos.bytearr == null || (dos.bytearr.length < (utflen+2)))
            dos.bytearr = new byte[(utflen*2) + 2];
        bytearr = dos.bytearr;
    } else {
        bytearr = new byte[utflen+2];
    }

    // The first 2 bytes of the array hold the length of the encoded data
    bytearr[count++] = (byte) ((utflen >>> 8) & 0xFF);
    bytearr[count++] = (byte) ((utflen >>> 0) & 0xFF);

    // Fast path: copy the leading single-byte characters
    int i=0;
    for (i=0; i<strlen; i++) {
        c = str.charAt(i);
        if (!((c >= 0x0001) && (c <= 0x007F))) break;
        bytearr[count++] = (byte) c;
    }

    // Encode the remaining characters
    for (;i < strlen; i++){
        c = str.charAt(i);
        // 1-byte case
        if ((c >= 0x0001) && (c <= 0x007F)) {
            bytearr[count++] = (byte) c;

        } else if (c > 0x07FF) {
            // 3-byte case
            bytearr[count++] = (byte) (0xE0 | ((c >> 12) & 0x0F));
            bytearr[count++] = (byte) (0x80 | ((c >>  6) & 0x3F));
            bytearr[count++] = (byte) (0x80 | ((c >>  0) & 0x3F));
        } else {
            // 2-byte case
            bytearr[count++] = (byte) (0xC0 | ((c >>  6) & 0x1F));
            bytearr[count++] = (byte) (0x80 | ((c >>  0) & 0x3F));
        }
    }
    // Write the byte array to the data output stream
    out.write(bytearr, 0, utflen+2);
    return utflen + 2;
}
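
A short round-trip sketch makes the 2-byte length prefix visible. The string used has one 1-byte, one 2-byte, and one 3-byte character, so the encoded length is 6 and the total output is 8 bytes. UtfRoundTripDemo, encode(), and decode() are hypothetical names for this example.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.DataInputStream;
import java.io.DataOutputStream;
import java.io.IOException;

class UtfRoundTripDemo {
    // Hypothetical helper: encodes s with writeUTF and returns the raw bytes.
    static byte[] encode(String s) throws IOException {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        try (DataOutputStream out = new DataOutputStream(bytes)) {
            out.writeUTF(s);
        }
        return bytes.toByteArray();
    }

    // Hypothetical helper: decodes bytes produced by encode().
    static String decode(byte[] encoded) throws IOException {
        try (DataInputStream in = new DataInputStream(new ByteArrayInputStream(encoded))) {
            return in.readUTF();
        }
    }

    public static void main(String[] args) throws IOException {
        String s = "a\u00e9\u4e2d"; // 1-byte, 2-byte, and 3-byte characters
        byte[] encoded = encode(s);
        System.out.println(encoded.length);                               // 8
        System.out.println(((encoded[0] & 0xFF) << 8) | (encoded[1] & 0xFF)); // 6
        System.out.println(decode(encoded).equals(s));                    // true
    }
}
```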

10. PrintStream

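PrintStream is a printing output stream; it extends FilterOutputStream.
PrintStream decorates another output stream, adding the ability to print representations of various data values conveniently.
Unlike other output streams, PrintStream never throws an IOException. Any IOException raised internally is caught by its own methods, which set an error flag; callers can query that flag with checkError() to find out whether an IOException occurred.
PrintStream also provides auto-flush and character-set settings. With auto-flush enabled, flush() is called automatically when a byte array is written, when one of the println() methods is called, or when a newline is written.
The print() methods delegate to write():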

public void print(int i) {
    write(String.valueOf(i));	// convert to a String
}

private void write(String s) {
    try {
        synchronized (this) {
            ensureOpen();
            textOut.write(s);
            textOut.flushBuffer();
            charOut.flushBuffer();
            if (autoFlush && (s.indexOf('\n') >= 0))
                out.flush();
        }
    }
    catch (InterruptedIOException x) {
        Thread.currentThread().interrupt();
    }
    catch (IOException x) {
        trouble = true;	// the IOException is swallowed; checkError() reports it
    }
}
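
The error-flag behaviour can be sketched by printing to an output stream that always fails. PrintStreamErrorDemo and its methods are hypothetical names, and the failing stream is a stand-in built only for this example.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.OutputStream;
import java.io.PrintStream;

class PrintStreamErrorDemo {
    // Prints to a stream whose write() always throws; returns the error flag.
    static boolean printToBrokenStream() {
        OutputStream broken = new OutputStream() {
            @Override
            public void write(int b) throws IOException {
                throw new IOException("simulated failure");
            }
        };
        PrintStream ps = new PrintStream(broken);
        ps.print(42);            // no exception escapes from print()
        return ps.checkError();  // true: the IOException was recorded internally
    }

    // Same call sequence against a working stream, for comparison.
    static boolean printToHealthyStream() {
        PrintStream ps = new PrintStream(new ByteArrayOutputStream());
        ps.print(42);
        return ps.checkError();  // false
    }

    public static void main(String[] args) {
        System.out.println(printToBrokenStream());
        System.out.println(printToHealthyStream());
    }
}
```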

in, out, and err in System

public final static InputStream in = null;
public final static PrintStream out = null;
public final static PrintStream err = null;

private static native void setIn0(InputStream in);
private static native void setOut0(PrintStream out);
private static native void setErr0(PrintStream err);

private static void initializeSystemClass() {
    
    ...

    FileInputStream fdIn = new FileInputStream(FileDescriptor.in);
    FileOutputStream fdOut = new FileOutputStream(FileDescriptor.out);
    FileOutputStream fdErr = new FileOutputStream(FileDescriptor.err);
    setIn0(new BufferedInputStream(fdIn));
    setOut0(newPrintStream(fdOut, props.getProperty("sun.stdout.encoding")));
    setErr0(newPrintStream(fdErr, props.getProperty("sun.stderr.encoding")));
    
    ...
}

private static PrintStream newPrintStream(FileOutputStream fos, String enc) {
    if (enc != null) {
        try {
            return new PrintStream(new BufferedOutputStream(fos, 128), true, enc);
        } catch (UnsupportedEncodingException uee) {}
    }
    return new PrintStream(new BufferedOutputStream(fos, 128), true);
}

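Taking out as an example, it is built as follows:

First, obtain the descriptor for standard output (the screen), FileDescriptor.out.
Create a file output stream for standard output.
Wrap the file output stream in a buffered output stream, which adds buffering.
Wrap the buffered output stream in a print stream, which adds convenient printing methods such as print(), println(), and printf().
Call setOut0(ps) to store ps in the static field out.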
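
The same chain can be rebuilt by hand as a rough sketch. StdoutChainDemo and wrap() are hypothetical names; the 128-byte buffer and the auto-flush flag mirror newPrintStream() above, and setOut0() is replaced here by simply using the resulting stream.

```java
import java.io.BufferedOutputStream;
import java.io.FileDescriptor;
import java.io.FileOutputStream;
import java.io.OutputStream;
import java.io.PrintStream;

class StdoutChainDemo {
    // Hypothetical helper: buffered, auto-flushing PrintStream over any raw stream.
    static PrintStream wrap(OutputStream raw) {
        return new PrintStream(new BufferedOutputStream(raw, 128), true);
    }

    public static void main(String[] args) {
        // Descriptor -> file output stream -> buffered stream -> print stream
        FileOutputStream fdOut = new FileOutputStream(FileDescriptor.out);
        PrintStream myOut = wrap(fdOut);
        myOut.println("hello from a hand-built stdout");
    }
}
```

Because auto-flush is on, println() pushes the line through the 128-byte buffer immediately instead of waiting for the buffer to fill.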

11. References

http://www.cnblogs.com/skywang12345/p/io_01.html