Lab 3: Java I/O Streams and File Processing Pipeline

The goal of this lab is to build a robust file processing pipeline that handles various types of files efficiently using different Java I/O streams.

In modern applications, processing different types of files efficiently is crucial, from handling large text and binary files to dealing with text in different character encodings. Many applications fail to handle unexpected files. Large files can slow the application down or, at worst, crash it by exhausting memory, and text in an unanticipated character encoding may not display correctly. Choosing the right I/O approach can therefore significantly improve performance and reliability.

In this lab, we will work through scenarios that a developer is evaluating for a media asset management system for digital content creators. The system must process several types of files: raw video metadata (binary), subtitle files (text in different encodings), video descriptions (text), and images (binary). It needs to handle large files efficiently and support both reading and writing operations.

Objectives

In this lab you will:

  1. Implement and evaluate different file copy methods using byte streams:

    • Copy the file byte by byte.
    • Copy the entire file at once.
    • Copy the file in chunks.
  2. Implement and evaluate different character stream methods:

    • Use unbuffered character streams to read and count words in a text file.
    • Use buffered character streams to read and count words in a text file.
  3. Understand and handle character encoding issues:

    • Read a text file with a different character encoding (Windows-1256) and count the number of words.
    • Compare the performance and memory usage of different I/O approaches.
    • Discuss the best use cases for each approach based on the evaluation results.

Requirements and Tools

  • Java JDK 11 or above
  • Sample files located under src/main/resources:
    • Large binary files
    • A .txt e-book encoded in UTF-8
    • A .txt file in an encoding other than UTF-8

Problem Statement

A developer is working on an application for content creators to manage their digital assets efficiently. The developer is experimenting with the Java I/O package to evaluate and select the most appropriate I/O streams for each operation while maintaining optimal performance (fast processing) and reliability (preventing runtime errors).

Getting Started

If your instructor is using GitHub Classroom, accept the assignment using the link provided by your instructor, clone the template repository, and import it as a project into your IDE.

If your instructor is not using GitHub Classroom, clone and import the template project at https://github.com/cpit305-spring-25-IT1/lab-03.

Part 1: Raw Byte Streams

Task 1.1: Byte Streams: Copy the file byte by byte

Implement a simple file copy method using FileInputStream and FileOutputStream that copies the file byte by byte. Use it to copy a large video file and measure the execution time.

The method should be named copyRawByteStreams and declared in src/main/java/cpit305/fcit/kau/edu/sa/io/MyByteStreamsManager.java.

Note: we will use a helper method in the Utils.java class to measure the execution time of the method in milliseconds.

    public static void copyRawByteStreams(File source, File destination) throws IOException {
        long startTime = System.currentTimeMillis();





        Utils.printExecutionTime(startTime);
    }
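One possible way to fill in the Task 1.1 skeleton is sketched below. It is a minimal sketch under two assumptions: the class name ByteByByteCopySketch is hypothetical (in the lab the method belongs in MyByteStreamsManager), and the elapsed time is printed directly with System.currentTimeMillis() because the lab's Utils.printExecutionTime helper is not shown here.

```java
import java.io.File;
import java.io.FileInputStream;
import java.io.FileOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;

// Hypothetical class name; in the lab this method lives in MyByteStreamsManager.
final class ByteByByteCopySketch {

    static void copyRawByteStreams(File source, File destination) throws IOException {
        long startTime = System.currentTimeMillis();
        // try-with-resources closes both streams, even if read() or write() throws
        try (InputStream in = new FileInputStream(source);
             OutputStream out = new FileOutputStream(destination)) {
            int b;
            // read() returns the next byte (0..255), or -1 at end of file
            while ((b = in.read()) != -1) {
                out.write(b); // one write call per byte
            }
        }
        // Stand-in for the lab's Utils.printExecutionTime(startTime)
        System.out.println("copyRawByteStreams: " + (System.currentTimeMillis() - startTime) + " ms");
    }

    public static void main(String[] args) throws IOException {
        if (args.length != 2) {
            System.err.println("usage: java ByteByByteCopySketch <source> <destination>");
            return;
        }
        copyRawByteStreams(new File(args[0]), new File(args[1]));
    }
}
```

Because FileInputStream and FileOutputStream are unbuffered, each read() and write(int) call here goes to the operating system, so a 100 MB file means on the order of 100 million calls in each direction. That is why this approach is expected to be the slowest.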

Task 1.2: Byte Streams: Copy the entire file at once

Implement a simple file copy method using FileInputStream and FileOutputStream that loads the entire file content (all of its bytes) into memory at once and then writes it out. Use it to copy a large video file and measure the execution time.

The method should be named copyEntireFileAtOnce and declared in src/main/java/cpit305/fcit/kau/edu/sa/io/MyByteStreamsManager.java.

Note: we will also use the helper method in the Utils.java class to measure the execution time of the method in milliseconds.

    public static void copyEntireFileAtOnce(File source, File destination) throws IOException {
        long startTime = System.currentTimeMillis();





        Utils.printExecutionTime(startTime);
    }
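A sketch of Task 1.2, assuming JDK 9 or later for InputStream.readAllBytes() (the lab requires JDK 11, so this holds). The class name WholeFileCopySketch is hypothetical, and timing is again printed inline as a stand-in for Utils.printExecutionTime.

```java
import java.io.File;
import java.io.FileInputStream;
import java.io.FileOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;

// Hypothetical class name; in the lab this method lives in MyByteStreamsManager.
final class WholeFileCopySketch {

    static void copyEntireFileAtOnce(File source, File destination) throws IOException {
        long startTime = System.currentTimeMillis();
        try (InputStream in = new FileInputStream(source);
             OutputStream out = new FileOutputStream(destination)) {
            // readAllBytes() (Java 9+) reads until end of file into a single byte[].
            // The whole file must fit in the heap, and a Java array cannot exceed about 2 GB,
            // so very large files fail with OutOfMemoryError.
            byte[] content = in.readAllBytes();
            out.write(content); // a single write for the whole file
        }
        // Stand-in for the lab's Utils.printExecutionTime(startTime)
        System.out.println("copyEntireFileAtOnce: " + (System.currentTimeMillis() - startTime) + " ms");
    }

    public static void main(String[] args) throws IOException {
        if (args.length != 2) {
            System.err.println("usage: java WholeFileCopySketch <source> <destination>");
            return;
        }
        copyEntireFileAtOnce(new File(args[0]), new File(args[1]));
    }
}
```

An alternative that stays closer to FileInputStream alone is to allocate new byte[(int) source.length()] and loop on read(buffer, offset, length) until the array is full. Either way, the entire file occupies the heap at the same time.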

Task 1.3: Byte Streams: Copy the bytes in chunks

Implement a simple file copy method using FileInputStream and FileOutputStream that copies the file in chunks of bytes, where the chunk size is passed to the method as an argument. Use it to copy a large video file with 1 KB, 2 KB, or 4 KB chunks and measure the execution time.

The method should be named copyInChunks and declared in src/main/java/cpit305/fcit/kau/edu/sa/io/MyByteStreamsManager.java.

Note: we will also use the helper method in the Utils.java class to measure the execution time of the method in milliseconds.

    public static void copyInChunks(File source, File destination, int chunkSize) throws IOException {
        long startTime = System.currentTimeMillis();





        Utils.printExecutionTime(startTime);
    }
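A sketch of Task 1.3. The class name ChunkedCopySketch is hypothetical and timing is printed inline. The key detail is writing only bytesRead bytes per iteration, because the last chunk is usually shorter than the buffer. Rejecting a non-positive chunkSize is an added assumption; the lab's skeleton does not say what should happen in that case.

```java
import java.io.File;
import java.io.FileInputStream;
import java.io.FileOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;

// Hypothetical class name; in the lab this method lives in MyByteStreamsManager.
final class ChunkedCopySketch {

    static void copyInChunks(File source, File destination, int chunkSize) throws IOException {
        // Assumption: a non-positive chunk size is a caller error
        if (chunkSize <= 0) {
            throw new IllegalArgumentException("chunkSize must be positive: " + chunkSize);
        }
        long startTime = System.currentTimeMillis();
        try (InputStream in = new FileInputStream(source);
             OutputStream out = new FileOutputStream(destination)) {
            byte[] buffer = new byte[chunkSize]; // the only file data held in memory
            int bytesRead;
            // read(buffer) returns how many bytes it stored (at most chunkSize), or -1 at end of file
            while ((bytesRead = in.read(buffer)) != -1) {
                out.write(buffer, 0, bytesRead); // write only the bytes actually read
            }
        }
        // Stand-in for the lab's Utils.printExecutionTime(startTime)
        System.out.println("copyInChunks(" + chunkSize + "): " + (System.currentTimeMillis() - startTime) + " ms");
    }

    public static void main(String[] args) throws IOException {
        if (args.length != 3) {
            System.err.println("usage: java ChunkedCopySketch <source> <destination> <chunkSize>");
            return;
        }
        copyInChunks(new File(args[0]), new File(args[1]), Integer.parseInt(args[2]));
    }
}
```

For example, copyInChunks(video, copy, 4096) reuses one 4 KB array for the entire copy, so memory use stays constant regardless of the file size.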

Evaluation and Key Findings

Add your measured execution time (in ms) to the Execution Time column of the following table, then discuss which raw byte copy approach the developer should use and why.

| Method/Approach | Execution Time | Memory Usage | Ability to Handle Large Files | Best Use Case |
|---|---|---|---|---|
| Byte-by-byte (copyRawByteStreams) | Slowest | Minimal | Yes, but very slow | Small files only (<1MB) |
| Entire file at once (copyEntireFileAtOnce) | Fast for small files | Highest (entire file loaded into memory) | No (OutOfMemoryError for files larger than the available heap) | Small to medium files that fit in memory (e.g., <100MB) |
| Chunk-based (copyInChunks) | Fast | Moderate (only one chunk in memory) | Yes | Large files (>100MB) |
  • Byte-by-byte:

    • ✅ Lowest memory usage
    • ❌ Extremely slow due to many I/O operations
    • ❌ Not practical for production use
  • Entire file at once:

    • ✅ Fastest for small files
    • ❌ Risk of OutOfMemoryError for large files
    • ❌ Not suitable for production use with large files
  • Chunk-based:

    • ✅ Best balance of memory usage and performance
    • ✅ Can handle files of any size
    • ✅ Recommended approach for production use
    • 💡 Typical chunk sizes: 4KB-8KB for general use, 64KB-128KB for large files
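To make "many I/O operations" concrete, the sketch below counts read calls instead of measuring time. It wraps an in-memory ByteArrayInputStream in a hypothetical CountingInputStream (a FilterInputStream subclass written only for this example), so the counts are deterministic and independent of the disk.

```java
import java.io.ByteArrayInputStream;
import java.io.FilterInputStream;
import java.io.IOException;
import java.io.InputStream;

final class ReadCallCounter {

    // Hypothetical helper: counts every call to read() and read(byte[], int, int).
    // FilterInputStream.read(byte[]) delegates to read(byte[], int, int), so it is counted too.
    static final class CountingInputStream extends FilterInputStream {
        long calls = 0;

        CountingInputStream(InputStream in) {
            super(in);
        }

        @Override
        public int read() throws IOException {
            calls++;
            return super.read();
        }

        @Override
        public int read(byte[] b, int off, int len) throws IOException {
            calls++;
            return super.read(b, off, len);
        }
    }

    // Read calls needed to consume `size` bytes one byte at a time (including the final -1)
    static long callsByteByByte(int size) throws IOException {
        CountingInputStream in = new CountingInputStream(new ByteArrayInputStream(new byte[size]));
        while (in.read() != -1) {
            // consume one byte per call
        }
        return in.calls;
    }

    // Read calls needed to consume `size` bytes with a chunkSize buffer (including the final -1)
    static long callsInChunks(int size, int chunkSize) throws IOException {
        CountingInputStream in = new CountingInputStream(new ByteArrayInputStream(new byte[size]));
        byte[] buffer = new byte[chunkSize];
        while (in.read(buffer) != -1) {
            // consume up to chunkSize bytes per call
        }
        return in.calls;
    }

    public static void main(String[] args) throws IOException {
        int size = 10_000;
        System.out.println("byte-by-byte: " + callsByteByByte(size));      // 10001
        System.out.println("4 KB chunks:  " + callsInChunks(size, 4096)); // 4
    }
}
```

In general, reading n bytes takes n + 1 calls byte by byte and ceil(n / chunkSize) + 1 calls in chunks (the extra call is the one that returns -1), assuming each chunked read fills the buffer. For a 100 MiB file (104,857,600 bytes) that is 104,857,601 calls versus 12,801 calls with 8 KB chunks.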

Part 2: Character Streams and Encoding (Unbuffered vs Buffered)

Task 2.1: Unbuffered Character Streams Reader

Write a method that uses FileReader without a buffer to read the “Pride and Prejudice” e-book and count the total number of words. Measure the execution time.

The method should be named countWordsUnbuffered and declared in src/main/java/cpit305/fcit/kau/edu/sa/io/MyCharacterStreamsManager.java.

Note: we will also use the helper method in the Utils.java class to measure the execution time of the method in milliseconds.

    public static int countWordsUnbuffered(File file) throws IOException {
        long startTime = System.currentTimeMillis();
        int wordCount = 0;





        Utils.printExecutionTime(startTime);
        return wordCount;
    }
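A sketch of Task 2.1. It assumes a "word" is a maximal run of non-whitespace characters as defined by Character.isWhitespace; the lab does not fix a definition, so your counts may differ slightly if you split differently. The class name UnbufferedWordCountSketch is hypothetical and timing is printed inline.

```java
import java.io.File;
import java.io.FileReader;
import java.io.IOException;
import java.io.Reader;

// Hypothetical class name; in the lab this method lives in MyCharacterStreamsManager.
final class UnbufferedWordCountSketch {

    // Assumption: a word is a maximal run of non-whitespace characters.
    static int countWordsUnbuffered(File file) throws IOException {
        long startTime = System.currentTimeMillis();
        int wordCount = 0;
        // FileReader decodes with the platform default charset
        // (OS-dependent on JDK 11, UTF-8 by default since JDK 18)
        try (Reader reader = new FileReader(file)) {
            boolean inWord = false;
            int c;
            // read() returns one character per call, or -1 at end of stream
            while ((c = reader.read()) != -1) {
                if (Character.isWhitespace(c)) {
                    inWord = false;   // whitespace ends the current word
                } else if (!inWord) {
                    inWord = true;    // first character of a new word
                    wordCount++;
                }
            }
        }
        // Stand-in for the lab's Utils.printExecutionTime(startTime)
        System.out.println("countWordsUnbuffered: " + (System.currentTimeMillis() - startTime) + " ms");
        return wordCount;
    }

    public static void main(String[] args) throws IOException {
        if (args.length != 1) {
            System.err.println("usage: java UnbufferedWordCountSketch <file>");
            return;
        }
        System.out.println(countWordsUnbuffered(new File(args[0])) + " words");
    }
}
```

Because FileReader relies on the platform default charset, the same code can decode a non-UTF-8 file differently on different machines, which is the issue Task 3 explores.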

Task 2.2: Buffered Character Streams Reader

Write a method that wraps FileReader in a BufferedReader to read the “Pride and Prejudice” e-book and count the total number of words. Measure the execution time.

The method should be named countWordsBuffered and declared in src/main/java/cpit305/fcit/kau/edu/sa/io/MyCharacterStreamsManager.java.

Note: we will also use the helper method in the Utils.java class to measure the execution time of the method in milliseconds.

    public static int countWordsBuffered(File file) throws IOException {
        long startTime = System.currentTimeMillis();
        int wordCount = 0;





        Utils.printExecutionTime(startTime);
        return wordCount;
    }
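A sketch of Task 2.2 that keeps the counting loop identical to an unbuffered character-by-character version and only wraps the FileReader in a BufferedReader, so any difference in execution time can be attributed to buffering. The same assumptions apply: whitespace-separated words, a hypothetical class name (BufferedWordCountSketch), and inline timing.

```java
import java.io.BufferedReader;
import java.io.File;
import java.io.FileReader;
import java.io.IOException;
import java.io.Reader;

// Hypothetical class name; in the lab this method lives in MyCharacterStreamsManager.
final class BufferedWordCountSketch {

    // Assumption: a word is a maximal run of non-whitespace characters.
    static int countWordsBuffered(File file) throws IOException {
        long startTime = System.currentTimeMillis();
        int wordCount = 0;
        // BufferedReader (default buffer: 8,192 chars) refills from the FileReader in large blocks,
        // so most read() calls below are served from memory.
        try (Reader reader = new BufferedReader(new FileReader(file))) {
            boolean inWord = false;
            int c;
            while ((c = reader.read()) != -1) {
                if (Character.isWhitespace(c)) {
                    inWord = false;   // whitespace ends the current word
                } else if (!inWord) {
                    inWord = true;    // first character of a new word
                    wordCount++;
                }
            }
        }
        // Stand-in for the lab's Utils.printExecutionTime(startTime)
        System.out.println("countWordsBuffered: " + (System.currentTimeMillis() - startTime) + " ms");
        return wordCount;
    }

    public static void main(String[] args) throws IOException {
        if (args.length != 1) {
            System.err.println("usage: java BufferedWordCountSketch <file>");
            return;
        }
        System.out.println(countWordsBuffered(new File(args[0])) + " words");
    }
}
```

A common alternative is to read line by line with readLine() and split each trimmed line on "\\s+". That is also buffered, but it may count slightly differently for unusual whitespace characters, so use the same word definition in both tasks if you want the counts to match.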

Evaluation and Key Findings

Add your measured execution time (in ms) to the Execution Time column of the following table, then discuss which character stream approach the developer should use and why.

| Method/Approach | Execution Time | Memory Usage | Ability to Handle Large Files | Best Use Case |
|---|---|---|---|---|
| Unbuffered Character Streams (countWordsUnbuffered) | Slower | Minimal | Yes, but slower | Small to medium text files |
| Buffered Character Streams (countWordsBuffered) | Faster | Moderate (buffer size in memory) | Yes | Large text files |
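  • Unbuffered Character Streams:

    • ✅ Lowest memory usage
    • ❌ Slower: each read() call returns a single character, so the per-call overhead of the reader and its decoder is paid for every character
    • ❌ Not practical for large files
  • Buffered Character Streams:

    • ✅ Faster: BufferedReader fetches many characters per call to the underlying FileReader, so most read() calls are served from memory
    • ✅ Can handle files of any size
    • ✅ Recommended approach for production use
    • 💡 Typical buffer sizes: 4KB-8KB for general use, 64KB-128KB for large files (BufferedReader's default buffer holds 8,192 characters)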

Task 3: Encoding Problems and Proper Character Encoding

Download the provided text document and open it in your text editor. Its character encoding (Windows-1256) differs from your system’s default encoding (typically UTF-8). Write a Java method that reads this document using the default encoding, and take a screenshot of the issues you encounter.
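For this first step, the sketch below reproduces the garbled output deterministically by decoding the file's bytes as UTF-8 explicitly, rather than relying on the platform default. On JDK 11 the default charset depends on the operating system and locale, so FileReader may or may not garble the file on your machine; the sketch also prints Charset.defaultCharset() so you can see which default your JVM uses. The class and method names are hypothetical.

```java
import java.io.IOException;
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;

// Hypothetical helper for reproducing the encoding problem.
final class WrongEncodingSketch {

    // Decodes the file as UTF-8 no matter how it was actually encoded.
    // The String constructor replaces invalid byte sequences with U+FFFD instead of throwing.
    static String readAsUtf8(Path path) throws IOException {
        byte[] bytes = Files.readAllBytes(path);
        return new String(bytes, StandardCharsets.UTF_8);
    }

    public static void main(String[] args) throws IOException {
        // OS- and locale-dependent on JDK 11; UTF-8 by default since JDK 18
        System.out.println("Default charset: " + Charset.defaultCharset());
        if (args.length == 1) {
            System.out.println(readAsUtf8(Paths.get(args[0])));
        }
    }
}
```

Windows-1256 stores Arabic letters as single bytes in the 0x80 to 0xFF range, which are mostly invalid as UTF-8, so the decoded text is full of U+FFFD replacement characters. Files.readString(path) would instead fail with a MalformedInputException.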

Next, implement a method that uses FileInputStream with InputStreamReader and BufferedReader to read the file with the given character encoding and count the number of words in it. The method should be named countWordsWithEncoding and declared at src/main/java/cpit305/fcit/kau/edu/sa/io/MyCharacterStreamsManager.java.

Note: we will also use the helper method in the Utils.java class to measure the execution time of the method in milliseconds.

    public static int countWordsWithEncoding(File file, String encoding) throws IOException {
        long startTime = System.currentTimeMillis();
        int wordCount = 0;





        Utils.printExecutionTime(startTime);
        return wordCount;
    }
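A sketch of countWordsWithEncoding that chains the three classes the task names: FileInputStream supplies raw bytes, InputStreamReader decodes them with the requested charset, and BufferedReader buffers the decoded characters. It assumes words are separated by whitespace matched by the regex "\\s+", uses a hypothetical class name (EncodingWordCountSketch), and prints timing inline instead of calling Utils.printExecutionTime.

```java
import java.io.BufferedReader;
import java.io.File;
import java.io.FileInputStream;
import java.io.IOException;
import java.io.InputStreamReader;
import java.nio.charset.Charset;

// Hypothetical class name; in the lab this method lives in MyCharacterStreamsManager.
final class EncodingWordCountSketch {

    static int countWordsWithEncoding(File file, String encoding) throws IOException {
        long startTime = System.currentTimeMillis();
        int wordCount = 0;
        // Resolve the name first: an unknown name throws UnsupportedCharsetException
        Charset charset = Charset.forName(encoding);
        // FileInputStream -> raw bytes, InputStreamReader -> decoded chars, BufferedReader -> buffering + readLine()
        try (BufferedReader reader = new BufferedReader(
                new InputStreamReader(new FileInputStream(file), charset))) {
            String line;
            while ((line = reader.readLine()) != null) {
                String trimmed = line.trim();
                // Skip blank lines: "".split("\\s+") would return one empty token
                if (!trimmed.isEmpty()) {
                    wordCount += trimmed.split("\\s+").length;
                }
            }
        }
        // Stand-in for the lab's Utils.printExecutionTime(startTime)
        System.out.println("countWordsWithEncoding: " + (System.currentTimeMillis() - startTime) + " ms");
        return wordCount;
    }

    public static void main(String[] args) throws IOException {
        if (args.length != 2) {
            System.err.println("usage: java EncodingWordCountSketch <file> <encoding>");
            return;
        }
        System.out.println(countWordsWithEncoding(new File(args[0]), args[1]) + " words");
    }
}
```

For example, countWordsWithEncoding(file, "windows-1256") decodes the Arabic text correctly, whereas passing "UTF-8" for the same file yields replacement characters, because InputStreamReader substitutes malformed input rather than throwing. Charset.forName rejects an unknown name with UnsupportedCharsetException (an unchecked exception) before the file is opened.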

Deliverables and Submission

Please submit a PDF file with screenshots of your work.