CPIT-305 | Lab 3: Java I/O Streams and File Processing Pipeline

The goal of this lab is to build a robust file processing pipeline that handles various types of files efficiently using different Java I/O streams.

In modern applications, processing different types of files efficiently is crucial, from handling large text and binary files to dealing with text in different character encodings. Many applications fail when they meet unexpected files: large files can slow the application down or crash it when it runs out of memory, and text in an encoding that was not anticipated may not display correctly. Choosing the right I/O approach can therefore significantly improve performance and reliability.

In this lab, we will work through scenarios that a developer is evaluating for a media asset management system. The system is aimed at digital content creators who need to process various types of files: raw video metadata (binary), subtitle files (text with different encodings), video descriptions (text), and images (binary). The system needs to handle large files efficiently and support both reading and writing operations.

Objectives

In this lab you will:

Implement and evaluate different file copy methods using byte streams:
- Copy the file byte by byte.
- Copy the entire file at once.
- Copy the file in chunks.
Implement and evaluate different character stream methods:
- Use unbuffered character streams to read and count words in a text file.
- Use buffered character streams to read and count words in a text file.
Understand and handle character encoding issues:
- Read a text file with a different character encoding (Windows-1256) and count the number of words.
Evaluate the results:
- Compare the performance and memory usage of different I/O approaches.
- Discuss the best use cases for each approach based on the evaluation results.

Requirements and Tools

Java JDK 11 or above
Sample files located under src/main/resources:
- Large files (binary)
- .txt file for an e-book in UTF-8 encoding
- .txt file in different encodings (other than UTF-8)

Problem Statement

A developer is working on an application for content creators to manage their digital assets efficiently. The developer is experimenting with the Java I/O package to evaluate and select the most appropriate I/O streams for each operation while maintaining optimal performance (fast processing) and reliability (preventing runtime errors).

Getting Started

If your instructor is using GitHub Classroom, you will need to accept the assignment using the link below, clone the template repository, and import it as a project into your IDE.

If your instructor is not using GitHub Classroom, clone and import the template project at https://github.com/cpit305-spring-25-IT1/lab-03.

Part 1: Raw Byte Streams

Task 1.1: Byte Streams: Copy the file byte by byte

Implement a simple file copy method using FileInputStream and FileOutputStream. The method should copy the file byte by byte. Copy a large video file and measure execution time.

Name the method copyRawByteStreams and declare it in src/main/java/cpit305/fcit/kau/edu/sa/io/MyByteStreamsManager.java.

Note: we will use a helper method in the Utils.java class to measure the execution time of the method in milliseconds.

    public static void copyRawByteStreams(File source, File destination) throws IOException {
        long startTime = System.currentTimeMillis();





        Utils.printExecutionTime(startTime);
    }
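
As an illustration of the byte-by-byte technique, here is a minimal sketch. It is not the template's reference solution: the class name ByteByByteCopySketch and the method name copyByteByByte are hypothetical, and the sketch returns the number of bytes copied instead of calling the Utils helper.

```java
import java.io.File;
import java.io.FileInputStream;
import java.io.FileOutputStream;
import java.io.IOException;

// Hypothetical class name, used only for this illustration.
class ByteByByteCopySketch {

    // Copies source to destination with one read() and one write() call per byte.
    static long copyByteByByte(File source, File destination) throws IOException {
        long copied = 0;
        // try-with-resources closes both streams even if an exception is thrown
        try (FileInputStream in = new FileInputStream(source);
             FileOutputStream out = new FileOutputStream(destination)) {
            int b;
            // read() returns the next byte as a value in 0..255, or -1 at end of file
            while ((b = in.read()) != -1) {
                out.write(b);
                copied++;
            }
        }
        return copied;
    }

    public static void main(String[] args) throws IOException {
        File src = File.createTempFile("sketch-src", ".bin");
        File dst = File.createTempFile("sketch-dst", ".bin");
        src.deleteOnExit();
        dst.deleteOnExit();
        try (FileOutputStream out = new FileOutputStream(src)) {
            out.write(new byte[] {1, 2, 3, (byte) 0xFF});
        }
        System.out.println(copyByteByByte(src, dst) + " bytes copied");
    }
}
```

Every iteration goes through the stream for a single byte, which is why this approach is expected to be the slowest on large files.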

Task 1.2: Byte Streams: Copy the entire file at once

Implement a simple file copy method using FileInputStream and FileOutputStream. The method should copy the file by loading the entire content (bytes) into memory at once. Copy a large video file and measure execution time.

Name the method copyEntireFileAtOnce and declare it in src/main/java/cpit305/fcit/kau/edu/sa/io/MyByteStreamsManager.java.

Note: we will also use the helper method in the Utils.java class to measure the execution time of the method in milliseconds.

    public static void copyEntireFileAtOnce(File source, File destination) throws IOException {
        long startTime = System.currentTimeMillis();





        Utils.printExecutionTime(startTime);
    }
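
One possible way to load the whole file into memory is InputStream.readAllBytes(), which is available from Java 9 onward. The sketch below assumes that call is acceptable for the lab; the class name WholeFileCopySketch and the method name copyWholeFile are hypothetical.

```java
import java.io.File;
import java.io.FileInputStream;
import java.io.FileOutputStream;
import java.io.IOException;

// Hypothetical class name, used only for this illustration.
class WholeFileCopySketch {

    // Reads the entire source file into one byte array, then writes it in one call.
    static int copyWholeFile(File source, File destination) throws IOException {
        try (FileInputStream in = new FileInputStream(source);
             FileOutputStream out = new FileOutputStream(destination)) {
            // The array holds the whole file, so memory use grows with file size.
            // Very large files can fail here with OutOfMemoryError.
            byte[] all = in.readAllBytes();
            out.write(all);
            return all.length;
        }
    }

    public static void main(String[] args) throws IOException {
        File src = File.createTempFile("sketch-src", ".bin");
        File dst = File.createTempFile("sketch-dst", ".bin");
        src.deleteOnExit();
        dst.deleteOnExit();
        try (FileOutputStream out = new FileOutputStream(src)) {
            out.write(new byte[2048]);
        }
        System.out.println(copyWholeFile(src, dst) + " bytes copied");
    }
}
```

The copy needs only two I/O calls, but the heap must be large enough to hold the complete file.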

Task 1.3: Byte Streams: Copy the bytes in chunks

Implement a simple file copy method using FileInputStream and FileOutputStream. The method should copy the file by reading chunks of bytes. The chunk size should be passed to the method as an argument. Copy a large video file in 1 KB, 2 KB, or 4 KB chunks and measure the execution time for each chunk size.

Name the method copyInChunks and declare it in src/main/java/cpit305/fcit/kau/edu/sa/io/MyByteStreamsManager.java.

Note: we will also use the helper method in the Utils.java class to measure the execution time of the method in milliseconds.

    public static void copyInChunks(File source, File destination, int chunkSize) throws IOException {
        long startTime = System.currentTimeMillis();





        Utils.printExecutionTime(startTime);
    }
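
A chunked copy can be sketched as follows. The important detail is that read(byte[]) may fill only part of the buffer, so only the bytes actually read are written. The class name ChunkedCopySketch and the method name copyChunked are hypothetical, and the argument check is an added assumption.

```java
import java.io.File;
import java.io.FileInputStream;
import java.io.FileOutputStream;
import java.io.IOException;

// Hypothetical class name, used only for this illustration.
class ChunkedCopySketch {

    // Copies source to destination through a reusable buffer of chunkSize bytes.
    static long copyChunked(File source, File destination, int chunkSize) throws IOException {
        if (chunkSize <= 0) {
            throw new IllegalArgumentException("chunkSize must be positive");
        }
        byte[] buffer = new byte[chunkSize];
        long copied = 0;
        try (FileInputStream in = new FileInputStream(source);
             FileOutputStream out = new FileOutputStream(destination)) {
            int n;
            // read(buffer) returns how many bytes were placed in the buffer, or -1 at end of file
            while ((n = in.read(buffer)) != -1) {
                // Write only the n bytes just read; the rest of the buffer is stale data.
                out.write(buffer, 0, n);
                copied += n;
            }
        }
        return copied;
    }

    public static void main(String[] args) throws IOException {
        File src = File.createTempFile("sketch-src", ".bin");
        File dst = File.createTempFile("sketch-dst", ".bin");
        src.deleteOnExit();
        dst.deleteOnExit();
        try (FileOutputStream out = new FileOutputStream(src)) {
            out.write(new byte[10_000]);
        }
        System.out.println(copyChunked(src, dst, 4096) + " bytes copied");
    }
}
```

Memory use stays at one buffer regardless of file size, while the number of I/O calls drops by roughly a factor of chunkSize compared with the byte-by-byte version.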

Evaluation and Key Findings

Add the execution time in ms to the following table and discuss which raw byte copy approach the developer should use and why.

Method/Approach	Execution Time	Memory Usage	Ability to Handle Large Files	Best Use Case
Byte-by-byte (`copyRawByteStreams`)	Slowest	Minimal	Yes, but very slow	Small files only (<1MB)
Entire file at once (`copyEntireFileAtOnce`)	Fast for small files	Highest (entire file loaded into memory)	No - Out of Memory for large files	Files that fit comfortably in memory (<100MB)
Chunk-based (`copyInChunks`)	Fast	Moderate (only chunk size in memory)	Yes	Large files (>100MB)

Byte-by-byte:
- ✅ Lowest memory usage
- ❌ Extremely slow due to many I/O operations
- ❌ Not practical for production use
Entire file at once:
- ✅ Fastest for small files
- ❌ Risk of OutOfMemoryError for large files
- ❌ Not suitable for production use with large files
Chunk-based:
- ✅ Best balance of memory usage and performance
- ✅ Can handle files of any size
- ✅ Recommended approach for production use
- 💡 Typical chunk sizes: 4KB-8KB for general use, 64KB-128KB for large files
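
To fill in the table, the execution time and a rough memory figure can be collected with standard JDK calls. The sketch below is one possible approach and is independent of the Utils helper in the template, whose implementation is not shown here. The class name MeasureSketch and both method names are hypothetical, and the heap reading is only an approximation because the garbage collector may run at any time.

```java
// Hypothetical class name, used only for this illustration.
class MeasureSketch {

    // Runs the task once and returns the elapsed wall-clock time in milliseconds.
    static long elapsedMillis(Runnable task) {
        long start = System.nanoTime();
        task.run();
        return (System.nanoTime() - start) / 1_000_000;
    }

    // Approximate number of heap bytes currently in use by this JVM.
    static long usedHeapBytes() {
        Runtime rt = Runtime.getRuntime();
        return rt.totalMemory() - rt.freeMemory();
    }

    public static void main(String[] args) {
        long before = usedHeapBytes();
        long[] allocated = new long[1];
        long ms = elapsedMillis(() -> {
            // Stand-in workload: allocate and touch a 16 MB array.
            byte[] block = new byte[16 * 1024 * 1024];
            block[block.length - 1] = 1;
            allocated[0] = block.length;
        });
        long after = usedHeapBytes();
        System.out.println("elapsed ms: " + ms);
        System.out.println("allocated bytes: " + allocated[0]);
        System.out.println("approx. heap change in bytes: " + (after - before));
    }
}
```

A single run is noisy, so repeating each measurement a few times and reporting the typical value gives a fairer comparison between the three copy methods.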

Part 2: Character Streams and Encoding (Unbuffered vs Buffered)

Task 2.1: Unbuffered Character Streams Reader

Write a method that uses FileReader without a buffer to read the “Pride and Prejudice” e-book and count the total number of words. Measure the execution time.

The method should be named countWordsUnbuffered and declared in src/main/java/cpit305/fcit/kau/edu/sa/io/MyCharacterStreamsManager.java.

Note: we will also use the helper method in the Utils.java class to measure the execution time of the method in milliseconds.

    public static int countWordsUnbuffered(File file) throws IOException {
        long startTime = System.currentTimeMillis();
        int wordCount = 0;





        Utils.printExecutionTime(startTime);
        return wordCount;
    }
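
One way to count words while reading a single character at a time is to count each transition from whitespace to non-whitespace. The sketch below assumes that a word is any run of non-whitespace characters; the lab may define words differently. The class name UnbufferedWordCountSketch and the method name countWords are hypothetical.

```java
import java.io.File;
import java.io.FileReader;
import java.io.FileWriter;
import java.io.IOException;

// Hypothetical class name, used only for this illustration.
class UnbufferedWordCountSketch {

    // Counts runs of non-whitespace characters, reading one character per read() call.
    static int countWords(File file) throws IOException {
        int count = 0;
        boolean inWord = false;
        // FileReader(File) decodes bytes with the platform default charset
        try (FileReader reader = new FileReader(file)) {
            int c;
            // read() returns one character as an int, or -1 at end of file
            while ((c = reader.read()) != -1) {
                if (Character.isWhitespace(c)) {
                    inWord = false;
                } else if (!inWord) {
                    // First character of a new word
                    inWord = true;
                    count++;
                }
            }
        }
        return count;
    }

    public static void main(String[] args) throws IOException {
        File file = File.createTempFile("sketch-words", ".txt");
        file.deleteOnExit();
        try (FileWriter writer = new FileWriter(file)) {
            writer.write("It is a truth\nuniversally  acknowledged");
        }
        System.out.println(countWords(file) + " words");
    }
}
```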

Task 2.2: Buffered Character Streams Reader

Write a method that wraps FileReader in a BufferedReader to read the “Pride and Prejudice” e-book and count the total number of words. Measure the execution time.

The method should be named countWordsBuffered and declared in src/main/java/cpit305/fcit/kau/edu/sa/io/MyCharacterStreamsManager.java.

Note: we will also use the helper method in the Utils.java class to measure the execution time of the method in milliseconds.

    public static int countWordsBuffered(File file) throws IOException {
        long startTime = System.currentTimeMillis();
        int wordCount = 0;





        Utils.printExecutionTime(startTime);
        return wordCount;
    }
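
With a BufferedReader, the text can be processed line by line. The sketch below splits each line on runs of whitespace and assumes the same word definition as a simple whitespace split; the class name BufferedWordCountSketch and the method name countWordsByLine are hypothetical.

```java
import java.io.BufferedReader;
import java.io.File;
import java.io.FileReader;
import java.io.FileWriter;
import java.io.IOException;

// Hypothetical class name, used only for this illustration.
class BufferedWordCountSketch {

    // Counts whitespace-separated words, reading one line at a time through a buffer.
    static int countWordsByLine(File file) throws IOException {
        int count = 0;
        try (BufferedReader reader = new BufferedReader(new FileReader(file))) {
            String line;
            // readLine() returns null at end of file
            while ((line = reader.readLine()) != null) {
                String stripped = line.strip();
                // Skip blank lines: splitting an empty string would yield one empty token
                if (!stripped.isEmpty()) {
                    count += stripped.split("\\s+").length;
                }
            }
        }
        return count;
    }

    public static void main(String[] args) throws IOException {
        File file = File.createTempFile("sketch-words", ".txt");
        file.deleteOnExit();
        try (FileWriter writer = new FileWriter(file)) {
            writer.write("It is a truth\n\nuniversally  acknowledged");
        }
        System.out.println(countWordsByLine(file) + " words");
    }
}
```

BufferedReader fills an internal buffer in large reads, so most readLine() calls are served from memory instead of triggering a new read on the underlying file.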

Evaluation and Key Findings

Add the execution time in ms to the following table and discuss which character stream approach the developer should use and why.

Method/Approach	Execution Time	Memory Usage	Ability to Handle Large Files	Best Use Case
Unbuffered Character Streams (`countWordsUnbuffered`)	Slower	Minimal	Yes, but slower	Small to medium text files
Buffered Character Streams (`countWordsBuffered`)	Faster	Moderate (buffer size in memory)	Yes	Large text files

Unbuffered Character Streams:
- ✅ Lowest memory usage
- ❌ Slower due to many I/O operations
- ❌ Not practical for large files
Buffered Character Streams:
- ✅ Faster due to reduced I/O operations
- ✅ Can handle files of any size
- ✅ Recommended approach for production use
- 💡 Typical buffer sizes: 4KB-8KB for general use, 64KB-128KB for large files

Task 3: Encoding Problems and Proper Character Encoding

Download this text document and open it in your text editor. The file uses a character encoding (Windows-1256) that differs from your system’s default encoding (typically UTF-8), so some characters may not display correctly. Write a Java method that reads this document with the default encoding, and take a screenshot of the issues you encounter.
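
The effect of a mismatched encoding can be reproduced in memory without the lab file. The sketch below encodes a short Arabic word as Windows-1256 bytes and decodes the same bytes twice, once as UTF-8 and once as Windows-1256. It assumes the JDK in use ships the windows-1256 charset, which standard JDK distributions do; the class name EncodingMismatchSketch and the method name decode are hypothetical.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

// Hypothetical class name, used only for this illustration.
class EncodingMismatchSketch {

    // Decodes raw bytes into text using the given charset.
    static String decode(byte[] bytes, Charset charset) {
        return new String(bytes, charset);
    }

    public static void main(String[] args) {
        // Assumption: this JDK includes the windows-1256 charset.
        Charset windows1256 = Charset.forName("windows-1256");

        // The Arabic word for "hello", written with Unicode escapes so the
        // source file compiles the same way under any source encoding.
        String original = "\u0645\u0631\u062D\u0628\u0627";
        byte[] bytes = original.getBytes(windows1256);

        // Wrong charset: these bytes are not valid UTF-8, so the decoder
        // substitutes the replacement character U+FFFD.
        String wrong = decode(bytes, StandardCharsets.UTF_8);
        // Right charset: the original text is recovered.
        String right = decode(bytes, windows1256);

        System.out.println("decoded as UTF-8 contains U+FFFD: " + wrong.contains("\uFFFD"));
        System.out.println("decoded as windows-1256 equals original: " + right.equals(original));
    }
}
```

The bytes on disk are identical in both cases; only the charset chosen for decoding determines whether readable text comes out.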

Next, implement a method that uses FileInputStream with InputStreamReader and BufferedReader to read the file with the given character encoding and count the number of words in it. The method should be named countWordsWithEncoding and declared at src/main/java/cpit305/fcit/kau/edu/sa/io/MyCharacterStreamsManager.java.

Note: we will also use the helper method in the Utils.java class to measure the execution time of the method in milliseconds.

public static int countWordsWithEncoding(File file, String encoding) throws IOException {
    int wordCount = 0;





    return wordCount;
}
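
The stream chain described in this task can be sketched as follows: FileInputStream supplies raw bytes, InputStreamReader decodes them with the named charset, and BufferedReader adds line-oriented buffering. The class name EncodingWordCountSketch and the method name countWords are hypothetical, and the whitespace-based word definition is an assumption.

```java
import java.io.BufferedReader;
import java.io.File;
import java.io.FileInputStream;
import java.io.FileOutputStream;
import java.io.IOException;
import java.io.InputStreamReader;
import java.nio.charset.Charset;

// Hypothetical class name, used only for this illustration.
class EncodingWordCountSketch {

    // Counts whitespace-separated words in a file decoded with the named charset.
    static int countWords(File file, String encoding) throws IOException {
        int count = 0;
        // An unknown charset name makes the InputStreamReader constructor throw
        // UnsupportedEncodingException, which is a subclass of IOException.
        try (BufferedReader reader = new BufferedReader(
                new InputStreamReader(new FileInputStream(file), encoding))) {
            String line;
            while ((line = reader.readLine()) != null) {
                String stripped = line.strip();
                if (!stripped.isEmpty()) {
                    count += stripped.split("\\s+").length;
                }
            }
        }
        return count;
    }

    public static void main(String[] args) throws IOException {
        File file = File.createTempFile("sketch-cp1256", ".txt");
        file.deleteOnExit();
        // Two Arabic words on one line, stored as Windows-1256 bytes
        // (assumes this JDK includes the windows-1256 charset).
        String text = "\u0645\u0631\u062D\u0628\u0627 \u0628\u0643";
        try (FileOutputStream out = new FileOutputStream(file)) {
            out.write(text.getBytes(Charset.forName("windows-1256")));
        }
        System.out.println(countWords(file, "windows-1256") + " words");
    }
}
```

Passing the encoding explicitly keeps the result independent of the platform default charset, which matters when the same code runs on machines with different locale settings.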

Deliverables and Submission

Please submit a PDF file with screenshots of your work.

Submission Instructions

Submit on GitHub Classroom

Section: IT1 Section: IT2 Section: IT3

If your instructor is using GitHub classroom, then you should click on your class submission link, link your GitHub username to your name if you have not already done so, accept the assignment, clone the repository into your local development environment, and push the code to the remote repository on GitHub. Please make sure that your written answers are included in either a README (Markdown) file or a PDF file.

Lab due dates are listed on GitHub Classroom unless otherwise noted.

If your instructor is using GitHub classroom, your submission will be auto-graded by running the included unit tests as well as manually graded for correctness, style, and quality.

How to submit your lab to GitHub Classroom

The video below demonstrates how to submit your work to GitHub Classroom.