Lab 3: Java I/O Streams and File Processing Pipeline
The goal of this lab is to build a robust file processing pipeline that handles various types of files efficiently using different Java I/O streams.
In modern applications, processing different types of files efficiently is crucial, from handling large text and binary files to reading text in different character encodings. Many applications fail on unexpected input: very large files can slow the application down or crash it with an out-of-memory error, and text in an unanticipated character encoding may not display correctly. Choosing the right I/O approach can therefore significantly improve performance and reliability.
In this lab, we will work through scenarios that a developer is evaluating for a media asset management system. The system is being built for digital content creators who need to process various types of files: raw video metadata (binary), subtitle files (text with different encodings), video descriptions (text), and images (binary). It needs to handle large files efficiently and support both reading and writing operations.
Objectives
In this lab you will:
Implement and evaluate different file copy methods using byte streams:
Copy the file byte by byte.
Copy the entire file at once.
Copy the file in chunks.
Implement and evaluate different character stream methods:
Use unbuffered character streams to read and count words in a text file.
Use buffered character streams to read and count words in a text file.
Understand and handle character encoding issues:
Read a text file with a different character encoding (Windows-1256) and count the number of words.
Compare the performance and memory usage of different I/O approaches.
Discuss the best use cases for each approach based on the evaluation results.
Requirements and Tools
Java JDK 11 or above
Sample files located under src/main/resources:
Large files (binary)
.txt file of an e-book in UTF-8 encoding
.txt file in a different encoding (other than UTF-8)
Problem Statement
A developer is working on an application for content creators to manage their digital assets efficiently. The developer is experimenting with the Java I/O package to evaluate and select the most appropriate I/O streams for each operation while maintaining optimal performance (fast processing) and reliability (preventing runtime errors).
Getting Started
If your instructor is using GitHub classroom, you will need to accept the assignment using the link below, clone the template repository, and import it as a project into your IDE.
Task 1.1: Byte Streams: Copy the File byte by byte
Implement a simple file copy method using FileInputStream and FileOutputStream. The method should copy the file byte by byte. Copy a large video file and measure the execution time.
The method should be named copyRawByteStreams and declared in src/main/java/cpit305/fcit/kau/edu/sa/io/MyByteStreamsManager.java.
Note: we will use a helper method in the Utils.java class to measure the execution time of the method in milliseconds.
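As a rough illustration of the byte-by-byte approach, here is a minimal sketch. The class name ByteByByteCopySketch and the (File, File) signature are assumptions, since the lab does not show the template for this method. The lab's Utils.printExecutionTime helper is not shown either, so the sketch prints the elapsed time itself.

```java
import java.io.File;
import java.io.FileInputStream;
import java.io.FileOutputStream;
import java.io.IOException;

// Hypothetical class name; the lab expects this method in MyByteStreamsManager.
class ByteByByteCopySketch {

    // Assumed signature: copies source to destination one byte per read()/write() call.
    static void copyRawByteStreams(File source, File destination) throws IOException {
        long startTime = System.currentTimeMillis();
        try (FileInputStream in = new FileInputStream(source);
             FileOutputStream out = new FileOutputStream(destination)) {
            int b;
            // read() returns the next byte as a value 0-255, or -1 at end of file.
            while ((b = in.read()) != -1) {
                out.write(b);
            }
        }
        // Stand-in for the lab's Utils.printExecutionTime(startTime).
        System.out.println("Elapsed: " + (System.currentTimeMillis() - startTime) + " ms");
    }

    public static void main(String[] args) throws IOException {
        // args[0] = source path, args[1] = destination path
        copyRawByteStreams(new File(args[0]), new File(args[1]));
    }
}
```

Every loop iteration issues one read and one write call on the underlying streams, which is why this variant is expected to be the slowest of the three.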
Task 1.2: Byte Streams: Copy the entire file at once
Implement a simple file copy method using FileInputStream and FileOutputStream. The method should copy the file by loading its entire content (all bytes) into memory at once. Copy a large video file and measure the execution time.
The method should be named copyEntireFileAtOnce and declared in src/main/java/cpit305/fcit/kau/edu/sa/io/MyByteStreamsManager.java.
Note: we will also use the helper method in the Utils.java class to measure the execution time of the method in milliseconds.
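One possible way to load the whole file at once is InputStream.readAllBytes(), available since Java 9. The sketch below assumes a (File, File) signature and uses the hypothetical class name WholeFileCopySketch. It prints the elapsed time directly because the lab's Utils helper is not shown.

```java
import java.io.File;
import java.io.FileInputStream;
import java.io.FileOutputStream;
import java.io.IOException;

// Hypothetical class name; the lab expects this method in MyByteStreamsManager.
class WholeFileCopySketch {

    // Assumed signature: reads the whole file into one byte array, then writes it out.
    static void copyEntireFileAtOnce(File source, File destination) throws IOException {
        long startTime = System.currentTimeMillis();
        try (FileInputStream in = new FileInputStream(source);
             FileOutputStream out = new FileOutputStream(destination)) {
            // The entire file lives in memory here; a file larger than the
            // available heap can fail with OutOfMemoryError.
            byte[] allBytes = in.readAllBytes();
            out.write(allBytes);
        }
        // Stand-in for the lab's Utils.printExecutionTime(startTime).
        System.out.println("Elapsed: " + (System.currentTimeMillis() - startTime) + " ms");
    }

    public static void main(String[] args) throws IOException {
        // args[0] = source path, args[1] = destination path
        copyEntireFileAtOnce(new File(args[0]), new File(args[1]));
    }
}
```

Running it with a small heap (for example java -Xmx64m) against a large video file is one way to observe the memory failure discussed in the evaluation.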
Task 1.3: Byte Streams: Copy the file in chunks
Implement a simple file copy method using FileInputStream and FileOutputStream. The method should copy the file by reading chunks of bytes. The chunk size should be passed to the method as an argument. Copy a large video file in 1K, 2K, or 4K byte chunks and measure the execution time.
The method should be named copyInChunks and declared in src/main/java/cpit305/fcit/kau/edu/sa/io/MyByteStreamsManager.java.
Note: we will also use the helper method in the Utils.java class to measure the execution time of the method in milliseconds.
public static void copyInChunks(File source, File destination, int chunkSize) throws IOException {
    long startTime = System.currentTimeMillis();
    // Your chunk-based copy logic goes here.
    Utils.printExecutionTime(startTime);
}
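The template above could be filled in along the following lines. This is only a sketch: the class name ChunkCopySketch is hypothetical, and the elapsed time is printed directly instead of calling the lab's Utils.printExecutionTime, whose body is not shown.

```java
import java.io.File;
import java.io.FileInputStream;
import java.io.FileOutputStream;
import java.io.IOException;

// Hypothetical class name; the lab expects this method in MyByteStreamsManager.
class ChunkCopySketch {

    // Copies source to destination using a reusable buffer of chunkSize bytes.
    static void copyInChunks(File source, File destination, int chunkSize) throws IOException {
        long startTime = System.currentTimeMillis();
        try (FileInputStream in = new FileInputStream(source);
             FileOutputStream out = new FileOutputStream(destination)) {
            byte[] buffer = new byte[chunkSize];
            int bytesRead;
            // read(buffer) fills up to chunkSize bytes and returns how many
            // were actually read, or -1 at end of file.
            while ((bytesRead = in.read(buffer)) != -1) {
                // Write only the bytes that were read; the last chunk is usually partial.
                out.write(buffer, 0, bytesRead);
            }
        }
        // Stand-in for the lab's Utils.printExecutionTime(startTime).
        System.out.println("Elapsed: " + (System.currentTimeMillis() - startTime) + " ms");
    }

    public static void main(String[] args) throws IOException {
        // args[0] = source path, args[1] = destination path, args[2] = chunk size in bytes
        copyInChunks(new File(args[0]), new File(args[1]), Integer.parseInt(args[2]));
    }
}
```

Writing out.write(buffer) without the offset and length would copy stale bytes from the previous chunk whenever the final read is shorter than the buffer, so the three-argument form matters.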
Evaluation and Key Findings
Add the execution time in ms to the following table and discuss which approach to copying raw bytes the developer should use and why.
| Method/Approach | Execution Time | Memory Usage | Ability to Handle Large Files | Best Use Case |
| --- | --- | --- | --- | --- |
| Byte-by-byte (copyRawByteStreams) | Slowest | Minimal | Yes, but very slow | Small files only (<1MB) |
| Entire file at once (copyEntireFileAtOnce) | Fast for small files | Highest (entire file loaded into memory) | No (OutOfMemoryError for large files) | Small to medium files only (<100MB) |
| Chunk-based (copyInChunks) | Fast | Moderate (only chunk size in memory) | Yes | Large files (>100MB) |
Byte-by-byte:
✅ Lowest memory usage
❌ Extremely slow due to many I/O operations
❌ Not practical for production use
Entire file at once:
✅ Fastest for small files
❌ Risk of OutOfMemoryError for large files
❌ Not suitable for production use with large files
Chunk-based:
✅ Best balance of memory usage and performance
✅ Can handle files of any size
✅ Recommended approach for production use
💡 Typical chunk sizes: 4KB-8KB for general use, 64KB-128KB for large files
Part 2: Character Streams and Encoding (Unbuffered vs Buffered)
Task 2.1: Unbuffered Character Streams Reader
Write a method that uses FileReader without a buffer to read the “Pride and Prejudice” e-book and count the total number of words. Measure the execution time.
The method should be named countWordsUnbuffered and declared in src/main/java/cpit305/fcit/kau/edu/sa/io/MyCharacterStreamsManager.java.
Note: we will also use the helper method in the Utils.java class to measure the execution time of the method in milliseconds.
public static int countWordsUnbuffered(File file) throws IOException {
    long startTime = System.currentTimeMillis();
    int wordCount = 0;
    Utils.printExecutionTime(startTime);
    return wordCount;
}
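One way to fill in this template is to read one character at a time and count each transition from whitespace into a word. The sketch below makes a few assumptions: the class name UnbufferedWordCountSketch is hypothetical, a "word" is taken to be any run of non-whitespace characters, and the e-book is read as UTF-8 through the FileReader(File, Charset) constructor added in Java 11.

```java
import java.io.File;
import java.io.FileReader;
import java.io.IOException;
import java.nio.charset.StandardCharsets;

// Hypothetical class name; the lab expects this method in MyCharacterStreamsManager.
class UnbufferedWordCountSketch {

    // Counts runs of non-whitespace characters, reading one character per read() call.
    static int countWordsUnbuffered(File file) throws IOException {
        long startTime = System.currentTimeMillis();
        int wordCount = 0;
        boolean inWord = false;
        try (FileReader reader = new FileReader(file, StandardCharsets.UTF_8)) {
            int c;
            // read() returns a single character as an int, or -1 at end of file.
            while ((c = reader.read()) != -1) {
                if (Character.isWhitespace(c)) {
                    inWord = false;
                } else if (!inWord) {
                    // First character after whitespace starts a new word.
                    inWord = true;
                    wordCount++;
                }
            }
        }
        // Stand-in for the lab's Utils.printExecutionTime(startTime).
        System.out.println("Elapsed: " + (System.currentTimeMillis() - startTime) + " ms");
        return wordCount;
    }

    public static void main(String[] args) throws IOException {
        // args[0] = path to the text file
        System.out.println("Words: " + countWordsUnbuffered(new File(args[0])));
    }
}
```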
Task 2.2: Buffered Character Streams Reader
Write a method that wraps FileReader in a BufferedReader to read the “Pride and Prejudice” e-book and count the total number of words. Measure the execution time.
The method should be named countWordsBuffered and declared in src/main/java/cpit305/fcit/kau/edu/sa/io/MyCharacterStreamsManager.java.
Note: we will also use the helper method in the Utils.java class to measure the execution time of the method in milliseconds.
public static int countWordsBuffered(File file) throws IOException {
    long startTime = System.currentTimeMillis();
    int wordCount = 0;
    Utils.printExecutionTime(startTime);
    return wordCount;
}
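A buffered variant might look like the sketch below, which reads whole lines with BufferedReader.readLine() and counts words within each line. The class name BufferedWordCountSketch is hypothetical, a "word" is again assumed to be any run of non-whitespace characters, and the file is assumed to be UTF-8.

```java
import java.io.BufferedReader;
import java.io.File;
import java.io.FileReader;
import java.io.IOException;
import java.nio.charset.StandardCharsets;

// Hypothetical class name; the lab expects this method in MyCharacterStreamsManager.
class BufferedWordCountSketch {

    // Counts runs of non-whitespace characters, reading the file line by line.
    static int countWordsBuffered(File file) throws IOException {
        long startTime = System.currentTimeMillis();
        int wordCount = 0;
        try (BufferedReader reader =
                 new BufferedReader(new FileReader(file, StandardCharsets.UTF_8))) {
            String line;
            // readLine() returns null at end of file; line terminators are stripped.
            while ((line = reader.readLine()) != null) {
                boolean inWord = false; // a line break always ends the current word
                for (int i = 0; i < line.length(); i++) {
                    if (Character.isWhitespace(line.charAt(i))) {
                        inWord = false;
                    } else if (!inWord) {
                        inWord = true;
                        wordCount++;
                    }
                }
            }
        }
        // Stand-in for the lab's Utils.printExecutionTime(startTime).
        System.out.println("Elapsed: " + (System.currentTimeMillis() - startTime) + " ms");
        return wordCount;
    }

    public static void main(String[] args) throws IOException {
        // args[0] = path to the text file
        System.out.println("Words: " + countWordsBuffered(new File(args[0])));
    }
}
```

BufferedReader also has a two-argument constructor that takes a buffer size in characters, which can be used to experiment with different buffer sizes.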
Evaluation and Key Findings
Add the execution time in ms to the following table and discuss which character stream approach the developer should use and why.
| Method/Approach | Execution Time | Memory Usage | Ability to Handle Large Files | Best Use Case |
| --- | --- | --- | --- | --- |
| Unbuffered Character Streams (countWordsUnbuffered) | Slower | Minimal | Yes, but slower | Small to medium text files |
| Buffered Character Streams (countWordsBuffered) | Faster | Moderate (buffer size in memory) | Yes | Large text files |
Unbuffered Character Streams:
✅ Lowest memory usage
❌ Slower due to many I/O operations
❌ Not practical for large files
Buffered Character Streams:
✅ Faster due to reduced I/O operations
✅ Can handle files of any size
✅ Recommended approach for production use
💡 Typical buffer sizes: 4KB-8KB for general use, 64KB-128KB for large files
Task 3: Encoding Problems and Proper Character Encoding
Download this text document and open it in your text editor. Its character encoding (Windows-1256) differs from the usual system default encoding (UTF-8). Write a method in Java that reads this document with the default encoding, and take a screenshot of the issues you encounter.
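To get a feel for the kind of issue to expect, the sketch below decodes the same bytes with two different charsets. It is an illustration only: the class name WrongCharsetSketch and the sample Arabic word are made up for this example, and it assumes the JDK in use ships the windows-1256 charset (standard JDK distributions normally do).

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

// Hypothetical class for illustration; not part of the lab template.
class WrongCharsetSketch {

    // Decodes raw bytes into text using the given charset.
    static String decode(byte[] raw, Charset charset) {
        return new String(raw, charset);
    }

    public static void main(String[] args) {
        // Assumption: windows-1256 is available in this JDK.
        Charset arabic = Charset.forName("windows-1256");
        String original = "\u0645\u0631\u062D\u0628\u0627"; // an Arabic greeting, written with escapes
        byte[] raw = original.getBytes(arabic);

        // Correct charset: the original text comes back unchanged.
        System.out.println("Decoded as windows-1256: " + decode(raw, arabic));
        // Wrong charset: the bytes are not valid UTF-8, so the decoder
        // substitutes the replacement character U+FFFD.
        System.out.println("Decoded as UTF-8:        " + decode(raw, StandardCharsets.UTF_8));
    }
}
```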
Next, implement a method that uses FileInputStream with InputStreamReader and BufferedReader to read the file with the given character encoding and count the number of words in it. The method should be named countWordsWithEncoding and declared at src/main/java/cpit305/fcit/kau/edu/sa/io/MyCharacterStreamsManager.java.
Note: we will also use the helper method in the Utils.java class to measure the execution time of the method in milliseconds.
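The FileInputStream, InputStreamReader, and BufferedReader chain described above can be sketched as follows. The lab does not show the method's signature, so the (File, String charsetName) parameters are an assumption, as is the class name EncodingWordCountSketch. A "word" is assumed to be any run of non-whitespace characters.

```java
import java.io.BufferedReader;
import java.io.File;
import java.io.FileInputStream;
import java.io.IOException;
import java.io.InputStreamReader;
import java.nio.charset.Charset;

// Hypothetical class name; the lab expects this method in MyCharacterStreamsManager.
class EncodingWordCountSketch {

    // Assumed signature: decodes the file with the named charset and counts words.
    static int countWordsWithEncoding(File file, String charsetName) throws IOException {
        long startTime = System.currentTimeMillis();
        Charset charset = Charset.forName(charsetName); // e.g. "windows-1256"
        int wordCount = 0;
        // Bytes -> characters (InputStreamReader) -> buffered lines (BufferedReader).
        try (BufferedReader reader = new BufferedReader(
                 new InputStreamReader(new FileInputStream(file), charset))) {
            String line;
            while ((line = reader.readLine()) != null) {
                boolean inWord = false;
                for (int i = 0; i < line.length(); i++) {
                    if (Character.isWhitespace(line.charAt(i))) {
                        inWord = false;
                    } else if (!inWord) {
                        inWord = true;
                        wordCount++;
                    }
                }
            }
        }
        // Stand-in for the lab's Utils.printExecutionTime(startTime).
        System.out.println("Elapsed: " + (System.currentTimeMillis() - startTime) + " ms");
        return wordCount;
    }

    public static void main(String[] args) throws IOException {
        // args[0] = path to the text file, args[1] = charset name
        System.out.println("Words: " + countWordsWithEncoding(new File(args[0]), args[1]));
    }
}
```

Passing the charset explicitly keeps the result independent of the platform default, which differs between operating systems and JDK versions.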
If your instructor is using GitHub Classroom, click on your class submission link, link your GitHub username to your name if you have not already done so, accept the assignment, clone the repository into your local development environment, and push the code to the remote repository on GitHub. Please make sure that your written answers are included in either a README (Markdown) file or a PDF file.
Lab due dates are listed on GitHub Classroom unless otherwise noted.
If your instructor is using GitHub Classroom, your submission will be auto-graded by running the included unit tests as well as manually graded for correctness, style, and quality.
How to submit your lab to GitHub Classroom
The video below demonstrates how to submit your work to GitHub classroom