Sometimes it is necessary to combine two sorted files into one sorted file. For example, with data stored on tape. This is sometimes called an external sort, because it is not performed inside the computer’s memory.
Before this can be done it is essential that both source files share a common key (the field to be sorted on) and that both source files are already sorted.
Suppose we want to merge two source files, that are already sorted in ascending order, to produce a new destination file which is also sorted in ascending order, as shown below.
We would normally begin by checking that the source files have some data inside them before proceeding. Then the steps are as follows:
- Read the first value from each source file and compare them. Here, the smaller of the first two values is 1 from FileTwo, so this is written to the new file.
- Compare the first value in FileOne with the second value in FileTwo. Since 4 is smaller than 7, the value 4 is written to the new file next.
- Compare the second value in FileOne with the second value in FileTwo. Since 5 is less than 7, the value 5 is written to the new file next.
- Compare the third value in FileOne with the second value in FileTwo. Since 6 is less than 7, the value 6 is written to the new file next .
- Compare the fourth value in FileOne with the second value in FileTwo. Since 9 is bigger than 7, the value 7 is written to the new file next.
- Compare the fourth value in FileOne with the third value in FileTwo. Since 8 is less than 9, the value 8 is written to the new file next.
- Compare the fourth value in FileOne with the fourth value in FileTwo. Since 9 is less than 10, the value 9 is written to the new file next.
- At this point FileOne has been fully checked, so the remaining values in FileTwo can be written across one after another.
This algorithm can be expressed in simple terms as follows:
Check source files are not empty
Get first item from each file
REPEAT
examine the current item in each source file
copy the smallest item to the new file
get next item from source file that this item was copied from
UNTIL input files are empty
IF only one input file exhausted THEN
copy remaining items from other file to new file
END IF
A more complete description of the algorithm looks like this:
open existing files
create new file
check existing files are not empty
use pointers/counters to identify records for comparison
REPEAT
compare records indicated by pointers
IF records are different THEN
copy earlier value record to new file
move appropriate pointer
ELSE
copy one record only to new file
move both pointers
END IF
UNTIL end of one file
copy remaining records from other file
close files
Note the following points about this algorithm:
- You need to check that the source files have some data in them first
- You must open the two source files and create a new destination file
- If a value is copied to the destination file, you can advance to the next record in the file it came from
- Duplicates need to be handled. They are ignored in this case.
- When one source file is exhausted, the remaining records from the other source file can be copied to the destination file directly.
This merge process assumes two things:
- Both source files share a common key
- Records in each source file are already sorted according to this key
NOTE:
The algorithm for merging two sorted files is very similar to the algorithm for merging two sorted lists, which is an important aspect of the merge sort.