Summary: This is a short tutorial describing how to read large files (for example, files larger than 4 GB) using memory-mapped I/O with the Boost C++ libraries.
Reading large files in C++ can be tricky and platform-dependent. Most modern operating systems support large file access in some form, and that support keeps improving, but writing portable C++ code that reads large files is still awkward at times. Boost is a collection of portable C++ source libraries that I have used to write portable code for reading and writing large files.
Boost.Iostreams has a mapped_file_source class that maps a file in the file system into memory in read-only mode, so that its contents can be accessed like an array (a companion class, mapped_file_sink, provides write-only access). Although it handles fairly large files (1 GB or more), very large files can be problematic: the whole mapping has to fit into the process's free virtual address space, which is especially limited in 32-bit processes. Part of the interface for the class is shown below:
namespace boost { namespace iostreams {
class mapped_file_source {
public:
    ...
    explicit mapped_file_source( const std::string& path,
                                 size_type length = max_length,
                                 boost::intmax_t offset = 0 );
    void open( const std::string& path,
               size_type length = max_length,
               boost::intmax_t offset = 0 );
    bool is_open() const;
    void close();
    ...
};
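} } // End namespace boost::io

We can use either the constructor or the open() method to create the mapped source. If we try to create a mapping that is too large by setting the length parameter to, for example, 4 GB, the mapping fails at runtime and an exception is thrown (in my case an "access denied" error); a 32-bit process, for instance, cannot fit a 4 GB view into its address space at all. This tells us to use a smaller length value. Let us call this the page size (not to be confused with the operating system's virtual memory page size); in the example below it is set to 1 GB with #define MMAP_SIZE 1073741824. While reading the file serially, we can then use a simple page-shift process: when the next read would run past the end of the current window, close the mapping and reopen it further along the file. The offset passed to open() must be a multiple of mapped_file_source::alignment(), the operating system's allocation granularity, so the new window starts at the current position rounded down to that multiple. Random access can also be programmed, but it is slightly more complicated. An example implementation for serial access is given below: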
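#include <boost/iostreams/device/mapped_file.hpp>
#include <algorithm>
#include <cassert>
#include <cstdint>
#include <cstring>
#include <filesystem> // C++17, used only to query the file size
namespace bio = boost::iostreams;

#define MMAP_SIZE 1073741824 // window ("page") size: 1 GB
// RF_DATA_FILE, the path of the data file, is defined elsewhere in the program.
// 64-bit positions: unsigned long int is only 32 bits on Windows.
const std::uint64_t file_size = std::filesystem::file_size(RF_DATA_FILE);
std::uint64_t page_start = 0;   // file offset where the current window starts
std::uint64_t file_pointer = 0; // read position relative to page_start
bio::mapped_file_source m_file(RF_DATA_FILE, std::min<std::uint64_t>(MMAP_SIZE, file_size), 0);

// reads the specified number of bytes from the current position into buffer
void read_bytes(void* buffer, std::size_t num_bytes)
{
    // a request must still fit in one window after the offset is realigned
    assert(num_bytes + bio::mapped_file_source::alignment() <= MMAP_SIZE);
    const std::uint64_t pos = page_start + file_pointer; // absolute file position
    assert(pos + num_bytes <= file_size);                // never read past EOF
    if (file_pointer + num_bytes > m_file.size())        // repage
    {
        // open() needs an offset that is a multiple of alignment(),
        // so round the absolute position down to that multiple
        const std::uint64_t gran = bio::mapped_file_source::alignment();
        page_start = pos - pos % gran;
        file_pointer = pos - page_start;
        m_file.close();
        m_file.open(RF_DATA_FILE, std::min<std::uint64_t>(MMAP_SIZE, file_size - page_start), page_start);
    }
    std::memcpy(buffer, m_file.data() + file_pointer, num_bytes);
    file_pointer += num_bytes;
}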
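Random access, described above as slightly more complicated, mostly comes down to choosing a legal window around an arbitrary position. The sketch below shows only that offset arithmetic, so it has no Boost dependency and can be tested on its own. The names Window, window_for and covers are hypothetical helpers, not part of Boost.Iostreams; the sketch assumes the caller supplies the allocation granularity (for example from mapped_file_source::alignment()), the preferred window size (MMAP_SIZE above) and the file size.

```cpp
#include <algorithm>
#include <cassert>
#include <cstdint>
#include <stdexcept>

// Hypothetical helper type: one mapping request for mapped_file_source::open().
struct Window {
    std::uint64_t offset; // file offset of the mapping (multiple of the granularity)
    std::uint64_t length; // number of bytes to map
    std::uint64_t skip;   // distance from offset to the requested position
};

// Chooses a window that covers bytes [pos, pos + n) of a file of file_size bytes.
// granularity: allocation granularity, e.g. mapped_file_source::alignment()
// window_size: preferred mapping length (the "page size" in the text)
Window window_for(std::uint64_t pos, std::uint64_t n, std::uint64_t granularity,
                  std::uint64_t window_size, std::uint64_t file_size)
{
    if (granularity == 0 || n > file_size || pos > file_size - n)
        throw std::out_of_range("request lies outside the file");
    Window w;
    w.offset = pos - pos % granularity; // round down to a legal mapping offset
    w.skip = pos - w.offset;            // always less than granularity
    // Map window_size bytes (or more if the request needs it),
    // but never beyond the end of the file.
    w.length = std::min(std::max(window_size, w.skip + n), file_size - w.offset);
    assert(w.skip + n <= w.length);     // the request fits in the mapping
    return w;
}

// True if an existing window already contains bytes [pos, pos + n),
// in which case no remapping is needed.
bool covers(const Window& w, std::uint64_t pos, std::uint64_t n)
{
    return pos >= w.offset && pos + n <= w.offset + w.length;
}
```

To read n bytes at an arbitrary position pos, a program like the serial example would first call covers() on the window it currently has mapped. Only when that returns false would it compute a new window w with window_for(), call m_file.close() and then m_file.open(RF_DATA_FILE, w.length, w.offset), and remember w as the current window. The requested bytes then start at m_file.data() + (pos - current.offset). Keeping the previous window avoids a remap when nearby positions are read in succession.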