<?xml version="1.0" encoding="utf-8" standalone="no"?>
<!DOCTYPE document PUBLIC "-//CNX//DTD CNXML 0.5//EN" "http://cnx.rice.edu/technology/cnxml/schema/dtd/0.5/cnxml_plain.dtd">
<document xmlns="http://cnx.rice.edu/cnxml" xmlns:md="http://cnx.rice.edu/mdml/0.4" xmlns:bib="http://bibtexml.sf.net/" id="new">
  <name xmlns:md="http://cnx.rice.edu/mdml/0.4" xmlns:bib="http://bibtexml.sf.net/">Portable Large File Access By Memory Mapped I/O in C++ Using Boost</name>
  <metadata xmlns:md="http://cnx.rice.edu/mdml/0.4" xmlns:bib="http://bibtexml.sf.net/">
  <md:version xmlns:bib="http://bibtexml.sf.net/">1.1</md:version>
  <md:created xmlns:bib="http://bibtexml.sf.net/">2008/02/22 19:45:41.441 US/Central</md:created>
  <md:revised xmlns:bib="http://bibtexml.sf.net/">2008/02/22 22:00:32.307 US/Central</md:revised>
  <md:authorlist xmlns:bib="http://bibtexml.sf.net/">
      <md:author xmlns:bib="http://bibtexml.sf.net/" id="jsaaymi">
      <md:firstname xmlns:bib="http://bibtexml.sf.net/">Prakash</md:firstname>
      
      <md:surname xmlns:bib="http://bibtexml.sf.net/">Manandhar</md:surname>
      <md:email xmlns:bib="http://bibtexml.sf.net/">jsaaymi@gmail.com</md:email>
    </md:author>
  </md:authorlist>

  <md:maintainerlist xmlns:bib="http://bibtexml.sf.net/">
    <md:maintainer xmlns:bib="http://bibtexml.sf.net/" id="jsaaymi">
      <md:firstname xmlns:bib="http://bibtexml.sf.net/">Prakash</md:firstname>
      
      <md:surname xmlns:bib="http://bibtexml.sf.net/">Manandhar</md:surname>
      <md:email xmlns:bib="http://bibtexml.sf.net/">jsaaymi@gmail.com</md:email>
    </md:maintainer>
  </md:maintainerlist>
  
  <md:keywordlist xmlns:bib="http://bibtexml.sf.net/">
    <md:keyword xmlns:bib="http://bibtexml.sf.net/">Boost</md:keyword>
    <md:keyword xmlns:bib="http://bibtexml.sf.net/">C++</md:keyword>
    <md:keyword xmlns:bib="http://bibtexml.sf.net/">Large File</md:keyword>
    <md:keyword xmlns:bib="http://bibtexml.sf.net/">Memory Mapped I/O</md:keyword>
  </md:keywordlist>

  <md:abstract xmlns:bib="http://bibtexml.sf.net/">This is a short tutorial describing how to read large files (for example, larger than 4 GB) using memory-mapped I/O with the Boost C++ libraries.</md:abstract>
</metadata>
  <content xmlns:md="http://cnx.rice.edu/mdml/0.4" xmlns:bib="http://bibtexml.sf.net/">
    <para xmlns:md="http://cnx.rice.edu/mdml/0.4" xmlns:bib="http://bibtexml.sf.net/" id="delete_me">Reading large files in C++ can be tricky and platform dependent. Most modern operating systems support large file access in some form, and that support is improving. However, writing portable C++ code to read large files still takes some care. <link xmlns:md="http://cnx.rice.edu/mdml/0.4" xmlns:bib="http://bibtexml.sf.net/" src="http://www.boost.org">Boost</link> is a portable C++ source library that I have used to write portable code to read and write large files.</para><para xmlns:md="http://cnx.rice.edu/mdml/0.4" xmlns:bib="http://bibtexml.sf.net/" id="element-76"><code xmlns:md="http://cnx.rice.edu/mdml/0.4" xmlns:bib="http://bibtexml.sf.net/">Boost.Iostreams</code> has a <code xmlns:md="http://cnx.rice.edu/mdml/0.4" xmlns:bib="http://bibtexml.sf.net/">mapped_file_source</code> class that maps a file in the file system to an array in memory in read-only mode (a separate <code xmlns:md="http://cnx.rice.edu/mdml/0.4" xmlns:bib="http://bibtexml.sf.net/">mapped_file_sink</code> class provides write-only access). Although a single mapping can cover fairly large files (1 GB or more), very large files can be problematic. Part of the interface of the class is shown below:</para><code xmlns:md="http://cnx.rice.edu/mdml/0.4" xmlns:bib="http://bibtexml.sf.net/" type="block">namespace boost { namespace iostreams {

class mapped_file_source {
public:
    ...
    explicit mapped_file_source( const std::string&amp; path,
                                 size_type length = max_length,
                                 boost::intmax_t offset = 0 );
    void open( const std::string&amp; path,
               size_type length = max_length,
               boost::intmax_t offset = 0 );
    bool is_open() const;
    void close();
    ...
};

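} } // End namespace boost::iostreams</code><para xmlns:md="http://cnx.rice.edu/mdml/0.4" xmlns:bib="http://bibtexml.sf.net/" id="element-946">We can use either the constructor or the <code xmlns:md="http://cnx.rice.edu/mdml/0.4" xmlns:bib="http://bibtexml.sf.net/">open</code> method to create the mapped source. If we request a mapping that is too large by setting the <code xmlns:md="http://cnx.rice.edu/mdml/0.4" xmlns:bib="http://bibtexml.sf.net/">length</code> parameter to, for example, 4 GB, we get an access denied runtime exception. This tells us to try a smaller length value. Let us call this value the page size (it is set to 1 GB in the example below with <code xmlns:md="http://cnx.rice.edu/mdml/0.4" xmlns:bib="http://bibtexml.sf.net/">#define MMAP_SIZE 1073741824</code>). We can then read the file serially with a simple page shift: whenever a read would run past the end of the current page, we remap the file starting near the current position. The <code xmlns:md="http://cnx.rice.edu/mdml/0.4" xmlns:bib="http://bibtexml.sf.net/">offset</code> passed to <code xmlns:md="http://cnx.rice.edu/mdml/0.4" xmlns:bib="http://bibtexml.sf.net/">open</code> must be a multiple of <code xmlns:md="http://cnx.rice.edu/mdml/0.4" xmlns:bib="http://bibtexml.sf.net/">mapped_file_source::alignment()</code>, so the example rounds the new page start down to that boundary. The absolute offset is kept in a <code xmlns:md="http://cnx.rice.edu/mdml/0.4" xmlns:bib="http://bibtexml.sf.net/">boost::intmax_t</code> because <code xmlns:md="http://cnx.rice.edu/mdml/0.4" xmlns:bib="http://bibtexml.sf.net/">unsigned long</code> is only 32 bits wide on some platforms. Random access can also be programmed, but it is somewhat more complicated. An example implementation for serial access is given below:</para><code xmlns:md="http://cnx.rice.edu/mdml/0.4" xmlns:bib="http://bibtexml.sf.net/" type="block">#include &lt;cassert&gt;
#include &lt;cstddef&gt;
#include &lt;cstring&gt;
#include &lt;boost/cstdint.hpp&gt;
#include &lt;boost/iostreams/device/mapped_file.hpp&gt;

namespace bio = boost::iostreams;

#define MMAP_SIZE 1073741824 // page size: 1 GB

// RF_DATA_FILE (the path of the file to read) is assumed to be defined elsewhere.
boost::intmax_t page_start = 0; // absolute file offset of the current page
std::size_t file_pointer = 0;   // read position relative to page_start

bio::mapped_file_source m_file(RF_DATA_FILE, MMAP_SIZE, 0);

// Reads the specified number of bytes from the file into buffer.
// The caller must not request bytes beyond the end of the file.
void read_bytes(void* buffer, std::size_t num_bytes)
{
    const std::size_t align = bio::mapped_file_source::alignment();
    assert(num_bytes &lt;= MMAP_SIZE - align);
    if (file_pointer + num_bytes &gt; MMAP_SIZE) // repage
    {
        // The mapping offset must be a multiple of alignment(): round the new
        // page start down and keep the remainder as the position in the page.
        page_start += file_pointer - file_pointer % align;
        file_pointer %= align;
        m_file.close();
        m_file.open(RF_DATA_FILE, MMAP_SIZE, page_start);
    }
    std::memcpy(buffer, m_file.data() + file_pointer, num_bytes);
    file_pointer += num_bytes;
}</code>   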
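    <para id="element-aligned-window">Random access needs the same bookkeeping as the page shift described above. Boost.Iostreams requires the mapping offset to be a multiple of <code>mapped_file_source::alignment()</code>, so for an arbitrary byte position we need an aligned mapping start and the position of the byte inside that mapping. The following is a minimal sketch of that arithmetic only, using the standard library and no Boost calls. The names <code>Window</code>, <code>locate</code> and <code>in_window</code> are hypothetical helpers introduced here for illustration, and the alignment value is assumed to be supplied by the caller.</para>
    <code type="block"><![CDATA[
```cpp
#include <cassert>
#include <cstdint>

// Hypothetical helper type: where a mapping should start and where the
// requested byte sits inside it.
struct Window {
    std::uint64_t start;   // absolute file offset of the mapping (aligned)
    std::uint64_t offset;  // position of the requested byte inside the mapping
};

// Rounds `position` down to a multiple of `alignment`. For a real mapping the
// alignment is assumed to come from mapped_file_source::alignment().
Window locate(std::uint64_t position, std::uint64_t alignment)
{
    assert(alignment > 0);
    Window w;
    w.offset = position % alignment;
    w.start = position - w.offset;
    return w;
}

// True when the byte range [position, position + num_bytes) lies inside the
// mapping of `length` bytes that starts at `start`, so no remapping is needed.
bool in_window(std::uint64_t start, std::uint64_t length,
               std::uint64_t position, std::uint64_t num_bytes)
{
    return position >= start
        && num_bytes <= length
        && position - start <= length - num_bytes;
}
```
]]></code>
    <para id="element-aligned-window-usage">With a mapped source, one would presumably call <code>in_window</code> first and remap only when it returns false, passing <code>start</code> as the <code>offset</code> argument of <code>open</code> and reading from <code>data() + offset</code>. Keeping both values in 64-bit integers avoids overflow for files larger than 4 GB.</para>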
  </content>
  
</document>
