Performance of Win32 memory mapped files vs. CRT fopen/fread

I need to read (scan) a file sequentially and process its content. File sizes range from very small (a few KB) to very large (several GB).

I tried two techniques using VC10/VS2010 on Windows 7 64-bit:

  1. Win32 memory mapped files (i.e. CreateFile, CreateFileMapping, MapViewOfFile, etc.)
  2. fopen and fread from CRT.

I thought that the memory mapped file technique could be faster than the CRT functions, but some tests showed that the speed is almost the same in both cases.

The following C++ statements are used for MMF:

HANDLE hFile = CreateFile(
    filename,
    GENERIC_READ,
    FILE_SHARE_READ,
    NULL,
    OPEN_EXISTING,
    FILE_FLAG_SEQUENTIAL_SCAN,
    NULL
    );

HANDLE hFileMapping = CreateFileMapping(
    hFile,
    NULL,
    PAGE_READONLY,
    0,
    0,
    NULL
    );

The file is read sequentially, chunk by chunk; each chunk is SYSTEM_INFO.dwAllocationGranularity in size.

Considering that speed is almost the same with MMF and CRT, I'd use the CRT functions because they are simpler and multi-platform. But I'm curious: am I using the MMF technique correctly? Is it normal that MMF performance, in this case of scanning a file sequentially, is the same as with CRT?
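For reference, the CRT side of the comparison can be sketched as a plain chunked fread loop (the 64 KB buffer size and the processing callback are illustrative choices, not part of the question):

```cpp
#include <cstdio>
#include <cstddef>

// Sketch: sequentially scan a file with the CRT, chunk by chunk.
// Returns false if the file cannot be opened or a read error occurs.
bool ScanFileCrt(const char* filename,
                 void (*processChunk)(const char*, size_t))
{
    const size_t chunkSize = 64 * 1024;  // arbitrary illustrative size
    static char buffer[64 * 1024];

    FILE* f = std::fopen(filename, "rb");
    if (!f)
        return false;

    size_t got;
    while ((got = std::fread(buffer, 1, chunkSize, f)) > 0)
        processChunk(buffer, got);

    bool ok = std::ferror(f) == 0;
    std::fclose(f);
    return ok;
}
```

Unlike the MMF version, this works unchanged on any platform with a C runtime, which is the portability advantage mentioned above.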

Thanks.



Solution 1:[1]

I believe you won't see much difference if you access the file sequentially, because file I/O is very heavily cached, and read-ahead is probably also used.

Things would be different if you had many "jumps" while processing the file data. Then setting a new file pointer and reading a new file portion each time would probably kill CRT performance, whereas MMF would give you the maximum possible performance.

Solution 2:[2]

Since you are scanning the file sequentially I would not expect disk usage pattern to be much different for either method.

For large files, MMF might reduce data locality and even result in a copy of all or part of the file being placed in the pagefile, whereas processing via CRT using a small buffer would all take place in RAM. In this instance, MMF would probably be slower. You can mitigate this by only mapping in part of the underlying file at a time, but then things get more complex without any likely win over direct sequential I/O.

MMF are really the way Windows implements inter-process shared memory, rather than a way to speed up generalized file I/O. The file manager cache in the kernel is what you really need to leverage here.

Solution 3:[3]

I thought that memory mapped file technique could be faster than CRT functions, but some tests showed that the speed is almost the same in both cases.

You are probably hitting the file system cache for your tests. Unless you explicitly create file handles to bypass the file system cache (FILE_FLAG_NO_BUFFERING when calling CreateFile), the file system cache will kick in and keep recently accessed files in memory.

There is a small speed penalty when reading a file that is in the file system cache with buffering turned on, as the operating system has to perform an extra copy, and there is system-call overhead. But for your purposes, you should probably stick with the CRT file functions.

Gustavo Duarte has a great article on memory mapped files (from a generic OS perspective).

Solution 4:[4]

Both methods will eventually come down to disk I/O; that will be your bottleneck. I would go with whichever method my higher-level functionality likes more: if I need streaming, I'll go with files; if I need sequential access and fixed-size files, I would consider memory mapped files.

Or, if you have an algorithm that works only on memory, memory-mapped files can be the easier way out.

Solution 5:[5]

Using ReadFile:

  • Enters Kernel Mode
  • Does a memcpy from the Disk Cache
  • If data isn't in the Disk Cache, triggers a Page Fault which makes the Cache Manager read data from the disk.
  • Exits Kernel Mode
  • Cost of entering and leaving Kernel Mode was about 1600 CPU cycles when I measured it.
  • Avoid small reads, since every call to ReadFile has the overhead of entering and leaving Kernel Mode.

Memory Mapped Files:

  • Basically places the Disk Cache right into your application's address space.
  • If data is in the cache, you just read it.
  • If data isn't there, triggers a Page Fault that makes the Cache Manager read data from the disk. (There is a User/Kernel mode transition to handle this exception)
  • Disk reads don't always succeed. You need to be able to handle memory exceptions from the system, otherwise a disk read failure will be an application crash.

So both ways will use the same Disk Cache, and will use the same mechanism of getting data into the cache (Page Fault exception -> Cache Manager reads data from the disk). The Cache Manager is also responsible for doing data prefetching and such, so it can read more than one page at a time. You don't get a page fault on every memory page.

So the main advantages of Memory-Mapped files are:

  • Can possibly use data in-place without copying it out first
  • Fewer User<->Kernel Mode transitions (depends on access patterns)

And the disadvantages are:

  • Need to handle access violation exceptions for failed disk reads
  • Takes up address space in the program to map entire files

Sources

This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.

Source: Stack Overflow

Solution Source
Solution 1 valdo
Solution 2
Solution 3
Solution 4 Daniel Mošmondor
Solution 5 Dwedit