05 - files

Reading and writing files doesn't differ much from using other streams - the core difference is initialization.

File streams

File streams are an extension of the streams already presented (they derive from std::istream, std::ostream and std::iostream) and add a few functions specific to files, such as open, is_open and close.

There are 3 main types (all defined in <fstream>):

  • std::ifstream - input file stream

  • std::ofstream - output file stream

  • std::fstream - file stream supporting both input and output

Generally you should use one of the first 2, as situations where the same file is both read and written are rare.
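A minimal sketch of that typical split: std::ofstream for writing and std::ifstream for reading, with the file opened directly in the constructor. The helper names write_text and read_first_line are hypothetical, chosen for illustration.

```cpp
#include <fstream>
#include <string>

// Writes `text` to the file at `path`, replacing any previous contents.
// Returns false if the file could not be opened or writing failed.
bool write_text(const std::string& path, const std::string& text)
{
    std::ofstream out(path); // mode defaults to std::ios_base::out
    if (!out)
        return false;
    out << text << std::flush; // flush so write errors show up before returning
    return static_cast<bool>(out);
}

// Reads the first line of the file at `path` into `line`.
// Returns false if the file could not be opened or has no line to read.
bool read_first_line(const std::string& path, std::string& line)
{
    std::ifstream in(path); // mode defaults to std::ios_base::in
    if (!in)
        return false;
    return static_cast<bool>(std::getline(in, line));
}
```

Both streams close their files automatically when they go out of scope, so no explicit close call is needed here.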

Opening a file

2 things are specified: a path and a mode. The mode has a default value, so in the simplest case only the path is needed.

How exactly a path is interpreted depends on the platform and its file system; the C++ standard library passes it unmodified to the underlying system interface. Generally:

  • If an absolute path is given (e.g. /etc/ssh/ssh_config), the file will be searched for at this exact location.

  • If a relative path is given (e.g. ssh_config or ssh/ssh_config; basically any path without a root, where the root is / on Unix systems and a drive letter followed by : on Windows), the file will be searched for at the location obtained by combining the process's current working directory with the given path. Note that the working directory is not necessarily the directory containing the executable.
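To illustrate that relative paths follow the working directory rather than the executable's location, here is a small sketch using std::filesystem (C++17, an addition not used elsewhere in this chapter). The function create_relative_in is hypothetical: it temporarily changes the working directory, opens a relative path, and checks where the file ended up.

```cpp
#include <filesystem>
#include <fstream>
#include <string>

namespace fs = std::filesystem;

// Creates the file `name` (a relative path) while `dir` is the working
// directory, then restores the previous working directory.
// Returns true if the file really appeared inside `dir`.
bool create_relative_in(const fs::path& dir, const std::string& name)
{
    const fs::path previous = fs::current_path(); // remember the current CWD
    fs::current_path(dir);                        // change the process's CWD
    // A relative path is resolved against the CWD at the moment of opening.
    const bool opened = static_cast<bool>(std::ofstream(name));
    fs::current_path(previous);                   // restore the original CWD
    return opened && fs::exists(dir / name);
}
```

Because the working directory is a property of the whole process, changing it like this is best limited to small tools and demonstrations.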

The mode is a combination of flags (joined with |) that further specifies behavior:

  • std::ios_base::app (for writing) - every write is appended at the end of the file, regardless of the current position

  • std::ios_base::trunc (for writing) - truncate the file when opening it (as if the file was removed and recreated); std::ofstream already does this by default, because a plain out mode truncates

  • std::ios_base::ate - seek to the end of the file immediately after opening (for input, as if the entire file had already been read)

  • std::ios_base::binary - open the file in binary mode (default is text mode)

  • std::ios_base::in - open the file for reading; always added by std::ifstream, while std::fstream only uses in | out as its default mode, so an explicit mode passed to std::fstream must include in to allow reading

  • std::ios_base::out - open the file for writing; always added by std::ofstream, and, as with in, it must be included explicitly when passing a mode to std::fstream

  • The append flag is commonly used for logs and other diary-like files which are expected to grow over time as the application records more information (similar to >> redirection in many shells).

  • The truncate flag is commonly used for files that are expected to be regenerated every time the application is relaunched (similar to > redirection in many shells).

  • Binary mode is used when the file contains non-textual data, to avoid any automatic conversion, and whenever conversion of textual data is undesirable.

  • Other flags are hardly ever used.
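A minimal sketch of the two most common flags in practice: app for a log file and trunc for a file rewritten from scratch. The function names are hypothetical; read_all is only a helper for inspecting the result.

```cpp
#include <fstream>
#include <iterator>
#include <string>

// Appends one line to a log file, preserving existing contents
// (like >> in a shell). The file is created if it does not exist.
void append_log(const std::string& path, const std::string& message)
{
    std::ofstream log(path, std::ios_base::app); // out is added automatically
    log << message << '\n';
}

// Replaces the file contents (like > in a shell). Flags are bitmasks,
// so several of them are combined with |.
void overwrite(const std::string& path, const std::string& contents)
{
    std::ofstream file(path, std::ios_base::out | std::ios_base::trunc);
    file << contents;
}

// Helper used to inspect the result: reads the whole file into a string.
std::string read_all(const std::string& path)
{
    std::ifstream in(path);
    return std::string(std::istreambuf_iterator<char>(in),
                       std::istreambuf_iterator<char>());
}
```

Calling append_log twice keeps both lines, while a subsequent overwrite discards them.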

We can say that file streams have a "cursor" mechanism (formally called input and output positions). These positions specify the offset at which the next read or write happens; for file streams, input and output share a single position in the file. The ate flag moves the initial position to the end of the file, while app forces every write to happen at the end. The position can be checked and changed using member functions whose names start with tell and seek (tellg/seekg for input, tellp/seekp for output). Because standard library streams have many complex layers and I have never seen these functions used directly in production code, I don't see any value in presenting them.

Binary mode

There are 2 modes in which a C++ stream can operate on a file:

  • binary - data is read and written exactly as given (in terms of bytes)

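  • text - some characters are translated to accommodate platform-specific conventions of text files. The most notable example is line endings: on Windows, '\n' is written as the two bytes "\r\n" and "\r\n" is read back as '\n'; on POSIX systems (Linux, macOS) no translation happens, so both modes behave identically.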
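A minimal sketch of a byte-exact round trip in binary mode, using stream write/read calls instead of formatted << and >>. The names write_bytes and read_bytes are hypothetical. In text mode the '\n' bytes below could be altered on some platforms; in binary mode they are preserved everywhere.

```cpp
#include <fstream>
#include <string>
#include <vector>

// Writes raw bytes exactly as given; binary mode disables newline translation.
bool write_bytes(const std::string& path, const std::vector<unsigned char>& bytes)
{
    std::ofstream out(path, std::ios_base::binary);
    out.write(reinterpret_cast<const char*>(bytes.data()),
              static_cast<std::streamsize>(bytes.size()));
    return static_cast<bool>(out);
}

// Reads the file back byte-for-byte (empty result if it cannot be opened).
std::vector<unsigned char> read_bytes(const std::string& path)
{
    std::ifstream in(path, std::ios_base::binary);
    std::vector<unsigned char> bytes;
    char c;
    while (in.get(c)) // unformatted: whitespace is not skipped
        bytes.push_back(static_cast<unsigned char>(c));
    return bytes;
}
```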

File to string

A common need is to read an entire file into one string object for later processing. There are many ways to do this in C++, and there has been a somewhat heated debate about which one is the most idiomatic or the most performant. On top of that, some applications should not read an entire file into one object at all (imagine a multi-gigabyte database); instead they should read it in chunks into a buffer of limited size and process each chunk as more data arrives. Such an approach reduces memory usage and allows disk operations and data processing to run concurrently.
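A minimal sketch of the chunked approach, with counting occurrences of one byte standing in for real processing. The function count_byte_chunked and the 4096-byte buffer size are illustrative choices; the sketch only shows the fixed-size buffer and does not overlap reading with processing.

```cpp
#include <array>
#include <cstddef>
#include <fstream>
#include <string>

// Counts occurrences of `target` without loading the whole file:
// each chunk is processed before the next one is read, so memory use
// stays constant regardless of file size. Returns 0 if the file cannot be opened.
std::size_t count_byte_chunked(const std::string& path, char target)
{
    std::ifstream in(path, std::ios_base::binary);
    std::array<char, 4096> buffer; // chunk size is an arbitrary choice
    std::size_t count = 0;
    while (in) {
        in.read(buffer.data(), static_cast<std::streamsize>(buffer.size()));
        const std::streamsize got = in.gcount(); // the last chunk may be partial
        for (std::streamsize i = 0; i < got; ++i)
            if (buffer[static_cast<std::size_t>(i)] == target)
                ++count;
    }
    return count;
}
```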

Still, for the majority of applications the benefits of concurrent reading, writing and processing (and any other performance improvements) are not worth the added complexity: most programs read small files (megabytes at most) or files that need to be read entirely anyway (images, sounds, models and other assets for games and simulation applications).

For these reasons, here are a couple of functions that can be used to read the entire contents of a file into one array-like object:

TODO paste function implementations from notes
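A minimal sketch of two widely used approaches, assuming C++17 (std::optional signals failure to open). The names file_to_string and file_to_bytes are hypothetical. The first lets the stream buffer copy everything into a string stream; the second asks the file system for the size, allocates once and reads in a single call.

```cpp
#include <cstddef>
#include <filesystem>
#include <fstream>
#include <optional>
#include <sstream>
#include <string>
#include <system_error>
#include <vector>

// Variant 1: copy the whole stream buffer into a string stream.
// Short and works for any readable stream, at the cost of an extra copy.
std::optional<std::string> file_to_string(const std::string& path)
{
    std::ifstream in(path, std::ios_base::binary); // binary: keep bytes unchanged
    if (!in)
        return std::nullopt;
    std::ostringstream ss;
    ss << in.rdbuf(); // for an empty file this inserts nothing; str() is ""
    return ss.str();
}

// Variant 2: query the size first, allocate once, read in one call.
std::optional<std::vector<char>> file_to_bytes(const std::string& path)
{
    std::error_code ec;
    const auto size = std::filesystem::file_size(path, ec);
    if (ec)
        return std::nullopt; // missing file, not a regular file, etc.
    std::ifstream in(path, std::ios_base::binary);
    if (!in)
        return std::nullopt;
    std::vector<char> bytes(static_cast<std::size_t>(size));
    in.read(bytes.data(), static_cast<std::streamsize>(bytes.size()));
    bytes.resize(static_cast<std::size_t>(in.gcount())); // file may have shrunk
    return bytes;
}
```

The first variant is the simplest to remember; the second avoids the intermediate copy but relies on std::filesystem::file_size, which only works for regular files.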