
Reading And Writing CSV Files With C++

As a data scientist, reading and writing data from/to CSV is one of the most common tasks I do on the daily. R, my language of choice, makes this easy with read.csv() and write.csv() (although I tend to use fread() and fwrite() from the data.table package).

Hot take. C++ is not R.

As far as I know, there is no CSV reader/writer built into the C++ STL. That's not a knock against C++; it's just a lower-level language. If we want to read and write CSV files with C++, we’ll have to deal with file I/O, data types, and some low-level logic on how to read, parse, and write data. For me, this is a necessary step in order to build and test more fun programs like machine learning models.

Writing to CSV

We'll start by creating a simple CSV file with one column of integer data. And we'll give it the header Foo.
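A minimal sketch of that first step might look like the following. The file name foo.csv and the header Foo come from the text; the three data values and the helper name make_foo_csv are placeholders I made up for illustration. Call it from main() to run it as a program.

```cpp
#include <fstream>

// Hypothetical helper: writes a one-column CSV called foo.csv.
// The values 1, 2, 3 are placeholders, not data from the article.
void make_foo_csv() {
    // Create an output file stream; this creates (or overwrites) foo.csv
    std::ofstream myFile("foo.csv");

    // Send the header row to the stream
    myFile << "Foo\n";

    // Send the data rows to the stream
    myFile << "1\n";
    myFile << "2\n";
    myFile << "3\n";

    // Close the file (this would also happen when myFile goes out of scope)
    myFile.close();
}
```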

Here, ofstream is an “output file stream”. Since it's derived from ostream, we can treat it just like cout (which is also an ostream). The result of executing this program is that we get a file called foo.csv in the current working directory, which is usually the same directory as our executable. Let's wrap this into a function that's a little more dynamic.

Cool. Now we can write a vector of integers to a CSV file with ease. Let's expand on this to support multiple vectors of integers and corresponding column names.

Here we've represented each column of data as a std::pair of column name and column values, and the whole dataset as a std::vector of such columns. Now we can write a variable number of integer columns to a CSV file.

Reading from CSV

Now that we've written some CSV files, let's attempt to read them. For now, let's assume (correctly, since we wrote the files ourselves) that our file contains only integer data plus one row of column names at the top.
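Under those assumptions, a reader could be sketched as below. The names read_csv, write_csv, copy_csv, and Column are hypothetical; the compact writer is included only so the sketch is self-contained. It does no quoting or escaping, assumes Unix line endings, and lets std::stoi throw on any non-integer cell.

```cpp
#include <cstddef>
#include <fstream>
#include <sstream>
#include <stdexcept>
#include <string>
#include <utility>
#include <vector>

// Assumed representation: a column is a (name, values) pair
using Column = std::pair<std::string, std::vector<int>>;

// Sketch: read a CSV of integers with one header row into a vector of columns.
std::vector<Column> read_csv(const std::string& filename) {
    std::vector<Column> result;
    std::ifstream myFile(filename);
    if (!myFile.is_open()) {
        throw std::runtime_error("Could not open file: " + filename);
    }

    std::string line, cell;

    // First line: split on commas to get the column names
    if (std::getline(myFile, line)) {
        std::stringstream ss(line);
        while (std::getline(ss, cell, ',')) {
            result.emplace_back(cell, std::vector<int>{});
        }
    }

    // Remaining lines: the j-th cell of each row goes into the j-th column
    while (std::getline(myFile, line)) {
        std::stringstream ss(line);
        for (std::size_t j = 0; j < result.size() && std::getline(ss, cell, ','); ++j) {
            result[j].second.push_back(std::stoi(cell));
        }
    }
    return result;
}

// Compact multi-column writer; assumes all columns have equal length
void write_csv(const std::string& filename, const std::vector<Column>& dataset) {
    std::ofstream myFile(filename);
    if (dataset.empty()) return;
    const std::size_t ncols = dataset.size();
    for (std::size_t j = 0; j < ncols; ++j) {
        myFile << dataset[j].first << (j + 1 < ncols ? "," : "\n");
    }
    for (std::size_t i = 0; i < dataset[0].second.size(); ++i) {
        for (std::size_t j = 0; j < ncols; ++j) {
            myFile << dataset[j].second[i] << (j + 1 < ncols ? "," : "\n");
        }
    }
}

// Read a CSV file and write its contents to a new file
void copy_csv(const std::string& in, const std::string& out) {
    write_csv(out, read_csv(in));
}
```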

This program reads our previously created CSV files and writes each dataset to a new file, essentially creating copies of our original files.

Going further

So far we've seen how to read and write datasets with integer values only. Extending this to read/write a dataset of only doubles or only strings should be fairly straightforward. Reading a dataset with unknown, mixed data types is another beast and beyond the scope of this article, but see this code review for possible solutions.
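As a rough illustration of the single-type extension, the reader can be turned into a template that delegates cell parsing to a small set of overloads. Everything here (parse_cell, read_csv_as) is a hypothetical sketch of one possible approach, with the same limitations as before: one header row, no quoting, one type for the whole file.

```cpp
#include <cstddef>
#include <fstream>
#include <sstream>
#include <string>
#include <utility>
#include <vector>

// Hypothetical per-type parsers: one overload per supported cell type
inline void parse_cell(const std::string& s, int& out) { out = std::stoi(s); }
inline void parse_cell(const std::string& s, double& out) { out = std::stod(s); }
inline void parse_cell(const std::string& s, std::string& out) { out = s; }

// Sketch: read a CSV whose cells are all of type T (int, double, or std::string)
template <typename T>
std::vector<std::pair<std::string, std::vector<T>>> read_csv_as(const std::string& filename) {
    std::vector<std::pair<std::string, std::vector<T>>> result;
    std::ifstream in(filename);
    std::string line, cell;

    // Header row: one column per comma-separated name
    if (std::getline(in, line)) {
        std::stringstream ss(line);
        while (std::getline(ss, cell, ',')) {
            result.emplace_back(cell, std::vector<T>{});
        }
    }

    // Data rows: parse each cell with the overload matching T
    while (std::getline(in, line)) {
        std::stringstream ss(line);
        for (std::size_t j = 0; j < result.size() && std::getline(ss, cell, ','); ++j) {
            T value{};
            parse_cell(cell, value);
            result[j].second.push_back(value);
        }
    }
    return result;
}
```

Calling read_csv_as<double>("data.csv") or read_csv_as<std::string>("data.csv") then picks the matching parser at compile time. If the file cannot be opened, this sketch simply returns an empty dataset.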

Special thanks to papagaga and Incomputable for helping me with this topic via codereview.stackexchange.com.