My Byte of Code Blog. Tips and observations on creating software with Objective-C, C/C++, Python, Cocoa and Boost on Mac.

Saturday, November 20, 2010

Parse CSV File with Embedded New Lines Using Boost Tokenizer and C++

In my previous post, Parse CSV File With Boost Tokenizer in C++, I showed how to use Boost Tokenizer to parse CSV files. That algorithm expected the file to contain exactly one record per line.

However, CSV syntax allows quoted fields to contain embedded line breaks (see the Wikipedia article on CSV, which should be enough to demonstrate this). The example code from my previous post could not handle those embedded line breaks.

Someone asked in the comments whether the code can handle them, so here is code that fixes that problem and handles embedded line breaks in quoted fields.