I have a .csv file which has ~3GB of data. I want to read all that data and process it. The following program reads the data from a file and stores it into a std::vector<std::vector<std::string>>. However, the program runs for too long and the application (vscode) freezes and needs to be restarted. What have I done wrong?
#include <algorithm>
#include <iostream>
#include <fstream>
#include "sheet.hpp"
extern std::vector<std::string> split(const std::string& str, const std::string& delim);
int main() {
Sheet sheet;
std::ifstream inputFile;
inputFile.open("C:/Users/1032359/cpp-projects/Straggler Job Analyzer/src/part-00001-of-00500.csv");
std::string line;
while(inputFile >> line) {
sheet.addRow(split(line, ","));
}
return 0;
}
// split and Sheet's member functions have been tested thoroughly and work fine. split has a complexity of N^2 though...
EDIT1: The file read has been fixed as per the suggestions in the comments.
The Split function:
std::vector<std::string> split(const std::string& str, const std::string& delim) {
std::vector<std::string> vec_of_tokens;
std::string token;
for (auto character : str) {
if (std::find(delim.begin(), delim.end(), character) != delim.end()) {
vec_of_tokens.push_back(token);
token = "";
continue;
}
token += character;
}
vec_of_tokens.push_back(token);
return vec_of_tokens;
}
EDIT2:
dummy csv row:
5612000000,5700000000,4665712499,798,3349189123,0.02698,0.06714,0.07715,0.004219,0.004868,0.06726,7.915e-05,0.0003681,0.27,0.00293,3.285,0.008261,0,0,0.01608
limits:
field1: starting timestamp (nanosecs)
field2: ending timestamp (nanosecs)
field3: job id (<= 1,000,000)
field4: task id (<= 10,000)
field5: machine id (<= 30,000,000)
field6: CPU time (sorry, no clue)
field7-20: no idea, unused for the current stage, but needed for later stages.
EDIT3: Required Output
remember the .thenby function in Excel?
the sorting order here is sort first on 5th column (1-based indexing), then on 3rd column and lastly on 4th column; all ascending.
I would start by defining a class to carry the information about one record and add overloads for operator>> and operator<< to help reading/writing records from/to streams. I'd probably add a helper to deal with the comma delimiter too.
First, the set of headers I've used:
#include <algorithm> // sort
#include <array> // array
#include <cstdint> // integer types
#include <filesystem> // filesystem
#include <fstream> // ifstream
#include <iostream> // cout
#include <iterator> // istream_iterator
#include <tuple> // tie
#include <vector> // vector
A simple delimiter helper could look like below. It discards (ignore()) the delimiter if it's in the stream or sets the failbit on the stream if the delimiter is not there.
template <char Char> struct delimiter {};
template <char Char> // read a delimiter
std::istream& operator>>(std::istream& is, const delimiter<Char>) {
if (is.peek() == Char) is.ignore();
else is.setstate(std::ios::failbit);
return is;
}
template <char Char> // write a delimiter
std::ostream& operator<<(std::ostream& os, const delimiter<Char>) {
return os.put(Char);
}
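As a quick illustration of the helper above, here is a minimal sketch that parses two comma-separated integers from a string. The function name parse_pair is hypothetical and only introduced for this example; the delimiter template is repeated so that the block compiles on its own.

```cpp
#include <istream>
#include <sstream>
#include <string>
#include <utility>

template <char Char> struct delimiter {};

template <char Char> // consume Char, or set failbit if something else comes next
std::istream& operator>>(std::istream& is, const delimiter<Char>) {
    if (is.peek() == Char) is.ignore();
    else is.setstate(std::ios::failbit);
    return is;
}

// Hypothetical helper: parse "a,b" into two ints.
// Returns false if the text does not have that shape.
bool parse_pair(const std::string& text, std::pair<int, int>& out) {
    std::istringstream is(text);
    delimiter<','> del;
    return static_cast<bool>(is >> out.first >> del >> out.second);
}
```

Calling parse_pair("12,34", p) should fill p with 12 and 34, while parse_pair("12;34", p) should return false, because the delimiter check sets the failbit as soon as it sees a character other than the comma.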
The actual record class can, with the information you've supplied, look like this:
struct record {
uint64_t start; // ns
uint64_t end; // ns
uint32_t job_id; // [0,1000000]
uint16_t task_id; // [0,10000]
uint32_t machine_id; // [0,30000000]
double cpu_time;
std::array<double, 20 - 6> unknown;
};
Reading such a record from a stream can then be done like this, using the delimiter class template (instantiated to use a comma and newline as delimiters):
std::istream& operator>>(std::istream& is, record& r) {
delimiter<','> del;
delimiter<'\n'> nl;
// first read the named fields
if (is >> r.start >> del >> r.end >> del >> r.job_id >> del >>
r.task_id >> del >> r.machine_id >> del >> r.cpu_time)
{
// then read the unnamed fields:
for (auto& unk : r.unknown) is >> del >> unk;
}
return is >> nl;
}
Writing a record is similarly done by:
std::ostream& operator<<(std::ostream& os, const record& r) {
delimiter<','> del;
delimiter<'\n'> nl;
os <<
r.start << del <<
r.end << del <<
r.job_id << del <<
r.task_id << del <<
r.machine_id << del <<
r.cpu_time;
for(auto&& unk : r.unknown) os << del << unk;
return os << nl;
}
Reading the whole file into memory, sorting it and then printing the result:
int main() {
std::filesystem::path filename = "C:/Users/1032359/cpp-projects/"
"Straggler Job Analyzer/src/part-00001-of-00500.csv";
std::vector<record> records;
// Reserve space for roughly file_size / 160 records, where 160 is the length
// of a record plus some extra bytes. Adjust the 160 below if your records
// are actually longer or shorter on average:
records.reserve(std::filesystem::file_size(filename) / 160);
// open the file
std::ifstream inputFile(filename);
// copy everything from the file into `records`
std::copy(std::istream_iterator<record>(inputFile),
std::istream_iterator<record>{},
std::back_inserter(records));
// sort on columns 5-3-4 (ascending)
auto sorter = [](const record& lhs, const record& rhs) {
return std::tie(lhs.machine_id, lhs.job_id, lhs.task_id) <
std::tie(rhs.machine_id, rhs.job_id, rhs.task_id);
};
std::sort(records.begin(), records.end(), sorter);
// display the result
for(auto& r : records) std::cout << r;
}
The above process takes ~2 minutes on my old computer with spinning disks. If this is too slow, I'd measure the time of the long running parts:
reserve
copy
sort
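One way to take those measurements is a small wrapper around std::chrono::steady_clock. This is only a sketch; time_ms is a hypothetical helper name, not something from the code above.

```cpp
#include <chrono>
#include <utility>

// Hypothetical helper: run `work` once and return the elapsed
// wall-clock time in milliseconds.
template <class Work>
long long time_ms(Work&& work) {
    const auto begin = std::chrono::steady_clock::now();
    std::forward<Work>(work)();
    const auto end = std::chrono::steady_clock::now();
    return std::chrono::duration_cast<std::chrono::milliseconds>(end - begin).count();
}
```

Each phase can then be wrapped in a lambda, for example time_ms([&] { std::sort(records.begin(), records.end(), sorter); }), and the three results printed side by side.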
Then, you can probably use that information to try to figure out where you need to improve it. For example, if sorting is a bit slow, it could help to use a std::vector<double> instead of a std::array<double, 20-6> to store the unnamed fields, since moving a record during the sort then moves the vector's internal pointers instead of copying 14 doubles:
struct record {
record() : unknown(20-6) {}
uint64_t start; // ns
uint64_t end; // ns
uint32_t job_id; // [0,1000000]
uint16_t task_id; // [0,10000]
uint32_t machine_id; // [0,30000000]
double cpu_time;
std::vector<double> unknown;
};
I would suggest a slightly different approach:
Do NOT parse the entire row, only extract fields that are used for sorting
Note that your stated ranges require small number of bits, that together fit in one 64-bit value:
30,000,000 - 25 bit
10,000 - 14 bit
1,000,000 - 20 bit
Save a "raw" source in your vector, so that you can write it out as needed.
Here is what I got:
#include <algorithm>
#include <chrono>
#include <cstdint>
#include <cstdlib>
#include <fstream>
#include <iostream>
#include <string>
#include <utility>
#include <vector>
struct Record {
uint64_t key;
std::string str;
Record(uint64_t key, std::string&& str)
: key(key)
, str(std::move(str))
{}
};
int main()
{
auto t1 = std::chrono::high_resolution_clock::now();
std::ifstream src("data.csv");
std::vector<Record> v;
std::string str;
uint64_t key(0);
while (src >> str)
{
size_t pos = str.find(',') + 1;
pos = str.find(',', pos) + 1;
char* p(nullptr);
uint64_t f3 = strtoull(&str[pos], &p, 10);
uint64_t f4 = strtoull(++p, &p, 10);
uint64_t f5 = strtoull(++p, &p, 10);
key = f5 << 34;
key |= f3 << 14;
key |= f4;
v.emplace_back(key, std::move(str));
}
std::sort(v.begin(), v.end(), [](const Record& a, const Record& b) {
return a.key < b.key;
});
auto t2 = std::chrono::high_resolution_clock::now();
std::cout << std::chrono::duration_cast<std::chrono::milliseconds>(t2 - t1).count() << std::endl;
std::ofstream out("out.csv");
for (const auto& r : v) {
out.write(r.str.c_str(), r.str.length());
out.write("\n", 1);
}
auto t3 = std::chrono::high_resolution_clock::now();
std::cout << std::chrono::duration_cast<std::chrono::milliseconds>(t3 - t2).count() << std::endl;
}
Of course, you can reserve space in your vector upfront to avoid reallocation.
I've generated a file with 18,000,000 records. My timing shows ~30 seconds for reading and sorting the file, and ~200 seconds to write the output.
UPDATE:
Replaced streaming with out.write(), reduced writing time from 200 seconds to 17!
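The key packing used in the loop above can be checked in isolation. The sketch below assumes the limits stated in the question (25, 20 and 14 bits); pack_key is a hypothetical name introduced for the example.

```cpp
#include <cstdint>

// Pack the three sort columns into one integer so that comparing two keys
// is equivalent to comparing (machine_id, job_id, task_id) in that order.
// Assumed limits, taken from the question:
//   machine_id <= 30,000,000 (25 bits)
//   job_id     <=  1,000,000 (20 bits)
//   task_id    <=     10,000 (14 bits)
std::uint64_t pack_key(std::uint64_t machine_id,
                       std::uint64_t job_id,
                       std::uint64_t task_id) {
    return (machine_id << 34) | (job_id << 14) | task_id;
}
```

If a value exceeds its stated limit, its bits spill into the neighbouring field and the ordering is no longer guaranteed, so it may be worth validating the input ranges before relying on the packed key.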
As an alternate way to solve this problem, I would suggest to not read all data in memory, but to use the minimum amount of RAM to sort the huge CSV file: a std::vector of line offsets.
The important thing is to understand the concept, not the precise implementation.
As the implementation only needs 8 bytes per line (in 64-bit mode), sorting the 3 GB data file requires only roughly 150 MB of RAM. The drawback is that the numbers on a line need to be parsed several times, roughly log2(17e6) = 24 times per line. However, I think that this overhead is partially compensated by the lower memory use and by not having to parse every number in the row.
#include <Windows.h>
#include <cstdint>
#include <cstdlib> // atoll
#include <cstring> // strchr
#include <vector>
#include <algorithm>
#include <array>
#include <fstream>
std::array<uint64_t, 5> readFirst5Numbers(const char* line)
{
std::array<uint64_t, 5> nbr;
for (int i = 0; i < 5; i++)
{
nbr[i] = atoll(line);
line = strchr(line, ',') + 1;
}
return nbr;
}
int main()
{
// 1. Map the input file in memory
const char* inputPath = "C:/Users/1032359/cpp-projects/Straggler Job Analyzer/src/part-00001-of-00500.csv";
HANDLE fileHandle = CreateFileA(inputPath, GENERIC_READ, FILE_SHARE_READ, NULL, OPEN_EXISTING, 0, NULL);
DWORD highsize;
DWORD lowsize = GetFileSize(fileHandle, &highsize);
HANDLE mappingHandle = CreateFileMapping(fileHandle, NULL, PAGE_READONLY, highsize, lowsize, NULL);
size_t fileSize = (size_t)lowsize | (size_t)highsize << 32;
const char* memoryAddr = (const char*)MapViewOfFile(mappingHandle, FILE_MAP_READ, 0, 0, fileSize);
// 2. Find the offset of the start of lines
std::vector<size_t> linesOffset;
linesOffset.push_back(0);
for (size_t i = 0; i < fileSize; i++)
if (memoryAddr[i] == '\n')
linesOffset.push_back(i + 1);
linesOffset.pop_back();
// 3. sort the offset according to some logic
std::sort(linesOffset.begin(), linesOffset.end(), [memoryAddr](const size_t& offset1, const size_t& offset2) {
std::array<uint64_t, 5> nbr1 = readFirst5Numbers(memoryAddr + offset1);
std::array<uint64_t, 5> nbr2 = readFirst5Numbers(memoryAddr + offset2);
if (nbr1[4] != nbr2[4])
return nbr1[4] < nbr2[4];
if (nbr1[2] != nbr2[2])
return nbr1[2] < nbr2[2];
return nbr1[3] < nbr2[3]; // last tie-breaker is the 4th column (task id)
});
// 4. output sorted array
const char* outputPath = "C:/Users/1032359/cpp-projects/Straggler Job Analyzer/output/part-00001-of-00500.csv";
std::ofstream outputFile;
outputFile.open(outputPath, std::ios::binary); // binary mode: the mapped bytes already contain the line endings
for (size_t offset : linesOffset)
{
const char* line = memoryAddr + offset;
size_t len = strchr(line, '\n') + 1 - line;
outputFile.write(line, len);
}
}
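The same concept can be sketched without the Windows-specific file mapping by sorting line offsets into an in-memory buffer. The names column and sorted_line_offsets are hypothetical, and the sketch assumes every line ends with '\n' and holds at least five numeric columns.

```cpp
#include <algorithm>
#include <cstddef>
#include <cstdlib>
#include <string>
#include <tuple>
#include <vector>

// Return a pointer to the start of 1-based column `n` of a comma-separated line.
// Assumes the line really has at least `n` columns.
static const char* column(const char* line, int n) {
    for (int i = 1; i < n; ++i) {
        while (*line != ',') ++line;
        ++line; // step over the comma
    }
    return line;
}

// Return the start offset of every line in `data`, sorted by
// column 5, then column 3, then column 4 (all ascending).
std::vector<std::size_t> sorted_line_offsets(const std::string& data) {
    std::vector<std::size_t> offsets;
    for (std::size_t pos = 0; pos < data.size(); ) {
        offsets.push_back(pos);
        pos = data.find('\n', pos);
        if (pos == std::string::npos) break;
        ++pos; // the next line starts right after the newline
    }
    // Parse only the three sort columns each time two lines are compared.
    auto key = [&data](std::size_t off) {
        const char* line = data.c_str() + off;
        return std::make_tuple(std::strtoull(column(line, 5), nullptr, 10),
                               std::strtoull(column(line, 3), nullptr, 10),
                               std::strtoull(column(line, 4), nullptr, 10));
    };
    std::sort(offsets.begin(), offsets.end(),
              [&key](std::size_t a, std::size_t b) { return key(a) < key(b); });
    return offsets;
}
```

Only the offsets are moved during the sort; the line data itself stays where it is, which is what keeps the memory overhead at one integer per line.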
Four digit numbers stored in a file are written in ASCII and separated by "Space". How do I read them as Integers?
Example file:
53545153 49575150 56485654 53565257 52555756 51534850 56575356 56505055 55525453
Here is what I tried:
ifstream infile("afile2.txt");
if (infile.is_open())
{
string str2;
char c;
while (!infile.eof())
{
getline(infile, str2, ' ');
for (std::string::iterator it = str2.begin(); it != str2.end(); ++it)
cout << (char)*it;
cout << " ";
}
}
infile.close();
In the code above, (char)*it picks up only a single digit at a time, but the ASCII codes are two-digit numbers, starting at 48.
Four digit numbers stored in a file are written in ASCII and separated by "Space", How do I read them as Integers. Example file: 53545153 49575150 56485654 53565257 52555756 51534850 56575356 56505055 55525453
Those look like 8 digit numbers.
To read a space-separated number from a file, simply use operator>> to read from the stream into an integer.
int value;
if (stream >> value) {
// Successfully read a number.
}
If you want to read all the values from a file, you can use a loop:
int value;
while (stream >> value) {
// Enter the body of the loop each time a number is read.
}
Note: Your usage of eof() is bad practice:
while (!infile.eof()) {
// If you enter here the file may be open and readable
// BUT there may be no data left in the file and thus the next
// attempt to read will fail if there is no data.
//
// This happens because the last successful read will read up-to
// but not past the EOF. So you have read all the data but not read
// past the EOF so eof() will return false.
}
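The difference can be seen by running both loop styles on the same input. This is a sketch with hypothetical function names; it assumes the input ends with a trailing newline, which is what leaves eof() false after the last successful read.

```cpp
#include <istream>
#include <sstream>
#include <vector>

// Anti-pattern: eof() is tested before the read, so the result of a
// failed read is still stored.
std::vector<int> read_with_eof_check(std::istream& is) {
    std::vector<int> values;
    while (!is.eof()) {
        int value = 0;
        is >> value;              // may fail, but the value is pushed anyway
        values.push_back(value);
    }
    return values;
}

// Preferred: the read itself is the loop condition.
std::vector<int> read_with_stream_check(std::istream& is) {
    std::vector<int> values;
    int value = 0;
    while (is >> value) values.push_back(value);
    return values;
}
```

For the input "1 2 3\n" the first function returns four elements (the last one comes from the failed read), while the second returns the three numbers that are actually in the stream.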
More Info
So how do we read two-digit numbers from groups of larger, space-separated 8-digit numbers?
Well, we want it to work like standard stream reading, so we still want to use operator>> to read from the stream. But none of the built-in types read two-digit numbers, so we need to define our own class that will read a two-digit number.
struct TwoDigit
{
int value; // store the result here
operator int() const {return value;} // Convert TwoDigit to integer
};
std::ostream& operator<<(std::ostream& str, TwoDigit const& data) {
str << data.value; // You can do something more complicated
// for printing but it's not the current question
// so I am just going to dump the value out.
return str;
}
std::istream& operator>>(std::istream& str, TwoDigit& data) {
char c1 = 'X';
char c2 = 'Y';
if (str >> c1 >> c2) {
// successfully read two characters from the stream.
// Note >> automatically drops white space (space return etc)
// so we don't need to worry about that.
if (('0' <= c1 && c1 <= '9') && ('0' <= c2 && c2 <= '9')) {
// We have all good data.
// So let us update the value.
data.value = ((c1 - '0') * 10) + (c2 - '0');
}
else {
// We have bad data read from the stream.
// So let's mark the stream as failed.
str.setstate(std::ios::failbit);
}
}
return str;
}
Now in your code you can simply read
TwoDigit data;
if (stream >> data) {
// Read a single two digit value correctly.
}
// or for a loop:
while(stream >> data) {
// Keep reading data from the stream.
// Each read will consume two digits.
}
// or if you want to fill a vector from a stream.
// (The braces on the iterators avoid the "most vexing parse": written with
// parentheses only, the line would declare a function named data.)
std::vector<TwoDigit> data(std::istream_iterator<TwoDigit>{stream},
std::istream_iterator<TwoDigit>{});
// You can even create a vector of int just as easily.
// Because the TwoDigit has an `operator int()` to convert to int.
std::vector<int> data(std::istream_iterator<TwoDigit>{stream},
std::istream_iterator<TwoDigit>{});
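Put together, the pieces above can be sketched as one self-contained block. The helper name read_two_digit_values is hypothetical, and the conversion operator is declared const so that the iterator's const reference can be converted to int.

```cpp
#include <istream>
#include <iterator>
#include <sstream>
#include <vector>

struct TwoDigit {
    int value = 0;
    operator int() const { return value; } // convert TwoDigit to int
};

// Read exactly two digit characters (leading whitespace is skipped).
std::istream& operator>>(std::istream& str, TwoDigit& data) {
    char c1 = 'X';
    char c2 = 'Y';
    if (str >> c1 >> c2) {
        if ('0' <= c1 && c1 <= '9' && '0' <= c2 && c2 <= '9')
            data.value = (c1 - '0') * 10 + (c2 - '0');
        else
            str.setstate(std::ios::failbit); // not two digits: mark the read as failed
    }
    return str;
}

// Hypothetical helper: decode every two-digit group found in the stream.
std::vector<int> read_two_digit_values(std::istream& stream) {
    return std::vector<int>(std::istream_iterator<TwoDigit>{stream},
                            std::istream_iterator<TwoDigit>{});
}
```

For the first group of the example file, "53545153", this should yield 53, 54, 51 and 53, which are the ASCII codes of the characters '5', '6', '3' and '5'.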
This could be an approach if I've understood the problem correctly.
#include <cmath>
#include <iostream>
#include <string>
#include <vector>
std::vector<int> conv(std::istream& is) {
std::vector<int> retval;
std::string group;
while(is >> group) { // read "53545153" for example
int mul =
static_cast<int>(std::pow(10, (group.size() / 2) - 1)); // start at 1000
int res = 0;
for(size_t i = 0; i < group.size(); i += 2, mul /= 10) {
// convert "53" to dec ASCII char 53 ('5') and then to an int 5 and
// multiply by 1000 (mul)
res += (((group[i] - '0') * 10 + (group[i + 1] - '0')) - '0') * mul;
}
retval.emplace_back(res); // store
}
return retval;
}
Testing the function:
#include <sstream>
int main() {
std::istringstream is(
"53545153 49575150 56485654 53565257 52555756 51534850 56575356 56505055 55525453");
auto r = conv(is);
for(int x : r) {
std::cout << x << "\n";
}
}
Output:
5635
1932
8086
5849
4798
3502
8958
8227
7465
I am trying to overload operator>> for a custom PriorityQueue class I've been writing, code is below:
/**
* @brief Overloaded stream extraction operator.
*
* Bitshift operator>>, i.e. extraction operator. Used to write data from an input stream
* into a targeted priority queue instance. The data is written into the queue in the format,
*
* \verbatim
[item1] + "\t" + [priority1] + "\n"
[item2] + "\t" + [priority2] + "\n"
...
* \endverbatim
*
* @todo Implement functionality for any generic Type and PriorityType.
* @warning Only works for primitives as template types currently!
* @param inStream Reference to input stream
* @param targetQueue Instance of priority queue to manipulate with extraction stream
* @return Reference to input stream containing target queue data
*/
template<typename Type, typename PriorityType> std::istream& operator>>(std::istream& inStream, PriorityQueue<Type, PriorityType>& targetQueue) {
// vector container for input storage
std::vector< std::pair<Type, PriorityType> > pairVec;
// cache to store line input from stream
std::string input;
std::getline(inStream, input);
if (typeid(inStream) == typeid(std::ifstream)) {
inStream.ignore(std::numeric_limits<std::streamsize>::max(), '\n');
}
// loop until empty line
while (!input.empty()) {
unsigned int first = 0;
// loop over input cache
for (unsigned int i = 0; i < input.size(); ++i) {
// if char at index i of cache is a tab, break from loop
if (input.at(i) == '\t')
break;
++first;
}
std::string data_str = input.substr(0, first);
// convert from std::string to reqd Type
Type data = atoi(data_str.c_str());
std::string priority_str = input.substr(first);
// convert from std::string to reqd PriorityType
PriorityType priority = atof(priority_str.c_str());
pairVec.push_back(std::make_pair(data, priority));
// get line from input stream and store in input string
std::getline(inStream, input);
}
// enqueue pairVec container into targetQueue
//targetQueue.enqueueWithPriority(pairVec);
return inStream;
}
This currently works for stdin or std::cin input; however, it doesn't work for fstream input: the very first getline always reads an empty line from the input, so the while loop never runs, and I can't seem to skip it (I tried inStream.ignore(), as you can see above, but this doesn't work).
Edit:
Currently I just want to get it working for file input ignoring the fact it only works for int data type and double priority type right now - these aren't relevant (and neither is the actual manipulation of the targetQueue object itself).
For the moment I'm just concerned with resolving the blank-line issue when trying to stream through file-input.
Example file to pass:
3 5.6
2 6.3
1 56.7
12 45.1
where the numbers on each line are \t separated.
Example testing:
#include "PriorityQueue.h"
#include <sstream>
#include <iostream>
#include <fstream>
int main(void) {
// create pq of MAX binary heap type
PriorityQueue<int, double> pq(MAX);
std::ifstream file("test.txt");
file >> pq;
std::cout << pq;
}
where "test.txt" is in the format of the example file above.
Edit: Simpler Example
Code:
#include <iostream>
#include <fstream>
#include <vector>
class Example {
public:
Example() {}
size_t getSize() const { return vec.size(); }
friend std::istream& operator>>(std::istream& is, Example& example);
private:
std::vector< std::pair<int, double> > vec;
};
std::istream& operator>>(std::istream& is, Example& example) {
int x;
double y;
while (is >> x >> y) {
std::cout << "in-loop" << std::endl;
example.vec.push_back(std::make_pair(x, y));
}
return is;
}
int main(void) {
Example example;
std::ifstream file("test.txt");
file >> example;
file.close();
std::cout << example.getSize() << std::endl;
return 0;
}
The operator is already overloaded -- and shall be overloaded -- for many types. Let those functions do their work:
template<typename Type, typename PriorityType>
std::istream& operator>>(std::istream& inStream, PriorityQueue<Type, PriorityType>& targetQueue)
{
std::vector< std::pair<Type, PriorityType> > pairVec;
Type data;
PriorityType priority;
while(inStream >> data >> priority)
pairVec.push_back(std::make_pair(data, priority));
targetQueue.enqueueWithPriority(pairVec);
return inStream;
}
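To see that the built-in extraction operators cope with the tab-separated file on their own, here is a small sketch using a string stream in place of the file. The function name read_pairs is hypothetical, and int and double stand in for Type and PriorityType.

```cpp
#include <istream>
#include <sstream>
#include <utility>
#include <vector>

// Hypothetical helper: read "item<TAB>priority" lines into (item, priority) pairs.
std::vector<std::pair<int, double>> read_pairs(std::istream& in) {
    std::vector<std::pair<int, double>> pairs;
    int item = 0;
    double priority = 0.0;
    // Formatted extraction skips leading whitespace, so tabs and
    // newlines need no special handling.
    while (in >> item >> priority)
        pairs.emplace_back(item, priority);
    return pairs;
}
```

Feeding it the example file content "3\t5.6\n2\t6.3\n1\t56.7\n12\t45.1\n" should produce four pairs, with no blank-line handling needed.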
I believe the values from the text file go into the vector 'com'. What I'm trying to do is recognize a direction, take the value next to it and store it in a tmp variable, then continue reading. If the direction occurs again, I combine the new value with tmp and overwrite the tmp variable; the final tmp value is then passed on to another class. If Repeat has been seen, the program looks at the last direction used and adds the repeat value to that direction. Any help will be appreciated; sorry for any confusion in the question.
file1.txt:
Forward 2
Left 20
Forward 1
Repeat 3
fileReader.cpp
#include <iostream>
#include <float>
#include <vector>
using namespace std;
int main()
{
ifstream file("text1.txt");
string word;
vector<float> com;
while (file >> word)
{
if(std::find(std.begin(com), std.end(com), Forward) != com.end())
{
}
if(std::find(std.begin(com), std.end(com), Jump) != com.end())
{
}
if(std::find(std.begin(com), std.end(com), Left) != com.end()))
{
}
if(std::find(std.begin(com), std.end(com), Right) != com.end()))
{
}
if ((std::find(std.begin(com), std.end(com), Repeat) != com.end()))
{
}
}
}
You can use a map for parsing your input:
#include <fstream>
#include <map>
#include <sstream>
#include <string>
int main()
{
std::ifstream file("text1.txt"); //open file
if(!file) { /* file could not be opened */ } //and check whether it can be used
std::map<std::string, float> com;
std::string lastCom; //last command for use in "Repeat"
std::string line;
while (std::getline(file, line)) //read a line at once until file end
{
if(line.empty()) continue; //and continue if it is empty
std::string tempCom;
float tempVal;
std::stringstream ss(line); //extract command and value
ss >> tempCom;
ss >> tempVal;
if(tempCom == "Repeat")
{
com[lastCom] += tempVal; //add the value to the last command
}
else
{
com[tempCom] += tempVal; //add the value to the current command
lastCom = tempCom; //and update last command
}
}
}
The code is untested.
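Since the snippet above is untested, here is a sketch of the same logic as a function that reads from any std::istream, which makes it easy to try on a string. The name accumulate_commands is hypothetical, and ignoring a Repeat that has no previous command is an assumption, as the question does not say what should happen in that case.

```cpp
#include <istream>
#include <map>
#include <sstream>
#include <string>

// Hypothetical helper: sum the values per command. "Repeat n" adds n to the
// most recent command. A "Repeat" with no previous command is ignored
// (an assumption, see above).
std::map<std::string, float> accumulate_commands(std::istream& in) {
    std::map<std::string, float> totals;
    std::string last; // last command seen, used by "Repeat"
    std::string line;
    while (std::getline(in, line)) {
        std::istringstream fields(line);
        std::string command;
        float value = 0;
        if (!(fields >> command >> value)) continue; // skip blank or malformed lines
        if (command == "Repeat") {
            if (!last.empty()) totals[last] += value;
        } else {
            totals[command] += value;
            last = command;
        }
    }
    return totals;
}
```

With the sample file from the question, Forward should end up at 6 (2 + 1, plus the repeated 3) and Left at 20.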
Well, I preferred to rewrite your code from scratch using a different container instead of filling in the blanks:
#include <iostream>
#include <map>
#include <fstream>
#include <string>
int main() {
std::map<std::string,int> moves{{"Forward", 0}, {"Left", 0}, {"Right", 0},
{"Jump", 0 }, {"Repeat", 0}};
auto iRepeat = moves.find("Repeat");
auto iold = moves.end();
std::ifstream iFile("text1.txt");
if ( !iFile.good() ) return 1;
std::string s;
int x; // There aren't floats in your file...
while ( iFile >> s >> x ) {
auto im = moves.find(s);
if ( im == iRepeat ) {
if ( iold == moves.end() ) continue; // there must be a move to repeat
iold->second += x;
} else if ( im != moves.end() ){
im->second += x; // update the move
iold = im;
}
}
iFile.close();
for ( auto i = moves.begin(); i != moves.end(); i++ ) {
if ( i != iRepeat ) // shows only the moves
std::cout << i->first << ": " << i->second << std::endl;
}
return 0;
}
The output is:
Forward: 6
Jump: 0
Left: 20
Right: 0
I'm looking for an analog of scanf("%1d", &sequence) for std::cin >> sequence.
For example:
for ( ; scanf("%1d", &sequence) == 1; ) {
printf("%d ", sequence);
}
stdin: 5341235
stdout: 5 3 4 1 2 3 5
How does it work in C++ ?!
for ( ; std::cin >> *some_magic* sequence; ) {
std::cout << sequence << " ";
}
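You can do this if you want (the sequence variable must be of type char):
for ( ; std::cin.read(&sequence,1); ) {
// char - char yields an int, so the digit's numeric value is printed.
// Unlike scanf("%1d"), read() does not skip whitespace such as a trailing newline.
std::cout << (sequence - '0') << " ";
}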
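A variation on the same idea, sketched under the assumption that the input consists of digits possibly separated by whitespace: formatted extraction into a char skips whitespace, which is closer to what scanf("%1d") does. The name read_single_digits is hypothetical.

```cpp
#include <cctype>
#include <istream>
#include <sstream>
#include <vector>

// Hypothetical helper: read one digit at a time, similar to scanf("%1d").
// Stops at the first non-digit character or at the end of the input.
std::vector<int> read_single_digits(std::istream& in) {
    std::vector<int> digits;
    char c;
    while (in >> c) { // formatted char input skips leading whitespace
        if (!std::isdigit(static_cast<unsigned char>(c))) break;
        digits.push_back(c - '0'); // convert the character to its numeric value
    }
    return digits;
}
```

For the input 5341235 from the question this should collect 5 3 4 1 2 3 5. Unlike scanf("%1d"), the sketch does not handle a sign character.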
With respect to input parsing there are a number of features unfortunately missing from IOStreams which are present for scanf(). Setting a field width for numeric types is one of them (another one is matching strings in inputs). Assuming you want to stay with formatted input, one way to deal with it is to create a filtering stream buffer which injects a space character after a given number of characters.
Another approach consists of writing a custom std::num_get<char> facet, to imbue() it into the current stream, and then just set up width. Instead of injecting spaces the actual character parsing would observe if either the end of the stream is reached or the number of allowed characters is exceeded. The corresponding code to use this facet would set up a custom std::locale but otherwise look like one would expect:
int main() {
std::istringstream in("1234567890123456789");
std::locale loc(std::locale(), new width_num_get);
in.imbue(loc);
int w(0);
for (int value(0); in >> std::setw(++w) >> value; ) {
std::cout << "value=" << value << "\n";
}
}
Here is a somewhat naive implementation of a corresponding std::num_get<char> facet which just collects the appropriate digits (assuming base 10) and then calls std::stol() to convert the value. It can be made more flexible and more efficient, but you get the picture:
#include <iostream>
#include <streambuf>
#include <sstream>
#include <locale>
#include <string>
#include <iomanip>
#include <cctype>
struct width_num_get
: std::num_get<char> {
auto do_get(iter_type it, iter_type end, std::ios_base& fmt,
std::ios_base::iostate& err, long& value) const
-> iter_type override {
int width(fmt.width(0)), count(0);
if (width == 0) {
width = -1;
}
std::string digits;
if (it != end && (*it == '-' || *it == '+')) {
digits.push_back(*it++);
++count;
}
while (it != end && count != width && std::isdigit(static_cast<unsigned char>(*it))) {
digits.push_back(*it);
++it;
++count;
}
try { value = std::stol(digits); }
catch (...) { err |= std::ios_base::failbit; } // should probably distinguish overflow
return it;
}
};
The first described approach could use code like this for reading integers with increasing width (I'm using different width to show that it can flexibly be set):
int main() {
std::istringstream in("1234567890123456789");
int w(0);
for (int value(0); in >> fw(++w) >> value; ) {
std::cout << "value=" << value << "\n";
}
}
Of course, the entire magic is in the little fw(), which is a custom manipulator: it installs a filtering stream buffer if the currently used stream buffer isn't of the appropriate type and sets the number of characters after which a space should be injected. The filtering stream buffer reads individual characters and simply injects a space after the corresponding number of characters. The code could be something like this (clean-up once the stream is done is handled by the registered callback):
#include <iostream>
#include <streambuf>
#include <sstream>
class fieldbuf
: public std::streambuf {
std::streambuf* sbuf;
int width;
char buffer[1];
int underflow() {
if (this->width == 0) {
buffer[0] = ' ';
this->width = -1;
}
else {
int c = this->sbuf->sbumpc(); // read the current character and advance; snextc() would skip one
if (c == std::char_traits<char>::eof()) {
return c;
}
buffer[0] = std::char_traits<char>::to_char_type(c);
if (0 < this->width) {
--this->width;
}
}
this->setg(buffer, buffer, buffer + 1);
return std::char_traits<char>::to_int_type(buffer[0]);
}
public:
fieldbuf(std::streambuf* sbuf): sbuf(sbuf), width(-1) {}
void setwidth(int width) { this->width = width; }
};
struct fw {
int width;
fw(int width): width(width) {}
};
std::istream& operator>> (std::istream& in, fw const& width) {
fieldbuf* fbuf(dynamic_cast<fieldbuf*>(in.rdbuf()));
if (!fbuf) {
fbuf = new fieldbuf(in.rdbuf());
in.rdbuf(fbuf);
static int index = std::ios_base::xalloc();
in.pword(index) = fbuf;
in.register_callback([](std::ios_base::event ev, std::ios_base& stream, int index){
if (ev == std::ios_base::copyfmt_event) {
stream.pword(index) = 0;
}
else if (ev == std::ios_base::erase_event) {
delete static_cast<fieldbuf*>(stream.pword(index));
stream.pword(index) = 0;
}
}, index);
}
fbuf->setwidth(width.width);
return in;
}