I am trying to write a parser to read a large text file in C++. Similar Python code using the readtable method is approximately 7 to 8 times faster.
I wonder why it runs so slowly in C++. Most of the time is spent using istringstream to parse each line into separate table numbers. It would be great if someone could point out an issue with the code or suggest an alternative to istringstream. The code is below:
'''
#include <fstream>
#include <iostream>
#include <string>
#include <sstream>
#include <vector>
#include <algorithm>
#include <chrono>
using namespace std::chrono;
int main()
{
auto start = high_resolution_clock::now();
std::ifstream inf{ "/Users/***/some.bed" };
std::istringstream iss;
int aprox_nlines = 7000000;
// Plain vectors avoid the manual new (which was never deleted)
std::vector<int> start_v;
start_v.reserve(aprox_nlines);
std::vector<int> end_v;
end_v.reserve(aprox_nlines);
// If we couldn't open the input file stream for reading
if (!inf)
{
// Print an error and exit
std::cerr << "Uh oh, File could not be opened for reading!" << std::endl;
return 1;
}
int count=0;
std::string line;
int sstart;
int end_val;
std::string val;
if (inf.is_open())
{
while (getline(inf, line))
{
count += 1;
iss.clear(); // reset eofbit/failbit left over from the previous line
iss.str(line);
iss >> val;
iss >> sstart;
start_v.push_back(sstart);
iss >> end_val;
end_v.push_back(end_val);
}
std::cout << count<<"\n";
inf.close();
}
auto stop = high_resolution_clock::now();
auto duration = duration_cast<microseconds>(stop - start);
std::cout << "Time taken by function: " << duration.count() << " microseconds" <<"\n";
return 0;
}
'''
It seems that using FILE * with fopen() runs much better: it is around 10 times faster than istringstream, and 33% faster than Python's built-in readtable function.
'''
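// Needs <cstdio> and <cstdlib>; getline() here is the POSIX function, not std::getline
FILE * ifile = fopen("*/N.bed", "r");
if (!ifile) { perror("fopen"); return 1; }
char * nline = nullptr; // POSIX getline() allocates and grows this buffer with malloc/realloc
size_t linesz = 0;
char T[50];
int sn, en;
unsigned int i = 0;
while (getline(&nline, &linesz, ifile) > 0) {
    i++;
    // %49s keeps the first field from overflowing T; skip lines that do not parse
    if (sscanf(nline, "%49s %d %d", T, &sn, &en) != 3) continue;
    start_v.push_back(sn);
    end_v.push_back(en);
}
free(nline); // the buffer came from malloc, so release it with free
fclose(ifile);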
'''
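Another possible alternative to istringstream is std::from_chars. The sketch below assumes a C++17 compiler and that the fields are separated by spaces or tabs. The helper name parseBedLine is made up for illustration, and no timing was measured for it here, so any speedup is untested.

```cpp
#include <charconv>
#include <string_view>
#include <system_error>

// Hypothetical helper: parses "<name> <start> <end> ..." where fields are
// separated by spaces or tabs. Returns false if the line is malformed.
bool parseBedLine(std::string_view line, int& start, int& end)
{
    const char* p = line.data();
    const char* const last = p + line.size();
    auto isSep = [](char c) { return c == ' ' || c == '\t'; };

    // Skip the first (text) field, then the separators after it.
    while (p != last && !isSep(*p)) ++p;
    while (p != last && isSep(*p)) ++p;

    // Parse the start coordinate; from_chars does not skip whitespace itself.
    auto r1 = std::from_chars(p, last, start);
    if (r1.ec != std::errc()) return false;
    p = r1.ptr;
    while (p != last && isSep(*p)) ++p;

    // Parse the end coordinate.
    auto r2 = std::from_chars(p, last, end);
    return r2.ec == std::errc();
}
```

Inside the read loop this would replace the three stream extractions: call parseBedLine(line, sstart, end_val) and push the two values only when it returns true.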
Hi, I have a text file with the following contents:
15 7 152 3078
178 352 1 57
What I want to do is read the ints from the first line, sum them, and store the result in an integer, then do the same for the second line with another int. How can I do that in C++? Thanks for your help.
You can use a stringstream to convert a string into integers, and the accumulate algorithm to sum a vector of integers. You can pass a filename as the first argument to the program; by default the program uses input.txt.
Here is a complete program to demonstrate this.
#include <iostream>
#include <fstream>
#include <sstream>
#include <numeric> // for accumulate
#include <string>
#include <vector>
int main(int argc, char *argv[]) {
std::string filename{"input.txt"};
if(argc > 1) {
filename = argv[1];
}
// open the input file
std::ifstream inputFile(filename);
if(!inputFile.is_open()) {
std::cerr << "Unable to open " << filename << std::endl;
return 1;
}
std::string line;
// read the file line by line
while(getline(inputFile, line)) {
if(line.empty()) continue;
std::stringstream ss(line);
std::vector<int> v;
int x;
// extract the content as integer from line
while(ss >> x) {
v.push_back(x);
}
// add them all
auto total = std::accumulate(v.begin(), v.end(), 0);
std::cout << total << std::endl;
}
}
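The per-line part of the program above can also be written without the intermediate vector. This is only a sketch: the function name sumLine is hypothetical, and it assumes every token on the line is an integer (reading stops at the first token that is not).

```cpp
#include <iterator>
#include <numeric>
#include <sstream>
#include <string>

// Hypothetical helper: sums all whitespace-separated integers on one line.
int sumLine(const std::string& line)
{
    std::istringstream ss(line);
    // istream_iterator<int> extracts ints until extraction fails (end of
    // the line or a non-numeric token), so no vector is needed.
    return std::accumulate(std::istream_iterator<int>(ss),
                           std::istream_iterator<int>(), 0);
}
```

For the two sample lines from the question, sumLine returns 3252 and 588.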
As in Aamir's answer, but with separate listing of sums and numbers per line. Maybe that helps too.
#include <iostream>
#include <fstream>
#include <sstream>
#include <vector>
int main(int argc, char *argv[]) {
std::string filename{"input.txt"};
if(argc > 1) {
filename = argv[1];
}
// open the input file
std::ifstream inputFile(filename);
if(!inputFile.is_open()) {
std::cerr << "Unable to open " << filename << std::endl;
return 1;
}
std::vector<int> v_numberSum;
std::vector<int> v_numbersPerLine;
std::string line;
// read the file line by line
int i = 0;
while(getline(inputFile, line)) {
if(line.empty()) continue;
v_numberSum.push_back(0);
v_numbersPerLine.push_back(0);
std::stringstream f(line);
std::string s;
int cnt = 0;
while (getline(f, s, ' ')) {
v_numberSum[i] += std::stoi(s);
cnt++;
}
v_numbersPerLine[i] = cnt;
i++;
}
int j = 0;
for (auto intItem: v_numberSum){
std::cout << "sum"<<j<<": " << intItem << " numbers per line: " << v_numbersPerLine[j] << std::endl;
j++;
}
}
I'm trying to read and parse my CSV files in C++ and ran into an error.
The CSV has 1-1000 rows and always 8 columns.
Generally, what I would like to do is read the CSV and output only the lines that match a filter criterion. For example, column 2 is a timestamp, and I only want rows within a specific time range.
My problem is that my program cuts off some lines.
While the data is in the string variable record, it is not cut off. As soon as I push it into the map of int/vector, it is cut off. Am I doing something wrong here?
Could someone help me identify what the problem really is, or maybe even suggest a better way to do this?
#include <iostream>
#include <fstream>
#include <string>
#include <vector>
#include <sstream>
#include <map>
#include "csv.h"
using std::cout; using std::cerr;
using std::endl; using std::string;
using std::ifstream; using std::ostringstream;
using std::istringstream;
string readFileIntoString(const string& path) {
auto ss = ostringstream{};
ifstream input_file(path);
if (!input_file.is_open()) {
cerr << "Could not open the file - '"
<< path << "'" << endl;
exit(EXIT_FAILURE);
}
ss << input_file.rdbuf();
return ss.str();
}
int main()
{
int filterID = 3;
int filterIDIndex = filterID;
string filter = "System";
/*Filter ID's:
0 Record ID
1 TimeStamp
2 UTC
3 UserID
4 ObjectID
5 Description
6 Comment
7 Checksum
*/
string filename("C:/Storage Card SD/Audit.csv");
string file_contents;
std::map<int, std::vector<string>> csv_contents;
char delimiter = ',';
file_contents = readFileIntoString(filename);
istringstream sstream(file_contents);
std::vector<string> items;
string record;
int counter = 0;
while (std::getline(sstream, record)) {
istringstream line(record);
while (std::getline(line, record, delimiter)) {
items.push_back(record);
cout << record << endl;
}
csv_contents[counter] = items;
//cout << csv_contents[counter][0] << endl;
items.clear();
counter += 1;
}
return 0;
}
I can't see a reason why your data is being cropped, but I have refactored your code slightly. With this version it might be easier for you to debug the problem, if it doesn't just disappear on its own.
int main()
{
string path("D:/Audit.csv");
ifstream input_file(path);
if (!input_file.is_open())
{
cerr << "Could not open the file - '" << path << "'" << endl;
exit(EXIT_FAILURE);
}
std::map<int, std::vector<string>> csv_contents;
std::vector<string> items;
string record;
char delimiter = ';';
int counter = 0;
while (std::getline(input_file, record))
{
istringstream line(record);
while (std::getline(line, record, delimiter))
{
items.push_back(record);
cout << record << endl;
}
csv_contents[counter] = items;
items.clear();
++counter;
}
return counter;
}
I have tried your code and (after fixing the delimiter) had no problems, but I only had three lines of data, so if it is a memory issue it would have been unlikely to show.
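For the filtering part of the question, a minimal sketch is shown below. The helper names splitRecord and matchesFilter are hypothetical, and quoted fields that contain the delimiter are not handled. The helper also strips a trailing '\r', which std::getline leaves in place for files with Windows line endings and which can make console output look truncated. Whether that is the cause of the cut-off lines here is only a guess.

```cpp
#include <cstddef>
#include <sstream>
#include <string>
#include <vector>

// Hypothetical helper: splits one CSV record into fields. It does not
// handle quoted fields that contain the delimiter.
std::vector<std::string> splitRecord(std::string record, char delimiter)
{
    // Drop a trailing carriage return left over from a "\r\n" line ending.
    if (!record.empty() && record.back() == '\r') record.pop_back();

    std::vector<std::string> fields;
    std::istringstream line(record);
    std::string field;
    while (std::getline(line, field, delimiter)) fields.push_back(field);
    return fields;
}

// Returns true when the record has the column and it equals the filter.
bool matchesFilter(const std::vector<std::string>& fields,
                   std::size_t column, const std::string& filter)
{
    return column < fields.size() && fields[column] == filter;
}
```

In the read loop, a row would be stored or printed only when matchesFilter(splitRecord(record, delimiter), filterIDIndex, filter) is true.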
Hello, I'm having a problem dividing two doubles in C++. In a basic C++ program it works fine, for example:
double DfirstNumber = 4.3;
double DsecondNumber = 2.0;
double DthirdNumber = DfirstNumber/DsecondNumber;
std::cout << DthirdNumber;
but not on the code below:
#include <iostream>
#include <fstream>
#include <string>
#include <Windows.h>
#include <regex>
std::string getLastLine(std::ifstream& in)
{
std::string line;
while (in >> std::ws && std::getline(in, line)) // skip empty lines
;
return line;
}
int main()
{
double DfirstNumber;
double DsecondNumber;
while(true){
std::ifstream file("C:\\Users\\Admin\\Documents\\myfile.txt");
if (file)
{
std::string line = getLastLine(file);
Sleep(1000);
try {
std::regex re("\\d*\\.\\d*");
std::sregex_iterator next(line.begin(), line.end(), re);
int i = 0;
std::string firstNumber;
std::string secondNumber;
while (i < 3) {
std::smatch match = *next;
if (i == 1) {
firstNumber = match.str();
DfirstNumber = std::stof(firstNumber);
}
if (i == 2) {
secondNumber = match.str();
DsecondNumber = std::stof(secondNumber);
}
next++;
i++;
}
}
catch (std::regex_error&) {
std::cout << "regex error";
}
double DthirdNumber = DsecondNumber / DfirstNumber;
std::cout << DthirdNumber << std::endl;
}
else{
std::cout << "Can't Open the file.\n";
}
}
}
A new line is appended to myfile.txt every second, so I have to read the last line of the file, then use a regular expression to extract the desired data and store it in C++ variables.
This is how myfile.txt looks
Hello Name Name0.00042Surname NameSurname Name0.00042$100.03
Hello Name Name0.00143Surname NameSurname Name0.00143$100.53
Hello Name Name0.00342Surname NameSurname Name0.00342$100.32
..............................................^1stNr^.^2ndNr^
... and another program just continues to extract lines like these every second!
Could anyone explain to me why this is happening? If I try to divide, for example,
DfirstNumber = 407.33
DsecondNumber = 0.015982
I'm not getting 25486.79764735327 but I'm getting 25489.8
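Two details of the code above are worth checking. std::stof converts to float rather than double, and the loop dereferences the regex iterator three times without testing for the end of the matches. The sketch below collects every match first and converts with std::stod. The helper name extractNumbers is hypothetical, and it assumes each number has digits on both sides of the decimal point. Whether this accounts for the 25489.8 result is not established here.

```cpp
#include <regex>
#include <string>
#include <vector>

// Hypothetical helper: returns every decimal number (digits '.' digits)
// found in the line, converted with std::stod to keep double precision.
std::vector<double> extractNumbers(const std::string& line)
{
    static const std::regex re("\\d+\\.\\d+");
    std::vector<double> numbers;
    for (std::sregex_iterator it(line.begin(), line.end(), re), end;
         it != end; ++it)
    {
        numbers.push_back(std::stod(it->str()));
    }
    return numbers;
}
```

The caller can then check that the vector holds at least three values before dividing. Note also that std::cout prints six significant digits by default, so std::setprecision from <iomanip> is needed to see more of the quotient.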
I'm a beginner in C++ and am required to write a C++ program to read and print a CSV file like this.
DateTime,value1,value2
12/07/16 13:00,3.60,50000
14/07/16 20:00,4.55,3000
May I know how I can proceed with the programming?
So far I have managed to get only the date, using simple multimap code.
I spent some time making an almost exact solution for you (read the note at the end).
I assume that your program is a console application that receives the original csv-file name as a command line argument.
So see the following code and make required changes if you like:
#include <iostream>
#include <fstream>
#include <sstream>
#include <vector>
#include <map>
#include <string>
std::vector<std::string> getLineFromCSV(std::istream& str, std::map<int, int>& widthMap)
{
std::vector<std::string> result;
std::string line;
std::getline(str, line);
std::stringstream lineStream(line);
std::string cell;
int cellCnt = 0;
while (std::getline(lineStream, cell, ','))
{
result.push_back(cell);
int width = cell.length();
if (width > widthMap[cellCnt])
widthMap[cellCnt] = width;
cellCnt++;
}
return result;
}
int main(int argc, char * argv[])
{
std::vector<std::vector<std::string>> result; // table with data
std::map<int, int> columnWidths; // map to store maximum length (value) of a string in the column (key)
std::ifstream inpfile;
// check file name in the argv[1]
if (argc > 1)
{
inpfile.open(argv[1]);
if (!inpfile.is_open())
{
std::cout << "File " << argv[1] << " cannot be read!" << std::endl;
return 1;
}
}
else
{
std::cout << "Run program as: " << argv[0] << " input_file.csv" << std::endl;
return 2;
}
// read from file stream line by line
while (inpfile.good())
{
result.push_back(getLineFromCSV(inpfile, columnWidths));
}
// close the file
inpfile.close();
// output the results
std::cout << "Content of the file:" << std::endl;
for (std::vector<std::vector<std::string>>::iterator i = result.begin(); i != result.end(); i++)
{
int rawLen = i->size();
for (int j = 0; j < rawLen; j++)
{
std::cout.width(columnWidths[j]);
std::cout << (*i)[j] << " | ";
}
std::cout << std::endl;
}
return 0;
}
NOTE: Your task is just to replace the vector of vectors (type std::vector<std::vector<std::string>>, used for result) with a multimap (I hope you understand what the key should be in your solution).
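One way the multimap variant from the note could look is sketched below. It assumes the DateTime column is the key, the remaining columns are the values, and the first line of the input is a header. The names CsvTable and loadCsvByDate are made up for this example.

```cpp
#include <istream>
#include <map>
#include <sstream>
#include <string>
#include <vector>

// Assumed layout: first column is the DateTime key, the rest are values.
using CsvTable = std::multimap<std::string, std::vector<std::string>>;

// Hypothetical helper: reads the whole stream into a multimap.
CsvTable loadCsvByDate(std::istream& in)
{
    CsvTable table;
    std::string line;
    std::getline(in, line); // skip the header row
    while (std::getline(in, line))
    {
        if (line.empty()) continue;
        std::stringstream lineStream(line);
        std::string key, cell;
        std::getline(lineStream, key, ','); // DateTime column
        std::vector<std::string> values;
        while (std::getline(lineStream, cell, ',')) values.push_back(cell);
        table.emplace(key, values);
    }
    return table;
}
```

A multimap keeps rows with the same DateTime as separate entries, and iterating over it visits the keys in lexicographic string order, which is not chronological order for dates written as dd/mm/yy.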
Of course, there are lots of possible solutions for that task (if you open this question and look through the answers you will understand this).
First of all, I suggest considering the following example and trying to do your task in the simplest way:
#include <iostream>
#include <sstream>
#include <vector>
#include <string>
using namespace std;
int main()
{
string str = "12/07/16 13:00,3.60,50000";
stringstream ss(str);
vector<string> singleRow;
char ch;
string s = "";
while (ss.get(ch)) // get() keeps spaces; operator>> would skip the one inside "12/07/16 13:00"
{
s += ch;
if (ss.peek() == ',' || ss.peek() == EOF )
{
ss.ignore();
singleRow.push_back(s);
s.clear();
}
}
for (vector<string>::iterator i = singleRow.begin(); i != singleRow.end(); i++)
cout << *i << endl;
return 0;
}
I think it can be useful for you.
Code(main.cpp) (C++):
#include <string>
#include <sstream>
#include <stdio.h>
#include <stdlib.h>
#include <ctime>
//general vars
std::ofstream ofs;
std::ifstream ifs;
std::stringstream ss;
//spamFiles vars
std::string defPath;
int defAmt;
void spamFiles(std::string paramPath);
int main(int argc, const char * argv[])
{
srand(time_t(NULL));
std::cout << "Enter the amount of files: ";
std::cin >> ::defAmt;
std::cout << "Now enter the target path: ";
std::cin >> ::defPath;
::spamFiles(::defPath);
std::cout << defAmt << " files were created." << std::endl;
return 0;
}
void spamFiles (std::string paramPath){
//system("open -a Terminal .");
for(int i = 0; i < ::defAmt; i++){
std::string tempS;
int ranNum = rand() % 501;
ss << ranNum;
std::string ssResult = ss.str();
std::string finalPath = ::defPath + ssResult + ".txt";
ifs.open(finalPath);
if(ifs.good()){
finalPath += "dupe.txt";
while(ifs.good()){
finalPath += "dupe.txt";
ifs.open(finalPath);
}
}
ofs.open(finalPath);
ofs << "";
ofs.close();
ss.str(std::string());
}
return;
}
My problem is the following.
Whenever I run this and enter, let's say, 53 as the amount, it never creates the full number of files. The shortfall always scales with the amount.
Here's an example.
Defined Amount: 300 -> What I get: 240
Defined Amount: 20 -> What I get: 15
Defined Amount: 600 -> What I get: 450
Thanks in advance.
Based on the logic of your code, you are creating a file if your ifstream object is not 'good()'. If some files aren't being created, then the error lies here.
With some digging, you'll find that before C++11 the constructor and open() of an ifstream object did not take a std::string, only a const char *.
Adding a c_str() to your 'finalPath' variable should take care of this issue.
Some things to note:
You've forgotten to include fstream and iostream.
When digging into problems like this, don't use random numbers as your first test case. It was easier for me to replicate your issue by just trying to create files in numerical order.
Also, don't forget to 'close()' your ifstreams!
My adaptation of the code:
#include <string>
#include <sstream>
#include <stdio.h>
#include <stdlib.h>
#include <ctime>
#include <fstream>
#include <iostream>
//general vars
std::ofstream ofs;
std::ifstream ifs;
std::stringstream ss;
//spamFiles vars
std::string defPath;
int defAmt;
void spamFiles(std::string paramPath);
int main(int argc, const char * argv[])
{
srand(static_cast<unsigned>(time(NULL))); // time_t(NULL) is just 0, so seed from the clock instead
std::cout << "Enter the amount of files: ";
std::cin >> ::defAmt;
std::cout << "Now enter the target path: ";
std::cin >> ::defPath;
::spamFiles(::defPath);
std::cout << defAmt << " files were created." << std::endl;
return 0;
}
void spamFiles (std::string paramPath){
//system("open -a Terminal .");
for(int i = 0; i < ::defAmt; i++){
std::string tempS;
int ranNum = rand() % 501;
ss << ranNum;
std::string ssResult = ss.str();
std::string finalPath = ::defPath + ssResult + ".txt";
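ifs.open(finalPath.c_str());
while(ifs.good()){
ifs.close(); // a stream must be closed before it can be reopened
finalPath += "dupe.txt";
ifs.open(finalPath.c_str());
}
ifs.clear(); // reset the failbit left by the last failed open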
std::cout << finalPath << std::endl;
ofs.open(finalPath.c_str());
ofs << "";
ofs.close();
ss.str(std::string());
}
return;
}
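The existence check in the code above can also be done with a stream that is local to a helper function, so there is no shared ifstream whose open or failed state carries over between iterations. This is a sketch only: the names fileExists and uniquePath are hypothetical, and the "dupe.txt" suffix simply mirrors the naming scheme used above.

```cpp
#include <fstream>
#include <string>

// Hypothetical helper: true if the file can be opened for reading.
// The stream is local, so it is closed when the function returns.
bool fileExists(const std::string& path)
{
    std::ifstream probe(path.c_str());
    return probe.is_open();
}

// Hypothetical helper: appends "dupe.txt" until the name is unused.
std::string uniquePath(const std::string& stem)
{
    std::string candidate = stem + ".txt";
    while (fileExists(candidate)) candidate += "dupe.txt";
    return candidate;
}
```

Inside the loop, finalPath would then be computed as uniquePath(::defPath + ssResult) before opening the ofstream.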