I'm looking to read from std::in with a syntax as below (it is always int, int, int, char[]/str). What would be the fastest way to parse the data into an int array[3] and either a string or char array.
#NumberOfLines(i.e.10000000)
1,2,2,'abc'
2,2,2,'abcd'
1,2,3,'ab'
...1M+ to 10M+ more lines, always in the form of (int,int,int,str)
At the moment, I'm doing something along the lines of.
//unsync stdio
std::ios_base::sync_with_stdio (false);
std::cin.tie(NULL);
//read from cin
for(i in amount of lines in stdin){
getline(cin,str);
if(i<3){
int commaindex = str.find(',');
string substring = str.substr(0,commaindex);
array[i]=atoi(substring.c_str());
str.erase(0,commaindex+1)
}else{
label = str;
}
//assign array and label to other stuff and do other stuff, repeat
}
I'm quite new to C++ and recently learned profiling with Visual Studio however not the best at interpreting it. IO takes up 68.2% and kernel takes 15.8% of CPU usage. getline() covers 35.66% of the elapsed inclusive time.
Is there any way I can do something similar to reading large chunks at once to avoid calling getline() as much? I've been told fgets() is much faster, however, I'm unsure of how to use it when I cannot predict the number of characters to specify.
I've attempted to use scanf as follows, however it was slower than getline method. Also have used `stringstreams, but that was incredibly slow.
scanf("%i,%i,%i,%s",&array[0],&array[1],&array[2],str);
Also if it matters, it is run on a server with low memory available. I think reading the entire input to buffer would not be viable?
Thanks!
Update: Using #ted-lyngmo approach, gathered the results below.
time wc datafile
real 4m53.506s
user 4m14.219s
sys 0m36.781s
time ./a.out < datafile
real 2m50.657s
user 1m55.469s
sys 0m54.422s
time ./a.out datafile
real 2m40.367s
user 1m53.523s
sys 0m53.234s
You could use std::from_chars (and reserve() the approximate amount of lines you have in the file, if you store the values in a vector for example). I also suggest adding support for reading directly from the file. Reading from a file opened by the program is (at least for me) faster than reading from std::cin (even with sync_with_stdio(false)).
Example:
#include <algorithm> // std::for_each
#include <cctype> // std::isspace
#include <charconv> // std::from_chars
#include <cstdio> // std::perror
#include <fstream>
#include <iostream>
#include <iterator> // std::istream_iterator
#include <limits> // std::numeric_limits
struct foo {
int a[3];
std::string s;
};
std::istream& operator>>(std::istream& is, foo& f) {
if(std::getline(is, f.s)) {
std::from_chars_result fcr{f.s.data(), {}};
const char* end = f.s.data() + f.s.size();
// extract the numbers
for(unsigned i = 0; i < 3 && fcr.ptr < end; ++i) {
fcr = std::from_chars(fcr.ptr, end, f.a[i]);
if(fcr.ec != std::errc{}) {
is.setstate(std::ios::failbit);
return is;
}
// find next non-whitespace
do ++fcr.ptr;
while(fcr.ptr < end &&
std::isspace(static_cast<unsigned char>(*fcr.ptr)));
}
// extract the string
if(++fcr.ptr < end)
f.s = std::string(fcr.ptr, end - 1);
else
is.setstate(std::ios::failbit);
}
return is;
}
std::ostream& operator<<(std::ostream& os, const foo& f) {
for(int i = 0; i < 3; ++i) {
os << f.a[i] << ',';
}
return os << '\'' << f.s << "'\n";
}
int main(int argc, char* argv[]) {
std::ifstream ifs;
if(argc >= 2) {
ifs.open(argv[1]); // if a filename is given as argument
if(!ifs) {
std::perror(argv[1]);
return 1;
}
} else {
std::ios_base::sync_with_stdio(false);
std::cin.tie(nullptr);
}
std::istream& is = argc >= 2 ? ifs : std::cin;
// ignore the first line - it's of no use in this demo
is.ignore(std::numeric_limits<std::streamsize>::max(), '\n');
// read all `foo`s from the stream
std::uintmax_t co = 0;
std::for_each(std::istream_iterator<foo>(is), std::istream_iterator<foo>(),
[&co](const foo& f) {
// Process each foo here
// Just counting them for demo purposes:
++co;
});
std::cout << co << '\n';
}
My test runs on a file with 1'000'000'000 lines with content looking like below:
2,2,2,'abcd'
2, 2,2,'abcd'
2, 2, 2,'abcd'
2, 2, 2, 'abcd'
Unix time wc datafile
1000000000 2500000000 14500000000 datafile
real 1m53.440s
user 1m48.001s
sys 0m3.215s
time ./my_from_chars_prog datafile
1000000000
real 1m43.471s
user 1m28.247s
sys 0m5.622s
From this comparison I think one can see that my_from_chars_prog is able to successfully parse all entries pretty fast. It was consistently faster at doing so than wc - a standard unix tool whos only purpose is to count lines, words and characters.
I have written a program to read a CSV file but I'm having some trouble in extracting data from that CSV file in c++. I want to count the no. of columns starting from the 5th column in the 1st row until the last column of the 1st row of the CSV file. I have written the following code to read a CVS file, but I am not sure how shall I count the no. of columns as I have mentioned before.
Will appreciate it if anyone could please tell me how shall I go about it?
char* substring(char* source, int startIndex, int endIndex)
{
int size = endIndex - startIndex + 1;
char* s = new char[size+1];
strncpy(s, source + startIndex, size); //you can read the documentation of strncpy online
s[size] = '\0'; //make it null-terminated
return s;
}
char** readCSV(const char* csvFileName, int& csvLineCount)
{
ifstream fin(csvFileName);
if (!fin)
{
return nullptr;
}
csvLineCount = 0;
char line[1024];
while(fin.getline(line, 1024))
{
csvLineCount++;
};
char **lines = new char*[csvLineCount];
fin.clear();
fin.seekg(0, ios::beg);
for (int i=0; i<csvLineCount; i++)
{
fin.getline(line, 1024);
lines[i] = new char[strlen(line)+1];
strcpy(lines[i], line);
};
fin.close();
return lines;
}
I have attached a few lines from the CSV file:-
Province/State,Country/Region,Lat,Long,1/22/20,1/23/20,1/24/20,
,Afghanistan,33.0,65.0,0,0,0,0,0,0,0,
,Albania,41.1533,20.1683,0,0,0,0
What I need is, in the 1st row, the number of dates after Long.
To answer your question:
I have attached a few lines from the CSV file:-
Province/State,Country/Region,Lat,Long,1/22/20,1/23/20,1/24/20, ,Afghanistan,33.0,65.0,0,0,0,0,0,0,0, ,Albania,41.1533,20.1683,0,0,0,0
What I need is, in the 1st row, the number of dates after Long.
Yeah, not that difficult - that's how I would do it:
#include <iostream>
#include <string>
#include <fstream>
#include <regex>
#define FILENAME "test.csv" //Your filename as Macro
//(The compiler just sees text.csv instead of FILENAME)
void read(){
std::string n;
//date format pattern %m/%dd/%YY
std::regex pattern1("\\b\\d{1}[/]\\d{2}[/]\\d{2}\\b");
//date format pattern %mm/%dd/%YY
std::regex pattern2("\\b\\d{2}[/]\\d{2}[/]\\d{2}\\b");
std::smatch result1, result2;
std::ifstream file(FILENAME, std::ios::in);
if ( ! file.is_open() )
{
std::cout << "Could not open file!" << '\n';
}
do{
getline(file,n,',');
//https://en.cppreference.com/w/cpp/string/basic_string/getline
if(std::regex_search(n,result1,pattern1))
std::cout << result1.str(1) << n << std::endl;
if(std::regex_search(n,result2,pattern2))
std::cout << result2.str(1) << n << std::endl;
}
while(!file.eof());
file.close();
}
int main ()
{
read();
return 0;
}
The file test.csv contains the following for testing:
Province/State,Country/Region,Lat,Long,1/22/20,1/23/20,1/24/20, ,Afghanistan,33.0,65.0,0,0,0,0,0,0,0, ,Albania,41.1533,20.1683,0,0,0,0
Province/State,Country/Region,Lat,Long,1/25/20,12/26/20,1/27/20, ,Bfghanistan,33.0,65.0,0,0,0,0,0,0,0, ,Blbania,41.1533,20.1683,0,0,0,0
It actually is pretty simple:
getline takes the open file and "escapes" at a so called escape-charachter,
in your case a comma ','.
(That is the very best way I found in reading csv - you can replace it with whatever you want, for example: ';' or ' ' or '...' - guess you get the drill)
After this you got all data nicely separated underneath one another without a comma.
Now you can "filter" out what you need. I use regex - but use what ever you want.
(Just fyi: For c++ tagged questions you shouldn't use c-style like strncpy..)
I gave you an example for 1.23.20 (m/dd/yy) and to make it simple if your file contains a november or december like 12.22.20 (mm/dd/yy) to make
the regex pattern more easy to read/understand in 2 lines.
you can/may have to expand the regex pattern if the data somehow matches
your date format in the file, really good explained here and not as complicated as it looks.
From that point you can put all the printed stuff f.e. in a vector (some more convenient array) to handle and/or pass/return data - that's up to you.
If you need more explaining I am happy to help you out and/or expand this example, just leave a comment.
You basically want to search for the seperator substring within your line (normally it is ';').
If you print out your lines it should look like this:
a;b;c;d;e;f;g;h
There are several ways to achieve what you want, I would look for a strip or split upon character function. Something along the example below should work. If you use std you can go with str.IndexOf instead of a loop.
int rows(char* line,char seperator, int count) {
unsigned length = strlen(line);
for (int i=pos; i<length;i++){
if(strcmp(line[i],seperator)) break;
}
count++;
if (i<length-1) return rows(substring(line,i,length-i),seperator,count);
else return count;
}
The recursion can obviously be replaced by one loop ;)
int countSign(char* line, char* sign){
unsigned l = strlen(line);
int count = 0;
for (int i=0; i < l; i++) {
if(strcmp(line[i],sign)) count++;
}
}
I'm having slight trouble creating a 2D Vector of String that's created by reading values from a text file. I initially thought I needed to use an array. however I've come to realise that a vector would be much more suited to what I'm trying to achieve.
Here's my code so far:
I've initialised the vector globally, but haven't given it the number of rows or columns because I want that to be determined when we read the file:
vector<vector<string>> data;
Test data in the file called "test" currently looks like this:
test1 test2 test3
blue1 blue2 blue3
frog1 frog2 frog3
I then have a function that opens the file and attempts to copy over the strings from text.txt to the vector.
void createVector()
{
ifstream myReadFile;
myReadFile.open("text.txt");
while (!myReadFile.eof()) {
for (int i = 0; i < 5; i++){
vector<string> tmpVec;
string tmpString;
for (int j = 0; j < 3; j++){
myReadFile >> tmpString;
tmpVec.push_back(tmpString);
}
data.push_back(tmpVec);
}
}
}
However, when I attempt to check the size of my vector in my main function, it returns the value '0'.
int main()
{
cout << data.size();
}
I think I just need a pair of fresh eyes to tell me where I'm going wrong. I feel like the issues lies within the createVector function, although I'm not 100% sure.
Thank you!
You should use std::getline to get the line of data first, then extract each string from the line and add to your vector. This avoids the while -- eof() issue that was pointed out in the comments.
Here is an example:
#include <string>
#include <iostream>
#include <vector>
#include <sstream>
typedef std::vector<std::string> StringArray;
std::vector<StringArray> data;
void createVector()
{
//...
std::string line, tempStr;
while (std::getline(myReadFile, line))
{
// add empty vector
data.push_back(StringArray());
// now parse the line
std::istringstream strm(line);
while (strm >> tempStr)
// add string to the last added vector
data.back().push_back(tempStr);
}
}
int main()
{
createVector();
std::cout << data.size();
}
Live Example
I am very new to coding, and have been practicing with some easy problems at codeforces.com. I was working on this problem, but it seemed to be asking for the input (all at once) yielding the output (all at once). I can only figure out how to get one output at a time.
Here are the basic instructions for the problem:
Input
The first line contains an integer n (1 ≤ n ≤ 100). Each of the following n lines contains one word. All the words consist of lowercase Latin letters and possess the lengths of from 1 to 100 characters.
Output
Print n lines. The i-th line should contain the result of replacing of the i-th word from the input data.
Examples
input
4
word
localization
internationalization
pneumonoultramicroscopicsilicovolcanoconiosis
output
word
l10n
i18n
p43s
Here is my code:
#include <iostream>
#include <string>
using namespace std;
void wordToNumbers(string word){
int midLetters = word.length();
char firstLetter = word.front();
char lastLetter = word.back();
cout <<firstLetter <<(midLetters-2) <<lastLetter <<endl;
}
int main(){
string wordInput;
string firstNum;
getline(cin,firstNum);
int i = stoi(firstNum);
for(i>=1; i--;){
getline(cin,wordInput);
if (wordInput.length() > 10){
wordToNumbers(wordInput);
} else {
cout <<wordInput <<endl;
}
}
return 0;
}
it's perfectly fine to read and print output for the lines one by one.
Exactly your solution accepted: http://codeforces.com/contest/71/submission/16659519
I'm also a beginner in c++. My idea would be to save every line first in a buffer and then write everything to std::cout.
I use a std::vector as the buffer, cause IMO it is simple to understand and very useful in many cases. Basically it is a better array. You can read more about std::vector here.
#include <iostream>
#include <string>
//for use of std::vector container
#include <vector>
using namespace std;
void wordToNumbers(string word){
int midLetters = word.length();
char firstLetter = word.front();
char lastLetter = word.back();
cout <<firstLetter <<(midLetters-2) <<lastLetter <<endl;
}
int main(){
string wordInput;
string firstNum;
//container for buffering all our strings
vector<string> bufferStrings;
getline(cin,firstNum);
int i = stoi(firstNum);
//read line by line and save every line in our buffer-container
for(i>=1; i--;){
getline(cin,wordInput);
//append the new string to our buffer
bufferStrings.push_back(wordInput);
}
//now iterate through the buffer and write everything to cout
for(int index = 0; index < bufferStrings.size(); ++index) {
if (bufferStrings[index].length() > 10){
wordToNumbers(bufferStrings[index]);
} else {
cout <<bufferStrings[index] <<endl;
}
}
return 0;
}
Probably this is not the best or most beautiful solution, but it should be easy to understand :)
I'm writing a program in C++ to take input from a text file (dates and the high/low temp of that day), split the dates and temps into two separate arrays. I have the process down; however, I can't seem to split the strings appropriately. I've tried different methods with getline() and .get, but I need to keep the strings as STRINGS, not an array of chars. I've looked into and read the answers for similar questions using vectors and strtock and there's only one problem: I'm still fairly new, and the more I look into them, the more confused I get.
If I am to use that method in solving my problem, I just need to be pointed in the right direction of how to apply it. Apologies for my noobishness, it's just easy to get overwhelmed with all of the different ways to solve one problem with C++ (which is the reason I enjoy using it so much. ;))!
Sample from text:
10/12/2007 56 87
10/13/2007 66 77
10/14/2007 65 69
etc.
The dates need to be stored in one array, and the temps (both high and low) in another.
Here's what I have (unfinished, but for reference nonetheless)
int main()
//Open file to be read
ifstream textTemperatures;
textTemperatures.open("temps1.txt");
//Initialize arrays.
const int DAYS_ARRAY_SIZE = 32,
TEMPS_ARRAY_SIZE = 65;
string daysArray[DAYS_ARRAY_SIZE];
int tempsArray[TEMPS_ARRAY_SIZE];
int count = 0;
while(count < DAYS_ARRAY_SIZE && !textTemperatures.eof())
{
getline(textTemperatures, daysArray[count]);
cout << daysArray[count] << endl;
count++;
}
Thanks everyone.
Try the following
#include <iostream>
#include <fstream>
#include <sstream>
//...
std::ifstream textTemperatures( "temps1.txt" );
const int DAYS_ARRAY_SIZE = 32;
std::string daysArray[DAYS_ARRAY_SIZE] = {};
int tempsArray[2 * DAYS_ARRAY_SIZE] = {};
int count = 0;
std::string line;
while ( count < DAYS_ARRAY_SIZE && std::getline( textTemperatures, line ) )
{
std::istringstream is( line );
is >> daysArray[count];
is >> tempsArray[2 * count];
is >> tempsArray[2 * count + 1];
}
Here is a simple program that read the formatted input. You can easily replace std::cin with your std::ifstream and do whatever you want with the data inside the loop.
#include <iostream>
#include <string>
#include <vector>
int main ()
{
std::vector<std::string> dates;
std::vector<int> temperatures;
std::string date;
int low, high;
while ((std::cin >> date >> low >> high))
{
dates.push_back(date);
temperatures.push_back(low);
temperatures.push_back(high);
}
}
The magic here is done by std::cin's operator>> which reads up to the first whitespace encountered (tabulation, space or newline) and stores the value inside the right operand.