Want to read important double value at the end of line of istream C++ - c++

I'm trying to read in a large matrix calculated from a text file for a finite element code. The matrix is spatially dependent though and thus I need to be able to conveniently organize the data. The outside source that calculated the values for the matrix was kind enough to put the following lines at the top of the text file
No. activity levels : 3
No. pitch-angles : 90
No. energies : 11
No. L-shells : 10
Which basically tell me the number of positions the matrix is known at. I want to be able to easily pick out these values because it will allow me to preallocate the size of the matrix, as well as know immediately how much I need to interpolate for values not given by this text file. I am trying to do that with the following code
#include<iostream>
#include<fstream>
#include<stdio.h>
#include<stdlib.h>
#include<string.h>
#include<vector>
using namespace std;
int main(){
string diffusionTensorFileName = "BAS_drift_averaged_chorus_kp.txt";
string sline;
int alphaSize=0;
ifstream diffusionTensorFile(diffusionTensorFileName.c_str());
while(getline(diffusionTensorFile,sline)){
if(strncmp(sline.c_str(),"No. pitch-angles : 90",sline.size()-1)==0 && sline.size()-1 != 0){
alphaSize = atoi(sline.c_str());
printf("alphaSize %d \n", alphaSize);
vector<double> alpha(alphaSize);
}
}
}
atoi of course doesn't work very well, and I can't seem to get strtod or any of those functions to work either. Any thoughts? I'm also open to this being the completely wrong way to do this and alternate suggestions on how to proceed.

I think the easiest way would be to use the scan_is method of the std::ctype facet imbued in the stream's locale. Its job is to search for the first character that matches a given classification and return a pointer to it. We'll take the result of that call and use std::stoi (C++11) to parse it into an integer.
std::locale loc(diffusionTensorFile.getloc());
auto& f = std::use_facet<std::ctype<char>>(loc);
while (std::getline(diffusionTensorFile, sline))
{
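const char* begin = sline.data(); // pointer to the first character (valid for empty lines too)
const char* end = begin + sline.size(); // one past the last character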
const char* result;
if ((result = f.scan_is(f.digit, begin, end)) != end)
{
alphaSize = std::stoi(result);
// do something with alphaSize
}
}
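As a self-contained sketch of the same idea, the helper below wraps the scan_is call in a function. The name first_int and the fallback parameter are made up for illustration, and the classic "C" locale is assumed instead of the stream's own locale.

```cpp
#include <locale>
#include <string>

// Hypothetical helper: returns the first integer found in `line`,
// or `fallback` when the line contains no digit at all.
int first_int(const std::string& line, int fallback = -1) {
    // Assumption: the classic "C" locale is good enough for ASCII digits.
    const auto& facet = std::use_facet<std::ctype<char>>(std::locale::classic());
    const char* begin = line.data();
    const char* end = begin + line.size();
    // scan_is returns a pointer to the first character classified as a digit.
    const char* digit = facet.scan_is(std::ctype_base::digit, begin, end);
    if (digit == end) return fallback;
    // std::stoi stops at the first character that is not part of the number.
    return std::stoi(std::string(digit, end));
}
```

Calling first_int on each header line (for example "No. pitch-angles : 90") gives the size needed to preallocate the corresponding vector.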

Related

Unable to ignore the escape characters from a text file stream & store in a wchar_t [ ] in C++

I am trying to read data from a text file using C++ & store the strings at each line into wchar_t [] or LPCWSTR.
(These 2 datatypes are the constraints of the application on which I am working. That's why I have to store the data in these datatypes)
The format of data in the .txt file is, for example:
abc\\def\\ghi 10
jkl\\mnopq\\rstq 20
aqq\\sdsds\\qc 30
I am trying to read data line by line & save each line as a map's key-value pair, where key is of type LPCWSTR or wchar_t[] type & value is of int type
There is no issue in extracting int, but the issue comes in reading the strings
Here is my code:
#include<iostream>
#include<fstream>
#include<windows.h>
#include<cstdlib>
using namespace std;
int main()
{
wchar_t test1[260];
const char* s = "Hello\\ABC\\DEF";
mbstowcs(test1, s, strlen(s));
wcout<<test1<<endl;
wchar_t gr[260];
string gr_temp;
int percentage;
ifstream ifs;
ifs.open("data.txt", ifstream::in);
if (ifs.is_open()) {
while (ifs >> gr_temp >> percentage){
const char* source = gr_temp.c_str();
mbstowcs(gr, source, strlen(source));
wcout<<gr<<L" ";
cout<<percentage<<endl;
}
ifs.close();
}
return 0;
}
However, it is giving the following output:
Hello\ABC\DEFa
abc\\def\\ghi 10
jkl\\mnopq\\rstq 20
aqq\\sdsds\\qc 30
I did not understand why that tiny 'a' appeared out of nowhere in the first line of output
I want the code to instead automatically process those double slashes, i.e. I want the output as:
Hello\ABC\DEF
abc\def\ghi 10
jkl\mnopq\rstq 20
aqq\sdsds\qc 30
It would be even better if I could instead write the entries in the .txt file without double slashes and have them processed automatically, without checking for any escape sequences. However, since the issue as in point no. 1) above is there, I am not sure if it is even possible.
Even if I add cout<<gr_temp<<endl; as the first line in the while loop, that also outputs the string with double backslashes.
What am I missing or doing wrong?
Update:
Also, when I add these key-value pairs to a std::map<LPCWSTR,int> m1 using the statement m1[gr] = percentage; at the end of each while loop, then with the print statement, it only shows one single element in the map.
My updated code is:
#include<iostream>
#include<fstream>
#include<windows.h>
#include<cstdlib>
#include<map>
using namespace std;
std::unordered_map<LPCWSTR, int> m1;
int main()
{
wchar_t test1[260];
const char* s = "Hello\\ABC\\DEF";
mbstowcs(test1, s, strlen(s));
wcout<<test1<<endl;
wchar_t gr[260];
string gr_temp;
int percentage;
ifstream ifs;
ifs.open("data.txt", ifstream::in);
if (ifs.is_open()) {
while (ifs >> gr_temp >> percentage){
const char* source = gr_temp.c_str();
mbstowcs(gr, source, strlen(source));
m1[gr] = percentage;
}
ifs.close();
}
for (auto i = m1.begin(); i != m1.end(); i++) {
wcout<< i->first << L" ";
cout<< i->second << endl;
}
return 0;
}
This code is only adding 1 element in the map & that is the most recent added element.
I edited the code to use unordered_map, but still the same issue.
I further tried to print the size() of the map. In both these cases, size of map m1 was displayed as 1.
Miles Budnek already stated your problems.
If you look at the documentation of your function (http://www.cplusplus.com/reference/cstdlib/mbstowcs/), you will see that the third parameter does not expect the number of bytes to translate to wchar_t, but much rather the maximum number of characters the buffer you are pointing to can hold.
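It will stop once it has converted the terminating \0 (which just happens to be what strlen is also looking for). Because strlen(s) does not count that terminator, passing it as the limit makes mbstowcs stop before writing the \0, so test1 is left unterminated and wcout keeps printing whatever happens to follow in memory.
So just replace the third parameter of your first mbstowcs call with 260 (or sizeof(test1)/sizeof(wchar_t)) and you're good on that stray 'a'.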
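A minimal sketch of that fix, assuming a fixed 260-element buffer as in the question. The function name to_wide is hypothetical, and the result is returned as a std::wstring so the caller does not depend on the buffer being terminated.

```cpp
#include <cstdlib>
#include <string>

// Hypothetical helper: converts a narrow string with mbstowcs, passing the
// buffer capacity (not strlen) as the third argument.
std::wstring to_wide(const std::string& narrow) {
    wchar_t buffer[260];
    const std::size_t capacity = sizeof(buffer) / sizeof(buffer[0]);
    const std::size_t written = std::mbstowcs(buffer, narrow.c_str(), capacity);
    if (written == static_cast<std::size_t>(-1)) {
        return std::wstring(); // invalid multibyte sequence in the input
    }
    // `written` excludes the terminator; building the wstring from an explicit
    // length stays safe even when the buffer was filled completely.
    return std::wstring(buffer, written);
}
```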
As has also already been stated, there are no 'escape parameters' while reading from a file.
These only exist in source code, where they let you write characters that are awkward or impossible to type directly. (https://www.asciitable.com/)
\n for example represents the character code for 'new line', 0x0A.
So escaping the backslashes in the file is unnecessary and can be skipped.
If you know that your input file will have 'double backslashes' and need to 'unescape' them, you could look at the std::string functions 'find' and 'replace'.
Find "\\\\" (two backslashes in a row) and replace with "\\".
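The find and replace approach could look roughly like this. The function name collapse_backslashes is an assumption made for the example; it is only needed when the file really contains doubled backslashes.

```cpp
#include <string>

// Hypothetical helper: replaces every pair of backslashes with a single one.
std::string collapse_backslashes(std::string text) {
    const std::string doubled = "\\\\"; // two backslash characters
    std::size_t pos = 0;
    while ((pos = text.find(doubled, pos)) != std::string::npos) {
        text.replace(pos, doubled.size(), "\\"); // one backslash character
        pos += 1; // continue after the backslash that was just written
    }
    return text;
}
```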
In response to your updated question (which is basically another question):
The problem is the key you chose for the map.
Each map, unordered or not, requires unique keys and in your scenario, you keep using the same key.
LPCWSTR expands to 'Pointer to Wide Char String', so while you probably think you are using 'abc\def\ghi' as key, you are actually using &gr[0], which remains the same during all iterations.
As an additional result, once the program leaves the scope of gr, its content becomes invalid and accessing the map (which maintains the pointer but not the content), will access freed memory which tends to crash your program.
The solution as such is simple enough though: You need to use the content as key, instead of the pointer, for example by using a container object like std::wstring.

Sort .csv in multidimensional arrays

I'm trying to read specific values (i.e. values#coordinate XY) from a .csv file and struggle with a proper way to define multidimensional arrays within that .csv.
Here's an example of the form from my .csv file
NaN,NaN,1.23,2.34,9.99
1.23,NaN,2.34,3.45,NaN
NaN,NaN,1.23,2.34,9.99
1.23,NaN,2.34,3.45,NaN
1.23,NaN,2.34,3.45,NaN
NaN,NaN,1.23,2.34,9.99
1.23,NaN,2.34,3.45,NaN
NaN,NaN,1.23,2.34,9.99
1.23,NaN,2.34,3.45,NaN
1.23,NaN,2.34,3.45,NaN
NaN,NaN,1.23,2.34,9.99
1.23,NaN,2.34,3.45,NaN
NaN,NaN,1.23,2.34,9.99
1.23,NaN,2.34,3.45,NaN
1.23,NaN,2.34,3.45,NaN
...
Ok, in reality, this file becomes very large. You can interpret rows=latitudes and columns=longitudes and thus each block is an hourly measured coordinate map. The blocks usually have the size of row[361] column[720] and time periods can range up to 20 years (=24*365*20 blocks), just to give you an idea of the data size.
To structure this, I thought of scanning through the .csv and define each block as a vector t, which I can access by choosing the desired timestep t=0,1,2,3...
Then, within this block I would like to go to a specific line (i.e. latitude) and define it as a vector longitudeArray.
The outcome shall be a specified value from coordinate XY at time Z.
As you might guess, my coding experience is rather limited and this is why my actual question might be very simple: How can I arrange my vectors in order to be able to call any random value?
This is my code so far (sadly it is not much, cause I don't know how to continue...)
#include <fstream>
#include <iostream>
#include <iomanip>
#include <sstream>
#include <string>
#include <vector>
#include <algorithm>
using namespace std;
int main()
{
int longitude, latitude; //Coordinates used to specify desired value
int t; //Each array is associated to a specific time t=0,1,2,3... (corresponds to hourly measured data)
string value;
vector<string> t; //Vector of each block
vector<string> longitudeArray; //Line of array, i.e. latitude
ifstream file("swh.csv"); //Open file
if (!file.is_open()) //Check if file is opened, if not print "File could..."
{
cout << "File could not open..." << endl;
return 1;
}
while (getline(file, latitude, latitude.empty())) //Scan .csv (vertically) and delimit every time a white line occurs
{
longitudeArray.clear();
stringstream ss(latitude);
while(getline(ss,value,',') //Breaks line into comma delimited fields //Specify line number (i.e. int latitude) here??
{
latitudeArray.push_back(value); //Adds each field to the 1D array //Horizontal vector, i.e. latitude
}
t.push_back(/*BLOCK*/) //Adds each block to a distinct vector t
}
cout << t(longitudeArray[5])[6] << endl; //Output: 5th element of longitudeArray in my 6th block
return 0;
}
If you have any hint, especially if there is a better way handling large .csv files, I'd be very grateful.
Ps: C++ is inevitable for this project...
Tüdelüü,
jtotheakob
As usual you should first think in terms of data and data usage. Here you have floating point values (that can be NaN) that should be accessible as a 3D thing along latitude, longitude and time.
If you can accept simple (integer) indexes, the standard ways in C++ would be raw arrays, std::array and std::vector. The rule of thumb then says: if the sizes are known at compile time, arrays (or std::array if you want operations on whole arrays) are fine, else go with vectors. And if unsure, std::vector is your workhorse.
So you will probably end with a std::vector<std::vector<std::vector<double>>> data, that you would use as data[timeindex][latindex][longindex]. If everything is static you could use a double data[NTIMES][NLATS][NLONGS] that you would access more or less the same way. Beware if the array is large, most compilers will choke if you declare it inside a function (including main), but it could be a global inside one compilation unit (C-ish but still valid in C++).
So read the file line by line, feeding values in your container. If you use statically defined arrays just assign each new value in its position, if you use vectors, you can dynamically add new elements with push_back.
This is too far from your current code for me to show you more than trivial code.
The static (C-ish) version could contain:
#define NTIMES (24*365*20)
#define NLATS 361
#define NLONGS 720
double data[NTIMES][NLATS][NLONGS];
...
int time, lat, lon; // "long" is a reserved keyword, so the longitude index is named lon
for(time=0; time<NTIMES; time++) {
for (lat=0; lat<NLATS; lat++) {
for (lon=0; lon<NLONGS; lon++) {
std::cin >> data[time][lat][lon];
for (;;) {
if (! std::cin) break;
char c = std::cin.peek();
if (std::isspace(c) || (c == ',')) std::cin.get();
else break;
}
if (! std::cin) break;
}
if (! std::cin) break;
}
if (! std::cin) break;
}
if (time != NTIMES) {
//Not enough values or read error
...
}
A more dynamic version using vectors could be:
int ntimes = 0;
const int nlats=361; // need not be a compile-time value
const int nlongs=720; // ditto
vector<vector<vector<double>>> data;
int lat, lon; // "long" is a reserved keyword, so the longitude index is named lon
for(;;) {
data.push_back(vector<vector<double>>());
for(lat=0; lat<nlats; lat++) {
data[ntimes].push_back(vector<double>(nlongs));
for(lon=0; lon<nlongs; lon++) {
std::cin >> data[ntimes][lat][lon];
for (;;) {
if (! std::cin) break;
char c = std::cin.peek();
if (std::isspace(c) || (c == ',')) std::cin.get();
else break;
}
if (! std::cin) break;
}
if (! std::cin) break;
}
if (! std::cin) break;
if (lat!=nlats || lon!=nlongs) {
//Not enough values or read error
...
}
ntimes += 1;
}
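Whether this code handles NaN depends on the standard library: extraction with >> into a double is not guaranteed to accept the text NaN, and common implementations set failbit on it instead of storing the special not-a-number value. The code also does not check the number of fields per line. To handle both, read each line with std::getline, split it with a std::istringstream (strstream is deprecated), and convert each field with std::stod, which accepts NaN.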
Thanks, I tried to transfer both versions to my code, but I couldn't make it run.
Guess my poor coding skills aren't able to see what's obvious to everyone else. Can you name the additional libs I might require?
For std::isspace I do need #include <cctype>, anything else missing which is not mentioned in my code from above?
Can you also explain how if (std::isspace(c) || (c == ',')) std::cin.get(); works? From what I understand, it will check whether c (which is the input field?) is a whitespace, and if so, the right term becomes automatically "true" because of ||? What consequence results from that?
At last, if (! std::cin) break is used to stop the loop after we reached the specified array[time][lat][long]?
Anyhow, thanks for your response. I really appreciate it and I have now an idea how to define my loops.
Again thank you all for your ideas.
Unfortunately, I was not able to run the script... but my task changed slightly, thus the need to read very large arrays is not required anymore.
However, I've got an idea of how to structure such operations and most probably will transfer it to my new task.
You may close this topic now ;)
Cheers
jtothekaob

c++ if(cin>>input) doesn't work properly in while loop

I'm new to C++ and I'm trying to solve exercise 6 from chapter 4 of Bjarne Stroustrup's book "Programming: Principles and Practice Using C++", and I don't understand why my code doesn't work.
The exercise:
Make a vector holding the ten string values "zero", "one", ...,
"nine". Use that in a program that converts a digit to its
corresponding spelled-out value: e.g., the input 7 gives the output
seven. Have the same program, using the same input loop, convert
spelled-out numbers into their digit form; e.g., the input seven gives
the output 7.
My loop only executes one time for a string and one time for an int, the loop seems to continue but it doesn't matter which input I'm giving, it doesn't do what it's supposed to do.
One time it worked for multiple int inputs, but only every second time. It's really weird and I don't know how to solve this in a different way.
It would be awesome if someone could help me out.
(I'm also not a native speaker, so sorry, if there are some mistakes)
The library in this code is a library provided with the book, to make the beginning easier for us noobies I guess.
#include "std_lib_facilities.h"
int main()
{
vector<string>s = {"zero","one","two","three","four","five","six","seven","eight","nine"};
string input_string;
int input_int;
while(true)
{
if(cin>>input_string)
{
for(int i = 0; i<s.size(); i++)
{
if(input_string == s[i])
{
cout<<input_string<<" = "<<i<<"\n";
}
}
}
if(cin>>input_int)
{
cout<<input_int<<" = "<<s[input_int]<<"\n";
}
}
return 0;
}
When you (successfully) read input from std::cin, the input is extracted from the buffer. The input in the buffer is removed and can not be read again.
And when you first read as a string, that will read any possible integer input as a string as well.
There are two ways of solving this:
Attempt to read as int first. And if that fails clear the errors and read as a string.
Read as a string, and try to convert to an int. If the conversion fails you have a string.
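The second option (read as a string, then try to convert) might be sketched like this. The function name convert and the "unknown" return value are made up for the example.

```cpp
#include <sstream>
#include <string>
#include <vector>

// Hypothetical helper: turns "7" into "seven" and "seven" into "7".
std::string convert(const std::string& token) {
    static const std::vector<std::string> names = {
        "zero", "one", "two", "three", "four",
        "five", "six", "seven", "eight", "nine"};
    std::istringstream ss(token);
    int digit = 0;
    char extra = 0;
    // The token counts as a number only if an int parses and nothing follows it.
    if (ss >> digit && !(ss >> extra)) {
        if (digit >= 0 && digit < static_cast<int>(names.size())) return names[digit];
        return "unknown";
    }
    // Otherwise treat it as a spelled-out name and look up its index.
    for (std::size_t i = 0; i < names.size(); ++i) {
        if (names[i] == token) return std::to_string(i);
    }
    return "unknown";
}
```

The input loop then only needs a single while (cin >> input_string) that prints convert(input_string).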
if(cin >> input) doesn't work properly in while loop?
A possible implementation of the input of your program would look something like:
std::string sentinel = "|";
std::string input;
// read whole line, then check if exit command
while (getline(std::cin, input) && input != sentinel)
{
// use string stream to check whether input digit or string
std::stringstream ss(input);
// if string, convert to digit
// else if digit, convert to string
// else clause containing a check for invalid input
}
To discriminate between int and string value you could use peek(), for example.
Preferably the last two actions of conversion (between int and string) are done by separate functions.
Assuming the inclusion of the headers:
#include <iostream>
#include <sstream>

Big csv file c++ parsing performance

I have a big CSV file (25 MB) that represents a symmetric graph (about 18k x 18k). While parsing it into an array of vectors, I profiled the code (with the VS2012 analyzer), and it shows that the parsing inefficiency (about 19 seconds total) comes from reading each character (getline::basic_string::operator+=).
This leaves me frustrated, as with Java simple buffered line file reading and tokenizer i achieve it with less than half a second.
My code uses only STL library:
int allColumns = initFirstRow(file,secondRow);
// secondRow has initialized with one value
int column = 1; // dont forget, first column is 0
VertexSet* rows = new VertexSet[allColumns];
rows[1] = secondRow;
string vertexString;
long double vertexDouble;
for (int row = 1; row < allColumns; row ++){
// dont do the last row
for (; column < allColumns; column++){
//dont do the last column
getline(file,vertexString,',');
vertexDouble = stold(vertexString);
if (vertexDouble > _TH){
rows[row].add(column);
}
}
// do the last in the column
getline(file,vertexString);
vertexDouble = stold(vertexString);
if (vertexDouble > _TH){
rows[row].add(++column);
}
column = 0;
}
initLastRow(file,rows[allColumns-1],allColumns);
init first and last row basically does the same thing as the loop above, but initFirstRow also counts the number of columns.
VertexSet is basically a vector of indexes (int). Each vertex read (separated by ',') is at most 7 characters long (values are between -1 and 1).
At 25 megabytes, I'm going to guess that your file is machine generated. As such, you (probably) don't need to worry about things like verifying the format (e.g., that every comma is in place).
Given the shape of the file (i.e., each line is quite long) you probably won't impose a lot of overhead by putting each line into a stringstream to parse out the numbers.
Based on those two facts, I'd at least consider writing a ctype facet that treats commas as whitespace, then imbuing the stringstream with a locale using that facet to make it easy to parse out the numbers. Overall code length would be a little greater, but each part of the code would end up pretty simple:
#include <iostream>
#include <fstream>
#include <vector>
#include <string>
#include <time.h>
#include <stdlib.h>
#include <locale>
#include <sstream>
#include <algorithm>
#include <iterator>
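class my_ctype : public std::ctype<char> {
// The table must exist before the base class constructor runs, so it is a
// function-local static rather than a data member (members are initialized
// only after the base class).
static const mask* make_table() {
static std::vector<mask> table(classic_table(), classic_table() + table_size);
table[','] = (mask)space;
return table.data();
}
public:
my_ctype(size_t refs=0): std::ctype<char>(make_table(), false, refs) {}
};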
template <class T>
class converter {
std::stringstream buffer;
my_ctype *m;
std::locale l;
public:
converter() : m(new my_ctype), l(std::locale::classic(), m) { buffer.imbue(l); }
std::vector<T> operator()(std::string const &in) {
buffer.clear();
buffer<<in;
return std::vector<T> {std::istream_iterator<T>(buffer),
std::istream_iterator<T>()};
}
};
int main() {
std::ifstream in("somefile.csv");
std::vector<std::vector<double>> numbers;
std::string line;
converter<double> cvt;
clock_t start=clock();
while (std::getline(in, line))
numbers.push_back(cvt(line));
clock_t stop=clock();
std::cout<<double(stop-start)/CLOCKS_PER_SEC << " seconds\n";
}
To test this, I generated a 1.8K x 1.8K CSV file of pseudo-random doubles like this:
#include <iostream>
#include <stdlib.h>
int main() {
for (int i=0; i<1800; i++) {
for (int j=0; j<1800; j++)
std::cout<<rand()/double(RAND_MAX)<<",";
std::cout << "\n";
}
}
This produced a file around 27 megabytes. After compiling the reading/parsing code with gcc (g++ -O2 trash9.cpp), a quick test on my laptop showed it running in about 0.18 to 0.19 seconds. It never seems to use (even close to) all of one CPU core, indicating that it's I/O bound, so on a desktop/server machine (with a faster hard drive) I'd expect it to run faster still.
The inefficiency here is in Microsoft's implementation of std::getline, which is being used in two places in the code. The key problems with it are:
It reads from the stream one character at a time
It appends to the string one character at a time
The profile in the original post shows that the second of these problems is the biggest issue in this case.
I wrote more about the inefficiency of std::getline here.
GNU's implementation of std::getline, i.e. the version in libstdc++, is much better.
Sadly, if you want your program to be fast and you build it with Visual C++ you'll have to use lower level functions than std::getline.
The debug Runtime Library in VS is very slow because it does a lot of debug checks (for out of bound accesses and things like that) and calls lots of very small functions that are not inlined when you compile in Debug.
Running your program in release should remove all these overheads.
My bet on the next bottleneck is string allocation.
I would try reading bigger chunks of memory at once and then parsing them all.
For example, read a full line and then parse it using pointers and specialized functions.
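One way to sketch the pointer-based parsing step, assuming each line has already been read in full. The function name parse_line is hypothetical, and std::strtod stands in for the "specialized function".

```cpp
#include <cstdlib>
#include <string>
#include <vector>

// Hypothetical helper: walks a comma-separated line with raw pointers and
// converts each number in place, without building a temporary string per field.
std::vector<double> parse_line(const std::string& line) {
    std::vector<double> values;
    const char* p = line.c_str();
    const char* end = p + line.size();
    while (p < end) {
        char* next = nullptr;
        const double value = std::strtod(p, &next);
        if (next == p) break; // nothing numeric at this position
        values.push_back(value);
        p = next;
        if (p < end && *p == ',') ++p; // skip the separator
    }
    return values;
}
```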
Hmm good answer here. Took me a while but I had the same problem. After this fix my write and process time went from 38 sec to 6 sec.
Here's what I did.
First get data using boost mmap. Then you can use boost thread to make processing faster on the const char* that boost mmap returns. Something like this: (the multithreading is different depending on your implementation so I excluded that part)
#include <boost/iostreams/device/mapped_file.hpp>
#include <boost/thread/thread.hpp>
#include <boost/lockfree/queue.hpp>
#include <cstdlib>
#include <string>
#include <vector>
std::vector<double> foo(const std::string& path)
{
boost::iostreams::mapped_file mmap(path,boost::iostreams::mapped_file::readonly);
auto chars = mmap.const_data(); // pointer to the first mapped byte
auto eofile = chars + mmap.size(); // used to detect end of file
std::string next; // collects the characters of the current value
std::vector<double> data; // store the data
for (; chars && chars != eofile; chars++) {
if (chars[0] == ',' || chars[0] == '\n') { // end of value
data.push_back(atof(next.c_str())); // add value
next.clear(); // start the next value
}
else
next += chars[0]; // add to read string
}
if (!next.empty()) data.push_back(atof(next.c_str())); // last value when the file has no trailing newline
return data;
}

vector subscript out of range C++ (substring)

So I'm having this problem with substrings and converting them into integers. This will probably be an easy-fix but I'm not managing to find the answer.
So I receive this string "12-12-2012" and I want to split it, convert the parts into integers and call the modification methods like this:
string d = (data.substr(0,data.find("-")));
setDia(atoi(d.c_str()));
But it gives me the error mentioned in the title when I try to convert into an integer.
EDIT:
Turns out that the string doesn't actually contain a '-' but this is really confusing since the string in the parameter results from this : to_char(s.diaInicio,'dd-mm-yyyy')
More information: I used the debugger and it's making the split correctly since the value that atoi receives is 12 (the first split). But I don't know why the VS can't convert into an integer even though the string passed is "12".
This code is not safe in the sense that it fails when data does not contain a -.
Try this:
std::size_t p = data.find("-");
if(p == std::string::npos) {
// ERROR no - in string!
}
else {
std::string d = data.substr(0,p);
setDia(atoi(d.c_str()));
}
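Extending the same check to all three parts of the date could look like the sketch below. The function name parse_date and its out-parameters are assumptions for illustration, not part of the original code.

```cpp
#include <cstdlib>
#include <string>

// Hypothetical helper: splits "dd-mm-yyyy" at both '-' characters and
// reports failure instead of calling substr with std::string::npos.
bool parse_date(const std::string& text, int& day, int& month, int& year) {
    const std::size_t first = text.find('-');
    if (first == std::string::npos) return false;
    const std::size_t second = text.find('-', first + 1);
    if (second == std::string::npos) return false;
    day = std::atoi(text.substr(0, first).c_str());
    month = std::atoi(text.substr(first + 1, second - first - 1).c_str());
    year = std::atoi(text.substr(second + 1).c_str());
    return true;
}
```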
Please duplicate the problem with a very simple program. If what you say is correct, then the following program should also fail (taken from Danvil's example, and without calling the unknown (to us) setDia() function):
#include <string>
#include <cstdlib>
using namespace std;
int main()
{
string data = "12-12-2012";
std::size_t p = data.find("-");
if(p == std::string::npos) {
// ERROR no - in string!
}
else {
std::string d = data.substr(0,p);
atoi(d.c_str());
}
}