C++ Determining empty cells from reading in data - c++

I'm reading in data from a csv file that has some columns ending before others, i.e.:
0.01 0.02 0.01
0.02 0.02
And I'm trying to figure out how to catch these empty locations and what to do with them. My current code looks like this:
#include <iostream>
#include <fstream>
#include <sstream>
int main(){
//Code that reads in the data, determines number of rows & columns
//Set up array the size of all the cells (including empty):
double *ary = new double[cols*rows]; //Array of doubles, one per cell
double var;
std::string line;
int i = 0, j = 0;
while(getline(data,line))
{
std::istringstream iss(line); //Each line in a string
while(iss >> var) //Send cell data to placeholder
{
ary[i*cols+j] = var;
j+=1;
}
j = 0; //Start the next row at column 0 again
i+=1;
}
How can I determine if the cell is empty? I want to convert these to "NaN" somehow. Thank you!

You can do something like the following.
Read the input line by line, then split each line with std::getline(sstr, word, ' '), which sets the delimiter to ' '. The rest is checking whether the extracted word is empty or not.
If it is empty, store NaN in its place (only once per line).
Input:
0.01 0.02 0.01
0.02 0.02
0.04 0.08
Here is the code:
#include <iostream>
#include <fstream>
#include <sstream>
#include <vector>
int main()
{
std::fstream file("myfile.txt");
std::vector<std::string> vec;
if(file.is_open())
{
std::string line;
bool Skip = true;
while(std::getline(file, line))
{
std::stringstream sstr(line);
std::string word;
while (std::getline(sstr, word, ' '))
{
if(!word.empty())
vec.emplace_back(word);
else if(word.empty() && Skip)
{
vec.emplace_back("NaN");
Skip = false;
}
}
Skip = true;
}
file.close();
}
for(size_t i = 0; i < vec.size(); ++i)
{
std::cout << vec[i] << " ";
if((i+1)%3 ==0) std::cout << std::endl;
}
return 0;
}
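If you would rather keep the cells as double instead of strings, a minimal sketch of the same idea could look like the following. It assumes that empty cells only occur at the end of a row (as in the example input) and that the number of columns is already known; the function name parseRowPadded is made up for this illustration.

```cpp
#include <cstddef>
#include <limits>
#include <sstream>
#include <string>
#include <vector>

// Hypothetical helper: parse one whitespace-separated line into exactly
// `cols` cells. Cells missing at the end of a short row stay NaN.
std::vector<double> parseRowPadded(const std::string& line, std::size_t cols)
{
    const double nan = std::numeric_limits<double>::quiet_NaN();
    std::vector<double> row(cols, nan);   // start with every cell marked empty

    std::istringstream iss(line);
    double value = 0.0;
    std::size_t j = 0;
    while (j < cols && iss >> value) {    // stop at the first missing cell
        row[j++] = value;
    }
    return row;
}
```

Calling parseRowPadded(line, cols) once per line gives rows of equal length, and std::isnan from <cmath> can later be used to detect the cells that were empty.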

Related

How to read a CSV dataset in which each row has a distinct length. C++

I'm just new to C++ and am studying how to read data from csv file.
I want to read the following csv data into vector. Each row is a vector. The file name is path.csv:
0
0 1
0 2 4
0 3 6 7
I use the following function:
vector<vector<int>> read_multi_int(string path) {
vector<vector<int>> user_vec;
ifstream fp(path);
string line;
getline(fp, line);
while (getline(fp, line)) {
vector<int> data_line;
string number;
istringstream readstr(line);
while (getline(readstr, number, ',')) {
//getline(readstr, number, ',');
data_line.push_back(atoi(number.c_str()));
}
user_vec.push_back(data_line);
}
return user_vec;
}
vector<vector<int>> path = read_multi_int("C:/Users/data/paths.csv");
Print function:
template <typename T>
void print_multi(T u)
{
for (int i = 0; i < u.size(); ++i) {
if (u[i].size() > 1) {
for (int j = 0; j < u[i].size(); ++j) {
//printf("%d ", u[i][j]);
cout << u[i][j] << " ";
}
printf("\n");
}
}
printf("\n");
}
Then I get
0 0 0
0 1 0
0 2 4
0 3 6 7
Zeros are added at the end of the rows. Is it possible to just read the data from the csv file without adding those extra zeros? Thanks!
Based on the output you are seeing and the code that splits on ',' commas, I believe that your actual input data really looks like this:
A,B,C,D
0,,,
0,1,,
0,2,4,
0,3,6,7
So the main change is to replace atoi with strtol, as atoi will always return 0 on a failure to parse a number, but with strtol we can check if the parse succeeded.
That means that the solution is as follows:
vector<vector<int>> read_multi_int(string path) {
vector<vector<int>> user_vec;
ifstream fp(path);
string line;
getline(fp, line);
while (getline(fp, line)) {
vector<int> data_line;
string number;
istringstream readstr(line);
while (getline(readstr, number, ',')) {
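// Needs <cerrno>, <climits> and <cstdlib>
char* temp = nullptr;
errno = 0; // strtol only sets errno on failure, so reset it first
long numberI = strtol(number.c_str(), &temp, 10);
if (temp == number.c_str() || *temp != '\0' ||
((numberI == LONG_MIN || numberI == LONG_MAX) && errno == ERANGE))
{
// Could not convert: empty or malformed cell, so skip it
}else{
data_line.emplace_back(static_cast<int>(numberI));
}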
}
user_vec.emplace_back(data_line);
}
return user_vec;
}
Then to display your results:
vector<vector<int>> path = read_multi_int("C:/Users/data/paths.csv");
for (const auto& row : path)
{
for (const auto& s : row) std::cout << s << ' ';
std::cout << std::endl;
}
This gives the expected output:
0
0 1
0 2 4
0 3 6 7
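To see the difference between atoi and strtol in isolation, here is a small sketch of a checked conversion. The helper name tryParseInt is an assumption for this example and not part of the code above.

```cpp
#include <cerrno>
#include <climits>
#include <cstdlib>
#include <string>

// Hypothetical helper: returns true and stores the value in `out` only if
// the whole field is a valid int. An empty CSV cell returns false.
bool tryParseInt(const std::string& field, int& out)
{
    char* end = nullptr;
    errno = 0;                                        // strtol never clears errno itself
    const long value = std::strtol(field.c_str(), &end, 10);
    if (end == field.c_str() || *end != '\0') {
        return false;                                 // no digits at all, or trailing junk
    }
    if (errno == ERANGE || value < INT_MIN || value > INT_MAX) {
        return false;                                 // does not fit into an int
    }
    out = static_cast<int>(value);
    return true;
}
```

With this, tryParseInt("", n) reports a failure, whereas atoi("") silently returns 0, which is where the extra zeros came from.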
Already very good, but there is one obvious error in the reading code and another error in your print function. See below how I output the values with simple range-based for loops.
If your source file does not contain a comma (','), but a different delimiter, then you need to call std::getline with this different delimiter, in your case a blank (' '). Please read here about std::getline.
If we then use the following input
Header
0
0 1
0 2 4
0 3 6 7
with the corrected program.
#include <vector>
#include <fstream>
#include <iostream>
#include <string>
#include <sstream>
using namespace std;
vector<vector<int>> read_multi_int(string path) {
vector<vector<int>> user_vec;
ifstream fp(path);
string line;
getline(fp, line);
while (getline(fp, line)) {
vector<int> data_line;
string number;
istringstream readstr(line);
while (getline(readstr, number, ' ')) {
//getline(readstr, number, ',');
data_line.push_back(atoi(number.c_str()));
}
user_vec.push_back(data_line);
}
return user_vec;
}
int main() {
vector<vector<int>> path = read_multi_int("C:/Users/data/paths.csv");
for (vector<int>& v : path) {
for (int i : v) std::cout << i << ' ';
std::cout << '\n';
}
}
then we receive this as output:
0
0 1
0 2 4
0 3 6 7
Which is correct, but unfortunately different from your shown output.
So, your output routine, or some other code, may also have some problem.
Besides, if there is no comma, you can take advantage of the formatted input functions using the extraction operator >>. This skips leading whitespace, reads up to the next whitespace and converts the text to a number automatically.
Additionally, it is strongly recommended to always initialize variables when you define them.
Modifying your code to use formatted input, initialization and, maybe, better variable names, it could look like this:
#include <vector>
#include <fstream>
#include <iostream>
#include <string>
#include <sstream>
using namespace std;
vector<vector<int>> multipleLinesWithIntegers(const string& path) {
// Here we will store the resulting 2d vector
vector<vector<int>> result{};
// Open the file
ifstream fp{ path };
// Read header line
string line{};
getline(fp, line);
// Now read all lines with numbers in the file
while (getline(fp, line)) {
// Here we will store all numbers of one line
vector<int> numbers{};
// Put the line into an istringstream for easier extraction
istringstream sline{ line };
int number{};
while (sline >> number) {
numbers.push_back(number);
}
result.push_back(numbers);
}
return result;
}
int main() {
vector<vector<int>> values = multipleLinesWithIntegers("C:/Users/data/paths.csv");
for (const vector<int>& v : values) {
for (const int i : v) std::cout << i << ' ';
std::cout << '\n';
}
}
And the next step would be to use a somewhat more advanced style:
#include <vector>
#include <fstream>
#include <iostream>
#include <string>
#include <sstream>
#include <iterator>
auto multipleLinesWithIntegers(const std::string& path) {
// Here we will store the resulting 2d vector
std::vector<std::vector<int>> result{};
// Open the file and check, if it could be opened
if (std::ifstream fp{ path }; fp) {
// Read header line
if (std::string line{}; getline(fp, line)) {
// Now read all lines with numbers in the file
while (getline(fp, line)) {
// Put the line into an istringstream for easier extraction
std::istringstream sline{ line };
// Get the numbers and add them to the result
result.emplace_back(std::vector(std::istream_iterator<int>(sline), {}));
}
}
else std::cerr << "\n\nError: Could not read header line '" << line << "'\n\n";
}
else std::cerr << "\n\nError: Could not open file '" << path << "'\n\n";
return result;
}
int main() {
const std::vector<std::vector<int>> values{ multipleLinesWithIntegers("C:/Users/data/paths.csv") };
for (const std::vector<int>& v : values) {
for (const int i : v) std::cout << i << ' ';
std::cout << '\n';
}
}
Edit
You have shown your output routine. That should be changed to:
void printMulti(const std::vector<std::vector<int>>& u)
{
for (int i = 0; i < u.size(); ++i) {
if (u[i].size() > 0) {
for (int j = 0; j < u[i].size(); ++j) {
std::cout << u[i][j] << ' ';
}
std::cout << '\n';
}
}
std::cout << '\n';
}

printing only those strings that include 2-digit number

The text inside "myFile.txt" is:
{the pink double jump 34
the rising frog 2
doing the code 11
nice 4 }
"
#include <iostream>
#include <sstream>
#include <string>
#include <fstream>
#include <algorithm>
int main()
{
std::string path = "myFile.txt";
std::ifstream de;
de.open(path);
if (!de.is_open()) {
std::cout << "nah";
}
else {
std::cout << "file is opened";
std::string str;
while (!de.eof()) {
std::getline(de, str);
for (int i = 0; i < str.length(); i++) {
int aa = 10;
if (str[i] > aa) {
str = "0";
}
}
std::cout << str << "\n\n";
}
}
}
What am I doing wrong? How can I check if there is any 2-digit number inside the string?
You could use stoi as follows:
#include <fstream>
#include <iostream>
#include <string>
#include <vector>
int main()
{
std::ifstream inp("test.txt");
std::string word;
std::vector<std::string> listOfTwoDigitStrings;
while(inp>>std::ws>>word) {
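// Check the characters first: std::stoi throws std::invalid_argument for a word such as "to"
if(word.length() == 2 && word[0] >= '0' && word[0] <= '9' && word[1] >= '0' && word[1] <= '9') {
int num = std::stoi(word);
if(num >= 10 && num <= 99) {
listOfTwoDigitStrings.push_back(word);
}
}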
}
for(const auto& word: listOfTwoDigitStrings) {
std::cout<<word<<' ';
}
std::cout<<'\n';
return 0;
}
which has the output
34 11
when test.txt contains
{the pink double jump 34
the rising frog 2
doing the code 11
nice 4 } "
P.S.: Since you're looking for individual strings, read the file word by word rather than reading whole lines and then splitting them. That reduces the task to picking out the 2-character strings and verifying that they are numbers. Also, as mentioned in the comments, avoid looping on !file.eof().
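If the goal is to print the whole lines that contain a 2-digit number (which is how the title reads), a possible sketch is below. It assumes that a "2-digit number" means a whitespace-separated word from 10 to 99; the function names are made up for this example.

```cpp
#include <istream>
#include <sstream>
#include <string>
#include <vector>

// Hypothetical helper: true if `word` is exactly two decimal digits
// and does not start with '0', i.e. it is a number from 10 to 99.
bool isTwoDigitNumber(const std::string& word)
{
    return word.size() == 2
        && word[0] >= '1' && word[0] <= '9'
        && word[1] >= '0' && word[1] <= '9';
}

// Collect every line of `in` that contains at least one two-digit number.
std::vector<std::string> linesWithTwoDigitNumber(std::istream& in)
{
    std::vector<std::string> result;
    std::string line;
    while (std::getline(in, line)) {        // loop on the read itself, not on eof()
        std::istringstream words(line);
        std::string word;
        while (words >> word) {
            if (isTwoDigitNumber(word)) {
                result.push_back(line);
                break;                      // one match is enough for this line
            }
        }
    }
    return result;
}
```

Because the function takes a std::istream&, it can be called with an std::ifstream opened on the file or with an std::istringstream for testing.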

Need to separate numbers from a string on a line, separated by ';', (25;16;67;13) in c++

We have a string (25;16;67;13;14;.......)
We need to print out the numbers separately. The last number does not have a semicolon behind it.
Output should be something like that:
25
16
67
13
14
......
Assuming we are using str.find, str.substr and size_t variables current_pos, prev_pos, what will be the condition of the while loop we are using to browse the line, so that it prints out all the numbers, not just the first one?
You can make use of std::istringstream:
#include <sstream>
#include <string>
#include <iostream>
int main() {
std::string text("25;16;67;13;14");
std::istringstream ss(text);
std::string token;
while(std::getline(ss, token, ';'))
{
std::cout << token << '\n';
}
return 0;
}
Running the above code online results in the following output:
25
16
67
13
14
If you need only to print the numbers in the string (rather than represent them in data structures) the solution is quite easy. Simply read the entire string, then print it character by character. If the character is a semicolon, print a new line instead.
#include <iostream>
#include <string>
using namespace std;
int main(){
string input;
cin >> input;
for(int i = 0; i < input.length(); i++){
if(input.at(i) == ';') cout << endl;
else cout << input.at(i);
}
}
#include <cctype>
#include <iostream>
#include <sstream>
#include <string>
using namespace std;
int main() {
string a{ "1232,12312;21414:231;23231;22" };
for (int i = 0; i < a.size(); i++) {
if (ispunct(a[i])) {
a[i] = ' ';
}
}
stringstream line(a);
string b;
while (getline(line, b, ' ')) {
cout << b << endl;
}
}
//any punctuation ",/;:<>="
I will give you an exact answer to your question with an example, plus an alternative solution as a one-liner.
Please see:
#include <iostream>
#include <string>
#include <iterator>
#include <algorithm>
#include <regex>
const std::regex re(";");
int main() {
std::string test("25;16;67;13;14;15");
// Solution 1: as requested
{
size_t current_pos{};
size_t prev_pos{};
// Search for the next semicolon
while ((current_pos = test.find(';', prev_pos)) != std::string::npos) {
// Print the resulting value
std::cout << test.substr(prev_pos, current_pos - prev_pos) << "\n";
// Update search positions
prev_pos = current_pos + 1;
}
// Since there is no ; at the end, we print the last number manually
std::cout << test.substr(prev_pos) << "\n\n";
}
// Solution 2. All in one statement. Just to show to you what can be done with C++
{
std::copy(std::sregex_token_iterator(test.begin(), test.end(), re, -1), {}, std::ostream_iterator<std::string>(std::cout, "\n"));
}
return 0;
}
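If the numbers are needed for more than printing, the find/substr loop from Solution 1 can be wrapped into a small reusable function. This is only a sketch; the name split and the choice to return a vector of strings are assumptions for the illustration.

```cpp
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical helper: split `text` at every `delimiter` using find/substr.
std::vector<std::string> split(const std::string& text, char delimiter)
{
    std::vector<std::string> tokens;
    std::size_t prev_pos = 0;
    std::size_t current_pos = 0;
    // Same loop condition as above: keep going while another delimiter is found
    while ((current_pos = text.find(delimiter, prev_pos)) != std::string::npos) {
        tokens.push_back(text.substr(prev_pos, current_pos - prev_pos));
        prev_pos = current_pos + 1;
    }
    tokens.push_back(text.substr(prev_pos));   // the part after the last delimiter
    return tokens;
}
```

Note that with this design an empty input yields a single empty token, and two adjacent delimiters yield an empty token between them.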

Comparing elements of text file between each other

I am trying to compare the digits within blocks of three numbers and write a new output file containing only the blocks that meet this condition: the first digit of the block is less than the second and the third, and the second digit is greater than the first but less than the third.
This is my code for the input file:
int main()
{
ofstream outfile ("test.txt");
outfile << "123 456 789 123 123 432 \n 123 243 " << endl;
I want to split this into blocks of three, like "123", "456" and so on, so that I can write only the ones that meet the requirement to the new output file. I decided to convert the whole file into an integer vector to be able to compare them.
char digit;
ifstream file("test.txt");
vector<int> digits;
while(file >> digit) {
digits.push_back(digit - '0');
}
and I suppose that the method that compares them would look something like this:
bool IsValid(vector<int> digits){
for(int i=0; i<digits.size(); i++){
if(digits[0] < digits[1] & digits[0] < digits[2] & digits[1]<digits[2])
return true;
else{
return false;
}
}
}
However, this would only compare the first block. Would you do it differently, or should I keep going with the vector idea?
You can do it this way. get reads a single char, and once three digits have been collected the function IsValid is called.
#include <fstream>
#include <string>
#include <vector>
using namespace std;
bool IsValid(const vector<int>& digits)
{
// Logical && (not bitwise &) so the comparisons short-circuit
return digits[0] < digits[1] && digits[0] < digits[2] && digits[1] < digits[2];
}
int main()
{
ifstream in("test.txt");
ofstream out("output.txt");
char tmp;
vector<int> digits;
while(in.get(tmp))
{
if(tmp!=' ' and tmp!='\n')
{
digits.push_back(tmp-'0');
if(digits.size()==3)
{
if(IsValid(digits))
out<<digits[0]<<digits[1]<<digits[2]<<endl;
digits.clear();
}
}
}
out.close();
in.close();
}
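If the task is as described (the first digit of the block is less than the second and the third, and the second digit is greater than the first but less than the third), that is the same as saying the digits of each block are in ascending order. So keep each block as a string and test it with std::is_sorted. This works if the data consists of space-separated blocks of 3 digits. Note that std::is_sorted also accepts equal neighbouring digits (e.g. 112), so add an extra check if the comparison has to be strict.
The example uses std::stringstream ss{line}; an std::fstream can be read in the same way.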
#include <iostream>
#include <vector>
#include <iterator>
#include <string>
#include <sstream>
#include <algorithm>
int main() {
std::string line{"123 456 789 123 123 432 123 243 "};
std::cout << line << std::endl;
std::string out_line;
std::stringstream ss{line};
std::string tmp_str;
while(ss >> tmp_str) {
if (std::is_sorted(std::begin(tmp_str), std::end(tmp_str))) {
out_line += tmp_str + " ";
}
}
std::cout << out_line << std::endl;
return 0;
}
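Since the question asks for strictly "less than", a stricter variant of the check can be sketched with std::adjacent_find. The function name strictlyAscending is an assumption for this example.

```cpp
#include <algorithm>
#include <functional>
#include <string>

// Hypothetical helper: true if every character is strictly less than the next one.
bool strictlyAscending(const std::string& block)
{
    // adjacent_find returns the first neighbouring pair where left >= right;
    // if there is no such pair, the block is strictly ascending.
    return std::adjacent_find(block.begin(), block.end(),
                              std::greater_equal<char>()) == block.end();
}
```

It can be used in place of std::is_sorted in the loop above. An empty or single-character block counts as ascending, so check the length separately if that matters.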

Converting string to floats; can't clear stringstream correctly?

Disclaimer: I must use C++98
As part of a class assignment, I have been tasked to convert space-delimited strings into floats (then calculate a span with those floats, but that is irrelevant to my problem). The strings are coming from text files. If there are any discrepancies with the float, I am supposed to ignore it and consider it corrupted. For example, a text file could consist of a line that looks like this:
34.6 24.2 18.a 54.3 20.0 15.6
In this case, 18.a would simply be considered corrupt and no further manipulation has to be done to it.
Now, I am having a problem clearing my stringstream of corrupt data. For reference, here is my code:
#include <vector>
#include <limits>
#include <string>
#include <sstream>
#include <fstream>
#include <iostream>
using namespace std;
int main(int argc, char* argv[]) {
//Open file
ifstream infile("dataFile");
//Get all file input into a single string
string line;
string buffer;
while (getline(infile, buffer)) {
line += buffer + " ";
}
infile.close();
//Populate vector
float temp;
//I have tried to clear the stream with `data >> dummy` for
//both string and single char types below, but `data >> string`
//always clears too much, and `data >> char` doesn't seem to clear
//correctly either
//string dummy;
//char dummy;
vector<float> temps;
istringstream data(line);
while (data) {
//values outside the range -100 to 100 are also considered corrupt
if (data >> temp && (temp <= 100 && temp >= -100)) {
temps.push_back(temp);
}
else if (!data.eof()) {
data.clear();
//trying to ignore all characters until I reach a space
//but that doesn't work correctly either
data.ignore(numeric_limits<streamsize>::max(), ' ');
//data >> dummy;
//cout << "Dummy: " << dummy << endl;
temps.push_back(-101.0);
}
}
//display resulting vector values
for(int i=0; i<temps.size(); ++i) {
cout << temps[i] << " ";
}
cout << endl;
}
My issue lies within the while (data) loop, specifically inside the else if (!data.eof()) block. When data >> temp (type float) fails, the else if block runs. I clear the resulting failbit and attempt to ignore the remaining characters until the next space delimiter comes up. However, a text file with a line such as:
a *a -100.1 100.1 a 10.a a 13-6s 12abc -12.a
produces problems. 13 and -6 are both processed as valid floats. I want to ignore the entire chunk of 13-6s, because these values are intended to be space-delimited.
What is the correct way to deal with this istringstream issue, where the characters are not being ignored the way I want?
I have been told by my professor that I can accomplish this with very basic STL techniques. He explicitly recommended to use stringstream as a way to parse floats. Is he in the wrong here?
Please comment for further clarity, if needed; I've been at this for quite some time now and would much appreciate some help.
Thank you!
This should do what you need.
#include <iostream>
#include <string>
#include <sstream>
int main() {
std::string temp;
// std::cin >> temp reads one whitespace-delimited entry at a time
while (std::cin >> temp) {
std::stringstream s(temp);
float f;
// the first condition checks whether extracting the float succeeds
// the second condition checks whether the entire stringstream has been consumed
if (!(s >> f) || !s.eof()) {
std::cout << temp << " failed!" << std::endl;
}
else {
std::cout << f << std::endl;
}
}
}
Apologies for not being able to answer your stringstream question but this solution should remove any necessity for that.
Note that input of 34.6 24.2 18.a 54.3 20.0 15.6 returns an output of:
34.6
24.2
18.a failed!
54.3
20
15.6
Edit: I added a case to the if statement to handle the stranger cases (i.e. 13-6s). It's a neat solution I found here.
Edit 2: I annotated some of the more complicated parts.
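To tie this back to the original program, here is a sketch that applies the same check to a whole line and also enforces the -100 to 100 range, using only C++98 features. The function names are made up for this example, and unlike the code in the question it simply drops corrupt entries instead of storing -101.0.

```cpp
#include <sstream>
#include <string>
#include <vector>

// Hypothetical helper: true only if the whole token is a float in [-100, 100].
bool parseTemperature(const std::string& token, float& out)
{
    std::istringstream s(token);
    float value = 0.0f;
    // Extraction must succeed AND must have consumed the entire token
    if (!(s >> value) || !s.eof()) return false;
    if (value < -100.0f || value > 100.0f) return false;
    out = value;
    return true;
}

// Split the line on whitespace and keep only the valid temperatures.
std::vector<float> parseTemperatures(const std::string& line)
{
    std::vector<float> temps;
    std::istringstream tokens(line);
    std::string token;
    while (tokens >> token) {
        float value = 0.0f;
        if (parseTemperature(token, value)) temps.push_back(value);
    }
    return temps;
}
```

Because each token gets its own stringstream, a chunk such as 13-6s is rejected as a whole and there is no stream state to clear afterwards.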
Try the approach shown in the following demonstration program.
#include <iostream>
#include <string>
#include <sstream>
#include <cstdlib>
#include <vector>
int main()
{
std::string line( "a *a -100.1 100.1 -100 a 10.a a 13-6s 100 12abc -12.a" );
std::istringstream is( line );
std::vector<float> values;
std::string item;
while ( is >> item )
{
const char *s = item.c_str();
char *tail;
float value = std::strtof( s, &tail );
if ( *tail == '\0' && -100.0f <= value && value <= 100.0f )
{
values.push_back( value );
}
}
for ( float value : values ) std::cout << value << ' ';
std::cout << std::endl;
return 0;
}
The program output is
-100 100
If this string is used instead
std::string line( "34.6 24.2 18.a 54.3 20.0 15.6" );
then the program output will be
34.6 24.2 54.3 20 15.6
Another approach is the following.
#include <iostream>
#include <string>
#include <sstream>
#include <limits>
#include <vector>
int main()
{
std::string line( "a *a -100.1 100.1 -100 a 10.a a 13-6s 100 12abc -12.a" );
// std::string line( "34.6 24.2 18.a 54.3 20.0 15.6" );
std::istringstream is( line );
std::vector<float> values;
while ( !is.eof() )
{
float value;
int c;
if ( not ( is >> value ) || ( ( c = is.get() ) != ' ' && c != std::char_traits<char>::eof() ) )
{
is.clear();
is.ignore( std::numeric_limits<std::streamsize>::max(), ' ' );
}
else if ( -100.0f <= value && value <= 100.0f )
{
values.push_back( value );
}
}
for ( float value : values ) std::cout << value << ' ';
std::cout << std::endl;
return 0;
}
The output will be the same as shown above.