C++ Read matrices from file with multiple delimiters [duplicate] - c++

Closed 6 years ago.
I am given a file with ten matrices, and I would like to read them from the file and store each matrix in its own vector or array. However, the format of these matrices makes it hard for me to read the data (I'm not good at reading from input files).
The file has the following format: elements of each matrix are separated by ",", rows are separated by ";", and matrices are separated by "|". For example, three 2 by 2 matrices look like this:
1,2;3,4|0,1;1,0|5,3;3,1|
I just want to save the matrices into three different vectors, but I am not sure how to do this.
I tried:
while(getline(inFile,line)){
stringstream linestream(line);
string value;
while(getline(linestream, value, ',')){
//save into vector
}
}
But this is obviously very crude, and only separates the data by comma. Is there a way to separate the data with multiple delimiters?
Thank you!

string line;
while(getline(infile, line, '|'))
{
stringstream rowstream(line);
string row;
while(getline(rowstream, row, ';'))
{
stringstream elementstream(row);
string element;
while(getline(elementstream, element, ','))
{
cout << element << endl;
}
}
}
Using the code above, you can build the logic to store the individual elements however you like.
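As one possible way to build on the nested getline loops, the sketch below stores each matrix as a vector of rows instead of printing the elements. The names Matrix and readMatrices are made up for this illustration, and it assumes every element is an integer that std::stoi can parse.

```cpp
#include <istream>
#include <sstream>
#include <string>
#include <vector>

// Hypothetical alias for this sketch: a matrix is a vector of rows.
using Matrix = std::vector<std::vector<int>>;

// Reads '|'-separated matrices whose rows are ';'-separated
// and whose elements are ','-separated.
std::vector<Matrix> readMatrices(std::istream& in)
{
    std::vector<Matrix> matrices;
    std::string matrixText;
    while (std::getline(in, matrixText, '|'))
    {
        Matrix m;
        std::stringstream rowStream(matrixText);
        std::string rowText;
        while (std::getline(rowStream, rowText, ';'))
        {
            std::vector<int> row;
            std::stringstream elementStream(rowText);
            std::string element;
            while (std::getline(elementStream, element, ','))
            {
                // Skip whitespace-only pieces, e.g. the newline after the last '|'.
                if (element.find_first_not_of(" \t\r\n") == std::string::npos)
                    continue;
                row.push_back(std::stoi(element)); // throws on non-numeric text
            }
            if (!row.empty())
                m.push_back(row);
        }
        if (!m.empty())
            matrices.push_back(m);
    }
    return matrices;
}
```

Passing an std::ifstream opened on the data file should work the same way as the string stream used for testing, since both are std::istream objects.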

I use my own function to split a string into a vector of strings:
/**
* \brief Split a string into substrings
* \param sep Symbol separating the parts
* \param str String to be split
* \return Vector containing the split parts
* \pre The separator can not be 0
* \details Example :
* \code
* std::string str = "abc.def.ghi..jkl.";
* std::vector<std::string> split_str = split('.', str); // the vector is ["abc", "def", "ghi", "", "jkl", ""]
* \endcode
*/
std::vector<std::string> split(char sep, const std::string& str);
std::vector<std::string> split(char sep, const std::string& str)
{
assert(sep != 0 && "PRE: the separator is null");
std::vector<std::string> s;
unsigned long int i = 0;
for(unsigned long int j = 0; j < str.length(); ++j)
{
if(str[j] == sep)
{
s.push_back(str.substr(i, j - i));
i = j + 1;
}
}
s.push_back(str.substr(i, str.size() - i));
return s;
}
Then, expecting you have a class Matrix, you can do something like :
std::string matrices_str;
std::ifstream matrix_file(matrix_file_name.c_str());
matrix_file >> matrices_str;
const std::vector<std::string> matrices = split('|', matrices_str);
std::vector<Matrix<double> > M(matrices.size());
for(unsigned long int i = 0; i < matrices.size(); ++i)
{
const std::string& matrix = matrices[i];
const std::vector<std::string> rows = split(';', matrix);
for(unsigned long int j = 0; j < rows.size(); ++j)
{
const std::string& row = rows[j]; // the j-th row, not the i-th character of the matrix string
const std::vector<std::string> elements = split(',', row);
for(unsigned long int k = 0; k < elements.size(); ++k)
{
const std::string& element = elements[k];
if(j == 0 && k == 0)
M[i].resize(rows.size(), elements.size());
std::istringstream iss(element);
iss >> M[i](j,k);
}
}
}
Or, compressed code :
std::string matrices_str;
std::ifstream matrix_file(matrix_file_name.c_str());
matrix_file >> matrices_str;
const std::vector<std::string> matrices = split('|', matrices_str);
std::vector<Matrix<double> > M(matrices.size());
for(unsigned long int i = 0; i < matrices.size(); ++i)
{
const std::vector<std::string> rows = split(';', matrices[i]);
for(unsigned long int j = 0; j < rows.size(); ++j)
{
const std::vector<std::string> elements = split(',', rows[j]);
for(unsigned long int k = 0; k < elements.size(); ++k)
{
if(j == 0 && k == 0)
M[i].resize(rows.size(), elements.size());
std::istringstream iss(elements[k]);
iss >> M[i](j,k);
}
}
}
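Both versions above assume a Matrix class template that offers resize(rows, cols) and element access through operator()(row, col). The answer does not show that class, so the following is only a hypothetical minimal sketch of what such a class could look like, with row-major storage in a std::vector.

```cpp
#include <cstddef>
#include <vector>

// Hypothetical minimal matrix type; only the members used by the
// parsing loops (resize and operator()) plus two size accessors.
template <typename T>
class Matrix
{
public:
    Matrix() : rows_(0), cols_(0) {}

    // Sets the dimensions and value-initializes every element.
    void resize(std::size_t rows, std::size_t cols)
    {
        rows_ = rows;
        cols_ = cols;
        data_.assign(rows * cols, T());
    }

    // Element access, row-major; no bounds checking in this sketch.
    T& operator()(std::size_t r, std::size_t c) { return data_[r * cols_ + c]; }
    const T& operator()(std::size_t r, std::size_t c) const { return data_[r * cols_ + c]; }

    std::size_t rows() const { return rows_; }
    std::size_t cols() const { return cols_; }

private:
    std::size_t rows_, cols_;
    std::vector<T> data_;
};
```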

You can use the finite state machine concept. You need to define a state for each step.
Read one char and then decide what it is (number or delimiter).
Here is a sketch of how you could do it.
For further reading, search for: text parsing, finite state machine, lexical analyzer, formal grammar.
enum State
{
DECIMAL_NUMBER,
COMMA_D,
SEMICOLON_D,
PIPE_D,
ERROR_STATE,
};
char GetChar()
{
// implement proper reading from file
static const char* input = "1,2;3,4|0,1;1,0|5,3;3,1|"; // string literals are const
static int index = 0;
return input[index++];
}
State GetState(char c)
{
if ( isdigit(c) )
{
return DECIMAL_NUMBER;
}
else if ( c == ',' )
{
return COMMA_D;
}
else if ( c == ';' )
{
return SEMICOLON_D;
}
else if ( c == '|' )
{
return PIPE_D;
}
return ERROR_STATE;
}
int main(int argc, char* argv[])
{
char c;
while ( (c = GetChar()) )
{
State s = GetState(c);
switch ( s )
{
case DECIMAL_NUMBER:
// read numbers
break;
case COMMA_D:
// append into row
break;
case SEMICOLON_D:
// next row
break;
case PIPE_D:
// finish one matrix
break;
case ERROR_STATE:
// syntax error
break;
default:
break;
}
}
return 0;
}

The example you have actually maps to a very simple byte machine.
Start with a zeroed matrix and something that keeps track of where in the matrix you're writing. Read one character at a time. If the character is a digit, multiply the current number in the matrix by 10 and add the digit to it. If the character is a comma, advance to the next number in the row. If the character is a semicolon, go to the next row. If the character is a pipe, start a new matrix.
You might not want to do it exactly this way if the numbers are floating point; in that case I'd save them in a buffer and use a standard method of parsing floating point numbers. But other than that you don't really need to keep much complex state or build a large parser. You might want to add error handling at a later stage, but even there the error handling is pretty trivial and only depends on the current character you're scanning.
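A rough sketch of such a character-by-character scanner follows. It is one possible reading of the description, not the answerer's code: parseMatrices and the Matrix alias are hypothetical names, it grows vectors instead of writing into a pre-sized zeroed matrix, it handles only non-negative integers, and it simply ignores whitespace.

```cpp
#include <cctype>
#include <string>
#include <vector>

using Matrix = std::vector<std::vector<int>>; // hypothetical alias

// Scans the text one character at a time.
// Returns false on the first unexpected character or incomplete input.
bool parseMatrices(const std::string& text, std::vector<Matrix>& out)
{
    Matrix matrix;
    std::vector<int> row;
    int number = 0;
    bool haveDigits = false;

    for (char c : text)
    {
        if (std::isdigit(static_cast<unsigned char>(c)))
        {
            number = number * 10 + (c - '0'); // accumulate the current number
            haveDigits = true;
        }
        else if (c == ',' || c == ';' || c == '|')
        {
            if (!haveDigits) return false;    // delimiter with no number before it
            row.push_back(number);            // every delimiter ends a number
            number = 0;
            haveDigits = false;
            if (c != ',') { matrix.push_back(row); row.clear(); }    // ';' or '|' ends a row
            if (c == '|') { out.push_back(matrix); matrix.clear(); } // '|' ends a matrix
        }
        else if (!std::isspace(static_cast<unsigned char>(c)))
        {
            return false;                     // anything else is a syntax error
        }
    }
    // The sample format ends every matrix with '|', so nothing may be left over.
    return !haveDigits && row.empty() && matrix.empty();
}
```

The error handling here depends only on the current character and a single flag, which matches the point made above that little state is needed.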

Related

How do you find first character NOT in string array without using classes or libraries when comparing two strings?

I am trying to compare two string arrays, but am not allowed to use classes or libraries to assist.
The issue is that if the second string is more than one character long, the function compares against the whole string again, even though it has already checked the first character.
char *find_first_not_in_the_set(char *str, const char *set)
{
for(int i = 0; *(str + i) != '\0'; i++)
{
for(int j = 0; *(set + j) != '\0'; j++)
{
if(str[i] != set[j])
{
return &(str[i]);
}
}
}
return NULL;
}
If "Hello World!" is the first string and the second string is "He", the program should return l, but it returns H because it still checks the first character.
I'd rather use this:
bool matrix[256] = {0};
int length = strlen(set);
// remember all characters we have in the 'set'
for( int i=0; i<length; i++) matrix[set[i] & 0xFF] = 1;
length = strlen(str);
// now check the characters from 'str'
for( int i=0; i<length; i++) {
if( ! matrix[str[i] & 0xFF] ) {
printf( "Found: %c", str[i] );
break;
}
}
For every character in str, your code checks it against each and every position in set. Thus, when i=0, 'H' is compared with set[0], i.e. 'H', for j=0. But when j=1, 'H' is compared with 'e', and this causes the function to return str[0] because i is still 0.
For this particular example the problem goes away if you use just one loop and check str[i]!=set[i], but that compares position by position. To test membership in the set, return str[i] only once the inner loop has finished without finding a match.
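To make the membership test concrete, here is a sketch of the function from the question with the early return moved out of the inner loop. It keeps the original signature and uses no library calls; the inSet flag is a name introduced for this illustration.

```cpp
// Returns a pointer to the first character of str that does not
// occur anywhere in set, or nullptr if every character occurs in set.
char* find_first_not_in_the_set(char* str, const char* set)
{
    for (int i = 0; str[i] != '\0'; i++)
    {
        bool inSet = false;
        for (int j = 0; set[j] != '\0'; j++)
        {
            if (str[i] == set[j])
            {
                inSet = true; // found a match, no need to look further
                break;
            }
        }
        if (!inSet)           // decided only after the whole set was scanned
        {
            return &str[i];
        }
    }
    return nullptr;
}
```

With "Hello World!" and "He", the first two characters are found in the set and the function returns a pointer to the first 'l'.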

Why split string is done in many questions in hackerrank?

This is the Chief Hopper greedy algorithm question. Here it is:
https://www.hackerrank.com/challenges/chief-hopper/problem
I want to ask why the string is split even though we are not giving any string as input, and why the stoi function is then used to convert the pieces to int?
string arr_temp_temp;
getline(cin, arr_temp_temp);
vector<string> arr_temp = split_string(arr_temp_temp);
vector<int> arr(n);
for (int i = 0; i < n; i++) {
int arr_item = stoi(arr_temp[i]);
arr[i] = arr_item;
}
vector<string> split_string(string input_string) {
string::iterator new_end = unique(input_string.begin(), input_string.end(), [] (const char &x, const char &y) {
return x == y and x == ' ';
});
input_string.erase(new_end, input_string.end());
while (input_string[input_string.length() - 1] == ' ') {
input_string.pop_back();
}
vector<string> splits;
char delimiter = ' ';
size_t i = 0;
size_t pos = input_string.find(delimiter);
while (pos != string::npos) {
splits.push_back(input_string.substr(i, pos - i));
i = pos + 1;
pos = input_string.find(delimiter, i);
}
splits.push_back(input_string.substr(i, min(pos, input_string.length()) - i + 1));
return splits;
}
I don't know where you found this approach, but from my point of view the author is trying to save time on I/O operations. I think this approach is wrong. I don't know how fast reading a string with getline is compared to reading each int value one by one in a loop:
for(int i = 0; i<n; i++) cin>> x;
But I'm quite sure that converting a string to an integer is more time-consuming. So in competitive programming it's normal to use scanf/printf for fast I/O, or
ios_base::sync_with_stdio(false); cin.tie(nullptr); cout.tie(nullptr);
if you want to use cout/cin.
To sum up, I think the code's author tried to save time on I/O operations.
UPD: Sorry, I was in a rush and didn't take the platform into account. Reading input only line by line is probably a feature of this platform, so they give you a template that lets you focus only on the problem.
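For comparison, the read-a-line-then-convert pattern of the template can be written more compactly with a std::istringstream. This is only a sketch and not HackerRank's code; parse_ints is a hypothetical name, and it makes no claim about being faster or slower than the template.

```cpp
#include <sstream>
#include <string>
#include <vector>

// Extracts every whitespace-separated integer from one line of text.
std::vector<int> parse_ints(const std::string& line)
{
    std::istringstream in(line);
    std::vector<int> values;
    int value;
    while (in >> value)          // operator>> skips repeated and trailing spaces
        values.push_back(value);
    return values;
}
```

A caller would typically fetch the line with getline(cin, line) and pass it to parse_ints(line), which covers what split_string plus the stoi loop do in the template.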

read string into array

I want to read a string with integers and whitespaces into an array. For example I have a string looks like 1 2 3 4 5, and I want to convert it into an integer array arr[5]={1, 2, 3, 4, 5}. How should I do that?
I tried deleting the whitespace, but that just assigns the whole 12345 to every array element. If I don't, every element is assigned 1.
for (int i = 0; i < str.length(); i++){
if (str[i] == ' ')
str.erase(i, 1);
}
for (int j = 0; j < size; j++){ // size is given
arr[j] = atoi(str.c_str());
}
A couple of notes:
Use a std::vector. You will most likely never know the size of an input at compile time. If you do, use a std::array.
If you have C++11 available to you, maybe think about stoi or stol, as they will throw upon failed conversion.
You could accomplish your task with a std::stringstream, which will allow you to treat a std::string as a std::istream like std::cin. I recommend this way.
Alternatively, you could go the hard route and attempt to tokenize your std::string based on ' ' as a delimiter, which is what it appears you are trying to do.
Finally, why reinvent the wheel if you go the tokenization route? Use Boost's split function.
Stringstream approach
std::vector<int> ReadInputFromStream(const std::string& _input, int _num_vals)
{
std::vector<int> toReturn;
toReturn.reserve(_num_vals);
std::istringstream fin(_input);
for(int i=0, nextInt=0; i < _num_vals && fin >> nextInt; ++i)
{
toReturn.emplace_back(nextInt);
}
// assert (toReturn.size() == _num_vals, "Error, stream did not contain enough input")
return toReturn;
}
Tokenization approach
std::vector<int> ReadInputFromTokenizedString(const std::string& _input, int _num_vals)
{
std::vector<int> toReturn;
toReturn.reserve(_num_vals);
char tok = ' '; // whitespace delimiter
size_t beg = 0;
size_t end = 0;
for(beg = _input.find_first_not_of(tok, end); toReturn.size() < static_cast<size_t>(_num_vals) &&
beg != std::string::npos; beg = _input.find_first_not_of(tok, end))
{
end = beg+1;
while(end < _input.size() && _input[end] != tok) // advance to the end of the current token
++end;
toReturn.push_back(std::stoi(_input.substr(beg, end-beg)));
}
// assert (toReturn.size() == _num_vals, "Error, string did not contain enough input")
return toReturn;
}
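The line arr[j] = atoi(str.c_str()); is the problem. atoi starts parsing at the pointer it is given, and str.c_str() always points at the beginning of the string, so every element receives the same value. You would have to pass a pointer to where each number starts, for example atoi(&str[pos]), where pos is advanced past each number and the spaces that follow it (which also means the spaces must not be erased first). By the way, if you want to convert a string to an integer, you can also use std::stoi. I hope this helps.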
You modify/parse the string in one loop but copy to the integer array in another loop, without recording where the embedded integers start and end. So both actions have to be done in a single loop.
This code is not perfect, but it should give you the idea; it follows the same process you used, but with vectors.
string str = "12 13 14";
vector<int> integers;
int start=0,i = 0;
for (; i < str.length(); i++){
if (str[i] == ' ')
{
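integers.push_back(atoi(str.substr(start, i - start).c_str())); // substr takes a length, not an end index
start = i + 1; // continue after the space
}
}
integers.push_back(atoi(str.substr(start).c_str())); // the last number has no trailing space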

User input to matrix in C++

I am having trouble reading input from the user and converting it into a matrix for calculation. For example, with the input = {1 2 3 / 4 5 6}, the program should read in the matrix in the form of
1 2 3
4 5 6
which has 3 cols and 2 rows. What I have so far, which does not seem to work:
input.replace(input.begin(), input.end(), '/', ' ');
stringstream ss(input);
string token;
while (getline(ss, token, ' '))
{
for (int i = 0; i < row; i++)
{
for (int j = 0; j < col; j++)
{
int tok = atoi(token.c_str());
(*matrix).setElement(i, j, tok);
}
}
}
So what I'm trying to do is break the input into tokens and store them in the matrix using the setElement function, which takes the row, the column, and the value the user wants to store. What is wrong with this code is that the variable tok doesn't seem to change and stays stuck at 0. Assume that row and col are known.
Thanks so much for any help.
Although many simple ways exist to solve this specific problem (and the other answers have various good suggestions), let me try to give a more general view of the problem of "formatted input".
There are essentially three kinds of problems here:
at the low level you have to do a string-to-number conversion
at a higher level you have to parse a composite format (understanding row and column separators)
finally you also have to work out the size of the compound (how many rows and cols?)
These three things are not fully independent, and the last is needed to know how to store the elements (how do you size the matrix?).
Finally there is a fourth problem (one that is spread across the other three): what to do if the input is "wrong".
These kinds of problems are typically approached in two opposite ways:
Read the data as they come, check that the format matches, and dynamically grow the data structure that has to contain them, or...
Read all the data at once as they are (in textual form), then analyze the text to figure out how many elements it has, then isolate the "chunks" and do the conversions.
Point 2 requires good string manipulation, but also requires knowing how long the input is (what happens if one of the separating spaces is a newline? The idea that everything can be fetched with a single getline fails in those cases).
Point 1 requires a Matrix class that is capable of growing as you read, or a temporary dynamic structure (like a std container) in which you can place what you read before sending it to the appropriate place.
Since I don't know how your matrix works, let me keep a temporary vector and counters to store lines.
#include <vector>
#include <iostream>
#include <cassert>
class readmatrix
{
std::vector<int> data; //storage
size_t rows, cols; //the counted rows and columns
size_t col; //the counting cols in a current row
Matrix& mtx; //refer to the matrix that has to be read
public:
// just keep the reference to the destination
readmatrix(Matrix& m) :data(), rows(), cols(), col(), mtx(m)
{}
// make this class an istream-->istream functor,
// called as readmatrix(yourmatrix)(input)
std::istream& operator()(std::istream& s)
{
if(s) //if we can read
{
char c=0;
s >> c; //trim spaces and get a char.
if(c!='{') //not an open brace
{ s.setstate(s.failbit); return s; } //report the format failure
while(s) //loop on rows (will break when the '}' will be found)
{
col=0;
while(s) //loop on cols (will break when the '/' or '}' will be found)
{
c=0; s >> c;
if(c=='/' || c=='}') //row finished?
{
if(!cols) cols=col; //got first row length
else if(cols != col) //it appears rows have different length
{ s.setstate(s.failbit); return s; } //report the format failure
if(c!='/') s.unget(); //push the char back for later
break; //row finished
}
s.unget(); //push the "not /" char back
int x; s >> x; //get an integer
if(!s) return s; //failed to read an integer!
++col; data.push_back(x); //save the read data
}
++rows; //got an entire row
c=0; s >> c;
if(c == '}') break; //finished the rows
else s.unget(); //push back the char: next row begin
}
}
//now, if read was successful,
// we can dispatch the data into the final destination
if(s)
{
mtx.setsize(rows,cols); // I assume you can set the matrix size this way
auto it = data.begin(); //will scan the inner vector
for(size_t r=0; r<rows; ++r) for(size_t c=0; c<cols; ++c, ++it)
mtx(r,c) = *it; //place the data
assert(it == data.end()); //this must be true if counting have gone right
}
return s;
}
};
Now you can read the matrix as
readmatrix(matrix)(input);
(writing input >> readmatrix(matrix) would additionally need an operator>> overload that takes a readmatrix, since the standard streams only accept plain function pointers as manipulators).
You will notice at this point that there are certain recurring patterns in the code: this is typical of one-pass parsers, and those patterns can be grouped to form sub-parsers. If you do it generically you will, in fact, rewrite boost::spirit.
Of course some adaptation can be done depending on how your matrix works (does it have a fixed size?), or on what to do if row sizes don't match (partial column filling?).
You can even add a formatted input operator like
std::istream& operator>>(std::istream& s, Matrix& m)
{ return readmatrix(m)(s); }
so that you can just do
input >> matrix;
You are trying to operate on each cell of the matrix for each token read from the input!
You have to take one token for each cell, not every cell for each token.
Splitting a string into tokens can be done with the following function.
Please don't be shocked that the following code isn't runnable; this is due to the missing matrix class.
Try the following:
#include <iostream>
#include <string>
#include <vector>
#include <sstream>
#include <algorithm>
#include <iterator>
using namespace std;
void split(const string& str, char delimiter, vector<string>& result) {
string::size_type i = 0;
string::size_type delimOcc = str.find(delimiter);
while (delimOcc != string::npos) {
result.push_back(str.substr(i, delimOcc-i));
i = delimOcc + 1;
delimOcc = str.find(delimiter, i);
}
// the remainder after the last delimiter (or the whole string if there was none)
result.push_back(str.substr(i));
}
int main()
{
std::string input = "1 2 3 / 4 5 6";
vector<string> rows;
split(input, '/', rows);
for(size_t i = 0; i < rows.size(); i++) {
vector<string> cols;
split(rows[i], ' ', cols);
int col = 0; // column index counts only the non-empty tokens
for(size_t j = 0; j < cols.size(); j++) {
if(cols[j].empty()) continue; // empty tokens come from leading, trailing or repeated spaces
int tok = stoi(cols[j]);
(*matrix).setElement(i, col, tok);
cout << tok << " - " << i << " - " << col << endl;
++col;
}
}
return 0;
}
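If you know the size of the matrix beforehand you actually don't need getline; you can read int by int. (untested code; it assumes the braces have already been stripped from the input)
std::replace(input.begin(), input.end(), '/', '\n'); // std::replace from <algorithm>, not string::replace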
stringstream ss(input);
for (int i = 0; i < row; i++)
{
for (int j = 0; j < col; j++)
{
int tok;
ss >> tok;
(*matrix).setElement(i, j, tok);
}
}
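A self-contained sketch of this known-size approach is shown below. The function name readKnownSize and the nested-vector result are assumptions made for illustration, standing in for the question's matrix class and its setElement call. It uses std::replace from <algorithm>, which substitutes one character for another; the member function std::string::replace called with two iterators, a count and a character would instead overwrite the whole range with repeated copies of that character.

```cpp
#include <algorithm>
#include <sstream>
#include <string>
#include <vector>

// Reads rows*cols integers from text such as "{1 2 3 / 4 5 6}".
// Returns false if fewer integers than expected can be extracted.
bool readKnownSize(std::string input, int rows, int cols,
                   std::vector<std::vector<int>>& out)
{
    // Turn the structural characters into spaces so operator>> skips them.
    std::replace(input.begin(), input.end(), '/', ' ');
    std::replace(input.begin(), input.end(), '{', ' ');
    std::replace(input.begin(), input.end(), '}', ' ');

    std::istringstream ss(input);
    out.assign(rows, std::vector<int>(cols, 0));
    for (int i = 0; i < rows; i++)
    {
        for (int j = 0; j < cols; j++)
        {
            if (!(ss >> out[i][j]))
                return false; // ran out of numbers or hit non-numeric text
        }
    }
    return true;
}
```

Because the row separators are discarded, this version cannot detect rows of uneven length; it only checks that enough numbers are present.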

Getting Word Frequency From Vector In c++

I have googled this question and couldn't find an answer that worked with my code, so I wrote this to get the frequency of the words. The only issue is that I am getting the wrong number of occurrences for every word apart from one, which I think is a fluke. I am also checking whether a word has already been entered into the vector so I don't count the same word twice.
fileSize = textFile.size();
vector<wordFrequency> words (fileSize);
int index = 0;
for(int i = 0; i <= fileSize - 1; i++)
{
for(int j = 0; j < fileSize - 1; j++)
{
if(string::npos != textFile[i].find(textFile[j]) && words[i].Word != textFile[j])
{
words[j].Word = textFile[i];
words[j].Times = index++;
}
}
index = 0;
}
Any help would be appreciated.
Consider using a std::map<std::string,int> instead. The map class will handle ensuring that you don't have any duplicates.
Using an associative container:
typedef std::unordered_map<std::string, unsigned> WordFrequencies;
WordFrequencies count(std::vector<std::string> const& words) {
WordFrequencies wf;
for (std::string const& word: words) {
wf[word] += 1;
}
return wf;
}
It is hard to get simpler...
Note: you can replace unordered_map with map if you want the words sorted alphabetically, and you can write a custom comparison operation to treat them case-insensitively.
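As an illustration of that note, the sketch below swaps in std::map with a case-insensitive comparator. CaseInsensitiveLess, SortedWordFrequencies and countSorted are hypothetical names, and the comparison relies on std::tolower, so it is only meaningful for single-byte character sets.

```cpp
#include <algorithm>
#include <cctype>
#include <map>
#include <string>
#include <vector>

// Hypothetical comparator: orders strings while ignoring letter case.
struct CaseInsensitiveLess
{
    bool operator()(const std::string& a, const std::string& b) const
    {
        return std::lexicographical_compare(
            a.begin(), a.end(), b.begin(), b.end(),
            [](unsigned char x, unsigned char y) {
                return std::tolower(x) < std::tolower(y);
            });
    }
};

typedef std::map<std::string, unsigned, CaseInsensitiveLess> SortedWordFrequencies;

// Counts words; "Apple" and "apple" share one entry, keyed by the
// spelling that was seen first.
SortedWordFrequencies countSorted(std::vector<std::string> const& words)
{
    SortedWordFrequencies wf;
    for (std::string const& word : words)
        wf[word] += 1;
    return wf;
}
```

Iterating over the returned map visits the words in alphabetical order regardless of case.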
Try this code instead if you do not want to use a map container:
struct wordFreq{
string word;
int count;
wordFreq(string str, int c):word(str),count(c){}
};
vector<wordFreq> words;
int ffind(vector<wordFreq>::iterator i, vector<wordFreq>::iterator j, string s)
{
for(;i<j;i++){
if((*i).word == s)
return 1;
}
return 0;
}
The code for finding the number of occurrences in a textfile vector is then:
for(int i=0; i< textfile.size();i++){
if(ffind(words.begin(),words.end(),textfile[i])) // Check whether word already checked for, if so move to the next one, i.e. avoid repetitions
continue;
words.push_back(wordFreq(textfile[i],1)); // Add the word to vector as it was not checked before and set its count to 1
for(int j = i+1;j<textfile.size();j++){ // find possible duplicates of textfile[i]
if(textfile[j] == (*(words.end()-1)).word)
(*(words.end()-1)).count++;
}
}