C++ string parser issues

OK, so I'm working on a homework project in C++ and have run into an issue that I can't find a way around. The function is supposed to break an input string at user-defined delimiters and store the substrings in a vector to be accessed later. I think I have the basic parser figured out, but it doesn't split off the last part of the input.
int main() {
string input = "comma-delim-delim&delim-delim";
vector<string> result;
vector<char> delims;
delims.push_back('-');
delims.push_back('&');
int begin = 0;
for (int i = begin; i < input.length(); i++ ){
for(int j = 0; j < delims.size(); j++){
if(input.at(i) == delims.at(j)){
//Compares chars in delim vector to current char in string, and
//creates a substring from the beginning to the current position
//minus 1, to account for the current char being a delimiter.
string subString = input.substr(begin, (i - begin));
result.push_back(subString);
begin = i + 1;
}
The above code works fine for splitting the input up until the last dash. Anything after that is never saved as a substring and pushed into the result vector, because the loop doesn't run into another delimiter. In an attempt to fix this, I put together the following:
else if(input.at(i) == input.at(input.length())){
string subString = input.substr(begin, (input.length() - begin));
result.push_back(subString);
}
However, I keep getting out of bounds errors with the above portion. It seems to be having an issue with the boundaries for splitting the substring, and I can't figure out how to get around it. Any help?

In your code you have to remember that .size() (and .length()) is always one more than the last valid index, because indexing starts at 0: a string of size 1 has its only element at index [0]. So input.at(input.length()) is always one past the end, and at() throws std::out_of_range. The last element is input.at(input.length()-1). Here is an approach that works for me: after your loops, just grab the last piece of the string.
if(begin != input.length()){
string subString = input.substr(begin,(input.length()-begin));
result.push_back(subString);
}
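For reference, here is a minimal sketch of how the question's loop and the tail check above could fit together in one function. The name splitOnDelims is made up for illustration, and passing the delimiters as a std::string instead of a vector<char> is my own simplification, not something from the question.

```cpp
#include <string>
#include <vector>

// Hypothetical helper (name and signature are assumptions):
// splits `input` at any character that appears in `delims`.
std::vector<std::string> splitOnDelims(const std::string& input,
                                       const std::string& delims) {
    std::vector<std::string> result;
    std::string::size_type begin = 0;
    for (std::string::size_type i = 0; i < input.length(); ++i) {
        // Is the current character one of the delimiters?
        if (delims.find(input[i]) != std::string::npos) {
            result.push_back(input.substr(begin, i - begin));
            begin = i + 1;  // next piece starts after the delimiter
        }
    }
    // The loop never meets a delimiter after the final piece,
    // so pick it up here (skipped if the input ends with a delimiter).
    if (begin != input.length()) {
        result.push_back(input.substr(begin));
    }
    return result;
}
```

Calling splitOnDelims("comma-delim-delim&delim-delim", "-&") should give five pieces, with "comma" first and the trailing "delim" last.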

Working from the code in the question I've substituted iterators so that we can check for the end() of the input:
int main() {
string input = "comma-delim-delim&delim-delim";
vector<string> result;
vector<char> delims;
delims.push_back('-');
delims.push_back('&');
auto begin = input.begin(); // use iterator
for(auto ii = input.begin(); ii <= input.end(); ii++){
for(auto j : delims) {
if(ii == input.end() || *ii == j){
string subString(begin,ii); // a string can be constructed from two iterators; ii may be end() here
result.push_back(subString);
if(ii != input.end())
begin = ii + 1;
else
goto done;
}
}
}
done:
return 0;
}

This program uses std::find_first_of to parse the multiple delimiters:
int main() {
string input = "comma-delim-delim&delim-delim";
vector<string> result;
vector<char> delims;
delims.push_back('-');
delims.push_back('&');
auto begin = input.begin(); // use iterator
for(;;) {
auto next = find_first_of(begin, input.end(), delims.begin(), delims.end());
string subString(begin, next); // can construct string from iterators
result.push_back(subString);
if(next == input.end())
break;
begin = next + 1;
}
}

Related

read string into array

I want to read a string of integers separated by whitespace into an array. For example, I have a string that looks like 1 2 3 4 5, and I want to convert it into an integer array arr[5]={1, 2, 3, 4, 5}. How should I do that?
I tried deleting the whitespace, but that just assigns the whole 12345 to every array element. If I don't delete it, every element is assigned 1.
for (int i = 0; i < str.length(); i++){
if (str[i] == ' ')
str.erase(i, 1);
}
for (int j = 0; j < size; j++){ // size is given
arr[j] = atoi(str.c_str());
}

A couple of notes:
Use a std::vector. You will most likely never know the size of an input at compile time. If you do, use a std::array.
If you have C++11 available to you, maybe think about stoi or stol, as they will throw upon a failed conversion.
You could accomplish your task with a std::stringstream, which allows you to treat a std::string as a std::istream like std::cin. I recommend this way.
Alternatively, you could go the hard route and attempt to tokenize your std::string based on ' ' as a delimiter, which is what it appears you are trying to do.
Finally, why reinvent the wheel if you go the tokenization route? Use Boost's split function.
Stringstream approach
std::vector<int> ReadInputFromStream(const std::string& _input, int _num_vals)
{
std::vector<int> toReturn;
toReturn.reserve(_num_vals);
std::istringstream fin(_input);
for(int i=0, nextInt=0; i < _num_vals && fin >> nextInt; ++i)
{
toReturn.emplace_back(nextInt);
}
// assert (toReturn.size() == _num_vals, "Error, stream did not contain enough input")
return toReturn;
}
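If the number of values is not known up front (the first note above), the same stream idea can simply read until extraction fails. This is only a sketch along those lines; ReadAllInts is a name I made up, not part of the answer's code.

```cpp
#include <sstream>
#include <string>
#include <vector>

// Hypothetical variant (name is an assumption): reads every integer
// it can extract, without needing the count in advance.
std::vector<int> ReadAllInts(const std::string& input) {
    std::vector<int> values;
    std::istringstream in(input);
    int next = 0;
    // operator>> skips whitespace; the loop ends at end of input
    // or at the first token that is not an integer.
    while (in >> next) {
        values.push_back(next);
    }
    return values;
}
```

The trade-off is that a malformed token silently ends the read, so check the size of the result if a specific count is expected.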
Tokenization approach
std::vector<int> ReadInputFromTokenizedString(const std::string& _input, int _num_vals)
{
std::vector<int> toReturn;
toReturn.reserve(_num_vals);
char tok = ' '; // whitespace delimiter
size_t beg = 0;
size_t end = 0;
for(beg = _input.find_first_not_of(tok, end); toReturn.size() < static_cast<size_t>(_num_vals) &&
beg != std::string::npos; beg = _input.find_first_not_of(tok, end))
{
end = beg+1;
while(end < _input.size() && _input[end] != tok) // advance to the end of the current token
++end;
toReturn.push_back(std::stoi(_input.substr(beg, end-beg)));
}
// assert (toReturn.size() == _num_vals, "Error, string did not contain enough input")
return toReturn;
}

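The problem with arr[j] = atoi(str.c_str()); is that str.c_str() always points at the beginning of the string, so atoi parses the same leading number on every iteration and every element receives the same value. atoi takes a const char*, so to parse a later number you have to pass a pointer to where that number starts, for example atoi(&str[pos]), where pos is the offset of the number's first character (which is generally not the same as the array index j). By the way, if C++11 is available, std::stoi (or std::stoul for unsigned values) converts a std::string directly and throws on failure, but it also starts from the beginning of the string you pass it. I hope this helps.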

You modify/parse the string in one loop but copy into the integer array in another loop, without recording where each embedded integer starts and ends. Both actions have to happen in a single loop.
This code is not perfect, but it should give you some idea. It follows the same process you used, but with a vector.
string str = "12 13 14";
vector<int> integers;
size_t start = 0, i = 0;
for (; i < str.length(); i++){
if (str[i] == ' ')
{
// substr takes (position, length), so the length is i - start
integers.push_back(atoi(str.substr(start, i - start).c_str()));
start = i + 1; // the next number begins after the space
}
}
// the last number has no trailing space, so convert it after the loop
integers.push_back(atoi(str.substr(start, i - start).c_str()));

c++ finding a string using part of the string

Let's say that we have:
string list[]= {"12.34.56.78","55.34.5","23.44.5"};
I want the user to enter part of one of the strings (also as a string).
For example, given the string 55, the program should loop through the list, find the whole string that contains it, and print "55.34.5".
What I was doing is the following, where str is the input string and list is the whole list of strings:
for (int i=0; i<n; i++){
for (int j=0; j<(list[i].length()); j++){
for (int k=0; k<(str.length()); k++){
if (list[i][j] == str[k])
cout<<list[i]<<endl;
else
break;
however, there is a problem with this, and it doesn't work properly.
Update:
so I have updated my code to:
for (int i=0; i<n; i++)
if (strncmp(list[i].c_str(), str.c_str(), str.length()) == 0)){
cout<<list[i]<<endl;
}
however, this doesn't output any of the strings.

For any function fanatics:
std::string findInList(const std::vector<std::string> &searchFrom, const std::string &lookFor) {
for (const std::string &s : searchFrom) {
if (s.find(lookFor) != std::string::npos)
return s;
}
return "";
}
I used a vector instead of an array because a vector knows its own size, so no extra work is needed to pass the element count around. If C++11 isn't being used, a normal for loop works perfectly fine.
This also assumes you want the first match to be returned. A probably better option is to return a vector of strings, empty if none are found, which makes it explicit that none were found, or as many as are found otherwise. Instead of returning the found string, just add it to the vector and continue on, returning the vector when you're done.
If you want to model the standard algorithms, you can also have it take a beginning iterator and an ending iterator instead of the actual container. This will allow you to call it on any type of container, including arrays, with any range in that container to look through.
Taking both points into consideration, you can evolve it into this:
template <typename Iterator>
std::vector<std::string> findInList(Iterator start, const Iterator end, const std::string &lookFor) {
std::vector<std::string> ret;
for (; start != end; ++start)
if (start->find(lookFor) != std::string::npos)
ret.emplace_back(*start);
return ret;
}
Again, if not using C++11, emplace_back can be swapped out for push_back.

That just compares the first character in list[i] with the first char in your string. If the corresponding first chars match, it prints the entire ith string and then advances k, the offset into your str, without changing the offset into the string against which you're comparing. I think you can dispense with the inner two loops and use a fixed-length string comparison. Note that this matches only when str appears at the start of list[i]:
for (int i=0; i < n; i++) {
if (strncmp(list[i].c_str(), str.c_str(), str.length()) == 0) {
// match
}
}

Here's an answer that combines both of the previous answers. It uses the find member function of the std::string class
for (int i=0; i < n; i++) {
if (list[i].find(str) != std::string::npos) {
std::cout << list[i] << std::endl;
}
}
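To make the difference between the two answers concrete, here is a small sketch that puts the strncmp test and the find test side by side. The helper names startsWith and containsPart are mine, chosen only for illustration.

```cpp
#include <cstring>
#include <string>

// Hypothetical helper: true only if `part` is a prefix of `s`
// (this is what the strncmp comparison checks).
bool startsWith(const std::string& s, const std::string& part) {
    return std::strncmp(s.c_str(), part.c_str(), part.length()) == 0;
}

// Hypothetical helper: true if `part` occurs anywhere in `s`
// (this is what std::string::find checks).
bool containsPart(const std::string& s, const std::string& part) {
    return s.find(part) != std::string::npos;
}
```

With the list from the question, "55" passes both tests for "55.34.5", while "34" is found by containsPart in "12.34.56.78" but rejected by startsWith.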

Getting Word Frequency From Vector In c++

I have googled this question and couldn't find an answer that worked with my code, so I wrote this to get the frequency of the words. The only issue is that I am getting the wrong number of occurrences for every word apart from one, which I think is a fluke. I am also checking whether a word has already been entered into the vector so that I don't count the same word twice.
fileSize = textFile.size();
vector<wordFrequency> words (fileSize);
int index = 0;
for(int i = 0; i <= fileSize - 1; i++)
{
for(int j = 0; j < fileSize - 1; j++)
{
if(string::npos != textFile[i].find(textFile[j]) && words[i].Word != textFile[j])
{
words[j].Word = textFile[i];
words[j].Times = index++;
}
}
index = 0;
}
Any help would be appreciated.

Consider using a std::map<std::string,int> instead. The map class will handle ensuring that you don't have any duplicates.

Using an associative container:
typedef std::unordered_map<std::string, unsigned> WordFrequencies;
WordFrequencies count(std::vector<std::string> const& words) {
WordFrequencies wf;
for (std::string const& word: words) {
wf[word] += 1;
}
return wf;
}
It is hard to get simpler...
Note: you can replace unordered_map with map if you want the words sorted alphabetically, and you can write custom comparison operations to treat them case-insensitively.
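As a rough sketch of that last note, a std::map with a custom comparator could look like the following. CaseInsensitiveLess, SortedWordFrequencies and countIgnoringCase are names I invented for this example, and the comparison assumes plain single-byte characters.

```cpp
#include <algorithm>
#include <cctype>
#include <map>
#include <string>
#include <vector>

// Hypothetical comparator: orders strings while ignoring letter case.
struct CaseInsensitiveLess {
    bool operator()(const std::string& a, const std::string& b) const {
        return std::lexicographical_compare(
            a.begin(), a.end(), b.begin(), b.end(),
            [](unsigned char x, unsigned char y) {
                return std::tolower(x) < std::tolower(y);
            });
    }
};

typedef std::map<std::string, unsigned, CaseInsensitiveLess> SortedWordFrequencies;

// Counts words; "The" and "the" share one entry, and iteration
// over the result is in case-insensitive alphabetical order.
SortedWordFrequencies countIgnoringCase(const std::vector<std::string>& words) {
    SortedWordFrequencies wf;
    for (const std::string& word : words) {
        wf[word] += 1;
    }
    return wf;
}
```

The key that ends up stored is whichever spelling was seen first, so normalise the words beforehand if the output casing matters.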

Try this code instead if you do not want to use a map container:
struct wordFreq{
string word;
int count;
wordFreq(string str, int c):word(str),count(c){}
};
vector<wordFreq> words;
int ffind(vector<wordFreq>::iterator i, vector<wordFreq>::iterator j, string s)
{
for(;i<j;i++){
if((*i).word == s)
return 1;
}
return 0;
}
The code for finding the number of occurrences in the textfile vector is then:
for(int i=0; i< textfile.size();i++){
if(ffind(words.begin(),words.end(),textfile[i])) // Check whether word already checked for, if so move to the next one, i.e. avoid repetitions
continue;
words.push_back(wordFreq(textfile[i],1)); // Add the word to vector as it was not checked before and set its count to 1
for(int j = i+1;j<textfile.size();j++){ // find possible duplicates of textfile[i]
if(textfile[j] == (*(words.end()-1)).word)
(*(words.end()-1)).count++;
}
}

Splitting string into a vector<string> of words

In Accelerated C++ (the book), I found two programs that do the same job but process the input differently, and part of the second one confuses me.
The code below obviously outputs each word of the user's input one by one (in a loop) until the user enters end-of-file, and then the program ends.
int main()
{
string s;
while (cin >> s)
cout << s << endl;
return 0;
}
Unlike the code above, this one stores each word in a vector and uses the indices i and j to detect non-whitespace characters. My real question is that I don't understand how this works with the vector.
What is whitespace in a vector? An element?
At first I thought the program proceeds through each character, because whitespace is a character (which is what i and j are for). But then the book says it proceeds through each word. I don't know how to test this myself, for example by watching what the program does internally.
vector<string> split(const string& s)
{
vector<string> ret;
typedef string::size_type string_size;
string_size i = 0;
// invariant: we have processed characters [original value of i, i)
while (i != s.size())
{
// ignore leading blanks
// invariant: characters in range [original i, current i) are all spaces
while (i != s.size() && isspace(s[i]))
++i;
// find end of next word
string_size j = i;
// invariant: none of the characters in range [original j, current j)is a space
while (j != s.size() && !isspace(s[j]))
j++;
// if we found some nonwhitespace characters
if (i != j) {
// copy from s starting at i and taking j - i chars
ret.push_back(s.substr(i, j - i));
i = j;
}
}
return ret;
}
int main() {
string s;
// read and split each line of input
while (getline(cin, s)) {
vector<string> v = split(s);
// write each word in v
for (vector<string>::size_type i = 0; i != v.size(); ++i)
cout << v[i] << endl;
}
return 0;
}

The code you posted above does not split a line of text into words, based on whitespace, it instead splits a line into characters. However, that's if the code was actually compilable and not missing any necessary braces ({, }). EDIT: Actually whether it splits words or individual characters depends on where the braces go, bottom line is that the code doesn't compile.
Here is a fixed version of the code that splits out each word, rather than each character, by simply moving the last if statement in split outside of its immediate while block:
#include <cctype> // isspace
#include <iostream>
#include <string>
#include <vector>
using namespace std;
vector<string> split(const string& s)
{
vector<string> ret;
typedef string::size_type string_size;
string_size i = 0;
// invariant: we have processed characters [original value of i, i)
while (i != s.size()) {
// ignore leading blanks
// invariant: characters in range [original i, current i) are all spaces
while (i != s.size() && isspace(s[i]))
++i;
// find end of next word
string_size j = i;
// invariant: none of the characters in range [original j, current j)is a space
while (j != s.size() && !isspace(s[j]))
j++;
// if we found some nonwhitespace characters
if (i != j) {
// copy from s starting at i and taking j - i chars
ret.push_back(s.substr(i, j - i));
i = j;
}
}
return ret;
}
int main() {
string s;
// read and split each line of input
while (getline(cin, s)) {
vector<string> v = split(s);
// write each word in v
for (vector<string>::size_type i = 0; i != v.size(); ++i)
cout << v[i] << endl;
}
return 0;
}
What happens to the string passed to split is:
While there are still characters in the string (while (i != s.size())):
  While we're reading a space from the string (while (i != s.size() && isspace(s[i]))):
    Increment the counter until we get to the start of a word (++i).
  Set the end of the word to the start of the word (string_size j = i).
  While we're still inside this word and not up to a space (while (j != s.size() && !isspace(s[j]))):
    Increment the counter indicating the end of the word (j++).
  If there are some non-whitespace characters, i.e. the end is greater than the start (if (i != j)):
    Create a substring from the start point to the end point of the word (s.substr(i, j - i)), and add that word to the vector (ret.push_back(..)).
  Rinse and repeat.

If you are just splitting on whitespace, you don't need to write a custom method. The standard library has options for you.
std::string line;
std::vector<std::string> strings;
while ( std::getline(std::cin, line))
{
std::istringstream s ( line);
strings.insert(strings.end(),
std::istream_iterator<std::string>(s),
std::istream_iterator<std::string>());
}
// For simplicity's sake, using a lambda.
std::for_each(strings.begin(), strings.end(), [](const std::string& str)
{
std::cout << str << "\n";
});

A vector of vector problems

I'm trying to loop through a list of strings and find where a given character is located in each string. I then store the string in a given vector based on where/if the character occurs. I'm getting a runtime error in the following code before the loop finishes executing. I've looked it over half a dozen times already and can't seem to find anything wrong.
vector< vector<string> > p;
for(list< string >::iterator ix = dictionary.begin(); ix != dictionary.end(); ix++)
{
int index = contains(*ix, guess);
index++;
p.at(index).push_back(*ix); //0 will contain all the words that do not contain the letter
//1 will be the words that start with the char
//2 will be the words that contain the char as the second letter
//etc...
}
int contains(string str, char c)
{
char *a = (char *)str.c_str();
for(int i = 0; i < (str.size() + 1); i++)
{
if(a[i] == c)
return i;
}
return -1;
}

Change
(str.size() + 1)
...to
str.size()
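Index str.size() of the c_str() buffer holds only the terminating null character, so there is nothing useful to compare there, and a search for '\0' would wrongly report a match at that position.
For that matter, why are you fiddling with the extra char* instead of using std::string's operator[]?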
For THAT matter, why don't you simply use std::string::find()?
That is, of course, assuming you're using std::string and not some other string... :)
In fact, back to the call site... string::find() returns the index of where the target character matched, or string::npos if NOT matched. So, can you dispense with the extra function altogether?
string::size_type pos = (*ix).find( guess ); // size_type, so the comparison with npos is exact
p.at( ( pos == string::npos ) ? 0 : ( pos + 1 ) ).push_back( *ix );

vector< vector<string> > p defines p as an empty vector. You must add elements to it before using vector::at().
For example:
const size_t MAX_LETTERS_IN_WORD = 30;
vector< vector<string> > p(MAX_LETTERS_IN_WORD);
/* same as before */
As an alternative, you can check p.size() before using at() and push_back() additional elements into p as needed.
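A minimal sketch of that grow-on-demand alternative might look like this. The helper names addToBucket and bucketByPosition are invented for the example, I used resize() rather than repeated push_back() to add the missing elements in one step, and the dictionary is assumed to be a vector<string> here instead of the question's list<string>.

```cpp
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical helper: makes sure buckets[index] exists, then stores the word.
void addToBucket(std::vector<std::vector<std::string>>& buckets,
                 std::size_t index, const std::string& word) {
    if (index >= buckets.size()) {
        buckets.resize(index + 1);  // newly added buckets start out empty
    }
    buckets[index].push_back(word);
}

// Bucket 0 holds words without `guess`; bucket k holds words whose
// first occurrence of `guess` is at position k - 1.
std::vector<std::vector<std::string>> bucketByPosition(
        const std::vector<std::string>& dictionary, char guess) {
    std::vector<std::vector<std::string>> buckets;
    for (const std::string& word : dictionary) {
        std::string::size_type pos = word.find(guess);
        addToBucket(buckets, pos == std::string::npos ? 0 : pos + 1, word);
    }
    return buckets;
}
```

This avoids guessing a maximum word length, at the cost of a size check on every insertion.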

The runtime error might be because you access the vector p at a position that doesn't exist yet. You have to make space in the vector before you access a specific index.
