istringstream skip next (n) word(s) - c++

Is there a proper way to make an istringstream skip/ignore the next word, or even the next n words?
Reading into a throwaway variable n times seems to work, but it is very clunky:
for (int i = 0; i < n; ++i)
{
std::string tmp;
some_stream >> tmp;
}
std::istream::ignore doesn't seem to do the job, as it only skips n characters.

It's clunky because skipping words is not a common enough need to have gotten the attention required to put a standard algorithm in place.
{
std::string tmp;
for(size_t i = 0; i < number_of_words_to_skip; ++i) some_stream >> tmp;
}
You can make it fancier by creating a null receiver:
std::copy_n(std::istream_iterator<std::string>(some_stream),
number_of_words_to_skip,
std::back_inserter(instance_of_something_that_does_not_do_anything));
Since the standard library has no such receiver, and adding one for this single case is likely to cause more confusion than your original solution, I'd say that the idiomatic approach today is just to write a loop like the one above.
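If the loop shows up in more than one place, it can be wrapped in a small helper. The following is only a sketch: skip_words is a hypothetical name, not something the standard library provides, and it assumes "word" means whatever operator>> extracts into a std::string (a whitespace-delimited token).

```cpp
#include <cstddef>
#include <istream>
#include <sstream>
#include <string>

// Hypothetical helper (not part of the standard library):
// discards up to n whitespace-separated words from the stream and
// returns how many words were actually skipped, which is less than n
// if the stream runs out of input first.
std::size_t skip_words(std::istream& in, std::size_t n) {
    std::string discarded;
    std::size_t skipped = 0;
    while (skipped < n && (in >> discarded)) {
        ++skipped;
    }
    return skipped;
}
```

For example, given std::istringstream s("one two three four"), calling skip_words(s, 2) leaves the stream positioned so that the next s >> word reads three. Returning the count lets the caller notice when fewer than n words were available.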

Related

Quickly processing a string

I have an input string which I will use to build an output string. The output string is much the same as the input string but has some slight changes along the way, depending on if we hit a certain characters in the input. The code looks like this
void outputTree(std::ostream& o, const char* input) {
size_t input_len = strlen(input);
int indent = 0;
string output;
for(size_t i = 0; i < input_len; i++) {
if(input[i] == '(') {
indent++;
output.append(1,'\n');
for(int j = 0; j < indent; j++) {
output.append(" ");
}
}
if(input[i] == ')') {
output.append(1,'\n');
for(int j = 0; j < indent; j++) {
output.append(" ");
}
indent--;
}
output.append(1,input[i]);
}
o << output << endl;
}
While this works, doing it character by character is pretty slow. Can anyone recommend some better approaches to this (pref with standard features, ie no boost)?
I am not sure what "pretty slow" means or how much improvement you expect. The algorithm you have is already an efficient O(N) algorithm. The only thing you can do is try to optimise individual operations, but then again I am not sure which optimisations the compiler is already doing.
One thing you could do is get rid of the inner loops that append the indentation string once per level of indentation (your indent variable). Instead of writing a newline and then executing the for loop, you can perform a single append of the newline concatenated with the right amount of indentation. If you can set a limit on the depth of your indentation (say 10 levels deep), you can create an array that holds all 10 different indentation strings (each starting with a newline).
Another thing you can try is to use std::string::find to find the occurrences of '(' and ')' and do string copies up to these markers, instead of copying character by character.
Finally, note that your code might not behave as you want if the input is not as expected. Consider what happens if you first encounter a ')' without a corresponding '(': indent becomes negative.
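The suggestions above could be combined roughly as follows. This is a sketch, not a measured optimisation: formatTree is a hypothetical name, it returns the string instead of writing to a stream, and it uses std::string::find_first_of (rather than two separate find calls) to copy whole runs of ordinary characters at once. It also clamps indent at zero so an unmatched ')' cannot drive it negative, which is a choice of this sketch rather than behaviour of the original code.

```cpp
#include <cstddef>
#include <string>

// Hypothetical rewrite of the question's loop: copies runs of ordinary
// characters in one append and writes the indentation with a single
// append(count, ' ') instead of an inner loop.
std::string formatTree(const std::string& input) {
    std::string output;
    output.reserve(input.size() * 2);  // rough guess to limit reallocations
    std::size_t indent = 0;
    std::size_t pos = 0;
    while (pos < input.size()) {
        const std::size_t next = input.find_first_of("()", pos);
        if (next == std::string::npos) {
            // No more parentheses: copy the rest in one go.
            output.append(input, pos, std::string::npos);
            break;
        }
        // Copy everything up to (not including) the parenthesis.
        output.append(input, pos, next - pos);
        if (input[next] == '(') {
            ++indent;
            output.push_back('\n');
            output.append(indent, ' ');
        } else {
            output.push_back('\n');
            output.append(indent, ' ');
            if (indent > 0) --indent;  // guard against an unmatched ')'
        }
        output.push_back(input[next]);
        pos = next + 1;
    }
    return output;
}
```

Whether this is actually faster than the character-by-character version depends on the input and the compiler, so it is worth timing both on real data before switching.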

Find the most frequently word using hashmaps c++

I need to find the most frequently occurring word and return that value. I must use hash maps, and the function takes a file name. This is what I've done so far, but I'm very confused.
int most_frequent_word(string filename)
{
string words;
ifstream in(filename.c_str());
unordered_map<string, int> word_map;
while(in >> words)
{
for(int i = 0; i < 100; i++)
{
word_map[words[i]]++;
}
}
return words;
}
Any help would be appreciated. Thanks!
There are several issues in your code that keep it from working as expected.
First is the for loop over i. Why do you need that loop at all? You need to count words, so reduce it to this:
while(in >> words)
{
word_map[words]++;
}
Second, rename words to word, since in >> words actually reads one word at a time.
Third is the return statement. You cannot return a string when the function is declared to return int.
However, there is nothing to return yet, because so far we only know how many times each word occurred. Run a loop to find the maximum value.
int result = 0;
for(unordered_map<string, int>::iterator it = word_map.begin(); it != word_map.end(); it++)
result = max(result, it->second);
return result;
Here word_map consists of pairs of a word and its number of occurrences. We need to iterate over all of these pairs looking for max occurrences. To do this we use iterator it.
I'm also confused!
for(int i = 0; i < 100; i++)
{
word_map[words[i]]++;
}
What are you doing here? Where does the 100 come from? Why do you care for single letters of your words (which is what words[i] gets you)?
If I understand your task correctly, wouldn't it suffice to
++word_map[words];
instead?
Also, why do you return words? It's a string, and your function should return an int. Instead, find the largest value in your map and you're done.
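If the word itself is wanted rather than just its count, the same loop can remember both. The sketch below makes a few assumptions that are not in the question: it reads from any std::istream instead of a file name (so it can be tried with a std::istringstream), it returns a word/count pair, and when several words tie it returns whichever one the unordered_map happens to visit first.

```cpp
#include <istream>
#include <sstream>
#include <string>
#include <unordered_map>
#include <utility>

// Sketch: returns the most frequent word together with its count.
// For empty input the result is an empty string with count 0.
std::pair<std::string, int> most_frequent_word(std::istream& in) {
    std::unordered_map<std::string, int> word_map;
    std::string word;
    while (in >> word) {
        ++word_map[word];  // operator[] value-initialises new entries to 0
    }

    std::pair<std::string, int> best("", 0);
    for (const auto& entry : word_map) {
        if (entry.second > best.second) {
            best.first = entry.first;
            best.second = entry.second;
        }
    }
    return best;
}
```

To use it with a file, open a std::ifstream on the file name and pass the stream in.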

Taking user input in C++ without c++ standard library classes

I'm trying to take user input from the console without using the c++ standard library classes. Here's my code
while(true){
std::string line = " ";
while (getline(std::cin, line)) {
std::string arr[100];
int i = 0, len = 0;
for (int j=0; j < line.length(); j++) {
if(line[j] ==' ' || line[j] == '\n'){
std::string word = line.substr(i, j);
arr[len] = word;
len++;
i = j;
}
}
for (int k = 0; k <len ; ++k) {
std::cout<<arr[k]<<std::endl;
}
}
//break;
}
The idea is to identify each word and store it in an array. However, this program only identifies the first word. Any idea what I am doing wrong here?
In reading this, it would seem your problem lies on the line std::string word = line.substr(i, j);. You have to understand that substr's arguments are not "from i to j", but "from i, j characters". Read the documentation. :)
I haven't tested this, so it might not be perfect, but the principle is there.
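while (true) {
    std::string line = " ";
    while (getline(std::cin, line)) {
        std::string arr[100];
        // start: index where the current word begins.
        // num_chars: number of characters in the current word so far.
        // num_words: number of words stored in arr.
        std::size_t start = 0, num_chars = 0;
        int num_words = 0;
        for (std::size_t i = 0; i <= line.length(); i++) {
            // getline strips the '\n', so treat the end of the line as a
            // separator too; otherwise the last word would be lost.
            if (i == line.length() || line[i] == ' ') {
                // substr takes (position, count), so we want num_chars
                // characters starting at start.
                if (num_chars > 0 && num_words < 100) {
                    arr[num_words] = line.substr(start, num_chars);
                    num_words++;
                }
                // Reset for the next word, which begins after the separator.
                start = i + 1;
                num_chars = 0;
            } else {
                // Every non-space character belongs to the current word.
                num_chars++;
            }
        }
        for (int k = 0; k < num_words; ++k) {
            std::cout << arr[k] << std::endl;
        }
    }
    //break;
}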
Two other considerations:
1) Beware the single-letter variables, as it makes it far harder to read your code later. i is standard for a loop iterator or index, with j being the next when you're nesting for loops. However, i is not suitable for the "length of the word". Similarly, len is not suitable for the index of the word being stored. I changed the variables in the code to make it easier to read.
2) I would seriously consider revisiting your loop structure. while is common and very useful, but it is also highly prone to infinite loops. In fact, while(true) IS an infinite loop, so if you don't reach break for whatever reason, you'll wind up with some serious issues.
--
I would also agree that, if you want to avoid "STL" (and, actually, std:: and STL are commonly confused but are not the same thing, so let's say you want to avoid std), you'll want to avoid std::string and std::cin. Use C-strings and scanf/printf instead, as Nidhoegger suggested. That can be more efficient than the std options, but it is also more prone to the errors and "undefined behaviors" characteristic of C. It would take more effort, but would yield a more efficient result if you do it right.
While we're at it, I do NOT recommend std::stringstream unless your other tools cannot do the job well. That class has serious performance and efficiency problems, which are well-documented. I only recommend using it in cases where writing your own code with std::string and such would be either too laborious or have a high probability of being inefficient. This is NOT one of those cases.
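To make the substr point concrete without std::stringstream, here is one way the splitting could look using only std::string::find and substr(position, count). It is a sketch under a couple of assumptions: split_words is a hypothetical name, words are separated by single space characters only, and a std::vector replaces the fixed 100-element array.

```cpp
#include <cstddef>
#include <string>
#include <vector>

// Sketch: splits a line on spaces using find and substr(position, count).
// Runs of consecutive spaces produce no empty words.
std::vector<std::string> split_words(const std::string& line) {
    std::vector<std::string> words;
    std::size_t start = 0;
    while (start < line.size()) {
        std::size_t end = line.find(' ', start);
        if (end == std::string::npos) {
            end = line.size();  // last word runs to the end of the line
        }
        if (end > start) {
            // Second argument is a length, hence end - start.
            words.push_back(line.substr(start, end - start));
        }
        start = end + 1;
    }
    return words;
}
```

The key detail is the end - start in the substr call: passing end directly would ask for end characters starting at start, which is the mistake in the question's code.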

independent things influence each other (I have no idea what is going on)

Sorry for the title, but I really have no idea what the problem is. The code looks like this (it makes no sense here, but it does in the bigger project, so please do not ask "why do you want to do....")
#include <iostream>
#include <vector>
#include <fstream>
using namespace std;
string sort (string slowo){
string litery = slowo;
for (int i=0; i<litery.length()-1; i++)
for (int j=0; j<litery.length()-1; j++)
if (litery[j]>litery[j+1])
swap(litery[j], litery[j+1]); // (3)
return litery;
}
int main()
{
fstream wordlist;
wordlist.open("wordlist_test",ios::in);
vector<string> words;
while (!wordlist.eof()){ // (4)
bool ok = true;
string word;
getline(wordlist,word);
string sorted = sort(word);
if (ok){
cout<<word<<endl; // (1)
words.push_back(word);
}
}
for (int i = 0; i<words.size(); i++){
cout<<words[i]<<endl; // (2)
}
}
There are four words in the file "wordlist_test". The program should just store them in the vector and then write the vector's contents to standard output. The problem is:
although line (1) proves that all the words are read correctly,
the vector appears to be empty at line (2).
Now the interesting (probably just for me) part: there are two ways to make it work:
I can just remove line (3) (however, if I am right, since the variable is passed to the sort function by value, it just swaps two letters in an independent variable and has nothing to do with my vector), or
I can change the condition of the while loop (4), for example like this:
int tmp = 0;
while (tmp < 5){
tmp++;
/..../
What is wrong with this code? What should I do to store these words in the vector while still sorting them and keeping this while loop? I cannot find the connection between these things (OK, I see that the connection is the variable word, but I do not know in what way). Any help appreciated.
What happens in sort() if one of the words is the empty string ""?
If this happens, litery = "".
The loops will then iterate from 0 to (unsigned) 0 - 1, which wraps around to a very large number.
You'll then execute if (litery[0] > litery[1])
litery[1] accesses beyond the end of the empty string, which causes undefined behavior.
Let's fix this:
The common fix is to iterate from 1 to string.length(). Here's an example:
string sort (string litery){
for (int i=1; i<litery.length(); i++)
for (int j=1; j<litery.length(); j++)
if (litery[j-1]>litery[j])
swap(litery[j-1], litery[j]);
return litery;
}
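Two related changes may also be worth considering; both are suggestions layered on top of the answer, with hypothetical function names. First, std::sort handles the empty string without any index arithmetic. Second, one likely source of the empty word is the while (!wordlist.eof()) loop: eof() only becomes true after a read has already failed, so when the file ends with a newline the final getline leaves word empty. Using getline itself as the loop condition avoids that extra pass.

```cpp
#include <algorithm>
#include <istream>
#include <sstream>
#include <string>
#include <vector>

// Sketch: sorts the letters of a word; safe for the empty string because
// no length() - 1 arithmetic is involved.
std::string sorted_letters(std::string letters) {
    std::sort(letters.begin(), letters.end());
    return letters;
}

// Sketch: reads one word per line. The loop body runs only when
// getline actually succeeded, so no trailing empty word is produced.
std::vector<std::string> read_words(std::istream& in) {
    std::vector<std::string> words;
    std::string word;
    while (std::getline(in, word)) {
        words.push_back(word);
    }
    return words;
}
```

read_words takes any std::istream, so it works with the fstream from the question as well as with a std::istringstream for quick experiments.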

C++ Dynamic Array Inputs

I am using two dynamic arrays to read from a file. They keep track of each word and the number of times it appears. If a word has already appeared, I must update its count in one array and not add it to the other array again, since it already exists. However, I am getting blank spaces in my array whenever I meet a duplicate. I think it's because my pointer continues to advance when it shouldn't, and I do not know how to combat this. The only workaround I have is to skip empty entries with a continue when I print the results: if (*(words + i) == "") continue;. This basically ignores the blanks in the array, but I think that is messy. I just want to figure out how to move the pointer back in this method. words and frequency are my dynamic arrays.
I would like guidance in what my problem is, rather than solutions.
I have now changed my outer loop to be a while loop, and only increment when I have found the word. Thank you WhozCraig and poljpocket.
Now this occurs.
Instead of incrementing your loop variable [i] every loop, you need to only increment it when a NEW word is found [i.e. not one already in the words array].
Also, you're wasting time in your inner loop by looping through your entire words array, since words will only exist up to index i.
int idx = 0;
bool isFound = false;
while (file >> hold && idx < count) {
if (!valid_word(hold)) {
continue;
}
// You don't need to check past idx because you
// only have <idx> words so far.
for (int i = 0; i < idx; i++) {
if (toLower(words[i]) == toLower(hold)) {
frequency[i]++;
isFound = true;
break;
}
}
if (!isFound) {
words[idx] = hold;
frequency[idx] = 1;
idx++;
}
isFound = false;
}
First, to address your code, this is what it should probably look like. Note how we only increment i as we add words, and we only ever scan the words we've already added for duplicates. Note also how the first pass will skip the j-loop entirely and simply insert the first word with a frequency of 1.
void addWords(const std::string& fname, int count, string *words, int *frequency)
{
std::ifstream file(fname);
std::string hold;
int i = 0;
while (i < count && (file >> hold))
{
int j = 0;
for (; j<i; ++j)
{
if (toLower(words[j]) == toLower(hold))
{
// found a duplicate at j
++frequency[j];
break;
}
}
if (j == i)
{
// didn't find a duplicate
words[i] = hold;
frequency[i] = 1;
++i;
}
}
}
Second, to really address your code, this is what it should actually look like:
#include <iostream>
#include <fstream>
#include <map>
#include <string>
//
// Your implementation of toLower() goes here.
//
typedef std::map<std::string, unsigned int> WordMap;
WordMap addWords(const std::string& fname)
{
WordMap words;
std::ifstream inf(fname);
std::string word;
while (inf >> word)
++words[toLower(word)];
return words;
}
If it isn't obvious by now how a std::map<> makes this task easier, it never will be.
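The placeholder for toLower() above could be filled in along these lines. This is one possible implementation, not necessarily the asker's: it assumes plain single-byte text and relies on std::tolower from <cctype>, which only handles characters representable as unsigned char (hence the cast in the lambda).

```cpp
#include <algorithm>
#include <cctype>
#include <string>

// Sketch of toLower(): returns a lowercase copy of the input.
// The argument is taken by value so the caller's string is untouched.
std::string toLower(std::string text) {
    std::transform(text.begin(), text.end(), text.begin(),
                   [](unsigned char c) {
                       // std::tolower expects a value in the unsigned char
                       // range; passing a negative char is undefined.
                       return static_cast<char>(std::tolower(c));
                   });
    return text;
}
```

With this in place, "The" and "the" map to the same key in the WordMap.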
Check out fseek with SEEK_CUR if you want to set the cursor back.
The problem is a logical one. Consider two situations:
1) Your algorithm does not find the current word. It is inserted at position i of your arrays.
2) Your algorithm does find the word. The frequency of the word is incremented along with i, which leaves you with blank entries in your arrays whenever there's a word which is already present.
To conclude, 1 works as expected but 2 doesn't.
My advice is that you don't rely on for loops to traverse the string but use a "get-next-until-end" approach which uses a while loop. With this, you can track your next insertion point and thus get rid of the blank entries.
int currentCount = 0;
while (file)
{
// your inner for loop
if (!found)
{
*(words + currentCount) = hold;
*(frequency + currentCount) = 1;
currentCount++;
}
}
Why not use a std::map?
void collect( std::string name, std::map<std::string,int> & freq ){
std::ifstream file;
file.open(name.c_str(), std::ifstream::in );
std::string word;
while( true ){
file >> word; // add toLower
if( file.eof() ) break;
freq[word]++;
}
file.close();
}
The problem with your solution is the use of count in the inner loop where you look for duplicates. You'll need another variable, say nocc, initially 0, used as limit in the inner loop and incremented whenever you add another word that hasn't been seen yet.