Finding words in a (weird) string in C++


What is technically wrong in this program? The expected result is 6 since that is the total number of words present in the string.
#include <iostream>
using namespace std;
int main()
{
string str = " Let's count the number of words ";
int word = 0;
for (int i = 0; str[i] != '\0';)
{
if ((str[i] == 32 && str[i + 1] == 32) || (str[i] == 32 && str[i - 1] == 32))
{
++i;
}
else if ((str[i] == 32 && str[i - 1] != 32) || (str[i] == 32 && str[i + 1] != 32))
{
word++;
}
++i;
}
cout << "No. of words: " << word << endl;
return 0;
}
My incorrect result:
No. of words: 0
Also, if I try changing the spaces in the string or even the string itself to a totally new set of spaced out words, say:
string str = " Hello world ";
string str = "Hello world! How are you? ";
I still get incorrect results, although different from 0. I'm new to C++ programming and these kinds of strange behaviors are giving me nightmares. Is this common? What can I do to get this corrected?
If you could highlight or correct my program the way I wrote it, it would be very helpful and quicker for me to understand the mistake, instead of having to know some new commands at this point, because, as I said, I'm a total beginner in C/C++.
Thanks for your time!

I'm new to C++ programming and these kinds of strange behaviors are giving me nightmares. Is this common?
Yes, it's very common. You've written a load of logic piled up in a heap and you don't have the tools to understand how it behaves.
What can I do to get this corrected?
You can work on this from both directions:
1. Debug this to improve your understanding of how it operates:
   - identify in advance what you expect it to do for some short input, at each line
   - single-step through it in the debugger to see what it actually does
   - think about why it doesn't do what you expected
   Sometimes the problem is that your code doesn't implement your algorithm correctly, sometimes the algorithm itself is broken, and often it's a bit of both. Working through both will give you some insight.
2. Write code that is easier to understand in the first place (and equivalently, write algorithms that are easy to reason about).
   This depends on you having some intuition about whether something is easy to reason about, which you develop from iterating step 1.
... instead of having to know some new commands at this point.
Well, you need to learn to use a debugger anyway, so now is as good a time to start as any.
We can certainly improve the existing code, although I'd prefer to fix the logic. In general I'd encourage you to abstract your existing if conditions out into little functions, but the problem is that they don't currently seem to make any sense.
So, how do we define a word?
Your code says it is at least one non-space character preceded or followed by a space. (Do definitely prefer ' ' to 32, by the way, and std::isspace is better than either.)
However, your code's implied definition is problematic, because:
- each word longer than one character has both a first and last character, and you'll count each of them
- you can't check whether the first character is preceded by anything without going out of bounds
- the last character is followed by the null terminator, but you don't count that as whitespace
Let's just choose a different definition that doesn't require reading str[i-1], and doesn't require the tricky traversal your current code gets wrong.
I claim that a word is a contiguous substring of non-whitespace characters, and words are separated by contiguous substrings of whitespace characters. So, instead of looking at each pair of consecutive characters, we can write pseudocode to work in those terms:
for (current = str.begin(); current != str.end(); ) {
// skip any leading whitespace
current = find_next_non_whitespace(str, current);
if (current != str.end()) {
// we found a word
++words;
current = find_next_whitespace(str, current);
}
}
NB. When I talked about abstracting your code out into little functions, I meant things like find_next_non_whitespace - they should be trivial to implement, easy to test, and have a name that tells you something.
When I said your existing conditions didn't seem to make sense, it's because replacing
if ((str[i] == 32 && str[i + 1] == 32) || (str[i] == 32 && str[i - 1] == 32))
with, say,
if (two_consecutive_spaces(str, i))
prompts more questions than it answers. Why have a special case for exactly two consecutive spaces? Is it different to just one space? What will actually happen if we have two words with a single space between them? Why do we advance by two characters in this case, but only one on the word branch?
The fact that the code can't easily be mapped back onto explicable logic is a bad sign - even if it worked (which we know it doesn't), we don't understand it well enough to ever change, extend or refactor it.

There are several ways to do it. Take a look at this code, which is very similar to yours:
#include <iostream>
#include <string>
using namespace std;

int main()
{
    string s = " Let's count the number of words ";
    int word = 0;
    for (auto i = 0; s[i] != '\0'; i++) {
        if (i == 0) {
            if (s[i] != ' ') {
                ++word;
            }
            continue;
        }
        if (s[i - 1] == ' ' && s[i] != ' ') {
            ++word;
        }
    }
    cout << "No of Words: " << word << endl;
}
The idea is to iterate over the string character by character, applying this logic:
- If we are at the first character and it is equal to ' ', go to the next loop iteration.
- If we are at the first character and it is not ' ', we are starting a word, so count it and go to the next loop iteration.
- If we reach the second if, we are not at the first position, so accessing i - 1 is valid. Then we just check whether the previous char is a blank space and the current one is not. This means we are starting a new word, so count it and go on to the next loop iteration.
Another, simpler way to do it is to use std::stringstream:
#include <iostream>
#include <sstream>
#include <string>
using namespace std;

int main()
{
    string s = " Let's count the number of words ";
    stringstream ss(s);
    string sub;
    int word = 0;
    while (ss >> sub) {
        ++word;
    }
    cout << "No of Words: " << word << endl;
}
This way you're extracting the string word by word; operator>> skips any leading whitespace before each word, so runs of spaces are handled automatically.

Related

Is 'If Else' statement indentation important or not in C++? [duplicate]

This question already has answers here:
If statements without brackets
(3 answers)
Closed 10 months ago.
Does the indentation in an if else statement have any bearing on the execution of the code or is it just something to do for cleaner code?
Example from the book Accelerated C++ written by Andrew Koenig:
while(c != cols) {
    if(r == pad + 1 && c == pad + 1) {
        cout << greet;
        c += greet.size();
    } else {
        if(r == 0 || r == rows - 1 || c == 0 || c == cols - 1)
            cout << "*";
        else
            cout << " ";
        ++c;
    }
}
The prefix increment of c is executed regardless of whether r=0 or not, but I don’t understand why.
If the if condition is true, an asterisk is printed. If not, a blank space is printed and c is incremented.
That’s how I am reading it, but c gets incremented regardless of what the values of r or c are.
This is what it says in the book, but there isn’t any explanation I could find:
Notice how the different indentation of ++c; draws attention to the fact that it is executed regardless of whether we are in the border.

Whitespace does not affect C++ runtime behavior (unlike in certain other languages, such as Python).
I should mention that in your else block, you do not use braces. So, only the first statement (cout << " ";) will be part of the else clause. The subsequent ++c; will execute regardless of the value of r and c.
Note that the following point is subjective, so take it with a grain of salt. As you can see, when braces are omitted from if ... else ... blocks, there is potential for confusion. Some would argue that omitting them leads to more concise code, but many (including myself) would argue that you should always use braces. This is especially important when you work on a large team, because code tends to grow over time. I've seen many cases in production code where an if statement was missing its braces and someone added a second line to the if clause without remembering to add them. This didn't work as expected and wasted time in debugging, just because the braces were omitted.

Neither C nor C++ is affected by whitespace in its interpretation of the code. That does not mean the programmer should not care about its misuse.
The best way to illustrate what the above code actually represents is to write out all of the implied braces explicitly, as below. Note that the if statement that had no braces has only one statement governed by each of its 'if' and 'else' clauses.
This is one of the reasons that people insist on 'good coding practices': to ensure that other people can clearly interpret the flow and intent of the programmer.
while(c != cols) {
    if(r == pad + 1 && c == pad + 1) {
        cout << greet;
        c += greet.size();
    } else {
        if(r == 0 || r == rows - 1 || c == 0 || c == cols - 1) {
            cout << "*";
        } else {
            cout << " ";
        }
        ++c;
    }
}

In C++, the amount of indentation does not affect the interpretation of the statements. Sometimes whitespace is needed to separate tokens, e.g. in int a. Other times it is not needed, e.g. in a=b+c;.
The if statement is defined so that only one statement can follow the condition if(condition).
If we want more statements, we have to group them with braces {...}.

Unlike Python, C++ does not care about indentation.
But your else applies only to the first statement that follows it. To apply it to a block, the statements must be enclosed in { }:
else
{
    cout << " ";
    ++c;
}
Indentation is not your problem here.

Can someone explain this C++ code? [closed]

Closed. This question needs details or clarity. It is not currently accepting answers.
Closed 5 years ago.
I am new here. I do not understand the if statement with i==0: it eliminates repetition. How does it work? Thanks.
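vector<string> words;
for (string temp; cin >> temp; )
    words.push_back(temp);
cout << "Number of words:" << words.size() << '\n';
sort(words);  // sort(container) comes from Stroustrup's std_lib_facilities.h; with only the standard library, use sort(words.begin(), words.end())
for (int i = 0; i < words.size(); ++i)
    if (i == 0 || words[i - 1] != words[i])
        cout << words[i] << '\n';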

If i equals 0, you can't look at words[i-1], because words[-1] would be out of bounds.
Furthermore, with the || operator, if the first expression is true, the second expression is not evaluated (short-circuit evaluation).
With i == 0 || words[i - 1] != words[i] you can print your first word, because i equals 0, so the expression words[i - 1] != words[i] is not evaluated and doesn't crash your program!
Then, with i different from 0, the first expression is false and the second is evaluated.
As for removing the repetitions:
Your vector is sorted, so identical words are next to each other.
So if the previous word isn't the same as the current one, you can print the current word.
How words[i - 1] != words[i] works:
For std::string, operators == and != compare the lengths of the two strings and each character in them (see the comparison operators for std::string).
Moreover, words[i-1] refers to the previous word and words[i] to the current one, so the two are compared.
So here, the expression is true if the two consecutive words aren't the same, in length or letters.
If your sorted vector contains the words cat cat cat_ dog, the first cat is printed (because of the i == 0 part), the second cat is not printed because the expression is false (the words are identical: "cat" == "cat"), then cat_ is printed because it differs from cat, and finally dog is printed.

This program first sorts all the words (a vector of strings), and then prints only the unique words.
i==0 means the first word. Since the first word has no previous word to be compared with, it is always treated as unique.
words[i-1]!=words[i] checks whether the current word is different from the previous one, and if so, the word is printed.
|| is the logical OR operator.

It looks like your program prints the first word and then every word that differs from the one before it in the sorted list, i.e. each distinct word once. If you're trying to find the unique words, try using std::unique.

A better way of parsing a string for numbers in brackets?

I've tried searching on here / Google to find a more optimal way of processing an input that I need to handle. This is an example...
[1 5 0 50 100 60] [2 4 1 0 40 50]
The numbers are random but I know how many bracketed sets there are beforehand. Also, I know for certain that the format will always be the same...
6 numbers which are enclosed by brackets
I have something working already where I get the input into a line and I then iterate character by character checking...
1) Outer for loop that accounts for the number of bracketed sets
2) First to see if it is a space, '[', ']'
3) If it isn't, get that number and store it
4) then start checking for space again
5) Store the next etc
6) Till I reach ']' and continue the loop
But I feel like there needs to be a better / cleaner way of handling the parsing.
sample code...
char c = line[position];
while (c == '[' || c == ']' || c == ' ') {
    position++;
    c = line[position];
}
string firstStr;
while (c != ' ') {
    firstStr += c;
    position++;
    c = line[position];
}
first = atoi(firstStr.c_str());
while (c == ' ') {
    position++;
    c = line[position];
}
string secondStr;
while (c != ' ') {
    secondStr += c;
    position++;
    c = line[position];
}
second = atoi(secondStr.c_str());

Yes, I'd say that this is too complicated for the simple reason that the C++ library already contains optimized implementations of all algorithms that are needed here.
std::string line;
That's your input. Now, let's parse it.
#include <algorithm>
auto b = line.begin(), e = line.end();
while ((b = std::find(b, e, '[')) != e)
{
    auto n_start = ++b;
    b = std::find(b, e, ']');
    auto your_six_numbers_are_in_here = std::string(n_start, b);
    // Now, do whatever you want with your numbers.
}
Since you "know for certain" that your input is valid, most aspects of input validation are no longer an issue, and the above should be sufficient.
The your_six_numbers_are_in_here string may or may not contain leading or trailing spaces. How to get rid of them, and how to extract the actual numbers is a separate task. Now, since you know "for certain" that your input will be valid, then this becomes a simple matter of:
std::istringstream i(your_six_numbers_are_in_here);
int a, b, c, d, e, f;
i >> a >> b >> c >> d >> e >> f;
It goes without saying that if you do not know "for certain" that your input will be valid, additional work will be needed, here.

Beginner difficulty with vectors and while-loops in C++

Update:
So it turns out there were two issues:
The first is that I checked the [k-1] index before I checked k == 0. This was a crash, although mostly fixable, and not the primary issue I posted about.
The primary issue is that the code seems to execute only after I press ctrl+z. Not sure why that would be.
Original:
So, learning from Stroustrup's text on C++ programming, I got to an example on vectors and tried implementing it myself. The gist is that the program user enters a bunch of words, and the program alphabetizes them and then prints them without repeats. I managed to get working code using a for statement, but one of my initial attempts doesn't work, and I don't understand why.
To be clear, I'm not asking to improve this code. I already have better, working code. I'm wondering here why the code below doesn't work.
The "error" I get is that the code compiles and runs fine, but when I input words, nothing happens and I'm prompted to input more.
I'm certain there's an obvious mistake, but I've been looking everywhere for the last 8 hours (no exaggeration) just devoted to finding the error on my own. But I can't.
#include <algorithm>
#include <iostream>
#include <string>
#include <vector>
using namespace std;

int main() {
    vector<string> warray; string wentry; int k = 0;
    cout << "Enter words and I'll alphabetize and delete repeats:\n\n";
    while (cin >> wentry) warray.push_back(wentry);
    sort(warray.begin(), warray.end());
    while (k < warray.size()) {
        if (warray[k - 1] != warray[k] || k == 0) cout << warray[k] << "\n";
        ++k;
    }
}
My reasoning for why this should work is this: I initialize my array of words, my word entry per input, and a variable to index word output.
Then I have a while statement so that every input is stacked at the end of the array.
Then I sort my array.
Then I use my index which starts at 0 to output the 0th item of the array.
Then so long as there are words in the array not yet reached by the index, the index will check that the word is not a repeat of the prior index position, and then print if not.
No matter what happens, the index is incremented by one, and the check begins again.
Words are printed until the index runs through and checks all the words in the array.
Then we wait for new entries, although this gets kind of screwy with the above code, since the sorting is done before the checking. This is explicitly not my concern, however. I only intend for this to work once.

To end the input loop you need to send an end-of-file (EOF) character, which is Ctrl+D on Linux and macOS. However, there are other problems in your code. k starts at 0, so the moment you evaluate warray[k - 1] you access an element out of bounds, and your code may crash.

At the point where you evaluate
warray[k - 1]
for the first time, k is zero, so you are asking for the warray element at index -1, which is out of bounds (and even if it weren't, I wouldn't do this anyway). Reading it is undefined behavior: the fact that it compiles says nothing about whether it is valid, and if it appears to work in your case, that is by accident.
I would try simply reversing the OR combination in your if-condition:
if (k == 0 || warray[k - 1] != warray[k])
Thus on the first iteration (k == 0) the second condition is not evaluated, because the first condition is already true.
Does it work then?

You're stuck in the while loop because you don't have a way of breaking out of it. That being said, you can press Ctrl+D (or Ctrl+Z followed by Enter in the Windows command prompt) to signal end of input, break out of the loop and continue executing the code.
As for the while loop at the bottom, which prints out the sorted vector of values, your program is likely to crash, as user902384 suggested, because it evaluates warray[k - 1] first.
Ideally, you want to change the last part of your program to:
while (k < warray.size())
{
if (k == 0 || warray[k - 1] != warray[k])
cout << warray[k] << "\n";
++k;
}
This way, the k == 0 check passes and your program will skip checking warray[k - 1] != warray[k] (which would equal warray[-1] != warray[0] when k=0).

You just needed to reverse:
if (warray[k - 1] != warray[k] || k == 0)
to
if (k == 0 || warray[k - 1] != warray[k] )
so that the condition short-circuits when k == 0.
An alternative.
Although it may be considered a bit off topic, since you want to work with std::vector<>, std::set<> is an excellent container that satisfies both of your current requirements:
Sort the strings in alphabetical order.
Delete all the repetitions.
Include <set> in your .cpp file, create a set object, insert all the std::string values, and iterate through the set to get your ordered, duplicate-free strings!
The code:
#include <iostream>
#include <set>
#include <string>
using namespace std;

int main() {
    // Define a set container.
    set<string> s;
    // A temporary string variable.
    string temp;
    // Inserting strings into the set.
    while (cin >> temp) s.insert(temp);
    // Create a set<string> iterator.
    set<string>::iterator it;
    // Scanning the set
    for (it = s.begin(); it != s.end(); ++it)
    {
        // To access the element pointed to by the iterator,
        // use *it.
        cout << *it << endl;
    }
    return 0;
}
I recommend this container because you will study set in Stroustrup's text, and it is much easier and more convenient than laboring over a vector.

Loop efficiency - C++

A beginner's question on loop efficiency. I've started programming in C++ (my first language) and have been using 'Principles and Practice Using C++' by Bjarne Stroustrup. I've been making my way through the earlier chapters and have just been introduced to the concept of loops.
The first exercise regarding loops asks of me the following:
The character 'b' is char('a'+1), 'c' is char('a'+2), etc. Use a loop to write out
a table of characters with their corresponding integer values:
a 97, b 98, ..., z 122
I used uppercase instead, and created the following:
int number = 64;      // integer value of '@', the character before 'A' in ASCII
char letter = number; // converts integer to char value
int i = 0;
while (i <= 25) {
    cout << ++letter << "\t" << ++number << endl;
    ++i;
}
Should I aim for only having 'i' present in a loop, or is that simply not possible when converting between types? I can't really think of any other way to do the above, apart from converting the character value to its integer counterpart (i.e. the opposite of the current method) or not doing the conversion at all and having letter store '@'.

Following on from jk's answer, you could even use the letter itself as the loop variable (letter <= 'z'). I'd also use a for loop, but that's just me.
for( char letter = 'a'; letter <= 'z'; ++letter )
    std::cout << letter << "\t" << static_cast<int>( letter ) << std::endl;

You should aim for clarity first, but you are trying to micro-optimize instead. You could rewrite that more clearly as a for loop:
const int offsetToA = 65;
const int numberOfCharacters = 26;
for( int i = 0; i < numberOfCharacters; ++i ) {
    const int characterValue = i + offsetToA;
    cout << static_cast<char>( characterValue ) << "\t" << characterValue << endl;
}
And you can convert between different types; that's called casting (the static_cast construct in the code above).

That's not a bad way to do it, but you can do it with only one loop variable like this:
char letter = 65;
while (letter <= 65 + 25) {
    printf("%c\t%d\n", letter, letter);
    ++letter;
}

There is nothing particularly inefficient about the way you are doing it, but it is certainly possible to just convert between chars and ints (a char is an integer type). This would mean you only need to store 1 counter rather than the 3 (i, letter and number) you currently have.
Also, for looping from a fixed start to a fixed end, a 'for' loop is perhaps more idiomatic (though it's possible you haven't met this yet!).

If you are concerned about the efficiency of your loop, I would urge you to try this:
Get this code compiled and running under an IDE, such as Visual Studio, and set a break point at the beginning. When you get there, switch to the disassembly view (instruction view) and start hitting the F11 (single-step) key, and keep a mental count of how many times you are hitting it.
You will see that it enters the loop, compares i against 25, and then starts doing the code for the cout line. That involves incrementing letter, and then going into the << routine for cout. It does a number of things in there, possibly going deeper into subroutines, etc., and finally comes back out, returning an object. Then it pushes "\t" as an argument and passes it to that object, and goes back in and does all the stuff it did before. Then it takes number, increments it, and passes it to the operator<< overload that accepts an integer, which calls a function to convert it to a string (which involves a loop), then does all the stuff it did before to loop that string into the output buffer and return.
Tired? You're not done yet. The endl has to be output, and when that happens, not only does it put "\n" in the buffer, but it calls the system routine to flush that buffer to the file or console where you are sending the I/O. You probably can't F11 into that, but rest assured it takes lots of cycles and doesn't return until the I/O is done.
By now, your F11-count should be in the vicinity of several thousand, more or less.
Finally, you come out and get to the ++i statement, which takes 1 or 2 instructions, and jumps back to the top of the loop to start the next iteration.
NOW, are you still worried about the efficiency of the loop?
There's an easier way to make this point, and it's just as instructive. Wrap an infinite loop around your entire code so it runs forever. While it's running, hit the "pause" button in the IDE, and look at the call stack. (This is called a "stackshot".) If you do this several times you get a good idea of how it spends time. Here's an example:
NTDLL! 7c90e514()
KERNEL32! 7c81cbfe()
KERNEL32! 7c81cc75()
KERNEL32! 7c81cc89()
MSVCRTD! 1021bed3()
MSVCRTD! 1021bd59()
MSVCRTD! 10218833()
MSVCRTD! 1023a500()
std::_Fputc() line 42 + 18 bytes
std::basic_filebuf<char,std::char_traits<char> >::overflow() line 108 + 25 bytes
std::basic_streambuf<char,std::char_traits<char> >::sputc() line 85 + 94 bytes
std::ostreambuf_iterator<char,std::char_traits<char> >::operator=() line 304 + 24 bytes
std::num_put<char,std::ostreambuf_iterator<char,std::char_traits<char> > >::_Putc() line 633 + 32 bytes
std::num_put<char,std::ostreambuf_iterator<char,std::char_traits<char> > >::_Iput() line 615 + 25 bytes
std::num_put<char,std::ostreambuf_iterator<char,std::char_traits<char> > >::do_put() line 481 + 71 bytes
std::num_put<char,std::ostreambuf_iterator<char,std::char_traits<char> > >::put() line 444 + 44 bytes
std::basic_ostream<char,std::char_traits<char> >::operator<<() line 115 + 114 bytes
main() line 43 + 96 bytes
mainCRTStartup() line 338 + 17 bytes
I did this a bunch of times, and not ONCE did it stop in the code for the outer i<=25 loop. So optimizing that loop is like someone's great metaphor: "getting a haircut to lose weight".

Since no one else mentioned it: with a fixed number of iterations (at least one), this is also a candidate for post-condition iteration with do...while.
char letter = 'a';
do {
    std::cout << letter << "\t" << static_cast<int>( letter ) << std::endl;
} while ( ++letter <= 'z' );
However, as shown in Patrick's answer, the for idiom is often shorter (in number of lines in this case).

You can promote char to int...
//characters and their corresponding integer values
#include "../../std_lib_facilities.h"
int main()
{
    char a = 'a';
    while (a <= 'z') {
        cout << a << '\t' << a*1 << '\n'; // a*1 => char operand promoted to integer!
        ++a;
    }
    cout << endl;
}

Incrementing three separate variables is probably a little confusing. Here's a possibility:
for (int i = 0; i != 26; ++i)
{
    int chr = 'a' + i;
    std::cout << static_cast<char>(chr) << ":\t" << chr << std::endl;
}
Note that using a for loop keeps all the logic of setting up, testing and incrementing the loop variable in one place.

At this point, I wouldn't worry about micro-optimizations such as the most efficient way to write a small loop like this. What you have could be done nicely with a for loop, but if you are more comfortable with while, you should use that. But I am not sure if that is your question.
I don't think you have understood the question properly. You are writing the code, knowing that 'A' is 65. The whole point of the exercise is to print the value of 'A' to 'Z' on your system, without knowing what value they have.
Now, to get an integer value for a character c, you can do: static_cast<int>(c). I believe that is what you're asking.
I haven't written any code because it should be more fun for you to do so.
Question for the experts: In C, I know that 'a'...'z' need not have contiguous values (same for 'A'...'Z'). Is the same true for C++? I would think so, but then it seems highly unlikely that Stroustrup's book assumes that.

Thanks for the help. All I wrote down was:
int main()
{
    char letter = 96;
    int number = letter;
    int i = 0;
    while (i < 26)
    {
        cout << ++letter << ":" << ++number << " ";
        ++i;
    }
}
It works great, and it's pretty simple to understand now.

I've tried this and it worked fine:
char a = 'a';
int i = a; // represent char a as an int
while (a <= 'z') {
    cout << a << '\t' << i << '\n';
    ++a;
    ++i;
}

Programming Principles and Practice using C++ (2nd Edition) | Bjarne Stroustrup
Chapter 4 - Computation (Try this #3 - Character Loop)
The character 'b' is char('a'+1), 'c' is char('a'+2), etc. Use
a loop to write out a table of characters with their corresponding integer values:
a 97 b 98 . . . z 122
This is how I solved the problem (from 10 years ago :D)
I'm a freshman, by the way, so I just started reading this book; I just want to share my solution.
#include <iostream>
using namespace std;
int main()
{
    int i = 0;
    while (i < 26) {
        cout << char('a' + i) << '\t' << int(97 + i) << '\n';
        ++i;
    }
}
I solved it by first analyzing the problem, which is knowing the char values from 'a' (97) up to 'z' (122), according to this ASCII table:
https://www.ascii-code.com/#:~:text=ASCII%20printable%20characters%20%28character%20code%2032-127%29%20Codes%2032-127,digits%2C%20punctuation%20marks%2C%20and%20a%20few%20miscellaneous%20symbols.
Now, we have a clearer understanding on how to solve the said problem.
