Simple Sentence Reverser in C++ - c++

I'm trying to build a program to solve a problem in a text book I bought recently and it's just driving me crazy.
I have to build a sentence reverser so I get the following:
Input = "Do or do not, there is no try."
Output = "try. no is there not, do or Do"
Here's what I've got so far:
void ReverseString::reversalOperation(char str[]) {
char* buffer;
int stringReadPos, wordReadPos, writePos = 0;
// Position of the last character is length -1
stringReadPos = strlen(str) - 1;
buffer = new char[stringReadPos+1];
while (stringReadPos >= 0) {
if (str[stringReadPos] == ' ') {
wordReadPos = stringReadPos + 1;
buffer[writePos++] = str[stringReadPos--];
while (str[wordReadPos] != ' ') {
buffer[writePos] = str[wordReadPos];
writePos++;
wordReadPos++;
}
} else {
stringReadPos--;
}
}
cout << str << endl;
cout << buffer << endl;
}
I was sure I was on the right track but all I get for an output is the very first word ("try."). I've been staring at this code so long I can't make any headway. Initially I was checking in the inner while loop for a '/0' character as well but it didn't seem to like that so I took it out.

Unless you're feeling masochistic, throw your existing code away, and start with std::vector and std::string (preferably an std::vector<std::string>). Add in std::copy with the vector's rbegin and rend, and you're pretty much done.

This is utterly easy in C++, with help from the standard library:
std::vector< std::string > sentence;
std::istringstream input( str );
// copy each word from input to sentence
std::copy(
(std::istream_iterator< std::string >( input )), std::istream_iterator< std::string >()
, std::back_inserter( sentence )
);
// print to cout sentence in reverse order, separated by space
std::copy(
sentence.rbegin(), sentence.rend()
, (std::ostream_iterator< std::string >( std::cout, " " ))
);
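For completeness, here is a minimal self-contained sketch of the same idea wrapped in a function. The name reverse_words is made up for illustration, and it builds the result in a string instead of writing to std::cout so that no trailing space is left over:

```cpp
#include <sstream>
#include <string>
#include <vector>

// Hypothetical helper (not from the answer above): returns the words of
// `text` in reverse order, separated by single spaces, no trailing space.
std::string reverse_words(const std::string& text) {
    std::istringstream input(text);
    std::vector<std::string> words;
    // operator>> skips any run of whitespace, so each read yields one word
    for (std::string word; input >> word; ) {
        words.push_back(word);
    }
    std::string result;
    // walk the vector back to front, adding a separator between words only
    for (std::vector<std::string>::const_reverse_iterator it = words.rbegin();
         it != words.rend(); ++it) {
        if (!result.empty()) {
            result += ' ';
        }
        result += *it;
    }
    return result;
}
```

Note that this collapses runs of spaces into one, which may or may not be what the exercise wants.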

In the interest of science, I tried to make your code work as is. Yeah, it's not really the C++ way to do things, but instructive nonetheless.
Of course this is only one of a million ways to get the job done. I'll leave it as an exercise for you to remove the trailing space this code leaves in the output ;)
I commented my changes with "EDIT".
char* buffer;
int stringReadPos, wordReadPos, writePos = 0;
// Position of the last character is length -1
stringReadPos = strlen(str) - 1;
buffer = new char[stringReadPos+1];
while (stringReadPos >= 0) {
if ((str[stringReadPos] == ' ')
|| (stringReadPos == 0)) // EDIT: Need to check for hitting the beginning of the string
{
wordReadPos = stringReadPos + (stringReadPos ? 1 : 0); // EDIT: In the case we hit the beginning of the string, don't skip past the space
//buffer[writePos++] = str[stringReadPos--]; // EDIT: This is just to grab the space - don't need it here
while ((str[wordReadPos] != ' ')
&& (str[wordReadPos] != '\0')) // EDIT: Need to check for hitting the end of the string
{
buffer[writePos] = str[wordReadPos];
writePos++;
wordReadPos++;
}
buffer[writePos++] = ' '; // EDIT: Add a space after words
}
stringReadPos--; // EDIT: Decrement the read pos every time
}
buffer[writePos] = '\0'; // EDIT: nul-terminate the string
cout << str << endl;
cout << buffer << endl;

I see the following errors in your code:
the last char of buffer is not set to 0 (this will cause a failure in cout <<)
in the inner loop you have to check for str[wordReadPos] != ' ' && str[wordReadPos] != 0 otherwise while scanning the first word it will never find the terminating space

Since you are using a char array, you can use C string library. It will be much easier if you use strtok: http://www.cplusplus.com/reference/clibrary/cstring/strtok/
It will require pointer use, but it will make your life much easier. Your delimiter will be " ".
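A rough sketch of how the strtok route could look, assuming it is fine to work on a copy of the input (strtok modifies the buffer it scans). The function name reverse_with_strtok is invented for this example:

```cpp
#include <cstring>
#include <string>
#include <vector>

// Hypothetical helper: splits a copy of `str` on spaces with std::strtok
// and joins the tokens in reverse order.
std::string reverse_with_strtok(const char* str) {
    // strtok writes '\0' into the buffer it scans, so work on a private copy
    std::vector<char> copy(str, str + std::strlen(str) + 1);
    std::vector<const char*> tokens;
    for (char* tok = std::strtok(copy.data(), " "); tok != nullptr;
         tok = std::strtok(nullptr, " ")) {
        tokens.push_back(tok);
    }
    std::string result;
    // append the tokens last to first, with a space between them only
    for (std::size_t i = tokens.size(); i-- > 0; ) {
        result += tokens[i];
        if (i != 0) {
            result += ' ';
        }
    }
    return result;
}
```

Keep in mind that strtok keeps internal state between calls, so it is not a good fit for nested or multi-threaded tokenizing.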

What the problems with your code were, and what more C++-ish ways of doing it are, has already been well covered. I would, however, like to add that the methodology
write a function/program to implement algorithm;
see if it works;
if it doesn't, look at code until you get where the problem is
is not too productive. What can help you resolve this problem here and many other problems in the future is the debugger (and the poor man's debugger, printf). It lets you see how your program actually works step by step, what happens to the data, etc. In other words, you will be able to see which parts of it work as you expect and which behave differently. If you're on *nix, don't hesitate to try gdb.

Here is a more C++ version. Though I think the simplicity is more important than style in this instance. The basic algorithm is simple enough, reverse the words then reverse the whole string.
You could write C code that was just as evident as the C++ version. I don't think it's necessarily wrong to write code that isn't ostentatiously C++ here.
void word_reverse(std::string &val) {
size_t b = 0;
for (size_t i = 0; i < val.size(); i++) {
if (val[i] == ' ') {
std::reverse(&val[b], &val[b]+(i - b));
b = ++i;
}
}
std::reverse(&val[b], &val[b]+(val.size() - b));
std::reverse(&val[0], &val[0]+val.size());
}
TEST(basic) {
std::string o = "Do or do not, there is no try.";
std::string e = "try. no is there not, do or Do";
std::string a = o;
word_reverse(a);
CHECK_EQUAL( e , a );
}
Multiple, leading, or trailing spaces may be degenerate cases, depending on how you actually want them to behave.

Related

C++ string not printed

I have the next code:
void addContent ( const std::string& message ) {
std::string newMessage;
for ( int i = 0, remainder = textCapacity - currentText; i < remainder; i++ ) {
newMessage[i] = message[i];
std::cout << newMessage; //here nothing is printed
}
}
But nothing is printed.
Only if I change newMessage to newMessage[i] everything is good. And I don't understand why.
newMessage is an empty std::string. Doing [i] to it is accessing invalid memory. The string is always empty, and you're just writing to invalid memory. That's a recipe for disaster, and you're (un)lucky it's not crashing on you.
I'm not sure what message[i] is, but you probably want newMessage[i] = message[i]. But you might as well skip the temporary newMessage variable and just print out message[i] itself.
newMessage is an empty string, so nothing will be printed. Also, std::cout is buffered, so in order to flush the buffer you should call std::endl or std::flush
I would rather change from this:
newMessage[i] = message[i];
to this:
newMessage += message[i];
And when printing:
std::cout << newMessage<<std::endl;
Using [i] on an empty string is asking for trouble because you're accessing out-of-bounds memory. Sometimes it will do nothing, sometimes your program will crash.
As Cornstalks said, you have an out-of-bounds access.
More importantly, the code is way too complex for the task. Don't use a manual loop to partially copy one std::string to another. To copy a part of message to newMessage, use substr on message and assign:
newMessage = message.substr(from_index, number_of_chars);
or the iterator-based stuff:
std::string newMessage(message.begin() + from_index, message.begin() + to_index);
The latter is more efficient. So you want
std::string newMessage(message.begin(), message.begin() + (textCapacity - currentText));
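As a small sketch of the substr route, assuming the goal is just "the first count characters of message" (the helper name take_prefix is hypothetical, and count stands in for textCapacity - currentText):

```cpp
#include <cstddef>
#include <string>

// Hypothetical helper: returns at most `count` leading characters of `message`.
std::string take_prefix(const std::string& message, std::size_t count) {
    // substr clamps the length to what is available, so a `count` larger
    // than message.size() is safe here; the iterator form would step past
    // end() in that case, which is undefined behavior
    return message.substr(0, count);
}
```

One reason to prefer substr in this spot: if the remaining capacity is larger than the message, it simply returns the whole message.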
Using that string as newMessage[i] implies that it is an array of strings. Replace that line with std::string newMessage[textCapacity];.

How to count whitespace occurences in a string in c++

I have a project for my advanced c++ class that's supposed to do a number of things, but I'm trying to focus on this function first, because after it works I can tweak it to fulfill the other needs. This function searches through a file and performs a word count by counting the number of times ' ' appears in the document. Maybe not accurate, but it'll be a good starting place. Here's the code I have right now:
void WordCount()
{
int count_W = 0; //Variable to store word count, will be written to label
int i, c = 0; //i for iterator
ifstream fPath("F:\Project_1_Text.txt");
FileStream input( "F:\Project_1_Text.txt", FileMode::Open, FileAccess::Read );
StreamReader fileReader( %input );
String ^ line;
//char ws = ' ';
array<Char>^ temp;
input.Seek( 0, SeekOrigin::Begin );
while ( ( line = fileReader.ReadLine() ) != nullptr )
{
Console::WriteLine( line );
c = line->Length;
//temp = line->ToCharArray();
for ( i = 0; i <= c; i++)
{
if ( line[i] == ' ' )
count_W++;
}
//line->ToString();
}
//Code to write to label
lblWordCount->Text = count_W.ToString();
}
All of this works except for one problem. When I try to run the program, and open the file, I get an error that tells me the Index is out of bounds. Now, I know what that means, but I don't get how the problem is occurring. And, if I don't know what's causing the problem, I can't fix it. I've read that it is possible to search through a string with a for loop, and of course that also holds true for a char array, and there is code in there to perform that conversion, but in both cases I get the same error. I know it is reading through the file correctly, because the final program also has to perform a character count (which is working), and it read back the size of each line in the target document perfectly from start to finish. Anyway, I'm out of ideas, so I thought I'd consult a higher power. Any ideas?
Counting whitespace is simple:
int spaces = std::count_if(s.begin(), s.end(),
[](unsigned char c){ return std::isspace(c); });
Two notes, though:
std::isspace() cannot be used directly with char because char may be signed and std::isspace() takes an int whose value must be representable as unsigned char (or be EOF).
This counts the number of spaces, not the number of words (or words - 1): words may be separated by sequences of spaces consisting of more than one consecutive space.
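To illustrate the second note, here is a sketch that puts the space count next to a word count based on whitespace-to-word transitions. It uses a plain std::string rather than the .NET String from the question, and the names count_spaces and count_words are invented for the example:

```cpp
#include <algorithm>
#include <cctype>
#include <cstddef>
#include <string>

// Counts whitespace characters, same technique as the snippet above.
std::size_t count_spaces(const std::string& s) {
    return static_cast<std::size_t>(std::count_if(s.begin(), s.end(),
        [](unsigned char c) { return std::isspace(c) != 0; }));
}

// Hypothetical helper: counts words by counting each transition from
// whitespace (or start of string) into a non-whitespace character.
std::size_t count_words(const std::string& s) {
    std::size_t words = 0;
    bool in_word = false;
    for (unsigned char c : s) {
        if (std::isspace(c)) {
            in_word = false;
        } else if (!in_word) {
            in_word = true;  // first character of a new word
            ++words;
        }
    }
    return words;
}
```

With this, runs of several spaces and leading or trailing spaces no longer inflate the word count.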
It could be your loop. You're going from i=0 to i=c, but i=c is too far. You should go to i=c-1:
for ( i=0; i<c; i++)

C++: Removing all asterisks from a string where the asterisks are NOT multiplication symbols

So basically, I might have some string that looks like: "hey this is a string * this string is awesome 97 * 3 = 27 * this string is cool".
However, this string might be huge. I'm trying to remove all the asterisks from the string, unless that asterisk appears to represent multiplication. Efficiency is somewhat important here, and I'm having trouble coming up with a good algorithm to remove all the non-multiplication asterisks from this.
In order to determine whether an asterisk is for multiplication, I can obviously just check whether it's sandwiched in between two numbers.
Thus, I was thinking I could do something like (pseudocode):
wasNumber = false
Loop through string
if number
set wasNumber = true
else
set wasNumber = false
if asterisk
if wasNumber
if the next word is a number
do nothing
else
remove asterisk
else
remove asterisk
However, that^ is ugly and inefficient on a huge string. Can you think of a better way to accomplish this in C++?
Also, how could I actually check whether a word is a number? It's allowed to be a decimal. I know there's a function to check if a character is a number...
Fully functioning code:
#include <iostream>
#include <string>
using namespace std;
string RemoveAllAstericks(string);
void RemoveSingleAsterick(string&, int);
bool IsDigit(char);
int main()
{
string myString = "hey this is a string * this string is awesome 97 * 3 = 27 * this string is cool";
string newString = RemoveAllAstericks(myString);
cout << "Original: " << myString << "\n";
cout << "Modified: " << newString << endl;
system("pause");
return 0;
}
string RemoveAllAstericks(string s)
{
int len = s.size();
int pos;
for(int i = 0; i < len; i++)
{
if(s[i] != '*')
continue;
pos = i - 1;
char cBefore = s[pos];
while(cBefore == ' ')
{
pos--;
cBefore = s[pos];
}
pos = i + 1;
char cAfter = s[pos];
while(cAfter == ' ')
{
pos++;
cAfter = s[pos];
}
if( !(IsDigit(cBefore) && IsDigit(cAfter)) ) // not sandwiched between digits, so not a multiplication
RemoveSingleAsterick(s, i);
}
return s;
}
void RemoveSingleAsterick(string& s, int i)
{
s[i] = ' '; // Replaces * with a space, but you can do whatever you want
}
bool IsDigit(char c)
{
return (c <= '9' && c >= '0');
}
Top level overview:
Code searches the string until it encounters an *. Then, it looks at the first non-whitespace character before AND after the *. If both characters are numeric, the code decides that this is a multiplication operation and leaves the asterisk alone. Otherwise, the asterisk is removed.
See the revision history of this post if you'd like other details.
Important Notes:
You should seriously consider adding boundary checks on the string (i.e. don't try to access an index that is less than 0 or greater than len).
If you are worried about parentheses, then change the condition that checks for whitespaces to also check for parentheses.
Checking whether every single character is a number is a bad idea. At the very least, it will require two logical checks (see my IsDigit() function). (My code checks for '*', which is one logical operation.) However, some of the suggestions posted were very poorly thought out. Do not use regular expressions to check if a character is numeric.
Since you mentioned efficiency in your question, and I don't have sufficient rep points to comment on other answers:
A switch statement that checks for '0' '1' '2' ..., means that every character that is NOT a digit, must go through 10 logical operations. With all due respect, please, since chars map to ints, just check the boundaries (char <= '9' && char >= '0')
You can start by implementing the slow version; it could be much faster than you think. But let's say it's too slow. It is then an optimization problem. Where does the inefficiency lie?
"if number" is easy, you can use a regex or anything that stops when it finds something that is not a digit
"if the next word is a number" is just as easy to implement efficiently.
Now, it's the "remove asterisk" part that is an issue for you. The key point to notice here is that you don't need to duplicate the string: you can actually modify it in place since you are only removing elements.
Try to run through this visually before trying to implement it.
Keep two integers or iterators, the first one saying where you are currently reading your string, and the second one saying where you are currently writing your string. Since you only erase stuff, the read one will always be at or ahead of the write one.
If you decide to keep the current character, copy it to the write position and advance both integers/iterators by one. If you don't want to keep it, just advance the read one! At the end you only have to shorten the string by the number of asterisks you removed. The complexity is simply O(n), without any additional buffer used.
Also note that your algorithm would be simpler (but equivalent) if written like this:
wasNumber = false
Loop through string
if number
set wasNumber = true
else
set wasNumber = false
if asterisk and wasNumber and next word is a number
do nothing // using my algorithm, "do nothing" actually copies what you intend to keep
else
remove asterisk
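The read/write index scheme described above could be sketched roughly like this. It is only one possible reading of it: the names remove_stray_asterisks and is_digit are made up, and it assumes that checking the nearest non-space character on each side of the asterisk is a good enough test for "multiplication":

```cpp
#include <cstddef>
#include <string>

// Hypothetical helper: true for the characters '0'..'9'.
static bool is_digit(char c) { return c >= '0' && c <= '9'; }

// Removes, in place, every '*' whose nearest non-space neighbours are not
// both digits. Assumption: that neighbour test is what "multiplication" means.
void remove_stray_asterisks(std::string& s) {
    std::size_t write = 0;  // next position to store a kept character
    char prev = '\0';       // nearest non-space character before `read`
    for (std::size_t read = 0; read < s.size(); ++read) {
        const char c = s[read];
        bool keep = true;
        if (c == '*') {
            // look ahead to the nearest non-space character; everything
            // after `read` is still untouched because write <= read
            std::size_t next = read + 1;
            while (next < s.size() && s[next] == ' ') {
                ++next;
            }
            const bool digit_after = next < s.size() && is_digit(s[next]);
            keep = is_digit(prev) && digit_after;
        }
        if (c != ' ') {
            prev = c;
        }
        if (keep) {
            s[write++] = c;
        }
    }
    s.resize(write);  // cut off the leftover tail after compaction
}
```

Each removed asterisk leaves its surrounding spaces behind, so "string * this" becomes "string  this" with two spaces; collapse those separately if that matters.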
I found your little problem interesting and I wrote (and tested) a small and simple function that would do just that on a std::string. Here u go:
// TestStringsCpp.cpp : Defines the entry point for the console application.
//
#include "stdafx.h"
#include <string>
#include <iostream>
using namespace std;
string& ClearAsterisk(string& iString)
{
bool bLastCharNumeric = false;
string lString = "0123456789";
for (string::iterator it = iString.begin(); it != iString.end(); ) {
bool bErase = false; //set when the current asterisk has to be removed
switch (*it) {
case ' ': break;//ignore whitespace characters
case '*':
if (bLastCharNumeric) {
//asterisk is preceded by numeric character. we have to check if
//the following non space character is numeric also
for (string::iterator it2 = it + 1; it2 != iString.end() ; ++it2) {
if (*it2 != ' ') {
if (!(*it2 <= '9' && *it2 >= '0')) bErase = true;
break; //exit current for
}
}
}
else bErase = true;
break;
default:
if (*it <= '9' && *it >= '0') bLastCharNumeric= true;
else bLastCharNumeric = false; //reset flag
}
//erase() invalidates it, so continue from the iterator it returns
if (bErase) it = iString.erase(it);
else ++it;
}
return iString;
}
int _tmain(int argc, _TCHAR* argv[])
{
string testString = "hey this is a string * this string is awesome 97 * 3 = 27 * this string is cool";
cout<<ClearAsterisk(testString).c_str();
cin >> testString; //this is just for the app to pause a bit :)
return 0;
}
It will work perfectly with your sample string but it will fail if you have a text like this: "this is a happy 5 * 3day menu" because it checks only the first non-space character after the '*'. But frankly I can't imagine many cases where you would have this kind of construct in a sentence.
HTH,JP.
A regular expression wouldn't necessarily be any more efficient, but it would let you rely on somebody else to do your string parsing and manipulation.
Personally, if I were worried about efficiency, I would implement your pseudocode version while limiting needless memory allocations. I might even mmap the input file. I highly doubt that you'll get much faster than that.

Read file and extract certain part only

ifstream toOpen;
openFile.open("sample.html", ios::in);
if(toOpen.is_open()){
while(!toOpen.eof()){
getline(toOpen,line);
if(line.find("href=") && !line.find(".pdf")){
start_pos = line.find("href");
tempString = line.substr(start_pos+1); // i dont want the quote
stop_pos = tempString .find("\"");
string testResult = tempString .substr(start_pos, stop_pos);
cout << testResult << endl;
}
}
toOpen.close();
}
What I am trying to do is extract the "href" value. But I cant get it works.
EDIT:
Thanks to Tony hint, I use this:
if(line.find("href=") != std::string::npos ){
// Process
}
it works!!
I'd advise against trying to parse HTML like this. Unless you know a lot about the source and are quite certain about how it'll be formatted, chances are that anything you do will have problems. HTML is an ugly language with an (almost) self-contradictory specification that (for example) says particular things are not allowed -- but then goes on to tell you how you're required to interpret them anyway.
Worse, almost any character can (at least potentially) be encoded in any of at least three or four different ways, so unless you scan for (and carry out) the right conversions (in the right order) first, you can end up missing legitimate links and/or including "phantom" links.
You might want to look at the answers to this previous question for suggestions about an HTML parser to use.
As a start, you might want to take some shortcuts in the way you write the loop over lines in order to make it clearer. Here is the conventional "read line at a time" loop using C++ iostreams:
#include <cstdlib>
#include <fstream>
#include <iostream>
#include <string>
int main ( int, char ** )
{
std::ifstream file("sample.html");
if ( !file.is_open() ) {
std::cerr << "Failed to open file." << std::endl;
return (EXIT_FAILURE);
}
for ( std::string line; (std::getline(file,line)); )
{
// process line.
}
}
As for the inner part that processes the line, there are several problems.
It doesn't compile. I suppose this is what you meant with "I cant get it works". When asking a question, this is the kind of information you might want to provide in order to get good help.
There is confusion between variable names temp and tempString etc.
string::find() returns std::string::npos, a large positive integer, when no match is found (the size_type is unsigned), so a bare line.find(...) used as a condition is true unless a match is found starting at character position 0, in which case you probably do want to enter the if block.
Here is a simple test content for sample.html.
<html>
<a href="foo.pdf"/>
</html>
Sticking the following inside the loop:
if ((line.find("href=") != std::string::npos) &&
(line.find(".pdf" ) != std::string::npos))
{
const std::size_t start_pos = line.find("href");
std::string temp = line.substr(start_pos+6);
const std::size_t stop_pos = temp.find("\"");
std::string result = temp.substr(0, stop_pos);
std::cout << "'" << result << "'" << std::endl;
}
I actually get the output
'foo.pdf'
However, as Jerry pointed out, you might not want to use this in a production environment. If this is a simple homework or exercise on how to use the <string>, <iostream> and <fstream> libraries, then go ahead with such a procedure.
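For an exercise like that, the extraction step could be pulled into a small function along these lines. This is only a sketch: extract_href is a made-up name, and it assumes the attribute is written exactly as href="..." with double quotes and no extra spaces:

```cpp
#include <string>

// Hypothetical helper: stores the value of the first href="..." attribute
// found in `line` into `out` and returns true, or returns false if none.
// Assumption: the attribute appears literally as href="value".
bool extract_href(const std::string& line, std::string& out) {
    const std::string key = "href=\"";
    const std::string::size_type start = line.find(key);
    if (start == std::string::npos) {
        return false;  // no href attribute on this line
    }
    const std::string::size_type value_begin = start + key.size();
    const std::string::size_type value_end = line.find('"', value_begin);
    if (value_end == std::string::npos) {
        return false;  // opening quote without a closing one
    }
    out = line.substr(value_begin, value_end - value_begin);
    return true;
}
```

Searching for the whole href=" prefix avoids the magic start_pos+6 offset, and comparing both find results against npos covers lines with an unterminated value.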

tokenizing and converting to pig latin

This looks like homework stuff but please be assured that it isn't homework. Just an exercise in the book we use in our c++ course, I'm trying to read ahead on pointers..
The exercise in the book tells me to split a sentence into tokens and then convert each of them into pig latin then display them..
pig latin here is basically like this: ball becomes allbay in piglatin.. boy becomes oybay.. take the first letter out, put it at the end then add "ay"..
so far this is what i have:
#include <iostream>
using std::cout;
using std::cin;
using std::endl;
#include <cstring>
using std::strtok;
using std::strcat;
using std::strcpy;
void printPigLatin( char * );
int main()
{
char sentence[500];
char *token;
cout << "Enter string to tokenize and convert: ";
cin.getline( sentence, 500 );
token = strtok( sentence, " " );
cout << "\nPig latin for each token will be: " << endl;
while( token != NULL )
{
printPigLatin( token );
token = strtok( NULL, " " );
}
return 0;
}
void printPigLatin( char *word )
{
char temp[50];
for( int i = 0; *word != '\0'; i++ )
{
temp[i] = word[i + 1];
}
strcat( temp, "ay" );
cout << temp << endl;
}
I understand the tokenizing part quite clearly but I'm not sure how to do the pig latin.. i tried to start by simply adding "ay" to the token and see what the results will be .. not sure why the program goes into an infinite loop and keeps on displaying "ayay" .. any tips?
EDIT: this one works fine now but im not sure how to add the first letter of the token before adding the "ay"
EDIT: this is how i "see" it done but not sure how to correctly implement it ..
You're running over your input string with strcat. You need to either create a new string for each token, copying the token and "ay", or simply print the token and then "ay". However, if you're using C++ why not use istream iterators and STL algorithms?
To be honest, I severely doubt the quality of the C++ book, judging from your example. The “basic stuff” in C++ isn't the C pointer style programming. Rather, it's applying high-level library functionality. As “On Freund” pointed out, the C++ standard library provides excellent features to tackle your task. You might want to search for recommendations of better C++ books.
Concerning the problem: your printPigLatin could use the existing function strcpy (or better: strncpy which is safer in regards to buffer overflows). Your manual copy omits the first character from the input because you're using the i + 1st position. You also have a broken loop condition which always tests the same (first) character. Additionally, this should result in an overflow anyway.
As the people before me pointed out, there are several other methods of achieving what you want to do.
However, the actual problem with your code seems to be the use of strcat; I see that you changed it a bit in the edit. Here is an explanation of why the initial one did not work: char* and size issues
Basically, the pointer does not allocate enough memory to add the "ay" to the string provided. If you create a pointer using the technique shown in the link, it should work fine.
I got your program to work, taking the strcat out and using
cout << word << "ay" << endl;
Your loop is infinite because of *word != '\0'.
The word pointer is not changed at any time in the loop.
This seemed to have worked:
void printPigLatin( char *word )
{
cout << word + 1 << word[0] << "ay" << endl;
}
Just not sure if it's a good idea to do that.
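For comparison, here is a sketch of the same rule using std::string and a string stream instead of char arrays and strtok. The names to_pig_latin and sentence_to_pig_latin are invented for this example, and it only implements the simple rule from the question (first letter to the end, then "ay"):

```cpp
#include <sstream>
#include <string>

// Hypothetical helper: moves the first character of `word` to the end
// and appends "ay"; an empty word is returned unchanged.
std::string to_pig_latin(const std::string& word) {
    if (word.empty()) {
        return word;
    }
    return word.substr(1) + word[0] + "ay";
}

// Hypothetical helper: converts every whitespace-separated token of
// `sentence` and joins the results with single spaces.
std::string sentence_to_pig_latin(const std::string& sentence) {
    std::istringstream input(sentence);
    std::string result;
    for (std::string word; input >> word; ) {
        if (!result.empty()) {
            result += ' ';
        }
        result += to_pig_latin(word);
    }
    return result;
}
```

Because the strings own their storage, there is no fixed-size temp buffer to overflow and no terminator to forget.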