C++ output duplicating lines when there is a '\n' - c++

I'm having some trouble with this member function of a class. Basically, it's meant to translate words to a different language while maintaining the same punctuation and spaces. lineToTranslate is an input argument which is an array of words, spaces and punctuation. Each word within this has to be taken out of the line individually and translated using the dict.translate() function. This is working properly.
However, the trouble is that when there is are multiple new lines, the previous line of words are output. Also whitespaces are not fully looked after. When there is more than one space in a sentence, only one space is output. Any idea's on where I might be going wrong? Any help would really be appreciated.
Updated code entered with most errors fixed. The only issue I'm having now is that spaces aren't added as needed between words. Where there are 2 blank spaces in a row, one blank space is being entered but where there is one blank space, none is being entered and the words areoutputlikethis.
int len = strlen(lineToTranslate);
string strComplete = "";
const char *cs;
for (int x = 0; x < len; x++)
{
if (!isspace(lineToTranslate[x]))
{
char temp[MAX_WORD_LEN];
int j = 0;
while(lineToTranslate[x] != ' ' && lineToTranslate[x] != '\t' && lineToTranslate[x] != '\n')
{
temp[j] = lineToTranslate[x];
x++;
j++;
}
temp[j] = '\0';
char returned[MAX_WORD_LEN];
if(temp[0] != '\0')
{
dict.translate(returned, temp);
strComplete = strComplete + returned;
}
}
else
{
strComplete = strComplete + lineToTranslate[x];
x++;
}
}
cs = strComplete.c_str();
strcpy(translatedLine, cs);

When your for loop is iterating for spaces or punctuation which dnt qualify to enter while loop you are still executing strComplete = strComplete + returned; which is appending \0 in the middle without any reason, so you have output string like -- <space>\0<space>\0
Please see Default values in array
So solution is to put strComplete = strComplete + returned; inside the if. Your array is uninitialized when it does not enter if(temp[0] != '\0') so you should not be appending returned.
Next....the two lines below should be out of the for loop, since you want the final result of strComplete to be copied to translatedLine instead of each iteration.
cs = strComplete.c_str();
strcpy(translatedLine, cs);

For whitespace I can say that for the first whitespace, it will check the if condition, then the iteration will terminate and we go back to the for loop, x is incremented and now lineToTranslate[x]=' '. Correct?
Okay so the while loop never runs. If condition
if (temp[0]!='\0')
is satisfied.
So now what does return have in store? It isn't initialised. Ur still appending it.
Maybe I am not much help but this is what I figured out.
Try debugging.

Related

Why does getline behave weirdly after 3 newlines?

I'll preface this by saying I'm relatively new to posting questions, as well as C++ in general, my title is a little lame as it doesn't really specifically address the problem I am dealing with, however I couldn't really think of another way to word it, so any suggestions on improving the title is appreciated.
I am working on a relatively simple function which is supposed to get a string using getline, and read the spaces and/or newlines in the string so that it can output how many words have been entered. After reaching the character 'q' it's basically supposed to stop reading in characters.
void ReadStdIn2() {
std::string userInput;
const char *inputArray = userInput.c_str();
int count = 0;
getline(std::cin, userInput, 'q');
for (int i = 0; i < strlen(inputArray); i++){
if ((inputArray[i] == ' ') || (inputArray[i] == '\n')){
count += 1;
}
}
std::cout << count << std::endl;
}
I want to be able to enter multiple words, followed by newlines, and have the function accurately display my number of words. I can't figure out why but for some reason after entering 3 newlines my count goes right back to 0.
For example, if I enter:
hello
jim
tim
q
the function works just fine, and returns 3 just like I expect it to. But if I enter
hello
jim
tim
bill
q
the count goes right to 0. I'm assuming this has something to do with my if statement but I'm really lost as to what is wrong, especially since it works fine up until the 3rd newline. Any help is appreciated
The behaviour of the program is undefined. Reading input into std::string potentially causes its capacity to increase. This causes pointers into the string to become invalid. Pointers such as inputArray. You then later attempt to read through the invalid pointer.
P.S. calculating the length of the string with std::strlen in every iteration of the loop is not a good idea. It is possible to get the size without calculation by using userInput.size().
To fix both issues, simply don't use inputArray. You don't need it:
for (int i = 0; i < userInput.size(); i++){
if ((userInput[i] == ' ') || (userInput[i] == '\n')){
...

Split a string in C++ after a space, if more than 1 space leave it in the string

I need to split a string by single spaces and store it into an array of strings. I can achieve this using the fonction boost:split, but what I am not being able to achieve is this:
If there is more than one space, I want to integrate the space in the vector
For example:
(underscore denotes space)
This_is_a_string. gets split into: A[0]=This A[1]=is A[2]=a A[3]=string.
This__is_a_string. gets split into: A[0]=This A[1] =_is A[2]=a A[4]=string.
How can I implement this?
Thanks
For this, you can use a combination of the find and substr functions for string parsing.
Suppose there was just a single space everywhere, then the code would be:
while (str.find(" ") != string::npos)
{
string temp = str.substr(0,str.find(" "));
ans.push_back(temp);
str = str.substr(str.find(" ")+1);
}
The additional request you have raised suggests that we call the find function after we are sure that it is not looking at leading spaces. For this, we can iterate over the leading spaces to count how many there are, and then call the find function to search from thereon. To use the find function from say after x positions (because there are x leading spaces), the call would be str.find(" ",x).
You should also take care of corner cases such as when the entire string is composed of spaces at any point. In that case the while condition in the current form will not terminate. Add the x parameter there as well.
This is by no means the most elegant solution, but it will get the job done:
void bizarre_string_split(const std::string& input,
std::vector<std::string>& output)
{
std::size_t begin_break = 0;
std::size_t end_break = 0;
// count how many spaces we need to add onto the start of the next substring
std::size_t append = 0;
while (end_break != std::string::npos)
{
std::string temp;
end_break = input.find(' ', begin_break);
temp = input.substr(begin_break, end_break - begin_break);
// if the string is empty it is because end_break == begin_break
// this happens because the first char of the substring is whitespace
if (!temp.empty())
{
std::string temp2;
while (append)
{
temp2 += ' ';
--append;
}
temp2 += temp;
output.push_back(temp2);
}
else
{
++append;
}
begin_break = end_break + 1;
}
}

How to count whitespace occurences in a string in c++

I have a project for my advanced c++ class that's supposed to do a number of things, but I'm trying to focus on this function first, because after it works I can tweak it to fulfill the other needs. This function searches through a file and performs a word count by counting the number of times ' ' appears in the document. Maybe not accurate, but it'll be a good starting place. Here's the code I have right now:
void WordCount()
{
int count_W = 0; //Varaible to store word count, will be written to label
int i, c = 0; //i for iterator
ifstream fPath("F:\Project_1_Text.txt");
FileStream input( "F:\Project_1_Text.txt", FileMode::Open, FileAccess::Read );
StreamReader fileReader( %input );
String ^ line;
//char ws = ' ';
array<Char>^ temp;
input.Seek( 0, SeekOrigin::Begin );
while ( ( line = fileReader.ReadLine() ) != nullptr )
{
Console::WriteLine( line );
c = line->Length;
//temp = line->ToCharArray();
for ( i = 0; i <= c; i++)
{
if ( line[i] == ' ' )
count_W++;
}
//line->ToString();
}
//Code to write to label
lblWordCount->Text = count_W.ToString();
}
All of this works except for one problem. When I try to run the program, and open the file, I get an error that tells me the Index is out of bounds. Now, I know what that means, but I don't get how the problem is occurring. And, if I don't know what's causing the problem, I can't fix it. I've read that it is possible to search through a string with a for loop, and of course that also holds true for a char array, and there is code in there to perform that conversion, but in both cases I get the same error. I know it is reading through the file correctly, because the final program also has to perform a character count (which is working), and it read back the size of each line in the target document perfectly from start to finish. Anyway, I'm out of ideas, so I thought I'd consult a higher power. Any ideas?
Counting whitespace is simple:
int spaces = std::count_if(s.begin(), s.end(),
[](unsigned char c){ return std::isspace(c); });
Two notes, though:
std::isspace() cannot be used immediately with char because char may be signed and std::isspace() takes an int which is required to be positive.
This counts the number of spaces, not the number of words (or words - 1): words may be separated by sequences of spaces consisting of more than one consecutive space.
It could be your loop. You're going from i=0 to i=c, but i=c is too far. You should go to i=c-1:
for ( i=0; i<c; i++)

C++: Removing all asterisks from a string where the asterisks are NOT multiplication symbols

So basically, I might have some string that looks like: "hey this is a string * this string is awesome 97 * 3 = 27 * this string is cool".
However, this string might be huge. I'm trying to remove all the asterisks from the string, unless that asterisk appears to represent multiplication. Efficiency is somewhat important here, and I'm having trouble coming up with a good algorithm to remove all the non-multiplication asterisks from this.
In order to determine whether an asterisk is for multiplication, I can obviously just check whether it's sandwiched in between two numbers.
Thus, I was thinking I could do something like (pseudocode):
wasNumber = false
Loop through string
if number
set wasNumber = true
else
set wasNumber = false
if asterisk
if wasNumber
if the next word is a number
do nothing
else
remove asterisk
else
remove asterisk
However, that^ is ugly and inefficient on a huge string. Can you think of a better way to accomplish this in C++?
Also, how could I actually check whether a word is a number? It's allowed to be a decimal. I know there's a function to check if a character is a number...
Fully functioning code:
#include <iostream>
#include <string>
using namespace std;
string RemoveAllAstericks(string);
void RemoveSingleAsterick(string&, int);
bool IsDigit(char);
int main()
{
string myString = "hey this is a string * this string is awesome 97 * 3 = 27 * this string is cool";
string newString = RemoveAllAstericks(myString);
cout << "Original: " << myString << "\n";
cout << "Modified: " << newString << endl;
system("pause");
return 0;
}
string RemoveAllAstericks(string s)
{
int len = s.size();
int pos;
for(int i = 0; i < len; i++)
{
if(s[i] != '*')
continue;
pos = i - 1;
char cBefore = s[pos];
while(cBefore == ' ')
{
pos--;
cBefore = s[pos];
}
pos = i + 1;
char cAfter = s[pos];
while(cAfter == ' ')
{
pos++;
cAfter = s[pos];
}
if( IsDigit(cBefore) && IsDigit(cAfter) )
RemoveSingleAsterick(s, i);
}
return s;
}
void RemoveSingleAsterick(string& s, int i)
{
s[i] = ' '; // Replaces * with a space, but you can do whatever you want
}
bool IsDigit(char c)
{
return (c <= 57 && c >= 48);
}
Top level overview:
Code searches the string until it encounters an *. Then, it looks at the first non-whitespace character before AND after the *. If both characters are numeric, the code decides that this is a multiplication operation, and removes the asterick. Otherwise, it is ignored.
See the revision history of this post if you'd like other details.
Important Notes:
You should seriously consider adding boundary checks on the string (i.e. don't try to access an index that is less than 0 or greater than len
If you are worried about parentheses, then change the condition that checks for whitespaces to also check for parentheses.
Checking whether every single character is a number is a bad idea. At the very least, it will require two logical checks (see my IsDigit() function). (My code checks for '*', which is one logical operation.) However, some of the suggestions posted were very poorly thought out. Do not use regular expressions to check if a character is numeric.
Since you mentioned efficiency in your question, and I don't have sufficient rep points to comment on other answers:
A switch statement that checks for '0' '1' '2' ..., means that every character that is NOT a digit, must go through 10 logical operations. With all due respect, please, since chars map to ints, just check the boundaries (char <= '9' && char >= '0')
You can start by implementing the slow version, it could be much faster than you think. But let's say it's too slow. It then is an optimization problem. Where does the inefficiency lies?
"if number" is easy, you can use a regex or anything that stops when it finds something that is not a digit
"if the next word is a number" is just as easy to implement efficiently.
Now, it's the "remove asterisk" part that is an issue to you. The key point to notice here is that you don't need to duplicate the string: you can actually modify it in place since you are only removing elements.
Try to run through this visually before trying to implement it.
Keep two integers or iterators, the first one saying where you are currently reading your string, and the second one saying where you are currently writing your string. Since you only erase stuff, the read one will always be ahead of the writing one.
If you decide to keep the current string, you just need to advance each of your integers/iterators one by one, and copying accordingly. If you don't want to keep it, just advance the reading string! Then you only have to cut the string by the amount of asterisks you removed. The complexity is simply O(n), without any additional buffer used.
Also note that your algorithm would be simpler (but equivalent) if written like this:
wasNumber = false
Loop through string
if number
set wasNumber = true
else
set wasNumber = false
if asterisk and wasNumber and next word is a number
do nothing // using my algorithm, "do nothing" actually copies what you intend to keep
else
remove asterisk
I found your little problem interesting and I wrote (and tested) a small and simple function that would do just that on a std::string. Here u go:
// TestStringsCpp.cpp : Defines the entry point for the console application.
//
#include "stdafx.h"
#include <string>
#include <iostream>
using namespace std;
string& ClearAsterisk(string& iString)
{
bool bLastCharNumeric = false;
string lString = "0123456789";
for (string::iterator it = iString.begin(); it != iString.end() ; ++it) {
switch (*it) {
case ' ': break;//ignore whitespace characters
case '*':
if (bLastCharNumeric) {
//asterisk is preceded by numeric character. we have to check if
//the following non space character is numeric also
for (string::iterator it2 = it + 1; it2 != iString.end() ; ++it2) {
if (*it2 != ' ') {
if (*it2 <= '9' && *it2 >= '0') break;
else iString.erase(it);
break; //exit current for
}
}
}
else iString.erase(it);;
break;
default:
if (*it <= '9' && *it >= '0') bLastCharNumeric= true;
else bLastCharNumeric = false; //reset flag
}
}
return iString;
}
int _tmain(int argc, _TCHAR* argv[])
{
string testString = "hey this is a string * this string is awesome 97 * 3 = 27 * this string is cool";
cout<<ClearAsterisk(testString).c_str();
cin >> testString; //this is just for the app to pause a bit :)
return 0;
}
It will work perfectly with your sample string but it will fail if you have a text like this: "this is a happy 5 * 3day menu" because it checks only for the first nonspace character after the '*'. But frankly I can't immagine a lot of cases you would have this kind of construct in a sentence.
HTH,JP.
A regular expression wouldn't necessarily be any more efficient, but it would let you rely on somebody else to do your string parsing and manipulation.
Personally, if I were worried about efficiency, I would implement your pseudocode version while limiting needless memory allocations. I might even mmap the input file. I highly doubt that you'll get much faster than that.

Cleaning a string of punctuation in C++

Ok so before I even ask my question I want to make one thing clear. I am currently a student at NIU for Computer Science and this does relate to one of my assignments for a class there. So if anyone has a problem read no further and just go on about your business.
Now for anyone who is willing to help heres the situation. For my current assignment we have to read a file that is just a block of text. For each word in the file we are to clear any punctuation in the word (ex : "can't" would end up as "can" and "that--to" would end up as "that" obviously with out the quotes, quotes were used just to specify what the example was).
The problem I've run into is that I can clean the string fine and then insert it into the map that we are using but for some reason with the code I have written it is allowing an empty string to be inserted into the map. Now I've tried everything that I can come up with to stop this from happening and the only thing I've come up with is to use the erase method within the map structure itself.
So what I am looking for is two things, any suggestions about how I could a) fix this with out simply just erasing it and b) any improvements that I could make on the code I already have written.
Here are the functions I have written to read in from the file and then the one that cleans it.
Note: the function that reads in from the file calls the clean_entry function to get rid of punctuation before anything is inserted into the map.
Edit: Thank you Chris. Numbers are allowed :). If anyone has any improvements to the code I've written or any criticisms of something I did I'll listen. At school we really don't get feed back on the correct, proper, or most efficient way to do things.
int get_words(map<string, int>& mapz)
{
int cnt = 0; //set out counter to zero
map<string, int>::const_iterator mapzIter;
ifstream input; //declare instream
input.open( "prog2.d" ); //open instream
assert( input ); //assure it is open
string s; //temp strings to read into
string not_s;
input >> s;
while(!input.eof()) //read in until EOF
{
not_s = "";
clean_entry(s, not_s);
if((int)not_s.length() == 0)
{
input >> s;
clean_entry(s, not_s);
}
mapz[not_s]++; //increment occurence
input >>s;
}
input.close(); //close instream
for(mapzIter = mapz.begin(); mapzIter != mapz.end(); mapzIter++)
cnt = cnt + mapzIter->second;
return cnt; //return number of words in instream
}
void clean_entry(const string& non_clean, string& clean)
{
int i, j, begin, end;
for(i = 0; isalnum(non_clean[i]) == 0 && non_clean[i] != '\0'; i++);
begin = i;
if(begin ==(int)non_clean.length())
return;
for(j = begin; isalnum(non_clean[j]) != 0 && non_clean[j] != '\0'; j++);
end = j;
clean = non_clean.substr(begin, (end-begin));
for(i = 0; i < (int)clean.size(); i++)
clean[i] = tolower(clean[i]);
}
The problem with empty entries is in your while loop. If you get an empty string, you clean the next one, and add it without checking. Try changing:
not_s = "";
clean_entry(s, not_s);
if((int)not_s.length() == 0)
{
input >> s;
clean_entry(s, not_s);
}
mapz[not_s]++; //increment occurence
input >>s;
to
not_s = "";
clean_entry(s, not_s);
if((int)not_s.length() > 0)
{
mapz[not_s]++; //increment occurence
}
input >>s;
EDIT: I notice you are checking if the characters are alphanumeric. If numbers are not allowed, you may need to revisit that area as well.
Further improvements would be to
declare variables only when you use them, and in the innermost scope
use c++-style casts instead of the c-style (int) casts
use empty() instead of length() == 0 comparisons
use the prefix increment operator for the iterators (i.e. ++mapzIter)
A blank string is a valid instance of the string class, so there's nothing special about adding it into the map. What you could do is first check if it's empty, and only increment in that case:
if (!not_s.empty())
mapz[not_s]++;
Style-wise, there's a few things I'd change, one would be to return clean from clean_entry instead of modifying it:
string not_s = clean_entry(s);
...
string clean_entry(const string &non_clean)
{
string clean;
... // as before
if(begin ==(int)non_clean.length())
return clean;
... // as before
return clean;
}
This makes it clearer what the function is doing (taking a string, and returning something based on that string).
The function 'getWords' is doing a lot of distinct actions that could be split out into other functions. There's a good chance that by splitting it up into it's individual parts, you would have found the bug yourself.
From the basic structure, I think you could split the code into (at least):
getNextWord: Return the next (non blank) word from the stream (returns false if none left)
clean_entry: What you have now
getNextCleanWord: Calls getNextWord, and if 'true' calls CleanWord. Returns 'false' if no words left.
The signatures of 'getNextWord' and 'getNextCleanWord' might look something like:
bool getNextWord (std::ifstream & input, std::string & str);
bool getNextCleanWord (std::ifstream & input, std::string & str);
The idea is that each function does a smaller more distinct part of the problem. For example, 'getNextWord' does nothing but get the next non blank word (if there is one). This smaller piece therefore becomes an easier part of the problem to solve and debug if necessary.
The main component of 'getWords' then can be simplified down to:
std::string nextCleanWord;
while (getNextCleanWord (input, nextCleanWord))
{
++map[nextCleanWord];
}
An important aspect to development, IMHO, is to try to Divide and Conquer the problem. Split it up into the individual tasks that need to take place. These sub-tasks will be easier to complete and should also be easier to maintain.