C++ is mixing my strings? - c++

I have this really simple c++ function I wrote myself.
It should just strip the '-' characters out of my string.
Here's the code
char* FastaManager::stripAlignment(char *seq, int seqLength){
char newSeq[seqLength];
int j=0;
for (int i=0; i<seqLength; i++) {
if (seq[i] != '-') {
newSeq[j++]=seq[i];
}
}
char *retSeq = (char*)malloc((--j)*sizeof(char));
for (int i=0; i<j; i++) {
retSeq[i]=newSeq[i];
}
retSeq[j+1]='\0'; //WTF it keeps reading from memory without this
return retSeq;
}
I think that comment speaks for itself.
I don't know why, but when I launch the program and print out the result, I get something like
'stripped_sequence''original_sequence'
However, if I try to debug the code to see if there's anything wrong, the flow goes just right and ends up returning the correct stripped sequence.
I tried to print out the memory of the two variables, and here are the memory readings
memory for seq: http://i.stack.imgur.com/dHI8k.png
memory for *seq: http://i.stack.imgur.com/UqVkX.png
memory for retSeq: http://i.stack.imgur.com/o9uvI.png
memory for *retSeq: http://i.stack.imgur.com/ioFsu.png
(couldn't include links / pics because of spam filter, sorry)
This is the code I'm using to print out the strings
for (int i=0; i<atoi(argv[2]); i++) {
char *seq;
if (usingStructure) {
seq = fm.generateSequenceWithStructure(structure);
}else{
seq = fm.generateSequenceFromProfile();
}
cout<<">Sequence "<<i+1<<": "<<seq<<endl;
}
Now, I have really no idea about what's going on.

If you can use std::string, simply do this:
std::string FastaManager::stripAlignment(const std::string& str)
{
std::string result(str);
result.erase(std::remove(result.begin(), result.end(), '-'), result.end());
return result;
}
This is called the "erase-remove idiom". Note that std::remove is declared in <algorithm>.
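To make the two halves of the idiom visible, here is a minimal sketch that splits the one-liner into separate steps. The free function name stripDashes is only an illustration and is not part of the original FastaManager class.

```cpp
#include <algorithm>
#include <string>

// Hypothetical free-function version, split into two steps for illustration.
std::string stripDashes(const std::string& str) {
    std::string result(str);
    // Step 1: std::remove shifts the kept characters to the front and returns
    // an iterator to the new logical end. The string's size is unchanged.
    std::string::iterator newEnd = std::remove(result.begin(), result.end(), '-');
    // Step 2: erase trims the leftover tail so the size matches the content.
    result.erase(newEnd, result.end());
    return result;
}
```

Calling stripDashes("AC-G--T") should give "ACGT". Skipping the erase step would leave stale characters at the end of the string.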

This happens because you put the terminating zero of a C string outside the allocated space. You should be allocating one extra character at the end of your string copy, and adding '\0' there. Or better yet, you should use std::string.
char *retSeq = (char*)malloc((j+1)*sizeof(char));
for (int i=0; i<j; i++) {
retSeq[i]=newSeq[i];
}
retSeq[j]='\0';
it keeps reading from memory without this
This is by design: C strings are zero-terminated. '\0' signals to string routines in C that the end of the string has been reached. The same convention holds in C++ when you work with C strings.
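If the function has to stay with C strings, a minimal sketch of the whole thing could look like the following. The name stripDashesC and the free-function form are assumptions made for the example. It writes straight into the result buffer, which also avoids the variable-length array from the question.

```cpp
#include <cstddef>
#include <cstdlib>

// Hypothetical free-function version of stripAlignment.
// Returns a malloc'd, zero-terminated copy of seq without '-' characters,
// or nullptr if the allocation fails. The caller must free() the result.
char* stripDashesC(const char* seq, std::size_t seqLength) {
    // Worst case: nothing is removed, so seqLength chars plus the terminator.
    char* out = static_cast<char*>(std::malloc(seqLength + 1));
    if (out == nullptr) {
        return nullptr;
    }
    std::size_t j = 0;
    for (std::size_t i = 0; i < seqLength; ++i) {
        if (seq[i] != '-') {
            out[j++] = seq[i];
        }
    }
    out[j] = '\0';  // j kept characters occupy 0..j-1, so index j is in bounds
    return out;
}
```

The buffer may be a few bytes larger than strictly needed, which seems an acceptable trade for doing a single pass.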

Personally, I think you would be best off using std::string unless you have a really good reason to do otherwise:
std::string FastaManager::stripAlignment(std::string value)
{
value.erase(std::remove(value.begin(), value.end(), '-'), value.end());
return value;
}
When you are using C strings you need to realize that they are null-terminated: a C string extends up to the first null character found. The code you posted introduces an out-of-range assignment: you allocate j elements and then assign to retSeq[j + 1], which is two characters past the end of the allocation (surely you meant retSeq[j] = 0; anyway, which in turn needs j + 1 elements allocated).

Related

Suggestions that could improve my string splitting function

I'm new to C++ and I'm trying to write some basic functions to get the hang of it. I decided to make a custom function that splits a string into tokens every time a specific delimiter is reached.
I've made it work successfully, but since I'm new, I'd like to hear from more experienced programmers whether there is a better way to go about it. This is my code:
vector<string> split(string const str, string const separator=" ") {
int str_len = str.length();
int sep_len = separator.length();
int current_index {0};
vector<string> strings {};
for(int i {0}; i < str_len; ++i) {
if(str.substr(i, sep_len) == separator) {
strings.push_back(str.substr(current_index, i-current_index));
current_index = i + sep_len;
}
}
strings.push_back(str.substr(current_index, str_len-current_index));
return strings;
}
One thing I will say is, I don't like how I had to put
strings.push_back(str.substr(current_index, str_len-current_index));
this after the entire iteration to get the final part of the string. I just can't think of any different methods.
Use std::string::find() to find separators in the string, which is probably much more efficient than your loop that checks for each possible position if the substring at that position matches the separator. Once you have that, you can make use of the fact that if the separator is not found, find() returns std::string::npos, which is the largest possible value of std::string::size_type, so just pass this to substr() to get everything from the current position to the end of the string. This way you can avoid the second push_back().
vector<string> split(string const &str, string const &separator=" ") {
string::size_type current_index {};
vector<string> strings;
while (true) {
auto separator_index = str.find(separator, current_index);
strings.push_back(str.substr(current_index, separator_index - current_index));
if (separator_index == str.npos)
break;
else
current_index = separator_index + separator.size();
}
return strings;
}
Note: ensure you pass the input parameters by reference to avoid unnecessary copies being made.
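One edge case worth checking in the find()-based version is an empty separator: find() then matches at the current position, current_index never advances, and the loop does not terminate. The sketch below is a variant with a guard for that case. Treating an empty separator as "do not split" is an assumption made for the example, not something the original answer specifies.

```cpp
#include <string>
#include <vector>

// Variant of the find()-based split with a guard for an empty separator.
std::vector<std::string> split(const std::string& str,
                               const std::string& separator = " ") {
    std::vector<std::string> strings;
    if (separator.empty()) {
        // Assumption: with no separator, return the input as a single token.
        strings.push_back(str);
        return strings;
    }
    std::string::size_type current_index = 0;
    while (true) {
        const std::string::size_type separator_index =
            str.find(separator, current_index);
        if (separator_index == std::string::npos) {
            // No more separators: take everything up to the end.
            strings.push_back(str.substr(current_index));
            break;
        }
        strings.push_back(
            str.substr(current_index, separator_index - current_index));
        current_index = separator_index + separator.size();
    }
    return strings;
}
```

With this version, split("a,,b", ",") yields "a", "" and "b", so consecutive separators produce empty tokens, as in the original function.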

C++, Trouble with string and int conversion

I know how to convert the string when it's just made up of integers, or when it begins with ints. I am trying to convert to an integer when the string has non-digit characters at the beginning or in the middle. I've tried running through a for loop, checking if (isdigit(str[i])) before trying stoi, stringstream, atoi, etc. None of them really work. I have the same problem even without the for loop. I've tried Googling my problem, but no luck. Any suggestions, or anything that I can try?
You have to check character by character if it's a digit or not and, if it is, add it to a new string. In the end, you convert your new string to an int like you would normally. Look at the code below. Hope I could help!
string s = "pc2jjj10";
char temp;
string result;
for (string::size_type i = 0; i < s.length(); i++){
temp = s.at(i);
if (isdigit(static_cast<unsigned char>(temp))){
result.push_back(temp);
}
}
int number = stoi(result);
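Two things are easy to miss with this approach: all digits are concatenated (so "pc2jjj10" becomes 210, not 2 and 10), and std::stoi throws if the collected string is empty or too large for an int. Here is a minimal sketch that handles both failure cases. The helper name extractNumber and its bool-plus-output-parameter shape are assumptions made for the example.

```cpp
#include <cctype>
#include <stdexcept>
#include <string>

// Hypothetical helper: collects every digit in s and converts the result.
// Returns false if s has no digits or the value does not fit in an int.
bool extractNumber(const std::string& s, int& number) {
    std::string digits;
    for (std::string::size_type i = 0; i < s.size(); ++i) {
        // Cast to unsigned char: passing a negative char to isdigit is undefined.
        if (std::isdigit(static_cast<unsigned char>(s[i]))) {
            digits.push_back(s[i]);
        }
    }
    if (digits.empty()) {
        return false;  // std::stoi would throw std::invalid_argument here
    }
    try {
        number = std::stoi(digits);
    } catch (const std::out_of_range&) {
        return false;  // more digits than an int can hold
    }
    return true;
}
```

If the separate runs of digits should become separate numbers instead, the loop would need to convert and reset digits each time a non-digit is seen.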

C++ string not printed

I have the next code:
void addContent ( const std::string& message ) {
std::string newMessage;
for ( int i = 0, remainder = textCapacity - currentText; i < remainder; i++ ) {
newMessage[i] = message[i];
std::cout << newMessage; //here nothing is printed
}
}
But nothing is printed.
Only if I change newMessage to newMessage[i] is everything fine, and I don't understand why.
newMessage is an empty std::string. Doing [i] to it is accessing invalid memory. The string is always empty, and you're just writing to invalid memory. That's a recipe for disaster, and you're (un)lucky it's not crashing on you.
I'm not sure what message[i] is, but you probably want newMessage[i] = message[i]. But you might as well skip the temporary newMessage variable and just print out message[i] itself.
newMessage is an empty string, so nothing will be printed. Also, std::cout is buffered, so in order to flush the buffer you should write std::endl or std::flush to the stream.
I would rather change from this:
newMessage[i] = message[i];
to this:
newMessage += message[i];
And when printing:
std::cout << newMessage<<std::endl;
Using [i] on an empty string is asking for trouble because you're accessing out-of-bounds memory. Sometimes it will do nothing, sometimes your program will crash.
As Cornstalks said, you have an out-of-bounds access.
More importantly, the code is way too complex for the task. Don't use a manual loop to partially copy one std::string to another. To copy a part of message to newMessage, use substr on message and assign:
newMessage = message.substr(from_index, number_of_chars);
or the iterator-based stuff:
std::string newMessage(message.begin() + from_index, message.begin() + to_index);
The latter builds newMessage directly rather than assigning a temporary substring to it, which may be slightly more efficient. So you want
std::string newMessage(message.begin(), message.begin() + (textCapacity - currentText));
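One caveat with the iterator form is that message.begin() + n is only valid when n is not larger than message.size(), whereas substr() clamps the count to the end of the string. A minimal sketch of the substr() route follows. The helper name fittingPrefix is hypothetical, and textCapacity and currentText are assumed to be character counts, as their names in the question suggest.

```cpp
#include <cstddef>
#include <string>

// Hypothetical helper: returns the part of message that still fits.
// textCapacity and currentText mirror the names in the question and are
// assumed to be counts of characters.
std::string fittingPrefix(const std::string& message,
                          std::size_t textCapacity,
                          std::size_t currentText) {
    // Guard against underflow if the text is already at or over capacity.
    const std::size_t remainder =
        currentText < textCapacity ? textCapacity - currentText : 0;
    // substr clamps the count to the end of the string, so a remainder
    // larger than message.size() is safe here.
    return message.substr(0, remainder);
}
```

For example, fittingPrefix("hello world", 10, 5) returns "hello", and a message shorter than the remaining capacity is returned whole.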
Using that string as newMessage[i] suggests that you meant it to be an array of strings. If so, replace that line with std::string newMessage[textCapacity];.

How to count whitespace occurrences in a string in c++

I have a project for my advanced c++ class that's supposed to do a number of things, but I'm trying to focus on this function first, because after it works I can tweak it to fulfill the other needs. This function searches through a file and performs a word count by counting the number of times ' ' appears in the document. Maybe not accurate, but it'll be a good starting place. Here's the code I have right now:
void WordCount()
{
int count_W = 0; //Variable to store word count, will be written to label
int i, c = 0; //i for iterator
ifstream fPath("F:\Project_1_Text.txt");
FileStream input( "F:\Project_1_Text.txt", FileMode::Open, FileAccess::Read );
StreamReader fileReader( %input );
String ^ line;
//char ws = ' ';
array<Char>^ temp;
input.Seek( 0, SeekOrigin::Begin );
while ( ( line = fileReader.ReadLine() ) != nullptr )
{
Console::WriteLine( line );
c = line->Length;
//temp = line->ToCharArray();
for ( i = 0; i <= c; i++)
{
if ( line[i] == ' ' )
count_W++;
}
//line->ToString();
}
//Code to write to label
lblWordCount->Text = count_W.ToString();
}
All of this works except for one problem. When I try to run the program, and open the file, I get an error that tells me the Index is out of bounds. Now, I know what that means, but I don't get how the problem is occurring. And, if I don't know what's causing the problem, I can't fix it. I've read that it is possible to search through a string with a for loop, and of course that also holds true for a char array, and there is code in there to perform that conversion, but in both cases I get the same error. I know it is reading through the file correctly, because the final program also has to perform a character count (which is working), and it read back the size of each line in the target document perfectly from start to finish. Anyway, I'm out of ideas, so I thought I'd consult a higher power. Any ideas?
Counting whitespace is simple:
int spaces = std::count_if(s.begin(), s.end(),
[](unsigned char c){ return std::isspace(c); });
Two notes, though:
std::isspace() should not be called directly with a char: char may be signed, and std::isspace() takes an int whose value must be representable as unsigned char (or be equal to EOF). That is why the lambda takes an unsigned char.
This counts the number of spaces, not the number of words (or words - 1): words may be separated by sequences of spaces consisting of more than one consecutive space.
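To count words rather than spaces, one option is to count the places where a word starts, that is, a non-whitespace character that follows whitespace or the start of the string. The sketch below does this on a std::string. The helper name countWords is hypothetical, and it is written in standard C++ rather than the C++/CLI used in the question.

```cpp
#include <cctype>
#include <cstddef>
#include <string>

// Hypothetical helper: counts words as maximal runs of non-whitespace.
std::size_t countWords(const std::string& s) {
    std::size_t words = 0;
    bool inWord = false;
    for (std::string::size_type i = 0; i < s.size(); ++i) {
        const bool space = std::isspace(static_cast<unsigned char>(s[i])) != 0;
        if (!space && !inWord) {
            ++words;  // first character of a new word
        }
        inWord = !space;
    }
    return words;
}
```

Leading, trailing, and repeated whitespace do not affect the result, so "  a  b " counts as two words.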
It could be your loop. You're going from i=0 to i=c, but i=c is too far. You should go to i=c-1:
for ( i=0; i<c; i++)

Cleaning a string of punctuation in C++

Ok so before I even ask my question I want to make one thing clear. I am currently a student at NIU for Computer Science and this does relate to one of my assignments for a class there. So if anyone has a problem read no further and just go on about your business.
Now, for anyone who is willing to help, here's the situation. For my current assignment we have to read a file that is just a block of text. For each word in the file we are to clear any punctuation in the word (ex: "can't" would end up as "can" and "that--to" would end up as "that", obviously without the quotes; the quotes are used just to mark the examples).
The problem I've run into is that I can clean the string fine and then insert it into the map that we are using but for some reason with the code I have written it is allowing an empty string to be inserted into the map. Now I've tried everything that I can come up with to stop this from happening and the only thing I've come up with is to use the erase method within the map structure itself.
So what I am looking for is two things: any suggestions about how I could a) fix this without simply erasing it, and b) any improvements that I could make to the code I already have written.
Here are the functions I have written to read in from the file and then the one that cleans it.
Note: the function that reads in from the file calls the clean_entry function to get rid of punctuation before anything is inserted into the map.
Edit: Thank you Chris. Numbers are allowed :). If anyone has any improvements to the code I've written or any criticisms of something I did, I'll listen. At school we really don't get feedback on the correct, proper, or most efficient way to do things.
int get_words(map<string, int>& mapz)
{
int cnt = 0; //set out counter to zero
map<string, int>::const_iterator mapzIter;
ifstream input; //declare instream
input.open( "prog2.d" ); //open instream
assert( input ); //assure it is open
string s; //temp strings to read into
string not_s;
input >> s;
while(!input.eof()) //read in until EOF
{
not_s = "";
clean_entry(s, not_s);
if((int)not_s.length() == 0)
{
input >> s;
clean_entry(s, not_s);
}
mapz[not_s]++; //increment occurence
input >>s;
}
input.close(); //close instream
for(mapzIter = mapz.begin(); mapzIter != mapz.end(); mapzIter++)
cnt = cnt + mapzIter->second;
return cnt; //return number of words in instream
}
void clean_entry(const string& non_clean, string& clean)
{
int i, j, begin, end;
for(i = 0; isalnum(non_clean[i]) == 0 && non_clean[i] != '\0'; i++);
begin = i;
if(begin ==(int)non_clean.length())
return;
for(j = begin; isalnum(non_clean[j]) != 0 && non_clean[j] != '\0'; j++);
end = j;
clean = non_clean.substr(begin, (end-begin));
for(i = 0; i < (int)clean.size(); i++)
clean[i] = tolower(clean[i]);
}
The problem with empty entries is in your while loop. If you get an empty string, you clean the next one, and add it without checking. Try changing:
not_s = "";
clean_entry(s, not_s);
if((int)not_s.length() == 0)
{
input >> s;
clean_entry(s, not_s);
}
mapz[not_s]++; //increment occurence
input >>s;
to
not_s = "";
clean_entry(s, not_s);
if((int)not_s.length() > 0)
{
mapz[not_s]++; //increment occurence
}
input >>s;
EDIT: I notice you are checking if the characters are alphanumeric. If numbers are not allowed, you may need to revisit that area as well.
Further improvements would be to
declare variables only when you use them, and in the innermost scope
use c++-style casts instead of the c-style (int) casts
use empty() instead of length() == 0 comparisons
use the prefix increment operator for the iterators (i.e. ++mapzIter)
A blank string is a valid instance of the string class, so there's nothing special about adding it into the map. What you could do is first check that it's not empty, and only increment in that case:
if (!not_s.empty())
mapz[not_s]++;
Style-wise, there are a few things I'd change. One would be to return clean from clean_entry instead of modifying a reference parameter:
string not_s = clean_entry(s);
...
string clean_entry(const string &non_clean)
{
string clean;
... // as before
if(begin ==(int)non_clean.length())
return clean;
... // as before
return clean;
}
This makes it clearer what the function is doing (taking a string, and returning something based on that string).
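Filled in completely, a value-returning clean_entry might look like the sketch below. It keeps the behavior described in the question (skip leading non-alphanumeric characters, keep the first alphanumeric run, lowercase it), but uses index bounds instead of testing for '\0' and casts to unsigned char before calling the <cctype> functions. Those two choices are mine, not the original answer's.

```cpp
#include <cctype>
#include <string>

// Sketch of clean_entry returning its result: skips leading non-alphanumeric
// characters, keeps the first alphanumeric run, and lowercases it.
std::string clean_entry(const std::string& non_clean) {
    std::string::size_type begin = 0;
    while (begin < non_clean.size() &&
           !std::isalnum(static_cast<unsigned char>(non_clean[begin]))) {
        ++begin;
    }
    std::string::size_type end = begin;
    while (end < non_clean.size() &&
           std::isalnum(static_cast<unsigned char>(non_clean[end]))) {
        ++end;
    }
    std::string clean = non_clean.substr(begin, end - begin);
    for (std::string::size_type i = 0; i < clean.size(); ++i) {
        clean[i] = static_cast<char>(
            std::tolower(static_cast<unsigned char>(clean[i])));
    }
    return clean;  // empty when non_clean has no alphanumeric characters
}
```

With this, clean_entry("can't") gives "can" and clean_entry("that--to") gives "that", matching the examples in the question, and a token such as "--" gives an empty string that the caller can skip.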
The function 'get_words' is doing a lot of distinct actions that could be split out into other functions. There's a good chance that by splitting it up into its individual parts, you would have found the bug yourself.
From the basic structure, I think you could split the code into (at least):
getNextWord: Return the next (non blank) word from the stream (returns false if none left)
clean_entry: What you have now
getNextCleanWord: Calls getNextWord, and if 'true' calls clean_entry. Returns 'false' if no words left.
The signatures of 'getNextWord' and 'getNextCleanWord' might look something like:
bool getNextWord (std::ifstream & input, std::string & str);
bool getNextCleanWord (std::ifstream & input, std::string & str);
The idea is that each function does a smaller more distinct part of the problem. For example, 'getNextWord' does nothing but get the next non blank word (if there is one). This smaller piece therefore becomes an easier part of the problem to solve and debug if necessary.
The main component of 'get_words' then can be simplified down to:
std::string nextCleanWord;
while (getNextCleanWord (input, nextCleanWord))
{
++mapz[nextCleanWord];
}
An important aspect to development, IMHO, is to try to Divide and Conquer the problem. Split it up into the individual tasks that need to take place. These sub-tasks will be easier to complete and should also be easier to maintain.
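As a rough illustration of that split, here is a minimal sketch. It takes std::istream instead of std::ifstream so the same code can read from a file or from an in-memory string, folds getNextWord into the stream extraction, and uses a compact stand-in named cleanWord for the question's clean_entry. The names cleanWord, countWords and countWordsInText are hypothetical.

```cpp
#include <cctype>
#include <istream>
#include <map>
#include <sstream>
#include <string>

// Hypothetical stand-in for clean_entry: first alphanumeric run, lowercased.
std::string cleanWord(const std::string& raw) {
    std::string clean;
    for (std::string::size_type i = 0; i < raw.size(); ++i) {
        const unsigned char c = static_cast<unsigned char>(raw[i]);
        if (std::isalnum(c)) {
            clean.push_back(static_cast<char>(std::tolower(c)));
        } else if (!clean.empty()) {
            break;  // the first run has ended
        }
    }
    return clean;
}

// Reads tokens until one survives cleaning; returns false at end of input.
bool getNextCleanWord(std::istream& input, std::string& word) {
    std::string raw;
    while (input >> raw) {
        word = cleanWord(raw);
        if (!word.empty()) {
            return true;
        }
    }
    return false;
}

// Counts every clean word in the stream and returns the total.
int countWords(std::istream& input, std::map<std::string, int>& counts) {
    int total = 0;
    std::string word;
    while (getNextCleanWord(input, word)) {
        ++counts[word];
        ++total;
    }
    return total;
}

// Convenience wrapper for trying the functions on an in-memory string.
int countWordsInText(const std::string& text, std::map<std::string, int>& counts) {
    std::istringstream input(text);
    return countWords(input, counts);
}
```

Because the emptiness check lives in getNextCleanWord, the counting loop never sees an empty string, and using the stream extraction itself as the loop condition avoids the separate eof() test.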