Hamming Distance: Incorrect Count - c++

I'm trying to create a function to calculate the Hamming Distance between two strings. When I call this function, it should tell me the number of characters that do not match between the two strings.
My output is not correct. I keep getting random number results. Below is my code:
using namespace std;
// function to calculate Hamming distance
int HammingDistance(char seq1[], char seq2[])
{
int i = 0, count = 0;
while (seq1[i] != ' ')
{
if (seq1[i] != seq2[i])
count++;
i++;
}
return count;
}
int main()
{
char seq1[] = "doga";
char seq2[] = "dogb";
cout << HammingDistance(seq1, seq2) << endl;
return 0;
}
I keep getting random number results in my output, like 99 or 207.
In this example, I should get 1.
Any help on where I'm going wrong is greatly appreciated! Thank you.

You should test for the end of the string with '\0', not with ' ' (space).
Your while should, then, be: while (seq1[i] != '\0')

The condition seq1[i] != ' ' is not a good way of checking whether you have reached the end of the string. Assuming that your strings are null terminated then you could use seq1[i] != '\0' instead.
The reason that you are seeing "random" results is that the loop is not encountering a space within the string and is continuing to read past the end of the strings into other parts of the program's memory. The loop only stops when it encounters a byte of memory that happens to contain the same bits as the representation of ' '.
You should also think about how to handle cases where the two strings are different lengths.
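Putting the points above together, here is a minimal sketch of a null-terminated version. How to treat strings of different lengths is a design decision; this sketch assumes that every extra character in the longer string counts as one mismatch, which is only one possible policy (rejecting unequal lengths would be another).

```cpp
#include <cassert>
#include <cstddef>
#include <cstring>

// Counts mismatching positions up to the end of the shorter string, then adds
// the length difference (an assumed policy for strings of unequal length).
int HammingDistance(const char* seq1, const char* seq2)
{
    assert(seq1 != nullptr && seq2 != nullptr);
    std::size_t i = 0;
    int count = 0;
    // Stop at the null terminator of either string, never at ' '.
    while (seq1[i] != '\0' && seq2[i] != '\0')
    {
        if (seq1[i] != seq2[i])
            count++;
        i++;
    }
    // At most one of these is non-zero: the leftover tail of the longer string.
    count += static_cast<int>(std::strlen(seq1 + i) + std::strlen(seq2 + i));
    return count;
}
```

With the strings from the question, HammingDistance("doga", "dogb") gives 1.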

Related

Why does getline behave weirdly after 3 newlines?

I'll preface this by saying I'm relatively new to posting questions, as well as to C++ in general. My title is a little lame, as it doesn't specifically address the problem I am dealing with, but I couldn't think of another way to word it, so any suggestions on improving the title are appreciated.
I am working on a relatively simple function which is supposed to get a string using getline, and read the spaces and/or newlines in the string so that it can output how many words have been entered. After reaching the character 'q' it's basically supposed to stop reading in characters.
void ReadStdIn2() {
std::string userInput;
const char *inputArray = userInput.c_str();
int count = 0;
getline(std::cin, userInput, 'q');
for (int i = 0; i < strlen(inputArray); i++){
if ((inputArray[i] == ' ') || (inputArray[i] == '\n')){
count += 1;
}
}
std::cout << count << std::endl;
}
I want to be able to enter multiple words, followed by newlines, and have the function accurately display my number of words. I can't figure out why but for some reason after entering 3 newlines my count goes right back to 0.
For example, if I enter:
hello
jim
tim
q
the function works just fine, and returns 3 just like I expect it to. But if I enter
hello
jim
tim
bill
q
the count goes right to 0. I'm assuming this has something to do with my if statement but I'm really lost as to what is wrong, especially since it works fine up until the 3rd newline. Any help is appreciated
The behaviour of the program is undefined. Reading input into std::string potentially causes its capacity to increase. This causes pointers into the string to become invalid. Pointers such as inputArray. You then later attempt to read through the invalid pointer.
P.S. calculating the length of the string with std::strlen in every iteration of the loop is not a good idea. It is possible to get the size without calculation by using userInput.size().
To fix both issues, simply don't use inputArray. You don't need it:
for (std::size_t i = 0; i < userInput.size(); i++){
if ((userInput[i] == ' ') || (userInput[i] == '\n')){
...
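For completeness, here is a sketch of the whole function without the pointer. The name countSeparators and the stream parameter are my own additions so that it can be exercised with a string stream instead of std::cin; the counting logic is the same as in the question.

```cpp
#include <cassert>
#include <cstddef>
#include <istream>
#include <sstream>
#include <string>

// Reads from `in` up to (not including) `stop`, then counts spaces and newlines.
// countSeparators is a hypothetical name used only for this sketch.
std::size_t countSeparators(std::istream& in, char stop)
{
    std::string userInput;
    std::getline(in, userInput, stop);

    std::size_t count = 0;
    // Index the string directly. No pointer is taken before the read, so a
    // reallocation inside getline cannot leave anything dangling.
    for (std::size_t i = 0; i < userInput.size(); i++)
    {
        if (userInput[i] == ' ' || userInput[i] == '\n')
            count += 1;
    }
    return count;
}
```

Feeding it the four-word input from the question through a std::istringstream gives 4 rather than 0. Calling countSeparators(std::cin, 'q') would reproduce the original interactive behaviour.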

Trying to read a single character at a time into an array of indefinite size

I am a CS student working on a c++ project. We have been instructed to declare a struct and use it to read in an array of chars and keep a tally of how many letters are used in the string. We are not allowed to use a string; it MUST be an array of our declared struct.
The input must be as long as the user wants; the code has to be able to accept new lines of input and be terminated by '.'
I'm really struggling here. I don't even know where to begin. I've thrown together some code as best-guess for what to do, but it crashes after pressing "." then enter, and I don't know why.
//declare struct
struct data
{
int tally = 0;
char letter;
};
//size of string to read in at a time
const int SIZE_OF_CHUNK = 11;
int main()
{
//input chunk of struct
data input[SIZE_OF_CHUNK];
int placemark,
length;
cout << "Enter sequence of characters, '.' to terminate:" << endl;
do
{
for (int index = 0; (input[index].letter != '\0') && (input[index - 1].letter != '.'); index++)
{
cin >> input[index].letter;
placemark++;
}
//I intend to put something here to handle if the code
//needs to read in another chunk, but I want to fix the crashing
//problem first
}
while (input[placemark].letter != '.');
//print out what was read in, just to check
for (int index = 0; input[index].letter != '\0'; index++)
{
cout << input[index].letter;
}
return 0;
}
I've tried looking up how to read in a single character but haven't found anything helpful to my circumstances so far. Any tips on what I'm doing wrong, or where I can find helpful resources, would be very much appreciated.
Are you sure you must use a declared struct?
If you just want to count the number of times a character has appeared, you don't need to store the character; you just need to store the number of times it appeared. So just unsigned lettersCount[26], and each index maps to a letter (i.e. index 0 means a, index 1 means b). Whenever a letter appears, just increase the count of that index.
You can map a letter to the index by making use of ASCII. Every letter is represented by a number that you can look up in an ASCII table. For example, the letter a is represented by the decimal value 97, b is 98 and so on. The numbers increase successively, which we can make use of. So if you want to map a letter to an index, all you need to do is value - 97 or value - 'a'. For example, if you read in the letter a, take away 97 from that and you'll get 0, which is what you want. After getting the index, it's just a simple ++ to increment the count of that letter.
Regarding the treatment of uppercase and lowercase (i.e. treat them the same or differently), it'll be up to you to figure out how to do it (which should be fairly simple if you can understand what I've explained).
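A minimal sketch of that idea follows. The function name tallyLetters and the stream parameter are assumptions made for illustration, and folding uppercase into lowercase is just one of the two choices mentioned above. It does not use the struct the assignment asks for; it only shows the index mapping.

```cpp
#include <array>
#include <cassert>
#include <cctype>
#include <istream>
#include <sstream>

// Reads characters from `in` until '.' (or end of input) and tallies letters.
// Index 0 is 'a', index 1 is 'b', and so on. This relies on the letters being
// contiguous in the character set, which holds for ASCII.
std::array<unsigned, 26> tallyLetters(std::istream& in)
{
    std::array<unsigned, 26> lettersCount{};  // value-initialized to all zeros
    char ch;
    while (in.get(ch) && ch != '.')
    {
        unsigned char uc = static_cast<unsigned char>(ch);
        if (std::isalpha(uc))
        {
            // Assumption: upper and lower case are counted together.
            int index = std::tolower(uc) - 'a';
            if (index >= 0 && index < 26)  // guards against locale-specific letters
                lettersCount[index]++;
        }
    }
    return lettersCount;
}
```

Because in.get() reads one character at a time, including newlines, the input can span as many lines as the user likes without any fixed-size buffer.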

How to count whitespace occurrences in a string in c++

I have a project for my advanced c++ class that's supposed to do a number of things, but I'm trying to focus on this function first, because after it works I can tweak it to fulfill the other needs. This function searches through a file and performs a word count by counting the number of times ' ' appears in the document. Maybe not accurate, but it'll be a good starting place. Here's the code I have right now:
void WordCount()
{
int count_W = 0; //Variable to store word count, will be written to label
int i, c = 0; //i for iterator
ifstream fPath("F:\Project_1_Text.txt");
FileStream input( "F:\Project_1_Text.txt", FileMode::Open, FileAccess::Read );
StreamReader fileReader( %input );
String ^ line;
//char ws = ' ';
array<Char>^ temp;
input.Seek( 0, SeekOrigin::Begin );
while ( ( line = fileReader.ReadLine() ) != nullptr )
{
Console::WriteLine( line );
c = line->Length;
//temp = line->ToCharArray();
for ( i = 0; i <= c; i++)
{
if ( line[i] == ' ' )
count_W++;
}
//line->ToString();
}
//Code to write to label
lblWordCount->Text = count_W.ToString();
}
All of this works except for one problem. When I try to run the program, and open the file, I get an error that tells me the Index is out of bounds. Now, I know what that means, but I don't get how the problem is occurring. And, if I don't know what's causing the problem, I can't fix it. I've read that it is possible to search through a string with a for loop, and of course that also holds true for a char array, and there is code in there to perform that conversion, but in both cases I get the same error. I know it is reading through the file correctly, because the final program also has to perform a character count (which is working), and it read back the size of each line in the target document perfectly from start to finish. Anyway, I'm out of ideas, so I thought I'd consult a higher power. Any ideas?
Counting whitespace is simple:
int spaces = std::count_if(s.begin(), s.end(),
[](unsigned char c){ return std::isspace(c); });
Two notes, though:
std::isspace() cannot be used directly with char because char may be signed, and std::isspace() takes an int whose value must be representable as unsigned char or equal to EOF.
This counts the number of spaces, not the number of words (or words - 1): words may be separated by sequences of spaces consisting of more than one consecutive space.
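To illustrate the difference between the two counts, here is a small sketch in standard C++ (not C++/CLI). The helper names countWhitespace and countWords are mine. The word counter increments only on a transition from whitespace to non-whitespace, so runs of consecutive spaces do not inflate the result.

```cpp
#include <algorithm>
#include <cassert>
#include <cctype>
#include <cstddef>
#include <string>

// Number of whitespace characters in s.
std::size_t countWhitespace(const std::string& s)
{
    return static_cast<std::size_t>(std::count_if(s.begin(), s.end(),
        [](unsigned char c) { return std::isspace(c) != 0; }));
}

// Number of maximal runs of non-whitespace characters in s.
std::size_t countWords(const std::string& s)
{
    std::size_t words = 0;
    bool inWord = false;
    for (unsigned char c : s)
    {
        if (std::isspace(c))
            inWord = false;
        else if (!inWord)
        {
            inWord = true;  // first character of a new word
            ++words;
        }
    }
    return words;
}
```

For the string "a  b c" (two spaces after a), countWhitespace gives 3 and countWords gives 3, whereas "spaces + 1" would report 4 words.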
It could be your loop. You're going from i=0 to i=c, but i=c is too far. You should go to i=c-1:
for ( i=0; i<c; i++)

ASCII and isalpha if statement issue

I am writing a program that takes a user inputted character, such as A, and a user inputted number, such as 7. The program checks the validity of the character, if true runs thru till it gets to this loop inside of a function. I am using ascii decimal for this loop inside of a function. This loop needs to check isalpha and if it is run the code inside the {}'s, it's doing that correctly. The else is not working the way I want and am not sure how to correct it. I need the else (is not alpha) to add a 1 back to the counter in the loop and increase the ascii by 1. If I run it as so, it gives off a retry/ignore/abort error. If I run it without the num++; it runs and stops after the loop ends. So, if you put in a Z and choose 3, it runs thru the loop 3 times and outputs just a Z. Any thoughts on how to fix this?
I need it to output something like: Input: Z Input: 4 it should output: Z A B C to the screen. It needs to ignore other ascii non alpha characters.
Thanks
string buildSeries(char A, int num)
{
//builds the output with the info the
//user inputted
stringstream str1;
string outted;
int DeC=(int)A, i = 0;
//loop builds the output
for(i=0;i<num;i++)
{
if (isalpha(DeC))
{
//converts the decimal to a letter
str1<<(char)DeC;
//adds a space
str1<<" ";
//increases the decimal
DeC++;
}
else
{
num++;
DeC++;
}
}
//builds the sstream and puts it in
//variable "outted"
outted = str1.str();
return outted;
}
If you need to loop back to 'A' at Z change your DeC++ to
if (DeC == 'Z')
DeC = 'A';
else
DeC++;
Or you could get fancy and use the modulus operator
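As a rough sketch of the modulus idea: work with the letter's offset from 'A' and wrap it with % 26. The function name buildSeriesWrapped is hypothetical, and the sketch assumes the input is an uppercase letter in a character set where A to Z are contiguous (true for ASCII). Since the offset never leaves the alphabet, the isalpha check and the num++ workaround are not needed.

```cpp
#include <cassert>
#include <string>

// Builds `num` uppercase letters starting at `start`, wrapping from Z back to A.
// Each letter is followed by a space, as in the original buildSeries.
std::string buildSeriesWrapped(char start, int num)
{
    assert(start >= 'A' && start <= 'Z');  // assumption: uppercase input only
    std::string outted;
    int offset = start - 'A';  // 0 for 'A' ... 25 for 'Z'
    for (int i = 0; i < num; i++)
    {
        outted += static_cast<char>('A' + (offset + i) % 26);
        outted += ' ';
    }
    return outted;
}
```

With the input from the question, buildSeriesWrapped('Z', 4) produces "Z A B C " (with a trailing space).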
Edit
I think the problem may be with how the stringstream insertion operator, <<, handles the char: it may be converting the char to a short or an int and then inserting it. Try using string::append(size_t size, char c) instead. That should handle inserting a char.
That is, replace your calls to str1<<(char)DeC; with outted.append(1, (char)DeC) and remove your use of the string stream.
What is DeC? The phrase "ascii list" makes me suspect it's a 'C' string, in which case you are calling isalpha() on the pointer, not on the value in the string.
edit: If for example you have
char DeC[40];
// read in a string from somewhere
// DeC is a pointer to some memory; it has a value of a 32 or 64bit number
if ( isalpha(DeC) ) {
// what you might have meant is
if ( isalpha(*DeC) ) { // the character value at the current position in DeC

C++: Removing all asterisks from a string where the asterisks are NOT multiplication symbols

So basically, I might have some string that looks like: "hey this is a string * this string is awesome 97 * 3 = 27 * this string is cool".
However, this string might be huge. I'm trying to remove all the asterisks from the string, unless that asterisk appears to represent multiplication. Efficiency is somewhat important here, and I'm having trouble coming up with a good algorithm to remove all the non-multiplication asterisks from this.
In order to determine whether an asterisk is for multiplication, I can obviously just check whether it's sandwiched in between two numbers.
Thus, I was thinking I could do something like (pseudocode):
wasNumber = false
Loop through string
if number
set wasNumber = true
else
set wasNumber = false
if asterisk
if wasNumber
if the next word is a number
do nothing
else
remove asterisk
else
remove asterisk
However, that^ is ugly and inefficient on a huge string. Can you think of a better way to accomplish this in C++?
Also, how could I actually check whether a word is a number? It's allowed to be a decimal. I know there's a function to check if a character is a number...
Fully functioning code:
#include <iostream>
#include <string>
using namespace std;
string RemoveAllAstericks(string);
void RemoveSingleAsterick(string&, int);
bool IsDigit(char);
int main()
{
string myString = "hey this is a string * this string is awesome 97 * 3 = 27 * this string is cool";
string newString = RemoveAllAstericks(myString);
cout << "Original: " << myString << "\n";
cout << "Modified: " << newString << endl;
system("pause");
return 0;
}
string RemoveAllAstericks(string s)
{
int len = s.size();
int pos;
for(int i = 0; i < len; i++)
{
if(s[i] != '*')
continue;
pos = i - 1;
char cBefore = s[pos];
while(cBefore == ' ')
{
pos--;
cBefore = s[pos];
}
pos = i + 1;
char cAfter = s[pos];
while(cAfter == ' ')
{
pos++;
cAfter = s[pos];
}
if( !(IsDigit(cBefore) && IsDigit(cAfter)) )
RemoveSingleAsterick(s, i); // not sandwiched between digits, so not multiplication
}
return s;
}
void RemoveSingleAsterick(string& s, int i)
{
s[i] = ' '; // Replaces * with a space, but you can do whatever you want
}
bool IsDigit(char c)
{
return (c >= '0' && c <= '9');
}
Top level overview:
Code searches the string until it encounters an *. Then, it looks at the first non-whitespace character before AND after the *. If both characters are numeric, the code decides that this is a multiplication operation and leaves the asterisk alone. Otherwise, the asterisk is removed.
See the revision history of this post if you'd like other details.
Important Notes:
You should seriously consider adding boundary checks on the string (i.e. don't try to access an index that is less than 0 or greater than len).
If you are worried about parentheses, then change the condition that checks for whitespaces to also check for parentheses.
Checking whether every single character is a number is a bad idea. At the very least, it will require two logical checks (see my IsDigit() function). (My code checks for '*', which is one logical operation.) However, some of the suggestions posted were very poorly thought out. Do not use regular expressions to check if a character is numeric.
Since you mentioned efficiency in your question, and I don't have sufficient rep points to comment on other answers:
A switch statement that checks for '0' '1' '2' ..., means that every character that is NOT a digit, must go through 10 logical operations. With all due respect, please, since chars map to ints, just check the boundaries (char <= '9' && char >= '0')
You can start by implementing the slow version; it could be much faster than you think. But let's say it's too slow. It is then an optimization problem. Where does the inefficiency lie?
"if number" is easy, you can use a regex or anything that stops when it finds something that is not a digit
"if the next word is a number" is just as easy to implement efficiently.
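As a minimal sketch of such a check without a regex, assuming a "number" means digits with at most one decimal point (no sign, no exponent; the question only says decimals are allowed, so the exact rules are an assumption), a single pass that bails out at the first invalid character is enough. The name isNumberWord is hypothetical.

```cpp
#include <cassert>
#include <string>

// True if `word` consists of digits with at most one '.', and at least one digit.
// Stops at the first character that cannot be part of such a number.
bool isNumberWord(const std::string& word)
{
    bool seenDigit = false;
    bool seenPoint = false;
    for (char c : word)
    {
        if (c >= '0' && c <= '9')
            seenDigit = true;
        else if (c == '.' && !seenPoint)
            seenPoint = true;
        else
            return false;
    }
    return seenDigit;
}
```

So "97" and "3.14" count as numbers, while "3day", "1.2.3" and a lone "." do not.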
Now, it's the "remove asterisk" part that is an issue for you. The key point to notice here is that you don't need to duplicate the string: you can actually modify it in place since you are only removing elements.
Try to run through this visually before trying to implement it.
Keep two integers or iterators, the first one saying where you are currently reading your string, and the second one saying where you are currently writing your string. Since you only erase stuff, the read one will always be ahead of the writing one.
If you decide to keep the current character, you just need to copy it and advance both of your integers/iterators by one. If you don't want to keep it, just advance the reading one! At the end you only have to shorten the string by the number of asterisks you removed. The complexity is simply O(n), without any additional buffer used.
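Here is one way the read/write scheme could look in C++. This is a sketch under my own assumptions rather than tested production code: the function names are made up, "number" is simplified to "the nearest non-space character is a digit", and only spaces (not tabs or parentheses) are skipped when looking around an asterisk.

```cpp
#include <cassert>
#include <cstddef>
#include <string>

namespace {
bool isDigitChar(char c) { return c >= '0' && c <= '9'; }

// True if the first non-space character at or after `pos` is a digit.
bool nextNonSpaceIsDigit(const std::string& s, std::size_t pos)
{
    while (pos < s.size() && s[pos] == ' ')
        ++pos;
    return pos < s.size() && isDigitChar(s[pos]);
}
}  // namespace

// Removes every '*' that is not sandwiched between digits, in place.
void removeNonMultiplicationAsterisks(std::string& s)
{
    std::size_t write = 0;      // next position to write a kept character
    bool lastWasDigit = false;  // was the last kept non-space character a digit?
    for (std::size_t read = 0; read < s.size(); ++read)
    {
        char c = s[read];
        if (c == '*')
        {
            if (!(lastWasDigit && nextNonSpaceIsDigit(s, read + 1)))
                continue;  // drop it: only the read index advances
            lastWasDigit = false;
        }
        else if (c != ' ')
        {
            lastWasDigit = isDigitChar(c);
        }
        s[write++] = c;  // keep it: copy and advance the write index
    }
    s.resize(write);  // cut off the tail left over from the removed asterisks
}
```

The look-ahead only ever reads positions the write index has not reached yet, so the in-place copying never disturbs it, and all index accesses are bounds-checked against s.size().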
Also note that your algorithm would be simpler (but equivalent) if written like this:
wasNumber = false
Loop through string
if number
set wasNumber = true
else
set wasNumber = false
if asterisk and wasNumber and next word is a number
do nothing // using my algorithm, "do nothing" actually copies what you intend to keep
else
remove asterisk
I found your little problem interesting and I wrote (and tested) a small and simple function that would do just that on a std::string. Here you go:
// TestStringsCpp.cpp : Defines the entry point for the console application.
//
#include "stdafx.h"
#include <string>
#include <iostream>
using namespace std;
string& ClearAsterisk(string& iString)
{
bool bLastCharNumeric = false;
for (string::iterator it = iString.begin(); it != iString.end(); ) {
bool bErase = false;
switch (*it) {
case ' ': break;//ignore whitespace characters
case '*': {
//keep the asterisk only if it is preceded by a numeric character and
//the following non space character is numeric also
string::iterator it2 = it + 1;
while (it2 != iString.end() && *it2 == ' ') ++it2;
bool bNextNumeric = (it2 != iString.end() && *it2 >= '0' && *it2 <= '9');
bErase = !(bLastCharNumeric && bNextNumeric);
break;
}
default:
bLastCharNumeric = (*it >= '0' && *it <= '9'); //set or reset flag
}
//erase() invalidates it, so continue from the iterator it returns
if (bErase) it = iString.erase(it);
else ++it;
}
return iString;
}
int _tmain(int argc, _TCHAR* argv[])
{
string testString = "hey this is a string * this string is awesome 97 * 3 = 27 * this string is cool";
cout<<ClearAsterisk(testString).c_str();
cin >> testString; //this is just for the app to pause a bit :)
return 0;
}
It will work perfectly with your sample string but it will fail if you have a text like this: "this is a happy 5 * 3day menu" because it checks only the first non-space character after the '*'. But frankly I can't imagine a lot of cases where you would have this kind of construct in a sentence.
HTH,JP.
A regular expression wouldn't necessarily be any more efficient, but it would let you rely on somebody else to do your string parsing and manipulation.
Personally, if I were worried about efficiency, I would implement your pseudocode version while limiting needless memory allocations. I might even mmap the input file. I highly doubt that you'll get much faster than that.