Objective Memory Comparison - c++

I would like to write an algorithm that compares one memory block with another and provides an objective value, so as to determine the quality of the match. I've investigated memcmp, and all it is really useful for is to determine whether two memory blocks are identical or not. I've written a recursive function to accomplish this, but it's not working quite right.
DWORD CMemory::Compare( LPBYTE pDst, LPBYTE pSrc, DWORD len )
{
    DWORD dwDiff;
    if ( len == 0 )
    {
        dwDiff = 0;
    }
    else
    {
        dwDiff = (*pSrc - *pDst) * len; // * len is attempt to weight difference by MSB
        dwDiff += this->Compare( pSrc + 1, pDst + 1, len - 1 );
    }
    return dwDiff;
}
The idea is that the more closely the two memory spaces match, the lower the return value will be. For example, let's say there are three memory blocks containing Hello World 0 !, Hello World 1 !, and Hello World 2 !, respectively, and I would like to find out which memory block is a "best match" with candidate hello world 1 !. The idea is that I would run the Compare function three times, comparing the candidate with each memory block in turn, and Compare should return the lowest value for the memory block containing Hello World 1 !. In reality, however, it returns the lowest value for the last memory block, the one containing Hello World 2 !.
Does anyone have any ideas on how I can improve this function? Thanks.

I think you need to take the absolute value of (*pSrc - *pDst). In "Hello World 1 !", you're getting a 0 for the number position, while in "Hello World 2 !" you're getting a -1, and -1 is less than 0.
Also, if you use this on a long section of memory you could run into stack problems, so you might want to make it iterative.
Your algorithm won't account for a character inserted or deleted, since it does a position by position compare. If you're worried about that, the problem gets much harder.

Have you considered writing abs(*pSrc-*pDst)? Otherwise you get negative values, which are always lower than the perfect match (0).

To improve this...
Supply a length for both the source and the destination.
Supply a value 'n', for comparing n bytes of the source and destination.
You need to handle the case when the source and destination aren't the same size, or you're going to have problems with walking off the end.
Don't use recursion, unless you're dealing with really small blocks of memory.
You can do the same work by just using a loop.
This method is really really expensive to call.
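The suggestions above (absolute differences, a loop instead of recursion, and a length for each block) can be sketched as follows. This is only an illustration, not a drop-in replacement: compare_blocks is a hypothetical name, the position weighting from the original function is dropped for simplicity, and charging 255 for each byte that only one block has is an assumed penalty.

```cpp
#include <algorithm>
#include <cstddef>
#include <cstdint>
#include <cstdlib>

// Hypothetical helper: a lower score means a closer match, 0 means identical.
// Only the overlapping part is compared byte by byte; every byte that one
// block has and the other lacks is charged the maximum per-byte penalty.
std::uint64_t compare_blocks(const unsigned char* a, std::size_t a_len,
                             const unsigned char* b, std::size_t b_len) {
    const std::size_t common = std::min(a_len, b_len);
    std::uint64_t score = 0;
    for (std::size_t i = 0; i < common; ++i) {
        // Subtract as int so the difference can be negative, then take
        // its magnitude.
        const int diff = static_cast<int>(a[i]) - static_cast<int>(b[i]);
        score += static_cast<std::uint64_t>(std::abs(diff));
    }
    // Assumed penalty for a length mismatch: 255 per missing byte.
    score += static_cast<std::uint64_t>(std::max(a_len, b_len) - common) * 255u;
    return score;
}
```

Because the loop never reads past the shorter block, it cannot walk off the end, and the stack depth no longer depends on the block size.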

If you're comparing strings, you may want to look into soundex.

Related

C++ program crashes at delete only when character array gets the input of "1000!"

I wrote a quick and dirty factorial program using the GMP library in C++, which allocates some memory for a character array. The program works fine for all input values that I've tested, except 1000.
I've tried different ways to allocate the memory, and different ways to deallocate it, but none have worked so far and almost always crash on 1000.
This is everything that happens between creating the character array and deleting it. factor is the calculated factorial and mpz_sizeinbase returns the number of digits in the number in the specified base. gmp_sprintf just transforms the number into the character array.
int count = mpz_sizeinbase(factor, 10);
char* zeroes = new char[count];
gmp_sprintf(zeroes, "%Zd", factor);
printf("After conversion: %s\n", zeroes);
int trailingzeroes = 0;
for(int i = strlen(zeroes)-1; i > 0; i--){
    if(zeroes[i] == '0')
        trailingzeroes++;
    else
        break;
}
printf("Trailing zeroes: %i\n", trailingzeroes);
delete [] zeroes;
When the input is 1000, meaning that I want to calculate 1000!, I get the error
double free or corruption (!prev)
Aborted
at delete [].
All other inputs work, as far as I can tell.
What could I be doing wrong?
From the mpz_sizeinbase documentation: "The right amount of allocation is normally two more than the value returned by mpz_sizeinbase, one extra for a minus sign and one for the null-terminator." You probably don't have a negative factorial, but you are certainly not allocating enough space for including the terminating null character.
It would be instructive to check the return value of gmp_sprintf because it tells you how many characters it actually wrote. I would wager that in your case it returns count+1, which would mean it wrote past the end of the buffer (which is Undefined Behavior).
The fact that it sometimes might work is probably related to the fact that "the result will be either exact or 1 too big" (for bases other than 2). That and of course the unpredictable nature of UB.
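The sizing rule (digits, plus one for a sign, plus one for the terminator) and the return-value check can be illustrated without GMP. The sketch below is an analog using std::snprintf on a long long, not the asker's code; digit_count and format_checked are hypothetical names. With GMP the digit count would come from mpz_sizeinbase(factor, 10) instead, and gmp_snprintf could take the place of std::snprintf.

```cpp
#include <cstddef>
#include <cstdio>
#include <string>
#include <vector>

// Counts the decimal digits of v, ignoring any sign (at least 1, for v == 0).
static int digit_count(long long v) {
    int digits = 1;
    while (v / 10 != 0) {
        v /= 10;
        ++digits;
    }
    return digits;
}

// Hypothetical helper: formats v into a buffer sized as digits + 2.
std::string format_checked(long long v) {
    const int digits = digit_count(v);
    // One extra byte for a possible minus sign, one for the terminating '\0'.
    std::vector<char> buf(static_cast<std::size_t>(digits) + 2);
    const int written = std::snprintf(buf.data(), buf.size(), "%lld", v);
    // snprintf returns the length it needed, excluding the terminator, so a
    // value >= buf.size() means the output would not have fit.
    if (written < 0 || static_cast<std::size_t>(written) >= buf.size()) {
        return std::string();
    }
    return std::string(buf.data(), static_cast<std::size_t>(written));
}
```

Passing the buffer size to the formatting call turns a too-small buffer into a detectable condition instead of a write past the end.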

How can I take a string or a char vector containing any chemical composition or formula and calculate its molar mass?

I am trying to write a simple console application in C++ which can read any chemical formula and afterwards compute its molar mass, for example:
Na2CO3, or something like:
La0.6Sr0.4CoO3, or with brackets:
Fe(NO3)3
The problem is that I don't know in detail how I can deal with the input stream. I think that reading the input and storing it into a char vector may be in this case a better idea than utilizing a common string.
My very first idea was to check all elements (stored in a char vector), step by step: When there's no lowercase after a capital letter, then I have found e.g. an element like Carbon 'C' instead of "Co" (Cobalt) or "Cu" (Copper). Basically, I've tried with the methods isupper(...), islower(...) or isalpha(...).
// first idea, but it seems to be definitely the wrong way
// read input characters from char vector
// check if element contains only one or two letters
// ... and convert them to a string, store them into a new vector
// ... finally, compute the molar mass elsewhere
// but how to deal with the numbers... ?
for (unsigned int i = 0; i < char_vec.size()-1; i++)
{
    if (islower(char_vec[i]))
    {
        char arr[] = { char_vec[i - 1], char_vec[i] };
        string temp_arr(arr, sizeof(arr));
        element.push_back(temp_arr);
    }
    else if (isupper(char_vec[i]) && !islower(char_vec[i+1]))
    {
        char arrSec[] = { char_vec[i] };
        string temp_arrSec(arrSec, sizeof(arrSec));
        element.push_back(temp_arrSec);
    }
    else if (!isalpha(char_vec[i]) || char_vec[i] == '.')
    {
        char arrNum[] = { char_vec[i] };
        string temp_arrNum(arrNum, sizeof(arrNum));
        stoechiometr_num.push_back(temp_arrNum);
    }
}
I need a simple algorithm which can handle letters and numbers. Working with pointers may also be a possibility, but currently I am not so familiar with this technique. Anyway, I am open to learning it in case someone would like to explain to me how I could use them here.
I would highly appreciate any support and of course some code snippets concerning this problem, since I am thinking for many days about it without progress… Please keep in mind that I am rather a beginner than an intermediate.
This problem is surely not for a beginner but I will try to give you some idea about how you can do that.
Assumption: I am not considering isotopes, where the atomic mass can differ for the same atomic number.
Model it to real world.
How will you solve that in real life?
Say I give you the chemical formula Fe(NO3)3. What you will do is convert it to something like this:
Total Mass => [1 of Fe] + [3 of NO3] => [1 of Fe] + [ 3 of [1 of N + 3 of O ] ]
=> 1 * Fe + 3 * (1 * N + 3 * O)
Then, you will search for individual masses of elements and then substitute them.
Total Mass => 1 * 56 + 3 * (1 * 14 + 3 * 16)
=> 242
Now, come to programming.
Trust me, you have to do the same in programming also.
Convert your chemical formula to the form discussed above i.e. Convert Fe(NO3)3 to Fe*1+(N*1+O*3)*3. I think this is the hardest part in this problem. But it can be done also by breaking down into steps.
Check if all the elements have a number after them. If not, then add "1" after the element. For example, in this case, O has a number after it, which is 3. But Fe and N don't have one.
After this step, your formula should change to Fe1(N1O3)3.
Now convert each number, say num, in the above formula to:
*num+ if there is some element or an opening bracket after the current number.
*num if you encounter ')' or the end of the formula after it.
After this, your formula should change to Fe*1+(N*1+O*3)*3.
Now, your problem is to solve the above formula. There is a very easy algorithm for this. Please refer to: https://www.geeksforgeeks.org/expression-evaluation/. In your case, your operands can be either a number (say 2) or an element (say Fe). Your operators can be * and +. Parentheses can also be present.
For finding individual masses, you may maintain a std::map<std::string, int> containing element name as key and its mass as value.
Hope this helps a bit.
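As an alternative to rewriting the formula into an infix expression first, the same arithmetic can be done directly while reading the formula, with recursion taking care of the brackets. The sketch below uses that swapped-in approach, not the expression-evaluation algorithm linked above. The names kMass, parse_count, parse_group and molar_mass are hypothetical, the masses are the rounded values used in this answer plus Na and C, and counts are read as double so that formulas like La0.6Sr0.4CoO3 could be handled once their elements are in the table.

```cpp
#include <cctype>
#include <cstddef>
#include <map>
#include <stdexcept>
#include <string>

// Rounded masses for illustration only; a real table needs every element.
static const std::map<std::string, double> kMass = {
    {"Fe", 56.0}, {"N", 14.0}, {"O", 16.0}, {"Na", 23.0}, {"C", 12.0}};

// Reads an optional count such as "3" or "0.6"; returns 1 if none is present.
static double parse_count(const std::string& f, std::size_t& pos) {
    const std::size_t start = pos;
    while (pos < f.size() &&
           (std::isdigit(static_cast<unsigned char>(f[pos])) || f[pos] == '.'))
        ++pos;
    return pos == start ? 1.0 : std::stod(f.substr(start, pos - start));
}

// Sums one group until ')' or the end of input; recursion handles brackets.
static double parse_group(const std::string& f, std::size_t& pos) {
    double total = 0.0;
    while (pos < f.size() && f[pos] != ')') {
        if (f[pos] == '(') {
            ++pos;  // skip '('
            const double inner = parse_group(f, pos);
            if (pos >= f.size()) throw std::invalid_argument("missing ')'");
            ++pos;  // skip ')'
            total += inner * parse_count(f, pos);
        } else if (std::isupper(static_cast<unsigned char>(f[pos]))) {
            // An element symbol is one capital plus any lowercase letters.
            std::string symbol(1, f[pos++]);
            while (pos < f.size() &&
                   std::islower(static_cast<unsigned char>(f[pos])))
                symbol += f[pos++];
            const auto it = kMass.find(symbol);
            if (it == kMass.end())
                throw std::invalid_argument("unknown element " + symbol);
            total += it->second * parse_count(f, pos);
        } else {
            throw std::invalid_argument("unexpected character");
        }
    }
    return total;
}

double molar_mass(const std::string& formula) {
    std::size_t pos = 0;
    const double total = parse_group(formula, pos);
    if (pos != formula.size()) throw std::invalid_argument("unmatched ')'");
    return total;
}
```

With these rounded masses, molar_mass("Fe(NO3)3") gives 242, matching the hand calculation above.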

How do I convert this function into a loop?

I have an array of letters of an unknown number of elements which contains lower case letters. I have written a function for converting a lower case letter to its ASCII value
int returnVal (char x)
{
return (int) x;
}
I am trying to combine all of these values into one number. Subtracting 87 from each of these means that the value is always a 2 digit number. I am able to combine an array made up of two elements by:
(returnVal(foo[0]) - 87) + (returnVal(foo[1]) - 87) * 100
an array made up of three elements by
(returnVal(foo[0]) - 87) + (returnVal(foo[1]) - 87) * 100 + (returnVal(foo[2]) - 87) * 100 * 100
I am multiplying each element by 100^its position in the array and summing them. This means that [a,b,c] would become 121110 (yes, the 'flip' having the value for 'c' first and 'a' last is intentional). Could anybody programme this (for an array of an unknown number of elements)?
EDIT: I have received no form of schooling at programming/computer science at any point in my life, this is not homework. I am trying to teach myself and I have got stuck; I don't know anybody in person who I could go to for help so I asked here, apologies to those of you who are offended.
EDIT2: I know that this opinion is going to annoy a lot of people; what is the purpose of stackoverflow.com if it is not to exchange information? If I were a child who was stuck with my homework (I'm not) surely that is a valid reason for using stack overflow? Many people on this website seem to have the mindset that if a problem is asked by a beginner then it is not worth answering, which is completely fine because your time is your own. However, what genuinely bugs me is the people who see a question which they deem trivial and say "homework" and vote it down immediately. I think that this website would be far better if there wasn't a "minimum-level" knowledge required in order to ask questions, the "elitist" mindset is just childish in my opinion.
Since this is a learning exercise, here are some hints for you to complete the task yourself:
Prepare a value that will serve as the "running total" for your number so far.
Start the running total at zero.
When you convert a number, say, "1234", to an int, this value would first become 1, then 12, then 123, and finally 1234
The final value of the running total is your end result
To go from a previous value to the next, multiply the prior value by ten, and add the value of the current digit to it
Your returnVal does not make sense, because in C you can very often avoid an explicit conversion of char to int. You can definitely avoid it in this case.
Making a function int digit(char c) that returns the two-digit value of a letter, i.e. c-'a'+10, would be a lot more useful, because it would let you get rid of your c-87 in multiple spots.
char foo[SIZE];
long factor=1;
long result=0;
for(int i=0; i<SIZE; i++)
{
    result+=(returnVal(foo[i])-87)*factor;
    factor*=100;
}
This should work for as long as long is large enough to hold the value of 100^the position and, of course, as long as the result does not overflow.
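The running-total hint can also be applied to this problem by walking the array from its last element to its first, multiplying by 100 at each step. The following is a minimal sketch, assuming the letters arrive as a std::string; letter_value and combine are hypothetical names.

```cpp
#include <cstddef>
#include <string>

// Hypothetical helper: maps 'a'..'z' to 10..35, the same as c - 87.
static int letter_value(char c) {
    return c - 'a' + 10;
}

// Walks the letters from last to first, so the last letter ends up in the
// most significant pair of digits: "abc" -> 12 11 10.
unsigned long long combine(const std::string& letters) {
    unsigned long long total = 0;
    for (std::size_t i = letters.size(); i-- > 0;) {
        // Shift the running total by two decimal digits, then add the letter.
        total = total * 100 +
                static_cast<unsigned long long>(letter_value(letters[i]));
    }
    return total;
}
```

For example, combine("abc") gives 121110. On platforms where unsigned long long has 64 bits, the result only fits for up to nine letters, so longer inputs would need a string or a big-number type.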

getline() Adding Character to Front of String? -- Actually substr syntax error

I'm writing a program that will balance Chemistry Equations; I thought it'd be a good challenge and help reinforce the information I've recently learned.
My program is set up to use getline(cin, std::string) to receive the equation. From there it separates the equation into two halves: a left side and right side by making a substring when it encounters a =.
I'm having issues which only concerns the left side of my string, which is called std::string leftSide. My program then goes into a for loop that iterates over the length of leftSide. The first condition checks to see if the character is uppercase, because chemical formulas are written with the element symbols and a symbol consists of either one upper case letter, or an upper case and one lower case letter. After it checks to see if the current character is uppercase, it checks to see if the next character is lower case; if it's lower case then I create a temporary string, combine leftSide[index] with leftSide[index+1] in the temp string then push the string to my vector.
My problem lies on the first iteration; I've been using CuFe3 = 8 (right side doesn't matter right now) to test it out. The only thing stored in std::string temp is C. I'm not sure why this is happening; also, I'm still getting numbers in my final answer and I don't understand why. Some help fixing these two issues, along with an explanation, would be greatly appreciated.
[CODE]
int index = 0;
for (it = leftSide.begin(); it!=leftSide.end(); ++it, index++)
{
    bool UPPER_LETTER = isupper(leftSide[index]);
    bool NEXT_LOWER_LETTER = islower(leftSide[index+1]);
    if (UPPER_LETTER)// if the character is an uppercase letter
    {
        if (NEXT_LOWER_LETTER)
        {
            string temp = leftSide.substr(index, (index+1));//add THIS capital and next lowercase
            elementSymbol.push_back(temp); // add temp to vector
            temp.clear(); //used to try and fix problem initially
        }
        else if (UPPER_LETTER && !NEXT_LOWER_LETTER) //used to try and prevent number from getting in
        {
            string temp = leftSide.substr(index, index);
            elementSymbol.push_back(temp);
        }
    }
    else if (isdigit(leftSide[index])) // if it's a number
        num++;
}
[EDIT] When I entered in only ASDF, *** ***S ***DF ***F was the output.
string temp = leftSide.substr(index, (index+1));
substr takes the first index and then a length, rather than first and last indices. You want substr(index, 2). Since in your example index is 0 you're doing: substr(index, 1) which creates a string of length 1, which is "C".
string temp = leftSide.substr(index, index);
Since index is 0 this is substr(index, 0), which creates a string of length 0, that is, an empty string.
When you're processing parts of the string with a higher index, such as Fe in "CuFe3" the value you pass in as the length parameter is higher and so you're creating strings that are longer. F is at index 2 and you call substr(index, 3), which creates the string "Fe3".
Also the standard library usually uses half open ranges, so even if substr took two indices (which, again, it doesn't) you would do substr(index, index+2) to get a two character string.
bool NEXT_LOWER_LETTER = islower(leftSide[index+1]);
You might want to check that index+1 is a valid index. If you don't want to do that manually you might at least switch to using the bounds checked function at() instead of operator[].
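Putting these points together, a minimal sketch of the symbol extraction could look like this. It passes a length (2 or 1) to substr and checks that index+1 is valid before reading it; element_symbols is a hypothetical name, and digits are simply skipped here rather than counted.

```cpp
#include <cctype>
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical helper: collects one- and two-letter element symbols and
// skips digits and any other characters.
std::vector<std::string> element_symbols(const std::string& side) {
    std::vector<std::string> symbols;
    for (std::size_t i = 0; i < side.size(); ++i) {
        if (!std::isupper(static_cast<unsigned char>(side[i]))) continue;
        // Only look at side[i + 1] when that index actually exists.
        const bool next_lower =
            i + 1 < side.size() &&
            std::islower(static_cast<unsigned char>(side[i + 1]));
        // substr(position, length): 2 characters for "Cu", 1 for "C".
        symbols.push_back(side.substr(i, next_lower ? 2 : 1));
    }
    return symbols;
}
```

For the input CuFe3 this yields the two strings Cu and Fe, and for ASDF it yields four one-letter strings.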

c++ ignoring same number in an array

I have an array of random numbers, for example
6 5 4 4 8
I need to sort it and remove/ignore the same numbers while printing afterwards, so what I did is I sorted everything with the bubble sort algorithm and got something like this
4 4 5 6 8
Now in order to print only different numbers I wrote this for loop
for(int i=0;i<n;i++){
    if(mrst[i]!=mrst[i-1] && mrst[i]>0){
        outFile << mrst[i] << " ";
    }
}
My question is this: the array's valid indices are in the interval [0:12], but on the first iteration the loop reads index -1 to check whether the same number came before. That element doesn't really exist, and the value stored there is usually a huge one. Is it possible that a 4 happens to be stored there, so that the first number won't be printed? If so, how can I prevent that and rewrite the code so it would be optimal?
Perhaps, you're looking for std::unique algorithm:
std::sort(mrst, mrst + n);
auto last = std::unique(mrst, mrst + n);
for(auto elem = mrst; elem != last; ++elem)
    outFile << *elem << " ";
Well, as you noted already, you cannot do the check mrst[i] != mrst[i-1] in case i == 0. So I'm sure you can think of a way of not doing that check in exactly this case ... (This looks very much like a homework assignment, so I'm not really willing to give you a complete solution, but I guess I hinted enough)
Note also that it's undefined behaviour to access memory outside the boundaries of an array, so what you're doing there can do anything from working correctly to crashing your program, entirely at the discretion of the compiler.
Basically, mrst[-1] reads whatever happens to sit in memory just before the array, so it may give you some garbage. But you really should avoid doing this. In your case you can just change "mrst[i]!=mrst[i-1] && mrst[i]>0" to "i==0 || mrst[i]!=mrst[i-1]".
In C++, "A || B" doesn't evaluate "B" if "A" is true.
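A minimal sketch of the loop with that short-circuit check is shown below. It assumes the values are already sorted and, so that the result is easy to inspect, collects the output in a string instead of writing to outFile; distinct_values is a hypothetical name.

```cpp
#include <cstddef>
#include <sstream>
#include <string>
#include <vector>

// Hypothetical helper: writes each distinct value of an already sorted
// vector once, each followed by a space.
std::string distinct_values(const std::vector<int>& sorted) {
    std::ostringstream out;
    for (std::size_t i = 0; i < sorted.size(); ++i) {
        // i == 0 is tested first, so sorted[i - 1] is never read when i is 0.
        if (i == 0 || sorted[i] != sorted[i - 1]) {
            out << sorted[i] << " ";
        }
    }
    return out.str();
}
```

For the sorted input 4 4 5 6 8 this produces 4 5 6 8, and the first element is always printed because the left side of || is already true.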