I'm working on a C++ class, and we're learning about the MD5 hashing function. I'm running into this issue however, where I do something like:
string input = "testInput";
unsigned char *toHash = new unsigned char[input.size()+1];
strcpy( (char*)toHash, input.c_str() );
unsigned char output[MD5_DIGEST_LENGTH];
MD5(toHash, input.size(), output);
cout << hex << output << endl;
But I always get weird garbage characters instead of what I'm looking for, which should be a long string of digits and letters. What's going on?
~ Very confused by low level C++
Don't be fooled by the "unsigned char" type of the array; all that means is that each value in the array is an 8-bit unsigned integer. In particular, it doesn't imply that the data written to the array will be human-readable ASCII characters.
If you wanted to see the contents of the array in human-readable hex form, you could do this (instead of the cout command):
for (int i=0; i< MD5_DIGEST_LENGTH; i++) printf("%02x ", output[i]);
printf("\n");
I'm struggling with an issue involving matching byte values within strings. In the example below, this simple little function does what it is supposed to for values 0-127, but fails if I attempt to work with byte values greater than 127. I assume this is because char is signed. However, if I make the buffer unsigned, I not only run into issues because strlen() doesn't accept an unsigned char pointer, but I also get a warning that '≠' is a multi-character constant. Why the multiple chars? I think I have tried everything. Is there something I could do to make this work on values > 127?
#include <iostream>
#include <cstring>

const unsigned char DEF_KEY_MINOR = 0xAD;
const char *buffer = { "jhsi≠uhdfiwuui73" };

size_t isOctetInString(const char *buffer, const unsigned char octet)
{
    size_t out = 0;
    for (size_t i = 0; i < strlen(buffer); ++i)
    {
        if (!(buffer[i] ^ octet))
        {
            out = i;
            break;
        }
    }
    return out;
}

int main() {
    std::cout << isOctetInString(buffer, 'i') << "\n";
    std::cout << isOctetInString(buffer, 0x69) << "\n";
    std::cout << isOctetInString(buffer, '≠') << "\n";
    std::cout << isOctetInString(buffer, 0xAD) << "\n";
    return 0;
}
output
3
3
0
0
Edit
Based on comments, I have tried a few different things, including casting the octet and buffer to unsigned int and to wchar_t, and removing the unsigned char from the octet parameter type. With each of these, the outputs I am getting are
3
3
6
0
I even tried substituting the ≠ char in the buffer with
const char *buffer = {'0xAD', "jhsiuhdfiwuui73"};
however I still get warnings about multibyte characters.
As I said before, my main concern is being able to find the bit sequence 0xAD within a string, but I am seeing now that using ASCII characters, or any construct built on the ASCII character set, will cause issues. Since 0xAD is only 8 bits, there must be a way of doing this. Does anyone know a method for doing so?
Sign extension -- buffer[i] ^ octet is really int(buffer[i]) ^ int(octet), and the sign extension only occurs for buffer[i]. If you want buffer[] to hold unsigned char values, you have to define it that way.
There are multiple sources of confusion in your problem:
searching for an unsigned char value in a string can be done with strchr() which converts both the int argument and the characters in the char array to unsigned char for the comparison.
your function uses if(!(buffer[i] ^ octet)) to detect a match, which does not work if char is signed because the expression is evaluated as if(!((int)buffer[i] ^ (int)octet)) and the sign extension only occurs for buffer[i]. A simple solution is:
if ((unsigned char)buffer[i] == octet)
Note that the character ≠ might be encoded as multiple bytes on your target system, both in the source code and in the terminal handling: the code point of ≠ is 8800 (0x2260), which is encoded as 0xE2 0x89 0xA0 in UTF-8. The syntax '≠' would then pose a problem. I'm not sure how C++ deals with multi-byte character constants, but C would accept them with an implementation-specific value.
To see how your system handles non-ASCII bytes, you could add these lines to your main() function:
std::cout << "≠ uses " << sizeof("≠") - 1 << " bytes\n";
std::cout << "'≠' has the value " << (int)'≠' << "\n";
or more explicitly:
printf("≠ is encoded as");
for (size_t i = 0; i < sizeof("≠") - 1; i++) {
    printf(" %02hhX", "≠"[i]);
}
printf(" and '≠' has a value of 0x%X\n", '≠');
On my linux system, the latter outputs:
≠ is encoded as E2 89 A0 and '≠' has a value of 0xE289A0
On my MacBook, compilation fails with this error:
notequal.c:8:48: error: character too large for enclosing character literal type
printf(" and '≠' has a value of 0x%X\n", '≠');
I just learned some simple encryption today and wrote a simple program to convert my text to 10-bit binary. I'm not sure if I'm doing it correctly, but the commented section of the code and the actual code produce two different 10-bit outputs. I am confused. Can someone explain it to me in layman's terms?
#include <iostream>
#include <string>
#include <bitset>
#include "md5.h"

using namespace std;
using std::cout;
using std::endl;

int main(int argc, char *argv[])
{
    string input = "";
    cout << "Please enter a string:\n>";
    getline(cin, input);
    cout << "You entered: " << input << endl;
    cout << "md5 of " << input << ": " << md5("input") << endl;
    cout << "Binary is: ";
    // cout << bitset<10>(input[1]);
    for (int i = 0; i < 5; i++)
        cout << bitset<2>(input[i]);
    cout << endl;
    return 0;
}
tl;dr : A char is 8 bits, and the string's operator[] returns individual chars; as such, you accessed five different chars and took the low two bits of each. The solution comes in treating a char as exactly that: 8 bits. By doing some clever bit manipulation, we can achieve the desired effect.
The problem
While I still have not completely understood what you were trying to do, I can point out a problem with this code:
By calling
cout<<bitset<10>(input[1]);
you are constructing a 10-bit bitset from the second character (input[0] would use the first character); its 8 bits are zero-padded up to 10.
Now, the loop does something entirely different:
for (int i = 0; i < 5; i++)
    cout << bitset<2>(input[i]);
It uses the i-th character of the string and constructs a bitset from it.
The reference for the bitset constructor tells us that the char is converted to an unsigned long long, which then initializes the bitset.
Okay, so let's see how that works with a simple input string like
std::string input = "aaaaa";
The first character of this string is 'a', which gives you the 8 bits of '01100001' (ASCII table), and thus the 10 bit bitset that is constructed from that turns out to print
0001100001
where we see a clear padding for the bits to the left (more significant).
On the other hand, if you go through the characters with your loop, you access each character and take only the low 2 bits of it.
In our case of the character 'a' = '01100001', those bits are '01', so your program outputs 01 five times.
Now, the way to fix it is to actually think more about the bits you are actually accessing.
A possible solution
Do you want to get the first ten bits of the character string in any case?
In that case, you'd want to write something like:
std::bitset<10>(input[0]);
//Will pad the first two bits of the bitset as '0'
or
for (int i = 0; i < 5; ++i) {
    char referenced = input[i / 4];
    std::cout << std::bitset<2>(referenced >> (6 - (i % 4) * 2));
}
The loop code was redesigned to read the whole string sequentially into 2 bit bitsets.
So since in a char there are 8 bits, we can read 4 of those sets out of a single char -> that is the reason for the "referenced".
The bitshift in the lower part of the loop makes it so it starts with a shift of 6, then 4, then 2, then 0, and then resets to 6 for the next char, etc...
(That way, we can extract the 2 relevant bits out of each 8bit char)
This type of loop will actually read through all parts of your string and do the correct constructions.
A last remark
To construct a bitset directly from your string, you would have to use the raw memory in bits and from that construct the bitset.
You could construct 8 bit bitsets from each char and append those to each other, or create a string from each 8 bit bitset, concatenate those and then use the final string of 1 and 0 to construct a large bitset of arbitrary size.
I hope it helped.
I'm trying to work on a simple image encryption project and I have a few questions I want to ask.
Should I store each byte of data from ifstream into a character like I did in my code?
Each byte printed is a weird symbol (which is expected), but why does adding 10 (an int) to it always result in a number when printed?
#include <cstdlib>
#include <fstream>
#include <iostream>
#include <vector>
using namespace std;

int main() {
    vector<char> data; // Stores each byte from image.jpg
    ifstream fileIn("image.jpg", ios::binary);
    int i = 0; // Used for accessing each item in data vector
    while (fileIn) {
        // Add each character from the image file into the vector
        data.push_back(fileIn.get());
        cout << "Original: " << data[i] << endl; // Print each character from image.jpg
        cout << "Result after adding: " << data[i] + 10 << endl; // This line is where I need help with
        i++;
        system("pause");
    }
    fileIn.close();
    system("pause");
    return 0;
}
Output:
Original: å
Result after adding: -112
Original: Æ
Result after adding: -100
Original:
Result after adding: 12
As you can see, adding 10 always results in a number. How do I increment these values correctly so that I can change it back later?
Thank you for any help.
When you do an arithmetic operation (like addition) with a value of a type that is smaller than int (like char in your case) then that value will be promoted to int and the operation is done using two int values.
So the expression data[i] + 10 is equivalent to static_cast<int>(data[i]) + 10.
Read more about integral promotion and arithmetic operator conversions.
As for how to solve your problem, first you have to make sure that the result of the operation actually fits in a char. What if the byte you have read is 127 and you add 10? Then the result is out of bounds of a signed char (which seems to be what you have).
If the result is not out of bounds, then you can just cast it:
char result = static_cast<char>(data[i] + 10);
As a small side-note, if you're reading binary data you are not really reading characters, so I suggest using a fixed-width integer type like int8_t or uint8_t instead of char. On supported platforms (which is just about all these days) they are just aliases for signed char and unsigned char (respectively), but using the aliases is more informative for the readers of your code.
I am writing a program that accepts a text file of hex values. I store these hex values in a vector<string> and then use stol to convert each hex string to an integer, which I store in a new vector<int>.
vector<string> flir_times;
vector<int> flir_dec;

for (int i = 0; i < flir_times.size(); i++) {
    int x = stol(flir_times[i], nullptr, 16);
    flir_dec.push_back(x);
    cout << flir_dec[i] << endl;
}
The program was originally working, but today for some reason it doesn't seem to be converting some new hex values correctly. Here is a short snippet of the hex values that need to be converted:
These are the values that the program should be converting them to:
However when I run my program it converts the hex values into large negative numbers then it crashes. Does anyone have any idea what could be causing the program to not convert the hex numbers correctly then crash?
Your program continues to work correctly, it's just that the added hex numbers that you are trying to read are representations of negative 32-bit integers. For example, the most significant byte of A4B844A2 is 10100100. It has 1 in the most significant "sign" bit, so the number is actually negative.
Switch to unsigned numbers, and use std::stoul to parse input to fix this problem:
vector<string> flir_times;
vector<unsigned> flir_dec;

for (int i = 0; i < flir_times.size(); i++) {
    unsigned x = stoul(flir_times[i], nullptr, 16);
    flir_dec.push_back(x);
    cout << flir_dec[i] << endl;
}
#include <cstdio>
#include <iostream>
#include <iomanip>
using namespace std;

int main()
{
    char array[10];
    for (int i = 0; i < 10; i++)
    {
        array[i] = 'a' + i;
    }
    char *test = array;
    printf("%x\n", test);
    cout << hex << test << endl;
}
The output for this is:
bffff94e
abcdefghijN???
Why is it not printing the same thing?
cout << hex << test << endl;
It prints the string, not the address, because there is an overload of operator<< that takes char const* as its argument and treats it as a null-terminated string.
If you want to print the address, cast the argument to void* so that other overload of operator<< will be invoked which will print the address.
cout << hex << static_cast<void*>(test) << endl;
will print the address, in hexadecimal format.
Note that the hex stream-manipulator is not needed here, as the address will be printed in hexadecimal format anyway. So
cout << static_cast<void*>(test) << endl;
is enough.
Because your program has undefined behavior. And because you ask it to
print different things.
Your invocation of printf is illegal, and results in undefined
behavior (and is a good example of why you should never use printf).
Your format specifier says to extract an unsigned int from the
argument list, and output that in hexadecimal. Passing it anything but
an unsigned int is undefined behavior. As it happens, given the way
varargs are generally implemented, if you're on a machine where
unsigneds and pointers have the same size, you'll probably output the
value of the pointer, treating its bits as if it were an unsigned.
Other behaviors are certainly possible, however. (If I'm not mistaken,
g++ will warn about this construct; it's also possible that on some
platforms, it will crash.)
In the case of std::cout, you're passing it a char*. By definition,
the char* is treated as a '\0'-terminated string, not as a pointer (and
certainly not as an unsigned int). And again, you have undefined
behavior, since your char* doesn't point to a '\0'-terminated string;
you never put a '\0' at the end. (This probably explains the "N???"
you see at the end of your output. But again, undefined behavior is,
well, undefined. The code could just as easily have crashed.)
Finally, you're using both printf and std::cout; the results are not
really specified unless you do a flush of the stream between the two.
(In practice, if you're outputting to an interactive device, the flush
should occur when you output the '\n' character. If you redirect the
output to a file, however, you're likely to get something different.)
It's not clear what you want. If you want to output the address of
array, it would be:
printf( "%p\n", test );
std::cout << static_cast<void*>( test ) << std::endl;
If you want to output the string you've generated, then append a '\0' to
the end of it (without overflowing the buffer), and then:
printf( "%s\n", test );
std::cout << test << std::endl;
I'm not sure what you're trying to make "hex"; there is no such thing as
a hex representation of a string, and the representation of a pointer is
implementation defined, and not required to take into account any
formatting parameters in iostream. (Typically, on most modern machines,
it will be hex. But I've worked on more than a few where it would be
octal, and at least one where it wouldn't be just a number, regardless
of the base.) If you want a hex dump of array, you'll have to loop,
outputting each value as an unsigned in hex:
for ( int i = 0; i < 10; ++ i ) {
printf( "%x", static_cast<unsigned char>( test[i] ) );
}
printf( "\n" );
std::cout.setf( std::ios_base::hex, std::ios::basefield );
for ( int i = 0; i < 10; ++ i ) {
std::cout << static_cast<unsigned>( static_cast<unsigned char>( test[i] ) );
}
std::cout.setf( std::ios_base::dec, std::ios::basefield );
std::cout << std::endl;
Finally: a word about the casts: plain char may be either signed or
unsigned; if it is signed, converting it to an int or an
unsigned might produce either a negative value (int) or a very
large positive value (unsigned). Thus, the first conversion to
unsigned char, which guarantees a result in the range [0, UCHAR_MAX].
Second, of course, we have to convert the unsigned char to unsigned:
in the case of printf, because we would otherwise have undefined
behavior, but the conversion is implicit, since passing an unsigned
char as a vararg automatically promotes it to unsigned; and
in the case std::cout, because the rules are that any character
type be output as a character, not as a numerical value (and since the
type is used here in function overload resolution, and is not being
passed to a vararg or an unsigned, there is no implicit conversion).
test itself is a pointer, i.e. it stores an address. Your printf statement prints the hexadecimal value of that address.
The cout << statement then prints the entire string, because the std::hex manipulator does not affect the way strings are printed. It only affects the way integers are printed.
What you can do is
Loop through the characters of the array
Convert each to an integer and print using the std::hex manipulator
That would look like this:
for (int i = 0; i < 10; ++i)
    std::cout << std::hex << static_cast<int>(array[i]) << '\n';
cout << hex
can't be used on a char* to print the characters as hex; the hex manipulator only affects integer types (int, long, etc.).
And, as for your second print, the reason the string ends in garbage is that you never appended a '\0' terminator, which marks the end of the string.