C++ read and edit jpg file using ifstream - c++

I'm trying to work on a simple image encryption project and I have a few questions I want to ask.
Should I store each byte of data from ifstream into a character like I did in my code?
Each byte printed is a weird symbol (which is correct), but why does adding 10 (an int) to it always result in a number when printed?
#include <cstdlib>
#include <fstream>
#include <iostream>
#include <vector>
using namespace std;

int main() {
    vector<char> data; // Stores each byte from image.jpg
    ifstream fileIn("image.jpg", ios::binary);
    int i = 0; // Used for accessing each item in data vector
    while (fileIn) {
        // Add each character from the image file into the vector
        data.push_back(fileIn.get());
        cout << "Original: " << data[i] << endl; // Print each character from image.jpg
        cout << "Result after adding: " << data[i] + 10 << endl; // This line is where I need help with
        i++;
        system("pause");
    }
    fileIn.close();
    system("pause");
    return 0;
}
Output:
Original: å
Result after adding: -112
Original: Æ
Result after adding: -100
Original:
Result after adding: 12
As you can see, adding 10 always results in a number. How do I increment these values correctly so that I can change them back later?
Thank you for any help.

When you do an arithmetic operation (like addition) with a value of a type that is smaller than int (like char in your case) then that value will be promoted to int and the operation is done using two int values.
So the expression data[i] + 10 is equivalent to static_cast<int>(data[i]) + 10.
Read more about integral promotion and arithmetic operator conversions.
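For illustration (this snippet is mine, not part of the original answer), a quick check that the result of char + int really is an int:
#include <iostream>

int main() {
    char c = 'A';                      // value 65 on ASCII systems
    auto sum = c + 10;                 // c is promoted to int before the addition
    std::cout << sizeof(sum) << '\n';  // typically prints 4: sum is an int, not a char
    std::cout << sum << '\n';          // prints 75, a number
}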
As for how to solve your problem, first you have to make sure that the result of the operation actually fits in a char. What if the byte you have read is 127 and you add 10? Then the result is out of bounds of a signed char (which seems to be what you have).
If the result is not out of bounds, then you can just cast it:
char result = static_cast<char>(data[i] + 10);
As a small side note, if you're reading binary data you are not really reading characters, so I suggest using a fixed-width integer type like int8_t or uint8_t instead of char. On supported platforms (which is just about all of them these days) they are just aliases for signed char and unsigned char (respectively), but using the aliases is more informative for the readers of your code.
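Putting it together, here is a minimal sketch (my own illustration, assuming the same image.jpg file name and a simple add-an-offset "encryption"): the bytes are read into uint8_t, shifted by an offset, and shifted back, with well-defined wrap-around arithmetic:
#include <cstdint>
#include <fstream>
#include <iostream>
#include <iterator>
#include <vector>

int main() {
    std::ifstream fileIn("image.jpg", std::ios::binary);
    std::vector<uint8_t> data(std::istreambuf_iterator<char>(fileIn), {});

    const uint8_t offset = 10;
    for (uint8_t &byte : data)
        byte = static_cast<uint8_t>(byte + offset); // wraps modulo 256, no overflow problems

    for (uint8_t &byte : data)
        byte = static_cast<uint8_t>(byte - offset); // wraps back, original bytes restored

    std::cout << "Bytes read: " << data.size() << '\n';
}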

Related

How do I find 8-bit substrings in strings with ASCII values exceeding 127?

I'm struggling to work through an issue I'm running into trying to work with bitwise substrings in strings. In the example below, this simple little function does what it is supposed to for values 0-127, but fails if I attempt to work with ASCII values greater than 127. I assume this is because the string itself is signed. However, if I make it unsigned, I not only run into issues because apparently strlen() doesn't operate on unsigned strings, but I get a warning that it is a multi-char constant. Why the multiple chars? I think I have tried everything. Is there something I could do to make this work on values > 127?
#include <iostream>
#include <cstring>
const unsigned char DEF_KEY_MINOR = 0xAD;
const char *buffer = { "jhsi≠uhdfiwuui73" };
size_t isOctetInString(const char *buffer, const unsigned char octet)
{
    size_t out = 0;
    for (size_t i = 0; i < strlen(buffer); ++i)
    {
        if (!(buffer[i] ^ octet))
        {
            out = i;
            break;
        }
    }
    return out;
}

int main() {
    std::cout << isOctetInString(buffer, 'i') << "\n";
    std::cout << isOctetInString(buffer, 0x69) << "\n";
    std::cout << isOctetInString(buffer, '≠') << "\n";
    std::cout << isOctetInString(buffer, 0xAD) << "\n";
    return 0;
}
output
3
3
0
0
Edit
Based on comments I have tried a few different things including casting the octet and buffer to unsigned int, and wchar_t, and removing the unsigned char from the octet parameter type. With either of these the outputs I am getting are
3
3
6
0
I even tried substituting the ≠ char in the buffer with
const char *buffer = {'0xAD', "jhsiuhdfiwuui73"};
however I still get warnings about multibyte characters.
As I said before, my main concern is to be able to find the bit sequence 0xAD within a string, but I am seeing now that using ascii characters or any construct making use of the ascii character set will cause issues. Since 0xAD is only 8 bits, there must be a way of doing this. Does anyone know a method for doing so?
Sign extension -- buffer[i] ^ octet is really int(buffer[i]) ^ int(octet), where the (signed) char from buffer[i] is sign-extended while octet is zero-extended. If you want buffer[] to hold unsigned char, you have to define it that way.
There are multiple sources of confusion in your problem:
searching for an unsigned char value in a string can be done with strchr() which converts both the int argument and the characters in the char array to unsigned char for the comparison.
your function uses if(!(buffer[i] ^ octet)) to detect a match, which does not work if char is signed because the expression is evaluated as if(!((int)buffer[i] ^ (int)octet)) and the sign extension only occurs for buffer[i]. A simple solution is:
if ((unsigned char)buffer[i] == octet)
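Applied to the question's function, an illustrative rewrite (same signature and behaviour as the original, only the comparison changed) could look like this:
#include <cstring>

size_t isOctetInString(const char *buffer, const unsigned char octet)
{
    size_t out = 0;
    for (size_t i = 0; i < strlen(buffer); ++i)
    {
        if ((unsigned char)buffer[i] == octet) // compare as unsigned char, so values > 127 match
        {
            out = i;
            break;
        }
    }
    return out; // note: 0 also means "no match", as in the original
}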
Note that the character ≠ might be encoded as multiple bytes on your target system, both in the source code and in the terminal handling; for example, ≠ is code point 8800 (0x2260) and is encoded as 0xE2 0x89 0xA0 in UTF-8. The syntax '≠' would then pose a problem. I'm not sure how C++ deals with multi-byte character constants, but C would accept them with an implementation-specific value.
To see how your system handles non-ASCII bytes, you could add these lines to your main() function:
std::cout << "≠ uses " << sizeof("≠") - 1 << "bytes\n";
std::cout << "'≠' has the value " << (int)'≠' << "\n";
or more explicitly:
printf("≠ is encoded as");
for (size_t i = 0; i < sizeof("≠") - 1; i++) {
    printf(" %02hhX", "≠"[i]);
}
printf(" and '≠' has a value of 0x%X\n", '≠');
On my linux system, the latter outputs:
≠ is encoded as E2 89 A0 and '≠' has a value of 0xE289A0
On my MacBook, compilation fails with this error:
notequal.c:8:48: error: character too large for enclosing character literal type
printf(" and '≠' has a value of 0x%X\n", '≠');

Difference in bitset<10> and bitset<2>(input[i]), need explanation

I just learned some simple encryption today and wrote a simple program to convert my text to 10-bit binary. I'm not sure if I'm doing it correctly, but the commented section of the code and the actual code produce two different 10-bit outputs. I am confused. Can someone explain it to me in layman's terms?
#include <iostream>
#include <string>
#include <bitset>
#include "md5.h"
using namespace std;
using std::cout;
using std::endl;

int main(int argc, char *argv[])
{
    string input = "";
    cout << "Please enter a string:\n>";
    getline(cin, input);
    cout << "You entered: " << input << endl;
    cout << "md5 of " << input << ": " << md5("input") << endl;
    cout << "Binary is: ";
    // cout << bitset<10>(input[1]);
    for (int i = 0; i < 5; i++)
        cout << bitset<2>(input[i]);
    cout << endl;
    return 0;
}
tl;dr: A char is 8 bits, and the string's operator[] returns individual chars, so you accessed different chars and took only the two least significant bits of each. The solution comes from treating a char as exactly that: 8 bits. With some clever bit manipulation, we can achieve the desired effect.
The problem
While I still have not completely understood what you were trying to do, I can point out a problem with this code:
By calling
cout << bitset<10>(input[1]);
you are constructing a 10-bit bitset from the second character (input[0] would use the first character); the char's 8 bits are zero-padded on the left to fill the 10 bits.
Now, the loop does something entirely different:
for (int i = 0; i < 5; i++)
    cout << bitset<2>(input[i]);
It uses the i-th character of the string and constructs a bitset from it.
The reference for the bitset constructor tells us that the char is converted to an unsigned long long, which then initializes the bitset.
Okay, so let's see how that works with a simple input string like
std::string input = "aaaaa";
The first character of this string is 'a', which gives you the 8 bits of '01100001' (ASCII table), and thus the 10 bit bitset that is constructed from that turns out to print
0001100001
where we see a clear padding for the bits to the left (more significant).
On the other hand, if you go through the characters with your loop, you access each character and take only 2 of its bits.
In the case of the character 'a' = '01100001', these are the two least significant bits, '01'. So your program would output 01 five times.
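To see the two behaviours side by side, a quick illustrative check (not in the original answer):
#include <bitset>
#include <iostream>

int main() {
    char a = 'a';                             // 0b01100001
    std::cout << std::bitset<10>(a) << '\n';  // 0001100001: 8 bits, zero-padded to 10
    std::cout << std::bitset<2>(a) << '\n';   // 01: only the two least significant bits
}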
Now, the way to fix it is to think more carefully about which bits you are actually accessing.
A possible solution
Do you want to get the first ten bits of the character string in any case?
In that case, you'd want to write something like:
std::bitset<10>(input[0]);
//Will pad the first two bits of the bitset as '0'
or
for (int i = 0; i < 5; ++i) {
    char referenced = input[i / 4];
    std::cout << std::bitset<2>(referenced >> (6 - (i % 4) * 2));
}
The loop was redesigned to read the whole string sequentially into 2-bit bitsets (and to actually print them).
Since a char holds 8 bits, we can read four such 2-bit groups out of a single char; that is the reason for the referenced variable and the i/4 index.
The bit shift in the lower part of the loop starts at 6, then goes to 4, 2 and 0, and then resets to 6 for the next char, and so on.
(That way, we extract the 2 relevant bits out of each 8-bit char.)
This loop reads through the string sequentially and does the correct constructions.
A last remark
To construct a bitset directly from your string, you would have to use the raw memory bit by bit and construct the bitset from that.
You could construct 8-bit bitsets from each char and append those to each other, or create a string from each 8-bit bitset, concatenate those, and then use the final string of 1s and 0s to construct one large bitset of arbitrary size.
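As an illustration of the second approach (my own sketch, with the total size fixed at 40 bits to match a 5-character input):
#include <bitset>
#include <iostream>
#include <string>

int main() {
    std::string input = "aaaaa";
    std::string bits;
    for (unsigned char c : input)
        bits += std::bitset<8>(c).to_string(); // 8 bits per char, most significant bit first

    std::bitset<40> all(bits);                 // 5 chars * 8 bits = 40 bits
    std::cout << all << '\n';
}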
I hope it helped.

shifting the binary numbers in c++

#include <iostream>

int main()
{
    using namespace std;
    int number, result;

    cout << "Enter a number: ";
    cin >> number;

    result = number << 1;
    cout << "Result after bitshifting: " << result << endl;
}
If the user inputs 12, the program outputs 24.
In a binary representation, 12 is 0b1100. However, the result the program prints is 24 in decimal, not 8 (0b1000).
Why does this happen? How may I get the result I expect?
Why does the program output 24?
You are right, 12 is 0b1100 in its binary representation. That being said, it also is 0b001100 if you want. In this case, bitshifting to the left gives you 0b011000, which is 24. The program produces the expected result.
Where does this stop?
You are using an int variable. Its size is typically 4 bytes (32 bits) when targeting 32-bit platforms. However, it is a bad idea to rely on int's size. Use <cstdint> (stdint.h) when you need fixed-width types.
A word of warning for bitshifting over signed types
Using the << bitshift operator on negative values is undefined behavior, and >>'s behaviour on negative values is implementation-defined. In your case, I would recommend using an unsigned int (or just unsigned, which is the same), because int is signed.
How to get the result you expect?
If you know the size (in bits) of the number the user inputs, you can apply a bitmask using the & (bitwise AND) operator, e.g.
result = (number << 1) & 0b1111; // 0xF would also do the same
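A minimal self-contained sketch of that approach (assuming the value is meant to fit in 4 bits, as with 12 = 0b1100):
#include <iostream>

int main()
{
    unsigned number = 0;
    std::cout << "Enter a number: ";
    std::cin >> number;

    unsigned result = (number << 1) & 0xF; // keep only the low 4 bits
    std::cout << "Result after bitshifting: " << result << '\n'; // 12 -> 8 instead of 24
}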

OpenCV documentation says that "uchar" is "unsigned integer" datatype. How?

I got confused by the OpenCV documentation mentioned here.
As per the documentation, if I create an image with "uchar", the pixels of that image can store unsigned integer values, but if I create an image using the following code:
Mat image;
image = imread("someImage.jpg" , 0); // Read an image in "UCHAR" form
or by doing
image.create(10, 10, CV_8UC1);
for (int i = 0; i < image.rows; i++)
{
    for (int j = 0; j < image.cols; j++)
    {
        image.at<uchar>(i, j) = (uchar)255;
    }
}
and then if I try to print the values using
cout<<" "<<image.at<uchar>(i,j);
then I get some weird results at the terminal, but if I use the following statement then I get the values between 0 and 255.
cout<<" "<<(int)image.at<uchar>(i,j); // with TYPECAST
Question: Why do I need a typecast to print the values in the range 0-255 if the image itself can store "unsigned integer" values?
If you try to find the definition of uchar (press F12 if you are using Visual Studio), you'll end up in OpenCV's core/types_c.h:
#ifndef HAVE_IPL
typedef unsigned char uchar;
typedef unsigned short ushort;
#endif
which is a standard and reasonable way of defining an unsigned integral 8-bit type (i.e. an "8-bit unsigned integer"), since the standard ensures that char always occupies exactly 1 byte of memory. This means that:
cout << " " << image.at<uchar>(i,j);
uses the overload of operator<< that takes unsigned char, which prints the passed value as a character, not as a number.
An explicit cast, however, causes another overload of << to be used:
cout << " " << (int) image.at<uchar>(i,j);
and therefore it prints numbers. This issue is not related to the fact that you are using OpenCV at all.
Simple example:
char c = 56; // equivalent to c = '8'
unsigned char uc = 56;
int i = 56;
std::cout << c << " " << uc << " " << i;
outputs: 8 8 56
And if the fact that it is a template confuses you, then this behavior is also equivalent to:
template<class T>
T getValueAs(int i) { return static_cast<T>(i); }

typedef unsigned char uchar;

int main() {
    int i = 56;
    std::cout << getValueAs<uchar>(i) << " " << (int)getValueAs<uchar>(i);
}
Simply because, although uchar is an integer type, the stream operator << prints the character it represents, not a sequence of digits. Passing an int, you get a different overload of that same stream operator, which does print a sequence of digits.
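For completeness, a small illustrative sketch of the usual ways to print an unsigned char as a number:
#include <iostream>

int main() {
    unsigned char uc = 200;
    std::cout << uc << '\n';                   // prints a character glyph (often unreadable)
    std::cout << static_cast<int>(uc) << '\n'; // prints 200
    std::cout << +uc << '\n';                  // unary + promotes to int, also prints 200
}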

MD5 Hashing function outputs garbage

I'm working on a C++ class, and we're learning about the MD5 hashing function. I'm running into this issue however, where I do something like:
string input = "testInput";
unsigned char *toHash = new unsigned char[input.size()+1];
strcpy( (char*)toHash, input.c_str() );
unsigned char output[MD5_DIGEST_LENGTH];
MD5(toHash, input.size(), output);
cout << hex << output << endl;
But I always get some weird garbage characters instead of what I'm looking for, something like a long string of numbers / letters. What's going on?
Very confused by low level C++
Don't get fooled by the "unsigned char" type of the array; all that means here is that each value in the array is going to be an 8-bit unsigned integer. In particular it doesn't imply that the data written to the array will be human-readable ASCII characters.
If you wanted to see the contents of the array in human-readable hex form, you could do this (instead of the cout command):
for (int i=0; i< MD5_DIGEST_LENGTH; i++) printf("%02x ", output[i]);
printf("\n");