Decoding and saving image files from base64 in C++

I'm trying to write a program in C++ that can encode images into base64 and also decode base64 back into images. I believe the encoder works: some websites can take the base64 I generate and decode it into the image correctly. However, when I decode the base64 into a string, write it to a file, and save it as a PNG, image viewers say the file can't be opened.
I confirmed that the string written to the new file is exactly the same as the contents of the existing file (when both are opened in a text editor), but the new file can't be opened while the existing one can. I have even tried creating a new file in a text editor and copying the text from the old file into it, but that file doesn't open in an image viewer either.
I believe both encode functions and the base64 decode function work fine. I think the problem is in the image decode function.
Image Encode Function
string base64_encode_image(const string& path) {
vector<char> temp;
std::ifstream infile;
infile.open(path, ios::binary); // Open file in binary mode
if (infile.is_open()) {
while (!infile.eof()) {
char c = (char)infile.get();
temp.push_back(c);
}
infile.close();
}
else return "File could not be opened";
string ret(temp.begin(), temp.end() - 1);
ret = base64_encode((unsigned const char*)ret.c_str(), ret.size());
return ret;
}
Image Decode Function
void base64_decode_image(const string& input) {
ofstream outfile;
outfile.open("test.png", ofstream::out);
string temp = base64_decode(input);
outfile.write(temp.c_str(), temp.size());
outfile.close();
cout << "file saved" << endl;
}
Encode Function base64
string base64_encode(unsigned const char* input, unsigned const int len) {
string ret;
size_t i = 0;
unsigned char bytes[3];
unsigned char sextets[4];
while (i <= (len - 3)) {
bytes[0] = *(input++);
bytes[1] = *(input++);
bytes[2] = *(input++);
sextets[0] = (bytes[0] & 0xfc) >> 2; // Cuts last two bits off of first byte
sextets[1] = ((bytes[0] & 0x03) << 4) + ((bytes[1] & 0xf0) >> 4); // Takes last two bits from first byte and adds it to first 4 bits of 2nd byte
sextets[2] = ((bytes[1] & 0x0f) << 2) + ((bytes[2] & 0xc0) >> 6); // Takes last 4 bits of 2nd byte and adds it to first 2 bits of third byte
sextets[3] = bytes[2] & 0x3f; // takes last 6 bits of third byte
for (size_t j = 0; j < 4; ++j) {
ret += base64_chars[sextets[j]];
}
i += 3; // increases to go to third byte
}
if (i != len) {
size_t k = 0;
size_t j = len - i; // Find index of last byte
while (k < j) { // Sets first bytes
bytes[k] = *(input++);
++k;
}
while (j < 3) { // Set last bytes to 0x00
bytes[j] = '\0';
++j;
}
sextets[0] = (bytes[0] & 0xfc) >> 2; // Cuts last two bits off of first byte
sextets[1] = ((bytes[0] & 0x03) << 4) + ((bytes[1] & 0xf0) >> 4); // Takes last two bits from first byte and adds it to first 4 bits of 2nd byte
sextets[2] = ((bytes[1] & 0x0f) << 2) + ((bytes[2] & 0xc0) >> 6); // Takes last 4 bits of 2nd byte and adds it to first 2 bits of third byte
// No last one is needed, because if there were 4, then (i == len) == true
for (j = 0; j < (len - i) + 1; ++j) { // Gets sextets that include data
ret += base64_chars[sextets[j]]; // Appends them to string
}
while ((j++) < 4) // Appends remaining ='s
ret += '=';
}
return ret;
}
Decode Function base64
string base64_decode(const string& input) {
string ret;
size_t i = 0;
unsigned char bytes[3];
unsigned char sextets[4];
while (i < input.size() && input[i] != '=') {
size_t j = i % 4; // index per sextet
if (is_base64(input[i])) sextets[j] = input[i++]; // set sextets with characters from string
else { cerr << "Non base64 string included in input (possibly newline)" << endl; return ""; }
if (i % 4 == 0) {
for (j = 0; j < 4; ++j) // Using j as a seperate index (not the same as it was originally used as, will later be reset)
sextets[j] = indexof(base64_chars, strlen(base64_chars), sextets[j]); // Change value to indicies of b64 characters and not ascii characters
bytes[0] = (sextets[0] << 2) + ((sextets[1] & 0x30) >> 4); // Similar bitshifting to before
bytes[1] = ((sextets[1] & 0x0f) << 4) + ((sextets[2] & 0x3c) >> 2);
bytes[2] = ((sextets[2] & 0x03) << 6) + sextets[3];
for (j = 0; j < 3; ++j) // Using j seperately again to iterate through bytes and adding them to full string
ret += bytes[j];
}
}
if (i % 4 != 0) {
for (size_t j = 0; j < (i % 4); ++j)
sextets[j] = indexof(base64_chars, strlen(base64_chars), sextets[j]);
bytes[0] = (sextets[0] << 2) + ((sextets[1] & 0x30) >> 4); // Similar bitshifting to before
bytes[1] = ((sextets[1] & 0x0f) << 4) + ((sextets[2] & 0x3c) >> 2);
for (size_t j = 0; j < (i % 4) - 1; ++j)
ret += bytes[j]; // Add final bytes
}
return ret;
}
When I try to open the files produced by the image decode function, the viewer says that the file format isn't supported or that the file has been corrupted.
The base64 produced by the encode function that I'm trying to decode is in this link
https://pastebin.com/S5D90Fs8

When you open outfile in base64_decode_image, you do not specify the ofstream::binary flag as you do in base64_encode_image when reading the image. Without that flag the stream is opened in text mode, which can alter the bytes you write (on Windows, for example, every '\n' byte is expanded to "\r\n").
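A minimal sketch of the fix, assuming the decoded bytes are held in a std::string as in the question. The helper names write_binary_file and read_binary_file are made up for this illustration and are not part of the original code:

```cpp
#include <fstream>
#include <iterator>
#include <string>

// Write raw bytes to a file with no newline translation.
// Returns true only if the file was opened and every byte was written.
bool write_binary_file(const std::string& path, const std::string& bytes) {
    std::ofstream out(path, std::ios::out | std::ios::binary | std::ios::trunc);
    if (!out) return false;
    out.write(bytes.data(), static_cast<std::streamsize>(bytes.size()));
    return out.good();
}

// Read a whole file back as raw bytes (useful for checking the round trip).
std::string read_binary_file(const std::string& path) {
    std::ifstream in(path, std::ios::in | std::ios::binary);
    return std::string(std::istreambuf_iterator<char>(in),
                       std::istreambuf_iterator<char>());
}
```

In base64_decode_image this would amount to opening the stream with ofstream::out | ofstream::binary. Comparing file sizes (or bytes in a hex editor) is a more reliable check than comparing the files in a text editor, which may hide line-ending differences.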

Related

Facing issues trying to decode base64 image

I have a JPEG image, which is represented as a base64 encoded string. I want to save it as a decoded byte array using the Win32 API WriteFile() function.
Because I will use WriteFile(), I need a C string and its length. strlen() is unsuitable because, as I understand it, it counts up to the first \0, which may not be the real end of the data. So I need a function that decodes base64, returns a char*, and outputs the exact byte count.
I have read this answer, and chose code from here (some stuff changed, I marked it):
static const unsigned char base64_table[65] =
"ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789+/";
unsigned char * base64_decode(const unsigned char *src, size_t len,
size_t *out_len)
{
unsigned char dtable[256], *out, *pos, block[4], tmp;
size_t i, count, olen;
int pad = 0;
memset(dtable, 0x80, 256); // CHANGED
for (i = 0; i < sizeof(base64_table) - 1; i++)
dtable[base64_table[i]] = (unsigned char) i;
dtable['='] = 0;
count = 0;
for (i = 0; i < len; i++) {
if (dtable[src[i]] != 0x80)
count++;
}
if (count == 0 || count % 4)
return NULL;
olen = count / 4 * 3;
pos = out = new unsigned char[olen]; // CHANGED
if (out == NULL)
return NULL;
count = 0;
for (i = 0; i < len; i++) {
tmp = dtable[src[i]];
if (tmp == 0x80)
continue;
if (src[i] == '=')
pad++;
block[count] = tmp;
count++;
if (count == 4) {
*pos++ = (block[0] << 2) | (block[1] >> 4);
*pos++ = (block[1] << 4) | (block[2] >> 2);
*pos++ = (block[2] << 6) | block[3];
count = 0;
if (pad) {
if (pad == 1)
pos--;
else if (pad == 2)
pos -= 2;
else {
/* Invalid padding */
delete[] out; // CHANGED (must match the new[] used for the allocation)
return NULL;
}
break;
}
}
}
*out_len = pos - out;
return out;
}
Usage
unsigned char base[]="data:image/jpeg;base64,/9j/4AAQSkZJRgABAQAAAQABAAD/2wCEAAkGBwgHBgkIBwgKCgkLDRYPDQwMDRsUFRAWIB0iIiAdHx8kKDQsJCYxJx8fLT0tMTU3Ojo6Iys/RD84QzQ5OjcBCgoKDQwNGg8PGjclHyU3Nzc3Nzc3Nzc3Nzc3Nzc3Nzc3Nzc3Nzc3Nzc3Nzc3Nzc3Nzc3Nzc3Nzc3Nzc3Nzc3N//AABEIAGgAaAMBIgACEQEDEQH/xAAbAAADAQEBAQEAAAAAAAAAAAAABAUGAwIBB//EAD0QAAEDAgQDBQUECAcAAAAAAAEAAgMEEQUSITEGUXMTNIGxwSJBYXGRMnKh8RQjJDNjwtHwFUJSU2Kisv/EABgBAAMBAQAAAAAAAAAAAAAAAAIDBAEA/8QAHxEAAgICAwEBAQAAAAAAAAAAAAECEQMyEiExImET/9oADAMBAAIRAxEAPwD9xU7Gu7R9QeRVFT8Z7szqDyKGfgUfSOAvewuV8CXqMxksdvcp2POwniBt2gXUPbz05qZK3RIGufHT1UWb2mxksv7vcs5Ubxsh4xxVVSYg5lNUugiBOQNNtB/mctdwvWS1mGj9Im7aaM5XPtq4czb3r82w7OcQc9oLpHRsDbi+tidR4rccEvzurezLHQjs2te03Dy1uUu+ungEuEvofkxpQNLM0OYQpV+zeD72aeCsEAhSZ2WnLTs7T5JkiaIw92anB+Ca4P7vVj+P6BToHE01ju3QqlwiLQ1nW/lCPHsZPVl9CEKgQCQxju7PvjyKfSGMd3Z1B5FDPw2PpJXyZmdmm41svq9ZsjS7kpx4lIw5dRZRpqF888nZEaNym/MqzPITdx3UHDcWa/GX0WRzZM5sdw/2QfDZBSbpjI3VoZwnh6CkkM8gMk7iC5ztibW22Vijw+mw8CanpoYAXbRxhoPPQL3me54ttde8cfMMInfT2a+Jl72vbUE+Nrp/FRV0KcpSdNjlxfTT4ckhiDBv4grlhMufD4ZG7Ea/PmmKhwdH+CVytWFxcXQlE62ccwCq/CotHWdb+UKNcD3a7K1wv+6qjzlHkEWPYHJqXEIQqRAJDGO7s++PIp9T8Z7uzqDyKGepsfSWuEz7uy8l2GqWkPtu+amZQheqIbG4/BZnBW07MSdiNTOGXeWxC+5Itc+BHiVbxqXssPqH8oz5LK00Z7KzzZoO43uNB4aXWRVyGXxgzfUlRBq7t2HLuQbhq6YliFP/AIeWU0sU5ma5vsOzDkb/AFX55hVQG403tjEIQ4tc4kgjQWt42+C3OEUMEZe4DOb5sxNy48zzTJSdcULUa+mdcPYaaibDktfYckOuN0zLpvoFwN36ZbDml1XRt27FJPeQrXCutPUH+IP/ACFFnblJCtcKd2qOr6BFi2Byal1CEKonBT8Z7szqDyKoKfjXdmdQeRQz1YUfSRewNkrqd910mfYBvNcwCN91KyhEbiV9qER3t2jw38VHojDURvETs8bnFwd8/aHmm+NH2pYWt+0ZWgW+an4OI45yIgOymBc0DYEHKR/1CZiXfI6bXFIIaSWMySSMa6ITFjJCP+IOv1/BbbBH5qZvMCyn0tIKjBqqID2u0L2/MAfku3Dkl4i3f3o+V2KaLD7bnVLvu466AJh+qXkSWMQrUG4VnhTus/V9Aok+xVrhPutR1fQLcexmTUuoQhVE4KbjndWdQeRVJTcd7pH1B5FDPVhR9RIbYpd7rucV1aUsTvdSMpMdxRO+XG4KYEBjWnUnZxG/0SOKwTQ0j5KKR8b6R7ZBlNs8bi1rx9crvzXrH2ifiOSN1j7B08GL469PUwxaEVOeFwDbWBY4+YCoS+ALqZsuDw9mEMdISXSPc/X+/gpbsRnwniCWgLGsjdZ0UltXNP8AZHgq3D0magBb9nN7PysElxzS9pQQVzG/rKWQXI3DXaH8bKW3x6KEo/0aZpg8OaHN2IuFzekMDqjU4bG8nVO
PNhcok7QpqpUJ1BKt8Im9LUdX0Cz9XMLezqrnBhcaSpLv930CPHuDk0NEhCFUTApmPn9kj6o8iqalcQ90j6o8ihnqwobIigpZx3XYFLSmxcFIypGGxQOPFE8hBDWs3tzA/ovdS/tcUoB/oka5w+84DyB+qpY7BFZ0shs47G6z+H1OeqfK++Vj22dblsn45p9AThXZseFayJ0D6XN+tike0j7psdfp9Vdq2Q1FNLTz6xytLHD5rH4HUNGKYk1pykSMeG32uwAnx9VoGTgblTvroa1bsS4WfJDSz08n24XljtPeDZVXHNqdVJpGhuN1pbK4h2V2TSwu0fD4KndCjZ9uznMNFoODxakqOr6BZ+YrQ8Id0qOr6BNxbisuhfQhCrJQUniPucfVHkV9QhnqwobIzbpWNOpueQS8rjI4lpy38UIUZYkJVGHRVH74lx5lJs4ep2ZsksoDjctBsEIWr8OZ9pOHqakqv0mG3aC+UuaCW33sd9U3LSTEXY8k/ByELGjbJeFVFWcXmY6GEtLhmPbHO0DS+W3w95WkuhCw2XpymOi0fBx/ZKjq+gQhMxbicuhoEIQqyU//2Q==";
unsigned char *g = base64_decode(base, 2568, &re); // length is appearing when you hover mouse on char[] in Visual Studio
// after call re equals 1921
HANDLE f2 = CreateFile(L"img.jpeg", GENERIC_WRITE, 0, 0, CREATE_ALWAYS, FILE_ATTRIBUTE_NORMAL, NULL);
DWORD wr2;
WriteFile(f2, g, re, &wr2, 0);
CloseHandle(f2);
The file I am getting is not viewable; the Photos app says it is corrupted. The main problem: it weighs 1.87 KB, but should be 2.31 KB (the size I get when I download this image from a browser).
What am I doing wrong?
As @IngoLeonhardt pointed out, I should not pass the data:image/jpeg;base64, part to the function. Now it works.
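The header matters because most of its characters ("data", "image", "/jpeg", "base64") are themselves valid base64 characters, so the decoder treats them as payload. One possible way to drop the header before decoding is sketched below; strip_data_uri_prefix is a hypothetical helper name, and the sketch assumes the payload always follows the ";base64," marker:

```cpp
#include <cstddef>
#include <string>

// Return only the base64 payload of a data URI such as
// "data:image/jpeg;base64,/9j/4AAQ...".
// Input that does not look like a base64 data URI is returned unchanged.
std::string strip_data_uri_prefix(const std::string& input) {
    const std::string marker = ";base64,";
    if (input.compare(0, 5, "data:") != 0) return input;  // no "data:" scheme
    const std::size_t pos = input.find(marker);
    if (pos == std::string::npos) return input;           // not base64-encoded
    return input.substr(pos + marker.size());
}
```

The result can then be passed to base64_decode together with its own length rather than the length of the full string.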

Different base64 encoding for same Image

I need to convert base64 to an OpenCV cv::Mat. So I took a set of 10 frames from a video and appended their base64 encodings (A) to a text file (using base64.b64encode in Python).
After that, I used the code from here to convert that base64 to Mat. Images looked fine.
But they have a larger file size than the original images, and when I encoded these final images to base64 (B) (again using base64.b64encode in Python), the result was different from the original base64 (A). I can't understand why. This also affects the output of my application, which uses the cv::Mat output.
For base64 to Mat I am using the code from here (as posted above).
Edited: The following is my Python script to convert a set of JPEGs to base64 (.txt):
import base64
import glob

def img2txt(file_directory):
    imageFiles = glob.glob(file_directory + "/*.jpg")
    imageFiles.sort()
    fileWrite = 'base64encoding.txt'
    for path in imageFiles:
        # Read each JPEG as raw bytes and append its base64 text as one line
        with open(path, 'rb') as image:
            image_64_encode = base64.b64encode(image.read())
        with open(fileWrite, 'a') as f:
            f.write(image_64_encode.decode('ascii') + '\n')
base64decode function : from here
static const std::string base64_chars =
"ABCDEFGHIJKLMNOPQRSTUVWXYZ"
"abcdefghijklmnopqrstuvwxyz"
"0123456789+/";
static inline bool is_base64(unsigned char c) {
return (isalnum(c) || (c == '+') || (c == '/'));
}
std::string base64_decode(std::string const& encoded_string) {
int in_len = encoded_string.size();
int i = 0;
int j = 0;
int in_ = 0;
unsigned char char_array_4[4], char_array_3[3];
std::string ret;
while (in_len-- && (encoded_string[in_] != '=') && is_base64(encoded_string[in_])) {
char_array_4[i++] = encoded_string[in_]; in_++;
if (i == 4) {
for (i = 0; i < 4; i++)
char_array_4[i] = base64_chars.find(char_array_4[i]);
char_array_3[0] = (char_array_4[0] << 2) + ((char_array_4[1] & 0x30) >> 4);
char_array_3[1] = ((char_array_4[1] & 0xf) << 4) + ((char_array_4[2] & 0x3c) >> 2);
char_array_3[2] = ((char_array_4[2] & 0x3) << 6) + char_array_4[3];
for (i = 0; (i < 3); i++)
ret += char_array_3[i];
i = 0;
}
}
if (i) {
/*for (j = i; j < 4; j++)
char_array_4[j] = 0;*/
for (j = 0; j < i; j++)
char_array_4[j] = base64_chars.find(char_array_4[j]);
char_array_3[0] = (char_array_4[0] << 2) + ((char_array_4[1] & 0x30) >> 4);
char_array_3[1] = ((char_array_4[1] & 0xf) << 4) + ((char_array_4[2] & 0x3c) >> 2);
//char_array_3[2] = ((char_array_4[2] & 0x3) << 6) + char_array_4[3];
for (j = 0; (j < i - 1); j++)
ret += char_array_3[j];
}
return ret;
}
C++ Main function:
int main()
{
ifstream in("TestBase64.txt");
if(!in) {
cout << "Cannot open input file.\n";
return 1;
}
int i=0;
string encoded_string;
while (getline(in, encoded_string))
{
string decoded_string = base64_decode(encoded_string);
vector<uchar> data(decoded_string.begin(), decoded_string.end());
cv::imwrite("/Frames_from_B_to_Mat/Frames_from_B_to_Mat"+b+".jpg");
i++;
}
return 0;
}
On the line:
vector<uchar> data(decoded_string.begin(), decoded_string.end());
you are presumably holding the JPEG-encoded representation of one image in data. So you may as well just write this to a binary file, rather than use cv::imwrite() which is for writing a Mat to a file.
If, for some inexplicable reason, you want to use cv::imwrite(), you need to pass it a Mat. So you would end up decoding the JPEG representation to a Mat and then encoding to JPEG and writing - which seems silly:
cv::Mat img = cv::imdecode(data, cv::IMREAD_COLOR);
cv::imwrite("result.jpg", img);
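TL;DR:
What I'm saying is that your data is already JPEG-encoded: you read it from a JPEG file. Decoding it and re-encoding it with cv::imwrite() produces a new JPEG with its own compression settings, which is why the file size and the resulting base64 differ from the original.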
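Writing the bytes out directly could look like the sketch below. It assumes data holds the decoded JPEG bytes exactly as in the question's loop; save_jpeg_bytes is a made-up name for this example, and unsigned char stands in for OpenCV's uchar:

```cpp
#include <fstream>
#include <string>
#include <vector>

// Dump already-encoded JPEG bytes to disk unchanged, so the file stays
// byte-identical to the image that was base64-encoded in the first place.
bool save_jpeg_bytes(const std::string& path,
                     const std::vector<unsigned char>& data) {
    std::ofstream out(path, std::ios::binary);
    if (!out) return false;
    out.write(reinterpret_cast<const char*>(data.data()),
              static_cast<std::streamsize>(data.size()));
    return out.good();
}
```

Because nothing is re-compressed, encoding the saved file to base64 again should reproduce the original string.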

C++: How to encode a std::string into base64 without losing NUL character

I am working on getting someone else's code up and running. The code is written in C++. The part that is failing is when it converts a std::string to base64:
std::string tmp = "\0";
tmp.append(strUserName);
tmp.append("\0");
tmp.append(strPassword);
tmp = base64_encode(tmp.c_str(), tmp.length());
where base64 is:
std::string base64_encode(char const* bytes_to_encode, unsigned int in_len) {
std::string ret;
int i = 0;
int j = 0;
unsigned char char_array_3[3];
unsigned char char_array_4[4];
while (in_len--) {
char_array_3[i++] = *(bytes_to_encode++);
if (i == 3) {
char_array_4[0] = (char_array_3[0] & 0xfc) >> 2;
char_array_4[1] = ((char_array_3[0] & 0x03) << 4) + ((char_array_3[1] & 0xf0) >> 4);
char_array_4[2] = ((char_array_3[1] & 0x0f) << 2) + ((char_array_3[2] & 0xc0) >> 6);
char_array_4[3] = char_array_3[2] & 0x3f;
for(i = 0; (i <4) ; i++)
ret += base64_chars[char_array_4[i]];
i = 0;
}
}
if (i)
{
for(j = i; j < 3; j++)
char_array_3[j] = '\0';
char_array_4[0] = (char_array_3[0] & 0xfc) >> 2;
char_array_4[1] = ((char_array_3[0] & 0x03) << 4) + ((char_array_3[1] & 0xf0) >> 4);
char_array_4[2] = ((char_array_3[1] & 0x0f) << 2) + ((char_array_3[2] & 0xc0) >> 6);
char_array_4[3] = char_array_3[2] & 0x3f;
for (j = 0; (j < i + 1); j++)
ret += base64_chars[char_array_4[j]];
while((i++ < 3))
ret += '=';
}
return ret;
}
It uses the 'tmp' string to make a call to a server and it's imperative that the base64 string has the two NUL characters embedded within it (before strUserName and before strPassword). However, it seems that since the code is passing tmp as a c_str(), the NUL characters are being stripped. Is there a good solution for this? Thanks.
Update: I should add that the code includes "#include <asm/errno.h>", which I googled and could not find for macOS, so I just commented it out. I doubt that is what is breaking things, but full disclosure.
std::string tmp = "\0"; and tmp.append("\0"); don't add any '\0' characters to tmp. The versions of std::string::string and std::string::append that take a const char* take a NUL-terminated C-style string, so they stop as soon as they see a NUL character.
To actually add a NUL character to your string, you'll need to use the constructor and append methods that take a length along with a const char*, or the versions that take a count and a char:
std::string tmp("\0", 1);
tmp.append(strUserName);
tmp.append("\0", 1);
tmp.append(strPassword);
tmp = base64_encode(tmp.c_str(), tmp.length());
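The difference between the overloads can be checked with a small sketch. build_credentials is a hypothetical helper written for this illustration; it uses the count-and-char constructor and push_back, which are equivalent alternatives to the ("\0", 1) calls above:

```cpp
#include <string>

// Build "\0<user>\0<pass>" with both NUL bytes really embedded.
std::string build_credentials(const std::string& user, const std::string& pass) {
    std::string tmp(1, '\0');  // count + char overload: exactly one NUL byte
    tmp += user;
    tmp.push_back('\0');       // appending a single char never stops at NUL
    tmp += pass;
    return tmp;
}
```

For example, build_credentials("ab", "cd") has length 6, whereas std::string("\0") is empty. Passing tmp.c_str() together with tmp.length() to base64_encode is fine, because the explicit length is what tells the encoder to keep reading past the embedded NULs.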

C++ ShiftJIS to UTF8 conversion

I need to convert double-byte characters, in my case Shift-JIS, into something easier to handle, preferably with standard C++.
The following question ended without a workaround:
Doublebyte encodings on MSVC (std::codecvt): Lead bytes not recognized
So does anyone have a suggestion or a reference on how to handle this conversion with the C++ standard library?
Normally I would recommend using the ICU library, but for this alone, using it is way too much overhead.
First, a conversion function that takes a std::string with Shift-JIS data and returns a std::string with UTF-8 (note 2019: no idea anymore if it works :))
It uses a uint8_t array of 25088 elements (25088 bytes), referred to as convTable in the code. The function does not fill this array; you have to load it first, e.g. from a file. The second code part below is a program that can generate that file.
The conversion function doesn't check whether the input is valid Shift-JIS data.
std::string sj2utf8(const std::string &input)
{
std::string output(3 * input.length(), ' '); //ShiftJis won't give 4byte UTF8, so max. 3 byte per input char are needed
size_t indexInput = 0, indexOutput = 0;
while(indexInput < input.length())
{
char arraySection = ((uint8_t)input[indexInput]) >> 4;
size_t arrayOffset;
if(arraySection == 0x8) arrayOffset = 0x100; //these are two-byte shiftjis
else if(arraySection == 0x9) arrayOffset = 0x1100;
else if(arraySection == 0xE) arrayOffset = 0x2100;
else arrayOffset = 0; //this is one byte shiftjis
//determining real array offset
if(arrayOffset)
{
arrayOffset += (((uint8_t)input[indexInput]) & 0xf) << 8;
indexInput++;
if(indexInput >= input.length()) break;
}
arrayOffset += (uint8_t)input[indexInput++];
arrayOffset <<= 1;
//unicode number is...
uint16_t unicodeValue = (convTable[arrayOffset] << 8) | convTable[arrayOffset + 1];
//converting to UTF8
if(unicodeValue < 0x80)
{
output[indexOutput++] = unicodeValue;
}
else if(unicodeValue < 0x800)
{
output[indexOutput++] = 0xC0 | (unicodeValue >> 6);
output[indexOutput++] = 0x80 | (unicodeValue & 0x3f);
}
else
{
output[indexOutput++] = 0xE0 | (unicodeValue >> 12);
output[indexOutput++] = 0x80 | ((unicodeValue & 0xfff) >> 6);
output[indexOutput++] = 0x80 | (unicodeValue & 0x3f);
}
}
output.resize(indexOutput); //remove the unnecessary bytes
return output;
}
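The UTF-8 step at the end of the loop can be looked at in isolation. The following is a sketch of just that step, not the author's code: utf8_from_bmp is a made-up name, it masks the middle byte with (cp >> 6) & 0x3F, which yields the same bits as the (unicodeValue & 0xfff) >> 6 form above, and it does not reject surrogate values (0xD800 to 0xDFFF):

```cpp
#include <cstdint>
#include <string>

// Encode one code point from the Basic Multilingual Plane as UTF-8.
// 1 byte below 0x80, 2 bytes below 0x800, otherwise 3 bytes.
std::string utf8_from_bmp(std::uint16_t cp) {
    std::string out;
    if (cp < 0x80) {
        out += static_cast<char>(cp);
    } else if (cp < 0x800) {
        out += static_cast<char>(0xC0 | (cp >> 6));          // 110xxxxx
        out += static_cast<char>(0x80 | (cp & 0x3F));        // 10xxxxxx
    } else {
        out += static_cast<char>(0xE0 | (cp >> 12));         // 1110xxxx
        out += static_cast<char>(0x80 | ((cp >> 6) & 0x3F)); // 10xxxxxx
        out += static_cast<char>(0x80 | (cp & 0x3F));        // 10xxxxxx
    }
    return out;
}
```

For instance, U+3042 (hiragana "a") becomes the three bytes E3 81 82, and U+00E9 becomes C3 A9.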
About the helper file: I used to have a download here, but nowadays I only know unreliable file hosters. So... either http://s000.tinyupload.com/index.php?file_id=95737652978017682303 works for you, or:
First download the "original" data from ftp://ftp.unicode.org/Public/MAPPINGS/OBSOLETE/EASTASIA/JIS/SHIFTJIS.TXT . I can't paste this here because of the length, so we have to hope at least unicode.org stays online.
Then use this program while piping/redirecting above text file in, and redirecting the binary output to a new file. (Needs a binary-safe shell, no idea if it works on Windows).
#include <iostream>
#include <string>
#include <cstdio>
#include <cstdint> // uint8_t, uint16_t
using namespace std;
// pipe SHIFTJIS.txt in and pipe to (binary) file out
int main()
{
string s;
uint8_t *mapping; //same bigendian array as in converting function
mapping = new uint8_t[2*(256 + 3*256*16)];
//initializing with space for invalid value, and then ASCII control chars
for(size_t i = 32; i < 256 + 3*256*16; i++)
{
mapping[2 * i] = 0;
mapping[2 * i + 1] = 0x20;
}
for(size_t i = 0; i < 32; i++)
{
mapping[2 * i] = 0;
mapping[2 * i + 1] = i;
}
while(getline(cin, s)) //pipe the file SHIFTJIS to stdin
{
if(s.substr(0, 2) != "0x") continue; //comment lines
uint16_t shiftJisValue, unicodeValue;
if(2 != sscanf(s.c_str(), "%hx %hx", &shiftJisValue, &unicodeValue)) //getting hex values
{
puts("Error hex reading");
continue;
}
size_t offset; //array offset
if((shiftJisValue >> 8) == 0) offset = 0;
else if((shiftJisValue >> 12) == 0x8) offset = 256;
else if((shiftJisValue >> 12) == 0x9) offset = 256 + 16*256;
else if((shiftJisValue >> 12) == 0xE) offset = 256 + 2*16*256;
else
{
puts("Error input values");
continue;
}
offset = 2 * (offset + (shiftJisValue & 0xfff));
if(mapping[offset] != 0 || mapping[offset + 1] != 0x20)
{
puts("Error mapping not 1:1");
continue;
}
mapping[offset] = unicodeValue >> 8;
mapping[offset + 1] = unicodeValue & 0xff;
}
fwrite(mapping, 1, 2*(256 + 3*256*16), stdout);
delete[] mapping;
return 0;
}
Notes:
Two-byte big endian raw unicode values (more than two byte not necessary here)
First 256 chars (512 byte) for the single byte ShiftJIS chars, value 0x20 for invalid ones.
Then 3 * 256*16 chars for the groups 0x8???, 0x9??? and 0xE???
= 25088 byte
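The layout in these notes can be written down as a small index calculation. This is only a sketch of the arithmetic used by both programs above; the constant names and table_byte_offset are invented for illustration:

```cpp
#include <cstddef>
#include <cstdint>

constexpr std::size_t kSingleByteEntries = 256;       // one-byte Shift-JIS
constexpr std::size_t kEntriesPerGroup   = 256 * 16;  // 0x8???, 0x9???, 0xE???
constexpr std::size_t kTableBytes =
    2 * (kSingleByteEntries + 3 * kEntriesPerGroup);  // two bytes per entry
static_assert(kTableBytes == 25088, "matches the size given in the notes");

// Byte offset of the big-endian entry for one character.
// For single-byte characters only `lead` is used and `trail` is ignored.
std::size_t table_byte_offset(std::uint8_t lead, std::uint8_t trail) {
    std::size_t base;
    switch (lead >> 4) {
        case 0x8: base = kSingleByteEntries; break;
        case 0x9: base = kSingleByteEntries + kEntriesPerGroup; break;
        case 0xE: base = kSingleByteEntries + 2 * kEntriesPerGroup; break;
        default:  return 2 * static_cast<std::size_t>(lead);
    }
    // The low nibble of the lead byte selects a 256-entry page in the group.
    const std::size_t entry = base + ((lead & 0x0Fu) << 8) + trail;
    return 2 * entry;
}
```

The largest possible offset (lead 0xEF, trail 0xFF) is 25086, so reading that entry and the byte after it stays inside the 25088-byte table.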
For those looking for the Shift-JIS conversion table data, you can get the uint8_t array here:
https://github.com/bucanero/apollo-ps3/blob/master/include/shiftjis.h
Also, here's a very simple function to convert basic Shift-JIS chars to ASCII:
const char SJIS_REPLACEMENT_TABLE[] =
" ,.,..:;?!\"*'`*^"
"-_????????*---/\\"
"~||--''\"\"()()[]{"
"}<><>[][][]+-+X?"
"-==<><>????*'\"CY"
"$c&%#&*#S*******"
"*******T><^_'='";
//Convert Shift-JIS characters to ASCII equivalent
void sjis2ascii(char* bData)
{
uint16_t ch;
int i, j = 0;
int len = strlen(bData);
for (i = 0; i < len; i += 2)
{
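ch = ((unsigned char)bData[i] << 8) | (unsigned char)bData[i+1]; // cast first: plain char may be signed, and sign extension would corrupt ch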
// 'A' .. 'Z'
// '0' .. '9'
if ((ch >= 0x8260 && ch <= 0x8279) || (ch >= 0x824F && ch <= 0x8258))
{
bData[j++] = (ch & 0xFF) - 0x1F;
continue;
}
// 'a' .. 'z'
if (ch >= 0x8281 && ch <= 0x829A)
{
bData[j++] = (ch & 0xFF) - 0x20;
continue;
}
if (ch >= 0x8140 && ch <= 0x81AC)
{
bData[j++] = SJIS_REPLACEMENT_TABLE[(ch & 0xFF) - 0x40];
continue;
}
if (ch == 0x0000)
{
//End of the string
bData[j] = 0;
return;
}
// Character not found
bData[j++] = bData[i];
bData[j++] = bData[i+1];
}
bData[j] = 0;
return;
}

C++ - zlib uncompress function crashes program

I should note that this program is adhering (at least, trying to) to the Tiled API.
I'm trying to use the uncompress() function in zlib, but for some reason my program crashes whenever I call the function. This is what I have, and all of the parameters look right, so I'm not really sure what the problem is.
// const char* filedata passed in function is Zlib compressed and Base64 encoded
uLong inLen = static_cast<uLong>((strlen(filedata)*6)/8); // Calculate the length
std::string inBuffer = BASE64_DECODE(filedata); // My data
uLongf outLen = static_cast<uLongf>(width*height*4); // Tiled API specification
Bytef* outBuffer = new Bytef(outLen); // Destination
int ret = uncompress(outBuffer, &outLen,
reinterpret_cast<Bytef*>(&inBuffer[0]), inLen);
ret returns nothing, the program crashes. Does anybody have any ideas? Here is the BASE64_DECODE function:
std::string BASE64_DECODE(std::string const& encoded_string)
{
int in_len = encoded_string.size();
int i = 0;
int j = 0;
int in_ = 0;
unsigned char char_array_4[4], char_array_3[3];
std::string ret;
while(in_len-- && ( encoded_string[in_] != '=') && is_base64(encoded_string[in_]))
{
char_array_4[i++] = encoded_string[in_]; in_++;
if (i == 4)
{
for (i = 0; i <4; i++)
char_array_4[i] = base64_chars.find(char_array_4[i]);
char_array_3[0] = (char_array_4[0] << 2) + ((char_array_4[1] & 0x30) >> 4);
char_array_3[1] = ((char_array_4[1] & 0xf) << 4) + ((char_array_4[2] & 0x3c) >> 2);
char_array_3[2] = ((char_array_4[2] & 0x3) << 6) + char_array_4[3];
for (i = 0; (i < 3); i++)
ret += char_array_3[i];
i = 0;
}
}
if(i)
{
for (j = i; j <4; j++)
char_array_4[j] = 0;
for (j = 0; j <4; j++)
char_array_4[j] = base64_chars.find(char_array_4[j]);
char_array_3[0] = (char_array_4[0] << 2) + ((char_array_4[1] & 0x30) >> 4);
char_array_3[1] = ((char_array_4[1] & 0xf) << 4) + ((char_array_4[2] & 0x3c) >> 2);
char_array_3[2] = ((char_array_4[2] & 0x3) << 6) + char_array_4[3];
for (j = 0; (j < i - 1); j++) ret += char_array_3[j];
}
return ret;
}
EDIT: If you're looking at this in the future, make sure you release outBuffer (with delete[]) later in the program to prevent a memory leak.
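new Bytef(outLen) does not allocate outLen elements: the parentheses initialize a single Bytef with the value outLen, so uncompress() then writes far past the end of a one-byte allocation. You probably meant Bytef* outBuffer = new Bytef[outLen]; (square brackets).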
I think Bytef is usually typedef-ed to some primitive type anyway, so using it is analogous to something like new int[len] or new uint64_t[len].
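A sketch of the same idea without manual memory management is shown below. It deliberately does not include zlib: ByteT is a stand-in alias for Bytef (assumed here to be unsigned char), and make_tile_buffer is a hypothetical helper name:

```cpp
#include <cstddef>
#include <vector>

using ByteT = unsigned char;  // stand-in for zlib's Bytef (assumption)

// Allocate width*height*4 zero-initialized bytes for the uncompressed tiles.
// A std::vector releases its memory automatically, so no delete[] is needed.
std::vector<ByteT> make_tile_buffer(std::size_t width, std::size_t height) {
    return std::vector<ByteT>(width * height * 4);
}
```

With a buffer like this, the destination pointer passed to uncompress() would be buffer.data() and the initial outLen would be buffer.size(). It is probably also safer to pass inBuffer.size() as the source length instead of estimating it from strlen(filedata), since the decoded string already knows its exact size.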