Is java.util.zip.GZIPOutputStream's output byte array portable?

I have the following code to serialize and compress a String:
private byte[] toZip(String xml) {
    try {
        ByteArrayOutputStream bos = new ByteArrayOutputStream();
        GZIPOutputStream gz = new GZIPOutputStream(bos);
        ObjectOutputStream oos = new ObjectOutputStream(gz);
        oos.writeObject(xml);
        oos.flush();
        oos.close();
        return bos.toByteArray();
    } catch (IOException e) {
        log.error("Error", e);
        if (log.isEnabledFor(MucamLogger.FINEST)) log.finest(xml);
        return null;
    }
}
Is the returned byte[] portable? I store it in a BLOB field in the database. Could it be retrieved and decompressed by any non-Java program (C++, .NET)? Would that non-Java program recover the original String text?

Yes, gzip streams are portable, and the original uncompressed data will be recovered exactly, assuming of course that you are faithfully transporting the binary compressed data. I often see that get messed up by end-of-line conversions or Unicode text conversions that could easily be avoided.
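
One point the answer leaves implicit: toZip() above wraps the string in an ObjectOutputStream, so what a C++ or .NET program gets back after gunzip is a Java serialization stream (it begins with the magic bytes 0xAC 0xED, and the string follows with a type tag and a length prefix in Java's modified UTF-8), not the plain XML text. The gzip layer is portable, but the reader would still have to parse that format. If the goal is text any program can read, a minimal sketch is to gzip the UTF-8 bytes directly. The class and method names are illustrative, and readAllBytes() assumes Java 9 or later:

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.util.zip.GZIPInputStream;
import java.util.zip.GZIPOutputStream;

class GzipText {
    // Compress the UTF-8 bytes of the text; no Java serialization header is written.
    static byte[] gzipUtf8(String text) {
        ByteArrayOutputStream bos = new ByteArrayOutputStream();
        try (GZIPOutputStream gz = new GZIPOutputStream(bos)) {
            gz.write(text.getBytes(StandardCharsets.UTF_8));
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        // Closing the GZIPOutputStream (end of the try block) writes the gzip trailer.
        return bos.toByteArray();
    }

    // Reverse operation: gunzip, then interpret the bytes as UTF-8.
    static String gunzipUtf8(byte[] compressed) {
        try (GZIPInputStream in = new GZIPInputStream(new ByteArrayInputStream(compressed))) {
            return new String(in.readAllBytes(), StandardCharsets.UTF_8);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        byte[] blob = gzipUtf8("<root>h\u00e9llo</root>");
        // Every gzip member starts with the magic bytes 0x1f 0x8b.
        System.out.printf("%02x %02x%n", blob[0] & 0xff, blob[1] & 0xff);
        System.out.println(gunzipUtf8(blob));
    }
}
```

On the reading side, any gzip implementation (zlib in gzip mode for C++, System.IO.Compression.GZipStream in .NET) should then yield the UTF-8 bytes of the original string, provided the BLOB column stores the bytes unchanged.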

Related

Cannot decode base64+deflate data

I am stuck decoding the following Base64 string:
nVJPb4IwFL/7KUjvAgUM8CIuZiabicsSNR68deXhWKBteGVx336FbJnz4MG+U997/f1L5yTaxsBGn3Rvt0hGK0LPO7eNIhhnBes7BVpQTaBEiwRWwm75soHID8F02mqpGzZZrwpGScZjkUgpMolpFCfRLH/DPKlmaZXGMkqrMq/CMi6Zd8COaq0K5lCYtybqca3ICmVdK+TZlIfTONxzDtEMeHZk3grJ1krY8dW7tQaCgEepH7rikLoTEHaf2AWNPtXqodUlFonDVr++9rpgH1jq82BsusT8eWPa1yd9RLHdf7HFZD4MYBTTXWRwOwJBjnZQxRaDKnKy6tL4RFrWnWzQl7qdBxfIPzwGdlbYnu4I+wrh0Tm9A8U7iKbH28s0EsCulxKJBuLgmvm693f//6sW3w==
It should be valid Base64 data representing the deflated original XML. When I try the online decoder at https://www.samltool.com/decode.php, it gives me the proper XML.
I am doing these two steps:
string text = MyClass::decode_base64(input);
text = MyClass::stringDeflateDecode(text);
First I decode the base64 string:
string MyClass::decode_base64(string str)
{
    using namespace boost::archive::iterators;
    typedef transform_width<binary_from_base64<remove_whitespace<string::const_iterator> >, 8, 6> ItBinaryT;
    try {
        boost::erase_all(str, "\r");
        boost::erase_all(str, "\n");
        // If the input isn't a multiple of 4, pad with =
        size_t num_pad_chars((4 - str.size() % 4) % 4);
        str.append(num_pad_chars, '=');
        size_t pad_chars(std::count(str.begin(), str.end(), '='));
        std::replace(str.begin(), str.end(), '=', 'A'); // replace '=' by base64 encoding of '\0'
        string output(ItBinaryT(str.begin()), ItBinaryT(str.end()));
        output.erase(output.end() - pad_chars, output.end());
        return output;
    } catch (...) {
        return string("");
    }
}
The code is basically taken from "Decode Base64 String Using Boost", and it was working fine for plain-text Base64 decoding (no binary deflate data).
Then I want to inflate the deflate data:
string MyClass::stringDeflateDecode(const std::string& data)
{
    stringstream compressed(data);
    stringstream decompressed;
    boost::iostreams::filtering_streambuf<boost::iostreams::input> in;
    in.push(boost::iostreams::zlib_decompressor());
    in.push(compressed);
    boost::iostreams::copy(in, decompressed);
    return decompressed.str();
}
but the boost::iostreams::copy call throws an exception: "zlib error: iostream error".
Thanks for any hints!

That is Base-64 encoded raw deflate data. That means compressed data in the deflate format, but no zlib nor gzip wrapper around that deflate data. It looks like zlib_decompressor has a noheader option that you should set to true.

Wikipedia specifies:
SAML requests or responses transmitted via HTTP Redirect have a SAMLRequest or SAMLResponse query string parameter, respectively. Before it's sent, the message is deflated (without header and checksum), base64-encoded, and URL-encoded, in that order. Upon receipt, the process is reversed to recover the original message.
The problem here is the absence of the header and checksum. I don't think Boost has the library functions you need.
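
For comparison, here is a minimal Java sketch of the same two steps: Base64-decode, then inflate raw deflate data with no zlib header and no Adler-32 trailer. In java.util.zip the counterpart of the noheader option mentioned in the first answer is the nowrap flag of Inflater and Deflater. The class and method names are hypothetical, and encode() exists only to produce test input in the format the Wikipedia quote describes, minus the URL-encoding step:

```java
import java.io.ByteArrayOutputStream;
import java.nio.charset.StandardCharsets;
import java.util.Arrays;
import java.util.Base64;
import java.util.zip.DataFormatException;
import java.util.zip.Deflater;
import java.util.zip.Inflater;

class RawDeflateBase64 {
    // Base64-decode, then inflate raw deflate data (no zlib header, no checksum).
    static String decode(String base64) {
        // The MIME decoder ignores line breaks and other non-alphabet characters.
        byte[] compressed = Base64.getMimeDecoder().decode(base64);
        // The Inflater Javadoc asks for one extra dummy input byte in nowrap mode.
        byte[] input = Arrays.copyOf(compressed, compressed.length + 1);
        Inflater inflater = new Inflater(true); // nowrap = true: raw deflate
        try {
            inflater.setInput(input);
            ByteArrayOutputStream out = new ByteArrayOutputStream();
            byte[] buf = new byte[4096];
            while (!inflater.finished()) {
                int n = inflater.inflate(buf);
                if (n == 0 && !inflater.finished()
                        && (inflater.needsInput() || inflater.needsDictionary())) {
                    throw new IllegalArgumentException("truncated or unsupported deflate stream");
                }
                out.write(buf, 0, n);
            }
            return new String(out.toByteArray(), StandardCharsets.UTF_8);
        } catch (DataFormatException e) {
            throw new IllegalArgumentException("not valid raw deflate data", e);
        } finally {
            inflater.end();
        }
    }

    // Inverse direction: raw deflate (nowrap = true) followed by Base64.
    static String encode(String text) {
        Deflater deflater = new Deflater(Deflater.DEFAULT_COMPRESSION, true);
        try {
            deflater.setInput(text.getBytes(StandardCharsets.UTF_8));
            deflater.finish();
            ByteArrayOutputStream out = new ByteArrayOutputStream();
            byte[] buf = new byte[4096];
            while (!deflater.finished()) {
                out.write(buf, 0, deflater.deflate(buf));
            }
            return Base64.getEncoder().encodeToString(out.toByteArray());
        } finally {
            deflater.end();
        }
    }

    public static void main(String[] args) {
        String b64 = encode("<samlp:Response>...</samlp:Response>");
        System.out.println(b64);
        System.out.println(decode(b64));
    }
}
```

With nowrap left at false, the question's input would be rejected outright: its first two bytes after Base64 decoding are 0x9D 0x52, which do not form a valid zlib header.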

URL encoding in C++ and decoding in nodejs

I am transferring data from a C++ client to a nodejs server.
I compress the string using zlib deflate first, then I use curl_easy_escape to URL-encode the compressed string.
std::string s = zlib_compress(temp.str());
std::cout << s <<"\n";
CURL *handle = curl_easy_init();
char* o = curl_easy_escape(handle, s.data(), s.size());
std::cout << o <<"\n";
Then I send it using:
std::string bin(o);
curl_easy_setopt(handle, CURLOPT_POSTFIELDSIZE, bin.size());
curl_easy_setopt(handle, CURLOPT_POSTFIELDS, bin.data());
curl_easy_perform(handle);
When I run this, I get the output:
x??с??Ҵ4?
x%DA%D3%D1%81%80%E2%92%D2%B44%1D%03%00%1BW%03%E5
Now, I receive the second encoded string on my nodejs server as it is.
I now try to decode it.
var x = req.params;
for (var key in req.body)
{
    console.log(key);
    var x = unescape(key);
    var buffer = new Buffer(x);
    console.log(x);
    zlib.inflate(buffer, function(err, buffer) {
        console.log(err+" here");
    });
}
Which outputs:
x%DA%D3%D1%81%80%E2%92%D2%B44%1D%03%00%1BW%03%E5
xÚÓÑâÒ´4å
Error: incorrect header check here
What is the problem here? How do I debug it?

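You can debug it by printing the decimal value of each byte of the compressed string in both the C++ and the node.js code. For C++ that code would be:
for (std::size_t i = 0; i < s.size(); i++) {
    // cast through unsigned char so bytes >= 0x80 print as 128..255, not as negative numbers
    std::cout << static_cast<int>(static_cast<unsigned char>(s[i])) << ' ';
}
std::cout << '\n';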
In node.js code you would need to print the decimal value for each byte contained in variable buffer.
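If the decimal values of every byte are identical in the C++ and node.js parts, then the zlib libraries are incompatible or the functions do not match: for example, zlib_compress in C++ may produce a different format (zlib, raw deflate, or gzip) from the one zlib.inflate expects in node.js, which also offers zlib.inflateRaw() and zlib.gunzip().
If the bytes differ, the root cause can be that characters are 1 byte each in a C++ std::string but not in node.js: unescape() returns one character per original byte, and new Buffer(x) encodes that string as UTF-8 by default, so every byte of 0x80 or above turns into two bytes (0xDA becomes 0xC3 0x9A, which breaks the zlib header check). Specifying the encoding when constructing the Buffer in node.js may solve the problem if that is the cause:
var buffer = new Buffer(x, 'binary'); // Buffer.from(x, 'binary') in current Node.js versions
See https://nodejs.org/api/buffer.html#buffer_new_buffer_str_encoding
Because the data here is zlib-compressed (as with any compressed data), the encoding should be 'binary'.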
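
To make the byte-expansion explanation concrete, here is a small Java sketch (Java 10+ for the Charset overloads of URLEncoder and URLDecoder; all names are hypothetical). The question's escaped output starts with x%DA, i.e. the zlib header bytes 0x78 0xDA. Decoding the escapes to one character per byte and converting back with ISO-8859-1 (what Node calls 'binary' or 'latin1') restores those bytes, while converting with UTF-8 turns 0xDA into 0xC3 0x9A, and 0x78 0xC3 fails zlib's header check:

```java
import java.io.ByteArrayOutputStream;
import java.net.URLDecoder;
import java.net.URLEncoder;
import java.nio.charset.StandardCharsets;
import java.util.zip.DataFormatException;
import java.util.zip.Deflater;
import java.util.zip.Inflater;

class PercentZlib {
    // Correct: %XX escapes -> one char per byte (ISO-8859-1) -> exactly one byte per char.
    static byte[] percentDecode(String escaped) {
        String oneCharPerByte = URLDecoder.decode(escaped, StandardCharsets.ISO_8859_1);
        return oneCharPerByte.getBytes(StandardCharsets.ISO_8859_1);
    }

    // The suspected bug: same string, converted to bytes as UTF-8 (the default of
    // new Buffer(x) in node.js), so each char >= 0x80 becomes two bytes.
    static byte[] percentDecodeAsUtf8(String escaped) {
        String oneCharPerByte = URLDecoder.decode(escaped, StandardCharsets.ISO_8859_1);
        return oneCharPerByte.getBytes(StandardCharsets.UTF_8);
    }

    static String percentEncode(byte[] data) {
        return URLEncoder.encode(new String(data, StandardCharsets.ISO_8859_1), StandardCharsets.ISO_8859_1);
    }

    // zlib format (2-byte header + Adler-32 trailer); level 9 gives the 0x78 0xDA prefix.
    static byte[] zlibCompress(byte[] data) {
        Deflater deflater = new Deflater(Deflater.BEST_COMPRESSION);
        try {
            deflater.setInput(data);
            deflater.finish();
            ByteArrayOutputStream out = new ByteArrayOutputStream();
            byte[] buf = new byte[4096];
            while (!deflater.finished()) {
                out.write(buf, 0, deflater.deflate(buf));
            }
            return out.toByteArray();
        } finally {
            deflater.end();
        }
    }

    static byte[] zlibInflate(byte[] data) {
        Inflater inflater = new Inflater(); // expects the zlib header, like node's zlib.inflate
        try {
            inflater.setInput(data);
            ByteArrayOutputStream out = new ByteArrayOutputStream();
            byte[] buf = new byte[4096];
            while (!inflater.finished()) {
                int n = inflater.inflate(buf);
                if (n == 0 && !inflater.finished()
                        && (inflater.needsInput() || inflater.needsDictionary())) {
                    throw new IllegalArgumentException("truncated zlib stream");
                }
                out.write(buf, 0, n);
            }
            return out.toByteArray();
        } catch (DataFormatException e) {
            throw new IllegalArgumentException(e.getMessage(), e); // e.g. "incorrect header check"
        } finally {
            inflater.end();
        }
    }

    public static void main(String[] args) {
        byte[] original = "hello hello hello".getBytes(StandardCharsets.UTF_8);
        String escaped = percentEncode(zlibCompress(original));
        System.out.println(escaped);
        System.out.println(new String(zlibInflate(percentDecode(escaped)), StandardCharsets.UTF_8));
    }
}
```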

Load byte array from string isn't working correctly?

I'm trying to encrypt a string, save it in a file, and later read the string back from the file and decrypt it. But when I run the code I just get a "length of the data to decrypt is invalid" error :/ By debugging I managed to find out that the byte array (array<Byte>^ bytes) for some reason has a length of 12 when I try to decrypt the string, but a length of 8 when I encrypt it.
Here is the code to encrypt the string:
String^ EncryptS() {
    String^ DecryptedS;
    MD5CryptoServiceProvider^ md5Crypt = gcnew MD5CryptoServiceProvider();
    UTF8Encoding^ utf8Crypt = gcnew UTF8Encoding();
    TripleDESCryptoServiceProvider^ crypt = gcnew TripleDESCryptoServiceProvider();
    crypt->Key = md5Crypt->ComputeHash(utf8Crypt->GetBytes("123"));
    crypt->Mode = CipherMode::ECB;
    crypt->Padding = PaddingMode::PKCS7;
    ICryptoTransform^ transCrypt = crypt->CreateEncryptor();
    DecryptedS = utf8Crypt->GetString(transCrypt->TransformFinalBlock(
        utf8Crypt->GetBytes(form1::passwordTextBox->Text), 0,
        utf8Crypt->GetBytes(form1::passwordTextBox->Text)->Length));
    return DecryptedS;
}
And here is the code to decrypt the string
String^ decryptS(String^ encryptedS) {
    String^ decryptedS;
    array<Byte>^ bytes;
    MD5CryptoServiceProvider^ md5Crypt = gcnew MD5CryptoServiceProvider();
    UTF8Encoding^ utf8Crypt = gcnew UTF8Encoding();
    UTF8Encoding^ utf8ToByte = gcnew UTF8Encoding();
    TripleDESCryptoServiceProvider^ crypt = gcnew TripleDESCryptoServiceProvider();
    crypt->Key = md5Crypt->ComputeHash(utf8Crypt->GetBytes("123"));
    crypt->Mode = CipherMode::ECB;
    crypt->Padding = PaddingMode::PKCS7;
    ICryptoTransform^ transCrypt = crypt->CreateDecryptor();
    bytes = utf8ToByte->GetBytes(encryptedS);
    return decryptedS = utf8Crypt->GetString(transCrypt->TransformFinalBlock(bytes, 0, bytes->Length));
}
I've been trying to fix this for hours now with no success, so help would be much appreciated :)
Sorry for my bad English.

You're trying to convert an arbitrary byte array into a string using UTF-8. That's like trying to load some random text file as if it were a JPEG, and expecting it to be a valid image.
You should only use Encoding.GetString(byte[]) when the byte array really is text encoded with that encoding.
If you want to represent "arbitrary" binary data (which compressed or encrypted data typically is) you should use base64 or perhaps hex, depending on your requirements. (Convert.ToBase64String and Convert.FromBase64String are your friends.)
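
The same failure can be reproduced in a few lines of Java. This is a sketch: java.util.Base64 stands in for .NET's Convert.ToBase64String and Convert.FromBase64String, and an arbitrary 8-byte array stands in for the ciphertext. Bytes that are not valid UTF-8 are replaced by U+FFFD during decoding, which is how an 8-byte ciphertext can come back as a different number of bytes:

```java
import java.nio.charset.StandardCharsets;
import java.util.Arrays;
import java.util.Base64;

class BinaryAsText {
    // Wrong: treats arbitrary bytes as UTF-8 text. Invalid sequences become U+FFFD,
    // so the bytes that come back can differ in both value and length.
    static byte[] roundTripViaUtf8(byte[] data) {
        String s = new String(data, StandardCharsets.UTF_8);
        return s.getBytes(StandardCharsets.UTF_8);
    }

    // Right: Base64 maps any byte sequence to ASCII text and back without loss.
    static byte[] roundTripViaBase64(byte[] data) {
        String s = Base64.getEncoder().encodeToString(data);
        return Base64.getDecoder().decode(s);
    }

    public static void main(String[] args) {
        // Stand-in for 8 bytes of ciphertext; the values are arbitrary.
        byte[] cipher = {(byte) 0x9f, 0x12, (byte) 0xc3, (byte) 0xff, 0x00, 0x41, (byte) 0x80, 0x7e};
        byte[] viaUtf8 = roundTripViaUtf8(cipher);
        System.out.println(viaUtf8.length + " bytes, equal: " + Arrays.equals(viaUtf8, cipher));
        System.out.println("Base64 equal: " + Arrays.equals(roundTripViaBase64(cipher), cipher));
    }
}
```

In the question's terms, EncryptS() would return Convert::ToBase64String(...) of the TransformFinalBlock result, and decryptS() would start from Convert::FromBase64String(encryptedS) instead of utf8ToByte->GetBytes(encryptedS).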

Uncompressing zlib data using boost::iostreams::filtering_streambuf trouble

I'm trying to write a small class that will load the chunk data from part of a Minecraft world file. I'm at the point where I have stored some zlib-compressed data in a char array and need to decompress it.
I'm trying to use the boost filtering_streambuf to do this.
char * rawChunk = new char[length - 1];
// Load chunk data
stringstream ssRawChunk(rawChunk);
boost::iostreams::filtering_istream in;
in.push(boost::iostreams::zlib_decompressor());
in.push(ssRawChunk);
stringstream ssOut;
boost::iostreams::copy(in, ssOut);
My problem is that rawChunk contains null bytes, so when copying the data from (char*) rawChunk to (stringstream) ssRawChunk, it stops at ~257 bytes instead of the expected length of 2154.
Is there any way to use filtering_streambuf without a stringstream so that null bytes are allowed, or is there a way to stop the stringstream from terminating on null bytes?

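You should store rawChunk in a std::string, which allows null characters, constructed with an explicit length, e.g. std::string(rawChunk, length - 1). Passing the bare char* goes through the std::string(const char*) constructor, which stops at the first null byte.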
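
Java byte arrays always carry their length, so the same bug cannot happen there in quite the same way, but the pattern the answer points to carries over: hand the decompressor a length-delimited byte buffer, never a null-terminated string. In C++, std::string(rawChunk, length - 1) plays the role of the offset/length pair below. This is a sketch; the names and the 5-byte header in main() are hypothetical, and readAllBytes() assumes Java 9+:

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.UncheckedIOException;
import java.util.zip.DeflaterOutputStream;
import java.util.zip.InflaterInputStream;

class ChunkInflate {
    // Inflate `length` zlib-compressed bytes starting at `offset`. The buffer is
    // addressed by explicit offset and length, so 0x00 bytes are ordinary data.
    static byte[] inflate(byte[] buf, int offset, int length) {
        try (InputStream in = new InflaterInputStream(new ByteArrayInputStream(buf, offset, length))) {
            return in.readAllBytes();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    // zlib-compress a byte array (used here only to build sample input).
    static byte[] deflate(byte[] data) {
        ByteArrayOutputStream bos = new ByteArrayOutputStream();
        try (DeflaterOutputStream out = new DeflaterOutputStream(bos)) {
            out.write(data);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return bos.toByteArray();
    }

    public static void main(String[] args) {
        byte[] original = new byte[2154]; // all zero bytes, i.e. nothing but "null data"
        byte[] compressed = deflate(original);
        // Hypothetical file layout: 5 header bytes, then the compressed chunk.
        byte[] file = new byte[5 + compressed.length];
        System.arraycopy(compressed, 0, file, 5, compressed.length);
        System.out.println(inflate(file, 5, compressed.length).length); // prints 2154
    }
}
```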

Do I need to encode using Base 64 in my web service?

I am transferring messages to mobile devices via a web service. The data is an xml string, which I compress using GZipStream, then encode using Base64.
I am getting out-of-memory exceptions in the emulator and am looking to optimise the process, so I have stopped passing the string around by value and removed unnecessary copies of byte arrays.
Now I'm wondering about the Base64 encoding. It increases the size of the message, the processing and the memory requirements. Is it strictly necessary?
Edit: Here is how I decompress:
public static byte[] ConvertMessageStringToByteArray(ref string isXml)
{
    return fDecompress(Convert.FromBase64String(isXml));
}
public static byte[] fDecompress(byte[] ivBytes)
{
    const int INT_BufferSize = 2048;
    using (MemoryStream lvMSIn = new MemoryStream(ivBytes))
    using (GZipInputStream lvZipStream = new GZipInputStream(lvMSIn, ivBytes.Length))
    using (MemoryStream lvMSOut = new MemoryStream())
    {
        byte[] lvBuffer = new byte[INT_BufferSize];
        int liSize;
        while (true)
        {
            liSize = lvZipStream.Read(lvBuffer, 0, INT_BufferSize);
            if (liSize <= 0)
                break;
            lvMSOut.Write(lvBuffer, 0, liSize);
        }
        return lvMSOut.ToArray();
    }
}

gzip (which is inside the GZipStream) produces binary data, which won't fit into a 7-bit text message (SOAP is a text message) unless you apply something like Base64 encoding to it.
Perhaps the solution is not to gzip/encode (or decode/ungzip) a whole buffer at once, but to use streams for that: connect a gzipping stream to an encoding stream and read the result from the output of the latter (or connect the decoding stream to the ungzipping stream). This way you have a chance to consume less memory.
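
A minimal Java sketch of the chained-stream idea follows. The question's setting is .NET, so treat this as an illustration of the pattern rather than a drop-in fix; the names are hypothetical and transferTo() assumes Java 9+. Base64 turns every 3 bytes into 4 ASCII characters, so the encoding costs about a third more space, but with chained streams neither the whole compressed buffer nor the whole Base64 string has to exist in memory at once:

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;
import java.nio.charset.StandardCharsets;
import java.util.Base64;
import java.util.zip.GZIPInputStream;
import java.util.zip.GZIPOutputStream;

class StreamingGzipBase64 {
    // Encode: source -> gzip -> Base64 -> sink. Closing the gzip stream writes its
    // trailer, then closes the Base64 wrapper (which emits final padding) and the sink.
    static void encode(InputStream source, OutputStream sink) throws IOException {
        try (GZIPOutputStream gz = new GZIPOutputStream(Base64.getEncoder().wrap(sink))) {
            source.transferTo(gz);
        }
    }

    // Decode: source -> Base64 decoder -> gunzip -> sink, chunk by chunk.
    static void decode(InputStream source, OutputStream sink) throws IOException {
        try (InputStream in = new GZIPInputStream(Base64.getDecoder().wrap(source))) {
            in.transferTo(sink);
        }
    }

    public static void main(String[] args) throws IOException {
        byte[] xml = "<message>...</message>".getBytes(StandardCharsets.UTF_8);
        ByteArrayOutputStream encoded = new ByteArrayOutputStream();
        encode(new ByteArrayInputStream(xml), encoded);
        System.out.println(new String(encoded.toByteArray(), StandardCharsets.US_ASCII));
        ByteArrayOutputStream decoded = new ByteArrayOutputStream();
        decode(new ByteArrayInputStream(encoded.toByteArray()), decoded);
        System.out.println(new String(decoded.toByteArray(), StandardCharsets.UTF_8));
    }
}
```

In a real service, source and sink would be the request and response streams rather than the in-memory streams used in main().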

We Keep Coding
