character encoding of the server logs in s3 - amazon-web-services

As the title suggests, I need to know the character encoding of the data in the server logs.
I am reading the server logs through an S3ObjectInputStream, as follows:
AmazonS3Client as3c;
S3ObjectInputStream is = as3c.getObject(bucketName, key).getObjectContent();
// Read it for processing using a buffered reader.
// I need the character encoding (charset, e.g. UTF-8 or UTF-16) of the data
// in the object to pass to InputStreamReader.
BufferedReader br = new BufferedReader(new InputStreamReader(is, ..unknown..));
In the docs, I only see a getContentEncoding() method, but I do not think it fits my purpose.
Useful references:
ObjectMetadata
AmazonS3Interface

Did you check the other constructors of InputStreamReader? There is a constructor that takes only the InputStream and falls back to the platform's default charset:
http://docs.oracle.com/javase/7/docs/api/java/io/InputStreamReader.html
As far as I know, objects in S3 are stored with whatever encoding the writer chose; S3 itself does not record a character set (getContentEncoding() corresponds to the Content-Encoding header, e.g. gzip, not a charset). Anyway, I would suggest trying UTF-8 and checking whether the text decodes correctly.

Related

Curl replacing \u in response with \\u in C++

I am sending a request using libcurl on Windows, and the response I get contains some universal characters that start with \u. Libcurl is not recognizing these universal characters and, as a result, it escapes the \, turning the universal character into \\u.
Is there any way to fix this? I have tried using str.replace, but it cannot replace escape sequences.
The code I used to implement this was:
#include <iostream>
#include <string>

#include <cpr/cpr.h>

int main()
{
    auto r = cpr::Get(cpr::Url{"http://prayer.osamaanees.repl.co/api"});
    std::string data = r.text;
    std::cout << data << std::endl;
    return 0;
}
This code uses the cpr library which is a wrapper for curl.
It prints out the following:
{
"times":{"Fajr":"04:58 AM","Sunrise":"06:16 AM","Dhuhr":"12:30 PM","Asr":"04:58 PM","Maghrib":"06:43 PM","Isha":"08:00 PM"},
"date":"Tuesday, 20 Mu\u1e25arram 1442AH"
}
Notice the word Mu\u1e25arram: it should have been Muḥarram, but since curl escaped the \ before the u, it prints out as \u1e25.
Your analysis is wrong: libcurl is not escaping anything. Load the URL in a web browser of your choosing and look at the raw data that is actually being sent; Firefox's raw view, for example, shows exactly the same string.
The server really is sending Mu\u1e25arram, not Muḥarram like you are expecting. And this is perfectly fine, because the server is sending back JSON data, and JSON is allowed to escape Unicode characters like this. Read the JSON spec, particularly Section 9 on how Unicode code points may be encoded using hexadecimal escape sequences (which is optional in JSON, but still allowed). \u1e25 is simply the JSON hex-escaped form of ḥ.
You are merely printing out the JSON content as-is, exactly as the server sent it; you are not actually parsing it at all. If you used an actual JSON parser, Mu\u1e25arram would be decoded to Muḥarram for you, which is exactly what Firefox's built-in JSON viewer shows.
It is not libcurl's job to decode JSON data. Its job is merely to give you the data that the server sends; it is your job to interpret the data afterwards as needed.
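To see what a parser does with those escapes, here is a minimal, hand-rolled sketch (my own helper function, handling BMP code points only, no surrogate pairs) that expands JSON \uXXXX sequences into UTF-8; any real JSON parser does this, and much more, for you:

```cpp
#include <cassert>
#include <cstdlib>
#include <string>

// Expand "\uXXXX" escapes (BMP only) into UTF-8 bytes.
std::string decode_unicode_escapes(const std::string& in)
{
    std::string out;
    for (std::size_t i = 0; i < in.size(); ++i) {
        if (in[i] == '\\' && i + 5 < in.size() && in[i + 1] == 'u') {
            unsigned cp = std::strtoul(in.substr(i + 2, 4).c_str(), nullptr, 16);
            if (cp < 0x80) {                       // 1-byte sequence (ASCII)
                out += static_cast<char>(cp);
            } else if (cp < 0x800) {               // 2-byte sequence
                out += static_cast<char>(0xC0 | (cp >> 6));
                out += static_cast<char>(0x80 | (cp & 0x3F));
            } else {                               // 3-byte sequence
                out += static_cast<char>(0xE0 | (cp >> 12));
                out += static_cast<char>(0x80 | ((cp >> 6) & 0x3F));
                out += static_cast<char>(0x80 | (cp & 0x3F));
            }
            i += 5; // skip over "uXXXX"
        } else {
            out += in[i];
        }
    }
    return out;
}
```

Fed the string from the response, this turns Mu\u1e25arram into Muḥarram, which is all the JSON parser was going to do for you anyway.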
I would like to thank Remy for pointing out how wrong I was in thinking curl or the JSON parser was the problem, when in reality I needed to switch my console to UTF-8 mode.
It was only after I fixed my code page that I got the output I wanted.
For future reference, here is the code that fixed my problem:
We need to include Windows.h
#include <Windows.h>
Then at the start of our code:
UINT oldcp = GetConsoleOutputCP();
SetConsoleOutputCP(CP_UTF8);
After this, before the program exits, we need to reset the console back to the original code page with:
SetConsoleOutputCP(oldcp);

C++: Decode a HTTP response which is Base64 encoded and UTF-8 decoded

I have a C++ program which receives encoded binary data as an HTTP response. The response needs to be decoded and stored as a binary file. The HTTP server sending the binary data is written in Python, and the following is example code that performs the encoding.
#!/usr/bin/env python3
import base64

# Content of the file is the string "abc" for testing; it could be an image file.
with open('/tmp/abc', 'rb') as _doc:
    data = _doc.read()

# Get Base64-encoded bytes
data_bytes = base64.b64encode(data)
# Convert the bytes to a string
data_str = data_bytes.decode('utf-8')
print(data_str)
Now I want to decode the received data_str in a C++ program. I could get the Python equivalent below to work properly:
_data = data_str.encode('utf-8')
bin_data = base64.b64decode(_data)
But with C++ I tried to use the Boost library's from_utf8 method, to no avail. Could anyone please suggest the best way of decoding and getting the binary data in C++ (preferably using Boost, since it is portable)?
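For what it's worth, the "UTF-8" step on the Python side is a red herring: Base64 output is plain ASCII, so all the C++ side needs is a Base64 decoder. Boost has one (combining boost::archive::iterators::binary_from_base64 with transform_width), but as a sketch under my own naming, a standard-library-only decoder is only a few lines:

```cpp
#include <cassert>
#include <stdexcept>
#include <string>
#include <vector>

// Decode a Base64 string into raw bytes; '=' padding and line breaks are skipped.
std::vector<unsigned char> base64_decode(const std::string& in)
{
    static const std::string alphabet =
        "ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789+/";
    std::vector<unsigned char> out;
    unsigned buffer = 0;
    int bits = 0;
    for (char c : in) {
        if (c == '=' || c == '\n' || c == '\r') continue; // padding / line breaks
        std::size_t pos = alphabet.find(c);
        if (pos == std::string::npos)
            throw std::runtime_error("invalid Base64 character");
        buffer = (buffer << 6) | static_cast<unsigned>(pos); // accumulate 6 bits
        bits += 6;
        if (bits >= 8) {                                     // emit each full byte
            bits -= 8;
            out.push_back(static_cast<unsigned char>((buffer >> bits) & 0xFF));
        }
    }
    return out;
}
```

For the example above, base64_decode("YWJj") yields the three bytes of "abc"; write the resulting vector to a file opened in binary mode to store the payload.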

Is it possible to write to s3 via a stream using s3 java sdk

Normally when a file has to be uploaded to S3, it first has to be written to disk before using something like the TransferManager API to upload it to the cloud. This can cause data loss if the upload does not finish in time (the application goes down and restarts on a different server, etc.). So I was wondering if it's possible to write to a stream directly across the network, with the required cloud location as the sink.
You don't say what language you're using, but I'll assume Java based on your capitalization. In which case the answer is yes: TransferManager has an upload() method that takes a PutObjectRequest, and you can construct that object around a stream.
However, there are two important caveats. The first is in the documentation for PutObjectRequest:
When uploading directly from an input stream, content length must be specified before data can be uploaded to Amazon S3
So you have to know how much data you're uploading before you start. If you're receiving an upload from the web and have a Content-Length header, then you can get the size from it. If you're just reading a stream of data that's arbitrarily long, then you have to write it to a file first (or the SDK will).
The second caveat is that this really doesn't prevent data loss: your program can still crash in the middle of reading data. One thing that it will prevent is returning a success code to the user before storing the data in S3, but you could do that anyway with a file.
Surprisingly this is not possible (at the time of writing this post) with the standard Java SDK. However, thanks to this third-party library you can at least avoid buffering huge amounts of data in memory or on disk, since it internally buffers parts of ~5 MB and uploads them automatically within a multipart upload for you.
There is also an open GitHub issue in the SDK repository that one can follow to get updates.
It is possible:
AmazonS3 s3Client = AmazonS3ClientBuilder.standard().build();
s3Client.putObject("bucket", "key", yourInputStream, s3Metadata);
AmazonS3.putObject
public void saveS3Object(String key, InputStream inputStream) throws Exception {
    List<PartETag> partETags = new ArrayList<>();
    InitiateMultipartUploadRequest initRequest =
            new InitiateMultipartUploadRequest(bucketName, key);
    InitiateMultipartUploadResult initResponse = s3.initiateMultipartUpload(initRequest);

    int partSize = 5242880; // Each part must be at least 5 MB (except the last).
    try {
        byte[] b = new byte[partSize];
        int len;
        int i = 1;
        // Note: read() may return fewer than partSize bytes even before EOF;
        // for production use, fill the buffer before uploading each part.
        while ((len = inputStream.read(b)) >= 0) {
            // The last part can be less than 5 MB; adjust the part size.
            ByteArrayInputStream partInputStream = new ByteArrayInputStream(b, 0, len);
            UploadPartRequest uploadRequest = new UploadPartRequest()
                    .withBucketName(bucketName).withKey(key)
                    .withUploadId(initResponse.getUploadId()).withPartNumber(i)
                    .withFileOffset(0)
                    .withInputStream(partInputStream)
                    .withPartSize(len);
            partETags.add(s3.uploadPart(uploadRequest).getPartETag());
            i++;
        }
        CompleteMultipartUploadRequest compRequest = new CompleteMultipartUploadRequest(
                bucketName, key, initResponse.getUploadId(), partETags);
        s3.completeMultipartUpload(compRequest);
    } catch (Exception e) {
        s3.abortMultipartUpload(new AbortMultipartUploadRequest(
                bucketName, key, initResponse.getUploadId()));
        throw e; // Propagate the failure after cleaning up the unfinished upload.
    }
}

Convert .tif image on AWS S3 to base64 string in C#

I have a .tif image stored on AWS S3 under a path. Because some browsers don't support displaying .tif files, I must convert it to a base64 string.
On my local machine this works successfully. But when I deploy my website to AWS, the generated base64 string is different from the one on my local machine, so I can't display it.
This is my code:
byte[] data = (new WebClient()).DownloadData(filePath);
using (var ms = new MemoryStream(data))
{
    var image = Image.FromStream(ms);
    image.Save(ms, System.Drawing.Imaging.ImageFormat.Png);
    byte[] imageBytes = ms.ToArray();
    string base64 = Convert.ToBase64String(imageBytes);
}
Anybody has experience with this problem?
Thank you very much!
I noticed that you are reusing the MemoryStream holding your TIFF source as the output stream for image.Save(), so the PNG bytes are written into the same buffer that still contains the TIFF data and ms.ToArray() returns a mix of both.
I think you should use a separate MemoryStream for image.Save().

Curlpp, incomplete data from request

I am using curlpp to send requests to various web services to send and receive data.
So far this has worked fine, since I have only used it for sending/receiving JSON data.
Now I have a situation where a web service returns a ZIP file in binary form, and here I encountered a problem: the data received is not complete.
I first had curl write any data to an ostringstream using the WriteStream option, but this proved not to be the correct approach, since the data contained null characters, and thus the data stopped at the first null char.
After that, instead of using WriteStream, I used WriteFunction with a callback function.
The problem in this case is that this function is always called 2 or 3 times, regardless of the amount of data.
This results in always having a few chunks of data that don't seem to make up the complete file, although the data always contains PK as the first 2 characters, indicating a ZIP file.
I used several tools to verify that the data is sent to my application in its entirety, so this is not a problem with the web service.
Here is the code. Do note that options like hostname, port, headers and POST fields are set elsewhere.
string requestData;

size_t WriteStringCallback(char* ptr, size_t size, size_t nmemb)
{
    requestData += ptr;
    int totalSize = size * nmemb;
    return totalSize;
}

const string CurlRequest::Perform()
{
    curlpp::options::WriteFunction wf(WriteStringCallback);
    this->request.setOpt(wf);
    this->request.perform();
    return requestData;
}
I hope someone can help me out with this issue, because I've run dry of leads on how to fix it, also because curlpp is poorly documented (and even worse now that the curlpp website has disappeared).
The problem with the code is that the data is put into a std::string even though the data is in binary (ZIP) format. I'd recommend putting the data into a stream (or a binary array) instead.
You can also register a callback to retrieve the response headers and act in the write callback according to the Content-Type; use curlpp::options::HeaderFunction to register a callback that receives the response headers.
std::string is not the problem, but the concatenation is:
requestData += ptr;
A C string (ptr) is terminated with a zero byte, so if the input contains any zero bytes, the appended data will be truncated at the first one. You should wrap the buffer in a string that knows the length of its data:
requestData += std::string(ptr, size * nmemb);
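Applied to the callback from the question, a length-aware version (a sketch reusing the question's names) looks like this:

```cpp
#include <cstddef>
#include <string>

std::string requestData;

// Append exactly size * nmemb bytes instead of relying on a NUL
// terminator, so binary data (e.g. a ZIP archive) survives intact.
std::size_t WriteStringCallback(char* ptr, std::size_t size, std::size_t nmemb)
{
    std::size_t total = size * nmemb;
    requestData.append(ptr, total); // length-aware append keeps embedded NULs
    return total;                   // tell curl the whole chunk was consumed
}
```

Returning anything other than size * nmemb makes curl abort the transfer, so the return value must always be the full chunk size.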