AWS CPP TransferManager vs GetObjectRequest Stream to File fstream OOM - c++

I am using the AWS C++ SDK ( https://github.com/aws/aws-iot-device-sdk-cpp ) to download a file from S3 on a small Linux system (only 32 MB of RAM). I was using the GetObjectRequest class as shown below. It worked great: it downloaded the file straight into the FStream on my system, so it did not consume too much RAM.
Now I want to convert the download code to the TransferManager approach in order to get progress callbacks. I've rewritten that part of the code; it is shown below as well. It starts out fine and prints the percentage downloaded, but when its memory usage reaches ~14 MB (roughly the amount of RAM available on the system at the time of the download) it is killed by the kernel for using too much RAM.
I created a file stream just like I did for the GetObjectRequest. What am I doing wrong? How can I fix this? Thanks.
Old way that did not use all the RAM:
// Old way
GetObjectRequest getObjectRequest;
getObjectRequest.SetBucket(bucket.c_str());
getObjectRequest.SetKey(keyName.c_str());
getObjectRequest.SetResponseStreamFactory([&destination](){
    return Aws::New<Aws::FStream>(
        "s3file", destination, std::ios_base::out); });

GetObjectOutcome getObjectOutcome = SessionClient->GetObject(getObjectRequest);
if (getObjectOutcome.IsSuccess())
{
    std::cout << "<AWS DOWNLOAD> Get FW success!" << std::endl;
}
else
{
    std::cout << "<AWS DOWNLOAD> Get FW failed: " << getObjectOutcome.GetError().GetMessage() << std::endl;
    exit(1);
}
New way that eventually uses too much RAM and is killed by the kernel:
// New way
Aws::Transfer::TransferManagerConfiguration transferConfig;
transferConfig.s3Client = SessionClient;

std::shared_ptr<Aws::Transfer::TransferHandle> requestPtr(nullptr);

transferConfig.downloadProgressCallback =
    [](const Aws::Transfer::TransferManager*, const Aws::Transfer::TransferHandle& handle)
    {
        std::cout << "\r" << "<AWS DOWNLOAD> Download Progress: "
                  << static_cast<int>(handle.GetBytesTransferred() * 100.0 / handle.GetBytesTotalSize())
                  << " Percent " << handle.GetBytesTransferred() << " bytes\n";
    };

Aws::Transfer::TransferManager transferManager(transferConfig);
requestPtr = transferManager.DownloadFile(bucket.c_str(), keyName.c_str(), [&destination](){
    Aws::FStream *stream = Aws::New<Aws::FStream>("s3file", destination, std::ios_base::out);
    stream->rdbuf()->pubsetbuf(NULL, 0);
    return stream; });

requestPtr->WaitUntilFinished();

size_t retries = 0;
// just make sure we don't fail because a download part failed (e.g. network problems or interruptions)
while (requestPtr->GetStatus() == Aws::Transfer::TransferStatus::FAILED && retries++ < 5)
{
    std::cout << "<AWS DOWNLOAD> FW Download trying download again!" << std::endl;
    transferManager.RetryDownload(requestPtr);
    requestPtr->WaitUntilFinished();
}

// Check status
if (requestPtr->GetStatus() == Aws::Transfer::TransferStatus::COMPLETED) {
    if (requestPtr->GetBytesTotalSize() == requestPtr->GetBytesTransferred()) {
        std::cout << "<AWS DOWNLOAD> Get FW success!" << std::endl;
        exit(0);
    }
    else {
        std::cout << "<AWS DOWNLOAD> Get FW failed - Bytes downloaded did not equal requested number of bytes: "
                  << requestPtr->GetBytesTotalSize() << " vs " << requestPtr->GetBytesTransferred() << std::endl;
        exit(1);
    }
}
else {
    std::cout << "<AWS DOWNLOAD> Get FW failed - download was never completed even after retries" << std::endl;
    exit(1);
}

TransferManager only really makes things easier once you are in the land of 10 MB or larger objects and you want to take advantage of parallelization. It will allocate its maximum buffer heap size up front and will not grow the heap beyond that. Given your RAM constraints, I wouldn't use TransferManager. You can still receive progress notifications: check the callback mechanisms in the AmazonWebServiceRequest class.
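For reference, here is a minimal sketch of what that might look like with the plain GetObjectRequest path, assuming the SetDataReceivedEventHandler hook that AmazonWebServiceRequest exposes (verify the exact handler signature against your SDK version); totalSize is assumed to come from somewhere else, e.g. a prior HeadObject call:

// Sketch only: stream to disk as before, report progress as chunks arrive.
#include <aws/core/Aws.h>
#include <aws/s3/S3Client.h>
#include <aws/s3/model/GetObjectRequest.h>
#include <fstream>
#include <iostream>

void DownloadWithProgress(const Aws::S3::S3Client& client,
                          const Aws::String& bucket,
                          const Aws::String& key,
                          const Aws::String& destination,
                          long long totalSize /* e.g. from a prior HeadObject */)
{
    Aws::S3::Model::GetObjectRequest request;
    request.SetBucket(bucket);
    request.SetKey(key);

    // Stream straight to disk, exactly like the original "old way".
    request.SetResponseStreamFactory([destination]() {
        return Aws::New<Aws::FStream>(
            "s3file", destination.c_str(),
            std::ios_base::out | std::ios_base::binary);
    });

    // Fires as data arrives; the last argument is the byte count of this chunk.
    long long received = 0;
    request.SetDataReceivedEventHandler(
        [&received, totalSize](const Aws::Http::HttpRequest*,
                               Aws::Http::HttpResponse*,
                               long long bytes) {
            received += bytes;
            if (totalSize > 0)
                std::cout << "\r<AWS DOWNLOAD> " << (received * 100 / totalSize)
                          << "% (" << received << " bytes)" << std::flush;
        });

    auto outcome = client.GetObject(request);
    std::cout << std::endl;
    if (!outcome.IsSuccess())
        std::cout << "<AWS DOWNLOAD> Get FW failed: "
                  << outcome.GetError().GetMessage() << std::endl;
}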

Related

How do I stream data from retrieved "GetObject()" file to localfile on disk - C++ Aws SDK

I am trying to stream the data in the retrieved S3 file into a local file on disk. However, with my current code I do not get a file stored on my computer, or in any location near the code.
I first request the object from S3 and get a GetObjectOutcome. After success, I want to create an ofstream and redirect the object's stream buffer into it so that I can create a file on disk. However, the following code does not create a file on my computer. What am I doing wrong?
Here is the get object function:
bool GetObject(const Aws::String& objectKey,
               const Aws::String& fromBucket,
               const Aws::Client::ClientConfiguration& clientConfig) {
    Aws::S3::S3Client client(clientConfig);

    Aws::S3::Model::GetObjectRequest request;
    request.SetBucket(fromBucket);
    request.SetKey(objectKey);

    Aws::S3::Model::GetObjectOutcome outcome =
        client.GetObject(request);

    if (!outcome.IsSuccess()) {
        const Aws::S3::S3Error& err = outcome.GetError();
        std::cerr << "Error: GetObject: " <<
            err.GetExceptionName() << ": " << err.GetMessage() << std::endl;
    }
    else {
        std::cout << "Successfully retrieved '" << objectKey << "' from '"
                  << fromBucket << "'." << std::endl;

        std::ofstream localfile;
        localfile.open(objectKey.c_str(), std::ios::out | std::ios::binary);
        auto retrieved = outcome.GetResult().GetBody().rdbuf();
        localfile << retrieved;
        std::cout << "Done!";
    }

    return outcome.IsSuccess();
}
Here is an image of the memory for local file and retrieved:
Would someone explain what I am doing wrong, or how to correctly download data from S3 to disk?
Thanks.
I tried downloading some data from S3 to disk, but I am having trouble writing the retrieved data out through the stream buffer to a local file. I have been looking online and cannot find a similar problem.
Update:
I am now on my second day of trying to figure this out, to no avail. For some reason, no file ever appears in the directory I have set up for the .nc files to be written to.
I have tried the following updates:
bool GetObject(const Aws::String& objectKey,
               const Aws::String& fromBucket,
               const Aws::Client::ClientConfiguration& clientConfig) {
    Aws::S3::S3Client client(clientConfig);

    Aws::S3::Model::GetObjectRequest request;
    request.SetBucket(fromBucket);
    request.SetKey(objectKey);

    Aws::S3::Model::GetObjectOutcome outcome =
        client.GetObject(request);

    if (!outcome.IsSuccess()) {
        const Aws::S3::S3Error& err = outcome.GetError();
        std::cerr << "Error: GetObject: " <<
            err.GetExceptionName() << ": " << err.GetMessage() << std::endl;
    }
    else {
        std::cout << "Successfully retrieved '" << objectKey << "' from '"
                  << fromBucket << "'." << std::endl;

        // create the filename, which will be the objectKey
        std::string local_file_name = "./netcdf/" + objectKey;
        std::ofstream local_file(local_file_name, std::ios::binary);
        auto& retrieved = outcome.GetResult().GetBody();
        local_file << retrieved.rdbuf();
        std::cout << "Done!";
    }

    return outcome.IsSuccess();
}
Then, after opening the ./netcdf folder, there is no output.
Here is the file structure inside my project for reference with the code:
I am still confused as to what I need to do here.
Thank you for all of the help you can offer!
You are using a path with "./" at the front. This means the file will be written relative to the current working directory (cwd) of the binary, which is likely not your src folder.
Just to get past your problem, use a full absolute path and see if the rest of your code works.
Also, try adding the following to see where the files you made might have ended up:
// You need "#include <filesystem>" for the next line
std::cout << std::filesystem::current_path() << std::endl;
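Putting those two suggestions together, here is a small sketch (C++17; the /tmp/netcdf directory is just a placeholder) that prints the current working directory and builds an absolute output path before the ofstream is opened:

#include <filesystem>
#include <fstream>
#include <iostream>
#include <string>

std::filesystem::path MakeOutputPath(const std::string& objectKey)
{
    // Shows where a relative path like "./netcdf/..." would actually end up.
    std::cout << "cwd: " << std::filesystem::current_path() << std::endl;

    // Use an absolute directory and create it if it does not exist yet.
    const std::filesystem::path dir = "/tmp/netcdf";   // placeholder; adjust to your project
    std::filesystem::create_directories(dir);

    return dir / objectKey;                            // e.g. /tmp/netcdf/file.nc
}

// Usage inside GetObject():
//   std::ofstream local_file(MakeOutputPath(objectKey.c_str()), std::ios::binary);
//   if (!local_file) { std::cerr << "could not open output file" << std::endl; }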

Trying to generate a timed connection for serial device ttys0 on unix-system

I am trying to write a class for reading from a specific serial device.
For the start-up procedure it is necessary to send the char '1'; then I have to wait for a response (254 and 255).
Within a period of 10 milliseconds I must send the next command to the device, but this time the command is 5 chars long.
When the communication is not sent within the correct time, the device runs into a timeout and sends me 255, 255, 255, 2, 4.
So I need reads of different sizes, and the most important thing for me is a timeout on the communication, because otherwise the system will stop working when some values are missed.
Therefore I have tried to write a class using boost::asio::async_read.
It works correctly: I can define the timeout as well as the number of bytes to be read, and when the device does not send the correct amount, the routine is exited.
But only the first time. When I try it a second time, the device does not send me anything. I have tried calling .open again, but that does not solve the issue. Deactivating the close function does not solve it either; then the routine runs into an error.
Can someone give me a small tip for my issue? Maybe I am just too blind to see my problem... Bernd
ConnectionWithTimeout::ConnectionWithTimeout(int timeout_)
    : timer_(io_service_, boost::posix_time::milliseconds(timeout_))
    , serial_port_(io_service_) {
}

void ConnectionWithTimeout::ReadNumberOfChars(int numberOfCharactersToRead_)
{
    buffer_.resize(numberOfCharactersToRead_);
    for (int i = 0; i < numberOfCharactersToRead_; ++i) {
        std::cout << "Clear Buffer[" << i << "]" << std::endl;
        buffer_[i] = 0;
    }

    timer_.async_wait(boost::bind(&::ConnectionWithTimeout::Stop, this));

    // async read from serial port
    boost::asio::async_read(serial_port_, boost::asio::buffer(buffer_),
        boost::bind(&ConnectionWithTimeout::ReadHandle, this,
            boost::asio::placeholders::error));

    io_service_.run();
}

void ConnectionWithTimeout::Stop() {
    std::cout << "Connection is being closed." << std::endl;
    serial_port_.close();
    std::cout << "Connection has been closed." << std::endl;
}

void ConnectionWithTimeout::ReadHandle(const boost::system::error_code& ec) {
    if (ec) {
        std::cout << "The amount of data is too low: " << ec << std::endl;
        for (std::vector<char>::iterator it = buffer_.begin();
             it != buffer_.end(); ++it)
        {
            std::cout << int(*it) << std::endl;
        }
    }
    else {
        std::cout << "The amount of data is correct: " << ec << std::endl;
        for (std::vector<char>::iterator it = buffer_.begin();
             it != buffer_.end(); ++it)
        {
            std::cout << int(*it) << std::endl;
        }
    }
}
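The symptom described above (the read works once, then the device never answers again) is commonly caused by closing the port inside the timeout handler and then re-running an io_service that is still in its stopped state. Purely as an illustration, here is an untested sketch of one usual pattern, keeping the member names from the question; timeout_ms_ is an assumed stored copy of the constructor's timeout_ value:

void ConnectionWithTimeout::ReadNumberOfChars(int numberOfCharactersToRead_)
{
    buffer_.assign(numberOfCharactersToRead_, 0);

    // Re-arm the timer for every read; a deadline_timer fires only once per async_wait.
    timer_.expires_from_now(boost::posix_time::milliseconds(timeout_ms_));
    timer_.async_wait([this](const boost::system::error_code& ec) {
        if (!ec)                    // not cancelled -> the timeout really expired
            serial_port_.cancel();  // aborts async_read with operation_aborted, port stays open
    });

    boost::asio::async_read(serial_port_, boost::asio::buffer(buffer_),
        [this](const boost::system::error_code& ec, std::size_t /*bytes_read*/) {
            timer_.cancel();        // stop the timeout once the read has completed
            ReadHandle(ec);
        });

    io_service_.reset();            // required before run() may be called a second time
    io_service_.run();
}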

Increase blob upload performance

I am working on integrating a C++ application with Azure Blob Storage.
To achieve this I have implemented a wrapper class around the wastorage 4.0 APIs.
#include "stdafx.h"
#include "AzureStorage.h"
// Microsoft Azure Library Header Includes.
#include "was\storage_account.h"
#include "was\blob.h"
struct RadAzureData{
azure::storage::cloud_storage_account storage_account;
azure::storage::cloud_blob_client blob_client;
azure::storage::cloud_blob_container container;
};
RadAzureStorage::RadAzureStorage():
RadCloudStorageInterface(RAD_STORAGE_TYPE::AZURE_CLOUD)
{
}
RadAzureStorage::RadAzureStorage(std::string accountName1, std::string accountKey1, std::string containerName1) : RadCloudStorageInterface(RAD_STORAGE_TYPE::AZURE_CLOUD)
{
std::wstring accountNameWS(accountName1.begin(), accountName1.end());
std::wstring accountKeyWS(accountKey1.begin(), accountKey1.end());
std::wstring containerNameWS(containerName1.begin(), containerName1.end());
d = new RadAzureData();
accountName = accountNameWS;
accountKey = accountKeyWS;
containerName = containerNameWS;
std::wstring connStr1 = L"AccountName=" + accountName + L";AccountKey=" + accountKey + L";DefaultEndpointsProtocol=https";
d->storage_account = azure::storage::cloud_storage_account::parse(connStr1.c_str());
// Create a blob container
d->blob_client = d->storage_account.create_cloud_blob_client();
d->container = d->blob_client.get_container_reference(containerName.c_str());
CreateContainer();
}
bool RadAzureStorage::CreateContainer()
{
try
{
d->container.create_if_not_exists();
}
catch (const azure::storage::storage_exception& e)
{
cout<<"Exception in container creation: " << e.what()<<endl;
cout <<"The request that started at:" << e.result().start_time().to_string().c_str() << " and ended at " << e.result().end_time().to_string().c_str() << " resulted in HTTP status code " << e.result().http_status_code() << " and the request ID reported by the server was " << e.result().service_request_id().c_str()<<endl;
return false;
}
return true;
}
bool RadAzureStorage::UploadFile(std::string blockBlobName, std::string dicomFileLocation)
{
std::wstring blockBlobNameWS(blockBlobName.begin(), blockBlobName.end());
std::wstring dicomFileLocationWS(dicomFileLocation.begin(), dicomFileLocation.end());
// Create a Block Blob Object.
azure::storage::cloud_block_blob block_blob = d->container.get_block_blob_reference(blockBlobNameWS.c_str());
// Upload Block Blob to container.
try
{
block_blob.upload_from_file(dicomFileLocationWS.c_str());
}
catch (const azure::storage::storage_exception& e)
{
cout<< "Exception in file upload: " << e.what() << endl;
cout<< "The request that started at:" << e.result().start_time().to_string().c_str() << " and ended at " << e.result().end_time().to_string().c_str() << " resulted in HTTP status code " << e.result().http_status_code() << " and the request ID reported by the server was " << e.result().service_request_id().c_str() << endl;
return false;
}
return true;
}
#undef __FILENAME__
The application instantiates the RadAzureStorage class and calls the UploadFile API:
RadAzureStorage* clsi = new RadAzureStorage(accountname, acesskey, containername);
<<timer.start>>
clsi->UploadFile(blockBlobName, file);
<<timer.end>>
cout << timer.ellapsedMilliSeconds << "ms" << endl;
The UploadFile API takes 14-16 ms for file sizes ranging between 190 and 250 KB.
Are there any parameters in the wastorage initialization that can be modified to bring this under 10 ms?
The current testing environment is hosted in Azure South India:
1. VM: Windows Server 2016, 4 vCores, 16 GB RAM.
2. Storage account with hot access tier.
Please note: similar logic implemented in C# achieves an upload per file under 10 ms for the same dataset.
The delay could also be caused by the type of storage account used. With Premium storage accounts you get higher performance than with GPv1.
There is also a new product that is still in preview, Azure Premium Blob Storage. Premium storage accounts are backed by SSDs rather than HDDs, so you also get better performance there compared with GPv2.
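If you still want to experiment with per-request parameters, the upload call accepts an explicit azure::storage::blob_request_options. A hedged sketch follows (verify the method names against your wastorage headers); for blobs of 190-250 KB the dominant cost is usually the round trip to the storage endpoint, so do not expect these knobs alone to get you under 10 ms:

// Sketch: pass explicit request options instead of the client defaults.
azure::storage::blob_request_options options;
options.set_store_blob_content_md5(false);   // skip client-side MD5 computation
options.set_parallelism_factor(1);           // small blobs gain nothing from parallel block upload

azure::storage::operation_context context;
block_blob.upload_from_file(dicomFileLocationWS.c_str(),
                            azure::storage::access_condition(),
                            options,
                            context);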

libarchive returns error on some entries while 7z can extract normally

I'm having trouble with libarchive version 3.3.2. I wrote a program to read selected entries in 7z archives, which look like this:
file.7z
|__ file.xml
|__ file.fog
|__ file_1.fog
However, the program failed to read file_1.fog for most of my archives, and failed to read file.fog for some. I used archive_error_string() to see what happens, and the errors were either "corrupted archive", "truncated RAR archive", or "Decompressing internal error".
Here is the code in question:
void list_archive(string name) {
    struct archive *a;
    struct archive_entry *entry;

    // create new archive struct for the file
    a = archive_read_new();
    archive_read_support_filter_all(a);
    archive_read_support_format_all(a);

    // open 7z file
    int r = archive_read_open_filename(a, name.c_str(), 1024);
    if (r != ARCHIVE_OK) {
        cout << "cannot read file: " << name << endl;
        cout << "read error: " << archive_error_string(a) << endl;
    }

    // loop through the entries
    for (;;) {
        int status = archive_read_next_header(a, &entry);

        // if there's no more header
        if (status != ARCHIVE_OK) break;

        // print some status messages to stdout
        string pathname(archive_entry_pathname(entry));
        cout << "working on: " << pathname << endl;
        size_t entry_size = archive_entry_size(entry);

        // load the entry's content
        char *content;
        content = (char*)malloc(entry_size);
        r = archive_read_data(a, content, entry_size);

        // check if archive_read_data was successful
        if (r > 0) {
            cout << "read " << r << " of " << entry_size << " bytes successfully\n";
            // we are interested in .fog files only
            if (pathname.back() == 'g') {
                // do something with the .fog file
            }
        }
        else // usually the error happens here
            if (archive_errno(a) != ARCHIVE_OK) cout << "read error: " << archive_error_string(a) << endl;

        // free the content and clear the entry
        archive_read_data_skip(a);
        free(content);
        archive_entry_clear(entry);
        cout << "-----" << endl;
    }

    // we are done with the current archive, free it
    r = archive_read_free(a);
    if (r != ARCHIVE_OK) {
        cout << "Failed to free archive object. Error: " << archive_error_string(a) << endl;
        exit(1);
    }
}
I found the troublemaker, and I am posting the answer here in case future users have the same problem:
int r = archive_read_open_filename(a, name.c_str(), 1024);
Apparently 1024 is too small a buffer size. I increased it to 102400 and was able to read/extract all my archives.
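For clarity, the changed call in context (the constant name is only for illustration):

// The third argument is the block size libarchive uses for each read from the file.
const size_t block_size = 102400;   // value that worked here; tune as needed
int r = archive_read_open_filename(a, name.c_str(), block_size);
if (r != ARCHIVE_OK)
    cout << "read error: " << archive_error_string(a) << endl;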
Be aware that, technically, the buffer size should not break functionality: it is acceptable for a small buffer to reduce speed, but not to break the operation. So I think the way the archives are processed here is not entirely reliable.

Http upload with progress info, in C++ (Poco/Boost)

I'm trying to upload a big file to my HTTP server, and I need to show upload progress.
How can I find out how many bytes have been sent during the upload? I need to send events to my GUI.
In Poco, I don't know where to put the callback:
_session.sendRequest(_request);
_session.receiveResponse(_response);
Any ideas or links? Thanks!
This was partially discussed back in '08; ironically, I am looking for exactly the same thing:
http://sourceforge.net/mailarchive/message.php?msg_id=20619477
EDIT: 02/14/12
This is not the best solution, but it works... it would probably be best to write 1k blocks at a time.
I'd like to see your suggestions.
std::string szMessage;
.... /* fill your szMessage, e.g. with a Form.write() */ ....

CountingOutputStream _cos( _session.sendRequest(_request) );

std::streamsize len = 0;
string::iterator it;
for ( it = szMessage.begin(); it != szMessage.end(); it++ ) {
    len++;
    _cos.put(*it);
    if (len % 4096 == 0)
        cout << "len: " << len << endl;
}
cout << "Chars printed: " << len << endl;

std::istream& rsout = _session.receiveResponse(_response);
std::ostringstream ostr;
StreamCopier::copyStream(rsout, ostr);

// Retrieving the response body is not necessary if we only need the status code
std::cout << endl;
_response.write(cout);
std::cout << ostr.str();

int code = _response.getStatus();
if (code != nRespCode) {
    stringstream s;
    s << "HTTP Error(*): " << code;
    throw Poco::IOException(s.str());
}
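Following up on the "write 1k blocks at a time" remark, here is a rough sketch of chunked writing against the same stream that sendRequest() returns (chunk size and the progress output are placeholders; hook the notification into your GUI event mechanism instead of printing):

std::ostream& body = _session.sendRequest(_request);

const std::size_t chunk = 4096;
std::size_t sent = 0;
while (sent < szMessage.size())
{
    std::size_t n = szMessage.size() - sent;
    if (n > chunk) n = chunk;

    body.write(szMessage.data() + sent, static_cast<std::streamsize>(n));
    sent += n;

    // Replace with a GUI notification / callback of your choice.
    std::cout << "\rsent " << sent << " / " << szMessage.size() << " bytes" << std::flush;
}
std::cout << std::endl;

std::istream& rs = _session.receiveResponse(_response);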