How to access AWS "subfolders" using ListObjects() in C++ SDK? - c++

I am trying to list the objects in the "noaa-goes16/GLM-L2-LCFA/2021/140/05" bucket. The AWS C++ SDK says that in order to list the objects, you need to use the following function:
bool ListObjects(const Aws::String& bucketName,
                 const Aws::Client::ClientConfiguration& clientConfig) {
    Aws::S3::S3Client s3_client(clientConfig);

    Aws::S3::Model::ListObjectsRequest request;
    request.WithBucket(bucketName);

    auto outcome = s3_client.ListObjects(request);

    if (!outcome.IsSuccess()) {
        std::cerr << "Error: ListObjects: " <<
            outcome.GetError().GetMessage() << std::endl;
    }
    else {
        Aws::Vector<Aws::S3::Model::Object> objects =
            outcome.GetResult().GetContents();

        for (Aws::S3::Model::Object& object : objects) {
            std::cout << object.GetKey() << std::endl;
        }
    }

    return outcome.IsSuccess();
}
After passing the string to the function, I get the following error:
Aws::SDKOptions options;
Aws::InitAPI(options);
Aws::Client::ClientConfiguration clientConfig;
std::string object = "noaa-goes16/GLM-L2-LCFA/2021/140/05";
ListObjects(object, clientConfig);
Output:
Error: ListObjects: The specified key does not exist.
noaa-goes16/GLM-L2-LCFA/2021/140/05
I have tried to add a "/" to the bucket name, and it states again that it does not exist.
If I just try "noaa-goes16" as the bucket name, it lists 1000 files that are not relevant to my application. How do I list the files in the "/05" subfolder using the AWS C++ SDK?
I tried to list the files in a subfolder of the noaa-goes16 bucket using the ListObjects() function in the AWS C++ SDK. I received an error that the subfolder I asked for does not exist. However, I know that the subfolder I am asking for does exist. I have tried to add a "/" to the end of the bucket name, thinking this would resolve the error. This did not help. I have found that if I only put the main bucket name ("noaa-goes16"), it works. However, I cannot list objects in a specific subfolder and would like to know how.

All! I have just figured it out! Hopefully, this thread helps others.
In order to restrict the listing to a "subfolder", you need to set a prefix on the request object.
For example, in order to list all of the objects under the "GLM-L2-LCFA/2021/140/05" prefix, add an Aws::String& prefix parameter and chain it with WithPrefix():
bool ListObjects(const Aws::String& bucketName, const Aws::String& prefix,
                 const Aws::Client::ClientConfiguration& clientConfig) {
    Aws::S3::S3Client s3_client(clientConfig);

    Aws::S3::Model::ListObjectsRequest request;
    request.WithBucket(bucketName).WithPrefix(prefix);

    auto outcome = s3_client.ListObjects(request);

    if (!outcome.IsSuccess()) {
        std::cerr << "Error: ListObjects: " <<
            outcome.GetError().GetMessage() << std::endl;
    }
    else {
        Aws::Vector<Aws::S3::Model::Object> objects =
            outcome.GetResult().GetContents();

        for (Aws::S3::Model::Object& object : objects) {
            std::cout << object.GetKey() << std::endl;
        }
    }

    return outcome.IsSuccess();
}
Now, pass the following:
Aws::Client::ClientConfiguration clientConfig;
std::string bucket = "noaa-goes16";
std::string prefix = "GLM-L2-LCFA/2021/140/05";
ListObjects(bucket, prefix, clientConfig);
Voila!
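As a side note (not part of the original fix, so treat it as a sketch): if you only want the next level of "subfolders" rather than every object under the prefix, ListObjectsRequest also has a WithDelimiter() method. With a delimiter of "/" the grouped prefixes come back through GetCommonPrefixes() instead of GetContents():

// Sketch: list only the immediate "subfolders" under a prefix.
void ListSubfolders(const Aws::String& bucketName, const Aws::String& prefix,
                    const Aws::Client::ClientConfiguration& clientConfig) {
    Aws::S3::S3Client s3_client(clientConfig);

    Aws::S3::Model::ListObjectsRequest request;
    // The "/" delimiter groups keys into common prefixes ("folders").
    request.WithBucket(bucketName).WithPrefix(prefix).WithDelimiter("/");

    auto outcome = s3_client.ListObjects(request);
    if (outcome.IsSuccess()) {
        for (const auto& common : outcome.GetResult().GetCommonPrefixes()) {
            std::cout << common.GetPrefix() << std::endl;
        }
    }
}

For the grouping to work as expected, the prefix itself usually needs to end with "/", e.g. "GLM-L2-LCFA/2021/140/" to list the hour folders under day 140.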

Related

Is there a way to list more than 1000 buckets using aws sdk C++?

The AWS SDK page shows this example:
Aws::S3::S3Client s3_client;
Aws::S3::Model::ListBucketsOutcome outcome = s3_client.ListBuckets();
This, however, allows returning up to 1000 buckets ONLY!
In our organization we have more than 1k buckets.
boto3 or the Java interface using ECS allows me to do pagination.
I can find NOTHING for C++, however, and I have already been digging in the dark corners of the Internet.
Does anyone have any idea how to do that pagination in C++, since ListBuckets() does not take any request as an argument?
NOTE: I am not looking for workarounds like executing a boto3 script or JNI within my C++ to solve the list-buckets issue. I am interested in a proper way to do this with the SDK, which, for reasons unknown to me, does not seem to exist.
I was having the same problem and came across a solution for Java here (https://stackoverflow.com/a/15352712/1856251). There are a number of solutions there, but the one in the link showed me the direction I needed.
It seems that for listing objects there is the concept of a marker. The marker refers to the key to start with when listing; when it is empty, listing starts from the first key in the bucket. If you set it to the last key returned from the previous call to ListObjects, the next call will start from there. To do that you can call the member function:
void Aws::S3::Model::ListObjectsRequest::SetMarker(Aws::String&& value)
More info can be found:
https://sdk.amazonaws.com/cpp/api/0.14.3/class_aws_1_1_s3_1_1_model_1_1_list_objects_request.html#a72bef4f7da7f91661a7642da7dc3aa36
Here is the code.
void myListObjects(const Aws::String& bucketName,
                   const Aws::String& region)
{
    Aws::Client::ClientConfiguration config;
    if (!region.empty())
    {
        config.region = region;
    }
    Aws::S3::S3Client s3_client(config);

    Aws::S3::Model::ListObjectsRequest request;
    request.WithBucket(bucketName);

    std::string prelastMarker, lastMarker;

    std::cout << "Objects in bucket '" << bucketName << "': "
              << std::endl << std::endl;

    do {
        auto outcome = s3_client.ListObjects(request);
        prelastMarker = lastMarker;

        if (outcome.IsSuccess())
        {
            Aws::Vector<Aws::S3::Model::Object> keyList =
                outcome.GetResult().GetContents();

            if (keyList.empty())
            {
                break; // nothing (more) to list
            }

            for (Aws::S3::Model::Object& object : keyList)
            {
                std::cout << object.GetKey() << std::endl;
            }

            // Remember the last key of this page and ask the next call
            // to start after it.
            lastMarker = keyList[keyList.size() - 1].GetKey();
            request.SetMarker(lastMarker.c_str());
        }
    } while (prelastMarker != lastMarker);
}
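For what it's worth, newer releases of the C++ SDK also expose ListObjectsV2, which reports whether a page was truncated and returns a continuation token, so you do not have to track the last key yourself. Like the answer above, this paginates objects rather than buckets. A rough sketch (not from the original answer; it assumes the ListObjectsV2 request/result types present in current aws-sdk-cpp):

#include <aws/s3/model/ListObjectsV2Request.h>

void listAllObjectsV2(const Aws::String& bucketName,
                      const Aws::Client::ClientConfiguration& config)
{
    Aws::S3::S3Client s3_client(config);

    Aws::S3::Model::ListObjectsV2Request request;
    request.WithBucket(bucketName);

    bool truncated = false;
    do {
        auto outcome = s3_client.ListObjectsV2(request);
        if (!outcome.IsSuccess()) {
            std::cerr << outcome.GetError().GetMessage() << std::endl;
            return;
        }

        for (const auto& object : outcome.GetResult().GetContents()) {
            std::cout << object.GetKey() << std::endl;
        }

        // If this page was truncated, resume the next call from the token.
        truncated = outcome.GetResult().GetIsTruncated();
        if (truncated) {
            request.SetContinuationToken(
                outcome.GetResult().GetNextContinuationToken());
        }
    } while (truncated);
}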

How to list files in S3 within given start & end date range using AWS C++ SDK

I could not find any method or sample code to list files from S3 created within a given date range.
I tried WithIfModifiedSince, SetIfModifiedSince with GetObjectRequest.
#include <aws/core/Aws.h>
#include <aws/s3/S3Client.h>
#include <aws/s3/model/ListObjectsRequest.h>
#include <aws/s3/model/Object.h>
#include <iostream>

/* list_files_s3_created_within_given_date_range */
{
    Aws::S3::S3Client s3_client;

    Aws::S3::Model::ListObjectsRequest objects_request;
    objects_request.WithBucket(bucket_name);
    /* This lists all the files in the S3 bucket. */
    /* But how to get files within a given date range? */

    //Aws::S3::Model::GetObjectRequest object_request;
    //object_request.SetBucket(bucket_name);
    //object_request.WithIfModifiedSince(DateTime)

    auto list_objects_outcome = s3_client.ListObjects(objects_request);

    if (list_objects_outcome.IsSuccess())
    {
        Aws::Vector<Aws::S3::Model::Object> object_list =
            list_objects_outcome.GetResult().GetContents();

        for (auto const& s3_object : object_list)
        {
            std::cout << "* " << s3_object.GetKey() << std::endl;
        }
    }
}
//------------------------------------
Aws::Utils::DateTime startdt = Aws::Utils::DateTime("2019-10-23T10:00:00Z", Aws::Utils::DateFormat::ISO_8601);
Aws::Utils::DateTime enddt = Aws::Utils::DateTime::Now();

Aws::S3::Model::GetObjectRequest object_request;
object_request.SetBucket(bucket_name);
object_request.WithIfModifiedSince(startdt);
//object_request.SetIfModifiedSince(startdt);

auto object_outcome = s3_client.GetObject(object_request);

if (object_outcome.IsSuccess())
{
    std::cout << object_outcome.GetResultWithOwnership().GetETag() << std::endl;
}
Why is the code not returning any object (file) from the S3 bucket?
I got the solution:
List all the files from the AWS S3 bucket using ListObjectsRequest() and ListObjects()
Iterate through all the files one by one
Pass the file name as the key to GetObjectRequest()
Pass the start date to GetObjectRequest using WithIfModifiedSince()
Pass the end date to GetObjectRequest using WithIfUnmodifiedSince()
Call GetObject()
This will return the object from the S3 bucket only if it was created or modified within the given datetime range.
But it is not the most efficient solution, since we are sending a separate GetObject() request to S3 for each file, which increases the number of requests made to S3.
Please let me know if there is a more efficient way to get all files within a given datetime range in a single request.
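One possible alternative (not from the original answer, just a sketch): each Aws::S3::Model::Object returned by ListObjects already carries a last-modified timestamp via GetLastModified(), so the date-range check can be done client-side on the listing itself, with no per-file GetObject() call. Assuming bucket_name and the date range from the snippets above:

Aws::Utils::DateTime startdt("2019-10-23T10:00:00Z", Aws::Utils::DateFormat::ISO_8601);
Aws::Utils::DateTime enddt = Aws::Utils::DateTime::Now();

Aws::S3::S3Client s3_client;
Aws::S3::Model::ListObjectsRequest objects_request;
objects_request.WithBucket(bucket_name);

auto outcome = s3_client.ListObjects(objects_request);
if (outcome.IsSuccess())
{
    for (const auto& s3_object : outcome.GetResult().GetContents())
    {
        const Aws::Utils::DateTime& modified = s3_object.GetLastModified();
        // Keep only keys whose last-modified time falls inside the range.
        if (modified > startdt && modified < enddt)
        {
            std::cout << "* " << s3_object.GetKey() << std::endl;
        }
    }
}

Keep in mind that ListObjects still returns at most 1000 keys per call, so for larger buckets this would need to be combined with the marker/continuation-token pagination discussed in the previous question.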

Accessing specified key from s3 bucket?

I have an S3 bucket xxx. I wrote a Lambda function that reads data from the S3 bucket and writes those details to an RDS PostgreSQL instance. I can do that with my code. I added a trigger to the Lambda function so that it is invoked when a file lands in S3.
But with my code I can only read the file named 'SampleData.csv'. Consider my code given below:
public class LambdaFunctionHandler implements RequestHandler<S3Event, String> {

    private AmazonS3 s3 = AmazonS3ClientBuilder.standard().build();

    public LambdaFunctionHandler() {}

    // Test purpose only.
    LambdaFunctionHandler(AmazonS3 s3) {
        this.s3 = s3;
    }

    @Override
    public String handleRequest(S3Event event, Context context) {
        context.getLogger().log("Received event: " + event);
        String bucket = "xxx";
        String key = "SampleData.csv";
        System.out.println(key);
        try {
            S3Object response = s3.getObject(new GetObjectRequest(bucket, key));
            String contentType = response.getObjectMetadata().getContentType();
            context.getLogger().log("CONTENT TYPE: " + contentType);

            // Read the source file as text
            AmazonS3 s3Client = new AmazonS3Client();
            String body = s3Client.getObjectAsString(bucket, key);
            System.out.println("Body: " + body);
            System.out.println();
            System.out.println("Reading as stream.....");
            System.out.println();

            BufferedReader br = new BufferedReader(new InputStreamReader(response.getObjectContent()));

            // just saving the CSV data to the database
            String csvOutput;
            try {
                Class.forName("org.postgresql.Driver");
                Connection con = DriverManager.getConnection("jdbc:postgresql://ENDPOINT:5432/DBNAME", "USER", "PASSWORD");
                System.out.println("Connected");

                // Checking EOF
                while ((csvOutput = br.readLine()) != null) {
                    String[] str = csvOutput.split(",");
                    String name = str[1];
                    String query = "insert into schema.tablename(name) values('" + name + "')";
                    Statement statement = con.createStatement();
                    statement.executeUpdate(query);
                }
                System.out.println("Inserted Successfully!!!");
            } catch (Exception ase) {
                context.getLogger().log(String.format(
                        "Error getting object %s from bucket %s. Make sure they exist and"
                        + " your bucket is in the same region as this function.", key, bucket));
                // throw ase;
            }

            return contentType;
        } catch (Exception e) {
            e.printStackTrace();
            context.getLogger().log(String.format(
                    "Error getting object %s from bucket %s. Make sure they exist and"
                    + " your bucket is in the same region as this function.", key, bucket));
            throw e;
        }
    }
}
From my code you can see that I hard-coded key = "SampleData.csv"; is there any way to get the key inside a bucket without specifying a specific file name?
These couple of links would be of help.
http://docs.aws.amazon.com/AmazonS3/latest/dev/ListingKeysHierarchy.html
http://docs.aws.amazon.com/AmazonS3/latest/dev/ListingObjectKeysUsingJava.html
You can list objects using prefix and delimiter to find the key you are looking for without passing a specific filename.
If you need to get the event details from S3, you can enable S3 event notifications to the Lambda function.
You can enable this by:
Click on 'Properties' inside your bucket
Click on 'Events'
Click 'Add notification'
Give a name and select the type of event (e.g. Put, Delete, etc.)
Give a prefix and suffix if necessary, or else leave them blank to consider all events
Then under 'Send to', choose your Lambda function and provide the Lambda ARN.
Now the event details will be sent to the Lambda function in JSON format. You can fetch the details from that JSON. The input will look like this:
{
  "Records": [
    {
      "eventVersion": "2.0",
      "eventSource": "aws:s3",
      "awsRegion": "ap-south-1",
      "eventTime": "2017-11-23T09:25:54.845Z",
      "eventName": "ObjectRemoved:Delete",
      "userIdentity": {
        "principalId": "AWS:AIDAJASDFGZTLA6UZ7YAK"
      },
      "requestParameters": {
        "sourceIPAddress": "52.95.72.70"
      },
      "responseElements": {
        "x-amz-request-id": "A235BER45D4974E",
        "x-amz-id-2": "glUK9ZyNDCjMQrgjFGH0t7Dz19eBrJeIbTCBNI+Pe9tQugeHk88zHOY90DEBcVgruB9BdU0vV8="
      },
      "s3": {
        "s3SchemaVersion": "1.0",
        "configurationId": "sns",
        "bucket": {
          "name": "example-bucket1",
          "ownerIdentity": {
            "principalId": "AQFXV36adJU8"
          },
          "arn": "arn:aws:s3:::example-bucket1"
        },
        "object": {
          "key": "SampleData.csv",
          "sequencer": "005A169422CA7CDF66"
        }
      }
    }
  ]
}
You can access the key as objectname = event['Records'][0]['s3']['object']['key'] (note that this snippet is Python)
and then send this info to RDS.

findAndGetString() in DCMTK returns null for the tag

I am developing a quick DICOM viewer using the DCMTK library and I am following the example provided in this link.
The buffer from the API always comes back null for any tag ID, e.g. DCM_PatientName.
The findAndGetOFString() API works, but returns only the first character of the tag in ASCII; is this how this API should work?
Can someone let me know why the buffer is empty with the former API?
The DicomImage API has the same issue as well.
Snippet 1:
DcmFileFormat fileformat;
OFCondition status = fileformat.loadFile(test_data_file_path.toStdString().c_str());
if (status.good())
{
    OFString patientName;
    char* name;
    if (fileformat.getDataset()->findAndGetOFString(DCM_PatientName, patientName).good())
    {
        name = new char[patientName.length()];
        strcpy(name, patientName.c_str());
    }
    else
    {
        qDebug() << "Error: cannot access Patient's Name!";
    }
}
else
{
    qDebug() << "Error: cannot read DICOM file (" << status.text() << ")";
}
In the above snippet, name has the ASCII value "50", while the actual name is "PATIENT".
Snippet 2:
DcmFileFormat file_format;
OFCondition status = file_format.loadFile(test_data_file_path.toStdString().c_str());
std::shared_ptr<DcmDataset> dataset(file_format.getDataset());
qDebug() << "\nInformation extracted from DICOM file: \n";
const char* buffer = nullptr;
DcmTagKey key = DCM_PatientName;
dataset->findAndGetString(key,buffer);
std::string tag_value = buffer;
qDebug() << "Patient name: " << tag_value.c_str();
In the above snippet, the buffer is null. It doesn't read the name.
NOTE:
This is only a sample. I am just playing around with the APIs for learning purposes.
The following sample method reads the patient name from a DcmDataset object:
std::string getPatientName(DcmDataset& dataset)
{
    // Get the tag's value in ofstring
    OFString ofstring;
    OFCondition condition = dataset.findAndGetOFString(DCM_PatientName, ofstring);
    if (condition.good())
    {
        // Tag found. Put it in a std::string and return it
        return std::string(ofstring.c_str());
    }

    // Tag not found
    return ""; // or throw if you need the tag
}
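A minimal usage sketch for the helper above (the file name is just a placeholder):

DcmFileFormat fileFormat;
OFCondition status = fileFormat.loadFile("scan2.dcm");
if (status.good())
{
    // getDataset() keeps ownership of the dataset inside fileFormat
    std::string patientName = getPatientName(*fileFormat.getDataset());
    std::cout << "Patient name: " << patientName << std::endl;
}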
I have tried your code with your datasets. I just replaced the Qt console output with std::cout. It works for me, i.e. it prints the correct patient name (e.g. "PATIENT2" for scan2.dcm). Everything seems correct, except for the fact that you apparently want to transfer ownership of the dataset to a smart pointer.
To obtain ownership of the DcmDataset from the DcmFileFormat, you must call getAndRemoveDataset() instead of getDataset(). However, I do not think that your issue is related to that. You may want to try my modified snippet:
DcmFileFormat file_format;
OFCondition status = file_format.loadFile("d:\\temp\\StackOverflow\\scan2.dcm");
std::shared_ptr<DcmDataset> dataset(file_format.getAndRemoveDataset());
std::cout << "\nInformation extracted from DICOM file: \n";
const char* buffer = nullptr;
DcmTagKey key = DCM_PatientName;
dataset->findAndGetString(key, buffer);
std::string tag_value = buffer;
std::cout << "Patient name: " << tag_value.c_str();
It probably helps you to know that your code and the dcmtk methods you use are correct, but that does not solve your problem. Another thing I would recommend is to verify the result returned by file_format.loadFile(). Maybe there is a surprise in there.
Not sure if I can help you more, but my next step would be to verify your build environment, e.g. the options that you use for building dcmtk. Are you using CMake to build dcmtk?

Boost log select destination file

Is it possible, with one instance of Boost.Log, to log into several files?
I mean, is it possible to specify which file a given log record will be written to:
BOOST_LOG_..(...) << "aaa" <- go to **A.log**
BOOST_LOG_..(...) << "bbb" <- go to **B.log**
Yes, it's possible - using filters.
How you do it exactly depends on your preferences, but here's an example with scoped logger tags:
void SomeFunction()
{
    {
        // everything in this scope gets logged to A.log
        BOOST_LOG_SCOPED_LOGGER_TAG(lg, "Log", std::string, "LogA");
        BOOST_LOG(lg) << "aaa";
        BOOST_LOG(lg) << "aaa2";
    }

    {
        // everything in this scope gets logged to B.log
        BOOST_LOG_SCOPED_LOGGER_TAG(lg, "Log", std::string, "LogB");
        BOOST_LOG(lg) << "bbb";
        BOOST_LOG(lg) << "bbb2";
    }
}
// This is your log initialization routine
void InitLogs()
{
    // Initialize sinkA to use a file backend that writes to A.log and sinkB to B.log.
    // ...
    // ...

    // Make sink A only accept records with the Log attribute "LogA",
    // while sink B will only accept records where it is "LogB".
    sinkA.set_filter(flt::attr<std::string>("Log") == "LogA");
    sinkB.set_filter(flt::attr<std::string>("Log") == "LogB");
}