Deleting pointer after async call - C++

I'm trying to upload a file to Azure Storage and I would like to leverage the async feature. However, I'm having a hard time knowing whether the cleanup is done correctly.
I would like to delete the file data and release all streams, but obviously this should be done after the upload is complete.
Any comments on how I can improve this to make it more robust?
Concurrency::task<void> BlobService::UploadAsync(
    cloud_blob_container container, const wchar_t* blobName,
    uint8_t* data, size_t dataLength,
    const wchar_t* contentType, const wchar_t* cacheControl )
{
    rawptr_buffer<uint8_t>* buffer = new rawptr_buffer<uint8_t>(data, dataLength);
    istream inputStream = buffer->create_istream();

    cloud_block_blob blob = container.get_block_blob_reference(utility::string_t(blobName));
    blob.properties().set_content_type(utility::string_t(contentType));
    blob.properties().set_cache_control(utility::string_t(cacheControl));

    azure::storage::blob_request_options options;
    options.set_maximum_execution_time(std::chrono::seconds(10800));
    options.set_server_timeout(std::chrono::seconds(10800));

    azure::storage::access_condition access;
    azure::storage::operation_context context;

    return blob.upload_from_stream_async(inputStream, access, options, context).then([buffer, inputStream, data] {
        buffer->close().wait();
        inputStream.close().wait();
        delete[] data;
        delete buffer;
    });
}
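One way to make the cleanup more robust (a sketch only, not the SDK's prescribed pattern; it assumes the caller can hand over ownership of the bytes as a std::vector, and the signature is simplified accordingly) is to let RAII types own everything the continuation captures, so no explicit delete is needed:

Concurrency::task<void> BlobService::UploadAsync(
    cloud_block_blob blob, std::vector<uint8_t> data )
{
    // container_buffer owns the vector; the shared_ptr keeps it alive for
    // exactly as long as the lambda (and thus the continuation) exists
    auto buffer = std::make_shared<
        concurrency::streams::container_buffer<std::vector<uint8_t>>>(std::move(data));
    concurrency::streams::istream inputStream = buffer->create_istream();

    return blob.upload_from_stream_async(inputStream).then([buffer, inputStream]() {
        // no delete needed: closing the stream is enough, and the buffer's
        // storage is released when the last shared_ptr copy goes away
        return inputStream.close();
    });
}

Even if the upload fails, the captures are destroyed once the task completes, so the buffer should not leak.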


LibTorch's save_to & load_from function - Model Archiving

I'm fairly new to LibTorch and its Model Archiving system.
At the moment, I'm trying to save my model configuration from one module and load it into a different module, but I'm getting an error from LibTorch that I don't quite understand.
To do this, I've been reading the API documentation here: https://pytorch.org/cppdocs/api/classtorch_1_1serialize_1_1_output_archive.html#class-documentation
which doesn't seem to be all that helpful on the matter.
I've been trying to utilise as much of LibTorch as possible here, but I suspect a JSON or similar storage structure might in fact be easier. I'm doing it this way (rather than using a .clone() or similar) as I intend to send the data at some point in the future.
I've simplified my code below:
torch::serialize::OutputArchive
NewModule::getArchive()
{
    torch::serialize::OutputArchive archive;
    auto params = named_parameters(true /*recurse*/);
    auto buffers = named_buffers(true /*recurse*/);
    for (const auto& val : params)
    {
        if (!is_empty(val.value()))
            archive.write(val.key(), val.value());
    }
    // Same again with a write for buffers.
    return archive;
}
This function aims to copy the contents into a torch::serialize::OutputArchive, which can then be saved to disk, passed into an ostream, or handed to a "writer function". It's the last of these I'm struggling to get working successfully.
Torch specifies that the writer function must be of type std::function<size_t(const void*, size_t)>. I'm assuming (as the docs don't specify!) that the const void* is actually an array of bytes whose length is given by the second parameter; I am unsure why the return value is also a size_t here.
My next block of code takes this data blob and attempts to read it using torch::serialize::InputArchive. Calling load_from here produces the error: "PytorchStreamReader failed reading zip archive: failed finding central directory"
Can anyone help resolve why this is the case?
Code below:
void
NewModule::LoadFromData(const char* data, size_t data_size)
{
    torch::serialize::InputArchive archive;
    archive.load_from(data, data_size);
    auto params = named_parameters(true);
    auto buffers = named_buffers(true);
    for (auto& val : params)
    {
        archive.read(val.key(), val.value());
    }
    // Same again with a copy for buffers.
}
torch::serialize::OutputArchive
NewModule::copyArchive()
{
    NewModule other_module;
    auto archive = getArchive();
    std::function<size_t(const void*, size_t)> writer_lambda =
        [this, other_module](const void* data, size_t size) mutable -> size_t {
            other_module.LoadFromData(reinterpret_cast<const char*>(data), size);
            return size;
        };
    archive.save_to(writer_lambda);
    return archive;
}
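One plausible cause (an assumption, since the docs don't specify how often the writer is called): save_to may invoke the writer several times, one chunk per call, so loading each chunk as if it were a complete archive would fail exactly like this. A sketch, under that assumption, of a writer that accumulates all chunks and loads once at the end:

auto archive = getArchive();

// buffer every chunk the writer receives; returning size reports
// that all bytes were consumed (mirroring fwrite-style conventions)
std::vector<char> blob;
archive.save_to([&blob](const void* data, size_t size) -> size_t {
    const char* bytes = reinterpret_cast<const char*>(data);
    blob.insert(blob.end(), bytes, bytes + size);
    return size;
});

NewModule other_module;
other_module.LoadFromData(blob.data(), blob.size());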

How to access serialized data of Cap'n Proto?

I'm working with Cap'n Proto, and my understanding is that there is no need to do serialization, as it's already being done. So my question is: how would I access the serialized data and get its size, so that I can pass it as a byte array to another library?
// person.capnp
struct Person {
    name @0 :Text;
    age @1 :Int16;
}
// ...
::capnp::MallocMessageBuilder message;
Person::Builder person = message.initRoot<Person>();
person.setName("me");
person.setAge(20);
// at this point, how do I get some sort of handle to
// the serialized data of 'person' as well as its size?
I've seen the writePackedMessageToFd(fd, message); call but didn't quite understand what was being passed, and I couldn't find any API docs on it. I also wasn't trying to write to a file descriptor, as I need the serialized data returned as const void*.
Looking in Cap'n Proto's message.h, there is this function, which is in the base class of MallocMessageBuilder and says it gets the raw data making up the message:
kj::ArrayPtr<const kj::ArrayPtr<const word>> getSegmentsForOutput();
// Get the raw data that makes up the message.
But even then, I'm not sure how to get it as const void*.
Thoughts?
The ::capnp::MallocMessageBuilder message is your binary message, and its size is
message.sizeInWords()
(the size in bytes divided by 8, since a Cap'n Proto word is 8 bytes).
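As a quick sketch of that conversion, if you want the size in bytes:

size_t sizeInBytes = message.sizeInWords() * sizeof(capnp::word); // sizeof(capnp::word) == 8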
This appears to be what's needed.
// ...
::capnp::MallocMessageBuilder message;
Person::Builder person = message.initRoot<Person>();
person.setName("me");
person.setAge(20);
kj::Array<capnp::word> dataArr = capnp::messageToFlatArray(message);
kj::ArrayPtr<kj::byte> bytes = dataArr.asBytes();
std::string data(bytes.begin(), bytes.end());
const void* dataPtr = data.c_str();
At this point, I have a const void* dataPtr, and the size via data.size().
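Note that the std::string copy isn't strictly required; a sketch of the same thing pointing straight into the flat array (only valid while dataArr is alive):

kj::Array<capnp::word> dataArr = capnp::messageToFlatArray(message);
kj::ArrayPtr<kj::byte> bytes = dataArr.asBytes();
const void* dataPtr = bytes.begin(); // points into dataArr's storage
size_t dataSize = bytes.size();      // size in bytes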

Handling of const char* on ESP32

I'm working on making some Spotify API calls on an ESP32. I'm fairly new to C++, and while I seem to have got it working the way I wanted, I would like to know if it is the right way/best practice or if I was just lucky. The whole thing with chars and pointers is still quite confusing to me, no matter how much I read about it.
I'm calling the Spotify API, getting a JSON response, and parsing it with the ArduinoJson library. The library returns all keys and values as const char*.
The library I use to display it on a screen takes const char* as well. I got it working before by converting to String, returning the String from the getTitle() function, and converting it back to display it on screen. After reading that Strings are inefficient and best avoided, I'm trying to cut out the conversion steps.
void getTitle()
{
    // I cut out the HTTP request and stuff
    DynamicJsonDocument doc(1024);
    DeserializationError error = deserializeJson(doc, http.getStream());
    JsonObject item = doc["item"];
    title = item["name"]; // This is a const char*
}

const char* title = nullptr;

void loop(void) {
    getTitle();
    u8g2.clearBuffer();
    u8g2.setDrawColor(1);
    u8g2.setFont(u8g2_font_6x12_tf);
    u8g2.drawStr(1, 10, title);
    u8g2.sendBuffer();
}
Is it okay to do it like that?
This is not fine.
When seeing something like this, you should immediately become suspicious.
That's because in getTitle you are asking a local object (item) for a pointer, but you use the pointer later, when the item object no longer exists.
That means your pointer might be meaningless by the time you need it: it might no longer reference your data but some arbitrary other bytes instead (or even lead to crashes).
This problem is independent of which exact library you use, and you can often find relevant, more specific information by searching your library's documentation for "lifetime" or "object ownership".
FIX
Make sure that item (and also the DynamicJsonDocument, because the documentation tells you so!) both still exist when you use the data, e.g. like this:
void setTitle(const char *title)
{
    u8g2.clearBuffer();
    u8g2.setDrawColor(1);
    u8g2.setFont(u8g2_font_6x12_tf);
    u8g2.drawStr(1, 10, title);
    u8g2.sendBuffer();
}

void updateTitle()
{
    DynamicJsonDocument doc(1024);
    DeserializationError error = deserializeJson(doc, http.getStream());
    JsonObject item = doc["item"];
    setTitle(item["name"]);
}
See also: https://arduinojson.org/v6/how-to/reuse-a-json-document/#the-best-way-to-use-arduinojson
Edit: If you want to keep parsing and the display update decoupled
You could keep the JSON document "alive" for when the parsed data is needed:
/* "static" visibility, so that other c/cpp files ("translation units") can't
* mess mess with our JSON doc directly
*/
static DynamicJsonDocument doc(1024);
static const char *title;
void parseJson()
{
[...]
// super important to avoid leaking memory!!
doc.clear();
DeserializationError error = deserializeJson(doc, http.getStream(), );
// TODO: robustness/error handling (e.g. inbound JSON is missing "item")
title = doc["item"]["name"];
}
// may be nullptr when called before valid JSON was parsed
const char* getTitle()
{
    return title;
}
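If you would rather not keep the document alive between calls, another option (just a sketch; the 64-byte buffer is an arbitrary assumption) is to copy the value into storage you own before the document goes out of scope:

// copy the parsed value into a buffer we own, so the JSON document
// can be discarded right after parsing
static char title[64] = "";

void updateTitle()
{
    DynamicJsonDocument doc(1024);
    DeserializationError error = deserializeJson(doc, http.getStream());
    const char* name = doc["item"]["name"];   // may be nullptr
    if (name != nullptr)
        strlcpy(title, name, sizeof(title));  // bounded copy, always NUL-terminated
}

This trades a small fixed buffer (and possible truncation of long titles) for not having to think about the JSON document's lifetime at all.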

Size of encoded Avro message without encoding it

Is there a way to get the size of an encoded Avro message without actually encoding it?
I'm using Avro 1.8.1 for C++.
I'm used to Google Protocol Buffers, where you can call ByteSize() on a protobuf to get the encoded size, so it's something similar I'm looking for.
Since the message in essence is a raw struct, I get that the size cannot be retrieved from the message itself, but perhaps there is a helper method that I'm not aware of?
There is no way around it unfortunately...
Here is an example showing how the size can be calculated by encoding the object:
MyAvroStruct obj;
avro::EncoderPtr encoder = avro::binaryEncoder();
std::auto_ptr<avro::OutputStream> out = avro::memoryOutputStream(1);
encoder->init(*out);
avro::encode(*encoder, obj);
out->flush();
uint32_t bufferSize = out->byteCount();
(Edit below shows a hacky way to shrink-to-fit an OutputStream after writing to it with a BinaryEncoder)
It's a shame that avro::encode() doesn't use backup on the OutputStream to free unused memory after encoding. Martin G's answer gives the best solution using only the tools avro provides, but it issues N memory allocations of 1 byte each if your serialized object is N bytes in size.
You could implement a custom avro::OutputStream that simply counts and discards all written bytes. This would get rid of the memory allocations. It's still not a great approach, as the actual encoder will have to "ask" for every single byte:
(Code untested, just for demonstration purposes)
#include <avro/Stream.hh>
#include <cstdint>

class ByteCountOutputStream : public avro::OutputStream {
public:
    size_t byteCount_ = 0;
    uint8_t dummyWriteLocation_;

    ByteCountOutputStream() = default;

    // hand the encoder a 1-byte scratch location and count the request
    bool next(uint8_t **data, size_t *len) final {
        byteCount_ += 1;
        *data = &dummyWriteLocation_;
        *len = 1;
        return true;
    }

    void backup(size_t len) final {
        byteCount_ -= len;
    }

    uint64_t byteCount() const final {
        return byteCount_;
    }

    void flush() final {}
};
This could then be used as:
MyAvroStruct obj;
avro::EncoderPtr encoder = avro::binaryEncoder();
ByteCountOutputStream out; // note: "ByteCountOutputStream out();" would declare a function
encoder->init(out);
avro::encode(*encoder, obj);
size_t bufferSize = out.byteCount();
Edit:
My initial question when stumbling upon this was: How can I tell how many bytes of the OutputStream are required (for storing / transmitting)? Or, equivalently, if OutputStream.byteCount() returns the count of bytes allocated by the encoder so far, how can I make the encoder "backup" / release the bytes it didn't use? Well, there is a hacky way:
The Encoder abstract class provides an init method. For the BinaryEncoder, this is currently implemented as:
void BinaryEncoder::init(OutputStream &os) {
    out_.reset(os);
}
with out_ being the internal StreamWriter of the Encoder.
Now, the StreamWriter implements reset as:
void reset(OutputStream &os) {
    if (out_ != nullptr && end_ != next_) {
        out_->backup(end_ - next_);
    }
    out_ = &os;
    next_ = end_;
}
which will return unused memory back to the "old" OutputStream before switching to the new one.
So, you can abuse the encoder's init method like this:
// setup as always
MyAvroStruct obj;
avro::EncoderPtr encoder = avro::binaryEncoder();
std::auto_ptr<avro::OutputStream> out = avro::memoryOutputStream();
// actual serialization
encoder->init(*out);
avro::encode(*encoder, obj);
// re-init on the same OutputStream. Happens to shrink the stream to fit
encoder->init(*out);
size_t bufferSize = out->byteCount();
However, this behavior is not documented, so it might break in the future.

How to optimize parse data flow algorithm?

I need to implement an abstract protocol client-server conversation parsing library in C++. I don't have a file containing the whole client-server conversation; I have to parse it on the fly. I have to implement the following interface:
class parsing_class
{
public:
    void on_data( const char* data, size_t len );
    // other functions

private:
    size_t pos_; // current position in the data flow
    bool first_part_parsed_;
    bool second_part_parsed_;
    // ... some more bool markers or something like vector< bool >
};
The data is passed to my class through the on_data function. The chunk length varies from one call to another. I know the protocol's packet format and how the conversation should be organized, so I can judge by the current pos_ whether I have enough data to parse the Nth part.
The current implementation is like the following:
void parsing_class::on_data( const char* data, size_t len )
{
    pos_ += len;
    if( pos_ > FIRST_PART_SIZE and !first_part_parsed_ )
        parse_first_part( data, len );
    if( pos_ > SECOND_PART_SIZE and !second_part_parsed_ )
        parse_second_part( data, len );
    // and so on..
}
What I want are some tips on how to optimize this algorithm, perhaps avoiding these numerous ifs (on_data may be called very many times, and each time it will have to go through all the checks).
You don't need all those bools and pos_, as they seem to only keep track of which part of the conversation has already passed so that you can continue with the next part.
How about the following: write yourself a parse function for each of the parts of the conversation
bool parse_part_one(const char *data) {
    ... // parse the data
    next_fun = parse_part_two;
    return true;
}

bool parse_part_two(const char *data) {
    ... // parse the data
    next_fun = parse_part_three;
    return true;
}
...
and in your class you add a pointer to the current parse function, starting at part one. Now, in on_data, all you do is call the next parse function
bool success = next_fun(data);
Because each function sets the pointer to the next parse function, the next call of on_data will invoke the next parse function automatically. No tests are required to work out where in the conversation you are.
If the value of len is critical (which I assume it is), then pass it along as well, and return false to indicate that the part could not be parsed (don't update next_fun in that case either).
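A sketch of this state-machine idea applied to the original class, using member function pointers (names are illustrative; the error handling and partial-part buffering mentioned above are omitted for brevity):

class parsing_class
{
public:
    void on_data( const char* data, size_t len )
    {
        // single indirect call; no chain of ifs
        (this->*next_fun_)( data, len );
    }

private:
    using parse_fn = void (parsing_class::*)( const char*, size_t );

    void parse_first_part( const char* data, size_t len )
    {
        // ... parse the first part, then advance the state
        next_fun_ = &parsing_class::parse_second_part;
    }

    void parse_second_part( const char* data, size_t len )
    {
        // ... parse the second part, and so on..
    }

    // start with the first part's parser
    parse_fn next_fun_ = &parsing_class::parse_first_part;
};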