LibTorch's save_to & load_from functions - Model Archiving - C++

I'm fairly new to LibTorch and its Model Archiving system.
At the moment, I'm trying to save my model configuration from one module and load it into a different module, but I'm hitting an error in LibTorch that I don't quite understand.
To do this, I've been reading the API documentation here: https://pytorch.org/cppdocs/api/classtorch_1_1serialize_1_1_output_archive.html#class-documentation
which doesn't seem to be all that helpful on the matter.
I've been trying to utilise as much of LibTorch as possible here, but I suspect a JSON or similar storage structure might in fact be easier. I'm doing it this way (rather than using a .clone() or similar) as I intend to send the data at some point in the future.
I've simplified my code below:
torch::serialize::OutputArchive
NewModule::getArchive()
{
    torch::serialize::OutputArchive archive;
    auto params = named_parameters(true /*recurse*/);
    auto buffers = named_buffers(true /*recurse*/);
    for (const auto& val : params)
    {
        if (!is_empty(val.value()))
            archive.write(val.key(), val.value());
    }
    // Same again with a loop writing the buffers.
    return archive;
}
This function aims to copy the module's contents into a torch::serialize::OutputArchive, which can then be saved to disk, written to an ostream, or passed to a "writer function". It's the last of these I'm struggling to get working.
Torch specifies that the writer function must be of type std::function<size_t(const void*, size_t)>. I'm assuming (as the docs don't specify!) that the const void* is actually an array of bytes whose length is determined by the second parameter, size_t. I am unsure why the return value is also a size_t here.
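My best guess (the docs don't confirm this) is that it follows the usual write()-style contract: data points at a chunk of bytes, size is the chunk's length, and the return value reports how many bytes the callback consumed. Under that assumption, a writer that just accumulates everything would look like:
std::vector<char> blob;
std::function<size_t(const void*, size_t)> writer =
    [&blob](const void* data, size_t size) -> size_t {
        const char* bytes = static_cast<const char*>(data);
        blob.insert(blob.end(), bytes, bytes + size); // append this chunk
        return size; // report every byte as consumed
    };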
My next block of code takes this data blob and attempts to load it using torch::serialize::InputArchive. Calling load_from here produces the error: "PytorchStreamReader failed reading zip archive: failed finding central directory"
Can anyone help me work out why this is the case?
Code below:
void
NewModule::LoadFromData(const char* data, size_t data_size)
{
    torch::serialize::InputArchive archive;
    archive.load_from(data, data_size);
    auto params = named_parameters(true);
    auto buffers = named_buffers(true);
    for (auto& val : params)
    {
        archive.read(val.key(), val.value());
    }
    // Same again with a loop copying the buffers.
}
void
NewModule::copyArchive()
{
    NewModule other_module;
    auto archive = getArchive();
    std::function<size_t(const void*, size_t)> writer_lambda =
        [this, &other_module](const void* data, size_t size) -> size_t {
            other_module.LoadFromData(reinterpret_cast<const char*>(data), size);
            return size;
        };
    archive.save_to(writer_lambda);
}
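Edit: for completeness, one thing I've been experimenting with, on the assumption that save_to may invoke the writer several times, each call carrying only a fragment of the zip container (which would explain why load_from can't find the central directory in any single chunk): accumulate every fragment first and load once at the end. A sketch, untested:
std::string blob;
archive.save_to([&blob](const void* data, size_t size) -> size_t {
    blob.append(static_cast<const char*>(data), size); // collect this fragment
    return size;
});
other_module.LoadFromData(blob.data(), blob.size());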


Handling of const char* on ESP32

I'm working on making some Spotify API calls on an ESP32. I'm fairly new to C++, and while I seem to have got it working how I wanted, I would like to know if it is the right way/best practice or if I was just lucky. The whole thing with chars and pointers is still quite confusing to me, no matter how much I read about it.
I'm calling the Spotify API, getting a JSON response, and parsing it with the ArduinoJson library. The library returns all keys and values as const char*.
The library I use to display it on a screen takes const char* as well. I got it working before by converting it to String, returning the String from the getTitle() function, and converting it back to display it on screen. After reading that Strings are inefficient and best avoided, I'm trying to cut out the conversion steps.
void getTitle()
{
    // I cut out the HTTP request and stuff
    DynamicJsonDocument doc(1024);
    DeserializationError error = deserializeJson(doc, http.getStream());
    JsonObject item = doc["item"];
    title = item["name"]; // This is a const char*
}
const char* title = nullptr;
void loop(void) {
    getTitle();
    u8g2.clearBuffer();
    u8g2.setDrawColor(1);
    u8g2.setFont(u8g2_font_6x12_tf);
    u8g2.drawStr(1, 10, title);
    u8g2.sendBuffer();
}
Is it okay to do it like that?
This is not fine.
When seeing something like this, you should immediately become suspicious.
This is because in getTitle, you are asking a local object (item) for a pointer, but you use the pointer later, when the item object no longer exists.
That means your pointer might be meaningless once you need it: it might no longer reference your data, but some arbitrary other bytes instead (or even lead to crashes).
This problem is independent of what exact library you use, and you can often find relevant, more specific information by searching your library documentation for "lifetime" or "object ownership".
FIX
Make sure that item (and also the DynamicJsonDocument, because the documentation tells you so!) both still exist when you use the data, e.g. like this:
void setTitle(const char *title)
{
    u8g2.clearBuffer();
    u8g2.setDrawColor(1);
    u8g2.setFont(u8g2_font_6x12_tf);
    u8g2.drawStr(1, 10, title);
    u8g2.sendBuffer();
}
void updateTitle()
{
    DynamicJsonDocument doc(1024);
    DeserializationError error = deserializeJson(doc, http.getStream());
    JsonObject item = doc["item"];
    setTitle(item["name"]);
}
See also: https://arduinojson.org/v6/how-to/reuse-a-json-document/#the-best-way-to-use-arduinojson
Edit: If you want to keep parsing/display update decoupled
You could keep the JSON document "alive" for when the parsed data is needed:
/* "static" visibility, so that other c/cpp files ("translation units") can't
* mess mess with our JSON doc directly
*/
static DynamicJsonDocument doc(1024);
static const char *title;
void parseJson()
{
[...]
// super important to avoid leaking memory!!
doc.clear();
DeserializationError error = deserializeJson(doc, http.getStream(), );
// TODO: robustness/error handling (e.g. inbound JSON is missing "item")
title = doc["item"]["name"];
}
// may be nullptr when called before valid JSON was parsed
const char* getTitle()
{
    return title;
}
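A hypothetical loop() using the decoupled pieces above (the drawing calls mirror the question's code); the null check guards against drawing before the first successful parse:
void loop(void) {
    parseJson();
    const char *t = getTitle();
    if (t != nullptr) {
        u8g2.clearBuffer();
        u8g2.setDrawColor(1);
        u8g2.setFont(u8g2_font_6x12_tf);
        u8g2.drawStr(1, 10, t);
        u8g2.sendBuffer();
    }
}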

Size of encoded avro message without encoding it

Is there a way to get the size of an encoded Avro message without actually encoding it?
I'm using Avro 1.8.1 for C++.
I'm used to Google Protocol Buffers, where you can call ByteSize() on a protobuf to get the encoded size, so it's something similar I'm looking for.
Since the message in essence is a raw struct, I get that the size cannot be retrieved from the message itself, but perhaps there is a helper method that I'm not aware of?
There is no way around it, unfortunately.
Here is an example showing how the size can be calculated by encoding the object:
MyAvroStruct obj;
avro::EncoderPtr encoder = avro::binaryEncoder();
// chunk size of 1, so byteCount() reflects exactly the bytes written
std::auto_ptr<avro::OutputStream> out = avro::memoryOutputStream(1);
encoder->init(*out);
avro::encode(*encoder, obj);
out->flush();
uint32_t bufferSize = out->byteCount();
(Edit below shows a hacky way to shrink-to-fit an OutputStream after writing to it with a BinaryEncoder)
It's a shame that avro::encode() doesn't use backup on the OutputStream to free unused memory after encoding. Martin G's answer gives the best solution using only the tools avro provides, but it issues N memory allocations of 1 byte each if your serialized object is N bytes in size.
You could implement a custom avro::OutputStream that simply counts and discards all written bytes. This would get rid of the memory allocations. It's still not a great approach, as the actual encoder will have to "ask" for every single byte:
(Code untested, just for demonstration purposes)
#include <avro/Encoder.hh>
#include <avro/Stream.hh>
#include <cstdint>
class ByteCountOutputStream : public avro::OutputStream {
public:
    size_t byteCount_ = 0;
    uint8_t dummyWriteLocation_;
    ByteCountOutputStream() {}
    // hand the encoder a 1-byte scratch buffer and count the request
    bool next(uint8_t **data, size_t *len) final {
        byteCount_ += 1;
        *data = &dummyWriteLocation_;
        *len = 1;
        return true;
    }
    void backup(size_t len) final {
        byteCount_ -= len;
    }
    uint64_t byteCount() const final {
        return byteCount_;
    }
    void flush() final {}
};
This could then be used as:
MyAvroStruct obj;
avro::EncoderPtr encoder = avro::binaryEncoder();
ByteCountOutputStream out;
encoder->init(out);
avro::encode(*encoder, obj);
size_t bufferSize = out.byteCount();
Edit:
My initial question when stumbling upon this was: how can I tell how many bytes of the OutputStream are required (for storing / transmitting)? Or, equivalently, if OutputStream.byteCount() returns the count of bytes allocated by the encoder so far, how can I make the encoder "back up" / release the bytes it didn't use? Well, there is a hacky way:
The Encoder abstract class provides an init method. For the BinaryEncoder, this is currently implemented as:
void BinaryEncoder::init(OutputStream &os) {
    out_.reset(os);
}
with out_ being the internal StreamWriter of the Encoder.
Now, the StreamWriter implements reset as:
void reset(OutputStream &os) {
    if (out_ != nullptr && end_ != next_) {
        out_->backup(end_ - next_);
    }
    out_ = &os;
    next_ = end_;
}
which will return unused memory back to the "old" OutputStream before switching to the new one.
So, you can abuse the encoder's init method like this:
// setup as always
MyAvroStruct obj;
avro::EncoderPtr encoder = avro::binaryEncoder();
std::auto_ptr<avro::OutputStream> out = avro::memoryOutputStream();
// actual serialization
encoder->init(*out);
avro::encode(*encoder, obj);
// re-init on the same OutputStream. Happens to shrink the stream to fit
encoder->init(*out);
size_t bufferSize = out->byteCount();
However, this behavior is not documented, so it might break in the future.

Deleting pointer after async call

I'm trying to upload a file to Azure Storage, and I would like to leverage the async feature; however, I'm having a hard time knowing whether the cleanup is done correctly.
I would like to delete the file data and release all streams, but obviously that should only happen after the upload is complete.
Any comments on how I can improve this to make it more robust?
Concurrency::task<void> BlobService::UploadAsync(
    cloud_blob_container container, const wchar_t* blobName,
    uint8_t* data, size_t dataLength,
    const wchar_t* contentType, const wchar_t* cacheControl) {

    rawptr_buffer<uint8_t>* buffer = new rawptr_buffer<uint8_t>(data, dataLength);
    istream inputStream = buffer->create_istream();

    cloud_block_blob blob = container.get_block_blob_reference(utility::string_t(blobName));
    blob.properties().set_content_type(utility::string_t(contentType));
    blob.properties().set_cache_control(utility::string_t(cacheControl));

    azure::storage::blob_request_options options;
    options.set_maximum_execution_time(std::chrono::seconds(10800));
    options.set_server_timeout(std::chrono::seconds(10800));
    azure::storage::access_condition access;
    azure::storage::operation_context context;

    return blob.upload_from_stream_async(inputStream, access, options, context).then([buffer, inputStream, data] {
        buffer->close().wait();
        inputStream.close().wait();
        delete[] data;
        delete buffer;
    });
}
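Edit: one alternative I'm considering, which sidesteps the manual delete calls entirely, is to let a container_buffer own the bytes. This is a sketch only (untested), and it assumes the caller can hand over a std::vector instead of a raw pointer; the stream handles are ref-counted, so copying them into the continuation keeps the data alive until the upload finishes:
#include <cpprest/containerstream.h>

Concurrency::task<void> BlobService::UploadAsync(
    cloud_blob_container container, const wchar_t* blobName,
    std::vector<uint8_t> data, // take ownership by value
    const wchar_t* contentType, const wchar_t* cacheControl) {

    // the buffer owns the vector; no raw new/delete anywhere
    concurrency::streams::container_buffer<std::vector<uint8_t>> buffer(std::move(data));
    concurrency::streams::istream inputStream = buffer.create_istream();

    cloud_block_blob blob = container.get_block_blob_reference(utility::string_t(blobName));
    blob.properties().set_content_type(utility::string_t(contentType));
    blob.properties().set_cache_control(utility::string_t(cacheControl));

    return blob.upload_from_stream_async(inputStream).then([buffer, inputStream]() mutable {
        inputStream.close().wait(); // storage is released when the last handle goes away
    });
}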

std::async and lambda function in C++ gives no associated state

I'm trying to obtain better performance in my program by using async whenever it is convenient. My program compiles, but I get the following error every time I use a function containing async calls:
C++ exception with description "No associated state"
The way I am trying to call async with a lambda is as follows:
auto f = [this](const Cursor& c){ return this->getAbsIndex(c); };
auto nodeAbsIndex = std::async(f,node); // node is const Cursor&
auto otherAbsIndex = std::async(f,other); // other too
size_t from = std::min(nodeAbsIndex.get(), otherAbsIndex.get());
size_t to = std::max(nodeAbsIndex.get(), otherAbsIndex.get());
Signature of the function to call is as follows:
uint64_t getAbsIndex(const Cursor& c) const
What am I doing wrong here? Thanks for any hints!
Diego
You can't call get() twice on the same future. Read the documentation carefully (the part regarding valid()): http://en.cppreference.com/w/cpp/thread/future/get
On a side note, implicitly converting uint64_t to size_t is not good; the latter could be of a smaller size.
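For illustration, a minimal corrected version of the snippet from the question: each future's get() is called exactly once and the values are reused, and uint64_t is kept throughout to avoid the narrowing mentioned above.
auto f = [this](const Cursor& c){ return this->getAbsIndex(c); };
auto nodeFuture = std::async(f, node);
auto otherFuture = std::async(f, other);
const uint64_t nodeAbsIndex = nodeFuture.get();   // get() is a one-shot call
const uint64_t otherAbsIndex = otherFuture.get();
const uint64_t from = std::min(nodeAbsIndex, otherAbsIndex);
const uint64_t to = std::max(nodeAbsIndex, otherAbsIndex);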

How can I get more details about errors generated during protobuf parsing? (C++)

I am new to protobuf (C++) and my code fails during parsing of my messages. How can I get more details about the errors that occurred?
Example
The following snippet illustrates the problem:
const bool ok = my_message.ParseFromCodedStream(&stream);
if (ok) {
    std::cout << "message parsed. evidence:\n" << my_message.DebugString();
}
else {
    std::cerr << "error parsing protobuf\n";
    // HOW CAN I GET A REASON FOR THE FAILURE HERE?
}
If you look inside the protobuf code, you will find it uses its own logging system based on macros. By default all these messages go to stderr, but you can capture them in your program with SetLogHandler():
typedef void LogHandler(LogLevel level, const char* filename, int line,
                        const std::string& message);
A possible solution is to make your own errno-like mechanism (sorry for the C++11-ishness):
typedef std::tuple<LogLevel, std::string, int, std::string> LogMessage; // C++11
typedef std::list<LogMessage> LogStack;
namespace {
LogStack stack;
bool my_errno;
} // namespace
void MyLogHandler(LogLevel level, const char* filename, int line,
                  const std::string& message) {
    stack.emplace_back(level, filename, line, message); // C++11
    my_errno = true;
}
// install once, e.g. during startup (SetLogHandler lives in google::protobuf)
google::protobuf::SetLogHandler(MyLogHandler);
bool GetError(LogStack* my_stack) {
    if (my_errno && my_stack) {
        // Dump collected logs.
        my_stack->assign(stack.begin(), stack.end());
    }
    stack.clear();
    bool old_errno = my_errno;
    my_errno = false;
    return old_errno;
}
And use it in your code:
...
else {
    std::cerr << "error parsing protobuf" << std::endl;
    LogStack my_stack;
    if (GetError(&my_stack)) {
        // Handle your errors here.
    }
}
The main drawback of my sample code is that it doesn't work well with multiple threads, but that can be fixed on your own.
Sometimes error information will be printed to the console, but that's it. There's no way to get extra error info through the API.
That said, there are only two kinds of errors anyway:
A required field was missing. (Information should be printed to the console in this case.)
The data is corrupt: it was not generated by a valid protobuf implementation at all. It's not even a different type of protobuf; it's simply not a protobuf.
If you are seeing the latter case, you need to compare your data on the sending and receiving sides and figure out why it differs. Remember that the data you feed to the protobuf parser must not only be the same bytes, it must also end at the same place: the protobuf parser does not know where the message ends except by receiving EOF. This means that if you are writing multiple messages to a stream, you need to write the size before the data, and make sure to read only that many bytes on the receiving end before handing them to the protobuf parser.
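To make that concrete, here is a sketch of such length-prefixed framing using protobuf's coded streams (untested; the helper names WriteDelimited/ReadDelimited are mine, but the CodedOutputStream / CodedInputStream calls are the standard API):
#include <google/protobuf/io/coded_stream.h>
#include <google/protobuf/io/zero_copy_stream.h>
#include <google/protobuf/message_lite.h>

bool WriteDelimited(const google::protobuf::MessageLite& msg,
                    google::protobuf::io::ZeroCopyOutputStream* raw) {
    google::protobuf::io::CodedOutputStream out(raw);
    out.WriteVarint32(msg.ByteSize());       // size prefix first
    return msg.SerializeToCodedStream(&out); // then the payload
}

bool ReadDelimited(google::protobuf::io::ZeroCopyInputStream* raw,
                   google::protobuf::MessageLite* msg) {
    google::protobuf::io::CodedInputStream in(raw);
    uint32_t size;
    if (!in.ReadVarint32(&size)) return false;
    // Limit the parser to exactly `size` bytes so it knows where to stop.
    auto limit = in.PushLimit(size);
    if (!msg->ParseFromCodedStream(&in)) return false;
    if (!in.ConsumedEntireMessage()) return false;
    in.PopLimit(limit);
    return true;
}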