Apache arrow - read serialized VectorSchemaRoot to C++

I have a Java library that serializes an Arrow VectorSchemaRoot to an in-memory byte stream. Those serialized bytes are available to me in a std::string object in C++. How do I deserialize and read the data?
Java:
try (final ArrowStreamWriter arrowStreamWriter
= new ArrowStreamWriter(vectorSchemaRoot, provider, outputStream)) {
arrowStreamWriter.start();
arrowStreamWriter.writeBatch();
arrowStreamWriter.end();
return buffer.byteArray();
}
C++
std::string bytes;
???

Assuming you've written a RecordBatch, I think you can read it back this way:
#include <iostream>
#include <arrow/api.h>
#include <arrow/ipc/reader.h>
#include <arrow/io/memory.h>
// ...
std::shared_ptr<arrow::io::BufferReader> bufferReader = std::make_shared<arrow::io::BufferReader>(bytes);
std::shared_ptr<arrow::ipc::RecordBatchStreamReader> reader = arrow::ipc::RecordBatchStreamReader::Open(bufferReader.get()).ValueOrDie();
std::shared_ptr<arrow::RecordBatch> recordBatchBack = reader->Next().ValueOrDie();
std::cout << recordBatchBack->num_rows() << std::endl;
Here's an end-to-end test in C++:
#include <arrow/api.h>
#include <arrow/ipc/writer.h>
#include <arrow/ipc/reader.h>
#include <arrow/io/memory.h>
BOOST_AUTO_TEST_CASE(RecordBatchStreamReaderTest) {
arrow::Int32Builder builder;
builder.Append(1);
builder.Append(2);
builder.Append(3);
auto schema = arrow::schema({arrow::field("hello", arrow::int32())});
auto structArray = arrow::StructArray::Make({builder.Finish().ValueOrDie()}, {"hello"}).ValueOrDie();
auto recordBatch = arrow::RecordBatch::FromStructArray(structArray).ValueOrDie();
auto outputStream = arrow::io::BufferOutputStream::Create().ValueOrDie();
auto writer = arrow::ipc::MakeStreamWriter(outputStream.get(), schema).ValueOrDie();
writer->WriteRecordBatch(*recordBatch);
writer->Close();
auto buffer = outputStream->Finish().ValueOrDie();
std::string bytes = buffer->ToString();
std::shared_ptr<arrow::io::BufferReader> bufferReader = std::make_shared<arrow::io::BufferReader>(bytes);
std::shared_ptr<arrow::ipc::RecordBatchStreamReader> reader = arrow::ipc::RecordBatchStreamReader::Open(bufferReader.get()).ValueOrDie();
std::shared_ptr<arrow::RecordBatch> recordBatchBack = reader->Next().ValueOrDie();
std::cout << recordBatchBack->num_rows() << std::endl;
}

Related

C++ double free or corruption (out)

Here is my code:
#include "Accounts.h"
using namespace Vibranium;
void Accounts::LoadTable(RowResult &res) {
std::vector<AccountsStruct> accounts;
AccountsStruct accountsStruct;
for (Row row : res.fetchAll()){
accountsStruct.id = row[0].get<int>();
accountsStruct.email = row[1].get<std::string>();
accountsStruct.warTag = row[2].get<std::string>();
accountsStruct.state = row[4].get<int>();
accountsStruct.name = row[5].get<std::string>();
accountsStruct.lastname = row[6].get<std::string>();
accountsStruct.country = row[7].get<std::string>();
accountsStruct.dob_month = row[8].get<int>();
accountsStruct.dob_day = row[9].get<int>();
accountsStruct.dob_year = row[10].get<int>();
accountsStruct.balance = row[11].get<double>();
accountsStruct.created_at = row[12].get<std::string>();
accountsStruct.updated_at = row[13].get<std::string>();
accountsStruct.account_role = row[15].get<int>();
accountsStruct.rank = row[16].get<int>();
accountsStruct.playerRole = row[17].get<int>();
Data.emplace_back(&accountsStruct);
}
std::cout << "SIZE: " << Data.size() << std::endl;
}
Data is std::vector<std::unique_ptr<DataStruct>> Data;.
To add into the vector I call Data.emplace_back(&accountsStruct); which leads me to the following output:
SIZE: 2
double free or corruption (out)
Process finished with exit code 134 (interrupted by signal 6: SIGABRT)
I am sure this line Data.emplace_back(&accountsStruct); is causing the issue. Why? How can I fix it?

You're trying to free memory not allocated with new (stack memory, to be precise).
std::vector<std::unique_ptr<DataStruct>> Data;
AccountsStruct accountsStruct; // <-- a stack variable
Data.emplace_back(&accountsStruct); // <-- an instance of unique_ptr is created using the address of accountsStruct
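So when Data is destroyed, each unique_ptr calls delete on that pointer (not good!!). Every element also holds the same stack address, so delete runs on it once per element, hence the double free.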
I can think of 2 possible solutions:
Allocate accountsStruct on the heap using std::make_unique:
for (auto& row : res.fetchAll()) {
Data.emplace_back(std::make_unique<AccountsStruct>()); // allocate a new instance on the heap
auto& accountsStruct = static_cast<AccountsStruct&>(*Data.back()); // Data holds DataStruct pointers, so cast back to the concrete type
accountsStruct.id = row[0].get<int>(); // fill it normally ...
accountsStruct.email = row[1].get<std::string>();
accountsStruct.warTag = row[2].get<std::string>();
. . .
Simplify Data to store by-value: std::vector<AccountsStruct> Data; (storing DataStruct by value would slice off the derived members)
for (auto& row : res.fetchAll()) {
Data.emplace_back(); // allocates a new instance of AccountsStruct in-place
AccountsStruct& accountsStruct = Data.back(); // get a reference to that instance
accountsStruct.id = row[0].get<int>(); // fill it normally ...
accountsStruct.email = row[1].get<std::string>();
accountsStruct.warTag = row[2].get<std::string>();
. . .
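The first option can be sketched without the database layer. This is a minimal illustration, not the asker's actual code: DataStruct and AccountsStruct below are hypothetical stand-ins with only two fields, and load_accounts is an invented helper that takes plain ids instead of a RowResult. It assumes AccountsStruct derives from DataStruct, which is why the base gets a virtual destructor.

```cpp
#include <memory>
#include <string>
#include <vector>

// Hypothetical stand-ins for the question's types.
struct DataStruct {
    // Virtual destructor so a derived object can be deleted through a base pointer.
    virtual ~DataStruct() = default;
};

struct AccountsStruct : DataStruct {
    int id = 0;
    std::string email;
};

// Builds one heap-allocated AccountsStruct per id; the returned vector owns all of them.
std::vector<std::unique_ptr<DataStruct>> load_accounts(const std::vector<int>& ids) {
    std::vector<std::unique_ptr<DataStruct>> data;
    for (int id : ids) {
        auto account = std::make_unique<AccountsStruct>();  // fresh object each iteration
        account->id = id;
        account->email = "user" + std::to_string(id) + "@example.com";
        data.push_back(std::move(account));  // ownership moves into the vector
    }
    return data;
}
```

Filling the object before moving it into the vector avoids the cast back from DataStruct to AccountsStruct, and each element now owns a distinct heap object, so every delete is valid.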

Data should contain std::unique_ptr<AccountsStruct> elements. A unique_ptr cannot take ownership of a stack-allocated AccountsStruct.
Instead, create the struct dynamically, fill it with data, create a unique_ptr from the pointer, and add it to the vector.

The problem in your code is that you provide an address of a local variable to the constructor of std::unique_ptr.
I assume that Data is std::vector<std::unique_ptr<AccountsStruct>>. If AccountsStruct has a default constructor, you can try this (C++17, where emplace_back returns a reference):
#include "Accounts.h"
using namespace Vibranium;
void Accounts::LoadTable(RowResult &res) {
for (Row row : res.fetchAll()){
// create a new heap-allocated instance in the vector for this row and get a reference to it
// auto will be std::unique_ptr<AccountsStruct>&
auto &accountsStruct = Data.emplace_back(std::make_unique<AccountsStruct>());
// work with that reference
accountsStruct->id = row[0].get<int>();
accountsStruct->email = row[1].get<std::string>();
accountsStruct->warTag = row[2].get<std::string>();
accountsStruct->state = row[4].get<int>();
accountsStruct->name = row[5].get<std::string>();
accountsStruct->lastname = row[6].get<std::string>();
accountsStruct->country = row[7].get<std::string>();
accountsStruct->dob_month = row[8].get<int>();
accountsStruct->dob_day = row[9].get<int>();
accountsStruct->dob_year = row[10].get<int>();
accountsStruct->balance = row[11].get<double>();
accountsStruct->created_at = row[12].get<std::string>();
accountsStruct->updated_at = row[13].get<std::string>();
accountsStruct->account_role = row[15].get<int>();
accountsStruct->rank = row[16].get<int>();
accountsStruct->playerRole = row[17].get<int>();
}
std::cout << "SIZE: " << Data.size() << std::endl;
}
If AccountsStruct is derived from DataStruct and Data stores std::unique_ptr<DataStruct>, you can use:
Data.emplace_back(std::make_unique<AccountsStruct>());

parse json data in c++ with Qt library

I have the following JSON data:
[
{
"id":"01323",
"name":"Json Roy",
"contacts":[
"CONTACT1=+917673267299",
"CONTACT2=+917673267292",
"CONTACT3=+917673267293",
"CONTACT4=+917673267294",
]
}
]
I want to parse the JSON data above and extract its contacts.
QJsonParseError jerror;
QJsonDocument jsonData = QJsonDocument::fromJson(jsonData.c_str(),&jerror);
QJsonArray jsonArray = jsonData.array();
QJsonObject jsonObject = jsonData.object();
foreach (const QJsonValue & value, jsonArray){
string contact=jsonObject["contacts"].toString().toUtf8().constData();
}
Can anybody suggest how I can accomplish this with the same library?

I removed the trailing comma in the contacts list, since JSON does not allow it.
Your mistake is using QJsonValue directly; QJsonValue is a wrapper, so you have to convert it to the appropriate type (array, object, string, etc.).
jsonData is not an object, so jsonData.object() doesn't give you what you want.
Here is the code; it could be a starting point for you.
#include <QString>
#include <QJsonDocument>
#include <QJsonObject>
#include <QJsonArray>
#include <QJsonValue>
#include <QJsonParseError>
#include <QDebug>
#include <string>
int main(){
auto json_input = R"([
{
"id":"01323",
"name":"Json Roy",
"contacts":[
"CONTACT1=+917673267299",
"CONTACT2=+917673267292",
"CONTACT3=+917673267293",
"CONTACT4=+917673267294"
]
}
])";
QJsonParseError err;
auto doc = QJsonDocument::fromJson( QString::fromStdString( json_input ).toUtf8() , &err ); // fromJson expects UTF-8
auto objects = doc.array();
if ( err.error != QJsonParseError::NoError )
{
qDebug() << err.errorString();
return 1;
}
for( auto obj_val : objects )
{
auto obj = obj_val.toObject();
auto contacts = obj.value( "contacts" ).toArray();
for ( auto contact_val : contacts )
{
auto contact_str = contact_val.toString();
qDebug() << contact_str;
}
}
}
Output :
CONTACT1=+917673267299
CONTACT2=+917673267292
CONTACT3=+917673267293
CONTACT4=+917673267294

Serializing a vector of objects with FlatBuffers

I have a vector of objects, let's call them Plumbuses, that I want to serialize with FlatBuffers. My schema for this example would be
namespace rpc;
struct Plumbus
{
dinglebopBatch:int;
fleeb:double;
}
table PlumbusesTable {
plumbuses:[Plumbus];
}
root_type PlumbusesTable;
since the root type can't be a vector. Calling flatc --cpp on this file generates plumbus_generated.h with functions such as CreatePlumbusesTableDirect.
The generated CreatePlumbusesTableDirect function expects an argument const std::vector<const Plumbus *> *plumbuses. My idea was to simply take the addresses of the objects in the vector pbs and store them in another vector, pbPtrs. Since the buffer is created and sent away before pbs goes out of scope, I thought this would not be a problem.
#include <vector>
#include <iostream>
#include "plumbus_generated.h"
void send_plumbus(std::vector<rpc::Plumbus> pbs) {
std::vector<const rpc::Plumbus *> pbPtrs;
pbPtrs.push_back(&(pbs[0]));
pbPtrs.push_back(&(pbs[1]));
flatbuffers::FlatBufferBuilder buf(1024);
auto msg = CreatePlumbusesTableDirect(buf, &pbPtrs);
buf.Finish(msg);
void *msg_buf = buf.GetBufferPointer();
// here, I'd normally send the data through a socket
const rpc::PlumbusesTable *pbt = rpc::GetPlumbusesTable(msg_buf);
auto *pbPtrs_ = pbt->plumbuses();
for (const auto pbPtr_ : *pbPtrs_) {
std::cout << "dinglebopBatch = " << pbPtr_->dinglebopBatch() << ", fleeb = " << pbPtr_->fleeb() << std::endl;
}
}
int main(int argc, char** argv) {
rpc::Plumbus pb1(1, 2.0);
rpc::Plumbus pb2(3, 4.0);
std::vector<rpc::Plumbus> pbs = { pb1, pb2 };
send_plumbus(pbs);
}
Running this, instead of 1, 2.0, 3, and 4.0, I get
$ ./example
dinglebopBatch = 13466704, fleeb = 6.65344e-317
dinglebopBatch = 0, fleeb = 5.14322e-321
Why does it go wrong?

This looks related to a bug that was recently fixed: https://github.com/google/flatbuffers/commit/fee9afd80b6358a63b92b6991d858604da524e2b
So either work with the most recent FlatBuffers, or use the version without Direct, CreatePlumbusesTable, and call CreateVectorOfStructs yourself.

How can I write a file with containing a lua table using sol2

I've settled on using lua as my config management for my programs after seeing posts like this and loving the syntax, and sol2 recently got released so I'm using that.
So my question is, how can I grab all the variables in my lua state and spit them out in a file?
say,
sol::state lua;
lua["foo"]["bar"] = 2;
lua["foo"]["foobar"] = lua.create_table();
would, in turn, eventually spit out
foo = {
bar = 2,
foobar = {}
}
Is this at all possible and if so, how?

I used the serpent serializer to serialize my table and print it out, really quite easy!
This is what I came up with (serpent has to be loaded into the Lua state first):
std::string save_table(const std::string& table_name, sol::state& lua)
{
auto table = lua["serpent"];
if (!table.valid()) {
throw std::runtime_error("Serpent not loaded!");
}
if (!lua[table_name].valid()) {
throw std::runtime_error(table_name + " doesn't exist!");
}
std::stringstream out;
out << table_name << " = ";
sol::function block = table["block"];
std::string cont = block(lua[table_name]);
out << cont;
return out.str(); // already a temporary, so std::move is unnecessary
}

How to train in Matlab a model, save it to disk, and load in C++ program?

I am using libsvm version 3.16. I have done some training in Matlab, and created a model. Now I would like to save this model to disk and load this model in my C++ program. So far I have found the following alternatives:
This answer explains how to save a model from C++, which is based on this website. Not exactly what I need, but could be adapted. (This requires development time).
I could find the best training parameters (kernel,C) in Matlab and re-train everything in C++. (Will require doing the training in C++ each time I change a parameter. It's not scalable).
Neither of these options is satisfactory.
Does anyone have an idea?

My solution was to retrain in C++ because I couldn't find a nice way to directly save the model. Here's my code. You'll need to adapt it and clean it up a bit. The biggest change you'll have to make is to stop hard-coding the svm_parameter values like I did. You'll also have to replace FilePath with std::string. I'm copying, pasting and making small edits here in SO, so the formatting won't be perfect:
Used like this:
auto targetsPath = FilePath("targets.txt");
auto observationsPath = FilePath("observations.txt");
auto targetsMat = MatlabMatrixFileReader::Read(targetsPath, ',');
auto observationsMat = MatlabMatrixFileReader::Read(observationsPath, ',');
auto v = MiscVector::ConvertVecOfVecToVec(targetsMat);
auto model = SupportVectorRegressionModel{ observationsMat, v };
std::vector<double> observation{ { // 32 feature observation
0.883575729725847,0.919446119013878,0.95359403450317,
0.968233630936732,0.91891307107125,0.887897763183844,
0.937588566544751,0.920582702918882,0.888864454119387,
0.890066735260163,0.87911085669864,0.903745573664995,
0.861069296586979,0.838606194934074,0.856376230548304,
0.863011311537075,0.807688936997926,0.740434984165146,
0.738498042748759,0.736410940165691,0.697228384912424,
0.608527698289016,0.632994967880269,0.66935784966765,
0.647761430696238,0.745961037635717,0.560761134660957,
0.545498063585615,0.590854855113663,0.486827902942118,
0.187128866890822,-0.0746523069562551
} };
double prediction = model.Predict(observation);
miscvector.h
static vector<double> ConvertVecOfVecToVec(const vector<vector<double>> &mat)
{
vector<double> targetsVec;
targetsVec.reserve(mat.size());
for (size_t i = 0; i < mat.size(); i++)
{
targetsVec.push_back(mat[i][0]);
}
return targetsVec;
}
libsvmtargetobjectconvertor.h
#pragma once
#include <cstdlib> // malloc
#include "machinelearning.h"
#include "svm.h" // libsvm: full definition of svm_node
class LibSvmTargetObservationConvertor
{
public:
svm_node ** ConvertObservations(const vector<MlObservation> &observations, size_t numFeatures) const
{
svm_node **svmObservations = (svm_node **)malloc(sizeof(svm_node *) * observations.size());
for (size_t rowI = 0; rowI < observations.size(); rowI++)
{
// one extra node for the index = -1 terminator written below
svm_node *row = (svm_node *)malloc(sizeof(svm_node) * (numFeatures + 1));
for (size_t colI = 0; colI < numFeatures; colI++)
{
row[colI].index = colI;
row[colI].value = observations[rowI][colI];
}
row[numFeatures].index = -1; // apparently needed
svmObservations[rowI] = row;
}
return svmObservations;
}
svm_node* ConvertMatToSvmNode(const MlObservation &observation) const
{
size_t numFeatures = observation.size();
// one extra node for the index = -1 terminator written below
svm_node *obsNode = (svm_node *)malloc(sizeof(svm_node) * (numFeatures + 1));
for (size_t rowI = 0; rowI < numFeatures; rowI++)
{
obsNode[rowI].index = rowI;
obsNode[rowI].value = observation[rowI];
}
obsNode[numFeatures].index = -1; // apparently needed
return obsNode;
}
};
machinelearning.h
#pragma once
#include <vector>
using std::vector;
using MlObservation = vector<double>;
using MlTarget = double;
machinelearningmodel.h
#pragma once
#include <vector>
#include "machinelearning.h"
class MachineLearningModel
{
public:
virtual ~MachineLearningModel() {}
virtual double Predict(const MlObservation &observation) const = 0;
};
matlabmatrixfilereader.h
#pragma once
#include <cstdlib> // atof
#include <fstream>
#include <sstream>
#include <string>
#include <vector>
using std::vector;
class FilePath;
// Matrix created with command:
// dlmwrite('my_matrix.txt', somematrix, 'delimiter', ',', 'precision', 15);
// In these files, each row is a matrix row. Commas separate elements on a row.
// There is no space at the end of a row. There is a blank line at the bottom of the file.
// File format:
// 0.4,0.7,0.8
// 0.9,0.3,0.5
// etc.
class MatlabMatrixFileReader
{
public:
static vector<vector<double>> Read(const FilePath &asciiFilePath, char delimiter)
{
vector<vector<double>> values;
vector<double> valueline;
std::ifstream fin(asciiFilePath.Path());
std::string item, line;
while (getline(fin, line))
{
std::istringstream in(line);
while (getline(in, item, delimiter))
{
valueline.push_back(atof(item.c_str()));
}
values.push_back(valueline);
valueline.clear();
}
fin.close();
return values;
}
};
supportvectorregressionmodel.h
#pragma once
#include <vector>
using std::vector;
#include "machinelearningmodel.h"
#include "libsvmtargetobjectconvertor.h"
#include "svm.h" // libsvm
class FilePath;
class SupportVectorRegressionModel : public MachineLearningModel
{
public:
~SupportVectorRegressionModel()
{
svm_free_model_content(model_);
svm_destroy_param(&param_);
svm_free_and_destroy_model(&model_);
}
SupportVectorRegressionModel(const vector<MlObservation>& observations, const vector<MlTarget>& targets)
{
// assumes all observations have same number of features
size_t numFeatures = observations[0].size();
//setup targets
//auto v = ConvertVecOfVecToVec(targetsMat);
double *targetsPtr = const_cast<double *>(&targets[0]); // why aren't the targets const?
LibSvmTargetObservationConvertor conv;
svm_node **observationsPtr = conv.ConvertObservations(observations, numFeatures);
// setup observations
//svm_node **observations = BuildObservations(observationsMat, numFeatures);
// setup problem
svm_problem problem;
problem.l = targets.size();
problem.y = targetsPtr;
problem.x = observationsPtr;
// specific to our training sets
// TODO: This is hard coded.
// Bust out these values for use in constructor
param_.C = 0.4; // cost
param_.svm_type = 4; // SVR
param_.kernel_type = 2; // radial
param_.nu = 0.6; // SVR nu
// These values are the defaults used in the Matlab version
// as found in svm_model_matlab.c
param_.gamma = 1.0 / (double)numFeatures;
param_.coef0 = 0;
param_.cache_size = 100; // in MB
param_.shrinking = 1;
param_.probability = 0;
param_.degree = 3;
param_.eps = 1e-3;
param_.p = 0.1;
param_.shrinking = 1;
param_.probability = 0;
param_.nr_weight = 0;
param_.weight_label = NULL;
param_.weight = NULL;
// suppress command line output
svm_set_print_string_function([](auto c) {});
model_ = svm_train(&problem, &param_);
}
double Predict(const vector<double>& observation) const
{
LibSvmTargetObservationConvertor conv;
svm_node *obsNode = conv.ConvertMatToSvmNode(observation);
double prediction = svm_predict(model_, obsNode);
free(obsNode); // allocated with malloc in ConvertMatToSvmNode
return prediction;
}
SupportVectorRegressionModel(const FilePath & modelFile)
{
model_ = svm_load_model(modelFile.Path().c_str());
}
private:
svm_model *model_ = nullptr;
svm_parameter param_{}; // zero-initialized so svm_destroy_param is safe when the model is loaded from a file
};
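The file reader above converts each field with atof, which quietly returns 0 for text it cannot parse. As a possible alternative, here is a small sketch of the per-line parsing step using std::stod, which throws std::invalid_argument when a field does not begin with a number. The helper name parse_row is made up for this illustration and is not part of the original answer.

```cpp
#include <sstream>
#include <stdexcept>
#include <string>
#include <vector>

// Parses one delimiter-separated line of numbers, e.g. "0.4,0.7,0.8".
// std::stod throws std::invalid_argument if a field does not begin with a number.
std::vector<double> parse_row(const std::string& line, char delimiter) {
    std::vector<double> values;
    std::istringstream in(line);
    std::string item;
    while (std::getline(in, item, delimiter)) {
        values.push_back(std::stod(item));
    }
    return values;
}
```

Calling this once per line read with std::getline would give the same vector of rows as MatlabMatrixFileReader::Read, with a malformed field surfacing as an exception rather than a silent zero.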

Option 1 is actually pretty reasonable. If you save the model in libsvm's C format through matlab, then it is straightforward to work with the model in C/C++ using functions provided by libsvm. Trying to work with matlab-formatted data in C++ will probably be much more difficult.
The main function in "svm-predict.c" (located in the root directory of the libsvm package) probably has most of what you need:
if((model=svm_load_model(argv[i+1]))==0)
{
fprintf(stderr,"can't open model file %s\n",argv[i+1]);
exit(1);
}
To predict a label for example x using the model, you can run
double predict_label = svm_predict(model,x); // svm_predict returns a double
The trickiest part of this will be to transfer your data into the libsvm format (unless your data is in the libsvm text file format, in which case you can just use the predict function in "svm-predict.c").
A libsvm vector, x, is an array of struct svm_node that represents a sparse array of data. Each svm_node has an index and a value, and the vector must be terminated by an index that is set to -1. For instance, to encode the vector [0,1,0,5], you could do the following:
struct svm_node *x = (struct svm_node *) malloc(3*sizeof(struct svm_node));
x[0].index=2; //NOTE: libsvm indices start at 1
x[0].value=1.0;
x[1].index=4;
x[1].value=5.0;
x[2].index=-1;
For SVM types other than the classifier (C_SVC), look at the predict function in "svm-predict.c".
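The sparse encoding described above can also be built from a dense vector in a loop. The following is only a sketch: the svm_node struct here is a local stand-in with the same two fields as libsvm's (real code would include "svm.h" instead), and to_sparse is a hypothetical helper name.

```cpp
#include <cstddef>
#include <vector>

// Stand-in with the same two fields as libsvm's svm_node;
// real code would include "svm.h" instead of defining this.
struct svm_node {
    int index;
    double value;
};

// Converts a dense feature vector to libsvm's sparse layout:
// zero entries are skipped, indices are 1-based, and a node with index -1 ends the array.
std::vector<svm_node> to_sparse(const std::vector<double>& dense) {
    std::vector<svm_node> nodes;
    nodes.reserve(dense.size() + 1);
    for (std::size_t i = 0; i < dense.size(); ++i) {
        if (dense[i] != 0.0) {
            nodes.push_back({static_cast<int>(i) + 1, dense[i]});
        }
    }
    nodes.push_back({-1, 0.0});  // terminator
    return nodes;
}
```

With the real header, the result of to_sparse could be handed to svm_predict(model, nodes.data()). Keeping the nodes in a std::vector means the memory is released automatically, with no matching free call needed.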
