Read CSV from std::vector<unsigned char> using Apache Arrow - c++

I am trying to read CSV input using Apache Arrow. The CSV reader example in the Arrow documentation says the input should be an InputStream, but in my case I just have a std::vector of unsigned chars. Is it possible to parse this with Apache Arrow? I have checked the I/O interface for an "in-memory" data structure, with no luck.
For convenience, here is the example code, adapted to my input data:
#include "arrow/csv/api.h"
{
// ...
std::vector<unsigned char> data;
arrow::io::IOContext io_context = arrow::io::default_io_context();
// how can I fit the std::vector to the input stream?
std::shared_ptr<arrow::io::InputStream> input = ...;
auto read_options = arrow::csv::ReadOptions::Defaults();
auto parse_options = arrow::csv::ParseOptions::Defaults();
auto convert_options = arrow::csv::ConvertOptions::Defaults();
// Instantiate TableReader from input stream and options
auto maybe_reader =
arrow::csv::TableReader::Make(io_context,
input,
read_options,
parse_options,
convert_options);
if (!maybe_reader.ok()) {
// Handle TableReader instantiation error...
}
std::shared_ptr<arrow::csv::TableReader> reader = *maybe_reader;
// Read table from CSV file
auto maybe_table = reader->Read();
if (!maybe_table.ok()) {
// Handle CSV read error
// (for example a CSV syntax error or failed type conversion)
}
std::shared_ptr<arrow::Table> table = *maybe_table;
}
Any help would be appreciated!

The I/O interface docs list BufferReader, which works as an in-memory input stream. Although this isn't listed in the docs, it can be constructed from a pointer and a size, which should let you wrap your std::vector<unsigned char>. The reader does not copy the bytes, so the vector must outlive it.

Related

Using C++ protobuf formatted structure in leveldb. set/get operations

I'd like to build a POC that uses LevelDB to store a key-value table of different data types in protobuf format.
So far I have been able to open the database file, and I saw that the Get function has the following signature:
virtual Status Get(const ReadOptions& options, const Slice& key, std::string* value)=0
I understand that the value is actually a binary string (closer to a byte vector than to a regular alphanumeric string), so I guess it can hold primitives such as string, uint or enum. But how can it support a struct/class that represents a protobuf layout in C++?
So this is my proto file that I'd like to store in the leveldb:
syntax = "proto3";
import "google/protobuf/timestamp.proto";
message agentStatus {
string ip = 1;
uint32 port = 2;
string url = 3;
google.protobuf.Timestamp last_seen = 4;
google.protobuf.Timestamp last_keepalive = 5;
bool status = 6;
}
And this is my current POC code. How can I use the Get method to access any of the fields of the message above?
#include <leveldb/db.h>
#include <stdexcept>
#include <string>
int main() {
std::string db_file_path = "/tmp/data.db";
leveldb::DB* db = nullptr;
leveldb::Status status;
leveldb::Options options;
options.create_if_missing = false;
status = leveldb::DB::Open(options, db_file_path, &db);
if (!status.ok()) {
throw std::logic_error("unable to open db");
}
// How do I Get an agentStatus (or one of its fields) from here?
delete db;
}
Thanks!
You need to serialize the protobuf message into a binary string, e.g. with SerializeToString, and then use the Put method to write that binary string to LevelDB under a key.
You can then use the Get method to retrieve the binary value for that key and parse it back into a protobuf message, e.g. with ParseFromString.
Finally, you can read the fields of the parsed message through its generated accessors.

Open CSV file with Apache Arrow in C++

I'm trying to read a CSV file with Apache Arrow, but I can't get my head around the InputStream...
It seems the example in their documentation is out of date.
I've tweaked the example a bit, but I get an "Access violation reading location" exception. Any idea what I'm doing wrong?
Thanks
My code:
arrow::MemoryPool* pool = arrow::default_memory_pool();
std::shared_ptr<arrow::io::ReadableFile> infile;
infile->Open("test.csv", pool);
auto read_options = arrow::csv::ReadOptions::Defaults();
auto parse_options = arrow::csv::ParseOptions::Defaults();
auto convert_options = arrow::csv::ConvertOptions::Defaults();
// Instantiate TableReader from input stream and options
std::shared_ptr<arrow::csv::StreamingReader> reader;
auto res1 = reader->Make(pool, infile, read_options, parse_options, convert_options);
if (!res1.ok()) {
// Handle TableReader instantiation error...
}
std::shared_ptr<arrow::Table> table;
// Read table from CSV file
auto res2 = reader->ReadAll(&table);
if (!res2.ok()) {
// Handle CSV read error
// (for example a CSV syntax error or failed type conversion)
}

How to read data from rosbag2 in ros2

I am writing a program to read data directly from a rosbag in ROS 2, without playing it back. A sample code snippet is below. The intent is to check for a given ROS 2 topic and fetch only the messages on that topic. I am not able to get the data out of the bag: when I print it, the console shows hexadecimal values.
auto read_only_storage = factory.open_read_only(bag_file_path, storage_id);
while(read_only_storage->has_next())
{
auto msg = read_only_storage->read_next();
if(msg->topic_name == topic)
{
cout << msg->serialized_data<<endl;
}
}
Any help in this regard would be appreciated.
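You have to deserialize msg->serialized_data. Printing it with cout only shows the address held by the shared pointer (hence the hexadecimal values). The bytes it points to are the serialized message. If the bag uses the "cdr" serialization format, see the code below.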
// deserialization and conversion to a ROS message
// (`msg` is the bag message returned by read_only_storage->read_next() in the loop above)
my_pkg::msg::Msg ros_msg;
auto ros_message = std::make_shared<rosbag2_introspection_message_t>();
ros_message->time_stamp = 0;
ros_message->allocator = rcutils_get_default_allocator();
ros_message->message = &ros_msg;
auto type_support = rosbag2::get_typesupport("my_pkg/msg/Msg", "rosidl_typesupport_cpp");
rosbag2::SerializationFormatConverterFactory converter_factory;
std::unique_ptr<rosbag2::converter_interfaces::SerializationFormatDeserializer> cdr_deserializer_;
cdr_deserializer_ = converter_factory.load_deserializer("cdr");
cdr_deserializer_->deserialize(msg, type_support, ros_message);
// ros_msg now holds the decoded fields
Full code: https://github.com/Kyungpyo-Kim/ROS2BagFileParsing

Azure C++ library: "Invalid streambuf object"

I am trying to download a potentially huge Azure block blob, using the C++ Azure client library. It isn't working because I don't know how to initialize a concurrency::streams::streambuf object with a buffer size. My code looks like this:
// Assume blockBlob has been created correctly.
concurrency::streams::istream blobStream = blockBlob.open_read();
// I don't know how to initialize this streambuf:
concurrency::streams::streambuf<uint8_t> dlStreamBuf;
size_t nBytesReturned = 0, nBytesToRead = 65536;
do {
// This gets the exception "Invalid streambuf object":
concurrency::task<size_t> returnedTask = blobStream.read(dlStreamBuf, nBytesToRead);
nBytesReturned = returnedTask.get();
bytesSoFar += nBytesReturned;
// Process the data in dlStreamBuf here...
} while(nBytesReturned > 0);
blobStream.close();
Note that the above streambuf is not to be confused with a standard C++ streambuf.
Can anyone advise me on how to properly construct and initialize a concurrency::streams::streambuf?
Thanks.
A default-constructed concurrency::streams::streambuf is not attached to any underlying buffer, which is why read() throws "Invalid streambuf object". Use a concrete buffer type such as container_buffer instead:
size_t bytesSoFar = 0, nBytesReturned = 0, nBytesToRead = 65536;
do {
// A fresh container_buffer per chunk keeps memory bounded for large blobs
concurrency::streams::container_buffer<std::vector<uint8_t>> output_buffer;
concurrency::task<size_t> returnedTask = blobStream.read(output_buffer, nBytesToRead);
nBytesReturned = returnedTask.get();
bytesSoFar += nBytesReturned;
// Process the chunk in output_buffer.collection() here...
} while (nBytesReturned > 0);
blobStream.close().wait();
Sample code is here: https://github.com/Azure/azure-storage-cpp/blob/76cb553249ede1e6f05456d936c9a36753cc1597/Microsoft.WindowsAzure.Storage/tests/blob_streams_test.cpp#L192
I haven't used the stream methods in C++, but the C++ documentation describes how to download a blob to a stream or to a file.
Using download_to_stream:
// Retrieve storage account from connection string.
azure::storage::cloud_storage_account storage_account = azure::storage::cloud_storage_account::parse(storage_connection_string);
// Create the blob client.
azure::storage::cloud_blob_client blob_client = storage_account.create_cloud_blob_client();
// Retrieve a reference to a previously created container.
azure::storage::cloud_blob_container container = blob_client.get_container_reference(U("my-sample-container"));
// Retrieve reference to a blob named "my-blob-1".
azure::storage::cloud_block_blob blockBlob = container.get_block_blob_reference(U("my-blob-1"));
// Save blob contents to a file.
concurrency::streams::container_buffer<std::vector<uint8_t>> buffer;
concurrency::streams::ostream output_stream(buffer);
blockBlob.download_to_stream(output_stream);
std::ofstream outfile("DownloadBlobFile.txt", std::ofstream::binary);
std::vector<unsigned char>& data = buffer.collection();
outfile.write(reinterpret_cast<const char*>(data.data()), data.size());
outfile.close();
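Alternatively, download_text returns the blob contents as a text string (the same docs also cover download_to_file, which writes straight to a local file):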
// Retrieve storage account from connection string.
azure::storage::cloud_storage_account storage_account = azure::storage::cloud_storage_account::parse(storage_connection_string);
// Create the blob client.
azure::storage::cloud_blob_client blob_client = storage_account.create_cloud_blob_client();
// Retrieve a reference to a previously created container.
azure::storage::cloud_blob_container container = blob_client.get_container_reference(U("my-sample-container"));
// Retrieve reference to a blob named "my-blob-2".
azure::storage::cloud_block_blob text_blob = container.get_block_blob_reference(U("my-blob-2"));
// Download the contents of a blob as a text string.
utility::string_t text = text_blob.download_text();

How to create new GMimeMessage from string?

In my project I use libgmime for MIME types. I'm trying to create a new GMimeMessage using a std::string as the body.
According to docs it can be done using GMimeStream and GMimeDataWrapper for preparing data, and then creating GMimePart from this data to be set as MIME part of new message.
The code:
std::string body = "some test data";
GMimeMessage* message = g_mime_message_new(FALSE);
//set header
g_mime_object_set_header((GMimeObject *) message, name.c_str(), value.c_str());
//create stream and write data into it.
GMimeStream* stream;
g_mime_stream_construct(stream, 0, body.length());
g_mime_stream_write_string(stream, body.c_str());
GMimeDataWrapper* wrapper = g_mime_data_wrapper_new_with_stream(stream, GMIME_CONTENT_ENCODING_DEFAULT);
//create GMimePart to be set as mime part of GMimeMessage
GMimePart* mime_part = g_mime_part_new();
g_mime_part_set_content_object(mime_part, wrapper);
g_mime_message_set_mime_part(message, (GMimeObject *) mime_part);
When I try to create the message this way, I get a segfault here:
g_mime_stream_write_string(stream, body.c_str());
Maybe I'm using the wrong method to create the message...
What's the right way to do it?
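The segfault comes from GMimeStream *stream never being allocated. It is an uninitialized pointer, and g_mime_stream_construct() only initializes an already-allocated stream object (concrete stream types call it internally). Create a concrete in-memory stream from the body instead. It already contains the body, so the g_mime_stream_construct() and g_mime_stream_write_string() calls can be dropped:
GMimeStream *stream;
/* initialize GMime (once, before any other GMime call) */
g_mime_init (0);
/* create an in-memory stream holding a copy of the body */
stream = g_mime_stream_mem_new_with_buffer(body.c_str(), body.length());
See the tutorial: http://spruce.sourceforge.net/gmime/tutorial/x49.html
And a sample: http://fossies.org/linux/gmime/examples/basic-example.c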