C++: iterate an Avro schema and map it to key/value pairs (name, type)

I'm learning Avro and C++ (both together :) ) and what I'm trying to do is:
Load a schema.
Map the schema fields to key/value pairs of name and type.
Then iterate the Avro data according to that mapping.
From what I've found and done so far, I extract the schema from the Avro data file and handle it as a GenericDatum.
When I try to iterate its fields, I get the name fine, but the field type comes back null, where I would expect the actual type. Any help would be appreciated.
My code:
const char *avroFilePathCstr = avroFilePath.c_str();
avro::DataFileReader<avro::GenericDatum> reader(avroFilePathCstr);
auto dataSchema = reader.dataSchema();
avro::GenericDatum datum(dataSchema);
ProcessAvroSchema(dataSchema);
void ProcessAvroSchema(avro::GenericDatum schema) {
    const avro::GenericRecord& schemaRecord = schema.value<avro::GenericRecord>();
    for (unsigned int i = 0; i < schemaRecord.fieldCount(); i++) {
        avro::GenericDatum fieldDatum = schemaRecord.fieldAt(i);
        cout << "SCHEMA:: fieldName: " << schemaRecord.schema()->nameAt(i)
             << " , fieldType: " << fieldDatum.type() << "\n";
    }
}
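One way to get the declared field types is to walk the schema tree itself rather than a datum built from it. A minimal sketch, assuming the data schema's root is a record (the NodePtr accessors below come from avro/Node.hh; MapSchemaFields is a hypothetical helper name):

#include <avro/Node.hh>
#include <avro/Types.hh>
#include <avro/ValidSchema.hh>
#include <map>
#include <string>

// Walk the root record of the schema and record each field's declared type.
std::map<std::string, avro::Type> MapSchemaFields(const avro::ValidSchema& schema)
{
    std::map<std::string, avro::Type> fields;
    const avro::NodePtr& root = schema.root();   // top of the schema tree
    if (root->type() == avro::AVRO_RECORD) {
        for (size_t i = 0; i < root->leaves(); i++) {
            // nameAt(i) is the field name, leafAt(i) its schema node
            fields[root->nameAt(i)] = root->leafAt(i)->type();
        }
    }
    return fields;
}

The returned map could then drive the iteration over the data records, e.g. MapSchemaFields(reader.dataSchema()).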

Related

GzipOutputStream fails to serialise to a string using ProtoBuf

I'm trying to incorporate Protocol buffers in my project. I have created the following very simple schema:
syntax = "proto3";
message Document{
string title = 1;
int64 size = 2;
int64 data = 3;
}
Then in my C++ code (after compiling with the protoc compiler) I use it as:
Document document;
document.set_title(random_string(20));
document.set_data(300);
document.set_size(500);
std::string docString = document.SerializeAsString();
std::string compressedString;
google::protobuf::io::StringOutputStream stream(&compressedString);
google::protobuf::io::GzipOutputStream gStream(&stream);
document.SerializeToZeroCopyStream(&gStream);
std::cout << docString.size() << std::endl;
std::cout << compressedString.size() << std::endl;
The output of the code above is:
28 0
Thus the compressed string is empty, while the normal string has size 28. What is the correct way to use GzipOutputStream to compress the serialised data of a protocol buffer?
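A likely explanation, offered as a sketch: GzipOutputStream buffers its compressed output, so nothing reaches the underlying StringOutputStream until the gzip stream is flushed or closed. Calling Close() before inspecting the string should make compressedString non-empty:

#include <google/protobuf/io/gzip_stream.h>
#include <google/protobuf/io/zero_copy_stream_impl_lite.h>
#include <iostream>
#include <string>

std::string compressedString;
{
    google::protobuf::io::StringOutputStream stream(&compressedString);
    google::protobuf::io::GzipOutputStream gStream(&stream);
    document.SerializeToZeroCopyStream(&gStream);
    gStream.Close();   // flushes buffered data and the gzip trailer into compressedString
}
std::cout << compressedString.size() << std::endl;   // now non-zero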

How to write mongocxx (v3) query with Date data type?

I have a collection from which I need to extract only the most recent data, based on the field updateDate (data type Date).
I use the C++ driver, mongocxx (v3).
mongocxx::cursor cursor = collection.find(
    document{} << "updateDate" << open_document
               << "$gt" << (date parameter)
               << close_document << finalize);
for (auto doc : cursor) {
    std::cout << bsoncxx::to_json(doc) << "\n";
}
How can I write my query? For instance, what format and data type do I need to pass as the date?
Thank you.
You can do something like this: replace (date parameter) with bsoncxx::types::b_date{<value>}
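Put together, the query might look like the sketch below. The 24-hour cutoff is a made-up example value, and it assumes a bsoncxx version whose b_date constructor accepts a std::chrono::system_clock::time_point (older versions took milliseconds since the epoch):

#include <bsoncxx/builder/stream/document.hpp>
#include <bsoncxx/json.hpp>
#include <bsoncxx/types.hpp>
#include <chrono>
#include <iostream>

using bsoncxx::builder::stream::document;
using bsoncxx::builder::stream::open_document;
using bsoncxx::builder::stream::close_document;
using bsoncxx::builder::stream::finalize;

// Hypothetical cutoff: everything updated within the last 24 hours.
auto cutoff = std::chrono::system_clock::now() - std::chrono::hours(24);

mongocxx::cursor cursor = collection.find(
    document{} << "updateDate" << open_document
               << "$gt" << bsoncxx::types::b_date{cutoff}
               << close_document << finalize);

for (auto doc : cursor) {
    std::cout << bsoncxx::to_json(doc) << "\n";
}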

JSON serialize C++

I have this C++ code and am having trouble JSON-serializing it.
string uInput;
string const& retInput;
while (!std::cin.eof()) {
    getline(cin, uInput);
    JSONExample source;   // JSON-enabled class from jsonserialize.h
    source.text = uInput;
    // create JSON from producer
    std::string json = JSON::producer<JSONExample>::convert(source);   // returns {"JSONExample":{"text":"hi"}}
    // then create a new instance from a consumer...
    JSONExample sink = JSON::consumer<JSONExample>::convert(json);
    // retInput = serialize(sink);
    // Json::FastWriter fastWriter;
    // retInput = fastWriter.write(uInput);
    retInput = static_cast<string const&>(uInput);
    pubnub::futres fr_2 = pb_2.publish(chan, retInput);
    cout << "user input as json which should be published is " << retInput << std::endl;

while (!cin.eof()) {
    getline(cin, uInput);
    newInput = "\"\\\"";
    newInput += uInput;
    newInput += "\\\"\"";
Instead of the user typing the message as "\"hi\"", this code takes "hi" and adds the quoting itself.
If the change you described made the "Invalid JSON" error disappear, then a "more correct" solution would be, AFAICT, to change the publish() line to:
pubnub::futres fr_2 = pb_2.publish(chan, json);
because json already holds the JSON-serialized data. Of course, that applies only if that JSON is what you want to publish.
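As a sketch of the whole loop with that change applied, reusing the question's own JSONExample/pubnub types (note that the original string const& retInput; would not even compile, since a reference must be initialized; it is simply dropped here, and the while(!eof()) pattern is replaced by a getline loop):

#include <iostream>
#include <string>

std::string uInput;
while (getline(std::cin, uInput)) {
    JSONExample source;                 // JSON-enabled class from jsonserialize.h
    source.text = uInput;
    // e.g. "hi" -> {"JSONExample":{"text":"hi"}}
    std::string json = JSON::producer<JSONExample>::convert(source);
    pubnub::futres fr_2 = pb_2.publish(chan, json);   // publish the JSON itself
    std::cout << "published: " << json << std::endl;
}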

Cassandra cppdriver Query String Buffer Overflow?

I have been writing a wrapper around the Cassandra cppdriver for CQL 3.0, and I have come across some odd behavior; I am not sure whether it is typical or a bug.
For reference, I am working with the cppdriver code release from 4 September (from the repository), libuv 0.10, and the songs/playlist example posted on the DataStax website (http://www.datastax.com/documentation/cql/3.1/cql/ddl/ddl_music_service_c.html).
The problem I am having is with executing query strings. There seems to be some threshold of characters after which the query string being sent to Cassandra becomes garbage. The code I am using to construct and send the string to the cppdriver library (and parse the results) is provided below. I added a function (cass_session_print_query) to the cassandra.h and session.cpp files to print out the generated statement.
map<string, vector<string> > retresults;
int i = 0, ccount;
stringstream ss;
vector<string> keys = get.GetList();
vector<string>::iterator kit = keys.begin();
map<int, pair<string, string> > primkeys = get.GetMap();
map<int, pair<string, string> >::iterator mit = primkeys.begin();
if (!keys.empty())
{
ss << "SELECT " << (*kit);
++kit;
for ( ; kit != keys.end(); ++kit)
ss << "," << (*kit);
ss << " FROM " << tablename;
if (!primkeys.empty())
{
ss << " WHERE ";
ss << mit->second.first << " = ?";
++mit;
for ( ; mit != primkeys.end(); ++mit)
ss << " and " << mit->second.first << " = ?";
mit = primkeys.begin();
}
ss << ";";
cass_bool_t has_more_pages = cass_false;
const CassResult* result = NULL;
CassString query = cass_string_init(ss.str().c_str());
CassStatement* statement = cass_statement_new(query, primkeys.size());
for ( ; mit != primkeys.end(); ++mit)
cass_statement_bind_string(statement, i++, cass_string_init(mit->second.second.c_str()));
cass_statement_set_paging_size(statement, 100);
do
{
cass_session_print_query(statement);
CassIterator* iterator;
CassFuture* future = cass_session_execute(session_, statement);
if (cass_future_error_code(future) != 0)
{
CassString message = cass_future_error_message(future);
fprintf(stderr, "Error: %.*s\n", (int)message.length, message.data);
break;
}
result = cass_future_get_result(future);
ccount = cass_result_column_count(result);
vector<string> cnames;
for (i = 0; i < ccount; i++)
cnames.push_back(cass_result_column_name(result, i).data);
iterator = cass_iterator_from_result(result);
ListVector::iterator vit;
while (cass_iterator_next(iterator))
{
const CassRow* row = cass_iterator_get_row(iterator);
for (vit = cnames.begin(); vit != cnames.end(); ++vit)
{
CassString value;
char value_buffer[256];
cass_value_get_string(cass_row_get_column_by_name(row, (*vit).c_str()), &value);
if (value.length == 0 || value.data == NULL)
continue;
memcpy(value_buffer, value.data, value.length);
value_buffer[value.length] = '\0';
retresults[(*vit)].push_back(value_buffer);
}
}
has_more_pages = cass_result_has_more_pages(result);
if (has_more_pages)
cass_statement_set_paging_state(statement, result);
cass_iterator_free(iterator);
cass_result_free(result);
} while (has_more_pages);
}
return retresults;
With this, an initial query string of SELECT id,album,title,artist,data FROM songs; results in a Cassandra query string of SELECT id,album,title,artist,data FROM songs;. However, if I add one more column to the SELECT portion, SELECT id,album,title,artist,data,tags FROM songs;, the query string in the Cassandra cppdriver library becomes something like: ,ar????,dat?? jOM songX. This results in the following error from Cassandra / the library: Error: line 1:49 no viable alternative at character '?'.
I have also tried fewer columns, but with a WHERE clause, and that results in the same problem.
Is this a bug? Or am I building and sending strings to the cppdriver library incorrectly?
You should cass_future_wait() on the execute future before testing the error code.
Unrelated: there are also a couple of things that should be freed (future, statement), but I'm assuming that was omitted to keep this concise.
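Applied to the loop above, the execute block would become something like the sketch below (cass_future_free added per the note about freeing; CASS_OK is the driver's success code):

CassFuture* future = cass_session_execute(session_, statement);
cass_future_wait(future);   /* block until the server has responded */
if (cass_future_error_code(future) != CASS_OK) {
    CassString message = cass_future_error_message(future);
    fprintf(stderr, "Error: %.*s\n", (int)message.length, message.data);
    cass_future_free(future);
    break;
}
result = cass_future_get_result(future);
cass_future_free(future);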
So, it looks like (for whatever reason) I HAVE to parse out the row key from the results. I checked the example, and there I was able to leave out the row-key parsing and everything still worked. I am not yet entirely sure what is forcing me to do it here (compared to the provided paging example), but for others: you need to include the following within the while (cass_iterator_next(iterator)) block to "magically" fix my code above.
CassUuid key;
char key_buffer[CASS_UUID_STRING_LENGTH];
const CassRow* row = cass_iterator_get_row(iterator);
cass_value_get_uuid(cass_row_get_column(row, 0), key);
cass_uuid_string(key, key_buffer);
This is really a long shot, but since you mentioned the Music Service example: did you possibly download and use the cql_collections.zip query strings? If so, the strings (now fixed) had minor syntax errors:
-use music
-CREATE TABLE music.songs ( id uuid PRIMARY KEY, album text, artist text, data blob, reviews list<text>, tags set<text>, title text, venue map<timestamp, text>
+use music;
+CREATE TABLE music.songs ( id uuid PRIMARY KEY, album text, artist text, data blob, reviews list<text>, tags set<text>, title text, venue map<timestamp, text>);
AeroBuffalo's code worked for me, except that I had to put '&' in front of the second parameter of the cass_value_get_uuid() function, which expects a pointer to the output value:
cass_value_get_uuid(cass_row_get_column(row, 0), &key);

PQexecParams in C++, query error

I'm using libpq with PostgreSQL version 9.1.11.
I have the following code:
const char *spid = std::to_string(pid).c_str();
PGresult *res;
const char *paramValues[2] = {u->getID().c_str(), spid};
std::string table;
table = table.append("public.\"").append(Constants::USER_PATTERNS_TABLE).append("\"");
std::string param_name_pid = Constants::RELATION_TABLE_PATTERN_ID;
std::string param_name_uid = Constants::RELATION_TABLE_USER_ID;
std::string command = Constants::INSERT_COMMAND + table + " (" + param_name_uid + ", " + param_name_pid + ") VALUES ($1, $2::int)";
std::cout << "command: " << command << std::endl;
res = PQexecParams(conn, command.c_str(), 2, NULL, paramValues, NULL, NULL,0);
Where
INSERT_COMMAND = "INSERT INTO " (string)
USER_PATTERNS_TABLE = "User_Patterns" (string)
RELATION_TABLE_PATTERN_ID = "pattern_id" (string)
RELATION_TABLE_USER_ID = "user_id" (string)
pid = an int
u->getID() = a string
conn = the connection to the db
The table "User_Patterns" is defined as
CREATE TABLE "User_Patterns"(
user_id TEXT references public."User" (id) ON UPDATE CASCADE ON DELETE CASCADE
,pattern_id BIGSERIAL references public."Pattern" (id) ON UPDATE CASCADE
,CONSTRAINT user_patterns_pkey PRIMARY KEY (user_id,pattern_id) -- explicit pk
)WITH (
OIDS=FALSE
);
I already have a user and a pattern loaded into their respective tables.
The command generated is :
INSERT INTO public."User_Patterns" (user_id, pattern_id) VALUES ($1, $2::int)
I also tried with $2, $2::bigint, $2::int4
The problem is that I receive the error:
ERROR: invalid input syntax for integer: "public.""
I already use PQexecParams to store users and patterns; the only difference is that those all have text/xml fields (the only int field on patterns is a serial one, and I don't store that value myself). Because user_patterns is a relation table, I need to store an int for pattern_id.
I have already read the libpq docs and looked at the examples; neither helped.
The problem is in the lines:
const char *spid = std::to_string(pid).c_str();
const char *paramValues[2] = {u->getID().c_str(), spid};
std::to_string(pid) creates a temporary string, and .c_str() returns a pointer into that string's internal buffer; the temporary is destroyed at the end of the full expression, leaving a dangling pointer. You may also want to see the answer to the question
stringstream::str copy lifetime
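A minimal sketch of the fix is to keep the std::string objects alive for as long as paramValues is in use, and only then take their c_str() pointers:

// Keep the strings alive; their c_str() pointers stay valid as long as
// the strings themselves are neither modified nor destroyed.
std::string spid = std::to_string(pid);
std::string uid  = u->getID();   // also avoids trouble if getID() returns by value
const char *paramValues[2] = {uid.c_str(), spid.c_str()};

res = PQexecParams(conn, command.c_str(), 2, NULL, paramValues, NULL, NULL, 0);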