from proto2 debugString to json - c++

I'm new to protocol buffers, and I'm having a hard time parsing the human-readable output from proto2 (DebugString function, ref here) into structured data.
At first I tried to do this recursively, and then I thought about using the DebugString() function. What I want is a simple structured output (JSON) that contains field names and their values, but proto2 doesn't have a JSON output or converter (correct me if I'm wrong), and I'm working in C++.
Is there any way to do this without handling it manually? Do you know of any library that does that?
thank you
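
For readers who want to see what the "manual" route involves, here is a minimal sketch that converts DebugString-style text into a JSON string by recursive descent. It is not part of the protobuf API, and every function name in it is made up for illustration. It assumes no repeated fields (they would produce duplicate JSON keys) and that quoted strings only contain escapes that are also valid in JSON; DebugString can emit octal escapes and \' which would need extra handling. Walking the message with protobuf's reflection interface, or using the JSON utilities that ship with newer protobuf releases, is likely more robust than re-parsing the text.

```cpp
#include <cctype>
#include <cstddef>
#include <stdexcept>
#include <string>

// Hypothetical helpers for illustration; none of these come from protobuf.

static void skipSpace(const std::string& s, std::size_t& i) {
    while (i < s.size() && std::isspace(static_cast<unsigned char>(s[i]))) ++i;
}

// Copies a double-quoted string verbatim, quotes and escapes included.
static std::string readQuoted(const std::string& s, std::size_t& i) {
    const std::size_t start = i++;          // step over the opening quote
    while (i < s.size() && s[i] != '"') {
        if (s[i] == '\\') ++i;              // keep the escaped character as-is
        ++i;
    }
    if (i >= s.size()) throw std::runtime_error("unterminated string");
    ++i;                                    // step over the closing quote
    return s.substr(start, i - start);
}

// Reads a field name or an unquoted value (number, bool, enum name).
static std::string readBare(const std::string& s, std::size_t& i) {
    const std::size_t start = i;
    while (i < s.size() && !std::isspace(static_cast<unsigned char>(s[i])) &&
           s[i] != ':' && s[i] != '{' && s[i] != '}') ++i;
    return s.substr(start, i - start);
}

// Numbers and booleans pass through; anything else (enum names) is quoted.
static std::string scalarToJson(const std::string& tok) {
    if (tok == "true" || tok == "false") return tok;
    const std::size_t d = (!tok.empty() && tok[0] == '-') ? 1 : 0;
    if (d < tok.size() && std::isdigit(static_cast<unsigned char>(tok[d]))) return tok;
    return "\"" + tok + "\"";
}

// Parses fields until end of input (top level) or a closing brace (nested).
std::string parseMessage(const std::string& s, std::size_t& i, bool nested) {
    std::string out = "{";
    bool first = true;
    for (;;) {
        skipSpace(s, i);
        if (i >= s.size()) {
            if (nested) throw std::runtime_error("missing '}'");
            break;
        }
        if (s[i] == '}') {
            if (!nested) throw std::runtime_error("unexpected '}'");
            ++i;
            break;
        }
        const std::string name = readBare(s, i);
        if (name.empty()) throw std::runtime_error("expected field name");
        skipSpace(s, i);
        if (i >= s.size()) throw std::runtime_error("expected ':' or '{'");
        std::string value;
        if (s[i] == ':') {                  // scalar field: name: value
            ++i;
            skipSpace(s, i);
            if (i >= s.size()) throw std::runtime_error("missing value");
            value = (s[i] == '"') ? readQuoted(s, i) : scalarToJson(readBare(s, i));
        } else if (s[i] == '{') {           // sub-message: name { ... }
            ++i;
            value = parseMessage(s, i, true);
        } else {
            throw std::runtime_error("expected ':' or '{'");
        }
        if (!first) out += ",";
        first = false;
        out += "\"" + name + "\":" + value;
    }
    return out + "}";
}

std::string debugStringToJson(const std::string& text) {
    std::size_t i = 0;
    return parseMessage(text, i, false);
}
```

Calling debugStringToJson(msg.DebugString()) on a message with a string field, an integer field and one nested message would yield a single-line JSON object with the same field names.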

Related

Real time parsing

I am quite new to parsing text files. While googling a bit, I found out that a parser usually builds a tree structure out of a text file. Most of the examples consist of parsing files, which in my view is quite static. You load the file into the parser and get the output.
My problem is different from parsing files. I have a stream of JSON data coming from a server socket on TCP port 6000. I need to parse the incoming data. I have some questions in mind:
1) Do I need to save the incoming JSON data on the client side in some sort of buffer? Answer: I think yes, I need to save it, but are there any parsers that can do it directly, for example by passing the JSON object as an argument to the parse function?
2) What would the structure of a real-time parser look like? Answer: On Google I could only find static parse-tree structures. In my view, each object is parsed into some sort of parse tree and then deleted from memory. Otherwise memory will run out, because the data is continuous.
There are some parser libraries available, like JSON-C and JSON lib. One more thing that comes to mind: can we save a JSON object in a C/C++ array? I just thought of that but couldn't work out how to do it.
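
One way to picture the buffering described in the two questions above is a small splitter that accumulates bytes from the socket and hands out each top-level JSON object as soon as its closing brace arrives. This is only a sketch: JsonStreamSplitter is a made-up name, not part of JSON-C or any other library, and it assumes the server sends a sequence of top-level objects (not bare arrays or scalars). Each completed string can then be passed to whichever parser you choose and discarded afterwards, so memory use stays bounded by the largest single object.

```cpp
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical helper: splits a continuous byte stream into complete
// top-level JSON objects by tracking brace depth outside of strings.
class JsonStreamSplitter {
public:
    // Feed one chunk as received from the socket; returns every object
    // that this chunk completed. Incomplete data stays buffered.
    std::vector<std::string> feed(const std::string& chunk) {
        std::vector<std::string> complete;
        for (char c : chunk) {
            if (depth_ == 0 && c != '{') continue;  // skip separators between objects
            buffer_.push_back(c);
            if (inString_) {
                if (escaped_) escaped_ = false;     // character after a backslash
                else if (c == '\\') escaped_ = true;
                else if (c == '"') inString_ = false;
            } else if (c == '"') {
                inString_ = true;
            } else if (c == '{') {
                ++depth_;
            } else if (c == '}') {
                if (--depth_ == 0) {                // top-level object finished
                    complete.push_back(buffer_);
                    buffer_.clear();                // release it from the buffer
                }
            }
        }
        return complete;
    }

    // Number of bytes held for an object that is not finished yet.
    std::size_t pendingBytes() const { return buffer_.size(); }

private:
    std::string buffer_;
    int depth_ = 0;
    bool inString_ = false;
    bool escaped_ = false;
};
```

In a receive loop, each block returned by recv() would be passed to feed(), and every returned string parsed and then dropped.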

Emitting avro format from pipes in Hadoop

I have to program in C++ for Hadoop and I deal with a complex structure of output value.
Unfortunately I can't figure out how to emit this structure in Avro format in MapReduce.
There are some writers like DataFileWriter and they work well for me. But it all doesn't make sense in terms of HDFS.
How I emit the structure now:
IOSerializer serializer;
context.emit(key, serializer.toString(output));
This custom toString method I wrote myself (sorry for the name, I'm totally from the Java world).
This is just a custom serialization into String. I really want some interoperability here and decided to use Avro.
This is the code to write Avro into the file:
avro::DataFileWriter<fusion_solve::graph> dfw("test.bin", schema);
dfw.write(output);
dfw.close();
What I want to be able to do is something like this:
IOSerializer serializer;
context.emit(serializer.toAvro(key, output));
For the moment I will be happy to get just plain JSON string as output, to convert it later.
The other option for me is writing custom RecordWriter in Java. But which type of input data should I use in this case, JSON?

How do you open a file in C++ from HTTP where the URL is NOT the file location

I'm a first year comp sci student with a moderate knowledge of C++ and for a job I'm trying to put together a utility using a new U.S. Census Bureau API. It takes ID codes for things like state/county/census tract and the desired table and spits back out the desired table for the desired location.
Here's an example of a query for population stats for California and New York.
More examples can be found here: http://www.census.gov/developers/
My snag is that I've both never worked with files from HTTP and also I'm not sure how to handle a URL that outputs plain text but doesn't actually lead to the file location. Would it be possible to just use stdin? I don't understand how to handle the output given by one of the census query URLs.
Right now I'm using infile, which I know isn't correct, but I'm not sure what a correct solution would be either.
Thanks
The fact that the data you're receiving is (apparently) generated on the fly rather than coming from a file doesn't really make any difference to you -- you get the same stream of bytes either way.
My immediate advice would be to use cURL for the job. Most of your work is generating a correct URL, which is what cURL specializes in. It'll then make it pretty easy to grab the data. From there, you can use any of quite a few JSON parser libraries (e.g., yajl), or you can parse it on your own (JSON is simple enough to make that fairly practical). A quick Google indicates that a fair number of people have already done this, and have various blog posts and such giving information about how to do it (though I suspect most of that is probably unnecessary).
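
Since most of the work is producing a correct URL, here is a minimal sketch of assembling a query string with percent-encoding using only the standard library. The function names, the base URL and the parameter names in the usage note are placeholders, not the real Census endpoint; libcurl offers curl_easy_escape for the encoding step if you go that route.

```cpp
#include <string>
#include <utility>
#include <vector>

// Percent-encodes every byte except the RFC 3986 "unreserved" characters.
std::string percentEncode(const std::string& in) {
    static const char hex[] = "0123456789ABCDEF";
    std::string out;
    for (unsigned char c : in) {
        const bool unreserved =
            (c >= 'A' && c <= 'Z') || (c >= 'a' && c <= 'z') ||
            (c >= '0' && c <= '9') || c == '-' || c == '_' || c == '.' || c == '~';
        if (unreserved) {
            out.push_back(static_cast<char>(c));
        } else {
            out.push_back('%');
            out.push_back(hex[c >> 4]);
            out.push_back(hex[c & 0x0F]);
        }
    }
    return out;
}

// Joins key/value pairs into base?k1=v1&k2=v2 with both sides encoded.
std::string buildQueryUrl(
    const std::string& base,
    const std::vector<std::pair<std::string, std::string>>& params) {
    std::string url = base;
    char sep = '?';
    for (const auto& p : params) {
        url += sep;
        url += percentEncode(p.first);
        url += '=';
        url += percentEncode(p.second);
        sep = '&';
    }
    return url;
}
```

For example, buildQueryUrl("http://example.invalid/data", {{"get", "P0010001,NAME"}, {"for", "state:06,36"}}) yields http://example.invalid/data?get=P0010001%2CNAME&for=state%3A06%2C36. The resulting string is what you would hand to cURL, and the response body is an ordinary stream of bytes to feed to a JSON parser.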

How to start using xml with C++

(Not sure if this should be CW or not, you're welcome to comment if you think it should be).
At my workplace, we have many many different file formats for all kinds of purposes. Most, if not all, of these file formats are just written in plain text, with no consistency. I'm only a student working part-time, and I have no experience with using xml in production, but it seems to me that using xml would improve productivity, as we often need to parse, check and compare these outputs.
So my questions are: given that I can only control one small application and its output (only - the inputs are formats that are used in other applications as well), is it worth trying to change the output to be xml-based? If so, what are the best known ways to do that in C++ (i.e., xml parsers/writers, etc.)? Also, should I also provide a plain-text output to make it easy for the users (which are also programmers) to get used to xml? Should I provide a script to translate xml-plaintext? What are your experiences with this subject?
Thanks.
Don't just use XML because it's XML.
Use XML because:
other applications (that only accept XML) are going to read your output
you have a hierarchical data structure that lends itself perfectly to XML
you want to transform the data to other formats using XSL (e.g. to HTML)
EDIT:
A nice personal experience:
Customer: your application MUST be able to read XML.
Me: Er, OK, I will adapt my application so it can read XML.
Same customer (a few days later): your application MUST be able to read fixed width files, because we just realized our mainframe cannot generate XML.
Amir, to parse XML you can use TinyXML, which is incredibly easy to use and get started with. Check its documentation for a quick overview, and read the "what it does not do" section carefully. I've been using it for reading, and all I can say is that this tiny library does the job very well.
As for writing: if your XML files aren't complex, you might build them manually with a string object. "Aren't complex" for me means that you're only going to store text at most.
For more complex XML reading/writing, you'd better check Xerces, which is heavier than TinyXML. I haven't used it myself, but I've seen it in production and it delivers.
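
To illustrate the "build it manually with a string" approach for text-only XML, here is a minimal sketch. The helper names escapeXml and element are invented for this example. The one thing that is easy to forget when writing XML by hand is escaping the five characters that have special meaning, so the sketch does that and nothing more (no attributes, no namespaces, no validation of element names).

```cpp
#include <string>

// Replaces the five characters that XML treats specially in text content.
std::string escapeXml(const std::string& text) {
    std::string out;
    for (char c : text) {
        switch (c) {
            case '&':  out += "&amp;";  break;
            case '<':  out += "&lt;";   break;
            case '>':  out += "&gt;";   break;
            case '"':  out += "&quot;"; break;
            case '\'': out += "&apos;"; break;
            default:   out += c;        break;
        }
    }
    return out;
}

// Wraps escaped text in an element: <name>text</name>.
// The element name is assumed to be a valid XML name already.
std::string element(const std::string& name, const std::string& text) {
    return "<" + name + ">" + escapeXml(text) + "</" + name + ">";
}
```

A whole document is then just string concatenation, for example "<result>" + element("status", "ok") + element("note", "a < b") + "</result>".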
You can try using the boost::property_tree class.
http://www.boost.org/doc/libs/1_43_0/doc/html/property_tree.html
http://www.boost.org/doc/libs/1_43_0/doc/html/boost_propertytree/tutorial.html
http://www.boost.org/doc/libs/1_43_0/doc/html/boost_propertytree/parsers.html#boost_propertytree.parsers.xml_parser
It's pretty easy to use, but the page does warn that it doesn't support the XML format completely. If you do use this though, it gives you the freedom to easily use XML, INI, JSON, or INFO files without changing more than just the read_xml line.
If you want that ability though, you should avoid XML attributes. To use an attribute, you have to look at the key <xmlattr>, which won't transfer between file types (although you can manually create your own subnodes).
Although using TinyXML is probably better. I've seen it used before in a couple of projects I've worked on, but don't have any experience with it.
Another approach to handling XML in your application is to use a data binding tool, such as CodeSynthesis XSD. Such a tool will generate C++ classes that hide all the gory details of parsing/serializing XML -- all that you see are objects corresponding to your XML vocabulary and functions that you can call to get/set the data, for example:
Person p = person ("person.xml");
cout << p.name ();
p.name ("John");
p.age (30);
ofstream ofs ("person.xml");
person (ofs, p);
Here's what previous SO threads have said on the topic. Please add others you know of that are relevant:
What is the best open XML parser for C++?
What is XML good for and when should i be using it?
What are good alternative data formats to XML?
BTW, before you decide on an XML parser, you may want to make sure that it will actually be able to parse all XML documents instead of just the "simple" ones, as discussed in this article:
Are you using a real XML parser?

XML Serialization/Deserialization in C++

I am using C++ from MinGW, which is the Windows version of GNU C++.
What I want to do is serialize a C++ object into an XML file and deserialize an object from an XML file on the fly. I checked TinyXML. It's pretty useful, and (please correct me if I misunderstand it) it basically adds all the nodes during processing and finally writes them to a file in one chunk using the TiXmlDocument::SaveFile(filename) function.
I am working on real-time processing. How can I write to a file on the fly and append subsequent results to the file?
Thanks.
Boost has a very nice serialization/deserialization library, Boost.Serialization.
If you stream your objects to a Boost XML archive, it will stream them in XML format.
If XML is too big or too slow, you only need to swap the archive for a text or binary archive to change the streaming format.
Here is a better example of C++ object serialization:
http://www.codeproject.com/KB/XML/XMLFoundation.aspx
I notice that each TiXmlBase class has a Print method and also supports streaming to strings and streams.
You could walk the new parts of the document in sequence and output those parts as they are added, maybe?
Give it a try.....
Tony
I've been using gSOAP for this purpose. It is probably too powerful for just XML serialization, but knowing it can do much more means I do not have to consider other solutions for more advanced projects, since it also supports WSDL, SOAP, XML-RPC, and JSON. It is also suitable for embedded and small devices, since XML is simply a transient wire format and is not kept in a DOM or another memory-intensive structure.