Parsing JSON data from TCP stream - c++

I am using nlohmann's json library to parse JSON data from a TCP stream. I am not quite sure how to handle partial JSON reads from a local socket. Suppose that in the first read() I get:
{
"MessageType": "CancelOrder",
"Account":11111,
"CustomerNo":11111,
"Side":"A",
"DestinationMarket":"DUMB_MARKET",
"Symbol":"DUMB_SYMBOL",
"PositionEffect":"D",
"Limi
and in the following read() from socket, I get:
tPrice":0,
"Quantity":1,
"OrderType":"DUMB_TYPE",
"StopPrice":0,
"TimeInForce":"01.06.1999",
"ExpireDate":0,
"OrderID": "DUMB_ID",
"IsStopOrder":"DUMB_STOP",
"CorrelationId": 456
}
Partial reads cannot be parsed by the library since they are not valid. Does the library offer a solution to this? Or should I implement a solution myself?
What should be the best practice here?

You've gotten some good answers in the comments. I'm going to assemble some and add one more choice.
If you have control over both ends of the communications, then some people feel you should change the communications in one of two ways:
Send the length of text first
Or use a smarter messaging system over the socket
Either of these would solve your problem for you.
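As a rough illustration of the "send the length first" option, here is a minimal C++ sketch. The 4-byte big-endian header and the names encodeFrame and LengthPrefixedReader are assumptions made up for this example; they are not part of nlohmann's library or of any standard. The reader simply buffers whatever each read() returned and hands back payloads once they are complete.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <string>
#include <vector>

// Sender side: prefix the payload with its size as a 4-byte big-endian integer.
// (The header layout is an assumption of this sketch.)
std::string encodeFrame(const std::string& payload) {
    assert(payload.size() <= UINT32_MAX);  // must fit in the 4-byte header
    const std::uint32_t n = static_cast<std::uint32_t>(payload.size());
    std::string out;
    out.push_back(static_cast<char>((n >> 24) & 0xFF));
    out.push_back(static_cast<char>((n >> 16) & 0xFF));
    out.push_back(static_cast<char>((n >> 8) & 0xFF));
    out.push_back(static_cast<char>(n & 0xFF));
    out += payload;
    return out;
}

// Receiver side: accumulate bytes and return every payload that is complete.
class LengthPrefixedReader {
public:
    std::vector<std::string> feed(const char* data, std::size_t len) {
        buffer_.append(data, len);
        std::vector<std::string> messages;
        while (buffer_.size() >= 4) {
            const auto byte = [this](std::size_t i) {
                return static_cast<std::uint32_t>(
                    static_cast<unsigned char>(buffer_[i]));
            };
            const std::uint32_t n =
                (byte(0) << 24) | (byte(1) << 16) | (byte(2) << 8) | byte(3);
            if (buffer_.size() - 4 < n) break;  // payload not fully received yet
            messages.push_back(buffer_.substr(4, n));
            buffer_.erase(0, 4 + static_cast<std::size_t>(n));
        }
        return messages;
    }

private:
    std::string buffer_;  // bytes received but not yet returned
};
```

Each string returned by feed() should then be a whole JSON document that can be passed to nlohmann::json::parse. Real code would probably also want to reject unreasonably large length values before buffering.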
I'll offer two more possible choices.
Send an "end of data" indicator -- something that won't appear in the JSON. For instance, a null-byte. Split the stream at each EOD character and parse what came before it.
Try successively parsing data until it parses successfully.
The second one is kind of ugly. You'd parse { and get an exception. Then you'd parse {" for an exception, over and over until finally you have complete JSON. I bet it's slow, but it might work, and it doesn't depend on changing the data stream in any way.
Personally, I'd consider in order:
Use a proper messaging protocol
Use an End of Data indicator
Send the length
The hack of parsing and catching the exceptions until it parses
I think any of these would work. The last one is the only one that doesn't force you to change both ends of your data stream.
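For the "End of Data indicator" option, a minimal C++ sketch might look like the following. The names frameWithDelimiter and DelimitedReader are hypothetical, and the choice of a null byte as the delimiter is just the example suggested above; the sketch assumes the sender never puts that byte inside a message.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// Sender side: append the delimiter after each JSON document.
std::string frameWithDelimiter(const std::string& json, char delimiter = '\0') {
    // The delimiter must never occur inside the message itself.
    assert(json.find(delimiter) == std::string::npos);
    return json + delimiter;
}

// Receiver side: buffer incoming chunks and split them at each delimiter.
class DelimitedReader {
public:
    explicit DelimitedReader(char delimiter = '\0') : delimiter_(delimiter) {}

    // Call with whatever read() returned; yields zero or more whole messages.
    std::vector<std::string> feed(const char* data, std::size_t len) {
        buffer_.append(data, len);
        std::vector<std::string> messages;
        std::size_t start = 0;
        for (;;) {
            const std::size_t end = buffer_.find(delimiter_, start);
            if (end == std::string::npos) break;  // rest is still incomplete
            messages.push_back(buffer_.substr(start, end - start));
            start = end + 1;  // skip past the delimiter
        }
        buffer_.erase(0, start);  // keep only the unfinished tail
        return messages;
    }

private:
    char delimiter_;
    std::string buffer_;
};
```

A partial read such as the one in the question just stays in the buffer until the rest arrives; only complete messages are returned, and each of them can then be handed to nlohmann::json::parse.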

Related

How should I go about serving a json file to a website in my current architecture

Sorry for absolutely murdering the title. I am not sure how to frame this question, so please edit it if there is a better way of explaining my problem.
I am reading a bitstream from a program, which I convert into JSON data and write to a socket, where another program reads this data and appends it to a log.json file. I am doing all of this in C++.
Now I want to display this data in a better way. So why not try to display it in an HTML document, with some CSS applied to it?
My first thought was to simply fetch the file with JavaScript. But nowadays this throws an error.
So my second thought was to create a simple node.js server which accepts GET requests and use it to serve the file. But this feels like a bit of overkill.
My third thought is to perhaps use my original server (which continuously reads from the socket) and have it also accept HTTP requests. But then I would have to multithread it, which again seems like overkill.
So I'm kind of falling back to needing 2 different "servers": one that reads from the socket and appends to the log file, and another that serves this file to the website.
Am I thinking about this wrong? What would be a good way to solve this?

Real time parsing

I am quite new to parsing text files. While googling a bit, I found out that a parser usually builds a tree structure out of a text file. Most of the examples consist of parsing files, which in my view is quite static. You load the file into the parser and get the output.
My problem is different from parsing files. I have a stream of JSON data coming from a server socket at TCP port 6000. I need to parse the incoming data. I have some questions in mind:
1) Do I need to save the incoming JSON data on the client side in some sort of buffer? Answer: I think yes, I need to save it, but are there any parsers which can do it directly, like passing the JSON object as an argument to the parse function?
2) What would the structure of a real-time parser look like? Answer: On Google I can only find static parse-tree structures. In my view each object is parsed into some sort of parse tree, which is then deleted from memory. Otherwise memory would run out, because the data is continuous.
There are some parser libraries available like JSON-C and JSON lib. One more thing that comes to mind: can we save a JSON object in a C/C++ array? I just thought of that but couldn't figure out how to do it.

How to send/receive XML data with sockets in Qt using string?

I have a Qt TCP Server and Client program which can interact with each other. The Server can send some function-generated data to the socket using QTextStream. The Client reads the data from the socket using a simple readAll() and displays it in a QTextEdit.
Now the data from my Server side is huge (around 7000+ samples) and I need the data to appear on the Client side instantaneously. I have learned that using XML will help in my case. So I made a Qt XML Server, and it generates the whole XML data into a .xml file. I read the .xml file on the Client side and I can display its contents. I used the DOM method for parsing. But the data is displayed only when all the 7000+ samples have been generated on the Server side.
I need clarifications on these questions:
How do I write each element of the XML on the Server side into a string and send it through the socket? I learnt tagName() can help me, but I have not been able to figure out how.
Is there any way other than the string method to get a single element generated on the Server side to appear on the Client side?
PS: I am a newbie, forgive my ignorance. Thank you.
Most DOM XML parsers require a complete, well-formed XML document before they'll do anything with it. That's precisely what you see: your data is processed only after all of the samples have been received.
You need to use an incremental parser that doesn't care about the XML document not being complete yet.
On the other hand: if you're not requiring XML for interoperability with 3rd party systems, you're probably wasting a lot of resources by using it. I don't know where you've "learned" that XML will "help in your case". To me it's not learning, it's just following the crowd without understanding what's going on. Is your requirement to use XML or to move the data around? Moving data around has been a well understood problem for decades. Computers "speak" binary. No need to work around it, you know. If all you need is to move around some numbers, use QDataStream and be done with it. It'll be two orders of magnitude faster than the fastest XML parsers, you'll transmit an order of magnitude less data, and everyone will live happily ever after*.
*living happily ever after not guaranteed, individual results may vary.

How to handle server-client requests

Currently I'm working on a Server-Client system which will be the backbone of my application.
I have to find the best way to send requests and handle them on the server-side.
The server-side should be able to handle requests like this one:
getPortfolio -i 2 -d all
In an old project I decided to send such a request as a string, and the server application had to look up the first part of the string ("getPortfolio"). Afterwards the server application had to find the correct method in a map which linked the methods with the first part of the string ("getPortfolio"). The second part ("-i 2 -d all") was passed as a parameter, and the method itself had to handle this string/parameter.
I doubt that this is the best solution in order to handle many different requests.
Rgds
Layne
To me it seems you're having two different questions.
For the socket part, I suggest you use Beej's guide to socket programming if you want to have full control over what you do. If you don't want to or don't have the time to handle this part yourself, you can just use a C++ socket library as well. There are plenty of them; I have only used this one so far, but others might be just as good (or even better).
Regarding your parsing algorithm, you may first write down everything about the message format, so you'll have a strict guideline to follow. Then process step by step:
First, extract the "first word" and just keep the following parameters in some list. Check if the first word is valid and if it is known. If the "first word" does not match any of the predefined existing functions, just ignore the message (and possibly report the error to the client application).
Once you have the matching function, simply call it, passing the other parameters.
This way, each function will do a specific task and your code will be split in an elegant way.
Unfortunately, it is difficult for me to be any more explicit since we lack details here.
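The steps above can be sketched roughly as follows in C++. The Dispatcher class, the Args and Handler aliases, and the convention of returning an error string for unknown commands are all assumptions made for this example; the real message format would have to come from your own specification.

```cpp
#include <cassert>
#include <functional>
#include <map>
#include <sstream>
#include <string>
#include <utility>
#include <vector>

using Args = std::vector<std::string>;
using Handler = std::function<std::string(const Args&)>;

class Dispatcher {
public:
    // Register the function that handles requests starting with `name`.
    void add(const std::string& name, Handler handler) {
        assert(handler != nullptr);  // an empty handler could never be called
        handlers_[name] = std::move(handler);
    }

    // Split the request on whitespace, look up the first word, and call the
    // matching handler with the remaining words as its parameters.
    std::string dispatch(const std::string& request) const {
        std::istringstream in(request);
        std::string name;
        if (!(in >> name)) return "error: empty request";

        Args args;
        for (std::string word; in >> word;) args.push_back(word);

        const auto it = handlers_.find(name);
        if (it == handlers_.end()) return "error: unknown command " + name;
        return it->second(args);
    }

private:
    std::map<std::string, Handler> handlers_;
};
```

With this, a request such as "getPortfolio -i 2 -d all" reaches the handler registered under "getPortfolio" with the four words "-i", "2", "-d", "all" as its arguments, and each handler is free to interpret its own options.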

XML Parsing Problem

I have an XML parser that crashes on incomplete XML data. So XML data fed to it could be one of the following:
<one><two>twocontent</two</one>
<a/><b/> ( the parser treats it as two root elements )
Element attributes are also handled ( though not shown above ).
Now, the problem is when I read data from socket I get data in fragments. For example:
<one>one
content</two>
</one>
Thus, before sending the XML to the parser I have to construct a valid XML and send it.
What programming construct ( like iteration, recursion etc ) would be the best fit for this kind of scenario.
I am programming in C++.
Please help.
Short answer: You're doing it wrong.
Your question confuses two separate issues:
Parsing of data that is not well-formed XML at all, i.e. so-called tag soup.
Example: Files generated by programmers who do not understand XML or have lousy coding practices.
It is not unfair to say: A file that is not well-formed XML is not an XML document at all. Every correct XML parser will reject it. Ideally you would work to correct the source of this data and make sure that proper XML is generated instead.
Alternatively, use a tag soup parser, i.e. a parser that does error correction.
Useful tag soup parsers are often actually HTML parsers. tidy has already been pointed out in another answer.
Make certain that you understand what correction steps such a parser actually performs, since there is no universal approach that could fix XML. Tidy in particular is very aggressive at "repairing" the data, more aggressive than real browsers and the HTML 5 spec, for example.
XML parsing from a socket, where data arrives chunk-by-chunk in a stream. In this situation, the XML document might be viewed as "infinite", with chunks being processed as they appear, long before a final end tag for the root element has been seen.
Example: XMPP is a protocol that works like this.
The solution is to use a pull-based parser, for example the xmlTextReader API in libxml2.
If a tree-based data structure for the XML child elements being parsed is required, you can build a tree structure for each such element that is being read, just not for the entire document.
What is feeding you the XML from the other end of the socket connection? It doesn't make sense that you should be missing stuff, as you illustrate, just because you receive it from a socket.
If the socket is using TCP (or a custom protocol with similar properties), you should not be missing parts of your XML. Thus, you should be able to just buffer it all until the other end signals "end of document", and then feed it to your picky XML parser.
If you are using UDP or some other "lossy" protocol, you need to reconsider, since it's obviously not possible to correctly transfer a large XML document over a channel that randomly drops pieces.
Because the XML structure is a hierarchic structure (a tree) a recursion would be the best way to approach this.
You can call the recursion on each child and fix the missing XML identifiers.
Basically, you'll be doing the same thing a DOM object parser would do, only you'll parse the file in order to fix its structure.
One thing though, it seems to me as if in this method you are going to rewrite the XML parser. Isn't it a waste of time?
Maybe it's better to find a way for the XML to arrive in the right structure rather than trying to fix it.
Are there multiple writers? Why isn't your parser validating the XML?
Use a tree, where every node represents an element and carries with it a dirty bit. The first occurrence of the node marks it as dirty, i.e. you are expecting a closing tag, unless of course the node is of the form <a/>. Also, the first element you encounter is the root.
When you hit a dirty node, keep pushing nodes in a stack, until you hit the closing tag, when you pop the contents.
In your example, how are you going to figure out exactly where in the content to put the opening <two> tag once you have detected it is missing? This is, as they say, non-trivial.