I am working on a Windows app using JavaScript for the UI. We are delegating a heavy processing task to a C++ component in order to increase performance. Unfortunately, we have to transfer a large amount of JSON data between the C++ component and the JavaScript (arrays of 100,000+ objects), and I'm not sure of the best way to do this. Our app needs to perform this computation multiple times per second. Truth be told, I don't actually know what is going on behind the scenes when data gets passed between JavaScript and C++.
One approach I'm considering is to stringify the JSON and pass that string through to the JavaScript, which would then parse it. I haven't done any testing on that, but I'm afraid it would be quite slow. As I mentioned above, I don't know how (or whether) the long string length would affect data transfer, or what the stringify and parse overhead would be.
Another approach I have considered is to essentially store the data as a blob and pass the address to the JavaScript, which could read the data from the blob. However, I find the Windows documentation a little sparse and difficult to search (I'm a C++ rookie), and I don't know how I would accomplish that.
What I'm looking for is advice as to which method sounds better (and I'm open to other suggestions!) and/or help on how to accomplish the second approach. Thanks for your time!
My C++ application currently works as follows:
1. It launches another process and uses Windows shared memory to communicate between the two processes.
2. The data is serialized in one process and deserialized in the other. However, the data type can vary based on user input, so the type is also serialized so that the deserializer can interpret the data correctly.
Now, I intend to use FlatBuffers to serialize and deserialize the data (because of its obvious advantages: random access and backward compatibility).
However, to do that I need clarity in some areas and hoping for some help on them.
Based on the data type, I can programmatically generate a schema and feed it to flatc.exe to generate files. However, instead of using flatc.exe, I am thinking of building flatc.dll (from the open-source code) and using that to keep the interaction simpler. Does that sound wise?
Secondly, what I am more unsure about is the following. I will create a schema and invoke the FlatBuffers compiler while the application is running, which will generate some C++ files. As far as I understand, I would then need to build those files somehow and plug the resulting binary into both the serializer and the deserializer to handle the actual data, all while the application is running. How do I achieve all this? This problem stems from the fact that my application does not have any fixed schema. What is the general approach to using FlatBuffers when the schema is variable?
I hope I am clear about what I am intending to ask. If not, please let me know. I will be happy to provide more details. Thanks for your answers in advance.
The answer is that you do not want this. While it is feasible, generating C++ at runtime, compiling it into a DLL, and then loading it back into your process is an extremely clumsy way of doing it.
The data structures of your program must be known at compile time (if it is written in C++), so why can't you define a schema for that just once and compile it ahead of time? Does your program allow the user to "design" data structures at runtime?
For extremely dynamic use cases such as where the user can create arbitrary objects, I'd recommend FlexBuffers (https://google.github.io/flatbuffers/flexbuffers.html). They can be used inside a FlatBuffer to store "unknown" data, or even as their own serialization format. With these, you can serialize objects whose structure is only known at runtime, they have most of the same efficiency properties of FlatBuffers, and you won't need to bundle a C++ compiler with your program :)
Best is a combination of the two, where all compile time known data is stored in FlatBuffers, and the remainder in FlexBuffers.
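As a rough illustration, here is a minimal sketch of building and reading a FlexBuffer in C++; the field names and values are made up for the example, and error handling is omitted.

```cpp
#include <cstdint>
#include <vector>
#include "flatbuffers/flexbuffers.h"

int main() {
    // Build a map whose layout is only decided at runtime.
    flexbuffers::Builder fbb;
    fbb.Map([&]() {
        fbb.String("name", "widget");      // hypothetical field
        fbb.Int("count", 42);              // hypothetical field
        fbb.Vector("values", [&]() {
            fbb.Double(1.5);
            fbb.Double(2.5);
        });
    });
    fbb.Finish();
    std::vector<uint8_t> buffer = fbb.GetBuffer();

    // Read it back without any generated code or schema.
    auto root = flexbuffers::GetRoot(buffer).AsMap();
    int64_t count = root["count"].AsInt64();
    (void)count;
    return 0;
}
```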
I have a Perl script which generates a very large data structure (which starts life as an array of array references). This is then written to a text file using a weird home-brew serialisation scheme.
The data from the text file is stored as the value in a key-value store db.
A C++ program then retrieves the data and deserializes it (into a hashmap, although I can potentially be flexible on how this data is structured).
What I'm interested in is finding out whether there are any good ways of sharing a data structure between Perl and C++ (something like Storable, but Storable is meant for Perl-to-Perl, not Perl-to-C++). The current method is a headache to maintain and may not have the best performance.
The most important factors are speed of deserialisation, and the size of the serialized structure in that order. Anyone know of something that might do the trick?
Storable is one way to dump and load Perl data structures. It's handy in that it's part of core and easy to use, but I wouldn't actually recommend it for general usage.
But for multi-platform (and language) portability, it's far better to use a standard data representation. Which you choose is probably a matter of what sort of data you're holding in your structure, but core contenders are:
JSON - good for arrays and hashes (key-value).
YAML - Excellent for 'config file' style data (but extends in ways similar to JSON)
And if you must, XML - but bear in mind that XML is designed for documents-with-metadata, and so IMO isn't suitable for most of the applications it's used for.
As standards, they've got documented formatting and parsers are widely available. And implementing your own isn't too hard, if that's the route you want to go. Just make sure you follow the spec and you're good.
Note that because XML and JSON (and I think YAML?) are recursive, you can parse them as a stream rather than as a standalone object (trap, process, and discard as you hit 'close brackets' in JSON, or 'close tags' in XML).
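On the C++ side, a minimal sketch of reading a JSON dump produced by the Perl script might look like this; it assumes the widely used nlohmann/json single-header library and a top-level object of string-to-number pairs, so adjust for your actual structure.

```cpp
#include <fstream>
#include <string>
#include <unordered_map>
#include <nlohmann/json.hpp>

int main() {
    // Load the file the Perl script wrote (hypothetical name).
    std::ifstream in("data.json");
    nlohmann::json j;
    in >> j;

    // Copy a top-level object of string -> number into a hashmap.
    std::unordered_map<std::string, double> map;
    for (auto& [key, value] : j.items()) {
        map[key] = value.get<double>();
    }
    return 0;
}
```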
Easy job.
I like Perl, and I also like C/C++. To make the best of both, I wrote a GitHub project to solve this issue.
please see:
https://github.com/tlqtangok/perlcpp
A short example is here:
P_eval("$a=2;$a=$a**10;");
Int("a");  // a = 1024
This might be a simple question for most people out there, but I'm stuck on it.
I was wondering: most banking software, or let's say any commercial software, when closed at the end of the day and then re-opened the next, how do those programs remember everything from the previous day? I hope I make myself clear. Thanks in advance for your guidance.
Best.
This is not black magic.
The answer is by saving its data. You do this by putting it in a database, or writing data files.
The trick is to write your programs in a way that makes it easy to guarantee that you've restored the state you thought you saved.
A common approach is to use serialization. This means that you are able to take your giant data structure and recursively call a 'Save' function on it and its contained objects. This is very intuitive if you are taking advantage of object inheritance and polymorphism. Of course, you also write a 'Load' function to do the reverse.
You write your data in such a way that it can be read back in. For example, if you wanted to write a string, you might first write its length and then its characters. That way, when you read it you know how many bytes to allocate.
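For example, a minimal sketch of the length-then-characters idea for strings (the helper names here are made up, and the streams are assumed to be opened in binary mode):

```cpp
#include <cstdint>
#include <fstream>
#include <string>

// Write the length first, then the raw characters.
void saveString(std::ofstream& out, const std::string& s) {
    std::uint32_t len = static_cast<std::uint32_t>(s.size());
    out.write(reinterpret_cast<const char*>(&len), sizeof(len));
    out.write(s.data(), len);
}

// Read the length back, so we know exactly how many bytes to read.
std::string loadString(std::ifstream& in) {
    std::uint32_t len = 0;
    in.read(reinterpret_cast<char*>(&len), sizeof(len));
    std::string s(len, '\0');
    in.read(&s[0], len);
    return s;
}
```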
The above approach is pretty standard if you are writing binary file formats. In fact, it's the philosophy behind chunk-based formats such as AVI.
For text-based formats, you might choose to serialize your data in popular formats like XML or JSON. But you are only restricted by your imagination.
I would like to know the best way to save and load C++ data.
I am mostly interested in saving classes and matrices (not sparse) I use in my simulations.
Right now I just save them as text files, but if I add a member to a class I then have to modify the function that loads the data (it has to parse and check for the value in the text file), which I think is not ideal.
What would you recommend in general? (P.S. As I'd like to release my code, I'd really like to use only standard C++ or libraries that can be redistributed.)
In this case, there is no "best." What is best for you is highly dependent upon your situation. But let's have an example to get you thinking about the details and how deep this rabbit hole can go.
If you absolutely, positively must have the fastest save possible without question (and you're willing to pay the price), you can define your own memory management to put all objects into a contiguous array of a common type (such as integers). This allows you to write that array to disk as binary data very rapidly. You might need this in a simulation that uses threads efficiently to load every core/processor to run in real time.
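A much-simplified sketch of the raw-binary-dump idea, assuming the objects are trivially copyable and already live in one contiguous buffer (the type and field names are made up):

```cpp
#include <cstdio>
#include <vector>

// No virtual functions, no pointers, no STL members: safe to fwrite directly.
struct Particle {
    double x, y, z;
    double mass;
};

// Dump the whole contiguous pool to disk in one call.
void save(const std::vector<Particle>& pool, const char* path) {
    std::FILE* f = std::fopen(path, "wb");
    if (!f) return;
    std::fwrite(pool.data(), sizeof(Particle), pool.size(), f);
    std::fclose(f);
}
```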
Why is this a rather horrible solution? Because it takes a LOT of work and runs many risks of problems, all in the name of "optimization."
It requires you to build your own memory management (operator new() and operator delete()) which may need to be thread safe.
If you try to load from this array, you will have to placement-new all objects with a unique non-modifying constructor in order to ensure all virtual pointers are set properly. Oh, and you have to track the type of each address to know how to do this.
For portability with other systems and between versions of the binary, you will need to have utilities to convert from the binary format to something generic enough to be cross platform (including repopulating pointers to other objects).
I have done this. It was highly unpleasant. I have no doubt there are still problems with it and I have only listed a few here. But, it was very, very fast and very, very, very problematic.
You must design to your needs. Generally, the first need is "Make it work." Don't care about efficiency, just about something that accurately persists and that you have the information known and accessible at some point to do it. Also, you should encapsulate the process of saving and loading. Then, if the need "Make it better" steps in, you should be able to change that one bit of code and the rest should work. You might even make the saving format selectable on user needs instead of your needs which you must assume for all users.
Given all the assumptions, pros and cons listed, you should be able to elaborate your particular needs for this question.
Given that performance is not your concern -- which is a critical part of the answer -- the Boost Serialization library is a great answer.
The link in the comment leads to the documentation. Read the tutorial (which is overkill for what you are initially wanting, but well worth it).
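For a sense of what that looks like, here is a minimal sketch of Boost Serialization for a class holding a dense matrix; the names are made up, and the tutorial covers the real details (versioning, pointers, splitting save/load, and so on).

```cpp
#include <fstream>
#include <vector>
#include <boost/archive/text_iarchive.hpp>
#include <boost/archive/text_oarchive.hpp>
#include <boost/serialization/vector.hpp>

// Hypothetical simulation state with a dense matrix stored as nested vectors.
struct State {
    std::vector<std::vector<double>> matrix;
    int step;

    template <class Archive>
    void serialize(Archive& ar, const unsigned int /*version*/) {
        ar & matrix;   // adding a member later only means adding a line here
        ar & step;
    }
};

int main() {
    const State s{{{1.0, 2.0}, {3.0, 4.0}}, 10};

    {   // save
        std::ofstream ofs("state.txt");
        boost::archive::text_oarchive oa(ofs);
        oa << s;
    }

    State loaded{};
    {   // load
        std::ifstream ifs("state.txt");
        boost::archive::text_iarchive ia(ifs);
        ia >> loaded;
    }
    return 0;
}
```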
Finally, since you have mostly array matrices, try to encapsulate the entire process of saving and loading so that, should you need to change it later, you are only writing a new implementation and choosing between the existing ones. I expect the added time for the smarts of Boost Serialization would not be great; however, you might find a future requirement moves you to something else, or multiple something elses.
The C++ Middleware Writer automates the creation of marshalling functions. When you add a member to a class, it updates the marshalling functions for you.
I'm about to start on a little project where I create a C++ program to store inventory data in a file (I guess a .txt will do):
• Item Description
• Quantity on Hand
• Wholesale Cost
• Retail Cost
• Date Added to Inventory
I need to be able to:
• Add new records to the file
• Display any record in the file
• Change any record in the file
Is there anything I should know of before I start this that could make it much easier & more efficient?
Like, for example, should I try to use XML, or would that be too hard to work with via C++?
I've never really understood the most efficient way of doing this.
Like would I search through the file and look for things in brackets or something?
EDIT
The data size shouldn't be too large. It is for homework, I guess you could say. I want to write the struct's contents directly into a file; how would I go about doing that?
There are many approaches. Is this for homework or for real use? If it's for homework, there are probably some restrictions on what you may use.
Otherwise I suggest some embedded DBMS like SQLite. There are others too, but this will be the most powerful solution, and will also have the easiest implementation.
XML is also acceptable, and has many reusable implementations available, but it will start losing performance once you get into thousands of records. The same goes for JSON. And one might still debate which one is simpler - JSON or XML.
Another possibility is to create a struct and write its contents directly to the file. It will get tricky, though, if the record size is not constant. And if the record format changes, the file will need to be rebuilt. Otherwise, this solution could be one of the best performance-wise - if implemented carefully.
Could you please enlighten us as to why you don't want to use a database engine for it?
If it is just for learning, then please give us an estimated size of the data stored in that file and the access pattern (how many users, how often they access it, etc.).
The challenge will be to create an efficient search and modification code.
For the search, it's about data structures and organization.
For the modification, it's how you would write updates to the file without reading it completely into memory, updating it there, and then writing it all back to the file.
If this is a project that will actually be used, with the potential to have features added over time, go for a database solution from the start, even if it seems overkill. I've been down this road before, small features get added over time, and before you realize it you have implemented a database. Poorly. Bite the bullet and use a database.
If this is a learning exercise, it depends on the amount of data you want to store. If it is small, the easiest thing to do is read the entire file into memory and operate on it there. When changes are made, write the entire file back out to disk. If the data is too large for that, the next best thing is to have fixed-size records. Create a POD struct that contains all of the data (i.e., no pointers, STL containers, etc.). Then you can rewrite individual records without needing to rewrite the entire file. If neither of these will work, your best bet is a database solution.
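As a rough sketch of the fixed-size record approach (field names and sizes are made up): each record occupies the same number of bytes, so record N lives at offset N * sizeof(Record) and can be overwritten in place.

```cpp
#include <cstddef>
#include <fstream>
#include <ios>

// Fixed-size POD record: no pointers, no STL containers.
struct Record {
    char   description[64];
    int    quantityOnHand;
    double wholesaleCost;
    double retailCost;
    char   dateAdded[11];   // e.g. "2024-01-31"
};

// Overwrite record number `index` without touching the rest of the file.
// The stream is assumed to be open in binary read/write mode.
void updateRecord(std::fstream& file, std::size_t index, const Record& rec) {
    file.seekp(static_cast<std::streamoff>(index * sizeof(Record)), std::ios::beg);
    file.write(reinterpret_cast<const char*>(&rec), sizeof(Record));
}
```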
If you insist to do it manually, I suggest JSON instead of XML.
Also, consider sqlite.
This sounds like a perfect job for SQLite. Small, fast, flexible and easy to use.
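For a sense of how little code that takes, here is a minimal sketch using the SQLite C API; the table and column names are just assumptions for this inventory example, and error handling is mostly omitted.

```cpp
#include <sqlite3.h>

int main() {
    sqlite3* db = nullptr;
    if (sqlite3_open("inventory.db", &db) != SQLITE_OK) return 1;

    // Create the table once; subsequent runs reuse the existing file.
    const char* ddl =
        "CREATE TABLE IF NOT EXISTS inventory ("
        "  description TEXT, quantity INTEGER,"
        "  wholesale REAL, retail REAL, date_added TEXT);";
    sqlite3_exec(db, ddl, nullptr, nullptr, nullptr);

    // Add a record; updates and lookups are plain SQL as well.
    sqlite3_exec(db,
        "INSERT INTO inventory VALUES ('widget', 10, 1.50, 2.99, '2024-01-31');",
        nullptr, nullptr, nullptr);

    sqlite3_close(db);
    return 0;
}
```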