Architectural tips on building a shared resource for different processes [closed] - c++

At the company I work for we're dealing with a huge problem: we have a system that consists of several processing units, built so that each module has a specific responsibility. The modules are integrated through a queue system (which is not fast, but we're working on it) that replicates messages between them. The problem is that this generates a great deal of overhead, since four of these modules require the same kind of data, and keeping them consistent is painful.
Another requirement for the system is redundancy, so I was hoping to kill both problems in one shot.
So I was thinking of using some kind of shared resource. I've looked at shared memory (which is great, but a module crashing while holding a lock could leave the program in an inconsistent state), and maybe doing some "raw copy" of the segment to another computer for redundancy.
So I began searching for alternatives and ideas. One option I found is NoSQL, but I don't know whether it can deliver the speed I need.
I need something (ideally):
Memory-like fast
Able to provide redundancy (active-passive is OK, active-active is better)

I also think that shared memory is the way to go. To provide redundancy, let every process copy the data it is going to change into local/non-shared memory, and only copy it back to shared memory after the module has done its work. Make sure the 'copy-to-shared-memory' step is as small as possible and that nothing can go wrong while doing the copy. Some tricks you could use are:
Prepare all data in local memory and use one memcpy operation to copy it to shared memory
Use a single value to indicate that the written data is valid. This could be a boolean, or a version number that identifies the version of the data currently in shared memory. A rough sketch of both tricks follows.
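Here is how those two tricks could fit together, assuming POSIX shared memory and a single writer; the SharedBlock layout, the segment name, and the payload size are all made up for illustration, and error handling is omitted:

    // Copy-back publication sketch: do the work on a local copy, then publish
    // it with one memcpy bracketed by version bumps (seqlock style). Assumes
    // POSIX shared memory and a single writer; names and sizes are made up.
    #include <atomic>
    #include <cstdint>
    #include <cstring>
    #include <fcntl.h>
    #include <sys/mman.h>
    #include <unistd.h>

    struct SharedBlock {
        std::atomic<uint64_t> version;  // odd value = write in progress
        char payload[4096];
    };

    int main() {
        int fd = shm_open("/module_state", O_CREAT | O_RDWR, 0600);
        ftruncate(fd, sizeof(SharedBlock));
        auto* shared = static_cast<SharedBlock*>(
            mmap(nullptr, sizeof(SharedBlock),
                 PROT_READ | PROT_WRITE, MAP_SHARED, fd, 0));

        // 1. Do all the work on a private, local copy.
        char local[4096];
        std::memcpy(local, shared->payload, sizeof(local));
        // ... modify `local` here ...

        // 2. Publish with a single memcpy. Readers that see an odd version,
        //    or a version that changed while they were reading, simply retry.
        shared->version.fetch_add(1);                      // now odd: writing
        std::memcpy(shared->payload, local, sizeof(local));
        shared->version.fetch_add(1);                      // even again: valid

        munmap(shared, sizeof(SharedBlock));
        close(fd);
    }

If a module crashes in the middle of step 2, the version stays odd, so other processes know the payload is suspect; the same point is a natural place to replicate the segment to a standby machine.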

Related

Risks of maintaining/sustaining two code sets, one for CPU one for GPU, that need to perform very similar functions [closed]

This is a bad title, but hopefully my description is clearer. I am managing a modeling and simulation application that is decades old. For the longest time we have been interested in moving some of the code to GPUs because we believe it will speed up the simulations (yes, we are very behind the times). We finally have the opportunity to do this (i.e. money), so now we want to make sure we understand the consequences, specifically for sustaining the code. The problem is that since many of our users do not have high-end GPUs (at the moment), we would still need our code to support normal CPU processing as well as GPU processing (i.e. I believe we will now have two sets of code performing very similar operations). Has anyone had to go through this, and do you have any lessons learned and/or advice to share? If it helps, our current application is developed in C++ and we are looking at going with NVIDIA and writing the GPU code in CUDA.
This is similar to writing a hand-crafted assembly version with vectorization or other specialized instructions while also maintaining a C/C++ version. There is a lot of long-term experience out there with doing this, and this advice is based on it. (My own experience doing this with GPUs is both shorter term, a few years, and smaller, a few cases.)
You will want to write unit tests.
The unit tests use the CPU implementations (because I have yet to find a situation where they are not simpler) to test the GPU implementations.
The tests run a few simulations/models and assert that the results are identical where possible. They run nightly, and/or with every change to the code base as part of the acceptance suite.
This ensures that neither code base goes "stale", since both are constantly exercised, and each of the two independent implementations helps with maintaining the other. A sketch of such a consistency check follows.
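Here is a minimal sketch of what such a check could look like; the Sim callables are hypothetical stand-ins for your CPU and GPU simulation entry points, and the tolerance reflects the fact that CPU and GPU floating-point results rarely match bit for bit:

    // Sketch of a CPU-vs-GPU consistency check. The two Sim callables are
    // hypothetical stand-ins for the real implementations (e.g. the legacy
    // C++ path and the new CUDA path).
    #include <algorithm>
    #include <cassert>
    #include <cmath>
    #include <cstdio>
    #include <functional>
    #include <vector>

    using Sim = std::function<std::vector<double>(const std::vector<double>&)>;

    void check_consistency(const Sim& cpu_sim, const Sim& gpu_sim,
                           const std::vector<double>& input, double tol = 1e-9) {
        const auto expected = cpu_sim(input);   // the "simpler" reference result
        const auto actual   = gpu_sim(input);   // the result under test
        assert(expected.size() == actual.size());

        for (size_t i = 0; i < expected.size(); ++i) {
            // Compare within a tolerance instead of demanding exact equality.
            const double diff  = std::fabs(expected[i] - actual[i]);
            const double scale = std::max(1.0, std::fabs(expected[i]));
            if (diff > tol * scale) {
                std::fprintf(stderr, "mismatch at %zu: cpu=%g gpu=%g\n",
                             i, expected[i], actual[i]);
                assert(false && "GPU result deviates from CPU reference");
            }
        }
    }

Run this on a handful of representative inputs in the nightly suite and both implementations stay honest.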
Another approach is to run blended solutions. Sometimes running a mix of CPU and GPU is faster than one or the other, even if they are both solving the same problem.
When you have to switch technology (say, to a new GPU language, or to a distributed network of devices, or whatever new whiz-bang that shows up in the next 20 years), the "simpler" CPU implementation will be a life saver.

Where to store code constants when writing a JIT compiler? [closed]

I am writing a JIT compiler for x86-64 and I have a question regarding best practices for including constants in the machine code I am generating.
My approach thus far is straightforward (a minimal sketch of these steps follows the list):
Allocate a chunk of RW memory with VirtualAlloc or mmap
Load the machine code into said memory region.
Mark the page executable with VirtualProtect or mprotect (and remove the write privilege for security).
Execute.
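For what it's worth, here is a minimal sketch of those four steps on the POSIX side (the VirtualAlloc/VirtualProtect path is analogous); the emitted code just returns the immediate 42:

    // Allocate RW memory, copy machine code in, flip to RX, execute.
    // x86-64, System V; error handling mostly omitted for brevity.
    #include <cassert>
    #include <cstdint>
    #include <cstring>
    #include <sys/mman.h>

    int main() {
        const uint8_t code[] = { 0xB8, 0x2A, 0x00, 0x00, 0x00,   // mov eax, 42
                                 0xC3 };                          // ret

        // 1. Allocate a chunk of RW memory.
        void* mem = mmap(nullptr, 4096, PROT_READ | PROT_WRITE,
                         MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
        assert(mem != MAP_FAILED);

        // 2. Load the machine code into the region.
        std::memcpy(mem, code, sizeof(code));

        // 3. Mark the page executable and drop the write permission (W^X).
        mprotect(mem, 4096, PROT_READ | PROT_EXEC);

        // 4. Execute.
        auto fn = reinterpret_cast<int (*)()>(mem);
        assert(fn() == 42);

        munmap(mem, 4096);
    }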
When I am generating the code, I have to include constants (numbers, strings), and I am not sure of the best way to go about it. I have several approaches in mind:
Store all constants as immediate values into instructions' opcodes. This seems like a bad idea for everything except maybe small scalar values.
Allocate a separate memory region for constants. This seems to me like the best idea, but it slightly complicates memory management and the compilation workflow: I have to know the memory location before I can start writing the executable code. I am also not sure whether this affects performance due to worse memory locality.
Store the constants in the same region as the code and access it with RIP-relative addressing. I like this approach since it keeps relevant parts of the program together but I feel slightly uneasy about mixing instructions and data.
Something completely different?
What is the preferable way to go about this?
A lot depends on how you are generating your binary code. If you use a JIT assembler that handles labels and figures out offsets, things are pretty easy: you can stick the constants in a block after the end of the code, using PC-relative references to those labels, and end up with a single block of bytes containing both the code and the constants (easy management). If you are generating binary code on the fly, you already have the problem of handling forward PC-relative references (e.g. for forward branches); if you use back-patching, you need to extend it to support references to your constants block.
You can avoid the PC-relative offset calculations by putting the constants in a separate block and passing the address of that block as a parameter to your code. This is pretty much the "allocate a separate region for constants" option you propose, but you don't need to know the address of the block if you pass it in as an argument. A sketch of this approach follows.
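A small sketch of that last idea, under the same assumptions as the sketch above (x86-64, System V ABI, POSIX): the generated code receives the address of the constants block in rdi and never needs to know where the constants live at emit time.

    // "Constants block passed as a parameter": the JITed function indexes off
    // the pointer it is given instead of using PC-relative loads.
    #include <cassert>
    #include <cstdint>
    #include <cstring>
    #include <sys/mman.h>

    int main() {
        // x86-64 for: mov rax, [rdi] ; ret   (return constants[0])
        const uint8_t code[] = { 0x48, 0x8B, 0x07, 0xC3 };

        void* mem = mmap(nullptr, 4096, PROT_READ | PROT_WRITE,
                         MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
        std::memcpy(mem, code, sizeof(code));
        mprotect(mem, 4096, PROT_READ | PROT_EXEC);

        // The constants live in an ordinary, separately managed block and are
        // handed to the generated code at call time.
        const int64_t constants[] = { 1234567890123456789LL };
        auto fn = reinterpret_cast<int64_t (*)(const int64_t*)>(mem);
        assert(fn(constants) == constants[0]);

        munmap(mem, 4096);
    }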

Advice for keeping large C++ project modular? [closed]

Our team is moving into much larger projects, many of which use several open source projects within them.
Any advice or best practices to keep libraries and dependencies relatively modular and easily upgradable when new releases for them are out?
To put it another way, let's say you make a program that is a fork of an open source project. As both projects grow, what is the easiest way to maintain and share updates to the core?
Advice regarding what I'm asking only, please... I don't need "well, you should do this instead" or "why are you...". Thanks.
With clones of open source projects, one of your biggest headaches will be keeping them in sync and patched with respect to the upstream sources. You might not care about new features, but you will surely need critical bug fixes applied.
My suggestion would be to carefully wrap such inner projects into shared libraries, so you can more or less painlessly upgrade just those parts if the ABI is not broken by the changes.
One more thing - if you find and fix bugs in an open source project - don't keep the fixes to yourself. Push the patches upstream. That will make the project better and will save you days of merging with a new version.
In order of preference:
Make as few changes as possible to the third party libraries. Try and get around their limitations within your code. Document your changes and then submit a patch.
If you can't get around their limitations, submit your change as a patch (this may be idealistic with the glacial pace of some projects).
If you can't do either of those things, document what you've changed in a centralized location so that the poor person doing the integration for new versions can figure out what the heck you were doing, and if the changes made are still needed.
Options 1 and 2 are greatly preferred (though fast and very slow, respectively); the third option will only lead to headaches and bugs as your code base diverges from the dependencies' code base. In my own code I don't even have the third-party code loaded in the IDE unless I have to peruse a header file, which removes the temptation to change things that aren't mine.
As far as modularity goes, and this assumes you are using relatively stable third-party libraries, only program against the public-facing interface. Just because you have the source doesn't mean you have to use it all over your code. This should make updates essentially drag-and-drop. It's completely idealistic, but it's what I strive for in the code I work on.

IDL-like parser that turns a document definition into powerful classes? [closed]

I am looking for an IDL-like (or whatever) translator which turns a DOM- or JSON-like document definition into classes which
are accessible from both C++ and Python, within the same application
expose document properties as ints, floats, strings, binary blobs, and compounds: arrays and string-keyed dicts, both nestable (basically the JSON type feature set)
allow changes to be tracked to refresh views of an editing UI
provide a change history to enable undo/redo operations
can be serialized to and from JSON (can also be some kind of binary format)
allow to keep large data chunks on disk, with parts only loaded on demand
provide non-blocking thread-safe read/write access to exchange data with realtime threads
allow multiple editors in different processes (or even on different machines) to view and modify the document
The thing that comes closest so far is the Blender 2.5 DNA/RNA system, but it's not available as a separate library and is badly documented.
Most of all I'm trying to make sure that such a lib does not already exist, so I know my time is not wasted when I start to design and write such a thing. It's supposed to provide a great foundation for writing editing UI components.
ICE is the closest product I could think of. I don't know if you can do serialization to disk with ICE, but I can't think of a reason why you couldn't. The problem is that it costs $$$. I haven't personally negotiated a license with them, but ICE is the biggest player I know of in this domain.
Then there is Pyro for Python, which is distributed objects only.
Distributed Objects in Objective-C (N/A for iPhone/iPad Dev, which sucks IMHO)
There are some C++ distributed objects libraries but they're mostly dead and unusable (CORBA comes to mind).
I can tell you that there would be a lot of demand for this type of technology. I've been delving into some serialization and remote object stuff since off-the-shelf solutions can be very expensive.
As for open-source frameworks to help you develop in-house, I recommend boost::asio's strands for async thread-safe read/write and boost::serialization for serialization. I'm not terribly well-read in JSON tech but this looks like an interesting read.
I wish something freely available already existed for this networking/serialization glue that so many projects could benefit from.
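As a rough illustration of the strand suggestion above (assuming a reasonably recent Boost.Asio that provides make_strand), all reads and writes of the shared document are posted through one strand, so they are serialized without explicit locks:

    // Strand sketch: handlers posted to the same strand never run
    // concurrently, so the document can be read and mutated without a mutex.
    #include <boost/asio.hpp>
    #include <iostream>
    #include <string>

    int main() {
        boost::asio::io_context io;
        auto strand = boost::asio::make_strand(io);

        std::string document;  // only ever touched from handlers on `strand`

        // Writers (from any thread) post mutations instead of taking a lock.
        boost::asio::post(strand, [&] { document += "edit from thread A\n"; });
        boost::asio::post(strand, [&] { document += "edit from thread B\n"; });

        // Readers go through the same strand, so they see a consistent state.
        boost::asio::post(strand, [&] { std::cout << document; });

        io.run();  // in a real app several threads would call run() concurrently
    }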
SWIG doesn't meet all your requirements, but it does make interfacing C++ <-> Python a lot easier.

C++ serialization library that supports partial serialization? [closed]

Are there any good existing C++ serialization libraries that support partial serialization?
By "partial serialization" I mean that I might want to save the values of 3 specific members, and later be able to apply that saved copy to a different instance. I'd only update those 3 members and leave the others intact.
This would be useful for synchronizing data over a network. Say I have some object on a client and a server, and when a member changes on the server I want to send the client a message containing the updated value for that member and that member only. I don't want to send a copy of the whole object over the wire.
boost::serialization at a glance looks like it only supports all or nothing.
Edit: 3 years after originally writing this, I look back at it and say to myself, 'wut?' boost::serialization lets you define which members you want saved or not, so it would support 'partial serialization' as I seem to have described it (see the example below). Further, since C++ lacks reflection, serialization libraries require you to explicitly specify each member you're saving anyway, unless they come with some sort of external tooling to parse the source files or a separate input file format used to generate C++ code (e.g. what Protocol Buffers does). I think I must have been conceptually confused when I wrote this.
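To illustrate the point in the edit, here is a small hypothetical example with boost::serialization: the serialize() function lists only the members that should travel through the archive, and everything else on the receiving instance is left intact.

    // Selective serialization with boost::serialization: only hp, x and y are
    // written and read; debug_name never touches the archive. The Player type
    // is made up for illustration.
    #include <boost/archive/text_iarchive.hpp>
    #include <boost/archive/text_oarchive.hpp>
    #include <sstream>
    #include <string>

    struct Player {
        int hp = 100;
        float x = 0.0f, y = 0.0f;
        std::string debug_name = "local-only";  // deliberately not serialized

        template <class Archive>
        void serialize(Archive& ar, const unsigned int /*version*/) {
            ar & hp & x & y;  // only these members go through the archive
        }
    };

    int main() {
        const Player server{75, 1.5f, 2.5f, "server"};
        std::stringstream wire;
        { boost::archive::text_oarchive oa(wire); oa << server; }

        Player client;  // keeps its own debug_name
        { boost::archive::text_iarchive ia(wire); ia >> client; }
        // client.hp, client.x and client.y now match the server copy;
        // client.debug_name is untouched.
    }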
You're clearly not looking for serialization here.
Serialization is about saving an object and then recreating it from the stream of bytes. Think video game saves or the session context for a web server.
What you need here is messaging. Google's FlatBuffers is nice for that. Specify a message in which every single field is optional; upon receiving the message, update your object with the fields that are present and leave the others untouched.
The great thing with FlatBuffers is that it handles forward and backward compatibility nicely, as well as text and binary encoding (text being great for debugging and binary being better for pure performance), on top of a zero-cost parsing step.
And you can even decode the messages with another language (say Python or Ruby) if you save them somewhere and want to throw together an HTML GUI to inspect them!
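Whatever library you pick, the pattern itself is easy to sketch in plain C++: a message type where every field is optional, and an apply step that only overwrites the fields that are present. The types below are hypothetical stand-ins for what a FlatBuffers or protobuf schema would generate for you.

    // Hand-rolled partial-update message: only the fields that are set in the
    // message are applied to the target object. Types are illustrative only.
    #include <optional>
    #include <string>

    struct Player {
        int hp = 100;
        float x = 0.0f, y = 0.0f;
        std::string name = "anon";
    };

    struct PlayerUpdate {              // the wire message: everything optional
        std::optional<int> hp;
        std::optional<float> x, y;
        std::optional<std::string> name;
    };

    void apply(Player& target, const PlayerUpdate& msg) {
        if (msg.hp)   target.hp   = *msg.hp;
        if (msg.x)    target.x    = *msg.x;
        if (msg.y)    target.y    = *msg.y;
        if (msg.name) target.name = *msg.name;
    }

    int main() {
        Player client;          // stale local copy on the client
        PlayerUpdate msg;       // the server noticed that only hp changed
        msg.hp = 42;
        apply(client, msg);     // hp updated; x, y and name left intact
    }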
Although I'm not familiar with them, you could also check out Google's Protocol Buffers.