How to parse JSON larger than memory? - C++

I'm working on a project that involves a large JSON file, basically a multidimensional array dumped in JSON form, whose overall size is larger than the amount of memory I have. If I load it in as a string and then parse the string, that will consume all of the memory.
Are there any methods to limit the memory consumption, such as only retrieving data between specific indices? Could I implement that using solely the nlohmann/json library or the standard library?

RapidJSON and others can do it. Here's an example program using RapidJSON's "SAX" (streaming) API: https://github.com/Tencent/rapidjson/blob/master/example/simplereader/simplereader.cpp
This way, you'll get an event (callback) for each element encountered during parsing. The memory consumption of the parsing itself will be quite small.
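For a sense of what the SAX approach looks like, here is a minimal sketch (the file name "data.json" and the summing handler are illustrative, not from the linked example):
#include <cstdio>
#include <iostream>
#include <rapidjson/reader.h>
#include <rapidjson/filereadstream.h>

// The handler receives one callback per token; here we only care about numbers.
struct SumHandler : rapidjson::BaseReaderHandler<rapidjson::UTF8<>, SumHandler> {
    double sum = 0.0;
    bool Double(double d) { sum += d; return true; }   // called once per number
    bool Uint(unsigned u) { sum += u; return true; }
    bool Int(int i)       { sum += i; return true; }
};

int main() {
    std::FILE* fp = std::fopen("data.json", "rb");
    if (!fp) return 1;
    char buffer[65536];                                // only this much is buffered at once
    rapidjson::FileReadStream is(fp, buffer, sizeof(buffer));
    SumHandler handler;
    rapidjson::Reader reader;
    reader.Parse(is, handler);                         // streams the whole file through the handler
    std::fclose(fp);
    std::cout << "sum = " << handler.sum << "\n";
}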

Could you please specify the context of your question?
What programming language are you using (NodeJS, vanilla JavaScript, Java, React)?
What environment is your code running in (a monolithic app on a server, AWS Lambda, serverless)?
Processing large JSON files can consume a lot of memory on a server and may even make your app crash.
I have experienced first-hand that manipulating large JSON files on my local computer with 8 GB of RAM is not a problem using a NodeJS script to process large JSON payloads. However, running those large JSON payloads in an application on a server did give me problems.
I hope this helps.

Using DAW JSON Link, https://github.com/beached/daw_json_link , you can create an iterator pair/range and iterate over the JSON array 1 record at a time. The library also has routines for working with JSONL, which is common in large datasets.
For opening the file, I would use something like mmap/VirtualAlloc to handle that for us. The examples in the library do this via the daw::filesystem::memory_mapped_file_t type, which abstracts the file mapping.
With that, the memory-mapped file lets the OS page the data in and out as needed, and the iterator-like interface keeps the memory requirement to that of one array element at a time.
The following demonstrates this, using a simple record type:
struct Point {
    double x;
    double y;
};
The program to do this looks like:
#include <cassert>
#include <daw/daw_memory_mapped_file.h>
#include <daw/json/daw_json_iterator.h>
#include <daw/json/daw_json_link.h>
#include <iostream>

struct Point {
    double x;
    double y;
};

namespace daw::json {
    template<>
    struct json_data_contract<Point> {
        using type =
            json_member_list<json_number<"x">, json_number<"y">>;
    };
}

int main( int argc, char** argv ) {
    assert( argc >= 2 ); // we use argv[1], so require at least one argument
    auto json_doc = daw::filesystem::memory_mapped_file_t<char>( argv[1] );
    assert( json_doc.size( ) > 2 );
    auto json_range = daw::json::json_array_range<Point>( json_doc );
    auto sum_x = 0.0;
    auto sum_y = 0.0;
    auto count = 0ULL;
    for( Point p: json_range ) {
        sum_x += p.x;
        sum_y += p.y;
        ++count;
    }
    sum_x /= static_cast<double>( count );
    sum_y /= static_cast<double>( count );
    std::cout << "Centre Point (" << sum_x << ", " << sum_y << ")\n";
}
https://jsonlink.godbolt.org/z/xoxEd1z6G

Related

Protobuf vs FlatBuffers vs Cap'n Proto: which is faster?

I decided to figure out which of Protobuf, FlatBuffers and Cap'n Proto would be the best/fastest serialization for my application. In my case, that means sending some kind of byte/char array over a network (the reason I serialize to that format). So I made simple implementations for all three where I serialize and deserialize a string, a float and an int. This gave unexpected results: Protobuf being the fastest. I would call them unexpected since both Cap'n Proto and FlatBuffers "claim" to be faster options. Before I accept this I would like to see if I unintentionally cheated in my code somehow. If I did not cheat I would like to know why Protobuf is faster (exactly why is probably impossible to say). Could the messages be too simple for Cap'n Proto and FlatBuffers to really shine?
My timings:
Time taken flatbuffers: 14162 microseconds
Time taken capnp: 60259 microseconds
Time taken protobuf: 12131 microseconds
(Times are from one machine; only the relative comparison is likely meaningful.)
UPDATE: The above numbers are not representative of CORRECT usage, at least not for capnp -- see answers & comments.
flatbuffer code:
int main (int argc, char *argv[]){
    std::string s = "string";
    float f = 3.14;
    int i = 1337;
    std::string s_r;
    float f_r;
    int i_r;
    flatbuffers::FlatBufferBuilder message_sender;

    int steps = 10000;
    auto start = high_resolution_clock::now();
    for (int j = 0; j < steps; j++){
        auto autostring = message_sender.CreateString(s);
        auto encoded_message = CreateTestmessage(message_sender, autostring, f, i);
        message_sender.Finish(encoded_message);
        uint8_t *buf = message_sender.GetBufferPointer();
        int size = message_sender.GetSize();
        message_sender.Clear();
        //Send stuffs
        //Receive stuffs
        auto received_message = GetTestmessage(buf);
        s_r = received_message->string_()->str();
        f_r = received_message->float_();
        i_r = received_message->int_();
    }
    auto stop = high_resolution_clock::now();
    auto duration = duration_cast<microseconds>(stop - start);
    cout << "Time taken flatbuffer: " << duration.count() << " microseconds" << endl;
    return 0;
}
cap'n proto code:
int main (int argc, char *argv[]){
    char s[] = "string";
    float f = 3.14;
    int i = 1337;
    const char * s_r;
    float f_r;
    int i_r;
    ::capnp::MallocMessageBuilder message_builder;
    Testmessage::Builder message = message_builder.initRoot<Testmessage>();

    int steps = 10000;
    auto start = high_resolution_clock::now();
    for (int j = 0; j < steps; j++){
        //Encoding
        message.setString(s);
        message.setFloat(f);
        message.setInt(i);
        kj::Array<capnp::word> encoded_array = capnp::messageToFlatArray(message_builder);
        kj::ArrayPtr<char> encoded_array_ptr = encoded_array.asChars();
        char * encoded_char_array = encoded_array_ptr.begin();
        size_t size = encoded_array_ptr.size();
        //Send stuffs
        //Receive stuffs
        //Decoding
        kj::ArrayPtr<capnp::word> received_array = kj::ArrayPtr<capnp::word>(reinterpret_cast<capnp::word*>(encoded_char_array), size/sizeof(capnp::word));
        ::capnp::FlatArrayMessageReader message_receiver_builder(received_array);
        Testmessage::Reader message_receiver = message_receiver_builder.getRoot<Testmessage>();
        s_r = message_receiver.getString().cStr();
        f_r = message_receiver.getFloat();
        i_r = message_receiver.getInt();
    }
    auto stop = high_resolution_clock::now();
    auto duration = duration_cast<microseconds>(stop - start);
    cout << "Time taken capnp: " << duration.count() << " microseconds" << endl;
    return 0;
}
protobuf code:
int main (int argc, char *argv[]){
    std::string s = "string";
    float f = 3.14;
    int i = 1337;
    std::string s_r;
    float f_r;
    int i_r;
    Testmessage message_sender;
    Testmessage message_receiver;

    int steps = 10000;
    auto start = high_resolution_clock::now();
    for (int j = 0; j < steps; j++){
        message_sender.set_string(s);
        message_sender.set_float_m(f);
        message_sender.set_int_m(i);
        int len = message_sender.ByteSize();
        char encoded_message[len];
        message_sender.SerializeToArray(encoded_message, len);
        message_sender.Clear();
        //Send stuffs
        //Receive stuffs
        message_receiver.ParseFromArray(encoded_message, len);
        s_r = message_receiver.string();
        f_r = message_receiver.float_m();
        i_r = message_receiver.int_m();
        message_receiver.Clear();
    }
    auto stop = high_resolution_clock::now();
    auto duration = duration_cast<microseconds>(stop - start);
    cout << "Time taken protobuf: " << duration.count() << " microseconds" << endl;
    return 0;
}
Not including the message definition files, since they are simple and most likely have nothing to do with it.
In Cap'n Proto, you should not reuse a MessageBuilder for multiple messages. The way you've written your code, every iteration of your loop will make the message bigger, because you're actually adding on to the existing message rather than starting a new one. To avoid memory allocation with each iteration, you should pass a scratch buffer to MallocMessageBuilder's constructor. The scratch buffer can be allocated once outside the loop, but you need to create a new MallocMessageBuilder each time around the loop. (Of course, most people don't bother with scratch buffers and just let MallocMessageBuilder do its own allocation, but if you choose that path in this benchmark, then you should also change the Protobuf benchmark to create a new message object for every iteration rather than reusing a single object.)
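A minimal sketch of that fix against the asker's loop (the 1024-word scratch size is an arbitrary illustration; Testmessage is the asker's schema type):
capnp::word scratch[1024] = {};  // allocated once, outside the loop; capnp
                                 // expects scratch space to start zeroed
for (int j = 0; j < steps; j++){
    // a fresh builder every iteration, reusing the scratch space
    ::capnp::MallocMessageBuilder message_builder(
        kj::arrayPtr(scratch, sizeof(scratch) / sizeof(scratch[0])));
    Testmessage::Builder message = message_builder.initRoot<Testmessage>();
    message.setString(s);
    message.setFloat(f);
    message.setInt(i);
    // ... encode / decode as before ...
}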
Additionally, your Cap'n Proto code is using capnp::messageToFlatArray(), which allocates a whole new buffer to put the message into and copies the entire message over. This is not the most efficient way to use Cap'n Proto. Normally, if you were writing the message to a file or socket, you would write directly from the message's original backing buffer(s) without making this copy. Try doing this instead:
kj::ArrayPtr<const kj::ArrayPtr<const capnp::word>> segments =
    message_builder.getSegmentsForOutput();
// Send segments
// Receive segments
capnp::SegmentArrayMessageReader message_receiver_builder(segments);
Or, to make things more realistic, you could write the message out to a pipe and read it back in, using capnp::writeMessageToFd() and capnp::StreamFdMessageReader. (To be fair, you would need to make the protobuf benchmark write to / read from a pipe as well.)
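A rough sketch of that pipe variant (POSIX pipe, error handling omitted; fine for messages that fit in the pipe buffer, and the surrounding loop variables are the asker's):
#include <unistd.h>           // pipe()
#include <capnp/serialize.h>  // writeMessageToFd, StreamFdMessageReader

int fds[2];
pipe(fds);                                         // fds[0]: read end, fds[1]: write end
capnp::writeMessageToFd(fds[1], message_builder);  // writes the original segments, no extra copy
capnp::StreamFdMessageReader message_reader(fds[0]);
Testmessage::Reader received = message_reader.getRoot<Testmessage>();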
(I'm the author of Cap'n Proto and Protobuf v2. I'm not familiar with FlatBuffers so I can't comment on whether that code has any similar issues...)
On benchmarks
I've spent a lot of time benchmarking Protobuf and Cap'n Proto. One thing I've learned in the process is that most simple benchmarks you can create will not give you realistic results.
First, any serialization format (even JSON) can "win" given the right benchmark case. Different formats will perform very, very differently depending on the content. Is it string-heavy, number-heavy, or object heavy (i.e. with deep message trees)? Different formats have different strengths here (Cap'n Proto is incredibly good at numbers, for example, because it doesn't transform them at all; JSON is incredibly bad at them). Is your message size incredibly short, medium-length, or very large? Short messages will mostly exercise the setup/teardown code rather than body processing (but setup/teardown is important -- sometimes real-world use cases involve lots of small messages!). Very large messages will bust the L1/L2/L3 cache and tell you more about memory bandwidth than parsing complexity (but again, this is important -- some implementations are more cache-friendly than others).
Even after considering all that, you have another problem: Running code in a loop doesn't actually tell you how it performs in the real world. When run in a tight loop, the instruction cache stays hot and all the branches become highly predictable. So a branch-heavy serialization (like protobuf) will have its branching cost swept under the rug, and a code-footprint-heavy serialization (again... like protobuf) will also get an advantage. This is why micro-benchmarks are only really useful to compare code against other versions of itself (e.g. to test minor optimizations), NOT to compare completely different codebases against each other. To find out how any of this performs in the real world, you need to measure a real-world use case end-to-end. But... to be honest, that's pretty hard. Few people have the time to build two versions of their whole app, based on two different serializations, to see which one wins...

How to access pixel values using NITRO NITF library using C++ bindings

I have a task to write a module to unpack National Imagery Transmission Format (NITF) images and pass around the data in memory to various processing modules. I have chosen to use the NITRO library. I am trying to figure out how to read the image and access the pixel values, but I am having trouble. I am using the C++ bindings.
I successfully compiled the library. Now, I am trying to use the unit tests to understand how to use the library, namely read an image and access the pixel values. There are also some examples here. However, the unit tests and code snippets don't perform this task directly.
My toy example is below. I've tried variations of the code below, but I almost always get some error in image_reader.read(). The code below results in an error about too many bands, but if I limit the number of bands, then I don't get an error but the buffer doesn't seem to have any values in it.
I would be grateful to anyone who could give me some guidance or tips on how to use this library to access pixel values.
#include "stdafx.h"
#define IMPORT_NITRO_API
#include <import/nitf.hpp>
int _tmain(int argc, _TCHAR* argv[])
{
const std::string filename = "my_image.NTF";
nitf::Reader reader;
nitf::IOHandle io(filename.c_str());
nitf::Record record = reader.read(io);
nitf::List images = record.getImages();
nitf::ListIterator iter = images.begin(); // NITF can store more than one image - just try the first
nitf::ImageSegment segment = *iter;
nitf::SubWindow window; // define a subwindow for reading - try to read the whole image although it might be slow
unsigned int numRows = segment.getSubheader().getNumRows();
unsigned int numCols = segment.getSubheader().getNumCols();
const int band_count = segment.getSubheader().getBandCount();
window.setNumRows(numRows);
window.setNumCols(numCols);
window.setNumBands(band_count);
nitf::Uint32* band_list = new nitf::Uint32[band_count];
for (nitf::Uint32 band_number = 0; band_number < band_count; band_number++)
band_list[band_number] = band_number;
window.setBandList(band_list);
auto image_reader = reader.newImageReader(1); // 1 seems to be the image number: nitro-master\c\nitf\tests\test_create_xmltre.c
std::vector< std::vector<nitf::Uint8> > buffer(band_count); // User-defined data buffers for read
for (nitf::Uint32 band_number = 0; band_number < band_count; band_number++)
buffer[band_number].resize(numRows * numCols);
int padded = 0; // Returns TRUE if pad pixels may have been read
image_reader.read(window, (nitf::Uint8**)&buffer[0], &padded);
return 0;
}

How to fix serialization problems MQL4?

Today I ran into problems with serialization in MQL4.
I have a method, which I imported from a DLL:
In MQL4:
void insertQuery( int     id,
                  string  tableName,
                  double &values[4],
                  long   &times[3],
                  int     volume
                );
In DLL:
__declspec(dllexport) void __stdcall insertQuery( int      id,
                                                  wchar_t *tableName,
                                                  double  *values,
                                                  long    *times,
                                                  int      volume
                                                );
I tested it with this function calls in MQL4:
string a = "bla";
double arr[4] = { 1.1, 1.3, 0.2, 0.9 };
long A[3] = { 19991208, 19991308, 19992208 };
int volume = 1;
insertQuery( idDB, a, arr, A, volume );
Inside this method I write these values to a file.
C++ :
stringstream stream;
stream << " '";
for (int i = 0; i < 2; ++i) {
    stream << times[i] << "' , '";
}
stream << times[2] << ", ";
for (int i = 0; i < 4; ++i) {
    stream << values[i] << ", ";
}
stream << volume;

wstring table(tableName);
query.append("INSERT INTO ");
query.append(table.begin(), table.end());
query.append(" VALUES (");
query.append(stream.str());
query.append(" )");

std::ofstream out("C:\\Users\\alex\\Desktop\\text.txt");
out << query;
out.close();
But in the output file I receive this record:
INSERT INTO bla VALUES ( '19991208' , '0' , '19991308, 1.1, 1.3, 0.2, 0.9, 1 )
So my question is: why do I lose one long value from the array when I receive my record in the DLL?
I tried a lot of ways to solve this problem (I transferred two and three long values, etc.) and I always got the same result: the second long value is lost during serialization. Why?
The problem is caused by the fact that in MQL4 a long is 8 bytes, while a long in C++ (on Windows) is 4 bytes.
What you want is a long long in your C++ declaration.
Or you could also pass them as strings, then convert them into the appropriate type within your C++ code.
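A sketch of the corrected declaration (assuming the rest of the asker's export stays the same; int64_t from <cstdint> would work equally well):
// MQL4 'long' is 64-bit, so the C++ side must take a 64-bit integer type:
__declspec(dllexport) void __stdcall insertQuery( int        id,
                                                  wchar_t   *tableName,
                                                  double    *values,
                                                  long long *times,  // was: long* (only 32-bit on Windows)
                                                  int        volume );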
Well, be careful: New-MQL4.56789 is not a C-compatible language.
The first thing to test is to avoid passing an MQL4 string into the DLL calling interface where a C-language string is really expected.
Since old-MQL4 has been silently re-defined into the still-WIP-creeping New-MQL4 syntax, the MQL4 string is not a string, but a struct.
Root-cause [ isolation ]:
Having swallowed the shock of the string/struct trouble, if you can, first try to test the MQL4/DLL interactions without passing any string, to prove that all the other parameters, passed by value and addressed by reference, get their way into the hands of the DLL function as you expect.
If this works as you wish, proceed to the next step:
How, then, to pass the actual data in the expected string representation?
Let me share a dirty hack I used for passing data where the DLL expects strings:
#import "mql4TOOL.dll"
...
int mql4TOOL_msg_init_data ( int &msg[],
uchar &data[],
int size
);
...
#import
...
int tool_msg_init_data ( int &msg[], string data, int size ) { uchar dataChar[]; StringToCharArray( data, dataChar );
return ( mql4TOOL_msg_init_data ( msg, dataChar, size ) );
}
Yes, dirty, but it has worked for years and has saved us many tens of man-years of re-engineering on a maintained code-base with heavy dependence on MQL4/DLL interfacing in massively distributed heterogeneous computing systems.
The last resort:
If all efforts go in vain, go low-level, passing a uchar[] as needed, where you assemble some serialized representation in MQL4 and parse it on the opposite end before processing the intended functionality.
Ugly?
Yes, it might look like that, but it keeps you focused on core functionality and isolates you from the next shift of paradigm, if not only strings cease to be strings et al.

How can I send an object (or pointer) from C++ .NET application to VB application

I have 2 applications.
The VB application is written in .NET 3.5. It is a pretty big application; I can't rewrite it in C++ for a few reasons. I'm not sure if it matters, but it is an x86 application.
The C++ application is written in .NET 4.0. It is an x64 build, there will be no x86 support, and it has to stay like this. For now it is managed code with a bit of assembler code; I will mix managed and unmanaged code later when I learn more about C++.
It is supposed to extend the VB application's features: capture frames from a camera, do something with them, and send the processed images to the VB application. The images are pretty big (1920x1080x24bpp) and I need to process 30-60 frames per second like that, so it must be done in an efficient way.
My goals:
"Send" bitmap from C++ application to VB application, and VB application should start some method when that bitmap came.
"Send" some information the other way, from VB application to C++ application. It is supposed to change C++ application processing parameters from VB application GUI.
If possible - send just a pointer and size of bitmap instead of copying whole data in RAM.
Lets say, I want something like this:
VB side:
Function receivedBitmapFromCpp(BMP_POINTER?, BMP_SIZE_X?, BMP_SIZE_Y?, BMP_BPP?) As Integer Handles ????
End Function
C++ side:
void sendBitmapToVb(BMP_POINTER?, BMP_SIZE_X?, BMP_SIZE_Y?, BMP_BPP?)
{
    int bitmapsendingresult = ?????
}
It may be System.Drawing.Bitmap, or just some array that I will convert to System.Drawing.Bitmap in VB application. It doesn't matter that much.
My question:
Can someone explain, how can I:
send some object data (like System.Drawing.Bitmap for example), or better pointer to that data from VB application to C++ application
receive that data in C++ application
start some C++ function (with some event?) when data is received/ready
Use a shared-memory circular buffer to exchange data between processes. This can be implemented with Boost.Interprocess as a C++ DLL, and that DLL can then be imported into your .NET applications. Note that you will need to build 32-bit and 64-bit versions of Boost and of your shared-memory DLL. I prepared an example of 32-bit and 64-bit apps which you can run to see how fast that is. I think it should be fast enough, but if it is not, multithreading could still be used.
64 bit producer:
#include <boost/interprocess/shared_memory_object.hpp>
#include <boost/interprocess/mapped_region.hpp>
#include <boost/interprocess/sync/interprocess_mutex.hpp>
#include <boost/interprocess/sync/scoped_lock.hpp>
#include <cstring>
#include <cstdlib>
#include <cstdint>
#include <string>
#include <vector>
#include <iostream>
#include <chrono>

struct shmem_info
{
    boost::interprocess::interprocess_mutex mutex;
    uint64_t pos;
    bool run;
};

int main(int argc, char *argv[])
{
    using namespace boost::interprocess;

    struct shm_remove
    {
        shm_remove() { shared_memory_object::remove("MySharedMemory"); shared_memory_object::remove("MySharedMemoryInfo"); }
        //~shm_remove() { shared_memory_object::remove("MySharedMemory"); shared_memory_object::remove("MySharedMemoryInfo"); }
    } remover;

    const size_t width = 1920;
    const size_t height = 1080;
    const size_t bytes_per_pixel = 3;
    const size_t frame_size = width*height*bytes_per_pixel;
    const size_t frames = 60;
    const size_t shmem_frames = 3 * frames;
    const size_t shmem_size = width * height * bytes_per_pixel * shmem_frames;

    std::cout << "Generating data ..." << std::endl;
    std::vector<uint8_t> frame(frame_size);
    // generate frame data
    for (size_t x = 0; x < width*height; ++x)
        for (size_t y = 0; y < bytes_per_pixel; ++y)
            frame[x*bytes_per_pixel + y] = (x%252) + y;

    std::cout << "Creating shared memory files ..." << std::endl;
    shared_memory_object shm(create_only, "MySharedMemory", read_write);
    shared_memory_object shm_info(create_only, "MySharedMemoryInfo", read_write);

    //Set size
    shm.truncate(shmem_size);
    shm_info.truncate(sizeof(shmem_info));

    //Map the whole shared memory in this process
    mapped_region region(shm, read_write);
    mapped_region region_info(shm_info, read_write);

    shmem_info *info = new (region_info.get_address()) shmem_info;
    {
        scoped_lock<interprocess_mutex> lock(info->mutex);
        info->pos = 0;
        info->run = true;
    }

    char c;
    std::cout << "Ready. Now start client application and wait for it to be ready." << std::endl;
    std::cout << "Then press a key and enter to start" << std::endl;
    std::cin >> c;

    std::cout << "Running ..." << std::endl;
    std::chrono::steady_clock::time_point start = std::chrono::steady_clock::now();

    size_t times = 10;
    for (size_t t = 0; t < times; ++t)
    {
        for (size_t f = 0; f < shmem_frames; ++f)
        {
            // get pointer to the beginning of shared memory
            uint8_t *ptr = static_cast<uint8_t*>(region.get_address());
            // move pointer to the next frame
            ptr += f*frame_size;
            // modify first data point for testing purposes
            frame[0] = f;
            frame[1] = f + 1;
            frame[2] = f + 2;
            // copy data to shared memory
            memcpy(ptr, &frame[0], frame_size);
            // update the position each "frames" number, doing that too frequently kills the performance
            if (f % frames == 0)
            {
                // this will lock access to the pos for the time of updating the pos only
                scoped_lock<interprocess_mutex> lock(info->mutex);
                info->pos += frames;
                std::cout << "write pos = " << info->pos << std::endl;
            }
        }
    }

    std::chrono::steady_clock::time_point end = std::chrono::steady_clock::now();
    size_t ms = std::chrono::duration_cast<std::chrono::milliseconds>(end - start).count();
    std::cout << (double(times*shmem_frames*1000) / double(ms)) << " fps." << std::endl;

    winapi::sleep(2000);

    // stop run
    {
        scoped_lock<interprocess_mutex> lock(info->mutex);
        info->run = false;
    }
    return 0;
}
32 bit consumer:
#include <boost/interprocess/shared_memory_object.hpp>
#include <boost/interprocess/mapped_region.hpp>
#include <boost/interprocess/sync/interprocess_mutex.hpp>
#include <boost/interprocess/sync/scoped_lock.hpp>
#include <cstring>
#include <cstdlib>
#include <cstdint>
#include <string>
#include <vector>
#include <iostream>
#include <chrono>

struct shmem_info
{
    boost::interprocess::interprocess_mutex mutex;
    uint64_t pos;
    bool run;
};

int main(int argc, char *argv[])
{
    using namespace boost::interprocess;

    const size_t width = 1920;
    const size_t height = 1080;
    const size_t bytes_per_pixel = 3;
    const size_t frame_size = width*height*bytes_per_pixel;
    const size_t frames = 60;
    const size_t shmem_frames = 3 * frames;
    const size_t shmem_size = width * height * bytes_per_pixel * shmem_frames;

    std::vector<uint8_t> frame(frame_size);

    std::cout << "Opening shared memory files ..." << std::endl;
    //Open already created shared memory object.
    shared_memory_object shm(open_only, "MySharedMemory", read_write);
    shared_memory_object shm_info(open_only, "MySharedMemoryInfo", read_write);

    //Map the whole shared memory in this process
    mapped_region region(shm, read_only);
    mapped_region region_info(shm_info, read_write);

    shmem_info *info = static_cast<shmem_info*>(region_info.get_address());

    std::cout << "Ready." << std::endl;

    bool run = true;
    // first wait for processing to be started
    while (true)
    {
        {
            scoped_lock<interprocess_mutex> lock(info->mutex);
            if (info->run)
                break;
        }
        winapi::Sleep(1);
    }

    std::chrono::steady_clock::time_point start = std::chrono::steady_clock::now();

    uint64_t pos = 0;
    uint64_t shm_pos = 0;
    while(run)
    {
        // wait for new data
        {
            scoped_lock<interprocess_mutex> lock(info->mutex);
            run = info->run;
            if (info->pos == pos)
            {
                winapi::Sleep(1);
                continue;
            }
            // we've got new data
            shm_pos = info->pos;
        }
        while (pos < shm_pos)
        {
            // get pointer to the beginning of shared memory
            uint8_t *ptr = static_cast<uint8_t*>(region.get_address());
            // calculate the frame position in the circular buffer and move the pointer to that frame
            ptr += (pos%shmem_frames)*frame_size;
            // copy data from shared memory
            memcpy(&frame[0], ptr, frame_size);
            //winapi::Sleep(1);
            ++pos;
            if (pos % frames == 0)
                std::cout << "read pos: " << pos << std::endl;
        }
    }

    std::chrono::steady_clock::time_point end = std::chrono::steady_clock::now();
    size_t ms = std::chrono::duration_cast<std::chrono::milliseconds>(end - start).count();
    ms -= 2000; // producer waits 2 seconds before it sets run=false
    std::cout << (double(pos*1000) / double(ms)) << " fps." << std::endl;
    return 0;
}
I used Boost 1.58. The first cycle is always slow, so you may want to run a warm-up cycle before you start using the shared memory. The data needs to be copied into shared memory, but for reading, the shared-memory pointer to the frame could be passed to the .NET application. You then need to ensure your .NET app reads the data in time, before it gets overwritten.
Useful links:
Boost.Interprocess
Simple example
EDIT: I've modified the source code to show the number of frames per second that can roughly be achieved. On my machine that is 190+ fps, so I would expect it to stay above the required 60 fps even taking into account the overhead of transferring the data/pointer between the .NET app and the C++ dll.
The above code should give you a good start; you would need to refactor the producer and consumer shared-memory code into a common class and make it a DLL. There are a few ways of importing a C++ DLL into .NET. How-to-Marshal-a-C-Class explains some of them quite well.
Now to your questions:
Can someone explain, how can I:
send some object data (like System.Drawing.Bitmap for example), or better pointer to that data from VB application to C++ application
You will need to get an HBITMAP from the Bitmap using the GetHbitmap() method and pass it down to the C++ dll. Then, in the C++ dll, copy the pixel data (and other bitmap info, if required) into shared memory; a pointer to the data won't work. For how to do that between .NET and C++, see c-get-raw-pixel-data-from-hbitmap. Especially useful will be this answer.
receive that data in C++ application
Then, to read data from shared memory, you will probably need to first create an empty Bitmap of the same size as the one in shared memory and pass its HBITMAP down to the C++ dll to have the pixel data filled in.
start some C++ function (with some event?) when data is received/ready
You just need to continuously poll the shared memory for new data, as in the above code.
You should compile your VB application for the x64 architecture: since one of your binaries is x64, you should be OK with it. From your description I understood that you're using managed C++ .NET. Your C++ project should be compiled to a DLL, because it does not do anything other than extend the VB application. So you can just import that .NET DLL into your VB application, and then you can use the managed C++ .NET functionality inside VB.
If, however, you are not using managed C++ .NET, you can choose one of the following.
You can make a native C wrapper for your C++. It can be pretty basic: you can make a function which takes some arguments, and a function pointer as a callback (you wanted a callback) to get a bitmap. Use interop to probe your C++ DLL for those functions (the interop types are found in the System.Runtime.InteropServices namespace). Make a VB wrapper around those functions inside your VB application. You can convert methods inside VB to delegates representing function pointers, and pass those delegates as callbacks.
Or you can make a managed C++ .NET wrapper around your native C++ methods. Again, it can be basic: you can make basic class methods inside managed C++ .NET which just forward arguments to native code. Compile everything to a managed C++ .NET DLL, include that DLL in your VB application, and use its functionality. Cheers.
I would use a named pipe. It will allow you to send and receive data between processes. Once you receive the data, you can do whatever you want with it. That said, you won't be able to share objects between processes, so don't try sending a pointer. The good news is that named pipes are supported by both .NET 3.5 and .NET 4.0, according to this page. If you want examples, there are plenty online.
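For illustration, a minimal sketch of the C++ (sending) side using the Win32 named-pipe API; the pipe name, one-frame message, and buffer sizes are all illustrative, and the VB side would connect with something like System.IO.Pipes.NamedPipeClientStream("MyFramePipe"):
#include <windows.h>
#include <cstdint>

int main() {
    HANDLE pipe = CreateNamedPipeA(
        "\\\\.\\pipe\\MyFramePipe",      // illustrative name; must match the client
        PIPE_ACCESS_OUTBOUND,            // this side only sends frames
        PIPE_TYPE_BYTE | PIPE_WAIT,
        1,                               // a single client instance
        1920 * 1080 * 3, 0, 0, nullptr); // out buffer sized for one 24bpp frame
    if (pipe == INVALID_HANDLE_VALUE) return 1;
    ConnectNamedPipe(pipe, nullptr);     // blocks until the VB app connects
    uint8_t frame[64] = {};              // stand-in for real frame data
    DWORD written = 0;
    WriteFile(pipe, frame, sizeof(frame), &written, nullptr);
    CloseHandle(pipe);
    return 0;
}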
You can do the following:
create a folder in which the C++ application generates the images
in your VB application, add a FileSystemWatcher, so that when the first application (C++) adds any file to the common folder, the second application automatically reads it
you can use this link to learn more about the FileSystemWatcher
you might let the first application name the file in a specific way, for example p-dd-mm-yyyy-hh-mm-ss, and have the second application rename it to c-dd-mm-yyyy-hh-mm-ss, where p = pending, c = completed, and dd-mm-yyyy-hh-mm-ss is a datetime value
Based on your comment, I hope you can find a solution in what is mentioned here: How to do CreateFileMapping in a C++ DLL and access it in C#, VB and C++
You are using .NET, so you can use the interop library. It contains an IntPtr class you can use to encapsulate access to the C++ pointer. The Bitmap class even has a constructor that takes an IntPtr. Have a look at MSDN for Bitmap and IntPtr.

Pre-compute once cos() and sin() in tables

I'd like to improve the performance of my dynamic-link library (DLL).
For that I want to use lookup tables of cos() and sin() as I use a lot of them.
As I want maximum performance, I want to create a table from 0 to 2PI that contains the resulting cos and sin computations.
For a good result in terms of precision, I think a table of 1 MB for each function is a good trade-off between size and precision.
I would like to know how to create and use these tables without using an external file (as it is a DLL): I want to keep everything within one file.
Also, I don't want to compute the sin and cos functions when the plugin starts: they have to be computed once and put in a standard vector.
But how do I do that in C++?
EDIT1: the code from jons34yp is very good for creating the vector files.
I did a small benchmark and found that if you need good precision and good speed, you can build a 250,000-entry vector and linearly interpolate between entries. You get a maximum error of 7.89E-11 (!) and it is the fastest of all the approximations I tried (more than 12x faster than sin(); 13.296x faster, to be exact).
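For reference, a sketch of the table-plus-linear-interpolation scheme described in that edit (the table size and range are the quoted ones; the function names are illustrative):
#include <cmath>
#include <vector>

static const unsigned N = 250000;                  // table entries, as quoted above
static const double RANGE = 3.14159265358979 / 2;  // cover [0, pi/2]
static std::vector<double> sin_table;

void init_table() {
    sin_table.resize(N + 1);                       // one extra entry so i+1 stays valid
    for (unsigned i = 0; i <= N; ++i)
        sin_table[i] = std::sin(RANGE * i / N);
}

// valid for x in [0, pi/2); other quadrants use the usual reflections
double fast_sin(double x) {
    double pos = x * (N / RANGE);                  // fractional table index
    unsigned i = static_cast<unsigned>(pos);
    double frac = pos - i;
    return sin_table[i] + frac * (sin_table[i + 1] - sin_table[i]);
}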
The easiest solution is to write a separate program that creates a .cc file with the definition of your vector.
For example:
#include <fstream>
#include <iomanip>
#include <cmath>

int main()
{
    std::ofstream out("values.cc");

    out << "#include \"static_values.h\"\n";
    out << "#include <vector>\n";
    out << "std::vector<float> pi_values = {\n";
    out << std::setprecision(10);

    // We only need to compute the range from 0 to PI/2, and use trigonometric
    // transformations for values outside this range.
    double range = 3.14159265358979 / 2;
    unsigned num_results = 250000;

    for (unsigned i = 0; i < num_results; i++) {
        double value = (range / num_results) * i;
        double res = std::sin(value);
        out << " " << res << ",\n";
    }
    out << "};\n";
    out.close();
}
Note that this is unlikely to improve performance, since a table of this size probably won't fit in your L2 cache. This means a large percentage of trigonometric computations will need to access RAM; each such access costs roughly several hundred CPU cycles.
By the way, have you looked at approximate SSE SIMD trigonometric libraries? This looks like a good use case for them.
You can compute the values at startup instead of storing them precomputed in the executable:
#include <cmath>

double precomputed_sin[65536];

struct table_filler {
    table_filler() {
        for (int i = 0; i < 65536; i++) {
            precomputed_sin[i] = sin(i * 2 * 3.141592654 / 65536);
        }
    }
} table_filler_instance;
This way the table is computed just once, at program startup, and it is still at a fixed memory address. After that, tsin and tcos can be implemented inline as
inline double tsin(int x) { return precomputed_sin[x & 65535]; }
inline double tcos(int x) { return precomputed_sin[(x + 16384) & 65535]; }
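Note that the argument here is a phase in 65536ths of a full turn rather than radians, so a caller converts once; for example (hypothetical usage):
double angle = 1.234;  // radians
// convert radians to the table's fixed-point phase (65536 steps per turn)
int phase = static_cast<int>(angle * 65536.0 / (2 * 3.141592654));
double s = tsin(phase);
double c = tcos(phase);  // the +16384 offset above is the quarter-turn shift cos needs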
The usual answer to this sort of question is to write a small program which generates a C++ source file with the values in a table, and compile it into your DLL. If you're thinking of tables with 128000 entries (128000 doubles are 1 MB), however, you might run up against some internal limits in your compiler. In that case, you might consider writing the values out to a file as a memory dump, and mmaping this file when you load the DLL. (Under Windows, I think you could even put this second file into a second stream of your DLL file, so you wouldn't have to distribute a second file.)
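A sketch of the dump-and-map idea, using POSIX mmap for brevity (on Windows, CreateFileMapping/MapViewOfFile play the same role; the file name is illustrative):
#include <fcntl.h>
#include <sys/mman.h>
#include <unistd.h>

const double* load_sin_table(size_t count) {
    int fd = open("sin_table.bin", O_RDONLY);        // raw doubles, written once offline
    if (fd < 0) return nullptr;
    void* p = mmap(nullptr, count * sizeof(double),
                   PROT_READ, MAP_PRIVATE, fd, 0);   // paged in lazily by the OS
    close(fd);                                       // the mapping stays valid after close
    return p == MAP_FAILED ? nullptr : static_cast<const double*>(p);
}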