Performance of V8 array creation in Node.js - C++

I am trying to port a JS algorithm to C++ to see if I can improve performance, but I'm hitting a huge bottleneck when populating V8 arrays.
Here is a snippet that reproduces just the array population. It creates an array of 800k items, each item being an array of 17 numbers. It takes about 3 seconds to execute on my machine, which is far too slow.
Is there any way to speed it up?
#include <node.h>

namespace demo {

using namespace v8; // just for readability of the example

void Method(const FunctionCallbackInfo<Value>& args) {
  Isolate* isolate = args.GetIsolate();
  Local<Array> array = Array::New(isolate, 800000);
  for (int i = 0; i < 800000; ++i) {
    Local<Array> line = Array::New(isolate, 17);
    for (int j = 0; j < 17; ++j) {
      line->Set(j, Number::New(isolate, i * 100 + j));
    }
    array->Set(i, line);
  }
  args.GetReturnValue().Set(array);
}

void Init(Local<Object> exports) {
  NODE_SET_METHOD(exports, "hello", Method);
}

NODE_MODULE(parser, Init)

}  // namespace demo

Creating JS objects (and interacting with them) from C++ is more expensive than doing it from JS. This can easily offset performance gains from the rest of the C++ code.
You can work around this by communicating via a Buffer (the serialization overhead will typically be lower than the above). More importantly, this will also let you do the work off the main v8 thread.
If you're only dealing with numbers, this should be relatively straightforward using Buffer.readIntLE (or similar methods). You could also encode the array's length into the first few bytes of the buffer. Here's what the JS side of things could look like:
var buf = new Buffer(/* Large enough to contain your serialized data. */);

// Function defined in your C++ addon.
addon.populate(buf, function (err) {
  if (err) {
    // Handle C++ error.
    return;
  }
  // At this point, `buf` contains the serialized data. Deserialization
  // will depend on the chosen serialization format but a reasonable
  // option could be the following:
  var arr = [];
  var pos = 4;
  var size = buf.readInt32LE(0);
  while (size--) {
    var subarr = new Array(17);
    for (var i = 0; i < 17; i++) {
      subarr[i] = buf.readInt32LE(pos);
      pos += 4;
    }
    arr.push(subarr);
  }
  // `arr` now contains your decoded data.
});
The C++ part of the code would keep a reference to buf's data (a char *) and populate it inside a worker thread (see nan's AsyncWorker for a convenient helper).
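A rough sketch of what that addon side could look like with nan (the PopulateWorker name, the fixed 800000x17 layout, and the 4-byte length prefix are assumptions made to match the example above; error handling is omitted):

#include <nan.h>

// Sketch only: fills the Buffer's memory off the main thread. The Buffer
// passed from JS must hold at least 4 + 800000 * 17 * 4 bytes.
class PopulateWorker : public Nan::AsyncWorker {
public:
  PopulateWorker(Nan::Callback *callback, char *data)
    : Nan::AsyncWorker(callback), data_(data) {}

  // Runs on a worker thread: no v8 calls are allowed in here.
  void Execute() override {
    int32_t *out = reinterpret_cast<int32_t *>(data_);
    *out++ = 800000; // length prefix, read back via readInt32LE(0)
    for (int i = 0; i < 800000; ++i)
      for (int j = 0; j < 17; ++j)
        *out++ = i * 100 + j; // matches readInt32LE on little-endian hosts
  }

private:
  char *data_;
};

NAN_METHOD(Populate) {
  char *data = node::Buffer::Data(info[0]);
  Nan::Callback *callback = new Nan::Callback(info[1].As<v8::Function>());
  Nan::AsyncQueueWorker(new PopulateWorker(callback, data));
}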

As mtth said, working with JS arrays in C++ is expensive. Using a buffer would work, but you can also use TypedArrays. These are accessible from C++ as pointers to contiguous, aligned blocks of memory, which makes them easy to work with and fast to iterate over.
See https://stackoverflow.com/a/31712512/1218408 for some info on how to access their contents.
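For a flavor of what this looks like, here is a minimal sketch of the addon side under a recent V8 API (the exact calls vary across Node versions; older ones used GetContents() instead of GetBackingStore()). It assumes the JS side passes in a Float64Array for the addon to fill:

#include <node.h>

void Fill(const v8::FunctionCallbackInfo<v8::Value>& args) {
  // Assumes args[0] is a Float64Array created on the JS side.
  v8::Local<v8::Float64Array> view = args[0].As<v8::Float64Array>();
  // Raw pointer into the array's contiguous backing store.
  double* data = static_cast<double*>(view->Buffer()->GetBackingStore()->Data())
                 + view->ByteOffset() / sizeof(double);
  size_t length = view->Length();
  for (size_t i = 0; i < length; ++i)
    data[i] = static_cast<double>(i); // fast, contiguous writes, no Set() calls
}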

Related

Creating ArrayBuilders in a Loop

Is there any way to create a dynamic container of arrow::ArrayBuilder objects? Here is an example
int main(int argc, char** argv) {
  std::size_t rowCount = 5;
  arrow::MemoryPool* pool = arrow::default_memory_pool();

  std::vector<arrow::Int64Builder> builders;
  for (std::size_t i = 0; i < 2; i++) {
    arrow::Int64Builder tmp(pool);
    tmp.Reserve(rowCount);
    builders.push_back(tmp);
  }
  return 0;
}
This yields error: variable ‘arrow::Int64Builder tmp’ has initializer but incomplete type
I am ideally trying to build a collection that will hold various builders and construct a table from row-wise data I am receiving. My guess is that this isn't the intended use for builders, but I couldn't find anything definitive in the Arrow documentation.
What do your includes look like? That error message seems to suggest you are not including the right files. The full definition of arrow::Int64Builder is in arrow/array/builder_primitive.h, but you can usually just include arrow/api.h to get everything.
The following compiles for me:
#include <iostream>

#include <arrow/api.h>

arrow::Status Main() {
  std::size_t rowCount = 5;
  arrow::MemoryPool* pool = arrow::default_memory_pool();

  std::vector<arrow::Int64Builder> builders;
  for (std::size_t i = 0; i < 2; i++) {
    arrow::Int64Builder tmp(pool);
    ARROW_RETURN_NOT_OK(tmp.Reserve(rowCount));
    builders.push_back(std::move(tmp));
  }
  return arrow::Status::OK();
}

int main() {
  auto status = Main();
  if (!status.ok()) {
    std::cerr << "Err: " << status << std::endl;
    return 1;
  }
  return 0;
}
One small change to your example is that builders don't have a copy constructor / can't be copied. So I had to std::move it into the vector.
Also, if you want a single collection with many different types of builders then you probably want std::vector<std::unique_ptr<arrow::ArrayBuilder>> and you'll need to construct your builders on the heap.
One challenge you may run into is the fact that the builders all have different signatures for the Append method (e.g. the Int64Builder has Append(long) but the StringBuilder has Append(arrow::util::string_view)). As a result arrow::ArrayBuilder doesn't really have any Append methods (there are a few which take scalars, if you happen to already have your data as an Arrow C++ scalar). However, you can probably overcome this by casting to the appropriate type when you need to append.
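A minimal sketch of that casting approach (assuming you know, e.g. from your schema, that the builder at index 0 really is an Int64Builder, and that this runs inside a function returning arrow::Status):

std::vector<std::unique_ptr<arrow::ArrayBuilder>> builders;
builders.push_back(std::make_unique<arrow::Int64Builder>());
// Cast back to the concrete type to reach its typed Append:
auto* int_builder = static_cast<arrow::Int64Builder*>(builders[0].get());
ARROW_RETURN_NOT_OK(int_builder->Append(42));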
Update:
If you really want to avoid casting and you know the schema ahead of time you could maybe do something along the lines of...
std::vector<std::function<arrow::Status(const Row&)>> append_funcs;
std::vector<std::shared_ptr<arrow::ArrayBuilder>> builders;
for (std::size_t i = 0; i < schema.fields().size(); i++) {
  const auto& field = schema.fields()[i];
  if (isInt32(field)) {
    auto int_builder = std::make_shared<arrow::Int32Builder>();
    // Capture the column index by value so each lambda reads its own column.
    append_funcs.push_back([int_builder, i](const Row& row) {
      int val = row.GetCell<int>(i);
      return int_builder->Append(val);
    });
    builders.push_back(std::move(int_builder));
  } else {
    // Other types go here
  }
}

// Later
for (const auto& row : rows) {
  for (const auto& append_func : append_funcs) {
    ARROW_RETURN_NOT_OK(append_func(row));
  }
}
Note: I made up Row because I have no idea what format your data is in originally. Also I made up isInt32 because I don't recall how to check that off the top of my head.
This uses shared_ptr instead of unique_ptr because you need two copies: one in the lambda's capture and one in the builders vector.

dart/flutter: getting data array from C/C++ using ffi?

The official Flutter tutorial on C/C++ interop through FFI only touches on calling a C++ function and getting a single return value.
Goal
What if I have a data buffer created on the C/C++ side, but want to deliver it to the Dart/Flutter side to show?
Problem
With @MilesBudnek's tip, I'm testing Dart's FFI by trying out safe memory deallocation from Dart to C/C++. The test reuses the official struct sample.
I can get the Array as a Dart Pointer, but it's unclear to me how to easily iterate over the array as a collection.
Code
I'm implementing a Dart-side C array binding like this:
In struct.h
struct Array
{
    int* array;
    int len;
};
and a pair of simple allocation/deallocation test functions:
struct Array* get_array();
int del_array(struct Array* arr);
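For reference, one possible native-side implementation of those two functions (a sketch, not code from the question; build it as C, or wrap it in extern "C" in C++, so the symbol names stay unmangled for dylib.lookup):

#include <stdlib.h>

struct Array* get_array()
{
    struct Array* arr = (struct Array*)malloc(sizeof(struct Array));
    arr->len = 10;
    arr->array = (int*)malloc(arr->len * sizeof(int));
    for (int i = 0; i < arr->len; i++)
        arr->array[i] = i; /* fill with 0..9, like the output shown below */
    return arr;
}

int del_array(struct Array* arr)
{
    free(arr->array);
    free(arr);
    return 0;
}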
Then on the Dart side, in structs.dart:
typedef get_array_func = Pointer<Array> Function();
typedef del_array_func = void Function(int arrAddress);
...
final getArrayPointer = dylib.lookup<NativeFunction<get_array_func>>('get_array');
final getArray = getArrayPointer.asFunction<get_array_func>();
final arrayPointer = getArray();
final array = arrayPointer.ref.array;
print('array.array: $array');
This gives me the printout:
array.array: Pointer<Int32>: address=0x7fb0a5900000
Question
Can I convert the array pointer to a List easily? Something like:
final array = arrayPointer.ref.array.toList();
array.forEach((idx, elem) => print("array[$idx]: $elem"));
======
Old Question (you can skip this)
Problem
It's unclear to me how to retrieve this kind of vector data from C/C++ in Dart/Flutter.
Possible solutions
More importantly, how do I push data from the C++ side from various threads?
If there is no built-in support, off the top of my head I'd need to implement some communication scheme.
Option #1: Networking
I could do networking through TCP sockets. But I'm reluctant to go there if there are easier solutions.
Option #2: File I/O
Write data to a file from C/C++, and let Dart/Flutter poll the file and stream the data over. This is not realtime-friendly.
So, are there better options?
Solved it.
According to this issue, the asTypedList API is the way to go.
Here is the code that works for me:
final getArrayPointer = dylib.lookup<NativeFunction<get_array_func>>('get_array');
final getArray = getArrayPointer.asFunction<get_array_func>();
final arrayPointer = getArray();
final arr = arrayPointer.ref.arr;
print('array.array: $arr');
final arrReal = arr.asTypedList(10);
final arrType = arrReal.runtimeType;
print('arrReal: $arrReal, $arrType');
arrReal.forEach((elem) => print("array: $elem"));
This gives me:
array.array: Pointer<Int32>: address=0x7f9eebb02870
arrReal: [0, 1, 2, 3, 4, 5, 6, 7, 8, 9], Int32List
array: 0
array: 1
array: 2
array: 3
array: 4
array: 5
array: 6
array: 7
array: 8
array: 9
asTypedList will only work with pointers that relate to TypedData.
There are other cases where, for example, you want to convert a Pointer<UnsignedChar> to a Uint8List. In those cases you can:
use an extension and cast the Pointer<UnsignedChar> to a Pointer<Uint8>, then call asTypedList. In this case you have to make sure the pointer is not freed while the Uint8List is still referenced:
extension UnsignedCharPointerExtension on Pointer<UnsignedChar> {
  Uint8List? toUint8List(int length) {
    if (this == nullptr) {
      return null;
    }
    return cast<Uint8>().asTypedList(length);
  }
}
use an extension and don't cast the pointer, but copy it manually. In this case you can free the pointer after you get the Uint8List:
extension UnsignedCharPointerExtension on Pointer<UnsignedChar> {
  Uint8List? toUint8List(int length) {
    if (this == nullptr) {
      return null;
    }
    final Uint8List list = Uint8List(length);
    for (int i = 0; i < length; i++) {
      list[i] = this[i];
    }
    return list;
  }
}

Is access by pointer so expensive?

I have a Process() function that is called very heavily within my DLL (a VST plugin) loaded in a DAW (host software), such as:
for (int i = 0; i < nFrames; i++) {
    // ...
    for (int voiceIndex = 0; voiceIndex < PLUG_VOICES_BUFFER_SIZE; voiceIndex++) {
        Voice &voice = pVoiceManager->mVoices[voiceIndex];
        if (voice.mIsPlaying) {
            for (int envelopeIndex = 0; envelopeIndex < ENVELOPES_CONTAINER_NUM_ENVELOPE_MANAGER; envelopeIndex++) {
                Envelope &envelope = pEnvelopeManager[envelopeIndex]->mEnvelope;
                envelope.Process(voice);
            }
        }
    }
}
void Envelope::Process(Voice &voice) {
    if (mIsEnabled) {
        // update value
        mValue[voice.mIndex] = (mBlockStartAmp[voice.mIndex] + (mBlockStep[voice.mIndex] * mBlockFraction[voice.mIndex]));
    }
    else {
        mValue[voice.mIndex] = 0.0;
    }
}
It basically takes 2% CPU within the host (which is nice).
Now, if I slightly change the code to this (which basically adds two increments and an assignment):
void Envelope::Process(Voice &voice) {
    if (mIsEnabled) {
        // update value
        mValue[voice.mIndex] = (mBlockStartAmp[voice.mIndex] + (mBlockStep[voice.mIndex] * mBlockFraction[voice.mIndex]));

        // next phase
        mBlockStep[voice.mIndex] += mRate;
        mStep[voice.mIndex] += mRate;
    }
    else {
        mValue[voice.mIndex] = 0.0;
    }

    // connectors
    mOutputConnector_CV.mPolyValue[voice.mIndex] = mValue[voice.mIndex];
}
CPU usage goes up to 6-7% (note: those variables don't interact with other parts of the code, or at least I think so).
The only reason I can think of is that pointer access is expensive? How can I reduce this CPU load?
Those arrays are plain double arrays (the lightest C++ containers):
double mValue[PLUG_VOICES_BUFFER_SIZE];
double mBlockStartAmp[PLUG_VOICES_BUFFER_SIZE];
double mBlockFraction[PLUG_VOICES_BUFFER_SIZE];
double mBlockStep[PLUG_VOICES_BUFFER_SIZE];
double mStep[PLUG_VOICES_BUFFER_SIZE];
OutputConnector mOutputConnector_CV;
Any suggestions?
You might be thinking that "pointer arrays" are the lightest containers, but CPUs don't think in terms of containers. They just read and write values through pointers.
The problem here might very well be that you know the arrays do not overlap (there are no aliasing sub-arrays), but the compiler cannot tell the CPU that. Writing to mBlockStep might affect mBlockFraction. The compiler doesn't have the run-time values, so it needs to handle the case where they do overlap. This means introducing more memory reads and less caching of values in registers.
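One way to work with that, sketched under the assumption that Process has no side effects beyond the members shown: read the index once and keep the computed value in a local, so the compiler can hold it in a register instead of re-reading memory it must assume was clobbered.

void Envelope::Process(Voice &voice) {
    const int idx = voice.mIndex; // read the index once
    double value = 0.0;
    if (mIsEnabled) {
        value = mBlockStartAmp[idx] + mBlockStep[idx] * mBlockFraction[idx];
        mBlockStep[idx] += mRate;
        mStep[idx] += mRate;
    }
    mValue[idx] = value;
    // Reuse the local instead of re-reading mValue[idx], which the compiler
    // may otherwise have to reload after the writes above.
    mOutputConnector_CV.mPolyValue[idx] = value;
}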
Pack all the data items into a structure and create an array of those structures; I would simply use a vector.
In the Process function, get the single element out of this vector and use its members. At the cache-line level, all items would be brought into the local cache (L1) efficiently, since the members of the struct are contiguous. Use a reference or pointer to the struct type to avoid copying.
Try to use integer data types unless double is needed.
EDIT:
struct VoiceInfo
{
    double mValue;
    ...
};

VoiceInfo voices[PLUG_VOICES_BUFFER_SIZE];
// Or vector<VoiceInfo> voices;
...

void Envelope::Process(Voice &voice)
{
    // Get the object (by ref/pointer)
    VoiceInfo& info = voices[voice.mIndex];
    // Work with reference 'info'
    ...
}

How to generate a hashmap for huge chunk of data?

I want to make a map such that a set of pointers point to arrays of dynamic size.
I used hashing with chaining. But since the data I am using it for is huge, the program gives std::bad_alloc after a few iterations. The culprit is most likely the new used to build the linked lists.
Can someone suggest which data structure I should use?
Or anything else that could improve the memory usage of my hash table?
The program is in C++.
This is what my code looks like:
Initialization of hashtable:
class Link
{
public:
    double iData;
    Link* pNext;

    // Initialize pNext so an empty chain doesn't end in a garbage pointer.
    Link(double it) : iData(it), pNext(NULL)
    { }

    void displayLink()
    { cout << iData << " "; }
};

class List
{
private:
    Link* pFirst;

public:
    List()
    { pFirst = NULL; }

    void insert(double key)
    {
        if (pFirst == NULL)
            pFirst = new Link(key);
        else
        {
            Link* pLink = new Link(key);
            pLink->pNext = pFirst;
            pFirst = pLink;
        }
    }
};
class HashTable
{
public:
    int arraySize;
    vector<List*> hashArray;

    HashTable(int size)
    {
        hashArray.resize(size);
        for (int j = 0; j < size; j++)
            hashArray[j] = new List;
    }
};
main snippet:
int t_sample = 1000;

for (int i = 0; i < k; i++) // initialize random positions
{
    x[i] = (cal_rand() * dom_sizex); // dom_sizex = 20e-10; cal_rand() generates a random number between 0 and 1
    y[i] = (cal_rand() * dom_sizey); // dom_sizey = 10e-10
}

for (int t = 0; t < t_sample; t++)
{
    int size;
    size = cell_nox * cell_noy; // size of hash table; cell_nox = 212, cell_noy = 424
    HashTable theHashTable(size); // make table
    int hashValue = 0;
    for (int n = 0; n < k; n++) // k = 10*212*424
    {
        int m = x[n] / cell_width; // cell_width = 4.7e-8
        int l = y[n] / cell_width;
        hashValue = (kx*l) + m;
        theHashTable.hashArray[hashValue]->insert(n);
    }
    -------
    -------
}
First things first, use a Standard Container. In your specific case, you might want:
either std::unordered_multimap<int, double>
or std::unordered_map<int, std::vector<double>>
(Note: if you do not have C++11, those are available in Boost)
Your main loop becomes (using the second option):
typedef std::unordered_map<int, std::vector<double>> HashTable;

for (int t = 0; t < t_sample; ++t)
{
    size_t const size = cell_nox * cell_noy;
    // size of hash table; cell_nox = 212, cell_noy = 424

    HashTable theHashTable;
    theHashTable.reserve(size);

    for (int n = 0; n < k; ++n) // k = 10*212*424
    {
        int m = x[n] / cell_width; // cell_width = 4.7e-8
        int l = y[n] / cell_width;
        int const cellId = (kx*l) + m;
        theHashTable[cellId].push_back(n);
    }
}
This will reliably avoid leaking memory (although of course you might have leaks elsewhere) and thus gives you a dependable baseline. It is also probably faster than your approach, and has a more convenient interface.
In general you should not re-invent the wheel, unless you have a specific need that is not addressed by the available wheels or you are actually trying to learn how to create a wheel or to create a better wheel.
The OS has to solve the same issue with memory pages, so it may be worth looking at how that is done. First of all, assume all pages live on disk. A page is a fixed-size memory chunk; for your use case, say it's an array of your records. Because RAM is limited, the OS maintains a mapping between each page number and its location in RAM.
So, if your pages hold 1000 records and you want to access record 2024, you ask the OS for page 2 and read record 24 from that page. That way, your map is only 1/1000th the size.
Now, if a page has no mapping to a memory location, then it is either on disk or has never been accessed before (it is empty). Then you need to swap out another page and load the wanted page from disk (and update the location mapping).
This is a very simplified description of what happens, and I wouldn't be surprised if someone takes issue with it.
The point is: what does this mean for you?
First, your data exceeds your RAM, so you won't get around writing to disk, unless you want to try compression first.
Second, your chains can work as pages if you want, but I wonder whether just paging your hash codes would work better. What I mean is: use the upper bits as the page number and the lower bits as the offset within that page (see the sketch below these points). Avoiding collisions is still key, as you want to load as few pages as possible. You can still chain your pages, and end up with a much smaller map.
Third, a crucial part is deciding which pages to swap out to make room for new ones. LRU should do OK. If you can predict which pages you will (not) need, so much the better for you.
Fourth, you need placeholders for your pages that tell you whether they are in memory or on disk.
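A tiny sketch of that page-number/offset split (the 32-bit hash and the 4096-record pages are arbitrary choices for illustration):

#include <cstdint>

const uint32_t kOffsetBits = 12;                      // 4096 records per page
const uint32_t kOffsetMask = (1u << kOffsetBits) - 1;

uint32_t pageOf(uint32_t hash)   { return hash >> kOffsetBits; } // upper bits: page number
uint32_t offsetOf(uint32_t hash) { return hash & kOffsetMask; }  // lower bits: offset in page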
Hope this helps.

Trying to fill a 2D array of structures in C++

As the title says, I'm trying to create and then fill an array of structures with some starting data to then write to and read from.
I'm still writing the cache simulator as per my previous question:
Any way to get rid of the null character at the end of an istream get?
Here's how I'm making the array:
struct cacheline
{
    string data;
    string tag;
    bool valid;
    bool dirty;
};

cacheline **AllocateDynamicArray(int nRows, int nCols)
{
    cacheline **dynamicArray;

    dynamicArray = new cacheline*[nRows];
    for (int i = 0; i < nRows; i++)
        dynamicArray[i] = new cacheline[nCols];

    return dynamicArray;
}
I'm calling this from main:
cacheline **cache = AllocateDynamicArray(nooflines,noofways);
It seems to create the array OK, but when I try to fill it I get memory errors. Here's how I'm trying to do it:
int fillcache(cacheline **cache, int cachesize, int cachelinelength, int ways)
{
    for (int j = 0; j < ways; j++)
    {
        for (int i = 0; i < cachesize/(cachelinelength*4); i++)
        {
            cache[i][ways].data = "EMPTY";
            cache[i][ways].tag = "";
            cache[i][ways].valid = 0;
            cache[i][ways].dirty = 0;
        }
    }
    return(1);
}
Calling it with:
fillcache(cache, cachesize, cachelinelength, noofways);
Now, this is the first time I've really tried to use dynamic arrays, so it's entirely possible I'm doing it completely wrong, let alone when trying to make it 2D. Any ideas would be greatly appreciated :)
Also, is there an easier way to write to and read from the array? At the moment (I think) I'm having to pass lots of variables to and from functions, including the array (or a pointer to the array?) each time, which doesn't seem efficient.
Something else I'm unsure of: when I pass the array (pointer?) into a function and edit it there, will the edits still be there once I'm back out of the function?
Thanks
Edit:
Just noticed a monumentally stupid error; it should of course be:
cache[i][j].data = "EMPTY";
You should find your happiness here; you just need the time to check it out (:
The way to happiness
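On the "easier way" part of the question, here is a sketch (reusing the question's names; the sizes are placeholders) of the same cache as a std::vector of vectors. It owns its memory, needs no explicit delete, and edits made through a reference parameter stay visible to the caller:

#include <string>
#include <vector>
using namespace std;

struct cacheline
{
    string data;
    string tag;
    bool valid;
    bool dirty;
};

int main()
{
    int nooflines = 64, noofways = 4; // placeholder sizes

    // One contiguous row per line; no manual new[]/delete[] bookkeeping.
    vector<vector<cacheline>> cache(nooflines, vector<cacheline>(noofways));

    for (int i = 0; i < nooflines; i++)
        for (int j = 0; j < noofways; j++)
            cache[i][j] = {"EMPTY", "", false, false};

    // Pass as vector<vector<cacheline>>& to functions: edits made inside
    // persist for the caller.
    return 0;
}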