Searching and filtering a Node.js array using C++

I have a Node.js app with several arrays of 100K objects. I'm doing .filter, .find, and .includes operations on the arrays, but they're taking a very long time to complete, which is expected. For example: given two arrays of 100K items, I want to find the items that are in one array but not the other, and then find the items where the id property is present in both arrays but other properties have changed. I'm trying to think of ways to speed these operations up. Is it possible to pass them off to a C++ library within Node? If so, are there any libraries available to do that? Otherwise, does anyone have suggestions for improving performance on massive array operations in Node?
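The core of the problem is algorithmic rather than language-related: nested .filter/.find scans are O(n²), while indexing one array by id makes the whole diff a single O(n) pass, in JavaScript (via a Map) just as in C++. A minimal C++ sketch of that hash-based diff, with a hypothetical Item type standing in for the real objects:

    #include <string>
    #include <unordered_map>
    #include <vector>

    // Hypothetical record type standing in for the 100K-element objects.
    struct Item {
        int id;
        std::string payload;  // stands in for "the other properties"
    };

    // Index b by id once (O(n)), then answer both questions with O(1)
    // lookups instead of a nested O(n^2) scan.
    void diff(const std::vector<Item>& a, const std::vector<Item>& b,
              std::vector<Item>& only_in_a, std::vector<Item>& changed) {
        std::unordered_map<int, const Item*> b_by_id;
        b_by_id.reserve(b.size());
        for (const Item& item : b) b_by_id.emplace(item.id, &item);

        for (const Item& item : a) {
            auto it = b_by_id.find(item.id);
            if (it == b_by_id.end())
                only_in_a.push_back(item);               // in a but not in b
            else if (it->second->payload != item.payload)
                changed.push_back(item);                 // same id, properties differ
        }
    }

The same shape in JavaScript (building a Map from id to object, then one pass over the other array) typically makes a native addon unnecessary; dropping into C++ via an addon only tends to pay off once the per-item work itself is heavy.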

Related

What is the best way to process large arrays without exceeding memory limits in C++?

I've got two arrays of strings, both with 250k+ items. When I tried to hardcode these into my C++ program, it got stuck in the compilation phase. I currently have both arrays as CSV .txt files, e.g. {..."fksdfjsa", "fsdajhfisa", "wgferwjhgo"...}.
Should I save these as arrays in a different C++ program and try to import them, or should I somehow stream them as I iterate through the values? If so, how would I do that? For what it's worth, I intend to compare each element of the first array to each element of the second.
Just read the data from your CSV files at runtime. Learn about the <fstream> standard library header.
Read the data using std::fstream, and use a std::vector instead of creating an array.
I would certainly read the input from files. Among other things: you can start debugging/testing on smaller input files.
If you google for C++ CSV readers you can probably save yourself the time of debugging your own.
Finally: the goal of this exercise isn't described, but "...compare each element..." done naively is going to take quadratic time. If you are looking for matches, consider sorting both inputs first and then running through the two lists in lockstep; after the O(n log n) sort, the comparison pass itself is linear.
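A minimal sketch of that approach, assuming one value per line in the input files (the file names are made up, and a real CSV reader would split on commas instead):

    #include <algorithm>
    #include <fstream>
    #include <iostream>
    #include <iterator>
    #include <string>
    #include <vector>

    // Read one value per line from a file into a vector.
    std::vector<std::string> read_lines(const std::string& path) {
        std::ifstream in(path);
        std::vector<std::string> values;
        std::string line;
        while (std::getline(in, line)) values.push_back(line);
        return values;
    }

    int main() {
        std::vector<std::string> a = read_lines("first.txt");
        std::vector<std::string> b = read_lines("second.txt");

        // Sort both inputs; one linear pass then finds the common elements.
        std::sort(a.begin(), a.end());
        std::sort(b.begin(), b.end());

        std::vector<std::string> common;
        std::set_intersection(a.begin(), a.end(), b.begin(), b.end(),
                              std::back_inserter(common));
        std::cout << common.size() << " matches\n";
    }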

Calendar and to-do task list using data structures

I am looking to make a calendar with a task list for each day, just like most calendar apps. The main purpose of this exercise is to use data structures effectively.
I have thought of two approaches:
Use an array for the calendar days and then make a linked list of tasks for each day.
Use linked lists for both.
Another question is: can I use trees in the above scenario?
Maybe I am totally wrong. Kindly guide me through; I am keen to learn.
Note: I will be using C++ as my tool, but not the STL.
Regards
Your first approach is more efficient in terms of time complexity, as the tasks associated with any day can be accessed in constant time through the array. The downside is that the whole array of days is allocated up front.
If you use a linked list for the calendar days, you can add a new node for each day one at a time, rather than all at once (as with an array), so the final space usage ends up roughly the same in both cases (the list actually pays a little extra for its next pointers).
As far as trees are concerned, you can use map-like associative containers, which are usually implemented as self-balancing BSTs. That gives decent efficiency in both time (logarithmic lookup) and space (proportional to the number of days actually stored, with no wasted slots, unlike an array). You would associate each date with a linked list of strings.
If I were you, I'd be using map<date,vector<string> > though.
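To make the trade-off concrete, here is a minimal sketch of the first approach without STL containers (std::string is kept for brevity; the one-month size and all names are made up):

    #include <iostream>
    #include <string>

    struct Task {
        std::string text;
        Task* next;
    };

    struct Calendar {
        static const int kDays = 31;   // one month, for illustration
        Task* tasks[kDays] = {};       // tasks[d] heads day d's list (0-based)

        void add(int day, const std::string& text) {
            tasks[day] = new Task{text, tasks[day]};  // O(1) prepend
        }

        void print(int day) const {
            for (Task* t = tasks[day]; t; t = t->next)
                std::cout << t->text << '\n';
        }

        ~Calendar() {
            for (Task* head : tasks)
                while (head) {
                    Task* next = head->next;
                    delete head;
                    head = next;
                }
        }
    };

The array gives the constant-time access to a day mentioned above, while each day's task list grows on demand; the map<date, vector<string>> version is shorter still, but this stays within the no-STL constraint.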

Search structure with history (persistence)

I need a map-like data structure (in C++) for storing pairs (Key,T) with the following functionality:
You can insert new elements (Key,T) into the current structure
You can search for elements based on Key in the current structure
You can make a "snapshot" of the current version of the structure
You can switch to one of the versions of the structures which you took the snapshot of and continue all operations from there
Completely remove one of the versions
What I don't need
Element removal from the structure
Merging of different versions of the structure into one
Iteration over all (or some of) elements currently stored in the structure
In other words, you have some search structure that you can build up, but at any point you can jump in history, and expand the earlier/different version of the structure in a different way. Later on you may jump between those different versions.
In my project, Key and T are likely to be integers or pointer values, but not strings.
The primary objective is to reduce the time complexity; space consumption is secondary (but should be reasonable as well). To clarify, for me log(N)+log(S) (where N-number of elements, S-number of snapshots) would be enough, although faster is better :)
I have a rough idea of how to implement it. For example, if the structure is a binary search tree, inserting a new element can clone the path from the root to the insertion point while keeping the rest of the tree intact (so the new version shares everything off that path with the old one). Switching tree versions is then equivalent to picking a different version of the root node, from which certain changes are simply not visible.
However, making this custom tree efficient (e.g. self-balancing) will require some additional effort and careful coding. Of course I can do it myself, but perhaps there are existing libraries that do exactly that?
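For illustration, a minimal sketch of that path-copying idea (unbalanced; all names are hypothetical):

    #include <memory>
    #include <utility>

    // Persistent BST sketch: an insert clones only the root-to-leaf path;
    // all other nodes are shared between versions through shared_ptr.
    template <typename Key, typename T>
    struct Node {
        Key key;
        T value;
        std::shared_ptr<const Node> left, right;
    };

    template <typename Key, typename T>
    using Ptr = std::shared_ptr<const Node<Key, T>>;

    template <typename Key, typename T>
    Ptr<Key, T> make(Key k, T v, Ptr<Key, T> l, Ptr<Key, T> r) {
        return std::make_shared<const Node<Key, T>>(
            Node<Key, T>{std::move(k), std::move(v), std::move(l), std::move(r)});
    }

    template <typename Key, typename T>
    Ptr<Key, T> insert(const Ptr<Key, T>& n, const Key& key, const T& value) {
        if (!n) return make<Key, T>(key, value, {}, {});  // new leaf
        if (key < n->key)   // clone this node, recurse left, share right subtree
            return make(n->key, n->value, insert(n->left, key, value), n->right);
        if (n->key < key)   // mirror case
            return make(n->key, n->value, n->left, insert(n->right, key, value));
        return make(key, value, n->left, n->right);       // same key: new value
    }

A snapshot is then just a saved root pointer; switching versions means starting from a different root, and removing a version means dropping its pointer, after which any nodes no longer shared are reclaimed automatically. Making this self-balancing is exactly the part that calls for a library.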
Also, there is probably a proper name for this kind of data structure that I simply don't know, making my Google searches (or SO searches) total failures...
Thank you for your help!
I think what you are looking for is an immutable (persistent) map. Functional (or functionally inspired) programming languages such as Haskell or Scala have immutable versions of most of the containers you'd find in the STL. Operations such as insertion and removal then return a copy of the map (preserving the original) containing the requested modification. A lot of work has gone into designing these data structures so that each copy points to as much of the original structure as possible, reducing the time and memory cost of every operation.
You can find a lot more details in a book such as this one: http://www.amazon.co.uk/Purely-Functional-Structures-Chris-Okasaki/dp/0521663504.
While searching for some persistent search trees libraries I stumbled on this:
http://cg.scs.carleton.ca/~dana/pbst/
While it does not have the exact same functionality as needed, it seems pretty close to it. I will investigate.
(posting here, as someone may find it useful as well)

Fibonacci Heaps without array indexing?

Friends, my professor covered Fibonacci heaps and assigned some homework. Normally, after an extract-min, we consolidate the root list by linking roots of the same degree, using an array indexed by degree to find another root of the same degree. Now imagine that you don't have array indexing in your system: implement extract-min using some data structure and additional pointers so that you still achieve the same amortized time!
I've racked my brain over this but I'm not getting anywhere. Any clues or input?
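For reference, a minimal sketch of the array-indexed consolidate step the exercise wants to replace (the Node type and root-list handling are heavily simplified and hypothetical):

    #include <utility>
    #include <vector>

    struct Node {
        int key;
        int degree = 0;
        Node* child = nullptr;     // head of child list
        Node* sibling = nullptr;   // next node in the same list
    };

    // Make the larger-keyed root a child of the smaller-keyed one.
    Node* link(Node* a, Node* b) {
        if (b->key < a->key) std::swap(a, b);
        b->sibling = a->child;
        a->child = b;
        ++a->degree;
        return a;
    }

    // Consolidate: the degree-indexed array gives O(1) lookup of another
    // root of the same degree. max_degree must bound the largest degree
    // reachable (about log2 of the heap size).
    std::vector<Node*> consolidate(std::vector<Node*> roots, int max_degree) {
        std::vector<Node*> by_degree(max_degree + 1, nullptr);
        for (Node* r : roots) {
            while (by_degree[r->degree]) {       // another root of same degree?
                Node* other = by_degree[r->degree];
                by_degree[r->degree] = nullptr;
                r = link(r, other);              // link them; degree grows by 1
            }
            by_degree[r->degree] = r;
        }
        std::vector<Node*> out;
        for (Node* r : by_degree)
            if (r) out.push_back(r);
        return out;
    }

The homework, then, is to replace the by_degree[r->degree] lookups with pointer-based structures while preserving the amortized bound.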

Scalable std::set-like container for C++

I need to store a large number of integers. There can be duplicates in the input stream of integers; I just need to store the distinct values among them.
I was using std::set initially, but it ran out of memory when the number of input integers grew too high.
I am looking for a C++ container library that would let me store the numbers under that requirement, possibly backed by a file, i.e. the container should not try to keep all the numbers in memory.
I don't need to store this data persistently; I just need to find the unique values in it.
Take a look at STXXL; it might be what you're looking for.
Edit: I haven't used it myself, but from the docs, you could use stream::runs_creator to create sorted runs of your data (however much fits in memory), then stream::runs_merger to merge the sorted streams, and finally stream::unique to filter out the duplicates.
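A library-free sketch of that same runs-then-merge idea (the run file names and chunk size are made up; STXXL does the equivalent with external memory throughout):

    #include <algorithm>
    #include <cstddef>
    #include <cstdint>
    #include <fstream>
    #include <functional>
    #include <iostream>
    #include <queue>
    #include <string>
    #include <utility>
    #include <vector>

    // Phase 1: read the input in memory-sized chunks, sort each chunk,
    // and write it out as a sorted "run" file. Returns the run file names.
    std::vector<std::string> make_runs(std::istream& in, std::size_t chunk_size) {
        std::vector<std::string> names;
        std::vector<int64_t> chunk;
        chunk.reserve(chunk_size);
        int64_t value;
        while (true) {
            chunk.clear();
            while (chunk.size() < chunk_size && in >> value)
                chunk.push_back(value);
            if (chunk.empty()) break;
            std::sort(chunk.begin(), chunk.end());
            std::string name = "run" + std::to_string(names.size()) + ".txt";
            std::ofstream out(name);
            for (int64_t v : chunk) out << v << '\n';
            names.push_back(name);
        }
        return names;
    }

    // Phase 2: k-way merge the sorted runs with a min-heap, emitting each
    // distinct value exactly once (duplicates come out adjacently).
    void merge_unique(const std::vector<std::string>& names, std::ostream& out) {
        std::vector<std::ifstream> runs;
        for (const auto& name : names) runs.emplace_back(name);

        using Item = std::pair<int64_t, std::size_t>;  // (value, run index)
        std::priority_queue<Item, std::vector<Item>, std::greater<Item>> heap;
        int64_t v;
        for (std::size_t i = 0; i < runs.size(); ++i)
            if (runs[i] >> v) heap.push({v, i});

        bool first = true;
        int64_t last = 0;
        while (!heap.empty()) {
            auto [value, i] = heap.top();
            heap.pop();
            if (first || value != last) {  // skip duplicates
                out << value << '\n';
                last = value;
                first = false;
            }
            if (runs[i] >> v) heap.push({v, i});  // refill from that run
        }
    }

    int main() {
        auto runs = make_runs(std::cin, 1000000);  // ~8 MB of int64 per run
        merge_unique(runs, std::cout);             // distinct values, sorted
    }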
Since you need more than RAM allows, you might look at memcached.
Have you considered using a DB (maybe SQLite)? Or would that be too slow?
You should seriously at least try a database before concluding it is too slow. All you need is one of the lightweight key-value stores. In the past I have used Berkeley DB, but here is a list of other ones.