choice of parameters for R* Tree using SpatialIndex library

I am using the spatialindex library from http://libspatialindex.github.com/
I am creating an R* tree in the main memory:
size_t capacity = 10;
bool bWriteThrough = false;
fileInMem = StorageManager
::createNewRandomEvictionsBuffer(*memStorage, capacity, bWriteThrough);
double fillFactor = 0.7;
size_t indexCapacity = 10;
size_t leafCapacity = 10;
size_t dimension = 2;
RTree::RTreeVariant rv = RTree::RV_RSTAR;
tree = RTree::createNewRTree(*fileInMem, fillFactor, indexCapacity,
leafCapacity, dimension, rv, indexIdentifier);
Then I am inserting a large number of bounding boxes, currently some 2.5M (road network of Bavaria in Germany). Later I'll aim at inserting all roads of Europe.
What is a good choice of parameters for the storage manager and the R-tree? I mostly use the R-tree to find the roads closest to a given query (bbox intersection).

As your data is static, a good bulk load may work for you. The most popular (and rather simple) bulk-loading method is Sort-Tile-Recursive (STR). However, it was designed largely with point data in mind. Since you are inserting spatial objects with extent, it may or may not work as well.
If you are using a bulk load, it will no longer be an R*-tree, but a plain R-tree.
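To make the idea concrete, here is a minimal sketch of the leaf-packing step of Sort-Tile-Recursive in two dimensions. It is not libspatialindex code: the `Box` type and the `strPackLeaves` function are hypothetical names introduced only for this illustration, and boxes are ordered by their center points, which is one common way to apply STR to objects with extent.

```cpp
#include <algorithm>
#include <cmath>
#include <cstddef>
#include <vector>

// Hypothetical box type for this sketch; not a libspatialindex class.
struct Box {
    double minX, minY, maxX, maxY;
    double centerX() const { return 0.5 * (minX + maxX); }
    double centerY() const { return 0.5 * (minY + maxY); }
};

// Groups boxes into leaves of at most `capacity` entries using STR tiling:
// sort by x, cut into vertical slices, sort each slice by y, pack runs.
std::vector<std::vector<Box>> strPackLeaves(std::vector<Box> boxes,
                                            std::size_t capacity) {
    std::vector<std::vector<Box>> leaves;
    if (boxes.empty() || capacity == 0) return leaves;

    const std::size_t n = boxes.size();
    const std::size_t leafCount = (n + capacity - 1) / capacity;
    const std::size_t sliceCount = static_cast<std::size_t>(
        std::ceil(std::sqrt(static_cast<double>(leafCount))));
    const std::size_t sliceSize = sliceCount * capacity;  // boxes per slice

    std::sort(boxes.begin(), boxes.end(), [](const Box& a, const Box& b) {
        return a.centerX() < b.centerX();
    });

    for (std::size_t begin = 0; begin < n; begin += sliceSize) {
        const std::size_t end = std::min(begin + sliceSize, n);
        const auto first = boxes.begin() + static_cast<std::ptrdiff_t>(begin);
        const auto last = boxes.begin() + static_cast<std::ptrdiff_t>(end);
        std::sort(first, last, [](const Box& a, const Box& b) {
            return a.centerY() < b.centerY();
        });
        // Each run of `capacity` consecutive boxes becomes one leaf.
        for (std::size_t i = begin; i < end; i += capacity) {
            const std::size_t leafEnd = std::min(i + capacity, end);
            leaves.emplace_back(
                boxes.begin() + static_cast<std::ptrdiff_t>(i),
                boxes.begin() + static_cast<std::ptrdiff_t>(leafEnd));
        }
    }
    return leaves;
}
```

The same packing would then be repeated on the bounding boxes of the leaves to build the upper levels.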
A capacity of 10 sounds far too small to me. You want a much larger fan-out. You will need to benchmark, though, because the best value depends on the data set and the queries. I'd definitely try 100 or more.
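A rough way to see why the fan-out matters is to estimate the tree height. The sketch below assumes fully packed nodes, which a real R*-tree will not achieve (the fill factor lowers the effective capacity), so treat the result as a lower bound. The function name `minTreeHeight` is hypothetical.

```cpp
#include <cstdint>

// Smallest number of levels such that capacity^levels >= entryCount,
// assuming every node is completely full. Returns 0 for capacity < 2,
// where the estimate is not meaningful.
unsigned minTreeHeight(std::uint64_t entryCount, std::uint64_t capacity) {
    if (capacity < 2) return 0;
    unsigned levels = 1;
    std::uint64_t reach = capacity;  // entries reachable with `levels` levels
    while (reach < entryCount) {
        reach *= capacity;
        ++levels;
    }
    return levels;
}
```

For 2.5 million entries this gives at least 7 levels with a capacity of 10, but only 4 levels with a capacity of 100.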

Related

Generation and storage of all DES keys

I'm writing a Data Encryption Standard "cracker" using C++ and CUDA. It was going to be simple brute force: trying all possible keys to decrypt the encrypted data and checking whether the result equals the initial plain-text message.
The problem is that generating 2^56 keys takes time (and memory). My first approach was to generate the keys recursively and save them to a file.
Do you have any suggestions on how to improve this?

You don't really need recursion, nor do you need to store your keys.
The whole space of DES keys (if we don't count the 12 or so weak keys, which won't change anything for your purposes) is the space of 56-bit numbers (which, by the way, fit into a standard uint64_t). You can simply iterate through the numbers from 0 to 2^56-1, feeding the next number as a 56-bit key to your CUDA core whenever that core reports that it is done with the previous key.
Ignoring the cores, the code could look like this:
for (uint64_t i = 0; i < (1ULL << 56); ++i) { // all 2^56 keys: 0 .. 2^56-1
    uint8_t key[7];
    // below is an endianness-agnostic conversion
    key[0] = (uint8_t)i;
    key[1] = (uint8_t)(i >> 8);
    key[2] = (uint8_t)(i >> 16);
    key[3] = (uint8_t)(i >> 24);
    key[4] = (uint8_t)(i >> 32);
    key[5] = (uint8_t)(i >> 40);
    key[6] = (uint8_t)(i >> 48);
    bool found = try_your_des_code(key, data_to_decrypt);
    if (found) printf("Eureka!\n");
}
To allow restarting the program if anything goes wrong, you only need to store this number i in persistent storage, such as a file. Strictly speaking, with multiple cores i should be written to persistent storage only after all the numbers before it have been processed by the CUDA cores, but in practice retrying 2000 or so keys after a restart makes no noticeable difference performance-wise.
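The restart logic could be sketched as follows. This is only an illustration under assumptions: the function names and the plain-text file format are hypothetical, and the caller is expected to save a counter only once every key below it has been tried.

```cpp
#include <cstdint>
#include <fstream>
#include <string>

// One past the last valid key counter (keys are 0 .. 2^56-1).
constexpr std::uint64_t kKeySpaceEnd = 1ULL << 56;

// Returns the saved counter, or 0 if the file is missing or invalid.
std::uint64_t loadCheckpoint(const std::string& path) {
    std::ifstream in(path);
    std::uint64_t next = 0;
    if (!(in >> next) || next > kKeySpaceEnd) return 0;
    return next;
}

// Overwrites the checkpoint file with the next counter to try.
void saveCheckpoint(const std::string& path, std::uint64_t next) {
    std::ofstream out(path, std::ios::trunc);
    out << next << '\n';
}
```

The main loop would start at `loadCheckpoint(path)` instead of 0 and call `saveCheckpoint` every few million keys rather than on every iteration, so that file I/O does not dominate the run time.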

Effective way of reducing data for real-time plot

I am developing a scientific application in Windows Forms (VC++ 2010) that controls a relatively new electronic device. I control it through an additional wrapped library written in C. After the initial setup of all parameters, the application triggers a measurement in the device. The device then sends my app a large dataset of over 200k int samples at a significant rate; let's assume 50 datasets per second.
Now I need to plot the data in real time using a Windows Forms chart. Ideally, 750 samples would be plotted in the chart at about 30 FPS. The problem is finding an algorithm that reduces the dataset quickly without making the plot unreliable.
My ideas (the data oscillates around the value 127):
1. Choose 750 points by selecting every (200 000 / 750)th point.
2. Group the data and calculate the mean value of each group.
3. Group the data and select the maximum or minimum of each group (based on the overall group placement: if most of the values are above 127, select the minimum, else the maximum).
Which of these solutions (if any) is best, considering that I have to plot the data at real-time speed and the plot should not miss spots where there is any significant signal (which looks like a kind of narrowed, modulated sine wave)? Is there a better approach?
And one last question: should I use a table of pointers into my huge data buffer, or copies of the data, as the data for the plot, given that I always have the same buffer of collected data (the device just constantly overwrites this buffer with new data)?
This is my first post, so please let me know if anything is wrong with its style.

I developed an application that reads data at 256 Hz (256 samples per second) from 16 channels and displays it in 16 different charts. The best way of plotting all the data in real time was to use a separate thread to update the plots. Here is the solution (in C#), which might be useful for you too.
When new data is read, it is stored in a list or array. Since it is real-time data, the timestamps are also generated here, using the sample rate of the acquired data: timeStamp = timeStamp + sampleIdx/sampleRate;
public void OnDataRead(object source, EEGEventArgs e)
{
    if ((e.rawData.Length > 0) && (!_shouldStop))
    {
        lock (_bufferRawData)
        {
            for (int sampleIdx = 0; sampleIdx < e.rawData.Length; sampleIdx++)
            {
                // Append data
                _bufferRawData.Add(e.rawData[sampleIdx]);
                // Calculate corresponding timestamp
                secondsToAdd = (float) sampleIdx/e.sampleRate;
                // Append corresponding timestamp
                _bufferXValues.Add( e.timeStamp.AddSeconds(secondsToAdd));
            }
        }
    }
}
Then, create a thread that sleeps for N ms between updates (100 ms works for me for a 2-second display of data, but if I want to display 10 seconds, I need to increase the thread's sleep time to 500 ms).
//Create thread
//define a thread to add values into chart
ThreadStart addDataThreadObj = new ThreadStart(AddDataThreadLoop);
_addDataRunner = new Thread(addDataThreadObj);
addDataDel += new AddDataDelegate(AddData);
//Start thread
_addDataRunner.Start();
And finally, update the charts and make the thread sleep every N ms
private void AddDataThreadLoop()
{
while (!_shouldStop)
{
chChannels[1].Invoke(addDataDel);
// Sleep thread for 100ms
Thread.Sleep(100);
}
}
Data will be added to the chart every 100ms
private void AddData()
{
// Copy data stored in lists to arrays
float[] rawData;
DateTime[] xValues;
if (_bufferRawData.Count > 0)
{
// Copy buffered data in thread-safe manner
lock (_bufferRawData)
{
rawData = _bufferRawData.ToArray();
_bufferRawData.Clear();
xValues = _bufferXValues.ToArray();
_bufferXValues.Clear();
}
for (int sampleIdx = 0; sampleIdx < rawData.Length; sampleIdx++)
{
foreach (Series ptSeries in chChannels[channelIdx].Series)
// Add new datapoint to the corresponding chart (x, y, chartIndex, seriesIndex)
AddNewPoint(xValues[sampleIdx], rawData[sampleIdx], ptSeries);
}
}
}
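The answer above deals with threading rather than with the reduction step itself. One common option, not part of the original answer, is min/max decimation: split the samples into buckets and keep both the minimum and the maximum of each bucket, so narrow peaks survive the reduction. The sketch below is in C++ to match the question; the function name `minMaxDecimate` is hypothetical.

```cpp
#include <algorithm>
#include <cstddef>
#include <utility>
#include <vector>

// Reduces `samples` to at most `bucketCount` (min, max) pairs, one per
// bucket of consecutive samples.
std::vector<std::pair<int, int>> minMaxDecimate(const std::vector<int>& samples,
                                                std::size_t bucketCount) {
    std::vector<std::pair<int, int>> out;
    if (samples.empty() || bucketCount == 0) return out;

    // Never use more buckets than samples, so no bucket is empty.
    bucketCount = std::min(bucketCount, samples.size());
    out.reserve(bucketCount);

    for (std::size_t b = 0; b < bucketCount; ++b) {
        // Integer arithmetic spreads the remainder evenly over the buckets.
        const std::size_t begin = b * samples.size() / bucketCount;
        const std::size_t end = (b + 1) * samples.size() / bucketCount;
        const auto mm = std::minmax_element(
            samples.begin() + static_cast<std::ptrdiff_t>(begin),
            samples.begin() + static_cast<std::ptrdiff_t>(end));
        out.emplace_back(*mm.first, *mm.second);
    }
    return out;
}
```

To end up with 750 plotted points, one would use 375 buckets and draw the minimum and maximum of each bucket as two points. This reads the source buffer in place and copies only the reduced data.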

Concatenate data in an array in C ++

I'm working on software that processes audio in real time in C++ with Qt. I need to keep the resource requirements to a minimum.
With a temporary buffer of 40 ms and a device sampling frequency of Fs = 8000 Hz, a function called Data Processing () is entered every 320 samples.
The idea is to have a global buffer that stores the last 10 s recorded, i.e. 80000 samples.
On each iteration this buffer drops its first 320 samples and appends 320 new samples at the end. That way the buffer stays up to date and the user can watch a real-time plot of the recorded signal.
At first I thought of using QVector (Qt's equivalent of std::vector) for this, which reduces the process to a few lines of code:
int NUM_POINTS=320;
DatosTemporales.erase(DatosTemporales.begin(),DatosTemporales.begin()+NUM_POINTS);
DatosTemporales+= (DatosNuevos); // DatosNuevos has a size of NUM_POINTS
On each iteration this creates a vector of 80000 samples, in addition to freeing some positions, so it takes some processing time. An alternative I considered was to use a double* and a loop:
for(int i=0;i<80000;i++){
if(i<80000-NUM_POINTS){
aux=DatosTemporales[i];
DatosTemporales[i+NUM_POINTS]=aux;
}else{
DatosTemporales[i]=DatosNuevos[i-NUN_POINTS];
}
}
This fails. I think the best way is to use dynamic memory and implement the process with pointers. Could anyone give me an idea of how to implement it?

It sounds like what you are looking for is a circular buffer.
https://www.google.com/search?q=qcircularbuffer
https://qt.gitorious.org/qt/qtbase/merge_requests/60
And it looks like you only need the header file and you should be good to go.
A similar tool that is already part of Qt is documented here:
http://doc.qt.io/qt-5/qcontiguouscache.html#details
The advantage of containers like these is that they don't need dynamic memory for each update; they only need to move the head and tail pointers.
Hope that helps.
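For illustration, a circular buffer can be sketched in a few lines of standard C++. This is a minimal hand-written version, not the Qt classes linked above; the class name `RingBuffer` and its interface are hypothetical.

```cpp
#include <cstddef>
#include <vector>

// Fixed-capacity ring buffer that always holds the most recent samples.
class RingBuffer {
public:
    explicit RingBuffer(std::size_t capacity)
        : data_(capacity, 0.0), head_(0), size_(0) {}

    // Appends one sample; once full, the oldest sample is overwritten.
    void push(double sample) {
        if (data_.empty()) return;
        data_[(head_ + size_) % data_.size()] = sample;
        if (size_ < data_.size()) {
            ++size_;
        } else {
            head_ = (head_ + 1) % data_.size();  // drop the oldest sample
        }
    }

    // Returns the i-th stored sample, where i = 0 is the oldest one.
    double at(std::size_t i) const { return data_[(head_ + i) % data_.size()]; }

    std::size_t size() const { return size_; }

private:
    std::vector<double> data_;
    std::size_t head_;  // index of the oldest sample
    std::size_t size_;  // number of valid samples
};
```

With a capacity of 80000, pushing each block of 320 new samples costs 320 writes, and nothing is shifted or reallocated.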

How do I filter out out-of-hearing-range data from PCM samples using C++?

I have raw 16-bit, 48 kHz PCM data. I need to strip all data that is outside the range of human hearing.
For now I'm just summing all the samples and dividing by the sample count to calculate the peak sound level, but I need to reduce false positives.
The level is high all the time, while speech and other sounds that I can hear raise it only a little, so I need to implement some filtering. I am not familiar with sound processing at all, so currently I am not using any filtering because I do not understand how to create it. My current code looks like this:
for(size_t i = 0; i < buffer.size(); i++)
level += abs(buffer[i]);
level /= buffer.size();
How can I implement this kind of filtering using C++?

Use a band pass filter.
"A band-pass filter is a device that passes frequencies within a certain range and rejects (attenuates) frequencies outside that range."
This sounds like exactly the sort of filter you are looking for.
I had a quick google search and found this thread that discusses implementation in C++.
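As a rough illustration of what such a filter can look like in code, here is a minimal second-order (biquad) band-pass sketch. It uses the widely published "Audio EQ Cookbook" coefficient formulas for a band-pass with 0 dB peak gain; the class name `BandPass` and its parameters are hypothetical, and the center frequency and Q would need tuning for the signal in question.

```cpp
#include <cmath>

// Second-order band-pass filter (Audio EQ Cookbook form, 0 dB peak gain).
class BandPass {
public:
    BandPass(double sampleRate, double centerHz, double q) {
        const double pi = 3.14159265358979323846;
        const double w0 = 2.0 * pi * centerHz / sampleRate;
        const double alpha = std::sin(w0) / (2.0 * q);
        const double a0 = 1.0 + alpha;
        // Coefficients normalized by a0; b1 is zero for this filter type.
        b0_ = alpha / a0;
        b2_ = -alpha / a0;
        a1_ = -2.0 * std::cos(w0) / a0;
        a2_ = (1.0 - alpha) / a0;
    }

    // Filters one sample (direct form I).
    double process(double x) {
        const double y = b0_ * x + b2_ * x2_ - a1_ * y1_ - a2_ * y2_;
        x2_ = x1_;
        x1_ = x;
        y2_ = y1_;
        y1_ = y;
        return y;
    }

private:
    double b0_ = 0.0, b2_ = 0.0, a1_ = 0.0, a2_ = 0.0;
    double x1_ = 0.0, x2_ = 0.0, y1_ = 0.0, y2_ = 0.0;
};
```

Each 16-bit sample would be converted to floating point, passed through `process`, and the level computed from the filtered output. For a band as wide as the whole audible range, cascading a high-pass and a low-pass filter may be a better fit than a single band-pass section.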

It sounds like you want to do something (maybe start recording) if the sound level goes above a certain threshold. This is sometimes called a "gate". It also sounds like you are having trouble with false positives. This is sometimes handled with a "side-chain" applied to the gate.
The general principle of a gate is to create an envelope of your signal and then monitor the envelope to discover when it goes above a certain threshold. If it is above the threshold, your gate is "on"; if not, your gate is "off". If you treat your signal in some way before creating the envelope, to make it more or less sensitive to various parts of your signal or noise, that treatment is called a "side-chain".
You will have to discover the details on your own because there is too much for a Q&A website, but maybe this is enough of a start:
#include <cmath>
#include <vector>
std::vector<float> buffer; // defined and filled elsewhere
float HOLD = .9999f; // there are precise ways to compute this, but experimentation might work fine
float THRESH = .7f; // or whatever
float env = 0; // we initialize to 0, but in real code be sure to save this between runs
for (size_t i = 0; i < buffer.size(); i++) {
    // side-chain, if used, goes here
    float b = buffer[i];
    // create envelope:
    float tmp = std::fabs(b); // you could also do b * b
    env = env * HOLD + tmp * (1 - HOLD);
    // threshold detection
    if (env > THRESH) {
        // gate is "on"
    } else {
        // gate is "off"
    }
}
The side-chain might consist of filters like an eq. Here is a tutorial on designing audio eq: http://blog.bjornroche.com/2012/08/basic-audio-eqs.html
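One of the "precise ways" to choose HOLD is to derive it from a time constant. The sketch below uses the standard one-pole smoothing relation, hold = exp(-1 / (tau * sampleRate)); the function name `holdCoefficient` is hypothetical.

```cpp
#include <cmath>

// One-pole smoothing coefficient for a given time constant in seconds.
// After `timeConstantSeconds` of a constant input, the envelope has
// covered about 63% of the distance to that input.
double holdCoefficient(double timeConstantSeconds, double sampleRate) {
    return std::exp(-1.0 / (timeConstantSeconds * sampleRate));
}
```

By this relation, a HOLD of .9999 at 48 kHz corresponds to a time constant of roughly 0.2 seconds.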

Selecting nodes with probability proportional to trust

Does anyone know of an algorithm or data structure relating to selecting items, with a probability of them being selected proportional to some attached value? In other words: http://en.wikipedia.org/wiki/Sampling_%28statistics%29#Probability_proportional_to_size_sampling
The context here is a decentralized reputation system and the attached value is therefore the value of trust one user has in another. In this system all nodes either start as friends which are completely trusted or unknowns which are completely untrusted. This isn't useful by itself in a large P2P network because there will be many more nodes than you have friends and you need to know who to trust in the large group of users that aren't your direct friends, so I've implemented a dynamic trust system in which unknowns can gain trust via friend-of-a-friend relationships.
Every so often each user will select a fixed number (for the sake of speed and bandwidth) of target nodes to recalculate their trust based on how much another selected fixed number of intermediate nodes trust them. The probability of selecting a target node for recalculation will be inversely proportional to its current trust so that unknowns have a good chance of becoming better known. The intermediate nodes will be selected in the same way, except that the probability of selection of an intermediary is proportional to its current trust.
I've written up a simple solution myself but it is rather slow and I'd like to find a C++ library to handle this aspect for me. I have of course done my own search and I managed to find TRSL which I'm digging through right now. Since it seems like a fairly simple and perhaps common problem, I would expect there to be many more C++ libraries I could use for this, so I'm asking this question in the hope that someone here can shed some light on this.

This is what I'd do:
#include <cstdlib>
int select(double *weights, int n) {
    // This step is only necessary if the weights can be arbitrary
    // (we know total = 1.0 for probabilities)
    double total = 0;
    for (int i = 0; i < n; ++i) {
        total += weights[i];
    }
    // Cast RAND_MAX to avoid overflow
    double r = (double) rand() * total / ((double) RAND_MAX + 1);
    total = 0;
    for (int i = 0; i < n; ++i) {
        // Normally fires before the loop exits
        if (total <= r && total + weights[i] > r) {
            return i;
        }
        total += weights[i];
    }
    // Fallback in case floating-point rounding lets the loop finish
    return n - 1;
}
You can of course repeat the second loop as many times as you want, choosing a new r each time, to generate multiple samples.
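If C++11 is available, the standard library already provides this kind of weighted selection through `std::discrete_distribution`. The sketch below is a small wrapper around it; the function name `sampleByWeight` is hypothetical, and the weights are assumed to be non-negative with at least one positive value.

```cpp
#include <cstddef>
#include <random>
#include <vector>

// Draws `count` indices; index i is chosen with probability
// weights[i] / sum(weights). Indices may repeat.
std::vector<int> sampleByWeight(const std::vector<double>& weights,
                                std::size_t count, std::mt19937& rng) {
    std::discrete_distribution<int> dist(weights.begin(), weights.end());
    std::vector<int> picks;
    picks.reserve(count);
    for (std::size_t i = 0; i < count; ++i) {
        picks.push_back(dist(rng));
    }
    return picks;
}
```

For the intermediate nodes the weights would be the trust values themselves. For the target nodes the caller would pass weights that decrease with trust instead. The distribution is built once per call, so drawing many samples in a single call avoids repeating the setup work.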
