I've written the following algorithm designed to solve the Time Delay of Arrival problem through a "brute force" method. The problem is as follows: given the known locations of three receivers in a plane, and the propagating speed of some signal, determine the location of the signal source knowing only the times at which each receiver "saw" the signal.
The algorithm works by assuming the source lies within a 1000 x 1000 kilometre square area and then iterating (with 1 km "resolution") over every possible location, calculating the travel time to each receiver and determining which location most closely matches the known delay of arrival between each pair of receivers. So, for each location [x,y], I calculate the time to arrive at receivers 1, 2, and 3, then determine how close (time to arrive at 1) - (time to arrive at 2) is to the live data, and similarly for the combinations 1 - 3 and 2 - 3 (ignoring other possible combinations, for simplicity).
Here's the problem: it's highly unlikely each signal event is coming from the same direction. However, my code seems to suggest every event is at [0,0]. While it is technically possible that this is the case, it is far more likely that there is something wrong with my code, so for the purposes of this question let's assume that to be the case. Perhaps I've made some obvious mistake here?
#define c 299792
#define statn1x 3.00000
#define statn1y 3.60000
#define statn2x 2.10000
#define statn2y 2.10000
#define statn3x 0.96000
#define statn3y 3.60000
void findProb(double alpha, double gamma, double beta){
int x,y;
double thld = DBL_MAX;
for(int i = 0; i < 1000; ++i){
for(int j = 0; j < 1000; ++j){
double alphaEst = sqrt(pow(i-statn1x,2) + pow(j-statn1y,2)) / c;
double betaEst = sqrt(pow(i-statn1x,2) + pow(j-statn1y,2)) / c;
double gammaEst = sqrt(pow(i-statn1x,2) + pow(j-statn1y,2)) / c;
std::cout << i << "," << j << "\n";
if( std::max(std::max(fabs((alphaEst-betaEst) - alpha), fabs((alphaEst-gammaEst)-beta) ), fabs((betaEst-gammaEst)-gamma)) -
std::min(std::min(fabs((alphaEst-betaEst) - alpha), fabs((alphaEst-gammaEst)-beta) ), fabs((betaEst-gammaEst)-gamma)) < thld){
thld = std::max(std::max(fabs((alphaEst-betaEst) - alpha), fabs((alphaEst-gammaEst)-beta) ), fabs((betaEst-gammaEst)-gamma)) -
std::min(std::min(fabs((alphaEst-betaEst) - alpha), fabs((alphaEst-gammaEst)-beta) ), fabs((betaEst-gammaEst)-gamma));
x = i;
y = j;
}
}
}
//std::cout << x << "," << y << "\n";
}
void localize(){
ROOT::RDataFrame tdoa("D","./coincidences.root");
vector<double> alpha, beta, gamma;
tdoa.Foreach([&](double delay){ alpha.push_back(delay); },{"x"});
tdoa.Foreach([&](double delay){ beta.push_back(delay); },{"y"});
tdoa.Foreach([&](double delay){ gamma.push_back(delay); },{"z"});
int iter = std::min(std::min(alpha.size(), beta.size()), gamma.size());
for(int i = 0; i < iter; ++i){
findProb(alpha[i], beta[i], gamma[i]);
}
}
Note the terminology used in this project:
alpha refers to time to arrive (ttoa) at station 1 - ttoa station 2.
gamma refers to ttoa station 1 - ttoa station 3
beta refers to ttoa station 2 - ttoa station 3
alphaEst refers to the calculated estimated travel time from some location [x,y] to station 1
betaEst refers to the calculated estimated travel time from some [x,y] to station 2
... and likewise for gammaEst.
Note that I'm also working now to produce a smaller, reproducible example (if possible). I'll add that as soon as I can.
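alphaEst, betaEst and gammaEst are all assigned the same value, because all three lines use the coordinates of station 1.
betaEst = sqrt(pow(i-statn1x,2) + pow(j-statn1y,2)) / c
should probably be
betaEst = sqrt(pow(i-statn2x,2) + pow(j-statn2y,2)) / c
and likewise gammaEst should use statn3x and statn3y.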
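To make the fix concrete, here is a minimal sketch of the per-location score with each travel time computed from its own station. The names Station, travelTime and worstResidual are hypothetical (they are not in the question), and the score is simply the largest mismatch between predicted and measured delay, which is an assumption of this sketch rather than the max-minus-min test used in the question.

```cpp
#include <algorithm>
#include <cmath>

// Hypothetical helper type for illustration; not part of the original code.
struct Station { double x, y; };

// Travel time from the grid point (px, py) to a station at the given speed.
double travelTime(double px, double py, const Station& s, double speed) {
    return std::hypot(px - s.x, py - s.y) / speed;
}

// Largest absolute mismatch between predicted and measured delays.
// d12, d13 and d23 are the measured delays (station 1 - 2, 1 - 3, 2 - 3).
double worstResidual(double px, double py,
                     const Station& s1, const Station& s2, const Station& s3,
                     double speed, double d12, double d13, double d23) {
    const double t1 = travelTime(px, py, s1, speed);
    const double t2 = travelTime(px, py, s2, speed);  // station 2, not station 1
    const double t3 = travelTime(px, py, s3, speed);  // station 3, not station 1
    return std::max({std::fabs((t1 - t2) - d12),
                     std::fabs((t1 - t3) - d13),
                     std::fabs((t2 - t3) - d23)});
}
```

The grid search would then keep the [x,y] with the smallest worstResidual. Passing the delays in the order 1 - 2, 1 - 3, 2 - 3 also sidesteps any mix-up between the beta and gamma arguments.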
Question:
Fox Ciel is writing an AI for the game Starcraft and she needs your help.
In Starcraft, one of the available units is a mutalisk. Mutalisks are very useful for harassing Terran bases. Fox Ciel has one mutalisk. The enemy base contains one or more Space Construction Vehicles (SCVs). Each SCV has some amount of hit points.
When the mutalisk attacks, it can target up to three different SCVs.
The first targeted SCV will lose 9 hit points.
The second targeted SCV (if any) will lose 3 hit points.
The third targeted SCV (if any) will lose 1 hit point.
If the hit points of a SCV drop to 0 or lower, the SCV is destroyed. Note that you may not target the same SCV twice in the same attack.
You are given a int[] HP containing the current hit points of your enemy's SCVs. Return the smallest number of attacks in which you can destroy all these SCVs.
Constraints-
- x will contain between 1 and 3 elements, inclusive.
- Each element in x will be between 1 and 60, inclusive.
And the solution is:
int minimalAttacks(vector<int> x)
{
int dist[61][61][61];
memset(dist, -1, sizeof(dist));
dist[0][0][0] = 0;
for (int total = 1; total <= 180; total++) {
for (int i = 0; i <= 60 && i <= total; i++) {
for (int j = max(0, total - i - 60); j <= 60 && i + j <= total; j++) {
// j >= max(0, total - i - 60) ensures that k <= 60
int k = total - (i + j);
int & res = dist[i][j][k];
res = 1000000;
// one way to avoid doing repetitive work in enumerating
// all options is to use c++'s next_permutation,
// we first create a vector:
vector<int> curr = {i,j,k};
sort(curr.begin(), curr.end()); //needs to be sorted
// which will be permuted
do {
int ni = max(0, curr[0] - 9);
int nj = max(0, curr[1] - 3);
int nk = max(0, curr[2] - 1);
res = std::min(res, 1 + dist[ni][nj][nk] );
} while (next_permutation(curr.begin(), curr.end()) );
}
}
}
// get the case's respective hitpoints:
while (x.size() < 3) {
x.push_back(0); // add zeros for missing SCVs
}
int a = x[0], b = x[1], c = x[2];
return dist[a][b][c];
}
As far as I understand, this solution first calculates the best outcome for every possible state and then simply looks up the queried state and returns the result. But I don't understand the way this code is written. I can't see anywhere that the value of dist[i][j][k] is modified. By default it's -1. So how come I get a different value when I query any dist[i][j][k]?
Can someone explain the code to me, please?
Thank you!
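One thing worth looking at is the line int & res = dist[i][j][k]; in the solution. Because res is declared as a reference, every later assignment to res writes into dist[i][j][k] itself. Here is a tiny sketch of the same pattern on a one-dimensional array (the function name and the values are made up for illustration):

```cpp
// Hypothetical miniature of the pattern `int & res = dist[i][j][k];`.
// Writing through the reference writes into the table itself.
int writeThroughReference() {
    int table[3] = {-1, -1, -1};   // same "-1 everywhere" start as memset(dist, -1, ...)
    int& res = table[1];           // res is another name for table[1], not a copy
    res = 1000000;                 // overwrites table[1]
    res = (7 < res) ? 7 : res;     // the "take the minimum" step, still table[1]
    return table[1];               // no longer -1
}
```

So the table is filled in through res, even though no statement of the form dist[i][j][k] = ... appears in the loop.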
So I have an array [n][m], and I need to code in C++ the Euclidean distance between each row and every other row of the array and store it in a new distance array [n][n], in which every cell's value is the distance between the two intersecting rows.
distance-array:
      r0   r1   ...   rn
r0    0
r1         0
.               0
.               0
rn                    0
The Euclidean distance between two rows (or two records) is computed as follows.
Assume we have these two records:
r0: 1 8 7
r1: 2 5 3
r2
.
.
rn
Euclidean distance between r0 and r1 = sqrt((1-2)^2+(8-5)^2+(7-3)^2)
To code this I used 4 loops (which I think is too many), but I couldn't get it right. Can someone help me code this without using a 3-D array?
This is my code:
int norarr1[row][column] = { 1,1,1,2,2,2,3,3,3 };
int i = 0; int j = 0; int k = 0; int l = 0;
for (i = 0; i < column; i++){
for(j = 0; j < column; j++){
sumd = 0;
for (k = 0; k < row; k++) {
for (l = 0; l < row; l++) {
dist = sqrt((norarr1[i][k] - norarr1[j][l]) ^ 2);
sumd = sumd + dist;
cout << "sumd =" << sumd << " ";
}
cout << endl;
}
disarr[j][i] = sumd;
disarr[i][j] = sumd;
cout << disarr[i][j];
}
cout << endl;
}
There are several problems with your code. For now, let's ignore the for loops. We'll get to that later.
The first thing is that ^ is the bitwise exclusive or (XOR) operator. It does not do exponentiation like in some other languages. Instead, you need to use std::pow().
Second, you are summing square roots, which is not the correct way to calculate Euclidean distance. Instead, you need to calculate a sum and then take the square root.
Now let's think about the for loops. Assume that you already know which two rows you want to calculate the distance between. Call these r1 and r2. Now you just need to pair one coordinate from r1 with one coordinate from r2. Note that these coordinates will always be in the same column. This means that you only need one loop to calculate the squares of the differences of each pair of coordinates. Then you sum these squares. Finally after this single loop you take the square root.
With that out of the way, we need to iterate over the rows to choose each r1 and r2. Okay, this will take two loops since we want each of these to take on the value of each row.
In total, we will need three for loops. You can make this easier to understand by designing your code well. For example, you can create a class or struct that holds each row. If you know that every row is only three dimensions, then create a point or vector3 class. Now you can write a function which calculates the distance between two points. Finally, store the list of points as a 1D array. In fact, breaking up the data and calculation in this way makes the previous discussion about calculating the distance even easier to understand.
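The three loops described above could look roughly like this. It is only a sketch: rowDistance and pairwiseDistances are names I made up, and I use std::vector of double instead of the fixed-size int array from the question.

```cpp
#include <cmath>
#include <cstddef>
#include <vector>

using Matrix = std::vector<std::vector<double>>;

// Euclidean distance between two rows of equal length (one loop).
double rowDistance(const std::vector<double>& a, const std::vector<double>& b) {
    double sumSq = 0.0;
    for (std::size_t k = 0; k < a.size(); ++k) {
        const double diff = a[k] - b[k];
        sumSq += diff * diff;      // sum the squared differences first
    }
    return std::sqrt(sumSq);       // one square root at the end
}

// n x n table where cell [i][j] holds the distance between rows i and j (two loops).
Matrix pairwiseDistances(const Matrix& rows) {
    const std::size_t n = rows.size();
    Matrix dist(n, std::vector<double>(n, 0.0));   // diagonal stays 0
    for (std::size_t i = 0; i < n; ++i) {
        for (std::size_t j = i + 1; j < n; ++j) {
            dist[i][j] = dist[j][i] = rowDistance(rows[i], rows[j]);
        }
    }
    return dist;
}
```

For the two records from the question, {1, 8, 7} and {2, 5, 3}, the cell [0][1] holds sqrt(1 + 9 + 16) = sqrt(26). The inner loop starts at i + 1 because the table is symmetric.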
I would like to create my own nonlinear filter in OpenCV using C++, and if I see it correctly, I can use the FilterEngine class to do so. Unfortunately, I'm not really able to follow the documentation of this class. (Link: http://docs.opencv.org/2.4/modules/imgproc/doc/filtering.html#filterengine).
Could someone be so kind to explain the class to me in a little bit more detail?
I'm grateful for every input and every example you can provide me with :-)
.
My specific needs:
1) I would like learn how to create my own nonlinear filters in general.
2) I would like to apply a rank-transform filter to my images:
Meaning: I have a kernel/region and I would like to flag every pixel inside that region with a one if the intensity-value of that (neighbourhood-) pixel is lower than the intensity of the center-pixel. Next, I want to use a simple convolution to save the sum of the transformed region, and store the value at the center-pixel. Let's look at a simple example:
100 120 200   rank-trans.   1 0 0   convolution
110 120 220       -->       1 0 0       -->       2
180 200 200                 0 0 0
P.S.: I know that I can achieve the result of 2) by combining 255 threshold operations with 255 box-filter operations, and then looping over every pixel and selecting the correct value. However, that seems quite inefficient to me ...
.
Code snippet [Edit]:
As I still struggle to understand the FilterEngine(), I started to write my own function for the use case described above. I would also be happy if you could comment on it to improve its efficiency, as it is quite slow at the moment (~2 sec. for a 1080x1920 image on one CPU core).
void rankTransform(Mat& out, Mat in, int kernal_size, int borderType) {
// Issue warning if necessary:
if (kernal_size >= 17) {
std::cout << "Warning, need to change Mat-type. Unsigned short only supports kernels up-to the size of 15x15" << std::endl << std::endl;
};
// First: Get borders around the image:
int border_size = (kernal_size - 1) / 2;
Mat in_incl_border = Mat(1080 + 2 * border_size, 1920 + 2 * border_size, in.depth());
copyMakeBorder(in, in_incl_border, border_size, border_size, border_size, border_size, borderType);
// Second: Loop through the image, conduct a rank transform and
// then sum over the kernel-size:
int start_pixel = 0 + (border_size + 1);
int end_pixel_width = 1920 + border_size;
int end_pixel_height = 1080 + border_size;
int i, j;
int x_1, x_2, y_1;
for (i = start_pixel; i < end_pixel_height; ++i) {
x_1 = i - border_size;
x_2 = i + border_size + 1;
for (j = start_pixel; j < end_pixel_width; ++j) {
y_1 = j - border_size;
out.at<unsigned short>(x_1-1, y_1-1) = static_cast<unsigned short>( (sum( in_incl_border(Range(x_1, x_2), Range(y_1, j + border_size + 1)) < in_incl_border.at<unsigned short>(i, j) )[0])/255 );
};
};
}
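For comparison, here is a minimal sketch of the same rank transform written as plain loops over a row-major 8-bit buffer, without OpenCV. The function name rankTransformPlain is made up, and neighbours outside the image are simply skipped, which is an assumption of this sketch (the function above pads with copyMakeBorder instead). Counting directly may avoid the temporary comparison image that is built for every pixel, but I have not measured it.

```cpp
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical plain-C++ rank transform on a row-major 8-bit image.
// For every pixel, counts the neighbours in the kernel that are darker than it.
std::vector<std::uint16_t> rankTransformPlain(const std::vector<std::uint8_t>& img,
                                              int width, int height, int kernelSize) {
    const int r = (kernelSize - 1) / 2;
    std::vector<std::uint16_t> out(img.size(), 0);
    for (int y = 0; y < height; ++y) {
        for (int x = 0; x < width; ++x) {
            const std::uint8_t centre = img[static_cast<std::size_t>(y) * width + x];
            std::uint16_t count = 0;
            for (int dy = -r; dy <= r; ++dy) {
                for (int dx = -r; dx <= r; ++dx) {
                    const int ny = y + dy, nx = x + dx;
                    // Assumption: neighbours outside the image are ignored.
                    if (ny < 0 || ny >= height || nx < 0 || nx >= width) continue;
                    if (img[static_cast<std::size_t>(ny) * width + nx] < centre) ++count;
                }
            }
            out[static_cast<std::size_t>(y) * width + x] = count;
        }
    }
    return out;
}
```

With the 3x3 example from above and kernelSize = 3, the centre pixel (120) gets the value 2, because only 100 and 110 are lower.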
I am developing a small trading robot as an exercise. He receives stock prices day after day (represented as iterations).
Here's what my Trade class looks like:
class Trade
{
private:
int capital_;
int days_; // Total number of days of available stock prices
int daysInTrading_; // Increments as days go by.
std::list<int> stockPrices_; // Contains stock prices day after day.
int currentStock_; // Current stock we are dealing with.
int lastStock_; // Last stock dealt with
int trend_; // Either {-1; 0; 1} depending on the trend.
int numOfStocks_; // Number of stocks in our possession
int EMA_; // Exponential Moving Average
int lastEMA_; // Last EMA
public:
// functions
};
As you can see from my last two attributes, I wish to implement an Exponential Moving Average as part of a Trend Following Algorithm.
But I think I didn't quite understand how to implement it; here's my calcEMA function that simply calculates the EMA:
int Trade::calcEMA()
{
return ((this->currentStock_ - this->lastEMA_
* (2/(this->daysInTrading_ + 1)))
+ this->lastEMA_);
}
To make sure my EMA makes sense, I tested it with stock values (passed in a file) like these:
1000, 1100, 1200, 1300, 1400, 1500, 1400, 1300, 1200, 1100, 1000
And well... it does not!
Where did I go wrong in the operation?
Additionally, what value should I give lastEMA the first time I call calcEMA?
I believe that you are missing a parenthesis in the "calcEMA" function. How about breaking the expression up into smaller expressions, with temporary variables to hold intermediate results, like this?
int Trade::calcEMA()
{
auto mult = 2.0 / (timePeriod_ + 1); // 2.0 avoids integer division, which would give 0
auto rslt = (currentStock_ - lastEMA_) * mult + lastEMA_;
return rslt;
}
Also, as user PaulMcKenzie pointed out in the comment on your question, you are using integers for a floating-point calculation. You may consider using float or double to avoid truncation.
Here are my suggestions:
An EMA like yours is defined for a time period. While daysInTrading is less than or equal to timePeriod, lastEMA should be set to a normal average.
Once daysInTrading is greater than your timePeriod, you can start calling your "calcEMA" function with the initialized lastEMA.
Please remember to update lastEMA after each call to the "calcEMA" function.
Here is my code for you:
#include <vector>
#include <list>
#include <iostream>
// calculate a moving average
double calcMA (double previousAverage,
unsigned int previousNumDays,
double newStock) {
auto rslt = previousNumDays * previousAverage + newStock;
return rslt / (previousNumDays + 1.0);
}
// calculate an exponential moving average
double calcEMA (double previousAverage,
int timePeriod,
double newStock) {
auto mult = 2.0 / (timePeriod + 1.0);
auto rslt = (newStock - previousAverage) * mult + previousAverage;
return rslt;
}
class Trade {
unsigned int timePeriod_ = 5;
double lastMA_ = 0.0;
std::list<double> stockPrices_;
public:
void addStock (double newStock) {
stockPrices_.push_back(newStock);
auto num_days = stockPrices_.size();
if (num_days <= timePeriod_)
lastMA_ = calcMA(lastMA_, num_days - 1, newStock);
else
lastMA_ = calcEMA(lastMA_, timePeriod_, newStock); // the EMA uses the fixed time period, not the day count
}
double getAverage() const { return lastMA_; }
};
// ----- ----- ----- ----- ----- ----- ----- ----- ----- ----- -----
int main() {
std::vector<double> stocks =
{1000, 1100, 1200, 1300, 1400, 1500,
1400, 1300, 1200, 1100, 1000};
Trade trade;
for (auto stock : stocks)
trade.addStock(stock);
std::cout << "Average: " << trade.getAverage() << std::endl;
return 0;
}
The operation is wrong, as you noticed.
Disclaimer: I got this algorithm from Wikipedia, and as such it might not be accurate. Here (page 3) might be a better one, but I can't judge; I have never used these algorithms, so I have no idea what I'm talking about :)
c(EMA) = y(EMA) + a * (c(price) - y(EMA))
c(EMA) is current EMA
y(EMA) is previous EMA
a is some "random" value between 0 and 1
c(price) is current price
But you did almost the same thing:
c(EMA) = (c(price) - y(EMA) * b) + y(EMA)
I don't know why you did 2 / (daysInTrading_ + 1), but this will not always be a value between 0 and 1 (actually, it will be 0 most of the time, because those are all integers).
You put a parenthesis at the wrong place (after b, and not after y(EMA)).
So the operation will now look like this:
lastEMA_ + 0.5 * (currentStock_ - lastEMA_)
For the first lastEMA_, according to Wikipedia:
S1 is undefined. S1 may be initialized in a number of different ways, most commonly by setting S1 to Y1 [first element in the list], though other techniques exist, such as setting S1 to an average of the first 4 or 5 observations.
The importance of the S1 initialisations effect on the resultant moving average depends on α; smaller α values make the choice of S1 relatively more important than larger α values, since a higher α discounts older observations faster.
There are generally two accepted forms of EMA.
The traditional:
m = 2/(1+n) // where n >= 1
EMA = m * currentPrice + (1-m) * previousEMA
or the Wilder:
m = 1/n // where n >= 1
EMA Wilder = m * currentPrice + (1-m) * previousEMA
I've profiled my model and it seems that this kernel accounts for about 2/3 of my total runtime. I was looking for suggestions to optimize it. The code is as follows.
__global__ void calcFlux(double* concs, double* fluxes, double* dt)
{
int idx = blockIdx.x * blockDim.x + threadIdx.x;
fluxes[idx]=knowles_flux(idx, concs);
//fluxes[idx]=flux(idx, concs);
}
__device__ double knowles_flux(int r, double *conc)
{
double frag_term = 0;
double flux = 0;
if (r == ((maxlength)-1))
{
//Calculation type : "Max"
flux = -km*(r)*conc[r]+2*(ka)*conc[r-1]*conc[0];
}
else if (r > ((nc)-1))
{
//Calculation type : "F"
//arrSum3(conc, &frag_term, r+1, maxlength-1);
for (int s = r+1; s < (maxlength); s++)
{
frag_term += conc[s];
}
flux = -(km)*(r)*conc[r] + 2*(km)*frag_term - 2*(ka)*conc[r]*conc[0] + 2*(ka)*conc[r-1]*conc[0];
}
else if (r == ((nc)-1))
{
//Calculation type : "N"
//arrSum3(conc, &frag_term, r+1, maxlength-1);
for (int s = r+1; s < (maxlength); s++)
{
frag_term += conc[s];
}
flux = (kn)*pow(conc[0],(nc)) + 2*(km)*frag_term - 2*(ka)*conc[r]*conc[0];
}
else if (r < ((nc)-1))
{
//Calculation type : "O"
flux = 0;
}
return flux;
}
Just to give you an idea of why the for loop is an issue, this kernel is launched on an array of about maxlength = 9000 elements. For our purposes now, nc is in the range of 2-6. Here's an illustration of how this kernel processes the incoming array (conc). For this array, five different types of calculations need to be applied to different groups of elements.
Array element : 0  1  2  3  4  5  6  7  8  9  ...  8955  8956  8957  8958  8959  8960
Type of calc  : M  O  O  O  O  O  N  F  F  F  ...  F     F     F     F     F     Max
The potential problems I've been trying to deal with right now are branch divergence from the quadruple if-else and the for loop.
My idea for dealing with the branch divergence is to break this kernel down into four separate device functions or kernels that treat each region separately and all launch at the same time. I'm not sure this is significantly better than just letting the branch divergence take place, which if I'm not mistaken, would cause the four calculation types to be run in serial.
To deal with the for loop, you'll notice that there's a commented out arrSum3 function, which I wrote based off my previously (and probably poorly) written parallel reduction kernel. Using it in place of the for loop drastically increased my runtime. I feel like there's a clever way to accomplish what I'm trying to do with the for loop, but I'm just not that smart and my advisor is tired of me "wasting time" thinking about it.
Appreciate any help.
EDIT
Full code is located here : https://stackoverflow.com/q/21170233/1218689
Assuming sgn() and abs() are not derived from "if"s and "else"s
__device__ double knowles_flux(int r, double *conc)
{
double frag_term = 0;
double flux = 0;
//Calculation type : "Max"
//no divergence
//should prefer 20-30 extra cycles instead of a branching.
//may not be good for CPU
double fluxA = (1-abs(sgn(r-(maxlength-1)))) * (-km*(r)*conc[r]+2*(ka)*conc[r-1]*conc[0]);
//is zero if r and maxlength-1 are not equal
//always compute this in shared memory so work will be equal for all cores, no divergence
// you should divide kernel into several pieces to do a reduction
// but if you dont want that, then you can try :
for (int s = 0;s<someLimit ; s++) // all count for same number of cycles so no divergence
{
frag_term += conc[s] * ( abs(sgn( s-maxlength ))*sgn(1- sgn( s-maxlength )) )* ( sgn(1+sgn(s-(r+1))) );
}
//but you can make easier of this using "add and assign" operation
// in local memory (was it __shared in CUDA?)
// global conc[] to local concL[] memory(using all cores)(100 cycles)
// for(others from zero to upper_limit)
// if(localID==0)
// {
// frag_termL[0]+=concL[s] // local to local (10 cycles/assign.)
// frag_termL[0+others]=frag_termL[0]; // local to local (10 cycles/assign.)
// } -----> uses nearly same number of cycles but uses much less energy
//using single core (2000 instr. with single core vs 1000 instr. with 2k cores)
// in local memory, then copy it to private registers accordingly using all cores
//Calculation type : "F"
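double fluxB = ( abs(sgn(r-(nc-1)))*sgn(1+sgn(r-(nc-1))) )*abs(sgn(r-(maxlength-1)))*(-(km)*(r)*conc[r] + 2*(km)*frag_term - 2*(ka)*conc[r]*conc[0] + 2*(ka)*conc[r-1]*conc[0]);
// is zero if r is not greater than (nc-1), and also zero at r == maxlength-1, which the "Max" term already covers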
//Calculation type : "N"
double fluxC = ( 1-abs(sgn(r-(nc-1))) )*((kn)*pow(conc[0],(nc)) + 2*(km)*frag_term - 2*(ka)*conc[r]*conc[0]);
//zero if r and nc-1 are not equal
flux=fluxA+fluxB+fluxC; //only one of these can be different than zero
flux=flux*sgn(1+sgn(r-(nc-1)));
//zero if r < (nc-1)
return flux;
}
Okay, let me expand on this a bit:
if(a>b) x+=y;
can be rewritten as follows.
If a-b is negative, sgn(a-b) is -1;
adding 1 to that -1 gives zero ==> this handles the lower part of the comparison (a<b):
x+= y*(sgn(a-b) +1) adds 0 if a<b (not a>b), so x is unchanged
If a-b is zero, sgn(a-b) is zero,
so we should multiply the expression above by sgn(a-b) too!
x+= y*(sgn(a-b) +1)*sgn(a-b)
means
x+= y*( 0 + 1) * 0 = 0 so a==b is handled too!
Let's check what happens if a>b:
x+= y*(sgn(a-b) +1)*sgn(a-b)
x+= y*(1 +1)*1 ==> y*2 is not acceptable, so it needs another sgn on the outside
x+= y* sgn((sgn(a-b)+1)*sgn(a-b))
x+= y* sgn((1+1)*1)
x+= y* sgn(2)
x+= y only when a is greater than b
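The derivation above can be checked with a few lines of ordinary C++. This is only a sketch: sgn here is my own helper written with comparisons instead of if/else, greaterMask and addIfGreater are made-up names, and whether a given compiler actually emits branch-free code for it is not guaranteed.

```cpp
// Sign of v as -1, 0 or 1, written without if/else.
int sgn(int v) { return (v > 0) - (v < 0); }

// 1 when a > b, otherwise 0, built only from sgn():
// sgn((sgn(a-b) + 1) * sgn(a-b))
int greaterMask(int a, int b) {
    const int s = sgn(a - b);
    return sgn((s + 1) * s);
}

// Same result as: if (a > b) x += y;
double addIfGreater(double x, double y, int a, int b) {
    return x + y * greaterMask(a, b);
}
```

greaterMask(3, 5) and greaterMask(4, 4) are both 0, while greaterMask(5, 3) is 1, so y is added only when a is greater than b.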
When the expression
abs(sgn(r-(nc-1)))
appears too many times, you can compute it once and re-use it:
tmp=abs(sgn(r-(nc-1)))
..... *tmp*(tmp-1) ....
...... +tmp*zxc[s] .....
...... ......
to decrease the total cycle count even more! Register access can be on the order of terabytes/s, so it shouldn't be a problem. The same can be done for global accesses:
tmpGlobal= conc[r];
...... tmpGlobal * tmp .....
.... tmpGlobal +x -y ....
All of this runs in private registers at terabytes per second.
Warning: reading from conc[-1] shouldn't cause any faults as long as the result is multiplied by zero and the real address of conc[0] is not already address zero. But writing is hazardous.
If you need to avoid reading conc[-1] anyway, you can wrap the index in an absolute value too! See:
tmp=conc[i-1] becomes tmp=conc[abs((i-1))], which will always read from a non-negative index; the value will be multiplied by zero later anyway. This is the lower-bound protection.
You can apply an upper-bound protection too. It just adds even more cycles.
Think about using vector-shuffle operations if working on pure scalar values is not fast enough when accessing conc[r-1] and conc[r+1]. A shuffle between a vector's elements is faster than copying through local memory to another core/thread.
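On the frag_term loop itself: every thread sums conc[s] for s from r+1 to maxlength-1, which is a suffix sum of conc. One option, shown here as a host-side C++ sketch rather than a CUDA kernel, is to compute all the suffix sums in a single pass before the flux calculation, so that each thread only has to look up one value. The function name suffixSumsAfter is made up, and on the device a parallel scan would play the same role.

```cpp
#include <cstddef>
#include <vector>

// suffix[r] = conc[r+1] + conc[r+2] + ... + conc[n-1]; suffix[n-1] is 0.
std::vector<double> suffixSumsAfter(const std::vector<double>& conc) {
    const std::size_t n = conc.size();
    std::vector<double> suffix(n, 0.0);
    double running = 0.0;
    for (std::size_t r = n; r-- > 0;) {   // walk from the last element down to 0
        suffix[r] = running;              // sum of everything strictly after r
        running += conc[r];
    }
    return suffix;
}
```

With this, frag_term for element r is just suffix[r]. Note that the additions happen in a different order than in the per-thread loop, so the floating-point result may differ in the last bits.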