SQLite (C API) and SELECT queries on cyclic/symmetric values with user-defined functions - C++

I'm using SQLite with C++ and have two similar problems:
1) I need to select 4 entries to make an interpolation.
For example, my table could look like this:
angle (double) | color (double)
0              | 0.1
30             | 0.5
60             | 0.9
90             | 1.5
...            | ...
300            | 2.9
330            | 3.5
If I want to interpolate the value corresponding to 95°, I will use the entries 60°, 90°, 120° and 150°.
To get those entries, my query will be SELECT color FROM mytable WHERE angle BETWEEN 60 AND 150, no big deal.
Now if I want 335°, I will need 300°, 330°, 360° (= 0°) and 390° (= 30°).
My query will then be SELECT color FROM mytable WHERE angle BETWEEN 300 AND 330 OR angle BETWEEN 0 AND 30.
I can't use SELECT color FROM mytable WHERE angle BETWEEN 300 AND 390 because this will only return 2 colors.
Can I use the C API and user-defined functions to give my queries some kind of modulo semantics?
It would be nice if I could use a user-defined function so that [...] BETWEEN 300 AND 390 returns the rows for 300, 330, 0 and 30.
2) Another table looks like this:
speed (double) | color (double) | var (double)
0              | 0.1            | 0
10             | 0.5            | 1
20             | 0.9            | 2
30             | 1.5            | 3
...            | ...            | ...
In reality, due to symmetry, color(speed) = color(-speed) but var(-speed) = myfunc(var(speed)).
I would like to run queries such as SELECT * FROM mytable WHERE speed BETWEEN -20 AND 10, perform a few operations through the API on the "virtual" rows with a negative speed, and return them as a regular result.
For example, I would like the result of that query to look like this:
speed (double) | color (double) | var (double)
-20            | 0.9            | myfunc(2)
-10            | 0.5            | myfunc(1)
0              | 0.1            | 0
10             | 0.5            | 1
Is that possible?
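To make it concrete: with myfunc registered as a user-defined function, the result I want is what this hand-written UNION ALL would return (just a sketch of the semantics I'm after, which I would rather not have to build by hand for every query):

SELECT -speed AS speed, color, myfunc(var) AS var
FROM mytable
WHERE speed > 0 AND -speed BETWEEN -20 AND 10
UNION ALL
SELECT speed, color, var
FROM mytable
WHERE speed BETWEEN -20 AND 10
ORDER BY speed;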
Thanks for your help :)

I would suggest using a query with two intervals:
SELECT * FROM mytable WHERE (angle >= MIN(?1,?2) AND angle <= MAX(?1,?2)) OR (MAX(?1,?2) > 360 AND angle >= 0 AND angle <= MAX(?1,?2) % 360);
This example works fine if ?1 and ?2 are positive.
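As for the C API part of question 1: yes, you can register a scalar user-defined function and call it in the WHERE clause. Below is a minimal, untested sketch (norm360 and select_window are names I made up); the test norm360(angle - ?1) <= ?2 - ?1 keeps exactly the angles lying in the window [?1, ?2] with wrap-around at 360:

#include <cmath>
#include <cstdio>
#include <sqlite3.h>

// Hypothetical UDF: reduce any angle into [0, 360).
static void norm360(sqlite3_context* ctx, int /*argc*/, sqlite3_value** argv)
{
    double a = std::fmod(sqlite3_value_double(argv[0]), 360.0);
    if (a < 0.0) a += 360.0;
    sqlite3_result_double(ctx, a);
}

// Select all rows whose angle lies in [lo, hi], wrapping at 360.
// Assumes lo <= hi and hi - lo <= 360 (e.g. lo = 300, hi = 390).
void select_window(sqlite3* db, double lo, double hi)
{
    // SQLITE_DETERMINISTIC tells SQLite the function is pure.
    sqlite3_create_function(db, "norm360", 1,
                            SQLITE_UTF8 | SQLITE_DETERMINISTIC,
                            nullptr, norm360, nullptr, nullptr);

    const char* sql = "SELECT angle, color FROM mytable "
                      "WHERE norm360(angle - ?1) <= ?2 - ?1;";
    sqlite3_stmt* stmt = nullptr;
    sqlite3_prepare_v2(db, sql, -1, &stmt, nullptr);
    sqlite3_bind_double(stmt, 1, lo);
    sqlite3_bind_double(stmt, 2, hi);
    while (sqlite3_step(stmt) == SQLITE_ROW)
        std::printf("%g -> %g\n", sqlite3_column_double(stmt, 0),
                    sqlite3_column_double(stmt, 1));
    sqlite3_finalize(stmt);
}

The trade-off is that applying a function to the angle column prevents SQLite from using an index on it, which should be harmless for a small lookup table.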

Related

PowerBI Matrix Average instead of Subtotal and Conditional Formatting According to That Average

Hello, I am new to Power BI and it is still hard for me to work with.
I have a matrix like this:
DATE     Sales   Refund
26 Agu   45      5
  p1     10      3
  p2     15      2
  p3     20      0
27 Agu   60      1
  p1     15      1
  p2     20      0
  p3     25      0
In the date rows I have subtotals, as the matrix normally shows. However, I want to show the average of that day there instead, and then apply conditional formatting based on it: if a Sales cell is below the average I will mark it with a red point, and for Refunds I will mark the values above the average.
Is there a way to do that? I searched for a while but could not find anything.
The output I want is like this (the star marks a red point):
DATE     Sales   Refund
26 Agu   15      1.66
  p1     10*     3*
  p2     15      2*
  p3     20      0
27 Agu   20      0.33
  p1     15*     1*
  p2     20      0
  p3     25      0
Thanks.
You can colour the background. For example, create this measure:
AVG =
IF(SELECTEDVALUE(RefundTab[Sale]) < CALCULATE(AVERAGE(RefundTab[Sale]), ALL(RefundTab[Code])), 0, 1)
Then apply it from the menu -> Conditional formatting -> Background color.
OR
you can create a measure that returns a string instead of a number, appending a Unicode character:
SumSaleIf =
VAR _sale = SUM(RefundTab[Sale])
VAR _IfAVG = CALCULATE(AVERAGE(RefundTab[Sale]), ALL(RefundTab[Code]))
VAR _check = IF(_sale < _IfAVG, _sale & UNICHAR(128315), _sale & "")
RETURN _check

How to calculate cumulative product in SAS?

I need to create a variable that is the product of all prior values of cond_prob, including the value in the current observation.
data temp;
input time cond_prob;
datalines;
1 1
2 0.2
3 0.3
4 0.4
5 0.6
;
run;
Final data should be:
1 1
2 0.2 (1 * 0.2)
3 0.06 (0.2 * 0.3)
4 0.024 (0.06 * 0.4)
5 0.0144 (0.024 * 0.6)
This seems like simple code, but I can't get it to work. I can do cumulative sums, but the same logic does not work for a cumulative product.
Use the RETAIN statement.
The initial value is set to 1 because anything multiplied by 1 stays the same.
data want;
  set temp;
  retain cum_product 1;                  /* carry the value across observations */
  cum_product = cond_prob * cum_product; /* running product */
run;

Converting a timestamp to freshness index

I have a DataFrame with an article ID and its publication date (a timestamp). I need to use this information to compute a freshness score for each article.
articleId publicationDate
0 581354 2017-09-17 15:16:55
1 581655 2017-09-18 07:37:51
2 580864 2017-09-16 06:44:39
3 581610 2017-09-18 06:30:30
4 581605 2017-09-18 07:22:24
The most recent article should get a higher score. The time window should be half an hour (two articles published within half an hour must get the same score).
Some of the code below might be redundant but it seems to work:
df['score'] = df['publicationDate'] - df['publicationDate'].max()
df['score'] = (df['score'] / np.timedelta64(1, 'm')).apply(lambda x: (round(x / 30) * 30 + 30) / 30 if x else x).rank(method='max')
So you convert the timedelta to minutes, round it into 30-minute buckets, and finally rank that value.
It can also be a one-liner if you please:
df['score'] = ((df['publicationDate'] - df['publicationDate'].max()) / np.timedelta64(1, 'm')).apply(lambda x: (round(x / 30) * 30 + 30) / 30 if x else x).rank(method='max')
Explanation:
(df['publicationDate'] - df['publicationDate'].max()) - subtract the most recent date from each date (zero or negative timedeltas)
(df['score'] / np.timedelta64(1, 'm')) - convert the timedelta into minutes
.apply(lambda x: (round(x / 30) * 30 + 30) / 30 if x else x) - round up to 30-minute buckets, leaving the most recent timestamp (0) as is
.rank(method='max') - rank the results, assigning the upper rank to all ties
EDIT:
To change the rank of articles older than 2 days you can use this:
df['diff'] = (df['publicationDate'] - df['publicationDate'].max()).apply(lambda x: x.days)
df.loc[df['diff']<=-2, 'score'] = 0
The first line gives you the timedelta in whole days, and the second one sets the score to 0 where the difference is less than or equal to -2 days.

Removing duplicates in SAS Enterprise Guide

I have data that looks somewhat like this:
A B C
1 X 0.5
1 X 0.6
1 X 0.7
1 Y 100
1 Y 101
2 X 0.4
...
I want to remove the duplicates so that the data looks like:
A B C
1 X 0.5
1 Y 100
2 X 0.4
...
Can anyone help please? I have tried using SORT and SQL but it did not work.
Try SORT again. In EG, look at the SORT task. There should be an option to select a unique key and/or unique observations. Your keys appear to be A/B.
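For reference, the code equivalent is PROC SORT with the NODUPKEY option (a sketch; have and want are placeholder data set names):

proc sort data=have out=want nodupkey;
  by A B; /* NODUPKEY keeps the first observation per A/B combination */
run;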

Optimise conversion to integer - pandas

I have a DataFrame with 80,000 rows. One column, 'prod_prom', contains either null values or string representations of numbers that include ',' as a thousands separator. I need to convert these to integers. So far I have been doing this:
for row in DF.index:
    if pd.notnull(DF.loc[row, 'prod_prom']):
        DF.loc[row, 'prod_prom'] = int(''.join([char for char in DF.loc[row, 'prod_prom'] if char != ',']))
But it is extremely slow. Would it be quicker to do this with a list comprehension, or with an apply function? What is the best practice for this kind of operation?
Thanks
So if I understand right, you have data like the following:
data = """
A,B
100,"5,000"
200,"10,000"
300,"100,000"
400,
500,"2,000"
"""
If that is the case, probably the easiest thing is to use the thousands option in read_csv (the type will be float instead of int because of the missing value):
df = pd.read_csv(StringIO(data), header=0, thousands=',')
A B
0 100 5000
1 200 10000
2 300 100000
3 400 NaN
4 500 2000
If that is not possible you can do something like the following:
print df
A B
0 100 5,000
1 200 10,000
2 300 100,000
3 400 NaN
4 500 2,000
df['B'] = df['B'].str.replace(',', '').astype(float)
print df
A B
0 100 5000
1 200 10000
2 300 100000
3 400 NaN
4 500 2000
I changed the type to float because there are no NaN integers in pandas.
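Applied to the column from the question (assuming the DF and 'prod_prom' names used there), the same idea would be, roughly:

DF['prod_prom'] = DF['prod_prom'].str.replace(',', '').astype(float)

The null entries simply stay NaN, and the explicit loop disappears entirely.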