I have the following code to compute modulo between two floating point numbers:
auto mod(float x, float denom)
{
return x>= 0 ? std::fmod(x, denom) : denom + std::fmod(x + 1.0f, denom) - 1.0f;
}
It does only work partially for negative x:
-8 0
-7.75 0.25
-7.5 0.5
-7.25 0.75
-7 1
-6.75 1.25
-6.5 1.5
-6.25 1.75
-6 2
-5.75 2.25
-5.5 2.5
-5.25 2.75
-5 3
-4.75 -0.75 <== should be 3.25
-4.5 -0.5 <== should be 3.5
-4.25 -0.25 <== should be 3.75
-4 0
-3.75 0.25
-3.5 0.5
-3.25 0.75
-3 1
-2.75 1.25
-2.5 1.5
-2.25 1.75
-2 2
-1.75 2.25
-1.5 2.5
-1.25 2.75
-1 3
-0.75 3.25
-0.5 3.5
-0.25 3.75
0 0
How to fix it for negative x. Denom is assumed to be an integer greater than 0. Note: fmod as is provided by the standard library is broken for x < 0.0f.
x is in the left column, and the output is in the right column, like so:
for(size_t k = 0; k != 65; ++k)
{
auto x = 0.25f*(static_cast<float>(k) - 32);
printf("%.8g %.8g\n", x, mod(x, 4));
}
Note: fmod as is provided by the standard library is broken for x < 0.0f
I guess you want the result to always be a positive value1:
In mathematics, the result of the modulo operation is an equivalence class, and any member of the class may be chosen as representative; however, the usual representative is the least positive residue, the smallest non-negative integer that belongs to that class (i.e., the remainder of the Euclidean division).
The usual workaround was shown in Igor Tadetnik's comment, but that seems not enough.
#IgorTandetnik That worked. Pesky signed zero though, but I guess you cannot do anything about that.
Well, consider this(2, 3):
auto mod(double x, double denom)
{
auto const r{ std::fmod(x, denom) };
return std::copysign(r < 0 ? r + denom : r, 1);
}
1) https://en.wikipedia.org/wiki/Modulo
2) https://en.cppreference.com/w/cpp/numeric/math/copysign
3) https://godbolt.org/z/fdr9cbsYT
I have a dataset which have columns Event and Time. I need to create columns Group and Cumulative. What I need to measure is the duration of the Event 'Event1_Stop' until an 'Event1_Start' appears. Last group should sum the time meaning that the stop is ongoing and no start for the event has entered.
My data sample is:
data have;
length Event $15;
input Event $ Time;
datalines;
Event3_Start 0.2
Event2_Start 0.4
Event2_Stop 0.2
Event1_Stop 0.2
Event3_Start 0
Event4_Start 0.5
Event3_Stop 0.2
Event1_Start 0
Event4_Stop 0
Event4_Stop 0
Event1_Stop 0.3
Event3_Start 0.3
Event1_Start 0
Event3_Start 0.4
Event3_Stop 0
Event1_Stop 0.2
Event3_Start 0.2
Event2_Start 0.4
run;
The result dataset that I need to obtain is:
data have;
length Event $15;
input Event $ Time Group Cumulative;
datalines;
Event3_Start 0.2 0 0
Event2_Start 0.4 0 0
Event2_Stop 0.2 0 0
Event1_Stop 0.2 1 0.9
Event3_Start 0 1 0
Event4_Start 0.5 1 0
Event3_Stop 0.2 1 0
Event1_Start 0 0 0
Event4_Stop 0 0 0
Event4_Stop 0 0 0
Event1_Stop 0.3 2 0.6
Event3_Start 0.3 2 0
Event1_Start 0 0 0
Event3_Start 0.4 0 0
Event3_Stop 0 0 0
Event1_Stop 0.2 3 0.8
Event3_Start 0.2 3 0
Event2_Start 0.4 3 0
run;
Thanks for your suggestions.
Regards.
Thanks to #mkeintz on SAS forum for the solution:
data stop_to_start (keep=group cumulative);
set have end=end_of_have;
group+(event='Event1_Stop');
if event='Event1_Stop' then cumulative=0;
cumulative+time;
if end_of_have or event='Event1_Start' ;
run;
data want;
set have;
if _n_=1 or event='Event1_Start' then group=0;
cumulative=0;
if event='Event1_Stop' then set stop_to_start;
run;
I have a problem with sas macro and macro variable. When I use it, I get information: 'A character operand was found in the %eval function or %if condition were numeric.
I have something like distribution (d1-d5) and I want to get similar variables but shifted about diff (data before diff are equal 0). Below example table - of course I need to do something for much bigger table.
Example_table
Name d1 d2 d3 d4 d5 diff
A 0.2 0.2 0.1 0.2 0.3 1
B 0.3 0.1 0.4 0.3 0 2
C 0.1 0.2 0 0.4 0.3 2
Table I want to get: (new_table)
Name n1 n2 n3 n4 n5 diff
A 0 0.2 0.2 0.1 0.2 1
B 0 0 0.3 0.1 0.4 2
C 0 0 0.1 0.2 0 2
Data example_table;
Name = A B C;
d1 = 0.2 0.3 0.1;
d2 = 0.2 0.1 0.2;
d3 = 0.1 0.4 0;
d4 = 0.2 0.3 0.4;
d5 = 0.3 0 0.3;
diff = 1 2 2;
run;
%macro distr ();
%local i;
%do i = 1 %to 5;
if &i. <= diff then n&i. = 0;
else n&i. = d%eval(&i. - diff);
/* I cant compute this eval, it looks like diff is character variable..., but it doesn't */
%end;
%mend;
Data new_table;
Set example_table;
%distr();
run;
The macro processor knows nothing about the values of your dataset variables.
You are trying to subtract the letters diff from the value of the macro variable i. That cannot work.
You will want to use SAS code to do your data manipulation, not macro code. For example by using arrays.
data example_table;
input Name d1-d5 diff ;
cards;
A 0.2 0.2 0.1 0.2 0.3 1
B 0.3 0.1 0.4 0.3 0 2
C 0.1 0.2 0 0.4 0.3 2
;
data want;
set example_table;
array d d1-d5;
array n n1-n5;
do index=1 to dim(n);
if 1 <= index-diff <= dim(d) then n[index]=d[index-diff];
else n[index]=0;
end;
drop index d1-d5;
run;
Results:
Obs Name diff n1 n2 n3 n4 n5
1 A 1 0 0.2 0.2 0.1 0.2
2 B 2 0 0.0 0.3 0.1 0.4
3 C 2 0 0.0 0.1 0.2 0.0
You're mixing up SAS and Macro language here, specifically:
%eval(&i. - diff)
%eval is a macro function, meaning it applies to the text of the code. diff is a SAS data step variable, meaning it has some value - but %eval only operates on the text itself. So %eval is trying to take &i (a number) and subtract from it the letters diff (not a number).
Fortunately it's pretty easy - &i is available to the SAS datastep, as a number. You can use an array to resolve the problem! First declare the array, then...
else n&i. = d[&i].;
Of course, you don't need to use the macro language at all here.
data new_table;
set example_table;
array d[5] d1-d5; *technically d1-d5 is unneeded here as those are the default names;
array n[5] n1-n5; *also n1-n5 unneeded, but it is more clear;
do i = 1 to dim(d);
if i <= diff then n[i] = 0;
else n[i] = d[i];
end;
run;
I need to create a variable that takes the product of the values of all prior values and including the one in the current obs.
data temp;
input time cond_prob;
datalines;
1 1
2 0.2
3 0.3
4 0.4
5 0.6
;
run;
Final data should be:
1 1
2 0.2 (1*0.2)
3 0.06 (0.2* 0.3)
4 0.024 (0.06 * 0.4
5 0.0144 (0.024 *0.6)
This seems like a simple code but I can't get it to work. I can do cumulative sums but cumulative product is not working when using the same logic.
Use the RETAIN functionality.
For the first record I set it to a value of 1 because anything multiplied by 1 will stay the same.
data want;
set temp;
retain cum_product 1;
cum_product = cond_prob * cum_product;
run;
*New to Python.
I'm trying to merge multiple text files into 1 csv; example below -
filename.csv
Alpha
0
0.1
0.15
0.2
0.25
0.3
text1.txt
Alpha,Beta
0,10
0.2,20
0.3,30
text2.txt
Alpha,Charlie
0.1,5
0.15,15
text3.txt
Alpha,Delta
0.1,10
0.15,20
0.2,50
0.3,10
Desired output in the csv file: -
filename.csv
Alpha Beta Charlie Delta
0 10 0 0
0.1 0 5 10
0.15 0 15 20
0.2 20 0 50
0.25 0 0 0
0.3 30 0 10
The code I've been working with and others that were provided give me an answer similar to what is at the bottom of the page
def mergeData(indir="Dir Path", outdir="Dir Path"):
dfs = []
os.chdir(indir)
fileList=glob.glob("*.txt")
for filename in fileList:
left= "/Path/Final.csv"
right = filename
output = "/Path/finalMerged.csv"
leftDf = pandas.read_csv(left)
rightDf = pandas.read_csv(right)
mergedDf = pandas.merge(leftDf,rightDf,how='inner',on="Alpha", sort=True)
dfs.append(mergedDf)
outputDf = pandas.concat(dfs, ignore_index=True)
outputDf = pandas.merge(leftDf, outputDf, how='inner', on='Alpha', sort=True, copy=False).fillna(0)
print (outputDf)
outputDf.to_csv(output, index=0)
mergeData()
The answer I get however is instead of the desired result: -
Alpha Beta Charlie Delta
0 10 0 0
0.1 0 5 0
0.1 0 0 10
0.15 0 15 0
0.15 0 0 20
0.2 20 0 0
0.2 0 0 50
0.25 0 0 0
0.3 30 0 0
0.3 0 0 10
IIUC you can create list of all DataFrames - dfs, in loop append mergedDf and last concat all DataFrames to one:
import pandas
import glob
import os
def mergeData(indir="dir/path", outdir="dir/path"):
dfs = []
os.chdir(indir)
fileList=glob.glob("*.txt")
for filename in fileList:
left= "/path/filename.csv"
right = filename
output = "/path/filename.csv"
leftDf = pandas.read_csv(left)
rightDf = pandas.read_csv(right)
mergedDf = pandas.merge(leftDf,rightDf,how='right',on="Alpha", sort=True)
dfs.append(mergedDf)
outputDf = pandas.concat(dfs, ignore_index=True)
#add missing rows from leftDf (in sample Alpha - 0.25)
#fill NaN values by 0
outputDf = pandas.merge(leftDf,outputDf,how='left',on="Alpha", sort=True).fillna(0)
#columns are converted to int
outputDf[['Beta', 'Charlie']] = outputDf[['Beta', 'Charlie']].astype(int)
print (outputDf)
outputDf.to_csv(output, index=0)
mergeData()
Alpha Beta Charlie
0 0.00 10 0
1 0.10 0 5
2 0.15 0 15
3 0.20 20 0
4 0.25 0 0
5 0.30 30 0
EDIT:
Problem is you change parameter how='left' in second merge to how='inner':
def mergeData(indir="Dir Path", outdir="Dir Path"):
dfs = []
os.chdir(indir)
fileList=glob.glob("*.txt")
for filename in fileList:
left= "/Path/Final.csv"
right = filename
output = "/Path/finalMerged.csv"
leftDf = pandas.read_csv(left)
rightDf = pandas.read_csv(right)
mergedDf = pandas.merge(leftDf,rightDf,how='inner',on="Alpha", sort=True)
dfs.append(mergedDf)
outputDf = pandas.concat(dfs, ignore_index=True)
#need left join, not inner
outputDf = pandas.merge(leftDf, outputDf, how='left', on='Alpha', sort=True, copy=False)
.fillna(0)
print (outputDf)
outputDf.to_csv(output, index=0)
mergeData()
Alpha Beta Charlie Delta
0 0.00 10.0 0.0 0.0
1 0.10 0.0 5.0 0.0
2 0.10 0.0 0.0 10.0
3 0.15 0.0 15.0 0.0
4 0.15 0.0 0.0 20.0
5 0.20 20.0 0.0 0.0
6 0.20 0.0 0.0 50.0
7 0.25 0.0 0.0 0.0
8 0.30 30.0 0.0 0.0
9 0.30 0.0 0.0 10.0
import pandas as pd
data1 = pd.read_csv('samp1.csv',sep=',')
data2 = pd.read_csv('samp2.csv',sep=',')
data3 = pd.read_csv('samp3.csv',sep=',')
df1 = pd.DataFrame({'Alpha':data1.Alpha})
df2 = pd.DataFrame({'Alpha':data2.Alpha,'Beta':data2.Beta})
df3 = pd.DataFrame({'Alpha':data3.Alpha,'Charlie':data3.Charlie})
mergedDf = pd.merge(df1, df2, how='outer', on ='Alpha',sort=False)
mergedDf1 = pd.merge(mergedDf, df3, how='outer', on ='Alpha',sort=False)
a = pd.DataFrame(mergedDf1)
print(a.drop_duplicates())
output:
Alpha Beta Charlie
0 0.00 10.0 NaN
1 0.10 NaN 5.0
2 0.15 NaN 15.0
3 0.20 20.0 NaN
4 0.25 NaN NaN
5 0.30 30.0 NaN