How can I set data with a specific system - SAS

I have a small problem. Given the example tables HAVE1 and HAVE2, I want to create a table like WANT: below the rows with a matching ID, add the row from HAVE2 with its value copied into all of the columns (COL1 through COL19, without COL20). How can I do this?
data HAVE1;
infile DATALINES dsd missover;
input ID NAME $ COL1-COL20;
CARDS;
1, A1, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19 ,20
2, A2, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20
3, B1, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 12, 13, 14, 15, 16, 16, 20, 21 , 21, 22
4, B2, 1, 20, 3, 20, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 23, 22, 23
5, C1, 20, 2, 3, 4, 5, 6, 7, 8, 9, 10, 30, 12, 13, 14, 15, 16, 17, 17, 17, 17
6, C2, 1, 2, 3, 20, 5, 6, 7, 8, 02, 10, 11, 12, 30, 14, 15, 16, 17, 18, 19, 20
;run;
Data HAVE2;
infile DATALINES dsd missover;
input ID NAME $ WARTOSC;
CARDS;
1, SUM, 50000
2, SUM, 55000
3, SUM, 60000
;run;
DATA WANT;
infile DATALINES dsd missover;
input ID NAME $ COL1-COL20;
CARDS;
1, A1, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19 ,20
1, SUM_1 ,50000,50000,50000,50000,50000,50000,50000,50000,50000,50000,50000,50000,50000,50000,50000,50000,50000,50000,50000
2, A2, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20
2, SUM_2, 55000,55000,55000,55000,55000,55000,55000,55000,55000,55000,55000,55000,55000,55000,55000,55000,55000,55000,55000
3, B1, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 12, 13, 14, 15, 16, 16, 20, 21 , 21, 22
3, SUM_3,60000,60000,60000,60000,60000,60000,60000,60000,60000,60000,60000,60000,60000,60000,60000,60000,60000,60000,60000
4, B2, 1, 20, 3, 20, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 23, 22, 23
5, C1, 20, 2, 3, 4, 5, 6, 7, 8, 9, 10, 30, 12, 13, 14, 15, 16, 17, 17, 17, 17
6, C2, 1, 2, 3, 20, 5, 6, 7, 8, 02, 10, 11, 12, 30, 14, 15, 16, 17, 18, 19, 20
;run;

So it sounds like you just need to reformat the second dataset to match what you want and then combine them. Just copy the value of WARTOSC to all of the columns and drop the original WARTOSC variable.
data HAVE1;
infile CARDS dsd truncover;
input ID NAME $ COL1-COL5;
CARDS;
1, A1, 1, 2, 3, 4, 5
2, A2, 1, 2, 3, 4, 5
3, B1, 3, 4, 5, 6, 7
4, B2, 1, 20, 3, 20, 5
5, C1, 20, 2, 3, 4, 5
6, C2, 1, 2, 3, 20, 5
;
data HAVE2;
infile CARDS dsd truncover;
input ID NAME $ WARTOSC;
CARDS;
1, SUM, 50000
2, SUM, 55000
3, SUM, 60000
;
data have2_fixed;
set have2;
name=catx('_',name,id);  * SUM -> SUM_1, SUM_2, ... ;
array col col1-col5;
do over col ; col=wartosc; end;  * copy WARTOSC into every COL variable ;
drop wartosc;
run;
data want ;
set have1 have2_fixed;
by id;
run;
You could actually make the changes while combining the datasets, which avoids the intermediate have2_fixed step if the datasets are large.
data want ;
set have1 have2 (in=in2);
by id;
array col col1-col5;
if in2 then do;
name=catx('_',name,id);
do over col ; col=wartosc; end;
end;
drop wartosc;
run;
Results:
Obs ID NAME COL1 COL2 COL3 COL4 COL5
1 1 A1 1 2 3 4 5
2 1 SUM_1 50000 50000 50000 50000 50000
3 2 A2 1 2 3 4 5
4 2 SUM_2 55000 55000 55000 55000 55000
5 3 B1 3 4 5 6 7
6 3 SUM_3 60000 60000 60000 60000 60000
7 4 B2 1 20 3 20 5
8 5 C1 20 2 3 4 5
9 6 C2 1 2 3 20 5
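For comparison, here is the same reshape-and-interleave logic as a minimal pandas sketch (illustrative only; it uses small made-up frames rather than the SAS datasets above):
import pandas as pd

# Made-up frames mirroring HAVE1 (three columns for brevity) and HAVE2.
have1 = pd.DataFrame({"ID": [1, 2, 3], "NAME": ["A1", "A2", "B1"],
                      "COL1": [1, 1, 3], "COL2": [2, 2, 4], "COL3": [3, 3, 5]})
have2 = pd.DataFrame({"ID": [1, 2, 3], "NAME": ["SUM"] * 3,
                      "WARTOSC": [50000, 55000, 60000]})

# Rename SUM to SUM_<ID> and copy WARTOSC into every COL variable.
fixed = have2.copy()
fixed["NAME"] = fixed["NAME"] + "_" + fixed["ID"].astype(str)
for c in ["COL1", "COL2", "COL3"]:
    fixed[c] = fixed["WARTOSC"]
fixed = fixed.drop(columns="WARTOSC")

# Interleave by ID, keeping each original row ahead of its SUM row.
want = (pd.concat([have1.assign(src=0), fixed.assign(src=1)])
          .sort_values(["ID", "src"], kind="stable")
          .drop(columns="src")
          .reset_index(drop=True))
print(want)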

Your wanted table is quite peculiar; you might be better off producing a report instead of a data set, which you could then simply PROC PRINT.
Regardless, the step will, for have2, require transforming NAME and replicating WARTOSC.
For example:
data want (drop=wartosc id2);
set have1 end=end1;
output;  * write the HAVE1 row first;
if not end2 then
set have2(rename=(id=id2)) end=end2;  * read the matching HAVE2 row, if any;
if id = id2 then do;
array col col1-col20;
do over col; col=wartosc; end;  * replicate WARTOSC into COL1-COL20;
name = catx('_', name, id);  * SUM -> SUM_1, SUM_2, ...;
output;
end;
run;
You might need some more logic if the case of have2 having more rows than have1 can occur.

Power Query - How to add up values of two different databases into one (online sales + instore sales = total sales)?

I would like to add up the values of two different queries into one. To make it a bit simpler, say I have online sales data and in-store sales data, and I would like to add up the sales for each category.
For instance :
Online sales
Date Apple Orange Pineapple Grape
January, 1, 2023 5 3 8 3
January, 2, 2023 1 2 3 7
January, 3, 2023 2 4 7 2
January, 4, 2023 5 4 8 1
January, 5, 2023 3 8 9 9
In-store sales
Date Apple Orange Pineapple Grape
January, 1, 2023 1 5 9 1
January, 2, 2023 5 6 3 7
January, 3, 2023 2 3 8 6
January, 4, 2023 1 2 3 7
January, 5, 2023 3 5 1 6
What I would like to have is something like this :
Total sales
Date Apple Orange Pineapple Grape
January, 1, 2023 6 8 17 4
January, 2, 2023 6 8 6 14
January, 3, 2023 4 7 15 8
January, 4, 2023 6 6 11 8
January, 5, 2023 6 13 10 15
In my original databases, I have way more columns and rows so it's almost impossible to do it manually.
Do you have any suggestions?
If you want, you can copy and paste this basic code in Power Query to have these two data sets.
Online sales :
#table({"Date", "Apple", "Orange", "Pineapple", "Grape"},{
{"January, 1, 2023", 5, 3, 8, 3},
{"January, 2, 2023", 1, 2, 3, 7},
{"January, 3, 2023", 2, 4, 7, 2},
{"January, 4, 2023", 5, 4, 8, 1},
{"January, 5, 2023", 3, 8, 9, 9}})
In-Store Sales :
#table({"Date", "Apple", "Orange", "Pineapple", "Grape"},{
{"January, 1, 2023", 1, 5, 9, 1},
{"January, 2, 2023", 5, 6, 3, 7},
{"January, 3, 2023", 2, 3, 8, 6},
{"January, 4, 2023", 1, 2, 3, 7},
{"January, 5, 2023", 3, 5, 1, 6}})
Unpivot then repivot, so you don't have to adjust the code based on the number of columns:
Append the tables.
Right-click the Date column and choose Unpivot Other Columns.
Select the Attribute column, then Transform > Pivot Column, and use Value as the values column.
Sample code:
let Source = Table.Combine({#"Online Sales", #"In-Store Sales"}),
#"Unpivoted Other Columns" = Table.UnpivotOtherColumns(Source, {"Date"}, "Attribute", "Value"),
#"Pivoted Column" = Table.Pivot(#"Unpivoted Other Columns", List.Distinct(#"Unpivoted Other Columns"[Attribute]), "Attribute", "Value", List.Sum)
in #"Pivoted Column"
Alternatively, you can append both queries (Query1 and Query2) and then apply a Group By on the Date column, summing the other columns.
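If it helps to see that append-then-group-by logic spelled out, here is a minimal sketch written in pandas rather than Power Query M, purely for illustration (the frame names are made up):
import pandas as pd

# Stand-ins for the Online Sales and In-Store Sales queries.
online = pd.DataFrame({"Date": ["January, 1, 2023", "January, 2, 2023"],
                       "Apple": [5, 1], "Orange": [3, 2],
                       "Pineapple": [8, 3], "Grape": [3, 7]})
instore = pd.DataFrame({"Date": ["January, 1, 2023", "January, 2, 2023"],
                        "Apple": [1, 5], "Orange": [5, 6],
                        "Pineapple": [9, 3], "Grape": [1, 7]})

# Append the two tables, then group by Date and sum every other column.
total = pd.concat([online, instore]).groupby("Date", as_index=False).sum()
print(total)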

Django ORM fill 0 for missing date

I'm using Django 2.2.
I want to generate analytics of the number of records for each day between the start and end date.
The query used is
start_date = '2021-9-1'
end_date = '2021-9-30'
query = Tracking.objects.filter(
scan_time__date__gte=start_date,
scan_time__date__lte=end_date
)
query.annotate(
scanned_date=TruncDate('scan_time')
).order_by(
'scanned_date'
).values('scanned_date').annotate(
**{'total': Count('created')}
)
Which produces output as
[{'scanned_date': datetime.date(2021, 9, 24), 'total': 5}, {'scanned_date': datetime.date(2021, 9, 26), 'total': 3}]
I want to fill the missing dates with 0, so that the output should be
2021-9-1: 0
2021-9-2: 0
...
2021-9-24: 5
2021-9-25: 0
2021-9-26: 3
...
2021-9-30: 0
How can I achieve this using either the ORM or Python (i.e., pandas, etc.)?
Use DataFrame.reindex with a date range created by date_range, after setting scanned_date as the index with DataFrame.set_index:
import datetime
import pandas as pd

data = [{'scanned_date': datetime.date(2021, 9, 24), 'total': 5},
{'scanned_date': datetime.date(2021, 9, 26), 'total': 3}]
start_date = '2021-9-1'
end_date = '2021-9-30'
r = pd.date_range(start_date, end_date, name='scanned_date')
#if necessary convert to dates from datetimes
#r = pd.date_range(start_date, end_date, name='scanned_date').date
df = pd.DataFrame(data).set_index('scanned_date').reindex(r, fill_value=0).reset_index()
print (df)
scanned_date total
0 2021-09-01 0
1 2021-09-02 0
2 2021-09-03 0
3 2021-09-04 0
4 2021-09-05 0
5 2021-09-06 0
6 2021-09-07 0
7 2021-09-08 0
8 2021-09-09 0
9 2021-09-10 0
10 2021-09-11 0
11 2021-09-12 0
12 2021-09-13 0
13 2021-09-14 0
14 2021-09-15 0
15 2021-09-16 0
16 2021-09-17 0
17 2021-09-18 0
18 2021-09-19 0
19 2021-09-20 0
20 2021-09-21 0
21 2021-09-22 0
22 2021-09-23 0
23 2021-09-24 5
24 2021-09-25 0
25 2021-09-26 3
26 2021-09-27 0
27 2021-09-28 0
28 2021-09-29 0
29 2021-09-30 0
Or use a left join with another DataFrame created from the range, replacing missing values with 0:
r = pd.date_range(start_date, end_date, name='scanned_date').date
df = pd.DataFrame({'scanned_date':r}).merge(pd.DataFrame(data), how='left', on='scanned_date').fillna(0)
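One small follow-up on the merge approach: fillna(0) leaves the total column as float, because the unmatched dates introduced NaN. If integer counts are wanted, cast the column back afterwards, for example:
# cast the filled totals back to integers (fillna left them as floats)
df['total'] = df['total'].astype(int)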

Replace spaces with first character of the line on each line [duplicate]

This question already has answers here:
Can Vim's substitute command handle recursive pattern as sed's "t label"?
(2 answers)
Closed 2 years ago.
I have data that looks like this:
1, 100 200 3030 400 50023
2, 30 444 44334 441 123332
3, 100 200 3030 400 50023
I need to turn it into this:
1, 100
1, 200
1, 3030
1, 400
1, 50023
2, 30
2, 444
2, 44334
2, 441
2, 123332
etc.
I was able to do it with a vim macro, but the data is far too large. I was hoping something like awk could do it, but I am not really familiar with it.
Any help would be appreciated.
$ cat input
1, 100 200 3030 400 50023
2, 30 444 44334 441 123332
3, 100 200 3030 400 50023
$ awk '{for(i=2;i<=NF;i++) printf "%s %s\n", $1, $i}' input
1, 100
1, 200
1, 3030
1, 400
1, 50023
2, 30
2, 444
2, 44334
2, 441
2, 123332
3, 100
3, 200
3, 3030
3, 400
3, 50023
awk -F',' '{split($2,a," "); for (i in a) print $1, "," , a[i]}'
explanation:
awk -F',' -- set the field separator to ,
'{split($2,a," "); -- split column 2 using " " (space) as the delimiter and populate array a
for (i in a) print $1, "," , a[i]} -- loop over all elements of the array
Demo :
renegade@Renegade:~$ cat test.txt
1, 100 200 3030 400 50023
2, 30 444 44334 441 123332
3, 100 200 3030 400 50023
renegade@Renegade:~$ awk -F',' '{split($2,a," "); for (i in a) print $1, "," , a[i]}' test.txt
1 , 100
1 , 200
1 , 3030
1 , 400
1 , 50023
2 , 30
2 , 444
2 , 44334
2 , 441
2 , 123332
3 , 100
3 , 200
3 , 3030
3 , 400
3 , 50023
renegade@Renegade:~$
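If awk is not a hard requirement, the same split-and-emit idea can also be sketched in a few lines of Python (illustrative only; it assumes the input file is named test.txt):
# Emit "<key>, <value>" for every whitespace-separated value after the first comma.
with open("test.txt") as fh:
    for line in fh:
        key, _, rest = line.partition(",")
        for value in rest.split():
            print(f"{key}, {value}")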

Extract specific rows from SAS dataset based on a particular cell value of a variable

I want to extract a specific set of rows from a large SAS dataset, based on a particular cell value of a variable, into a new dataset. This dataset has 6 variables. Following is an example of the dataset:
Variable names: Var1 Var2 Var3 Var4 Var5 Var6
Row 1 A 1 2 3 4 5
Row 2 B 1 2 3 4 5
Row 3 A 1 2 3 4 5
Row 4 B 1 2 3 4 5
Row 5 Sample 1 2 3 4 5
Row 6 A 1 2 3 4 5
Row 7 B 1 2 3 4 5
Row 8 A 1 2 3 4 5
Row 9 B 1 2 3 4 5
Row 10 A 1 2 3 4 5
Row 11 B 1 2 3 4 5
Row 12 A 1 2 3 4 5
Row 13 B 1 2 3 4 5
From this dataset, I want to select the set of the next 8 rows starting from a row in which Var1 has the value "Sample". I want to extract multiple such sets of 8 rows from this dataset into a new dataset. Can someone please guide me on how I can accomplish this in SAS?
Thank you
Would the output statement work for you?
data have;
infile datalines dsd dlm=",";
input Variable_names : $char10.
Var1 : $char10.
Var2 : 8.
Var3 : 8.
Var4 : 8.
Var5 : 8.
Var6 : 8.;
datalines;
Row 1 , A , 1, 2, 3, 4, 5
Row 2 , B , 1, 2, 3, 4, 5
Row 3 , A , 1, 2, 3, 4, 5
Row 4 , B , 1, 2, 3, 4, 5
Row 5 , Sample, 1, 2, 3, 4, 5
Row 6 , A , 1, 2, 3, 4, 5
Row 7 , B , 1, 2, 3, 4, 5
Row 8 , A , 1, 2, 3, 4, 5
Row 9 , B , 1, 2, 3, 4, 5
Row 10, A , 1, 2, 3, 4, 5
Row 11, B , 1, 2, 3, 4, 5
Row 12, A , 1, 2, 3, 4, 5
Row 13, B , 1, 2, 3, 4, 5
;
run;
data want_without
want_with;
set have;
if strip(Var1) = "Sample" then output want_with;
else output want_without;
run;
One way to do this is to set a counter to 8 whenever the previous record has var1="Sample", and then decrement the counter for each record. And only output records where counter is >= 1.
data want ;
set have ;
if lag(var1) = "Sample" then counter = 8 ;
else counter+(-1) ; *counter is implicitly retained ;
if counter>=1 then output ;
* drop counter ;
run ;
You can set a counter and output as desired; use RETAIN coupled with an IF (and OUTPUT) statement. You may need to tweak the IF condition, but I think you get the idea here.
data want;
set have;
retain counter 10;
if strip(Var1) = "Sample" then counter=1;
else counter+1;
if 2<=counter<=9 then OUTPUT;
*if 2<=counter<=9; *this is the same as above, but less code;
run;
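For what it's worth, the same counter idea can be sketched outside SAS as well; here is a minimal pandas version, purely illustrative and using a made-up frame rather than the HAVE dataset above:
import pandas as pd

# Made-up frame; Var1 marks the "Sample" rows.
have = pd.DataFrame({"Var1": ["A", "B", "Sample", "A", "B", "A", "B", "A", "B", "A"],
                     "Var2": range(10)})

# Number each row within the block that starts at a "Sample" row.
block = have["Var1"].eq("Sample").cumsum()
pos = have.groupby(block).cumcount()

# Keep positions 1..8, i.e. the 8 rows following each "Sample" row.
want = have[(block > 0) & pos.between(1, 8)]
print(want)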

awk remove characters after number

I'm writing an AWK script to clean up a data stream so it's usable for analysis. Right now, I have the following issue.
I have a data stream that looks like this:
56, 2
64, 3
72, 0
80, -1-
88, -3--
96, 1
04, -2-
12, -7----
20, -1-
28, 7
36, 1
44, -3--
52, 3
60, 0
68, 0
76, -3--
84, -5---
92, 1
00, 4
08, 3
16, -2-
24, -3--
32, 1
40, 3
And I want to remove any dash that occurs after a number character, while keeping the minus sign in front of the numbers, so it would look like this:
56, 2
64, 3
72, 0
80, -1
88, -3
96, 1
04, -2
12, -7
20, -1
28, 7
36, 1
44, -3
52, 3
60, 0
68, 0
76, -3
84, -5
92, 1
00, 4
08, 3
16, -2
24, -3
32, 1
40, 3
I know how to do this with sed (sed 's/-*$//'), but how could this be done with only awk, so I can use it in my script?
Cheers
One way, simply using sub():
awk '{ sub(/-+$/, "", $NF); print }' infile
It yields:
56, 2
64, 3
72, 0
80, -1
88, -3
96, 1
04, -2
12, -7
20, -1
28, 7
36, 1
44, -3
52, 3
60, 0
68, 0
76, -3
84, -5
92, 1
00, 4
08, 3
16, -2
24, -3
32, 1
40, 3
Using awk:
awk -F '-+$' '{$1=$1} 1' file
Using sed:
sed -i.bak 's/-*$//' file
Another possible solution :
awk -F "-+$" '{str=""; for(i=1; i<=NF; i++){str=str""$i} print str}' file
But, I think that sed is a better solution in this case.
Regards,
Idriss