Adding label to SAS variables - sas

I have a dataset in SAS called "Flight", and I want to label every row of one particular column, "Carrier", with the matching value (matched on "Flght_carrier_Code") from another dataset called "Airlines".
Please suggest a method.
Sample Data set 1 - "Flight"
date carrier flight tailnum air_time
01-01-2013 UA 1714 N24211 227
01-01-2013 AA 1141 N619AA 160
01-01-2013 B6 725 N804JB 183
01-01-2013 DL 461 N668DN 116
01-01-2013 UA 1696 N39463 150
01-01-2013 B6 507 N516JB 158
01-01-2013 EV 5708 N829AS 53
01-01-2013 B6 79 N593JB 140
01-01-2013 AA 301 N3ALAA 138
01-01-2013 B6 49 N793JB 149
01-01-2013 B6 71 N657JB 158
Sample Data set 2 - "Airlines"
Flght_carrier_Code name
9E Endeavor Air Inc.
AA American Airlines Inc.
AS Alaska Airlines Inc.
B6 JetBlue Airways
DL Delta Air Lines Inc.
EV ExpressJet Airlines Inc.
F9 Frontier Airlines Inc.
FL AirTran Airways Corporation
HA Hawaiian Airlines Inc.
MQ Envoy Air
OO SkyWest Airlines Inc.
UA United Air Lines Inc.
US US Airways Inc.
VX Virgin America
WN Southwest Airlines Co.
YV Mesa Airlines Inc.

The labelling you describe can be considered row labelling, but more common terminology is:
value mapping
formatting
left join
merge
lookup
Note: a SAS format behaves like an automatic, built-in left join.
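The merge/left-join idea can also be sketched as a DATA step MERGE. This is a minimal sketch, assuming the data sets are named flights and airlines (as in the SQL below); the sample rows are copied from the question, and the output name flights_named is hypothetical. Only part of the Airlines table is included, so the DL flight deliberately has no match. Both inputs must be sorted by the key, and the IN= flag keeps every flight whether or not its carrier matched:

```sas
/* A few rows copied from the question's Flight data */
data flights;
  input date :$10. carrier $ flight tailnum $ air_time;
  datalines;
01-01-2013 UA 1714 N24211 227
01-01-2013 AA 1141 N619AA 160
01-01-2013 B6 725 N804JB 183
01-01-2013 DL 461 N668DN 116
;
run;

/* Part of the Airlines lookup; "&" lets NAME contain single embedded blanks */
data airlines;
  input Flght_carrier_Code $ name & $40.;
  datalines;
AA American Airlines Inc.
B6 JetBlue Airways
UA United Air Lines Inc.
;
run;

/* MERGE needs both inputs sorted by the same key variable */
proc sort data=flights;
  by carrier;
run;

proc sort data=airlines(rename=(Flght_carrier_Code=carrier)) out=airlines_sorted;
  by carrier;
run;

data flights_named;
  merge flights(in=in_flights) airlines_sorted;
  by carrier;
  /* keep every flight row, matched or not: a left join */
  if in_flights;
run;

proc print data=flights_named;
run;
```

Unmatched carriers (DL here) keep their rows with a missing name, which mirrors what the SQL LEFT JOIN does.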
SQL
You tagged proc-sql, so one approach is a left join, which retains all rows, including carrier values that have no match. You might also want an SQL view to avoid creating a new, larger table:
proc sql;
create view work.flights_v as
select
coalesce(airlines.name, flights.carrier) as carrier_name
, flights.*
from
flights
left join
airlines
on
flights.carrier = airlines.Flght_carrier_Code
;
quit;
FORMATS
Custom formats typically operate at a variable's data-presentation level, that is, at viewing and output-rendering time (for example, the EG data grid, ViewTable, or a procedure's output). A custom format can be created from a data set such as airlines. Custom formats can be permanent (persisting after a SAS session ends) or temporary (existing only during a SAS session). Read the documentation for PROC FORMAT CNTLIN= if you want to try that approach.
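A minimal sketch of the CNTLIN= route, assuming the lookup data set is called airlines as in the question; the format name $CARRIER and the control data set carrier_fmt are hypothetical. PROC FORMAT reads the definition from the variables FMTNAME, START, LABEL and TYPE, so the lookup columns are simply renamed:

```sas
/* A few lookup rows copied from the question's Airlines data */
data airlines;
  input Flght_carrier_Code $ name & $40.;
  datalines;
AA American Airlines Inc.
B6 JetBlue Airways
UA United Air Lines Inc.
;
run;

/* Control data set: one row per code, in the layout PROC FORMAT expects */
data carrier_fmt;
  set airlines(rename=(Flght_carrier_Code=start name=label));
  retain fmtname '$CARRIER' type 'C';
run;

/* Creates the format in WORK (temporary, exists only for this session) */
proc format cntlin=carrier_fmt;
run;

/* The stored value stays the code; only the displayed value changes */
data _null_;
  length code $2;
  do code = 'UA', 'AA', 'DL';
    put code= code $carrier.;
  end;
run;
```

To use it on the real data, add `format carrier $carrier.;` to a PROC PRINT or DATA step. A code with no entry in the format (DL above) is displayed unchanged, which is why a format acts like a left join.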

I would just add a label to the code above:
proc sql;
create view work.flights_v as
select
coalesce(airlines.name, flights.carrier) as carrier_name label="carrier"
, flights.*
from
flights
left join
airlines
on
flights.carrier = airlines.Flght_carrier_Code
;
quit;

Related

How to use regex capturing-group in custom US Address information in Office365

I'm trying to create a custom U.S. address classification label in Azure Information Protection to match possible U.S. addresses.
Regex (it works in Java 8, e.g. on https://regex101.com/):
^(\d+) ?(\w)? (.*?) ?((?<= )avenue|ave|court|ct|street|st|drive|dr|lane|ln|road|rd|blvd|plaza|parkway|pkwy)? ?((?<= )\d*)?$
But when I try to set this pattern in Azure Information Protection, I receive the error message below:
You cannot configure a pattern with groups or multiple match conditions like (.*, .+, .{0,n} or .{1,n}). Remove the group or the multiple match condition from the pattern to continue.
Is there a way to circumvent this situation? Is it possible to reach the same result in another way?
Sample Data to test:
66-4 Parkhurst Rd, Chelmsford MA 1824
591 Memorial Dr, Chicopee MA 1020
55 Brooksby Village Way, Danvers MA 1923
137 Teaticket Hwy, East Falmouth MA 2536
42 Fairhaven Commons Way, Fairhaven MA 2719
374 William S Canning Blvd, Fall River MA 2721
121 Worcester Rd, Framingham MA 1701
677 Timpany Blvd, Gardner MA 1440
337 Russell St, Hadley MA 1035
295 Plymouth Street, Halifax MA 2338
1775 Washington St, Hanover MA 2339

Replace the word blank with 0 in card visualisation Power BI

My data is in a table called "info":
sl.no zone type cost
01 east typeA 223288
02 east typeB 8897
03 east typeC 2219
04 east typeD 7628
05 north typeB 10900
06 north typeC 5998
In this data there are two zones, east and north, and four types: A, B, C and D. Types A and D don't have a north zone. In the card visualisation, if I select the north zone with type A or D, the cost shows blank (which is correct according to the data). What I want is for the cost to show 0 instead of the word "blank" wherever the zone is missing. If this is possible, please help me do it.
Use a measure like the following, replacing the table and column names with the appropriate ones from your table:
Measure = IF(ISBLANK(SUM(info[cost])), 0, SUM(info[cost]))

Creating Columns From Stacked Data

Piggybacking on a similar question I asked
(Summing a Column By Group In a Dataset With Macros)...
I have the following dataset:
Month Cost_Center Account Actual Annual_Budget
May 53410 Postage 23 134
May 53420 Postage 7 238
May 53430 Postage 98 743
May 53440 Postage 0 417
May 53710 Postage 102 562
May 53410 Phone 63 137
May 53420 Phone 103 909
May 53430 Phone 90 763
June 53410 Postage 13 134
June 53420 Postage 0 238
June 53430 Postage 48 743
June 53440 Postage 0 417
June 53710 Postage 92 562
June 53410 Phone 73 137
June 53420 Phone 103 909
June 53430 Phone 90 763
I would like to "splice" it so that each month has its own column for Actual, while summing the numeric values by Account.
For example, I want the output to look like the following:
Account May_Actual_Sum June_Actual_Sum Annual_Budget
Postage 14562 37960 255251
Phone 4564 2660 32241
The code below, provided by a fellow user, works great when the results don't need to be further disaggregated by month; however, I'm not sure whether that is possible (I tried adding a BY month statement, but it didn't work).
proc means data=Test N SUM NWAY STACKODS;
class Account_Description;
var Actual annual_budget;
by month;
ods output summary = summary_stats1;
output out = summary_stats2 N = SUM= / AUTONAME;
data want;
set summary_stats2;
run;
Use PROC MEANS to get the summaries, same as last time. Please read the PROC MEANS documentation to understand how the CLASS statement works and how you can control the different levels of output.
Use PROC TRANSPOSE to flip the data to a wide layout. Since the budget total is the same for every month within an account, you'll be fine.
I'm guessing your next questions will be how to sort the columns correctly, because your month names won't sort in calendar order, and how to reference them dynamically to calculate month-to-date changes. Those are some of the reasons why this data structure is not recommended.
data have;
input Month $ Cost_Center $ Account $ Actual Annual_Budget;
cards;
May 53410 Postage 23 134
May 53420 Postage 7 238
May 53430 Postage 98 743
May 53440 Postage 0 417
May 53710 Postage 102 562
May 53410 Phone 63 137
May 53420 Phone 103 909
May 53430 Phone 90 763
June 53410 Postage 13 134
June 53420 Postage 0 238
June 53430 Postage 48 743
June 53440 Postage 0 417
June 53710 Postage 92 562
June 53410 Phone 73 137
June 53420 Phone 103 909
June 53430 Phone 90 763
;
run;
*summarize;
proc means data=have noprint nway;
class account month;
var actual annual_budget;
output out=temp sum=actual_total budget_total;
run;
*transpose;
proc transpose data=temp out=want prefix=Month_;
by account budget_total;
var actual_total;
id month;
run;
I cannot think of a way to generate this report using just one PROC. You will need to post-process the PROC MEANS or PROC SUMMARY results to get to this:
proc means data=have SUM ;
class Account month;
var Actual annual_budget;
output out = summary_stats SUM=;
run;
/* Look at summary_stats to understand its structure here */
/* Otherwise you will not understand the following code */
proc sort data = summary_stats;
where _type_ in (2,3);
by account;
run;
data want;
set summary_stats;
by account ;
retain May_Actual_Sum June_Actual_Sum Annual_Budget_sum;
if first.account then Annual_Budget_sum = Annual_Budget;
else do;
select(month);
when ('May') May_Actual_Sum = actual;
when ('June') June_Actual_Sum = actual;
/* List other months here as well. Macros could make this compact and easier to extend */
/* OTHERWISE avoids a SELECT error when a month is not listed above */
otherwise;
end;
end;
if last.account then output;
keep account May_Actual_Sum June_Actual_Sum Annual_Budget_sum;
run;

SAS using Datalines - "observation read not used"

I am a complete newbie to SAS, and all I know is basic SQL. I'm currently taking a regression class and having trouble with my SAS code.
I am trying to input two columns of data for a simple regression, where the x variable is State and the y variable is the number of accidents.
I keep getting this:
ERROR: No valid observations are found.
Number of Observations Read 51
Number of Observations Used 0
Number of Observations with Missing Values 51
Is it because DATALINES only reads numbers and not characters?
Here is the code as well as the datalines:
Data Firearm_Accidents_1999_to_2014;
ods graphics on;
Input State Sum_OF_Deaths;
Datalines;
Alabama 526
Alaska 0
Arizona 150
Arkansas 246
California 834
Colorado 33
Connecticut 0
Delaware 0
District_of_Columbia 0
Florida 350
Georgia 413
Hawaii 0
Idaho 0
Illinois 287
Indiana 288
Iowa 0
Kansas 44
Kentucky 384
Louisiana 562
Maine 0
Maryland 21
Massachusetts 27
Michigan 168
Minnesota 0
Mississippi 332
Missouri 320
Montana 0
Nebraska 0
Nevada 0
New_Hampshire 0
New_Jersey 85
New_Mexico 49
New_York 218
North_Carolina 437
North_Dakota 0
Ohio 306
Oklahoma 227
Oregon 41
Pennsylvania 465
Rhode_Island 0
South_Carolina 324
South_Dakota 0
Tennessee 603
Texas 876
Utah 0
Vermont 0
Virginia 203
Washington 45
West_Virginia 136
Wisconsin 64
Wyoming 0
;
run; proc print;
proc reg data = Firearm_Accidents_1999_to_2014;
model State = Sum_OF_Deaths;
ods graphics off;
run; quit;
OK, there are issues at several different levels here.
ODS GRAPHICS go before and after procs, not inside them.
When reading a character variable, you need to tell SAS, for example with an informat.
This allows you to read in the data. However, your regression has several issues. For one, State is a character variable, and you can't do a regression with a character variable in PROC REG. I think that issue is beyond the scope of this forum. Review your regression basics and check what you're trying to do.
Data Firearm_Accidents_1999_to_2014;
informat state $32.;
Input State Sum_OF_Deaths;
Datalines;
Alabama 526
Alaska 0
Arizona 150
Arkansas 246
California 834
Colorado 33
....
;
run;
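An alternative sketch, assuming the same data set name: instead of a separate INFORMAT statement, the colon modifier with an in-line informat tells INPUT that State is character. Only the first three data rows from the question are shown, and PROC CONTENTS is there just to confirm the variable types:

```sas
data Firearm_Accidents_1999_to_2014;
  /* ":$32." reads State as a character value (up to 32 bytes) with list input */
  input State :$32. Sum_OF_Deaths;
  datalines;
Alabama 526
Alaska 0
Arizona 150
;
run;

/* State should be listed as Char and Sum_OF_Deaths as Num */
proc contents data=Firearm_Accidents_1999_to_2014;
run;
```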

How to use proc transpose on variables that contain numbers separated with _?

Hi, I am new to SAS and have a question about PROC TRANSPOSE.
I have this data
Input
School Name State School Code 26/07/2009 02/08/2009 09/08/2009 16/08/2009
Northwest High IL 14556 06 06 06 06
Georgia High GA 147 05 05 05 06
Macy Hgh TX 45456 NA NA NA NA
The desired output is
School Name State School Code Date Absent
Northwest High IL 14556 26/07/2009 6
Northwest High IL 14556 02/08/2009 6
Northwest High IL 14556 09/08/2009 6
Northwest High IL 14556 16/08/2009 6
Georgia High GA 147 26/07/2009 5
Georgia High GA 147 02/08/2009 5
Georgia High GA 147 09/08/2009 5
Georgia High GA 147 16/08/2009 6
Macy Hgh TX 45456 26/07/2009 NA
Macy Hgh TX 45456 02/08/2009 NA
Macy Hgh TX 45456 09/08/2009 NA
Macy Hgh TX 45456 16/08/2009 NA
This is the code I have written
proc sort data=work.input;
by School_Name State School_Code;
run;
proc transpose data=work.input out=work.inputModified;
by School_Name State School_Code;
run;
I get a message saying "No variables to transpose". I think the issue is that the variable names are actual numbers, like _26_07_2009, and SAS does not recognize them.
Either way, I don't get the desired output. The dates are actual variables; when imported into SAS, they become names like _26_07_2009. Note that there are about 185 dates, and each one is a separate variable.
Thanks
The following transpose does the job:
proc transpose data=work.input out=work.inputModified;
by School_Name State School_Code;
var _:;
run;
Notice the _: notation: it selects all variables whose names start with an underscore and transposes them.
As I mentioned in the link in my earlier comments, if you do not explicitly specify the variables you want to transpose, PROC TRANSPOSE by default transposes the numeric variables that are not in the BY list. However, since your date variables are read in as strings (due to the presence of NAs), it reported NOTE: No variables to transpose.
You can use the following to convert the date and absent columns into numeric columns.
data inputModified2;
set inputModified;
format date date9.;
date = input(compress(tranwrd(_name_,'_','')), ddmmyy8.);
if col1 NE 'NA' then absent = input(col1, 8.);
else absent=.;
drop _name_ col1;
run;
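To check the whole flow end to end, here is a minimal self-contained sketch. It rebuilds a small version of the question's input with only the four dates shown, creating the date columns as character to mimic (as an assumption) how the import treated them because of the NA values. It then transposes with _: and converts the result. compress(_name_, '_') is used as a shorter equivalent of the compress/tranwrd combination above:

```sas
/* Hypothetical reconstruction of the question's input; dates stored as character */
data input;
  infile datalines dlm=',' dsd;
  length School_Name $20 State $2 School_Code 8
         _26_07_2009 _02_08_2009 _09_08_2009 _16_08_2009 $2;
  input School_Name State School_Code
        _26_07_2009 _02_08_2009 _09_08_2009 _16_08_2009;
  datalines;
Northwest High,IL,14556,06,06,06,06
Georgia High,GA,147,05,05,05,06
Macy Hgh,TX,45456,NA,NA,NA,NA
;
run;

proc sort data=input;
  by School_Name State School_Code;
run;

/* _: selects every variable whose name starts with an underscore */
proc transpose data=input out=inputModified;
  by School_Name State School_Code;
  var _:;
run;

data inputModified2;
  set inputModified;
  format date date9.;
  /* _26_07_2009 -> 26072009 -> SAS date value */
  date = input(compress(_name_, '_'), ddmmyy8.);
  /* NA becomes a numeric missing value */
  if col1 ne 'NA' then absent = input(col1, 8.);
  else absent = .;
  drop _name_ col1;
run;
```

The result has one row per school and date (12 rows here), with date and absent both numeric.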