Does anyone know why the answer to this question is A?
Given the SAS data sets ONE and TWO:
The following SAS program is submitted:
Proc sql;
Select two.*,budget from one <insert JOIN operator here> two on one.year=two.year;
Quit;
The following output is desired:
Which JOIN operator completes the program and generates the desired output?
A. FULL JOIN
B. INNER JOIN
C. LEFT JOIN
D. RIGHT JOIN
Answer: A
Thanks for your time and input.
You can answer this without seeing the data. You choose FULL JOIN when you want to keep observations for YEAR values that appear in only one of the data sets, as well as those that appear in both.
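To see why, here is a small sketch with made-up data. The exam's actual ONE and TWO listings are not reproduced here, so the values and the QTR column below are hypothetical. YEAR 2001 exists only in ONE and 2004 only in TWO, so only a FULL JOIN keeps both.

```sas
/* Hypothetical data: some YEAR values appear in only one table */
data one;
  input year budget;
  datalines;
2001 100
2002 200
2003 300
;
run;

data two;
  input year qtr;
  datalines;
2002 1
2003 2
2004 3
;
run;

proc sql;
  /* FULL JOIN: matched years plus unmatched rows from both sides */
  create table full_demo as
    select two.*, budget
    from one full join two
      on one.year = two.year;

  /* INNER JOIN for comparison: matched years only */
  create table inner_demo as
    select two.*, budget
    from one inner join two
      on one.year = two.year;
quit;
```

In FULL_DEMO the 2001 row has a missing YEAR, because YEAR comes from TWO.*; selecting `coalesce(one.year, two.year) as year` instead fills it from whichever table has it. A LEFT JOIN would drop 2004 and a RIGHT JOIN would drop 2001.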
I am new to SAS and have the following problem: I need to join records I have just imported (in one table) with records stored in another table.
I am going to run the code in SAS daily, and the table I create today (17/05/2021) by importing file 'X' needs to be joined with the table I created yesterday (16/05/2021) by importing file 'Y'.
The code will then run again tomorrow, the day after, and so on, so the records accumulate as the days go by.
To tackle this, I first create two macro variables: one with the date on which the code is executed and one with the date of the last execution.
%let daily_date = 20210423; /*YYYYMMDD*/
%let last_execution_date = 20210422; /*YYYYMMDD*/
Then I import the file. The name of the created table includes the date on which the code is executed.
data InputAC.RA_ratings&daily_date;
infile "&ruta_InputRA." FIRSTOBS=2
dsd lrecl=4096 truncover;
/* @n moves to column n of the record; #n would jump to line n instead */
input
@1 RA_Customer_ID $10.
@11 Rating_ID 10.
@21 ISRM_Model_Overlay_ID $10.
@31 Constant_ID 10.
@41 Value $100.
;
run;
proc sort data=inputac.RA_ratings&daily_date;
by RA_Customer_ID Rating_ID;
run;
Finally, InputAC.RA_ratings&daily_date is combined with InputAC.RA_ratings&last_execution_date, the table imported on the previous run.
data InputAC.RA_ratings&daily_date;
/* for matching BY values, the data set listed last supplies the values */
merge
InputAC.RA_ratings&daily_date
InputAC.RA_ratings&last_execution_date;
by RA_Customer_ID Rating_ID;
run;
This is how the tables are stored on the server.
(Ignore the date 20210413; imagine it is 20210422.)
However, I have to do this without setting the variable last_execution_date by hand.
I have been researching, but I still can't find a SAS function that helps with this.
I hope someone can help me. Thank you very much in advance.
This is a complex and interesting question from an operations point of view. The answer depends on a few things:
1. How much control do you have over the execution of this process?
2. Is "yesterday" guaranteed, or does the process need to work if the last execution date is not yesterday?
3. What should happen if the process is run twice today?
The best-practice way to solve this is to keep a dataset (or table) that stores the last execution date. That lets you handle #2 trivially; the answer to #3 may guide exactly how you store it, but it is easy to handle either way.
Say, for example, you have a table MetaAC.LastExecDate (or, in Spanish, MetaAC.UltimaFecha or similar). It could store things this way:
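data LastExecDate;
timestamp = datetime(); /* when the process actually ran */
execdate = input("&daily_date", yymmdd8.); /* quoted so INPUT receives a character value */
format timestamp datetime20. execdate yymmdd10.;
run;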
proc append base=MetaAC.LastExecDate data=LastExecDate;
run;
This lets you store an arbitrary execdate even if it is not today, and also record when you ran it (for audit purposes). You could even add who ran it, if that is useful: the automatic macro variable &SYSUSERID holds the user ID. Put all of this at the bottom of your process so the table updates on every run.
Then you can pull exactly the information you want out of this table, for example:
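proc sql noprint;
/* FORMAT= stores the value as YYYYMMDD, matching the table names */
select max(execdate) format=yymmddn8.
into :last_exec_date trimmed
from MetaAC.LastExecDate
where execdate ne today()
;
quit;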
Now, if for some reason you don't have control over this, you can determine the date another way. Again, the exact process depends on your circumstances and your answers to #2 and #3.
If your answer to #2 is that you always want it to be yesterday, this is easy:
%let daily_date=20210517;
%let last_execution_date = %sysfunc(putn(%sysevalf(%sysfunc(inputn(&daily_date,yymmdd8.))-1),yymmddn8.));
%put &=last_execution_date;
The two %SYSFUNC calls run the INPUTN and PUTN functions (the numeric versions of INPUT and PUT) from the macro language, and %SYSEVALF lets you do the arithmetic.
If you don't want it to always be the prior day (because of weekends, or other days on which you can't assume the prior day), your best bet is either to use the dictionary tables to see which tables exist and find the largest date before your date, or to use an X command to list the folder and do the same thing. The OS command is sometimes easier, because querying the SQL dictionary tables can be slow.
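A minimal sketch of the dictionary-table approach, assuming every daily table follows the RA_ratingsYYYYMMDD naming pattern. So that it runs on its own, it creates a few empty placeholder tables in WORK (hypothetical); for the real data you would set the `lib` macro variable to INPUTAC instead.

```sas
%let daily_date = 20210517;
%let lib = WORK; /* hypothetical: use INPUTAC for the real tables */

/* Placeholder tables so the example runs stand-alone */
data &lib..ra_ratings20210413 &lib..ra_ratings20210422 &lib..ra_ratings20210517;
  ra_customer_id = 'X';
run;

proc sql noprint;
  /* Largest date suffix strictly before the run date.             */
  /* MEMNAME is uppercase; 'RA_RATINGS' is 10 characters, so the   */
  /* date starts at position 11. ?? suppresses invalid-data notes. */
  select max(input(substr(memname, 11), ?? yymmdd8.)) format=yymmddn8.
    into :last_execution_date trimmed
    from dictionary.tables
    where libname = "%upcase(&lib)"
      and memname like 'RA_RATINGS%'
      and input(substr(memname, 11), ?? yymmdd8.) < input("&daily_date", yymmdd8.);
quit;

%put &=last_execution_date;
```

With the placeholder tables above, the earlier dates are 20210413 and 20210422, so the query should pick 20210422. If no earlier table exists, the macro variable will not hold a usable date, so a real process would need to check for that.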
I am working in SAS Enterprise Guide and have a one-column SAS table (id_list) that contains unique identifiers.
I want to filter another SAS table so that it contains only observations whose ID can be found in id_list.
My code so far is:
proc sql noprint;
CREATE TABLE test AS
SELECT *
FROM data_sample
WHERE id IN id_list
quit;
This code gives me the following error:
ERROR 22-322: Syntax error, expecting one of the following: (, SELECT.
What am I doing wrong?
Thanks in advance for the help.
You can't just give it the table name. You need a subquery that says which variable to read from ID_LIST.
proc sql;
CREATE TABLE test AS
SELECT *
FROM data_sample
WHERE id IN (select id from id_list) /* assumes ID_LIST's column is also named ID */
;
quit;
You could use a join in PROC SQL, but it is probably simpler to use a MERGE in a DATA step with the IN= data set option.
data want;
merge oneColData(in = A) otherData(in = B);
by id_list;
if A and B; /* keep only IDs present in both data sets */
run;
You merge the two data sets, and IF A AND B then keeps only the rows whose ID appears in both, that is, the rows of the other table whose ID is in the single-column data set. For this to work, the BY variable (id_list here) must exist in both data sets, and both data sets must be sorted by it.
The problem with using a DATA step instead of PROC SQL is that the DATA step requires the data sets to be sorted by the merge variable. If they are not, the complete data set must be sorted first.
If I have a very large data set that is not sorted by the merge variable, I have to sort it first, which can take quite some time. With the subquery in PROC SQL, I can read the data set selectively, so no sort is needed.
My bet is that PROC SQL is much faster for large data sets from which you want only a small subset.
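A side-by-side sketch with made-up data (the table contents and the VALUE column are hypothetical) showing that both approaches keep the same rows, and that only the MERGE needs the PROC SORT steps.

```sas
/* Hypothetical data: ID_LIST holds the IDs to keep, not sorted */
data id_list;
  input id;
  datalines;
3
1
;
run;

data data_sample;
  input id value;
  datalines;
1 10
2 20
3 30
3 31
4 40
;
run;

/* PROC SQL subquery: no sorting needed */
proc sql;
  create table test_sql as
    select *
    from data_sample
    where id in (select id from id_list);
quit;

/* DATA step MERGE: both inputs must first be sorted by the BY variable */
proc sort data=id_list out=id_sorted;
  by id;
run;

proc sort data=data_sample out=sample_sorted;
  by id;
run;

data test_merge;
  merge id_sorted (in = a) sample_sorted (in = b);
  by id;
  if a and b; /* rows whose ID is in both inputs */
run;
```

Both results contain the rows for IDs 1 and 3 (three rows, since ID 3 appears twice in DATA_SAMPLE); the one-to-many MERGE repeats the single ID_LIST row for each matching DATA_SAMPLE row.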
I have been working on a two-step process in SAS EG that creates a temporary table (connected to Netezza) and then uses that table to build the final summary table. In the final table I am trying to add two calculated columns: the average of all sales quotes, and the average for only the customers who accepted the quote (the expectation is that customers who accept have a lower average quote than all inquiries combined). To separate customers who accepted the quoted offer from those who did not, I have tried SUM(CASE WHEN ...) and SUM with a Boolean expression.
In the code below, the final three SUM expressions try to create the same column in different ways (I'm just looking for one that works). The first two attempts return an error saying that no function could be identified that satisfies the given argument types. The final attempt (where I started) fails with a syntax error right after the closing parenthesis around "END", although the syntax looks correct to me. Any help would be greatly appreciated:
EXECUTE (
CREATE TABLE SAMPLE1 AS
SELECT DATE
,STATE
,SUM(INQRY_CN) AS INQ_COUNT
,(SUM(BOUND_IN)/INQ_COUNT) AS CLOSURE
,SUM(QUOTE_AMOUNT) AS AVG_QTD_AM
,SUM(QUOTE_AMOUNT*(DAY_OF_QUOTE <> '01JAN1901')) AS AVG_BND_AM
,SUM(QUOTE_AMOUNT*(BOUND_IN=1)) AS AVG_BND_AM
,SUM(CASE WHEN BOUND_IN=1 THEN QUOTE_AMOUNT
ELSE . END) AS AVG_BND_AM
FROM TEMP_TABLE
GROUP BY 1,2
ORDER BY 1,2
) BY NETEZZA;
DISCONNECT FROM NETEZZA;
QUIT;
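This question has no posted answer, but one likely cause of the last error is that everything inside EXECUTE( ) BY NETEZZA is sent to Netezza unchanged, so SAS's `.` missing-value notation is not valid SQL there. A portable alternative is to leave out the ELSE branch, so non-bound rows become NULL (missing in SAS), and to use AVG, which ignores NULLs. Keep only one AVG_BND_AM expression, since a table cannot have three columns with the same name; note also that the original AVG_ columns use SUM, which gives totals rather than averages. So that it runs without a Netezza connection, the sketch below is ordinary PROC SQL against a small hypothetical TEMP_TABLE; that the same SELECT list carries over to the pass-through query is an untested assumption. In Netezza, if BOUND_IN and INQRY_CN are integers, the CLOSURE division may truncate and need a cast.

```sas
/* Hypothetical stand-in for TEMP_TABLE */
data temp_table;
  input date :yymmdd10. state $ inqry_cn bound_in quote_amount;
  format date yymmdd10.;
  datalines;
2021-05-17 CA 1 1 100
2021-05-17 CA 1 0 300
2021-05-17 CA 1 1 200
2021-05-17 NY 1 0 500
;
run;

proc sql;
  create table sample1 as
    select date,
           state,
           sum(inqry_cn) as inq_count,
           /* repeat the aggregate instead of referencing the alias */
           sum(bound_in) / sum(inqry_cn) as closure,
           avg(quote_amount) as avg_qtd_am,
           /* no ELSE: non-bound rows become NULL/missing and AVG skips them */
           avg(case when bound_in = 1 then quote_amount end) as avg_bnd_am
    from temp_table
    group by date, state
    order by date, state;
quit;
```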
This should be very simple, but somehow I am confusing myself.
data in_both
missing_name (drop = name);
merge employee (in=in_employee)
hours (in = in_hours);
by ID;
if in_employee and in_hours then output in_both;
else if in_employee and not in_hours then output missing_name;
run;
I have two questions:
(1) For the first statement, "missing_name (drop = name)", I understand that it means keeping all the data except the column named NAME. But which data is kept here? What is the input?
(2) I know we can create two datasets in one data step, but then we should write "data in_both missing_name" instead of just "data in_both", right?
Many thanks for your time and attention. I appreciate your help.
(1) The DROP= option refers to dropping variables from the dataset MISSING_NAME. With no drop= or keep= option, all variables that exist in EMPLOYEE or HOURS would be written to MISSING_NAME. You can run PROC CONTENTS on the four datasets to see which variables are included in each.
(2) As written, your code outputs two datasets, IN_BOTH and MISSING_NAME. As @Tom just commented, your DATA statement already lists both datasets, because the statement is ended by the semicolon, not by white space or a line break.
The DATA statement determines which datasets the data step creates. Dataset options, like the DROP= option in your example, can be used to control which variables are written to those datasets.
It is the OUTPUT statement that decides which observations are written. So in your example, the IF/THEN/ELSE logic determines which OUTPUT statement executes.
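A runnable sketch with made-up EMPLOYEE and HOURS data (contents hypothetical, with NAME assumed to come from HOURS) showing both points: the one DATA statement creates two data sets, and DROP= removes NAME only from MISSING_NAME.

```sas
/* Hypothetical inputs, already sorted by ID as MERGE with BY requires */
data employee;
  input id dept $;
  datalines;
1 HR
2 IT
3 OPS
;
run;

data hours;
  input id name $ hours_worked;
  datalines;
1 Ana 40
3 Luis 35
;
run;

/* One DATA statement, two output data sets; DROP= applies only to MISSING_NAME */
data in_both
     missing_name (drop = name);
  merge employee (in = in_employee)
        hours    (in = in_hours);
  by id;
  if in_employee and in_hours then output in_both;
  else if in_employee and not in_hours then output missing_name;
run;
```

IN_BOTH should get IDs 1 and 3 with all columns, and MISSING_NAME should get ID 2 without the NAME column; PROC CONTENTS on each confirms the variable lists.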
Using your posted code:
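data in_both
missing_name (drop = name);
merge employee (in=in_employee)
hours (in = in_hours);
by ID;
if in_employee and in_hours then output in_both;
else if in_employee and not in_hours then output missing_name;
run;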
Inputs: employee & hours
Outputs: in_both & missing_name
In this example the output missing_name has the column NAME dropped.
If the line breaks are confusing, the best way to see what's going on is to look for the semicolons. At first glance I got a little confused too!
First off, I know pretty much nothing about SAS, and I am an accountant rather than a programmer, but here goes:
I am trying to compare two data sets to identify differences between them, so I am using PROC COMPARE as follows:
proc compare data=table1 compare=table2
criterion=.01;
run;
This works, but it compares row by row in order, so if table2 is missing a row halfway through, every row after that one is reported as unequal.
How do I make the comparison match rows on a variable, so that PROC COMPARE finds the row for a given value of X in table1 and checks that the row with the same X in table2 has the same values?
The ID statement in PROC COMPARE is used to match rows. This code may work for you:
proc compare data=table1 compare=table2 criterion=.01;
id X;
run;
You may need to sort both data sets by X with PROC SORT before running PROC COMPARE: with an ID statement, PROC COMPARE expects the data to be sorted by the ID variables unless you add the NOTSORTED option. See the PROC COMPARE documentation on the ID statement for details.
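A small sketch with hypothetical data (TABLE2 is missing the row with X=3; the AMOUNT column is made up) showing the sort-then-compare pattern. With ID X, PROC COMPARE should report X=3 as an observation found only in TABLE1 instead of flagging every later row as unequal.

```sas
/* Hypothetical data: TABLE2 lacks the X=3 row */
data table1;
  input x amount;
  datalines;
1 10
2 20
3 30
4 40
;
run;

data table2;
  input x amount;
  datalines;
1 10
2 20
4 40
;
run;

/* Sort both tables by the ID variable (already in order here, but harmless) */
proc sort data=table1;
  by x;
run;

proc sort data=table2;
  by x;
run;

/* Rows are matched on X rather than on their position */
proc compare data=table1 compare=table2 criterion=.01;
  id x;
run;
```

After the step, the automatic macro variable &SYSINFO holds a bit-coded summary of the differences found, which can be handy if the comparison is part of an automated check.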
Here is a link to the PROC COMPARE documentation:
http://support.sas.com/documentation/cdl/en/proc/61895/HTML/default/a000057814.htm