Append data or Replace based on column values in SAS - sas

I'm trying to save the results of a forecast (a dataset) into a historical dataset on a SAS server.
I already have the path of the historical dataset. What I want to do is append the results if they don't exist in the historical dataset, or replace them if they already exist.
Below is how the table that I want to append/replace looks:
Agency  Forecast_Week  Date Fc   SubAgency  Value
New     12/26/22       12/27/22  One        3243262
New     12/26/22       12/28/22  One        3242355
New     12/26/22       12/29/22  Two        3225142
New     12/26/22       12/30/22  Two        3234235
So, if a record with the same Agency, Forecast Week, Date and SubAgency already exists, I want to replace it with the new values, but if it doesn't exist in the historical dataset I want to append it.
Do you know how I can do this?

I did something very similar not that long ago:
/* Keep the old rows whose forecast_week is not present in the new data */
proc sql;
    create table temp as
    select *
    from table_old
    where forecast_week not in (select forecast_week from table_new)
    ;
quit;
/* Append the updated and new values */
data table_old;
    set temp
        table_new
    ;
run;
I hope this helps
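The query above matches on forecast_week only. If rows should be replaced only when the full key from the question matches, a variant along these lines may be closer. This is a sketch, not a tested solution: the column names agency, forecast_week, date_fc and subagency are assumed from the table in the question (the real "Date Fc" column may be named differently), and table_old and table_new are the same placeholder names used above.

```sas
/* Keep only the old rows that have no match in table_new on the full key */
/* (column names are assumptions based on the question's table) */
proc sql;
    create table temp as
    select o.*
    from table_old as o
    where not exists (
        select 1
        from table_new as n
        where n.agency        = o.agency
          and n.forecast_week = o.forecast_week
          and n.date_fc       = o.date_fc
          and n.subagency     = o.subagency
    );
quit;

/* Stack the untouched old rows with all new rows */
data table_old;
    set temp table_new;
run;
```

Rows in table_new always win here, so a key present in both tables ends up with the new Value, and keys present only in table_new are appended.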
I asked a similar question in this post; maybe you can get some inspiration there:
SAS EG append new data and overwrite already existing rows

Related

adding a meta user to a meta group in sas

I have around 600 metadata users in SAS EGRC 6.1 on the SAS 9.4 platform.
I want to add those users to a metadata group. For this, I'm using the code below:
libname current '/tmp/temp1'; /* for the current metadata */
libname addgrps '/tmp/temp2'; /* current augmented with the changes */
libname updates '/tmp/temp3'; /* for the updates created by the mducmp macro */
options metaserver=abc
metaport=8561
metauser='sasadm#saspw'
metapass='xyz123'
metaprotocol=bridge
metarepository=foundation;
%mduextr(libref=current);
proc copy in = current out = addgrps;
/*copy the current data to the directory for the target */
run;
data investigators_1;
set current.person;
where name in ('5806036');
rename objid = memkeyid;
keep objid;
run;
data investigator_group_1;
set current.group_info;
where name='Enterprise GRC: Incident Investigation';
rename id = grpkeyid;
keep id;
run;
proc sql;
create table grpmems as
select * from investigators_1, investigator_group_1;
quit;
proc append base = addgrps.grpmems data = grpmems;
run;
/* use the mducmp macro to create the updates data */
%mducmp(master=addgrps, target=current, change=updates)
/* validate the change data sets */
%mduchgv(change=updates, target=current, temp=work, errorsds=work.mduchgverrors)
/* apply the updates */
%mduchgl(change=updates);
For the final update I tried both %mduchgl and %mduchglb, but neither gives the desired result. I tested with one user.
With %mduchgl I get the error below:
The symbolic reference for A52PDIUF.$A52PDIUF.AP0000NI did not resolve.
With %mduchglb I get the error below:
The object reference to Person was requested without an identifier.
Errors returned from Proc Metadata prevented objects from being Added, Updated, or Deleted. Table: work.mduchglb_failedobjs
identifies 1 such objects. Consult the SAS Log for the specific Metadata Server errors returned.
Any suggestions on how I can resolve the error, or another approach I should try to achieve this?
Thanks.
I don't think you should ever modify those datasets! Everything you need to achieve should be possible using proc metadata (or data step functions as last resort).
Here is a relevant SAS Communities thread. To summarise - the following snippet will add a group to a user, so long as you have the group / user URI:
<Person Id="A5NUQPXO.AP00002V">
<IdentityGroups>
<IdentityGroup ObjRef="A5NUQPXO.A500001C" />
</IdentityGroups>
</Person>
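As a rough sketch of how that fragment might be submitted, it can be wrapped in an UpdateMetadata request and passed to proc metadata. The two ids are simply the example ids from the snippet above and would need to be replaced with the real user and group ids. The sketch assumes the metadata connection options (metaserver, metaport, metauser and so on) are already set, as in the question, and it has not been run against a live server.

```sas
/* Sketch only: ids are the example values from the snippet above */
proc metadata in=
'<UpdateMetadata>
  <Metadata>
    <Person Id="A5NUQPXO.AP00002V">
      <IdentityGroups>
        <IdentityGroup ObjRef="A5NUQPXO.A500001C" />
      </IdentityGroups>
    </Person>
  </Metadata>
  <NS>SAS</NS>
  <Flags>268435456</Flags>
  <Options/>
</UpdateMetadata>';
run;
```

The flag value 268435456 is OMI_TRUSTED_CLIENT. It is worth testing with a single user first and checking the group membership afterwards before looping over all 600.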
UPDATE - for completeness, I turned this into a macro, described here
https://core.sasjs.io/mm__adduser2group_8sas.html

How to update redshift column: simple text replacement

I have a large target table with columns (id, value). I want to update value='old' to value='new'.
The simplest way would be to UPDATE target SET value='new' WHERE value='old';
However, this deletes the old rows and creates new ones, which is possibly not recommended. So I tried to do a merge-style column update:
-- staging
CREATE TABLE stage (LIKE target INCLUDING DEFAULTS);
INSERT INTO stage (SELECT id, value FROM target WHERE value='old');
UPDATE stage SET value='new' WHERE value='old'; -- ??? how do you update value?
-- merge
begin transaction;
UPDATE target
SET value = stage.value FROM stage
WHERE target.id = stage.id and target.distkey = stage.distkey; -- collocated join?
end transaction;
DROP TABLE stage;
This can't be the best way of creating the table stage: I have to do all these UPDATE delete/writes when I update this way. Is there a way to do it in the INSERT?
Is it necessary to force the collocated join when I use CREATE TABLE LIKE?
Are you updating all the rows in the table?
If yes, you can use CTAS (CREATE TABLE AS), which is the recommended method.
Assuming your table looks like this:
table1
id, col1,col2, value
You can use the following SQL to create a new table
CREATE TABLE tmp_table AS
SELECT id, col1, col2, 'new_value' AS value
FROM table1;
After you verify data in tmp_table
DROP TABLE table1;
ALTER TABLE tmp_table RENAME TO table1;
If you are not updating all the rows, you can use a filter in the CTAS and then insert the rest of the rows into the new table. Let me know if you need more info in that case.
CREATE TABLE tmp_table AS
SELECT id, col1, col2, 'new_value' AS value
FROM table1
WHERE value = 'old';
INSERT INTO tmp_table SELECT id, col1, col2, value FROM table1 WHERE value != 'old';
The next step would be to DROP table1 and rename tmp_table to table1.
Update: Based on your comment you can do the following, let me know if this solves your case.
This method basically creates a new table to replace your existing table.
I have used some of your code
CREATE TABLE stage (LIKE target INCLUDING DEFAULTS);
INSERT INTO stage SELECT id, 'new' FROM target WHERE value='old';
The INSERT above adds the rows to be updated, already carrying 'new', so there is no need to run an UPDATE afterwards.
Bring in the unchanged rows:
INSERT INTO stage SELECT id, value FROM target WHERE value!='old';
At this point the target table, your original table, is still intact.
The stage table has both sets of rows: the updated rows with the 'new' value and the rows you did not want to change.
To replace your target with stage:
DROP TABLE target;
or, to keep it for further verification:
ALTER TABLE target RENAME TO target_old;
ALTER TABLE stage RENAME TO target;
From a redshift developer:
This case doesn't require an upsert, or update+insert, and it is fine to just run the update:
UPDATE target SET value='new' WHERE value='old';
Another way would be to INSERT the rows you need and DELETE the other rows, but that's unnecessarily complicated.

SAS set statement using colon and creating a filename variable

So using SAS, I have a number of SAS monthend datasets named as follows:
mydata_201501
mydata_201602
mydata_201603
mydata_201604
mydata_201605
...
mydata_201612
Each has account information at particular monthend. I want to stack the datasets all into one dataset using colon rather than writing out the full set statement as follows:
data mynewdata;
set mydata_:;
run;
However, there is no datestamp variable within the datasets, so when I stack them I will lose the monthend information for each account. I want to know which row refers to which monthend for each account. Is there a way to automatically create a variable that names the table each row comes from? For example, the long-winded way would be this:
data mynewdata;
set mydata_201501 (in=a) mydata_201502 (in=b) mydata_201503 (in=c)...;
if a then tablename = 'mydata_201501';
if b then tablename = 'mydata_201502';
if c...
run;
but is there a quicker way using colon along these lines?
data mynewdata;
set mydata_:;
tablename = _tablelabel_;
run;
thanks
I always find clicking on comment links annoying, so hopefully here's the answer in your context. Use the INDSNAME= SET statement option to assign the dataset name to a variable:
data mynewdata;
set mydata_: indsname=_tablelabel_;
tablename = _tablelabel_;
run;
N.B. you can call _tablelabel_ whatever you want, and you may wish to change it so it doesn't look like a SAS generated variable name.
INDSNAME= only became a SAS SET statement option in version 9.2
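To see what the INDSNAME= variable actually holds, here is a small self-contained sketch. The two input datasets and the variable names src and balance are made up for illustration. The value is the full two-level name in upper case (for example WORK.MYDATA_201601), so scan is used to pull out the member name.

```sas
/* Two small hypothetical month-end datasets */
data mydata_201601;
    account = 1; balance = 100;
run;
data mydata_201602;
    account = 1; balance = 120;
run;

data mynewdata;
    length tablename $32;
    set mydata_: indsname=src;      /* src is temporary and is not written to the output */
    tablename = scan(src, 2, '.');  /* src holds e.g. WORK.MYDATA_201601 */
run;
```

Note that mydata_: picks up every dataset in the library whose name starts with mydata_, so any leftover datasets with that prefix would be stacked as well.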
Just to be clear, with my particular code, where the datasets were named mydata_yyyymm and I wanted a monthend variable with datestamp, I was able to produce this using the solution provided by mjsqu as follows (obs and keep statement provided if required):
data mynewdata;
set mydata_: (obs=100 keep=xxx xxx) indsname=_tablelabel_;
format monthend yymmdd10.;
monthend = input(scan(_tablelabel_,-1,'_'),yymmn6.);
run;

Assigning index to two concatenated tables in SAS?

I have two tables with exactly the same column headers and one row each. I have the code to concatenate them, which works fine.
data concatenation;
set CURR_CURR CURR_30;
run;
However, there is no index in the output to say which row corresponds to which table.
I've tried using 'create index' and 'index create' already but they don't work syntactically. Simply I'd just want to add a column of strings and move it to the front of all the other columns in the data set.
Use the INDSNAME= option on the SET statement, plus a variable to store the information.
If you put the LENGTH statement ahead of your SET statement, that variable is created as the first column.
Just a note that this isn't the same as an 'index'. An index in SAS has a different meaning, which isn't what you're trying to create here.
data concatenation;
length dset source $50.;
set CURR_CURR CURR_30 indsname=source;
dset=source;
run;
Reeza's answer is very similar to something I figured out that worked as well. Here's my version as an alternative.
data concatenation;
length id $ 10;
set CURR_CURR (in=a) CURR_30 (in=b);
if a then id = 'curr_curr';
else if b then id = 'curr_30';
run;

Search through the data with a loop or nested loops in SAS

I am a beginner in SAS and have the following problem. I have a big data set (my_time), imported into SAS, looking as follows
I want to implement the following algorithm:
For every account, look for a status, and if it is equal to na, look for the same contract one year later (one year after it gets the status na) and put the information "my_date", "status" and "money" into three new columns "new_my_date", "new_status" and "new_money", like in
I need something like COUNTIFS in Excel. I found loops in SAS such as DO, but not for the purpose of looking through all rows.
I do not even know which keywords to search for.
I would be grateful for any hint.
A simple method would be to sort the data, then use the special first. variable prefix together with a retain statement to get the desired result.
Step 1: Sort by account, date, and status
proc sort data=have;
by account my_date status;
run;
This will guarantee that your data is in the order that you need. Since we are looking only for year+1 after the status = 'na', anything that happens in between doesn't matter.
Step 2: Use a data step to remember the first year when na happens for that account
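data want;
    set have;
    by account my_date status;

    /* Keep these values across rows; year and month are assumed to exist in have */
    retain first_na_year first_na_month first_na_account;

    /* Reset the remembered values at the start of each account */
    if first.account then call missing(first_na_year, first_na_month, first_na_account);

    /* Remember when the account had status na (overwritten by any later na row) */
    if status in ('na', 'tna') then do;
        first_na_year    = year;
        first_na_month   = month;
        first_na_account = account;
    end;

    /* One year later, same month, same account, and no longer na */
    if year = first_na_year + 1
       and month = first_na_month
       and account = first_na_account
       and status not in ('na', 'tna')
    then do;
        new_status  = status;
        new_my_date = my_date;
        new_money   = money;
    end;

    /* Subsetting IF: output only rows where all three new variables are filled */
    if cmiss(new_status, new_my_date, new_money) = 0;

    drop first:;
run;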
For each row, we check the following:
Is the status not 'na'?
Is the year one greater than, and the month the same as, the last time it was 'na'?
Is this the same account we're comparing?
If all are true, then we want to create the three new variables.
What's happening:
SAS is inherently a looping language, so we do not need to use a do loop here. When SAS moves to a new row, it resets variables created in the data step (such as new_status) to missing, and variables read by the set statement are replaced with the values from the new row in the Program Data Vector (PDV).
Since the SAS data step only goes forwards and doesn't like to go backwards, we want it to remember when na occurred for that account. retain tells SAS not to discard the value of a variable when it reads a new row.
When we are done doing our comparison and we've moved on to the next account, we reset these variables to missing. by group processing allows SAS to know exactly where the first and last occurrence of the account is in the dataset.
At the end, we output only if all 3 of the new variables are not missing. cmiss counts how many of its arguments are missing, so a result of 0 means all three are filled. Note that output is always implied before the run statement, so we simply need to use an "if without then" in this case.
The final statement, drop first:;, is a simple shortcut to remove any variables whose names start with first. This prevents the helper variables from appearing in the final dataset.
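The retain and first. mechanics described above can be tried in isolation on a toy dataset. The sketch below is a simplified illustration, not the answer's full logic: the dataset have, its values, and the names na_year and flag are invented for the example, and it only checks the year (no month, no output filtering).

```sas
/* Hypothetical toy data: one status per account and year */
data have;
    input account $ year status $;
    datalines;
A 2015 ok
A 2016 na
A 2017 ok
B 2015 ok
;
run;

proc sort data=have;
    by account year;
run;

data demo;
    set have;
    by account;
    retain na_year;                               /* survives from row to row */
    if first.account then call missing(na_year);  /* reset for each new account */
    if status = 'na' then na_year = year;         /* remember the year of the na row */
    flag = (year - 1 = na_year);                  /* 1 when this row is one year after na */
run;
```

With this data only the row for account A in 2017 gets flag = 1. Account B starts with na_year reset to missing, so nothing carries over from account A.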