I have two .csv files which I need to merge in SAS.
The first file contains the data something like this :
column - products
3 - sales
3- sales
to more than 8 rows and then there are observations such as
00000ETH - sales
00000TRF - sales
The second file has data like this -
Columns - Products
3 - brand
4 - brand
0000ETH - brand
0000TRF - brand
Basically I have to make a new column "Brand" in the first file.
But when I import the first file, SAS reads the first observation as 000000003, while it remains "3" in the second file. It is treating the column as numeric because the first 8 rows of the first file are numeric.
I have tried changing "TypeGuessRows" in the Windows registry, but it has not worked.
Please help !
If they're CSV files, TYPEGUESSROWS is not helpful, as that is an Excel setting. You can set GUESSINGROWS directly in the PROC IMPORT instead.
Try reading the file in with a data step, however; then you don't need to guess anything.
data one;
length column $10 products $25; *or whatever is right;
infile "yourfile.csv" dlm=',' dsd lrecl=100 truncover; *or similar depending on your file;
input column $ products $;
run;
Similar for the other dataset.
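Once both files are read with the key as character, the merge itself can be sketched as below. This is only an illustration: the inline data stands in for the two CSV files, and the names two, brand and merged are assumptions, not taken from the question.

```sas
/* Stand-in for the first file: key read as character */
data one;
  length column $10 products $25;
  infile datalines dlm=',' dsd truncover;
  input column $ products $;
datalines;
3,sales
00000ETH,sales
;

/* Stand-in for the second file: same key, second field stored as brand */
data two;
  length column $10 brand $25;
  infile datalines dlm=',' dsd truncover;
  input column $ brand $;
datalines;
3,brand
00000ETH,brand
;

/* MERGE with BY needs both inputs sorted by the key */
proc sort data=one; by column; run;
proc sort data=two; by column; run;

data merged;
  merge one(in=in_one) two;
  by column;
  if in_one;   /* keep every row of the first file, matched or not */
run;
```

Because both keys are character with the same length, "3" in one file matches "3" in the other, with no numeric conversion involved.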
Here is the scenario: first I import a csv file into a temp data set; then, in PROC SQL, I insert the records of the temp data set into the database. This is where I am stuck:
For audit purposes, I want to update one record in a database table to record this insert operation:
update table1
set inserted_record=&SQLOBS, insert_date=today()
where filename=&csv_file_name;
But the length of the file name is more than 32 characters. What should I do? Thanks!
My SAS code is like the following:
DATA Temp1;
File_name="kkkkkkkkkkk_product_information_20200101_20211005_FULL.csv";
run;
Data work.temptable;
length
Product_ID $36
Worth_USD $9;
Format
Product_ID Char36.
Worth_USD Char9.;
Informat
Infile
input
Run;
Libname lib1 Teradata user=userid Password=xxxxxx;
proc SQL;
insert into lib1.table1(col1,col2)
select product_id,worth_usd from work.temptable;
update lib1.import_summary set inserted_record=&sqlobs,operated_date=today() where file_name='&file_name';
quit;
According to the log, the insert operation succeeds but the update does not (the log shows "No rows were updated"). I checked the import_summary table, and there is already a record whose file_name is "kkkkkkkkkkk_product_information_20200101_20211005_FULL.csv", so it should be updated. Can anyone comment? Thanks!
From the code shown, the length shouldn't affect anything. The 32-character limit applies only to data set names, and this is a value, not a data set name, so the file name has no 32-character limit. You do need quotes around the file name, though, as it's likely a character field.
update table1
set inserted_record=&SQLOBS, insert_date=today()
where filename="&csv_file_name";
EDIT:
This needs double quotes, not single quotes, because macro variables are not resolved inside single quotes:
where file_name='&file_name';
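The difference between the two quote styles can be seen in a short sketch. The question never shows how &file_name is created, so the CALL SYMPUTX step here is an assumption: a variable in Temp1 is not a macro variable until something copies it into one.

```sas
data Temp1;
  File_name = "kkkkkkkkkkk_product_information_20200101_20211005_FULL.csv";
run;

/* Assumed step: copy the data set value into a macro variable */
data _null_;
  set Temp1;
  call symputx('file_name', File_name);
run;

/* Single quotes leave the text &file_name untouched */
%put In single quotes: '&file_name';

/* Double quotes let the macro processor substitute the value */
%put In double quotes: "&file_name";
```

With single quotes, the WHERE clause compares file_name to the literal text &file_name, which matches no row. That is consistent with the "No rows were updated" message in the log.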
I have a file from Excel with a column in short date format, but when SAS reads it in, the dates turn into numbers in the 40000 range. When I try to convert these back to dates with the formula below, the year comes out as 2077. Is there a way to keep the date in its original format on read-in, or to stop it turning into these numbers, which are nowhere near the 2017 and 2018 dates my file starts with? Does that make sense?
data change_date;
format Completed_Date mmddyy8. ;
set check;
completed_date = date_completed;
if 42005 => date_completed >=43466 and date_completed ^=. then
Completed_date = Date_Completed-21916; *commented out 12-21-17 Xalka
dates back to how they are expected;
run;
I am pretty sure this is a duplicate question, but I can't find it.
This is usually caused by mixing character and date values in the same column. That makes SAS import the column as a character variable, so the actual dates are copied as character versions of the integers that Excel uses to store dates.
Frequently the culprit is entries that look like dates but are really character strings in the Excel file. The best fix is to correct the Excel file so that the column contains only dates. Otherwise you need to convert the strings to integers and adjust the values to account for the difference in base dates.
So if your values are in a SAS dataset named HAVE, in the character variable DATESTRING, you could use this data step to create a new variable with an actual date value.
data want ;
set have ;
if indexc(datestring,'-/') then date=input(datestring,anydtdte32.);
else date = input(datestring,32.) + '01JAN1900'D -2;
format date yymmdd10. ;
run;
The minus 2 accounts for two things: Excel starts numbering at 1 where SAS starts at 0, and Excel wrongly treats 1900 as a leap year.
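As a quick check of the arithmetic, here is a small sketch that converts one Excel serial number by hand. The value 43101 (Excel's serial number for 1 January 2018) is chosen purely for illustration.

```sas
data _null_;
  excel_serial = 43101;            /* Excel serial for 2018-01-01 (illustrative) */
  offset = '01JAN1900'd - 2;       /* '01JAN1900'd is -21914, so this is -21916 */
  sas_date = excel_serial + offset;
  /* writes: offset=-21916 sas_date=2018-01-01 */
  put offset= sas_date= yymmdd10.;
run;
```

The same offset, 21916, is what you subtract from an Excel serial number to get a SAS date.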
Excel and SAS use different base dates internally.
Day 0 in SAS is 1 January 1960, while Excel numbers 1 January 1900 as day 1.
So you will need to convert the Excel serial number to a SAS date using the formula below.
SAS_date = Excel_date - 21916;
data dateExample;
set dates;
SAS_date = Excel_date - 21916; *Excel_date holds the Excel serial number;
dt = SAS_date;
format dt date9.; *display the converted value as a date;
run;
I have a SAS dataset with columns shiyas1, shiyas2 and shiyas3 in it, along with some other columns. I want to concatenate all the columns whose names start with shiyas.
We can't use cats(shiyas1,shiyas2,shiyas3), because similar datasets have columns up to shiyas10. As I am writing general-purpose SAS code, I also cannot hard-code cats(shiyas1,shiyas2 .... shiyas10).
So how can we do this?
When I tried cats(shiyas1,shiyas2 .... shiyas10), even though my dataset only has columns up to shiyas3, it created columns shiyas4 to shiyas10 filled with missing values (.).
So one solution would be to concatenate only the shiyas columns the dataset actually has, or to delete the unnecessary shiyas columns afterwards.
Please help me.
Use a variable list.
data have;
input (shiyas1-shiyas3) (:$1.);
cards;
1 2 3
;
data want;
set have;
length cat_shiyas $ 100 /*large enough to hold the content*/
;
cat_shiyas=cats(of shiyas:);
run;
Use the OF keyword (which lets a function take a list of variables, similar to an array) with the : prefix wildcard. This will concatenate all columns whose names begin with 'shiyas':
cats(of shiyas:)
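The same prefix list works with CATX if a separator is wanted between the values. The following is a minimal sketch; the result name all_values and the '-' separator are assumptions for illustration. The result variable deliberately does not start with shiyas, so the prefix list cannot pick it up.

```sas
data have;
  input (shiyas1-shiyas3) (:$1.);
datalines;
1 2 3
;

data want;
  set have;
  length all_values $100;                /* large enough to hold the content */
  all_values = catx('-', of shiyas:);    /* gives 1-2-3 for the row above */
run;
```

Only variables that already exist are included, so a dataset with columns up to shiyas10 needs no code change.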
I'm trying to concatenate multiple datasets in SAS, and I'm looking for a way to store information about individual dataset names in the final stacked dataset.
For example, the initial datasets are "my_data_1", "abc" and "xyz", each with columns 'var_1' and 'var_2'.
I want to end up with a "final" dataset with columns 'var_1', 'var_2' and 'var_3', where 'var_3' contains the value "my_data_1", "abc" or "xyz" depending on which dataset a particular row came from.
(I have a kludgy solution for this, i.e. adding the table name as an extra variable in every individual dataset. But I have around 100 tables to stack, and I'm looking for an efficient way to do it.)
If you have SAS 9.2 or newer, you have the INDSNAME= option:
http://support.sas.com/kb/34/513.html
So:
data final;
length datasetname $41; *equal to or longer than the longest dataset name, including the library and dot;
set my_data_1 abc xyz indsname=dsname; *dsname is temporary and is not written to the output;
datasetname=dsname;
run;
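The value supplied by INDSNAME= includes the libref and is in uppercase (for example WORK.ABC). If only the member name is wanted in var_3, as in the question, a sketch along these lines may help. The three small input datasets are made up for illustration.

```sas
data my_data_1; var_1 = 1; var_2 = 2; run;
data abc;       var_1 = 3; var_2 = 4; run;
data xyz;       var_1 = 5; var_2 = 6; run;

data final;
  length var_3 $32;                          /* a member name is at most 32 characters */
  set my_data_1 abc xyz indsname=source;     /* source is temporary and is not kept */
  var_3 = lowcase(scan(source, 2, '.'));     /* text after the dot, e.g. abc */
run;
```

With around 100 tables, only the SET statement grows. No per-table IF logic is needed.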
Use the IN= data set option when you set each data set:
data final;
set my_data_1(in=a) abc(in=b) xyz(in=c);
if a then var_3='my_data_1';
else if b then var_3='abc';
else if c then var_3='xyz';
run;
What's wrong with the below SAS code? The single date column cannot be read correctly.
DATA test;
INPUT mydate MMDDYY8.;
FORMAT mydate YYMMDD10.;
DATALINES;
01-22-98
03-03-97
;
PROC PRINT DATA = test;
RUN;
Edit: Thanks for the answer. Another follow-up question is, when I try to read CSV format where datetime is quoted, it always fails to read correctly. How to read CSV format with quoted datetime values correctly? DSD option doesn't help much in my case.
Try left-aligning the datalines.
SAS is a free-format language: any statement can start on any line, one statement can span several lines, and several statements can share one line.
The data lines after a DATALINES statement are different, because their column positions matter. With formatted input such as MMDDYY8., SAS reads exactly 8 columns starting at the current pointer position, so leading blanks count as part of the field and the end of the value is cut off.
Hence the mistake in your code is indenting the data instead of starting it in column 1.
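An alternative to left-aligning is modified list input: a colon before the informat makes SAS skip leading blanks and read up to the next delimiter. The same modifier, combined with DSD, is one way to approach the follow-up about quoted datetime values in CSV data. The sketch below makes some assumptions: the sample rows and the names test_csv, id and dt are invented, and ANYDTDTM40. is just one informat that may suit this datetime layout.

```sas
/* Colon modifier: leading blanks no longer shift the field */
data test;
  input mydate :mmddyy8.;
  format mydate yymmdd10.;
datalines;
   01-22-98
03-03-97
;

/* Quoted datetime values in comma-separated data (hypothetical layout) */
data test_csv;
  infile datalines dsd truncover;   /* DSD: comma delimiter, quotes around values handled */
  input id dt :anydtdtm40.;
  format dt datetime19.;
datalines;
1,"2017-01-22 10:15:00"
2,"2018-03-03 08:00:00"
;

proc print data=test; run;
proc print data=test_csv; run;
```

If the real file uses a different datetime layout, a more specific informat than ANYDTDTM. may be needed.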