I have a macro in SAS EG that reads all the CSV files from a folder into SAS EG. The number of files in this folder is always changing, but the columns in each file remain the same. How do I write a macro that unions together the tables that are read in?
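A minimal sketch of the usual approach, assuming the macro reads each CSV into WORK with a common prefix (csv_1, csv_2, and so on; the prefix and names here are hypothetical). Because every file has the same columns, a single SET statement with the colon wildcard unions however many tables were created:

/* union every WORK dataset whose name begins with CSV_ */
data work.all_files;
    set work.csv_:;   /* the colon matches any suffix, so the file count can vary */
run;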
Related
I have to perform statistical analysis in SAS on a file with hundreds of observations and 7 variables (columns). I know that it is necessary to insert all the observations after "cards" or "datalines", but obviously I can't type them all in. What can I do? Moreover, the given data file is already a .sas7bdat.
Then, since (in my case) the multiple correspondence analysis requires only six of the seven variables, does this affect what I have to write in INPUT and/or in CARDS?
You only use CARDS when you're trying to write a data set manually. If you already have a SAS data set (.sas7bdat) you can usually use it directly (there are some exceptions, but they likely don't apply here).
First create a libname to the folder where the file is:
libname myFiles 'path to folder with sas file';
Then load it into your work library - this is a temporary space that is cleaned up when you're done, so nothing here is saved permanently.
The following step copies the file into WORK, which often makes later processing faster.
data myFileName;   /* copy into the temporary WORK library */
    set myFiles.myFileName;
run;
Alternatively, you can work with the file directly from that library by referencing it as myFiles.myFileName in your code.
proc means data=myFiles.myFileName;
run;
This should get you started, but you should take the free SAS e-course to understand the basics; it will save you time overall.
Just tell SAS to use the dataset. The INPUT statement (along with CARDS/DATALINES or an INFILE statement) is for reading from text files.
proc corresp data='/my directory/mydataset.sas7bdat' .... ;
...
run;
You could also make a libref that points to the directory and use a two-level name to reference the dataset.
libname myfiles '/my directory/';
proc corresp data=myfiles.mydataset .... ;
...
run;
I am following this example:
LazySimpleSerDe for CSV, TSV, and Custom-Delimited Files - TSV example
Summary of the code:
CREATE EXTERNAL TABLE flight_delays_tsv (
    yr INT,
    quarter INT,
    month INT,
    ...
    div5longestgtime INT,
    div5wheelsoff STRING,
    div5tailnum STRING
)
PARTITIONED BY (year STRING)
ROW FORMAT DELIMITED
FIELDS TERMINATED BY '\t'
ESCAPED BY '\\'
LINES TERMINATED BY '\n'
LOCATION 's3://athena-examples-myregion/flight/tsv/';
My questions are:
My TSV does not have column names.
Is it OK if I just list the columns as c1, c2, … and declare all of them as STRING?
I do not understand this:
PARTITIONED BY (year STRING)
In the example, the column ‘year’ is not listed among the other columns…
Column names
The column names are defined by the CREATE EXTERNAL TABLE command. I recommend you name them something useful so that it is easier to write queries. The column names do not need to match any names in the actual file. (Athena does not interpret header rows.)
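For a headerless TSV you can therefore choose any names you like. A minimal sketch with hypothetical column names and a hypothetical bucket path; the first field in each row maps to the first column, the second to the second, and so on:

CREATE EXTERNAL TABLE my_flights (
    flight_date STRING,
    carrier     STRING,
    delay_min   INT
)
ROW FORMAT DELIMITED
FIELDS TERMINATED BY '\t'
LOCATION 's3://my-bucket/flights/tsv/';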
Partitioning
From Partitioning Data - Amazon Athena:
To create a table with partitions, you must define it during the CREATE TABLE statement. Use PARTITIONED BY to define the keys by which to partition data.
The fields used to partition the data are NOT stored in the files themselves, which is why they are not in the column list. Rather, the column value is stored in the name of the directory.
This might seem strange (storing values in a directory name!) but it actually makes sense, because it avoids situations where a file contains an incorrect value. For example, if there is a year=2018 folder, what happens if a file inside it contains a column where the year is 2017? This is avoided by storing the year in the directory name, so that any files within that directory are assigned the value denoted by the directory name.
Queries can still use WHERE year = 2018 even though it isn't listed as an actual column.
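For illustration, a hypothetical layout and query (the bucket name and file names are made up):

s3://my-bucket/flight/tsv/year=2017/part-001.tsv
s3://my-bucket/flight/tsv/year=2018/part-001.tsv

SELECT COUNT(*)
FROM flight_delays_tsv
WHERE year = '2018';   -- reads only files under the year=2018 directory

Since the partition key was declared as STRING, comparing against the string '2018' is the safer form.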
See also: LanguageManual DDL - Apache Hive - Apache Software Foundation
The other neat thing is that data can be updated by simply moving a file to a different directory. In this example, it would change the year value as a result of being in a different directory.
Yes, it's strange, but the trick is to stop thinking of it like a normal database and appreciate the freedom that it offers. For example, appending new data is as simple as dropping a file into a directory. No loading required!
I was given a list of files that need to be imported into SAS; however, I am struggling to import them correctly. The files are formatted as such:
There is one "Header File" that contains a few lines of metadata followed by:
RECORD 1
Header column 1
Header column 2
Header column 3
Header column 4
Record 2
Header column 1
Header column 2
Header column 3
Header column 4
Header column 5
Header column 6
...
RECORD 3
...
And then "data files" which contain no meta data (that I am aware of) and are simply column ("|") delineated.
I was told these files were generated using SAS and I believed them to be a library, however:
Proc CIMPORT data="C..."
did not work.
I can import them individually using
Proc Import data="";
DBMS=DLM;
Run;
I asked this question earlier to no avail; I have included more information this time. I feel like this is something really easy that I am just missing somehow. Thank you very much in advance.
You can use PROC IMPORT to read the pipe-delimited files. Use the GETNAMES=NO statement to tell it to generate its own names. You can then use the metadata from the first file to generate RENAME statements to change the names.
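A minimal sketch, with a hypothetical file path and hypothetical final names standing in for whatever the metadata file actually says:

proc import datafile='/data/datafile1.txt' out=work.data1
        dbms=dlm replace;
    delimiter='|';
    getnames=no;   /* SAS generates VAR1, VAR2, ... */
run;

/* rename the generated variables using the names from the header file */
proc datasets library=work nolist;
    modify data1;
    rename VAR1=header_col1 VAR2=header_col2;
quit;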
PROC CIMPORT is for reading transport files generated by PROC CPORT.
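For reference, if you ever do receive a CPORT transport file, PROC CIMPORT takes it via INFILE= rather than DATA= (the path here is hypothetical):

proc cimport infile='/data/transport.cpt' library=work;
run;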
For a more complete example with code, please post some actual samples of the data files, especially the one with the metadata. If the metadata is complete, then you could probably skip PROC IMPORT and just use the metadata to write data steps that read the data files directly.
I have a directory containing a set of SAS scripts, data files, and also CSV files. I now want to associate this directory with a library in SAS. I've created a library with the path to the directory, but its contents appear empty when I look in SAS, even though Windows Explorer shows the files.
How can I create a SAS library with all of these existing files inside it?
Using Windows 8 and SAS 9.3.
PS: I'm new to SAS, hence what is possibly a very easy question.
A SAS library can contain objects such as SAS tables, SAS views, and SAS catalogs (only SAS can read a catalog; to the operating system it's just a file).
In a SAS catalog you can store, for example, SAS formats and SAS macros.
To assign a library, use a LIBNAME statement.
libname libref base "path_to_directory";
If you want to read a plain file with SAS, e.g. a CSV or TXT file, use a FILENAME statement. Check the documentation for examples.
filename fileref "path_to_file";
In other words, it's correct that your library appears empty, because the OS directory contains no SAS objects.
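To actually read one of the CSV files into a SAS table, a minimal sketch (the fileref target and the column names are hypothetical):

filename mycsv "path_to_file/example.csv";

data work.example;
    infile mycsv dsd firstobs=2;   /* dsd handles quoted commas; firstobs=2 skips a header row */
    input id name :$20. value;
run;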
We are evaluating the time taken by two sets of code in SAS. Is there a way we can write/tabulate the OPTIONS FULLSTIMER results into a SAS dataset, without copying the entire log file into Notepad?
I would go about it like this.
Create separate SAS program files containing your code for each approach. Include options fullstimer at the top of both.
Batch submit your programs and write the logs to permanent files using the -log command line option.
Create a simple program that reads in both logs and compares the results.
The last step can be accomplished by using data steps with the INFILE statement and restricting the input records to those which are standard output from FULLSTIMER. Then you can compare the created datasets however you wish, e.g. via PROC COMPARE.
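A minimal sketch of that last step, assuming a hypothetical log location and keeping only the statistic lines FULLSTIMER writes (real time, cpu time, memory):

data work.timings1;
    infile '/logs/program1.log' truncover;   /* hypothetical log path */
    length line $256;
    input line $char256.;
    /* subsetting IF: keep only the FULLSTIMER statistic lines */
    if index(line, 'real time')
       or index(line, 'cpu time')
       or index(line, 'memory');
run;

Repeat for the second log, then compare the two resulting datasets, e.g. with PROC COMPARE.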
SAS has provided a log parsing macro that looks as though it should do the sort of thing that you want. It's available here:
http://support.sas.com/kb/34/301.html