Using the HPTMINE procedure in SAS, is it possible to write the output to the same file, or to a file in the same format as the input, after the stemming operation is done?
It seems this is not possible. Stemming is part of the parsing phase of the HPTMINE procedure. According to the documentation, the only output data set for the PARSE statement that contains the full list of terms after stemming is the OUTTERMS= data set.
Instead, to replace terms with their stems in your original file, you can use the OUTTERMS= data set in SAS code that follows the HPTMINE procedure.
For example, create a dictionary:
proc sql;
create table work.child_parent as
select child.term as term_child, parent.term as term_parent
from OUTTERMS child
inner join OUTTERMS parent
on child.parent = parent.key
where child._ispar="." and child.role ne "NOUN_GROUP"
;
quit;
and use the dictionary as a lookup table in code that goes through the original text file and replaces each encountered term_child with its stem, term_parent.
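A minimal sketch of that lookup step, not a tested implementation. It assumes the original text has already been read into a data set WORK.DOCS with one character variable TEXT (both names are hypothetical), that words are separated by blanks, and that the terms in OUTTERMS are stored in lowercase. Punctuation attached to a word will prevent a match.

```sas
/* Sketch only: WORK.DOCS and its character variable TEXT are hypothetical names. */
data work.docs_stemmed;
  /* Define term_child and term_parent in the PDV without reading any rows */
  if 0 then set work.child_parent;
  length word $256 new_text $32767;

  /* Load the child -> parent dictionary into memory once */
  if _n_ = 1 then do;
    declare hash stems(dataset: 'work.child_parent');
    stems.defineKey('term_child');
    stems.defineData('term_parent');
    stems.defineDone();
  end;

  set work.docs;
  new_text = '';
  do i = 1 to countw(text, ' ');
    word = scan(text, i, ' ');
    /* Assumes dictionary terms are lowercase; swap in the parent if one exists */
    if stems.find(key: lowcase(word)) = 0 then word = term_parent;
    new_text = catx(' ', new_text, word);
  end;
  drop i word term_child term_parent;
run;
```

The hash object keeps the dictionary in memory, so the text data set is read only once. If a child term appears more than once in the dictionary, only its first entry is used.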
I have to perform a statistical analysis in SAS on a file with hundreds of observations and 7 variables (columns). I know that it is necessary to insert all the observations after "cards" or "datalines", but obviously I can't type them all. What can I do? Moreover, the given data file is already a .sas7bdat file.
Then, since (in my case) the multiple correspondence analysis requires only six of the seven variables, does this affect what I have to write in INPUT and/or CARDS?
You only use CARDS when you're trying to write a data set by hand. If you already have a SAS data set (sas7bdat), you can usually use it directly (there are some exceptions, but they likely don't apply here).
First create a libname to the folder where the file is:
libname myFiles 'path to folder with sas file';
Then load it into your work library - this is a temporary space that is cleaned up when you're done so no files here are saved permanently.
This copies it over to that library - which is often faster.
data myFileName;
set myFiles.myFileName;
run;
Alternatively, you can work with the file directly from that library by referencing it as myFiles.myFileName in your code.
proc means data=myFiles.myFileName;
run;
This should get you started, but you should take the free SAS e-course to learn the basics; it will save you time overall.
Just tell SAS to use the dataset. The INPUT statement (along with the CARDS/DATALINES or INFILE statement) is for reading from text files.
proc corresp data='/my directory/mydataset.sas7bdat' .... ;
...
run;
You could also make a libref that points to the directory and use a two-level name to reference the dataset.
libname myfiles '/my directory/';
proc corresp data=myfiles.mydataset .... ;
...
run;
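On the six-of-seven-variables part of the question: because the data set already exists, nothing needs to change in INPUT or CARDS. A hedged sketch of two ways to leave the seventh variable out follows. The variable names v1 to v7 are hypothetical placeholders for your own columns.

```sas
/* Variable names v1-v7 are hypothetical placeholders for your seven columns */
libname myfiles '/my directory/';

/* Option 1: name only the six variables the analysis needs */
proc corresp data=myfiles.mydataset mca;
  tables v1 v2 v3 v4 v5 v6;
run;

/* Option 2: drop the unused variable when making a working copy */
data work.six_vars;
  set myfiles.mydataset(drop=v7);
run;
```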
I have a table in SAS which contains the format information I want. I want to bin this data into the categories given.
What I don't know how to do is create either an xform or a format file from the data.
An example table looks like this:
TxtLabel  Type  FmtName  label  Hlo  count
.         I     FAC1f    0      O    1
1996      I     FAC1f    1           2
1997      I     FAC1f    2           3
I want to date all years in a different data set as after 1997 OR before 1996.
The problem is that I know how to do this by hard coding it, but the numbers in these files change each time, so I'm hoping to use the information in the table to generate the bins rather than hard code them.
How do I go about binning my data using a column from another dataset for my categorization?
Edit
I have two data sets: one that looks like the one I have included, and one that has a column titled "YEAR". I want to bin the second data set using the categories from the first. In this case there are two available years in TxtLabel. There are multiple tables like this, so I'm looking for a way to generate PROC FORMAT code from the table rather than hard coding the values.
This should create the desired format:
Proc FORMAT CNTLIN=MyCustomFormatControlData;
run;
You can then use it in a DATA Step, or apply it to a column in a data set.
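Both uses can be sketched as follows, assuming the format FAC1f has already been created by the PROC FORMAT step and that WORK.HAVE (a hypothetical name) contains a numeric variable YEAR.

```sas
/* Assumes the format FAC1f already exists and WORK.HAVE has a numeric YEAR */
data work.binned;
  set work.have;
  length bin $8;
  bin = put(year, FAC1f.);   /* new variable holding the bin label */
run;

/* Or attach the format so that a procedure groups by the formatted value */
proc freq data=work.have;
  tables year;
  format year FAC1f.;
run;
```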
Binning the data might be construed as 'data set splitting' but your question does not make it clear if that is so. Generic arbitrary splitting is often done with one of these techniques:
wall paper source code resolved from macro variables populated from information garnered in a Proc SQL or Proc FREQ step
dynamic data splitting using hash object for grouping records in memory, and saved to a data set with an .output() call.
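The hash object technique can be sketched roughly as below. This is an illustration under assumptions, not a drop-in solution: WORK.HAVE and its character variable BIN are hypothetical, and every BIN value must be usable as part of a data set name.

```sas
/* Sketch only: WORK.HAVE and its character variable BIN are hypothetical. */
proc sort data=work.have out=work.have_sorted;
  by bin;
run;

data _null_;
  /* A new hash instance is created on each iteration, i.e. once per BY group */
  declare hash grp(dataset: 'work.have_sorted(obs=0)', multidata: 'y');
  grp.defineKey('bin');
  grp.defineData(all: 'y');
  grp.defineDone();

  /* Read one whole BY group into the hash */
  do until (last.bin);
    set work.have_sorted;
    by bin;
    grp.add();
  end;

  /* Write the group to a data set named after its bin, e.g. WORK.WANT1 */
  grp.output(dataset: cats('work.want', bin));
  grp.delete();   /* free the memory before the next group */
run;
```

Unlike the explicit SELECT approach, the output data set names do not have to be known when the step is written.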
Sample code for explicit binning
data want0 want1 want2 want3 want4 want5 wantOther;
set have;
* explicit wall paper;
select (put(year,FAC1f.));
when ('0') output want0;
when ('1') output want1;
when ('2') output want2;
when ('3') output want3;
when ('4') output want4;
when ('5') output want5;
otherwise output wantOther;
end;
run;
This is the construct that macro-generated source code can produce. It requires
one pass to determine the when/output lines that are to be generated
a second pass to apply the lines of code that were generated.
If this is the data processing that you are attempting:
do some research (plenty of info out there)
write some code
make a new question if you get errors you can't resolve
Proc FORMAT
Proc FORMAT has a CNTLIN= option for specifying a data set containing the format information. The structure and values expected in the Input Control Data Set (the CNTLIN= data set) are described in the Output Control Data Set documentation. Some of the important control data columns are:
FMTNAME
specifies a character variable whose value is the format or informat name.
LABEL
specifies a character variable whose value is associated with a format or an informat.
START
specifies a character variable that gives the range's starting value.
END
specifies a character variable that gives the range's ending value.
As the requirements of the custom format get more sophisticated, you will need more information variables in the input control data set.
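A minimal sketch of an input control data set built from these columns, plus TYPE and HLO. The format name YEARBIN and the labels are hypothetical; in practice the rows would come from your own table rather than from assignment statements.

```sas
/* Hypothetical control data set for a numeric format named YEARBIN */
data work.ctrl;
  length fmtname $8 type $1 start end $16 label $16 hlo $1;
  fmtname = 'YEARBIN';
  type    = 'N';                      /* N = numeric format */
  start = '1996'; end = '1996'; label = '1'; hlo = ' '; output;
  start = '1997'; end = '1997'; label = '2'; hlo = ' '; output;
  /* HLO='O' marks the catch-all OTHER range */
  start = '**OTHER**'; end = '**OTHER**'; label = '0'; hlo = 'O'; output;
run;

proc format cntlin=work.ctrl;
run;
```

With this control data, years other than 1996 and 1997 should fall into the OTHER range and receive the label '0'.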
In R I could just write something like model$deviance and model$df.residual, but I can't seem to find any way of doing this in SAS.
Whereas R functions produce an object that has sub-objects that you can extract into a variable, SAS procedures create tables. If you see a statistic in some table that you want to use in another part of your program, you can use the Output Delivery System (ODS) to write that table to a data set, as follows:
1) Use the ODS TRACE ON statement to discover the name of the table (or look it up in the documentation)
2) Use the ODS OUTPUT statement to write the table to a data set.
For example, if you are interested in the many goodness-of-fit diagnostic statistics (including the statistics for deviance and chi-square residuals), you can discover that the "Criteria for Assessing Goodness of Fit" table has the name ModelFit. Therefore, putting
ODS OUTPUT ModelFit=FitStatistics;
inside your PROC GENMOD call will create a data set called "FitStatistics" that contains the statistics you want.
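A small sketch of the whole sequence, using the SASHELP.CLASS sample data and an arbitrary model purely for illustration. The column names Criterion and DF are what I expect the ModelFit table to contain; run PROC CONTENTS on the output data set to confirm them in your release.

```sas
/* SASHELP.CLASS ships with SAS; the model itself is only an illustration */
ods trace on;                         /* table names are written to the log */
proc genmod data=sashelp.class;
  model weight = height / dist=normal link=identity;
  ods output ModelFit=FitStatistics;  /* save the goodness-of-fit table */
run;
ods trace off;

/* Keep the deviance row; its DF column should hold the residual degrees of freedom */
data work.deviance;
  set FitStatistics;
  where Criterion = 'Deviance';
run;
```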
We are evaluating the time taken by two sets of code in SAS. Is there a way to write or tabulate the OPTIONS FULLSTIMER results in a SAS dataset, without copying the entire log file into Notepad?
I would go about it like this.
Create separate SAS program files containing your code for each approach. Include options fullstimer at the top of both.
Batch submit your programs and write the logs to permanent files using the -log command line option.
Create a simple program that reads in both logs and compares the results.
The last step can be accomplished by using data steps with the INFILE statement and restricting the input records to those which are standard output from FULLSTIMER. Then you can compare the created datasets however you wish, e.g. via PROC COMPARE.
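A rough sketch of that last step for one log. The path is hypothetical, and the filter assumes the timing lines contain the text "real time" or "cpu time", which is how FULLSTIMER output normally appears in the log.

```sas
/* The log path is hypothetical; point it at the file written by -log */
data work.timing1;
  infile '/my directory/approach1.log' truncover;
  input line $char200.;
  /* Keep only the timing lines that FULLSTIMER writes */
  if index(line, 'real time') or index(line, 'cpu time');
  length measure $20 value $40;
  pos     = index(line, 'time');
  measure = strip(substr(line, 1, pos + 3));   /* e.g. "user cpu time" */
  value   = strip(substr(line, pos + 4));      /* e.g. "0.01 seconds"  */
  drop pos line;
run;
```

Repeating the step for the second log gives two data sets with the same layout, which can then be compared side by side.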
SAS has provided a log parsing macro that looks as though it should do the sort of thing that you want. It's available here:
http://support.sas.com/kb/34/301.html
I am trying to break down a data file into small files, with one of the variables as part of the name of each file. To be specific, I have a bunch of Census tracts, plus other variables. I am reading them into a matrix, performing some operations, and would now like to export the data from inside the loop and save it as an external data file, with the census tract as part of the name. This has to be done without breaking the loop or quitting IML, as I am moving on to the next tract:
read i = first census tract;
append data from other matrix;
save out file as "rld_'census_tract' value";
read next census tract;
repeat;
I tried the SYMPUT routine, but it requires using DATA _NULL_ inside IML, which breaks the flow.
I don't know the solution in IML (or even whether there is one), but I'd suggest a different approach.
Write all of your matrices out to a single dataset (either by appending them all together or by appending to a single dataset as the loop progresses, whichever is easier), and append 'census tract' as one variable in that dataset. Then use a sas datastep to write them out to separate files afterwards. If you're talking about writing out to separate sas datasets, you can either use conditional logic or you can create a macro call to do it; if you are writing external files such as CSV or text file, you can use a filename variable (filevar option on the file statement) and write it out that way.
This will be fairly efficient (in particular for the external file method) and doesn't require staying in IML.
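A minimal sketch of the external-file variant using the FILEVAR= option. The data set WORK.ALL_TRACTS, the variables TRACT, X1 and X2, and the output folder are all hypothetical. The data are sorted first so that each file is opened only once.

```sas
/* WORK.ALL_TRACTS, TRACT, X1, X2 and the output folder are hypothetical */
proc sort data=work.all_tracts;
  by tract;
run;

data _null_;
  set work.all_tracts;
  length outfile $200;
  /* Build the file name from the tract value, e.g. rld_1001.csv */
  outfile = cats('/my directory/rld_', tract, '.csv');
  /* FILEVAR= switches the output file whenever OUTFILE changes */
  file tractout filevar=outfile dsd;
  put tract x1 x2;
run;
```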
In SAS/IML 12.1 you can use the USE and CREATE statements with arguments that are evaluated at run time.
If you have not yet upgraded to SAS/IML 12.1, you can use CALL SYMPUT and the SYMGET function to fill and retrieve a macro variable. These functions work in the IML language, so there is no need to use DATA _NULL_. See my article on macros and loops in IML.
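For the SAS/IML 12.1 route, a hedged sketch of a loop that writes one data set per tract without leaving IML. The tract values, the matrix contents, and the column names are made up for illustration.

```sas
proc iml;
tracts = {1001, 1002, 1003};            /* hypothetical tract identifiers */
do i = 1 to nrow(tracts);
  x = j(3, 2, tracts[i]);               /* stand-in for the per-tract result */
  dsname = "rld_" + strip(char(tracts[i]));   /* e.g. "rld_1001" */
  create (dsname) from x[colname={"a" "b"}];  /* name evaluated at run time */
  append from x;
  close (dsname);
end;
quit;
```

The parentheses around dsname tell IML to evaluate the expression at run time instead of treating it as a literal data set name.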