Specify missing values in a CSV file for Stata

I am writing a file to CSV before reading it into Stata. How do I specify missing values in the CSV file so that they are automatically coded as missing when the file is read into Stata?

Found the answer: if values are written as . (a single period) in the CSV file, they are read into Stata as missing values.
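The question does not say which tool writes the CSV. As a minimal sketch, assuming the file is produced from Python, missing entries can be written as a single period. The function name, column names, and sample rows below are hypothetical, and the approach is intended for numeric columns; the answer above does not cover how a period in a text column is treated.

```python
import csv
import io

def write_csv_with_stata_missing(rows, fieldnames, out):
    """Write rows to `out`, replacing None with '.' so the cell
    can be read as missing, as described in the answer above."""
    writer = csv.DictWriter(out, fieldnames=fieldnames, lineterminator="\n")
    writer.writeheader()
    for row in rows:
        # Any value that is None (or absent) becomes a single period.
        writer.writerow(
            {k: "." if row.get(k) is None else row[k] for k in fieldnames}
        )

# Hypothetical sample data: the second row has no income value.
rows = [{"id": 1, "income": 52000}, {"id": 2, "income": None}]
buf = io.StringIO()
write_csv_with_stata_missing(rows, ["id", "income"], buf)
csv_text = buf.getvalue()
```

Writing to a real file instead of the in-memory buffer only requires passing a handle opened with open(path, "w", newline="").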

Related

Importing ZIPPED CSV files (depending on their NAME)

I am trying to import a selection of zipped CSV files into Base SAS 9.4. This works file by file, but I would like to import all CSV files whose names start with "AXAMCE", followed by any other characters.
I have working code for one file, but the ZIP also contains other files beginning with the same characters, "AXAMCE", that I would like to read.
How can I read all CSV files in the ZIP beginning with AXAMCE.D1905 into one SAS data set?
infile ziplib("AXAMCE.D190524.T0210.CSV") firstobs =2 ;

SAS Problem: Data infile leads to 0 observations

I'm having an issue with SAS 9.4. See code below;
data myData;
infile 'D:\folder1\folder2\myData.xlsx';
input var1 var2 var3;
SAS runs this without error and recognizes the 3 variables, but the resulting data set contains 0 observations. Is there something wrong with how the code is written? Has anyone encountered this issue? Thank you in advance.
Since an XLSX file is binary (in particular it is just a ZIP file) your data step is not finding any lines of text to read. Most likely the reason you got 0 observations is that when searching for the second or third space delimited word to read it went past the end of the file. So the data step stopped at the INPUT statement and never reached the end of the first iteration to write an observation.
You will need to either use PROC IMPORT or a LIBNAME statement using the XLSX engine to read an XLSX file. Or use Excel to save the file as a delimited text file, then you could read using a simple data step.
If your data is in an Excel format, you should be able to do PROC IMPORT to read it in.
PROC IMPORT DATAFILE="D:\folder1\folder2\myData.xlsx" DBMS=XLSX OUT=myData;
RUN;

Error While opening ARFF file in WEKA

I had an Excel sheet that I converted to an ARFF file using an online tool, but when I tried to open it in version 3.8 of WEKA, it showed an error.
I have attached an image of the dialog box that pops up. Please help me out.
Thanks in advance
You can just open the CSV file with the Weka Explorer and save it as ARFF if you really need it in that format.
EDIT:
Your problem is not the file format. It seems that all of your values are numeric, including the target. J48 is a classification algorithm, and therefore Weka won't let you use it when the target (class) column is numeric.
Which column in the data is the target?
If you want to use a classification algorithm, you need to do one of the following: use a numeric-to-nominal filter on the target feature, use an ARFF file where you specify that the target column is nominal, or rename the values of the column to non-numeric values. Here is a link to an ARFF file where the last column (race) is defined as nominal: https://drive.google.com/open?id=0B7b0iysQV1SEcjJJUE1lc19fR2c

How to read only specific columns from external tab-delimited files into a SAS dataset using PROC IMPORT

Is it possible to read only specific variables from a text file into a SAS dataset using PROC IMPORT? My text file is very big: it contains about 1,000 observations and more than 42,000 variables. I tried reading it into SAS using PROC IMPORT but failed, possibly because of its size. I do not need all of the variables, so I have decided to read only specific variables (columns) from this file to reduce the amount of data read into SAS. Is there a way to do this using a data step? Any suggestion or help would be appreciated.
Thanking you very much,
If you're talking about a delimited text file, you can read in specific variables, contingent on them being consecutive from the first variable. You can't use PROC IMPORT to do this, however; you'd have to write the datastep yourself, though you could conceivably have PROC IMPORT help you write it.
For example, if you have 10 variables and only want the first 3, then you can read it in like this:
data want;
infile "mydata.txt" dlm=',' lrecl=255 missover dsd;
input x $ y $ z $;
run;
However, you must read in all variables from the first variable to the last variable you are interested in, even if you aren't interested in all of the intermediate variables. You can't "skip" them with a delimited text file.
If you have a fixed width text file, you can read in whatever columns you want (but you can't use PROC IMPORT to read in a fixed width text file).

How to rename a text file with SAS on Windows XP?

I have a process in SAS that creates a .csv. I have another process in Python that waits until the file exists and then does something with that file. I want to make sure that the Python process doesn't start doing its thing until SAS is done writing the file.
I thought I would just have SAS create the .csv and then rename the file after it was done. Can I do the renaming in SAS? Then the python process could just wait until the renamed file existed.
EDIT: I accidentally posted this question twice and then accidentally deleted both versions. Sorry about that.
I think better than renaming would be to write a shell script that executes the SAS program, then only executes the Python script once the SAS program has exited without errors. The syntax for this varies somewhat depending on your OS, but wouldn't be too difficult.
In version 9.2 and later, SAS has a rename function that should work just the way you would like.
You could generate your output in a SAS dataset and then write to the .csv file only when you're finished with it. Not sure how you're creating the csv, but I usually just do this:
data _null_;
file './output_file.csv' dlm=',';
set dataset_name;
put var1 var2;
run;
If you do this at the very end of your SAS code, no csv file will be generated until you're finished manipulating the data.
There is more than one way to do this.
Does the Python script monitor for .csv files only? Or is it triggered when any new file is created in that directory?
If it is triggered only by .csv files, then you could simply output to, say, a .txt file, then rename it using the x statement (OS command) or the rename function, as @cmjohns suggested.
Another option is to output the .csv file to a different directory, then move it to the directory that Python is monitoring. As long as both directories are on the same disk, the move happens almost instantly.
You could also modify the Python script to look only for .csv files, then use the first option.
You could even create a "flag" file for Python to check once SAS has finished with the .csv.
But if possible, I think I would go with @jgunnink's suggestion.
How about having the watching program watch for a separate small file instead of the large CSV file? That would be written in one write operation by SAS and reduce the risk of triggering the read before the write is done.
data _null_;
file 'bigfile.csv' dsd ;
set bigdata;
put (_all_) (+0);
run;
data _null_;
file 'bigfile.txt' ;
put 'bigfile.csv written';
run;
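On the Python side, the watcher then only needs to poll for the small flag file. The following is a minimal sketch under that assumption; the function name wait_for_flag and its parameters are hypothetical, and the file names match the SAS example above.

```python
import os
import time

def wait_for_flag(flag_path, poll_seconds=1.0, timeout_seconds=None):
    """Block until flag_path exists.

    Returns True once the flag file is present, or False if
    timeout_seconds elapses first (None means wait indefinitely).
    """
    deadline = None
    if timeout_seconds is not None:
        deadline = time.monotonic() + timeout_seconds
    while not os.path.exists(flag_path):
        if deadline is not None and time.monotonic() >= deadline:
            return False
        time.sleep(poll_seconds)
    return True

# Hypothetical usage: start reading the CSV only after the flag appears.
# if wait_for_flag("bigfile.txt", timeout_seconds=3600):
#     with open("bigfile.csv", newline="") as f:
#         ...  # process the finished CSV here
```

Because SAS writes the flag file in a separate data step that runs after the CSV step, the existence of bigfile.txt implies that the step writing bigfile.csv has already finished. Deleting the flag after processing keeps the next run from being triggered by a stale file.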