Loop for creating several empty CSV files in Stata

I need to create 215 empty csv files with Stata and save them on my computer.
Since this is a repetitive task, a loop would be perfect. Each file would have a similar but different name (for example Data_Australia, Data_Austria and so on).
How do I create a loop to generate several empty csv datasets with Stata?
I tried the community-contributed command touch, but it is only convenient when you need to generate a single empty dataset.

Assuming you want a completely empty file (no header row or anything), just open a file to write to, and immediately close it again.
cd "C:\Users\My\Directory"
local country_names Australia Austria "Republic of Korea" // add all the names here
foreach country_name in `country_names' {
    file open f1 using "Data_`country_name'.csv", write replace
    file close f1
}
If you have the names stored as a string variable, say country, you can instead loop through the values in that variable (in this case stopping when it reaches the end or an empty row).
local row = 1
while country[`row'] != "" {
    file open f1 using "Data_`=country[`row']'.csv", write replace
    file close f1
    local ++row
}
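If the names live in a string variable, another option is levelsof, which collects the distinct values into a local so the same foreach pattern works. A sketch, assuming the values of country contain no characters that are illegal in file names:
levelsof country, local(country_names)
foreach country_name of local country_names {
    file open f1 using "Data_`country_name'.csv", write replace
    file close f1
}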

Related

Append CSVs in folder; how to skip / delete first rows in each file

I have 25 CSV files in a folder linked as my data source. The 1st row in each file contains just the file name in column A, followed by the column headers in the 2nd row (this is how the files are generated and sent to me; I do not have access to the database).
[screenshot: the first 2 rows of each CSV]
When I remove the first row of the sample file, then promote headers, then Close & Apply, I get a list of errors which are essentially the redundant column header rows in each of the subsequent 24 files in the folder.
[screenshot: error list]
Upon suggestion, I changed the end of the first applied step in Transform Sample File from QuoteStyle.None]) to QuoteStyle.Csv]). This did not solve the problem and didn't seem to change anything.
Another suggestion was that I could just proceed with the errors and filter them out as needed, and that it wouldn't be a problem. This seems risky/sloppy to me, but maybe it's fine and I'm just a nervous newb?
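For reference, the applied steps described above correspond to M code along these lines in the Transform Sample File query (a sketch using Power Query's default names such as Parameter1, not the actual generated query):
let
    Source = Csv.Document(Parameter1, [Delimiter = ",", QuoteStyle = QuoteStyle.Csv]),
    RemovedFileNameRow = Table.Skip(Source, 1),  // drop the row holding the file name
    PromotedHeaders = Table.PromoteHeaders(RemovedFileNameRow, [PromoteAllScalars = true])
in
    PromotedHeaders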
Many thanks for any input!

Appending a column to an existing file in Fortran

I am trying to write a program that creates an output file with all my data, but my data series have different lengths. So I was thinking of writing one file, then adding a new column to it with the other data. I am open to any other suggestions.
For example, I would write one file with 3 columns holding all my coordinates, then later add a fourth column with temperature or something similar. The coordinate columns would be longer, since the coordinates are measured more frequently.
This is what I've tried so far:
24 format(a4, 1x, 2(ES12.4, 1x), i4, 1x, f8.3, 1x, ES12.4, 1x, 3(i4, 1x))
25 format(20x, f8.3, 1x, ES12.4, 1x, 3(i4, 1x))
do while (.true.)
    read(unit=802, fmt=2, end=122) coll2, t2, ered, tred, hb_alpha, hb_ii, hb_ij, ehh_ii, ehh_ij, rg_avg, e2e_avg
    write(8,25) ttotal+t, hb_alpha, hb_ii-hb_alpha, hb_ij, colltotal+coll
end do
write(8,24) fname_digits, ttotal+t, colltotal+coll, betahb
Everything is inside another do-loop that reads from one file to the next. The variables read in the inner do-loop cover more lines than the variables in the second write statement.
I would expect all the data in one file, with varying line lengths.
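One way to realize the "add a fourth column later" idea is to reread the finished coordinate file as plain text and write a new file with the extra value appended to each line, leaving the surplus coordinate lines as they are. A minimal sketch; the file names, the format, and the temperature array are placeholders, not taken from the code above:
program add_column
    implicit none
    character(len=200) :: line
    real :: temperature(1000)   ! placeholder data, shorter than the coordinate file
    integer :: i, ios, ntemp
    ntemp = 500                 ! number of temperature values actually available
    temperature = 0.0
    open(unit=10, file='coords.txt', status='old', action='read')
    open(unit=11, file='combined.txt', status='replace', action='write')
    i = 0
    do
        read(10, '(a)', iostat=ios) line   ! read one whole line as text
        if (ios /= 0) exit
        i = i + 1
        if (i <= ntemp) then
            write(11, '(a, 1x, f8.3)') trim(line), temperature(i)  ! append the new column
        else
            write(11, '(a)') trim(line)    ! no temperature left: keep the line as is
        end if
    end do
    close(10)
    close(11)
end program add_column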

PDI - Multiple file input based on date in filename

I'm working with a project using Kettle (PDI).
I have to input multiple .csv or .xls files and insert them into a DB.
The file names look like AAMMDDBBBB, where AA is a code for the city and BBBB is a code for the shop. MMDD is a date in MM-DD format. For example, LA0326F5CA.csv.
The regexps I use in the input file steps look like LA.*\.csv or DT.*\.xls, which return all the files to insert into the DB.
Can you tell me how to select just the files for yesterday (based on the MMDD in the file name)?
As you need some "complex" logic in your selection, you cannot filter based on a regexp alone. I suggest you first read all the filenames, then filter them based on their "age", then read the files based on the selected filenames.
In detail:
Use the Get File Names step with the same regexp you currently use (LA.*\.csv or DT.*\.xls). You may be more restrictive at that stage with a regexp like LA\d\d\d\d....\.csv, to ensure MM and DD are numbers and BBBB is exactly 4 characters.
Filter based on the date. You can do this with a Java Filter, but it is an order of magnitude easier to use a JavaScript step to compute the "age" of your file and then a Filter rows step to keep only yesterday's files (a fuller sketch of the whole script follows these steps).
To compute the age of the file from the MM and DD, you can use something like this (other methods are available):
var regexp = filename.match(/..(\d\d)(\d\d).*/);
if (regexp) {
    // JavaScript months are zero-based, hence the -1 on the month
    var age = new Date() - new Date(2018, regexp[1] - 1, regexp[2]);
    age = age / 1000 / 60 / 60 / 24; // milliseconds to days
}
If you are not familiar with JavaScript regexps: the match tests the filename against the regexp and keeps the values of the parentheses in an array. If the test succeeds (which you must explicitly check to avoid a run-time failure), use the values of the match to compute the corresponding date, and subtract it from today's date to get the age. This age is in milliseconds, which the last line converts to days.
Use the Text File Input and Excel Input steps with the option Accept file from previous step. Note that CSV Input does not have this option, but the more powerful Text File Input does.
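Putting steps 1 and 2 together, the body of the Modified Java Script Value step could look like this (a sketch: the year is hardcoded to 2018 as in the snippet above, filename is assumed to be the field coming from Get File Names, and age must be declared as an output field of the step):
// Compute the age, in days, of the date encoded in the file name.
var age = -1; // sentinel for file names that do not match the pattern
var m = filename.match(/..(\d\d)(\d\d).*/);
if (m) {
    // JavaScript months are zero-based, hence the -1 on the month.
    var fileDate = new Date(2018, parseInt(m[1], 10) - 1, parseInt(m[2], 10));
    age = (new Date() - fileDate) / 1000 / 60 / 60 / 24;
}
// A downstream Filter rows step then keeps the rows with 1 <= age < 2,
// i.e. the files stamped with yesterday's date.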
Well, I changed the Java Filter to a Modified Java Script Value step and it works fine now.
Another question: how can I increase the performance and speed of my current transformation (I now have 2 transformations for 2 cities)? My Insert/Update step makes the transformation very slow: it needs almost 1 hour and 30 minutes to process 500k rows with a lot of fields (300 MB), and my data is not limited to this. If it works fast and my company likes to use it, I am going to run it on 10 TB of data per year, which is a lot of transformations and rows. I need suggestions about this.

foreach loop running but not giving results

I am having trouble running a foreach loop. The loop runs without error but gives no output. Can someone tell me what they think might be going on? Many thanks in advance!
Here is the code:
cd "O:\RESEARCH\ikhilko\Subway Big Data project"
local datafiles : dir . files "*.txt"
foreach file in `datafiles' {
    insheet using `file', clear
    drop v9-v43
    save date1, replace
}
UPDATE:
Interestingly, the code runs when I just type it into the command line, rather than doing it from the .do file, any idea what might be going on there?
It is important to note that local macros are precisely that, i.e. defined and visible only locally.
Locally means within the same interactive session, or the same program, or the same do-file (or do-file editor contents), or the same part of the do-file executed by selection.
Locality is, it seems, biting you here. A local macro defined in one place is not visible in another. A local macro reference will evaluate to missing, i.e. an empty string, if the macro is not visible.
Some code for debugging: display the contents of your local macro datafiles to see what's going into the loop:
local datafiles : dir . files "*.txt"
display `"`datafiles'"'
local wordx : word 1 of `datafiles'
display `"`wordx'"'
foreach file in `datafiles' {
display "`file'"
}
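Once the locals and the loop are run together, in one do-file execution, a corrected version of the original loop could look like this (a sketch; the counter suffix is just one way to avoid overwriting date1 on every pass):
cd "O:\RESEARCH\ikhilko\Subway Big Data project"
local datafiles : dir . files "*.txt"
local i = 1
foreach file in `datafiles' {
    insheet using `file', clear
    drop v9-v43
    save date`i', replace
    local ++i
}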

How to refresh file view

In Enterprise Guide 4.2, is there a way to refresh your view of a file short of deleting it from the Process Flow then reopening it?
My Google-fu has failed to provide an answer (one way or the other) and my SAS admin has said he's not aware of a way (but to let him know if I find one).
A definitive "no" (from documentation) or a "yes" with example would be much appreciated.
I have a log file that's updated when I run my SAS program from the command line (outside of EG). I edit my code within EG, and I'd like to peek at the log file to see the results. Currently I have to delete the log file from my Process Flow then reopen it to see the updated log.
From your last comment on your question, it sounds like you are running a non-interactive SAS program on a server (from a PuTTY session) and looking at the log file with your EG client, is that correct? If so, there are much easier ways to watch the log file.
When you mention PuTTY, I'll assume your server is UNIX. If so, use the tail command with the -f option. For example, if your SAS program is named "myprog.sas", it will create a log file named "myprog.log", so try this command at your UNIX prompt:
tail -f myprog.log
The -f option means to continue writing output to your terminal window as lines are written to the log. When you get tired of watching (or you see the SAS "end of job" message), press Ctrl+C to quit.
EG is intended to be the application you use to actually execute your SAS program. Running things from the UNIX prompt is outside that design (and you lose all those cool EG features), and you also miss out on any site features that have been set up for you in the metadata environment.
If I'm completely off-base, please clarify your question.
When using SAS EG or SAS Studio on a SAS platform where the compute nodes are hosted on Linux machines, I always use code to see the contents of an output file created by SAS. The only requirements are that you know the full path of the file you want to browse and that you have the privileges to read it.
The simple idea is to use a generic DATA step to:
Access the file
Read line by line
Filter the data, by contents, position, line number, etc.
Print the data to the SAS log so you can see it there
Here is a simple example to get you going:
First, I create a file for the test. (You already have one, so use yours.)
/* create a test file */
data _null_;
file '/folders/myshortcuts/test/file'; /* Note I'm using fullpath */
put "The begining, :)";
put "line 2 in my file is shy and likes to hide.";
put "line 3, all good so far.";
put "line 4 in my file is to remain private.";
put "Finally the last line in my file!";
run;
Then, here is the code to read its data
data _null_;
/*--------
references which input file will be read
setting variable 'theEnd'=1 when
reaches end-of-file
Note I'm using fullpath
--------*/
infile '/folders/myshortcuts/test/file' end=theEnd;
/*--------
reads one line at a time from input file
storing it in variable _infile_
--------*/
input;
/*--------
contents of the file will be written in the log; to keep it readable,
mark where the contents of the file will follow
--------*/
if _n_=1 then
put "-----start file ----";
/*--------
filter out shy, private or unwanted data
--------*/
if _n_ ne 4; /* continue only if row number is not 4 */
if indexw(_infile_,"shy") le 0; /* continue only if the data does not contain 'shy' */
/*--------
write the data you want, complete line read in this case
--------*/
put _N_= "->" _infile_;
/*--------
mark where the data in the file has ended
--------*/
if theEnd then put "-----end file ----";
run;
Your SAS log will look like this:
NOTE: The infile '/folders/myshortcuts/test/file' is:
Filename=/folders/myshortcuts/test/file,
Owner Name=sasdemo,Group Name=sas,
Access Permission=-rw-rw-r--,
Last Modified=11Jan2017:22:42:56,
File Size (bytes)=160
-----start file ----
_N_=1 ->The beginning, :)
_N_=3 ->line 3, all good so far.
_N_=5 ->Finally the last line in my file!
-----end file ----
NOTE: 5 records were read from the infile '/folders/myshortcuts/test/file'.
The minimum record length was 16.
The maximum record length was 43.
NOTE: DATA statement used (Total process time):
real time 0.02 seconds
cpu time 0.03 seconds