adjusting the project.xml file in a SAS Enterprise Guide project outside SAS EG - sas

We are going to migrate our EG projects (over 1000 projects) to a new environment.
In the old environment we use "W-Latin" as encoding on the Teradata database.
In the new environment we will start using "UTF-8" as encoding on the Teradata database.
And a lot of other changes which I believe are not relevant for this question.
To prevent data issues we will have to replace functions like REVERSE, etc with KREVERSE, etc
We could do this by opening al projects and clicking through it to change the functions in the expression builder.
This would be really time consuming, considering that we have over 1000 .egp files
We already have a code scanner that unzips the .egp file and detects al the use of these functions in the project.xml file.
The next step could be that we find and replace the functions and put the project.xml file back in the .egp file.
Who can tell me how to put the project.xml file back in the .egp file without corrupting the .egp file

I was able to do this.
tl;dr -- Zip the files back up and change the extension to .egp.
Created a new EG project and added a code node to create sample data:
data test;
do cat = "A", "B", "C";
do i=1 to 10;
r = rannor(123);
output;
end;
end;
drop i;
run;
I then added a Query node to the output to do a "SUM" of the r column by cat.
Ran the flow and got expected output.
Saved the EG project.
Opened the EG Project in 7zip and extracted the archive to a location.
In project.xml, I found the section for the Query and changed the SUM to MEAN
<Expression>
<LHS_TYPE>LHS_FUNCTION</LHS_TYPE>
<LHS_DMCOLGROUP>Numeric</LHS_DMCOLGROUP>
<RHS_TYPE>RHS_COLUMN</RHS_TYPE>
<RHS_DMCOLGROUP>Numeric</RHS_DMCOLGROUP>
<InFormat />
<LHS_String>MEAN</LHS_String>
<LHS_Calc />
<OutputType>OPTYPE_NOTSET</OutputType>
<RHS_StringOne>r</RHS_StringOne>
<RHS_StringTwo />
</Expression>
Selected the files and added them to an achieve using 7zip. Selected "zip" compression and saved the file with ".egp" extension.
I opened the project in EG and ran the flow. The output was now the MEAN of R and not the SUM.

Related

Call the added file in Hive to use for UDF

I have a file which contains holidays and it is required by UDF to use this file to run and calculate the given business days for two dates. Issue I have is when I add the file, it goes to a working directory but this directory differs every session.
Unlike in the example below from Hive Resources - This is not what is happening.
hive> add FILE /tmp/tt.py;
hive> list FILES;
/tmp/tt.py
hive> select from networks a
MAP a.networkid
USING 'python tt.py' as nn where a.ds = '2009-01-04' limit 10;
This is what I am getting and the alpha numeric keeps changing.
/mnt/tmp/a17b43d5-df53-4eea-8e2c-565471b49d25_resources/holiday2021.csv
I need to make this file located in a more permanent folder and this hive sql can be executed into any of the 18 nodes.

How can I read 7z files directly from SAS?

I'm a SAS beginner programmer who has a lot of compressed files under 7z format. Because of lack of space in the server I work, I need to open files directly from their compressed form. I've found the following SAS documentation about Reading Compressed Text Files:
https://support.sas.com/resources/papers/proceedings/proceedings/sugi31/155-31.pdf
However, I do not obtain any result using the next code, for example:
FileName Com7zipa Pipe '7za e "rie_mbco_matriz_07.7z" "rie_mbco_matriz_07.sas7bdat" -y -so';
Data DataSet07;
infile Com7zipa;
Input NRO_DOC;
run;
I hope you can help me.
Best regards,
Jean Pierre
As far as I am aware, gzipis supported in 9.4M5 or above but 7z is not. Although server space is limited, you will likely have at least a fair amount of SAS WORK directory space allocated. You can use x commands to unzip the file to the WORK directory instead and read it from there. Note that you will need to enable x commands first using this method.
Let's assume a file named myfile.7z has a csv file in it named myfile.csv.
/* Create a macro variable holding the location of the WORK directory */
%let workdir = %sysfunc(getoption(work));
x "7za e /my/dir/myfile.7z -o&workdir.";
proc import
file = "&workdir./myfile.csv"
out = myfile
dbms = csv
replace;
run;

How to automate and import files that are located in date sequential folders into SAS?

I currently have 700 folders that are all sequentially named.
The naming convention of the folders are as follows:-
2011-08-15_2011-08-15
2011-08-16_2011-08-16
2011-08-17_2011-08-17
...
2013-09-20_2013-09-20
There are 10 txt files within each folder that have the same naming convention.
With the txt files all being the same, what I am trying to achieve is to automate the infile and then use the name of the folder, eg 2011-08-15_2011-08-15 or part of eg. 2011-08-15 to then be the name of the created data set.
I can successfully import all the txt files so there is no issue there, the issue is i don't want to be changing the folder name each time in the infile step.
'C:\SAS data\Extract\2011-08-17_2011-08-17\abc.txt'
Is there an easier way to read these files in? I can find code for sequential txt/csv files but can find nothing to reference a folder and then rename the data set.
You should be able to wildcard the folders/files into a single fileref, e.g.
filename allfiles "c:\SAS_data\extract\*\*.txt" ;
data alldata ;
length fn _fn $256. ;
infile allfiles lrecl=256 truncover filename=_fn ;
fn = _fn ; /* Store the filename */
input ;
put _INFILE_ ;
run ;
The wildcard folder & file works in SAS Unix, not sure about SAS PC.

How to refresh file view

In Enterprise Guide 4.2, is there a way to refresh your view of a file short of deleting it from the Process Flow then reopening it?
My Google-fu has failed to provide an answer (one way or the other) and my SAS admin has said he's not aware of a way (but to let him know if I find one).
A definitive "no" (from documentation) or a "yes" with example would be much appreciated.
I have a log file that's updated when I run my SAS program from the command line (outside of EG). I edit my code within EG, and I'd like to peek at the log file to see the results. Currently I have to delete the log file from my Process Flow then reopen it to see the updated log.
From your last comment on your question, it sounds like you are running a non-interactive SAS program on a server (from a PuTTY session) and looking at the log file with your EG client, is that correct? If so, there are much easier ways to watch the log file.
When you mention PuTTY, I'll assume your server is UNIX. If so, use the tail command with the -f option. For example, if your SAS program is named "myprog.sas", it will create a log file named "myprog.log", so try this command at your UNIX prompt:
tail -f myprog.log
The -f option means to continue writing output to your terminal window as lines are written to the log. When you get tired of watching (or your see the SAS "end of job" message), type the letter "q" to quit.
EG in intended to be the application that you use to actually execute your SAS program. Running things from the UNIX prompt is outside the design (and you lose all those cool EG features), as well as miss out any site features that have been set up for you in the metadata environment.
If I'm completely off-base, please clarify your question.
When using SAS EG or SAS Studio in a SAS platform were the compute nodes are hosted in Linux machines, I always use code to see the contents of an output file created by SAS; The only requirements are that you know the fullpath of the file you want to browse and that you have the privileges to read from it.
The simple idea is to use a generic DATA step to:
Access the file
Read line by line
Filter the data, by contents, position, line number, etc.
Print the data to the SAS log so you can see it there
Here is a simple example to get you going:
First I create a file for the test. You already have it!, so use yours.
/* create a test file */
data _null_;
file '/folders/myshortcuts/test/file'; /* Note I'm using fullpath */
put "The begining, :)";
put "line 2 in my file is shy and likes to hide.";
put "line 3, all good so far.";
put "line 4 in my file is to remain private.";
put "Finally the last line in my file!";
run;
Then, here is the code to read its data
data _null_;
/*--------
references which input file will be read
setting variable 'theEnd'=1 when
reaches end-of-file
Note I'm using fullpath
--------*/
infile '/folders/myshortcuts/test/file' end=theEnd;
/*--------
reads one line at a time from input file
storing it in variable _infile_
--------*/
input;
/*--------
contents of file will be writen in the log, to keep it readable,
mark where the contents of the file will follow
--------*/
if _n_=1 then
put "-----start file ----";
/*--------
filter out shy, private or unwanted data
--------*/
if _n_ ne 4; /* continue only if row number is not 4 */
if indexw(_infile_,"shy") le 0; /* continue only if data does not contains 'shy' */
/*--------
write the data you want, complete line read in this case
--------*/
put _N_= "->" _infile_;
/*--------
mark where the data in the file has ended
--------*/
if theEnd then put "-----end file ----";
run;
Your SAS log will look like this:
NOTE: The infile '/folders/myshortcuts/test/file' is:
Filename=/folders/myshortcuts/test/file,
Owner Name=sasdemo,Group Name=sas,
Access Permission=-rw-rw-r--,
Last Modified=11Jan2017:22:42:56,
File Size (bytes)=160
-----start file ----
_N_=1 ->The begining, :)
_N_=3 ->line 3, all good so far.
_N_=5 ->Finally the last line in my file!
-----end file ----
NOTE: 5 records were read from the infile '/folders/myshortcuts/test/file'.
The minimum record length was 16.
The maximum record length was 43.
NOTE: DATA statement used (Total process time):
real time 0.02 seconds
cpu time 0.03 seconds

SAS: Set current folder to the folder containing the running program

I've just started learning SAS because I'm required to use it for a statistics course. For this course, the university provides SAS 9.2 through their virtual-machine setup: I make a reservation in their system, they generate a VM on one of their servers, and I connect to the VM using Microsoft's Remote Desktop client. The virtual machines are generated and erased per session; settings are reset every time, and files must be stored on my client computer (which is accessible in the VM by a UNC path).
Within this setup, when I open a program file stored on my laptop, I've only been able to access the accompanying data files (each stored in the same folder as the program) either by hardcoding the full path or by updating the "current folder" setting at the beginning of each session. The first is problematic because it means the program won't run anywhere else - in particular, when I email it to the professor. The second is inconvenient, because browsing to this particular UNC path is time consuming, and I already have to browse to the same path to open the program file.
I want to make this easier by programmatically setting the current folder to the folder containing the program. Then I could just open the file and get to work. I've found some examples of getting the filename of the program file, of getting the path to a fileref, and of (link limit exceeded) setting the current folder, but I haven't been able to combine them in the right way. Please connect the dots for me.
To programmatically change the Windows current directory from SAS, you can use the X command, which is what really happens when you use the "Change current folder" dialog box:
x 'cd "\\computername\share name\folder"';
You can also do this using the SYSTEM data step function, a method I prefer because you get a return code (but more typing of course):
data _null_;
rc = system( 'cd "\\computername\share name\folder"' );
if rc = 0
then putlog 'Command successful';
else putlog 'Command failed';
run;
Note the UNC path is surrounded with double-quotes, which is necessary if the path contains blanks.
Of course, this still requires you to manually type in the command, but it might be something you could add to the program source code. If your VM environment allowed you to maintain some permanent presence on the server, you could save this command into a start-up file.
I would ask your professor for advice; if you are working with data given to you as part of your class, you may only need to send just the source code. On the other hand, if you are creating output data as part of your assignment, your professor might want your to deliver source code and SAS data sets. Surely he or she will have some procedure.
Complete Answer:
SAS's obtuse notation requires some strange delimiter fiddling to combine my partial solution (finding the path) with #Bob Duell's partial solution (setting the current folder). There seem to be two key rules involved:
&var is expanded in double-quoted strings ("&var"), but not single-quoted strings ('&var')
Quotes in &var are not treated as delimiters after expansion
So the solution is to compute a string of the quoted path (where the quotes are part of the string), and expand that within a double-quoted parameter to X or SYSTEM:
%let qsrc=%str(%")&src%str(%");
X "cd &qsrc"
It's not required to store the string, both &src and &qsrc can be expanded in-place, which yields a single statement solution:
X "cd %str(%")%substr(%sysget(SAS_EXECFILEPATH),1,%eval(%length(%sysget(SAS_EXECFILEPATH))-%length(%sysget(SAS_EXECFILENAME))))%str(%")";
This executes correctly, but breaks the syntax coloring in the GUI. Within a string, %str(%") and "" both expand to ", so replacing %str(%") with "" both executes correctly and is colored correctly in the GUI:
X "cd ""%substr(%sysget(SAS_EXECFILEPATH),1,%eval(%length(%sysget(SAS_EXECFILEPATH))-%length(%sysget(SAS_EXECFILENAME))))""";
This inherits the limitation that it only works when SAS_EXECFILEPATH and SAS_EXECFILENAME are defined, which is the case when running from within the Windows GUI editor. It's also subject to any limitations on in the "cd" command, which SAS intercepts rather than invoking the Windows command line. I expect it will fail on paths containing quotes.
A partial answer: One way to get the containing folder from the filename of the program file
Spread out & logging steps:
/* Find PathName of folder containing program */
%let FullName=%sysget(SAS_EXECFILEPATH);
%put FullName: &FullName.;
%let FullLen=%length(&FullName);
%put FullLen: &FullLen.;
%let BaseName=%sysget(SAS_EXECFILENAME);
%put BaseName: &BaseName.;
%let BaseLen=%length(&BaseName);
%put BaseLen: &BaseLen.;
%let PathLen=%eval(&FullLen.-&BaseLen.);
%put PathLen: &PathLen.;
%let PathName=%substr(&FullName,1,&PathLen);
%put PathName: &PathName.;
Consolidated & silent:
/* Find src folder */
%let src=%substr(%sysget(SAS_EXECFILEPATH),1,%eval(%length(%sysget(SAS_EXECFILEPATH))-%length(%sysget(SAS_EXECFILENAME))));
This only works when SAS_EXECFILEPATH and SAS_EXECFILENAME are defined, and it's not clear when that is. It does work when using the Windows GUI editor.