How to approach reading a large codebase

I've moved to a larger organization where there's a lot of existing code (mainly SAS, the WPS version).
This is the first time I'm using the language, and I'm having trouble understanding the code. I can't figure out how to approach such a large codebase.
P.S.: Existing questions were not SAS-specific, so I posted this one so that people with SAS experience could help.

I have converted thousands of lines of code from SAS to Teradata SQL, and these are my learnings. If you have basic SAS knowledge, you should be fine.
It can get complex, too: I once had an issue with some very complex regex code that was difficult for me at the time. The following two steps helped me out.
Read the code step by step, and if something is not clear, run the code step by step in a development area, making sure you are not overwriting permanent tables. This will help you understand what is happening in each step. Write comments for yourself at each step so that you understand it better.
If you are assigned a rewrite, run the original code and the rewritten code step by step and compare the results at each step (do not overwrite permanent datasets). Compare the final result sets too.
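The comparison step can be scripted once both versions have written their output somewhere neutral. Below is a minimal Python sketch, assuming each step's result has been exported to CSV and that the rows share a key column; the column names and sample data are made up for illustration.

```python
import csv
import io


def load_rows(text, key):
    """Index the rows of a CSV export by the value of a key column."""
    reader = csv.DictReader(io.StringIO(text))
    return {row[key]: row for row in reader}


def compare_outputs(original_csv, rewritten_csv, key):
    """Return keys missing on either side and rows whose values differ."""
    old = load_rows(original_csv, key)
    new = load_rows(rewritten_csv, key)
    only_old = sorted(set(old) - set(new))
    only_new = sorted(set(new) - set(old))
    changed = {
        k: (old[k], new[k])
        for k in sorted(set(old) & set(new))
        if old[k] != new[k]
    }
    return only_old, only_new, changed


# Hypothetical exports of the same step from the original and rewritten code.
original = "id,amount\n1,10\n2,20\n3,30\n"
rewritten = "id,amount\n1,10\n2,25\n4,40\n"

only_old, only_new, changed = compare_outputs(original, rewritten, key="id")
print("only in original:", only_old)
print("only in rewritten:", only_new)
print("changed keys:", list(changed))
```

Values are compared as text here, so a formatting difference such as 20 versus 20.0 would show up as a change; normalise the columns first if that matters.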

Related

Reverse engineer SAS code to create a mapping document

I have inherited a large base of SAS code. I need to reverse engineer to create some mapping document, so that given a field in the final output dataset, we can easily trace it all the way back to one of the inputs.
I can create it by hand, but can SAS automatically generate something like this?

No, I don't think there is any ready-made automated way of doing this.
Bear in mind that it is possible to create variables and pass them through a whole series of procs and data steps without mentioning them by name anywhere in the source code. Some sort of run-time analysis is therefore unavoidable.
Reeza's suggestion of using proc scaproc will yield some useful information for code executed within a single self-contained job running in a single SAS session, and the ATTR option in the record statement might be of some help to you when tracing the lineage of variables, but I'm afraid that however you approach this, it's going to take quite a lot of work.
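However the lineage information ends up being collected, the mapping document itself can be kept as a simple "derived from" table and walked backwards. Here is a rough Python sketch of that idea; the field names and the mapping are invented for illustration, and nothing here reads SAS code or proc scaproc output.

```python
# Hypothetical mapping records: each output field lists the fields it was
# derived from. In practice these would be collected by hand or from
# run-time analysis.
derived_from = {
    "report.total": ["stage2.amount", "stage2.tax"],
    "stage2.amount": ["raw_sales.amt"],
    "stage2.tax": ["raw_sales.amt", "rates.tax_rate"],
}


def trace_to_inputs(field, mapping):
    """Walk the mapping back until reaching fields with no recorded source."""
    sources = set()
    seen = set()
    stack = [field]
    while stack:
        current = stack.pop()
        if current in seen:
            continue  # already visited; also guards against cycles
        seen.add(current)
        parents = mapping.get(current)
        if parents:
            stack.extend(parents)
        else:
            sources.add(current)
    return sorted(sources)


print(trace_to_inputs("report.total", derived_from))
```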

How to make SAS Enterprise Guide show only the latest data created in Output Data?

Also, how do I make SAS Enterprise Guide go to the Output Data tab after a successful run instead of the Results tab?

It's going to go to Results if you produce results; I don't think there's any way around that. You will get all of the datasets produced in that program in the Output Data tab, as well.
However, I'm going to venture a guess here: you're using EG like a single-window programming environment, right? You have a single long program? If so, then the answer is simple: split your program up. EG is intended to have lots of small programs, each of which has a very specific goal and output. Think of it as steps: each program node is just one step, possibly even just one data step. Then you link several programs together, in sequence, and run them all as a process flow (or an ordered list).
If you treat it that way, then it works very much like you say: you can have one program that has one output dataset and no results window, if your last program in a process flow just produces one dataset and no results.

SAS Code to examine another SAS program

I have a bit of a problem trying to code up what I want to do in SAS and I was hoping to get some advice from someone. I was wondering if it is possible to write code that will examine another piece of existing SAS code and bring up a list of the required input datasets and variables. I want to invoke other SAS code in an automated process using the %include function and a prompt for the user to define the exact name/location of the code, as this will be different every time. But before this I want to somehow check this code, rename an existing dataset to be the input dataset, and check that I have all the required variables before running the %include.
I was hoping someone might be able to tell me if this is at all possible and if so what function I would use. I am using EG 5.1 if that makes any difference.
Thanks for your help.
Steph.
P.S. Thanks for your help guys. Sorry if this question is outside the scope of this site, I thought there might be a simple function to achieve this, similar to %include. Also, I have never posted on this site before so apologies if I did stuff wrong.

I have a data set and need to create an Excel sheet in the exact format below. Is there any way to do so?

Assume this is the data set:
Aspect                                  Evaluation  Quarter  Percentage
HOST/HOSTESS DIVERSIONS /687            Excellent   Q1       40%
ROCKIN' BAR D / WAVEBANDS/ EVOLUTION    Excellent   Q1       50%
KNOWLEDGE OF SERVER TEAM – ROTATION     Excellent   Q1       60%
I'm trying to generate the Excel sheet below with the same colors and structure; assume the percentages above will populate the “% Within” column.
Is there any way to get the Excel file in this required format? I appreciate any help.
Thanks,
Sam

If you're going to do color and such, you have a few options. PROC EXPORT won't do it, of course. So instead, you need to use either Excel Tagsets or DDE, or create an unformatted sheet and use a macro from a template to copy the colors in.
Benefits/Drawbacks:
Excel Tagsets:
Benefits: Make the exact format entirely in SAS code. Have a great deal of control with a fairly simple interface. Uses the powerful PROC TEMPLATE to define styles, which allows highly portable and reusable code.
Drawbacks: Makes an .xml file that is readable by Excel, not an actual .xls/.xlsx file. Does have some limitations in what it can do. Can be buggy. Probably the slowest to code of the three options, unless you are very familiar with it.
DDE:
Benefits: Once you make the template (once) in Excel, can make exactly what you want fully in SAS. Can do 100% of what Excel does.
Drawbacks: Uses a somewhat outdated method, so fewer SAS programmers are familiar with it. Requires Excel to be installed on the machine, and open (you can open it as part of the DDE program). Somewhat slower to copy data in, and requires more careful checking to verify that the data went where it should go. Requires knowing DDE commands.
Template/copy:
Benefits: Likely the fastest method in terms of setup time. Can do everything exactly as Excel does. Easy for other programmers to understand, as long as they know Excel/VBA and SAS.
Drawbacks: Requires an outside-of-SAS step to run the copy macro (could be called from SAS via DDE or a batch file, but more commonly would be done by hand). Does require some knowledge of VBA as well as SAS.
In general, I recommend trying Excel Tagsets first; if they don't work for your needs, try either of the other two options. Some good papers on Excel Tagsets for the beginner:
http://support.sas.com/resources/papers/proceedings11/170-2011.pdf
http://support.sas.com/resources/papers/proceedings12/207-2012.pdf
http://www2.sas.com/proceedings/forum2008/036-2008.pdf
I think you could create the above pretty easily using Excel Tagsets and PROC REPORT; follow the first paper in particular, as it seems to be the most similar to what you're doing. If you run into any issues, post them as separate questions and we should be able to help you out.

How to quickly debug when something goes wrong in the code workflow?

I frequently encounter the following debugging scenario:
A tester provides reproduction steps for a bug. To find out where the problem is, I play with these steps to reduce them to the minimum necessary. Sometimes, luckily, I find that after a minor change to the steps the problem is gone.
The job then becomes finding the difference in code workflow between the two sets of reproduction steps. This is tedious and painful, especially when you are working on a large code base and the flow goes through a lot of code and involves many state changes you are not familiar with.
So I was wondering whether there are any tools available to compare "code workflow". Having learned the "wt" command in WinDbg, I thought it might be possible. For example, I could run "wt" on some outermost functions with the two different sets of reproduction steps and then compare the outputs. It should then be easy to find where the code flow starts to diverge.
But the problem with WinDbg is that "wt" is quite slow (maybe I should write to a log file instead of the screen) and not very user-friendly (compared with the Visual Studio debugger). So I want to ask: are there any existing tools for this? Or how feasible, and how difficult, would it be to develop a plug-in for the Visual Studio debugger to support this functionality?
Thanks

I'd run it under a profiler in "coverage" mode, then use diff on the results to see which parts of the code were executed in one run but not the other.
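To illustrate the idea in a self-contained way, here is a small Python sketch that records which lines run for two different inputs and diffs the two sets. It uses Python's `sys.settrace` hook in place of a real profiler, and `handle` and `apply_discount` are hypothetical stand-ins for the code under investigation.

```python
import sys


def lines_executed(func, *args):
    """Run func and return the set of (function name, line number) pairs executed."""
    executed = set()

    def tracer(frame, event, arg):
        if event == "line":
            executed.add((frame.f_code.co_name, frame.f_lineno))
        return tracer  # keep tracing inside every new frame

    sys.settrace(tracer)
    try:
        func(*args)
    finally:
        sys.settrace(None)
    return executed


def apply_discount(total):
    return total * 0.1


def handle(order_total):
    # Hypothetical code under investigation.
    if order_total > 100:
        discount = apply_discount(order_total)
        return order_total - discount
    return order_total


good = lines_executed(handle, 50)   # steps where the problem is gone
bad = lines_executed(handle, 150)   # steps that reproduce the problem
only_bad = sorted(bad - good)
for name, lineno in only_bad:
    print("executed only in the failing run:", name, lineno)
```

For real code a coverage tool will do this far more efficiently; the point is only that the difference between the two runs is a plain set difference once the coverage data exists.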

Sorry, I don't know of a tool which can do what you want, but even if it existed it doesn't sound like the quickest approach to finding out where the lower layer code is failing.
I would recommend instrumenting your layer's code with high-level logs so you know which module fails, stalls, etc. In debug builds, your logger can write to a file, to the debug output window, etc.
In general, failing fast and using exceptions are good ways to find out easily where things go bad.
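As a rough illustration of both points, here is a Python sketch using the standard `logging` module. The logger name and the `place_order` and `parse_quantity` functions are invented for the example.

```python
import logging

logging.basicConfig(level=logging.DEBUG,
                    format="%(levelname)s %(name)s: %(message)s")
log = logging.getLogger("orders")


def parse_quantity(raw):
    """Fail fast: reject bad input at the boundary instead of passing it on."""
    log.debug("parse_quantity raw=%r", raw)
    quantity = int(raw)  # raises ValueError on non-numeric input
    if quantity <= 0:
        raise ValueError(f"quantity must be positive, got {quantity}")
    return quantity


def place_order(raw_quantity):
    """High-level log lines mark where the module starts, succeeds, or fails."""
    log.info("place_order start")
    try:
        quantity = parse_quantity(raw_quantity)
    except ValueError:
        # Logs the message plus the traceback, then lets the error propagate.
        log.exception("place_order failed in parse_quantity")
        raise
    log.info("place_order ok quantity=%d", quantity)
    return quantity


place_order("3")
```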

Doing something after the fact is not going to cut it, since your problem is reproducing it.
The issue with bugs is seldom some internal wackiness but usually what the user is actually doing. If you log all the commands that the user enters, then they can simply send you the log. You can substitute button clicks, mouse selects, etc. This will have some cost, but certainly much less than something that keeps track of every method visited.
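A minimal sketch of such a command log in Python, assuming every user action can be expressed as a command name plus parameters; the `CommandLog` class, the handlers, and the cart example are all hypothetical.

```python
import json


class CommandLog:
    """Record each user command so a session can be replayed later."""

    def __init__(self):
        self.entries = []

    def record(self, name, **params):
        self.entries.append({"command": name, "params": params})

    def dump(self):
        # One JSON object per line, easy for a user to send along with a bug report.
        return "\n".join(json.dumps(entry, sort_keys=True) for entry in self.entries)


def replay(log_text, handlers):
    """Re-run a recorded session against the application's command handlers."""
    for line in log_text.splitlines():
        entry = json.loads(line)
        handlers[entry["command"]](**entry["params"])


# Hypothetical application state and command handlers.
cart = []
handlers = {
    "add_item": lambda sku, qty: cart.append((sku, qty)),
    "clear": lambda: cart.clear(),
}

session_log = CommandLog()
session_log.record("add_item", sku="A1", qty=2)
session_log.record("clear")
session_log.record("add_item", sku="B7", qty=1)

replay(session_log.dump(), handlers)
print(cart)
```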

I am assuming that if you have a large application, you have good logging or tracing.
I work on a large server product with over 40 processes and over one million lines of code. Most of the time the error in the trace file is enough to identify the location of the problem. However, sometimes the error I see in the trace file is caused by some earlier code, and the reason for this can be hard to spot. Then I use a comparative debugging technique:
Reproduce the first scenario, copy the trace to a new file (if the application is multithreaded, ensure you only have the trace for the thread that does the work).
Reproduce the second scenario, copy the trace to a new file.
Remove the timestamps from the log files (I use awk or sed for this).
Compare the log files with WinMerge or similar, to see where and how they diverge.
This technique can be a little time-consuming, but is much quicker than stepping through thousands of lines in the debugger.
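The same steps can be sketched in Python, with a regular expression standing in for awk/sed and `difflib` standing in for WinMerge. The timestamp format and the sample traces below are assumptions; adjust the pattern to whatever your trace files actually contain.

```python
import difflib
import re

# Assumed line format: "2024-05-01 10:00:00,001 message".
TIMESTAMP = re.compile(r"^\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2}(?:[.,]\d+)?\s+")


def strip_timestamps(trace_text):
    """Drop the leading timestamp from every line of a trace."""
    return [TIMESTAMP.sub("", line) for line in trace_text.splitlines()]


def first_divergence(trace_a, trace_b):
    """Return (index, line_a, line_b) at the first differing line, or None."""
    a = strip_timestamps(trace_a)
    b = strip_timestamps(trace_b)
    for index, (left, right) in enumerate(zip(a, b)):
        if left != right:
            return index, left, right
    if len(a) != len(b):
        # One trace is a prefix of the other; the shorter side has no line here.
        index = min(len(a), len(b))
        return (index,
                a[index] if index < len(a) else None,
                b[index] if index < len(b) else None)
    return None


# Hypothetical traces from the two scenarios.
passing = ("2024-05-01 10:00:00,001 enter load\n"
           "2024-05-01 10:00:00,010 cache hit\n"
           "2024-05-01 10:00:00,020 exit load\n")
failing = ("2024-05-01 11:30:00,500 enter load\n"
           "2024-05-01 11:30:00,650 cache miss\n"
           "2024-05-01 11:30:00,900 exit load\n")

print(first_divergence(passing, failing))
for line in difflib.unified_diff(strip_timestamps(passing),
                                 strip_timestamps(failing),
                                 fromfile="passing", tofile="failing",
                                 lineterm=""):
    print(line)
```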
Another useful technique is producing UML sequence diagrams from trace files. For this you need the function entry and exit positions logged consistently. Then write a small script to parse your trace files and use sequence.jar to produce UML diagrams as PNG files. This is a great way to understand the logic of code you haven't touched in a while. I wrapped a small awk script in a batch file: I just provide the trace file and the line number to start from, then it untangles the threads, generates the input text for sequence.jar, and runs it to create the UML diagram.
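The parsing half of that idea can be sketched in a few lines of Python. The "ENTER name" / "EXIT name" record format is an assumption, and the output is a plain indented call tree rather than input for sequence.jar, whose format is not described here.

```python
def call_tree(trace_lines):
    """Turn 'ENTER name' / 'EXIT name' records into an indented call tree."""
    stack = []
    tree = []
    for line in trace_lines:
        kind, _, name = line.partition(" ")
        if kind == "ENTER":
            tree.append("  " * len(stack) + name)
            stack.append(name)
        elif kind == "EXIT":
            if not stack or stack[-1] != name:
                raise ValueError(f"unbalanced EXIT for {name!r}")
            stack.pop()
    return tree


# Hypothetical single-thread trace with consistent entry and exit logging.
trace = [
    "ENTER main",
    "ENTER load",
    "EXIT load",
    "ENTER save",
    "EXIT save",
    "EXIT main",
]
print("\n".join(call_tree(trace)))
```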
