New to SAS - need help getting started

My department maintains all sorts of jobs and reports based on SAS, in a mainframe/batch environment (i.e., ugly JCL green screens).
I have been enrolled in an expensive training program from the SAS Institute. One of the first parts of the training asks me to install files from a zip file, "open SAS", and run some files. I can't "open SAS" because I don't have it; it is embedded on the mainframe.
They provide some extremely limited instructions for working in z/OS, but I can't even figure out the basics, like how to make a dataset to put the learning files into. They really give no guidance; they assume you already know how to use SAS.
Anyway, the training shows examples in Windows using SAS Enterprise Guide. I would like to get that and use it instead, at least for learning the SAS language. But when I called SAS just to find out whether it is a free download, and if not how much it would cost, they said they would call me back and never did. So I just want a ballpark figure for how much it would cost me to get this tool. Also, if I had it, would I be able to use it to access the jobs on the Base SAS that I already have (on the mainframe), or would I have to purchase another Base SAS for Windows? I haven't been able to find answers to these questions via a Google search, and SAS didn't call me back. Can anyone with more knowledge about this help me out?

As far as I know, SAS Institute does not provide their software to individuals. They work with organizations, and the yearly license can cost tens to hundreds of thousands of dollars, depending on the components included and the number of processors or users.
There was a crippleware SAS Learning Edition, but it has been discontinued.
I wonder if you can ask for a refund on your expensive training program. Alternatively, you can try running SAS scripts in batch mode on your mainframe. There are some third-party IDE solutions, such as Emacs Speaks Statistics (ESS), though you will lose functionality like the dataset viewer.

Related

Code to run that can help benchmark a new SAS drive?

We just got a new server at work that will be used only for running SAS code, and I've been asked to run some tests to make sure that it's performing better than our other servers. I'm not an expert at this, so I want to make sure I avoid naive mistakes that don't properly measure the performance of the server. My header looks like this:
options fullstimer;
%LET BenchStartTime = %sysfunc(datetime(),22.);
Which I use as a check for the "real time" report in the log. I have a vague understanding of the difference between "user cpu time" and "system cpu time", but if anyone wants to offer up additional information on that, that would be helpful.
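A matching end-of-run calculation would be something like this (just a sketch, subtracting the saved start time from the current datetime):

%put Benchmark elapsed seconds: %sysevalf(%sysfunc(datetime()) - &BenchStartTime);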
Anyway, the main point of this post is that I want to know if there are any standard benchmark tests I should be using to see whether this new server is better than the old ones. Currently I'm using something I found online that just appends a bunch of copies of sashelp.class (but I suspect this is a bad idea, because pulling from the C: drive and loading into a different drive might perform the same across all servers - if the C: drive is the slowest, doesn't it become the bottleneck?), and I'm also using code that generates a bunch of random data of a fixed size and comparing runtimes. Is this the correct approach? Is there something else I should be doing? How many times should I run these benchmark tests to make sure the results aren't a fluke?
Thanks for your help!
I would test by doing the things that you normally do. If you run large merges, then you're basically talking I/O; so just make a very large dataset, write it out, read it in, etc., and perform the same test on the other machine. Perform each test a few times, each in a fresh SAS session.
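A minimal sketch of that kind of I/O test, assuming a scratch directory on the drive under test (the libname path and row count are placeholders; size the dataset to resemble your real jobs):

options fullstimer;
libname bench '/scratch/benchtest';  /* placeholder path on the disk under test */

/* generate a large dataset in WORK */
data work.big;
  do i = 1 to 2e7;  /* ~20 million rows; adjust to taste */
    x = ranuni(0);
    y = ranuni(0);
    output;
  end;
run;

/* write it to the target disk, then read it back;
   compare the "real time" lines in the log across servers */
data bench.big;
  set work.big;
run;

data work.readback;
  set bench.big;
run;

Run the same program on each server and compare the step timings.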
Further, it sounds like you need to make sure the new server can handle multiple concurrent sessions. You can simulate this in part by submitting many connections from one computer using SAS/CONNECT, which allows you to start multiple concurrent sessions. For example, I would create a script that starts a local SAS session, signs on to the server, and rsubmits a job of normal difficulty that takes maybe 5 to 10 minutes; then run that script 20 times concurrently (you can script this or just double-click a .bat file 20 times). See how it handles that compared to the other server.
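A rough sketch of that setup, assuming SAS/CONNECT is licensed on both ends (the host name, port, and the remote workload below are all placeholders):

/* local session: sign on to the server and run a representative job */
options comamid=tcp;
%let server=myserver 7551;  /* placeholder host and spawner port */
signon server;

rsubmit server;
  /* stand-in for a "normal difficulty" job: build and sort a big dataset */
  data work.load;
    do i = 1 to 5e6;
      k = ranuni(0);
      output;
    end;
  run;
  proc sort data=work.load;
    by k;
  run;
endrsubmit;

signoff server;

Save that as a .sas file, wrap it in a .bat file that launches sas.exe with -sysin, and start 20 copies to simulate 20 concurrent users.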
On SAS 9.2 and later you can use the %SASHOME%/SASFoundation/9.x/sasiotest.exe utility. You will find some further guidance in SAS Support note 51659.
Edit:
A similar utility can be downloaded for Unix platforms; details are in SAS Support note 51660.

Can I create a somewhat complex Mechanical Turk HIT without much programming experience?

I have a task that seems well-suited to Mturk. I've never used the service before, however, and despite reading through some of the documentation, I'm having a difficult time judging how hard it would be to set up a task. I'm a strong beginner or weak intermediate in R. I've messed around with a project that involved a little understanding of XML. Otherwise, I have no programming or web development skills (I'm a statistician/epidemiologist). I'm hoping someone can give me an idea of what would be involved in creating my task so I can decide if it is worth the effort to learn how to create a HIT.
Essentially, I have recurring projects that require many graphs to be digitized (i.e. go from images to x,y coordinates). The automatic digitization software that I've tried isn't great for this task because some of the graphs are from old journal articles and they have gray-scale lines that cross each other multiple times. Figuring out which line is which requires a little human judgement. Workflow for the HIT would be to have each Mturker:
Download a properly named empty Excel workbook.
Download a JPEG of the graphs.
Download a free plot digitization program.
Open the graph in the plot digitization software, calibrate the axes, trace the outline of each curve, paste the coordinates into the corresponding Excel workbook that I have given them, extract some numbers off the graph into a second sheet of the same workbook.
Send me the Excel files.
I'd have these done in duplicate to make sure that there is acceptable agreement between the two Mturkers who did each graph.
Is this a reasonable task to accomplish via Mechanical Turk? If so, can a somewhat intelligent person who isn't a programmer/web developer pull it off? I've poked around the internet a bit but I still can't tell if I just haven't found the right resource to teach me how to do this or if I'd need 5 years of experience as a web developer to pull it off. Thanks.
No, this really isn't a task for Mechanical Turk at all. Not only are you requiring workers to download a bunch of stuff (which they won't do), but it's also way too complex for them to have confidence that they are doing it right and will get paid. Pay is binary, so they could go through all of that for nothing.
You are also probably violating the terms of service if they have to divulge personal info to get the programs.
If you have a continuing need for this, then MAYBE you can prequalify people by creating a qualification on the service and then using just those workers.

How can I monitor who is online and their CPU Usage in SAS Management Console?

I am tasked with managing my company's SAS server (Red Hat Enterprise Linux Server release 6.3, 64-bit (Santiago)).
As the title suggests, I need to see who is using SAS Enterprise Guide and SAS Enterprise Miner, and how much CPU they consume on their jobs while they are working. I've learned that SAS does not have a built-in tool for that and I have to use third-party software. I am seeking a mostly graphical tool, if one exists.
I'm not sure what sort of ready-made graphical tools might be available for your particular OS. However, one option you could investigate further would be to execute a Linux top command via a filename pipe statement in SAS. You could then use this as the infile in a data step, and then get SAS to produce some graphical output from it.
It may take quite a bit of fiddling around to get exactly what you want, but if you persevere you should be able to produce the sort of output you're interested in - e.g., the total CPU time or number of processes per user over a given interval.
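For example, something along these lines (a sketch only; top's output layout varies by version, so the parsing would need adjusting):

/* run top once in batch mode and read its output into a dataset */
filename topout pipe 'top -b -n 1';

data work.snapshot;
  infile topout truncover;
  input line $char200.;
run;

/* from here you could pull out the USER and %CPU columns with scan(),
   summarize per user with proc means, and chart the result with proc sgplot */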

Is there any company that is still using Pro*C SQL?

There is a semester where each student needs to develop a system using VB.NET. As time goes by, it has become easy to just copy and paste other people's projects and edit the interface, images, etc. So my lecturers decided to move away from VB.NET; now we all have to use Pro*C SQL to connect to Oracle using C++.
I personally support this and am helping the lecturers provide the guidance necessary to get started with Pro*C, since I have done it before. The reasoning is that without proper knowledge of basic programming, students will not be able to just copy, paste, and edit a Pro*C project.
My question is: how practical is this kind of approach? And are there any companies out there still using Pro*C SQL? Google does not bring up many recent results. I hope this is the right place to ask this kind of question.
Yes. I have been part of many companies that use Pro*C for batch program development. Retail billing, car rental, and telecom billing are some of the domains I have worked in that still use Pro*C. This methodology is quite successful, and I have seen many applications that have been running for the last 15 years or so. Hope I answered your question.

Batch Geocoding with Garmin Mapsource

I lost track of this effort years ago, but I now need to geocode thousands of addresses nightly. I must use the very accurate database sitting on the machine, which was installed when the Nuvi map update installed Mapsource.
When I contacted Garmin years ago, they expressed an interest in providing an API for this, but then I heard nothing and did not follow up. Their database is provided by Navteq, I believe. Anyone have experience with that format?
I posted on the Garmin Developer forum a while ago, but it's a little lethargic over there :)
Has anyone done this?
Does anyone know how it might be done without an API, meaning the database structure and calls?
I'll take a solution in any language.
Added:
Garmin has expressed an interest in making this available to me. They just have not done it.
I do not know the database format.
I am NOT looking for an online solution or any other "alternative". This question is very specific.
Talk to Navteq directly. They will sell or license their database to you directly. The database tables are clearly documented; you then write your own geocoder on top. It took me about a week 4 years ago, and I was only marginally proficient in SQL at the time.
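To give a flavor of what "write your own geocoder" means, here is a toy address-interpolation sketch in SAS PROC SQL (the table layouts are entirely hypothetical; a real Navteq schema is far richer):

/* toy street-segment table: house-number range with endpoint coordinates */
data work.segments;
  input street $ from_no to_no lat1 lon1 lat2 lon2;
  datalines;
MAIN_ST 100 198 40.7100 -74.0100 40.7120 -74.0080
;
run;

/* an address to geocode */
data work.addresses;
  input addr_id street $ house_no;
  datalines;
1 MAIN_ST 150
;
run;

/* match each address to its segment and interpolate along it */
proc sql;
  create table work.geocoded as
  select a.addr_id,
         s.lat1 + (a.house_no - s.from_no)
                / (s.to_no - s.from_no) * (s.lat2 - s.lat1) as lat,
         s.lon1 + (a.house_no - s.from_no)
                / (s.to_no - s.from_no) * (s.lon2 - s.lon1) as lon
  from work.addresses a, work.segments s
  where a.street = s.street
    and a.house_no between s.from_no and s.to_no;
quit;

A production version adds street-name normalization, side-of-street offsets, and city/zip disambiguation, which is where most of that week of work goes.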
You can geocode up to 10,000/day by city with NN4D after you get their free application key.
You can geocode for $18 per 1,000 with CoreLogic (aka Proxix)
Yahoo looked most promising because it has the Hadoop feature, which is also currently being used at Navteq. I've contacted a guy at Navteq who uses Hadoop, and I'm awaiting his feedback. According to Ben Lorica's article on O'Reilly.com entitled "Big Data Tool for Business Analysts", Datameer can upload from spreadsheets to Hadoop. Hadoop is a pipeline to Navteq.
A starting point: a list of the tools at the GIS Dept at USC.
(I can only have one link because I'm new, but I'll add the rest when I get my points up.)
Navteq uses Oracle format. But hold on one second:
doing 1,000 lookups per night is easy;
doing 10,000 lookups per night requires a good server;
doing 1,000,000 lookups per night requires a cluster.
Letting them do the searches requires less hardware (and more traffic); using XML-RPC or a similar RPC mechanism would be best for everyone. Buy an Oracle DB and start working. You can use almost anything, but keeping the volume in mind, you should use a compiled language like C++.
gpsbabel.org has lots of material on converting between many GPS formats, plus a downloadable tool. In my limited experience (mostly with Google Maps, Street View, etc.), geocoding is not very accurate.
The free IBM DB2 Express-C DBMS comes with a Spatial Extender that can be used to geocode US addresses. There is a webinar on this. I don't know if it is an exact fit, but it can't hurt to take a look.
Also take a quick look at the DB2 documentation: http://publib.boulder.ibm.com/infocenter/db2luw/v9r7/topic/com.ibm.db2.luw.spatial.topics.doc/doc/csbp3008.html