I would like to use SAS code to build data mining models. I am aware of SAS Enterprise Miner and SAS Enterprise Guide but I prefer to write my own code. What SAS procedures can I use to do so without the need for Enterprise Miner or Enterprise Guide?
This publication (PDF) mentions that SAS can run only some of its modeling PROCs on Hadoop (via MapReduce?). Does anyone know which PROCs these are, or better yet, is there an exhaustive list? Thanks.
Christian
I haven't seen a list of procedures that will run against Hadoop; it also changes quickly due to additions to SAS software along with changes in Hadoop itself.
There is a SAS Support for Hadoop page at SAS 9.4 Support for Hadoop | RESOURCES / THIRD-PARTY SOFTWARE REFERENCE.
There are a lot of articles off that page, but the link to the Hadoop Support Matrix is especially helpful for seeing which SAS products support Hadoop, including the distribution, the version, and any known issues.
There are several papers available, but I think "Data Modeling Considerations in Hadoop and Hive" by Clark Bradley, Ralph Hollinshead, Scott Kraus, Jason Lefler, and Roshan Taheri (October 2013) might be a good fit for you.
SAS has a variety of products that interact with many database-type systems, including Hadoop, and the options are growing along with Hadoop. Some newer products, such as SAS Data Loader for Hadoop (and others), support running jobs on the Spark engine instead of MapReduce. Also, just to clarify: most Hadoop clusters today run YARN/MapReduce 2 rather than MapReduce 1, and SAS supports both. In addition, some SAS programs running against Hadoop may not require a MapReduce job at all, depending on what you are trying to do.
Honestly, there are only a few common execution engines in Hadoop. There is the original MapReduce (1), which is much older. Then came YARN/MapReduce 2, which is probably the most common execution engine to date. The Spark engine has been available for a few years but is still fairly new; it is supposed to be faster, though not as flexible as MapReduce 2, from what I've heard. Hortonworks also has an engine called Apache Tez, which works with SAS in my experience; Tez still runs on YARN, it simply replaces the MapReduce 2 engine. For the most part your SAS client won't even know the difference, and so far I haven't run into any issues with YARN running Tez. There may be a few smaller projects out there, but these are the only ones I run across.
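To make the SAS side of this concrete, here is a minimal sketch of reading a Hive table through a SAS/ACCESS Interface to Hadoop libref. The server name, port, schema, and table below are placeholders for your own cluster, and the exact LIBNAME options depend on your SAS release and Hadoop distribution:

/* A sketch assuming SAS/ACCESS Interface to Hadoop is licensed;    */
/* hive-node.example.com, port 10000, and weblogs are placeholders. */
libname hdp hadoop server="hive-node.example.com" port=10000
        user=myuser schema=default;

/* A simple summarization; depending on your configuration, SAS can */
/* push work like this down to the cluster instead of pulling all   */
/* of the rows back to the SAS session.                             */
proc freq data=hdp.weblogs;
    tables status_code;
run;

Whether a given step runs in the cluster or in SAS depends on the procedure and on options such as SQLGENERATION, which is part of why a definitive list of "Hadoop-enabled" PROCs is hard to pin down.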
I have two platforms, and what I want to do is this: while I am using Qlik Sense, I want to tell SAS to run some regression analysis on the table that I loaded into Qlik Sense and then show me the results in Qlik Sense. Is it possible to interconnect these two pieces of software?
Note: a SAS ODBC connector is available, but I am not sure whether installing it will allow me to use SAS scripts in the Qlik Sense editor. It seems this tool only lets me see SAS tables in QlikView. I am not sure whether it allows me to send commands to SAS, such as regression, decision trees, etc.
If SAS is installed on a Windows OS, how can I troubleshoot or capture the performance of SAS DI jobs on UNIX? Using any tools or commands, or by using nmon?
Thank you...
Performance of each job in SAS Data Integration Studio can be measured using a few techniques:
ARM logging - enabling and using the Audit and Performance Measurement capabilities of SAS (commonly referred to as ARM logging).
FULLSTIMER - you can add the FULLSTIMER system option to the autoexec.sas for the session, which will produce detailed performance statistics in the log; there are many code snippets and tools out there that can parse the log and give you fancy performance stats on the jobs (see the sketch after this list).
Environment Manager - these days I have been exploring SAS Environment Manager, which comes with SAS 9.4. It is a daemon that can be configured to provide a lot of performance statistics and other information; it is web based and a very handy tool for admins.
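As a concrete starting point, here is a minimal sketch of the options mentioned above, as they might appear in autoexec.sas. FULLSTIMER is a standard SAS system option; the ARM settings shown are one common configuration, and the log location is a placeholder you would adjust for your site:

options fullstimer;                    /* per-step CPU, memory, and I/O statistics in the SAS log */

options armagent=log4sas               /* route ARM records through the SAS logging facility */
        armloc="arm.log"               /* placeholder location for the ARM log */
        armsubsys=(arm_proc arm_dsio); /* instrument procedure and data set I/O events */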
Hope this information helps!
I am trying to migrate DB2 LUW 9.7 databases and data to PostgreSQL 9.3. Any suggestions on the best approach? What would be the best tool, or is there any open-source tool available, to perform this?
The db2look utility can reverse-engineer your DB2 tables into DDL statements that will serve as a good starting point for your PostgreSQL definitions. To unload the data from each table, use the EXPORT command, which dumps the results of any SQL SELECT to a delimited text file. Although the db2move utility can handle both of those tasks, it is not going to be of much help to you because it extracts the table data into IBM's proprietary PC/IXF format.
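To sketch the flow, assuming a database named SAMPLE and a table named EMPLOYEE (both placeholders, as are the file paths), the commands look roughly like this:

# 1. reverse-engineer the DDL into a script you can translate for PostgreSQL
db2look -d SAMPLE -e -o sample_ddl.sql

# 2. unload one table to a comma-delimited text file
db2 connect to SAMPLE
db2 "EXPORT TO /tmp/employee.del OF DEL SELECT * FROM employee"

# 3. after running the translated DDL in PostgreSQL, load the file
psql -d sample -c "\copy employee FROM '/tmp/employee.del' WITH (FORMAT csv)"

The DEL format is comma-delimited with double-quoted strings, which lines up well with PostgreSQL's CSV import, though you should expect to touch up type mappings and the occasional date or encoding quirk by hand.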
If you're moving off of DB2 because of price, IBM provides a free-as-in-beer edition called DB2 Express-C, which shares the same core database engine as the paid editions of DB2. Express-C is a first-rate, industrial-strength DBMS that does not have the sort of severe limitations that other commercial vendors impose on their no-cost engines.
We have datasets that are created and stored in SAS. I'm curious if it is possible to access those datasets from a remote SQL client. Why? To make those datasets more broadly accessible within our organization.
Yes, you can license a product called SAS/SHARE that includes something called SHARE*NET. It is a very useful product that is typically installed in a BI server environment, but I suppose it's possible to run it on a local desktop as well.
Basically, you "register" SAS libraries to a service which then makes the data available to external clients over ODBC. This makes the data sets available as "tables" for applications like Excel, so I'm sure you can use other clients as well.
The SAS ODBC driver itself does not require a license, but the SAS/SHARE software does. I use it to make data available to many users who do not have direct access to my UNIX server.
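For a rough idea of what the server side looks like, here is a minimal sketch of starting a SAS/SHARE server and registering a library with it; the server id SHR1, the host name, and the path are placeholders, and a production setup would add proper security and usually define the server in metadata:

/* Server session: start a SAS/SHARE server and register a library. */
options comamid=tcp;

proc server id=shr1;
    allocate library mydata '/data/sasfiles';  /* placeholder path to the SAS data sets */
run;

/* A client SAS session could then reach the same library through the server: */
/* options comamid=tcp;                                                       */
/* libname mydata '/data/sasfiles' server=unixhost.shr1;                      */

ODBC clients would instead point the SAS ODBC driver at the same SHARE server, which is how the data sets show up as tables in tools like Excel.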
It might be possible through SAS/ACCESS (or something similar), but SAS datasets typically cannot be understood by third-party software.