How do I convert an MSSQL query to a Postgres query - database-migration

I have complex SQL queries that need to be migrated from MSSQL to Postgres.
Complex SQL query:
joins across more than 4 tables, lots of filters, aggregate functions, CASE WHEN ... THEN expressions, etc.
For example:
Sample input
Select
    ROW_NUMBER() OVER (ORDER BY GETDATE()) AS ID,
    GETDATE() as time,
    arc.Customer as Customer,
    GroupBy4KPI,
    arc.MasterAccount,
    arc.CustomerClass
from tablename_1 arc
left join tablename_2 varc on arc.Customer = varc.Customer
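For reference, the expected Postgres output for this sample would be something like the following (GETDATE() becomes now(); the rest carries over largely unchanged):

Select
    ROW_NUMBER() OVER (ORDER BY now()) AS ID,
    now() as time,
    arc.Customer as Customer,
    GroupBy4KPI,
    arc.MasterAccount,
    arc.CustomerClass
from tablename_1 arc
left join tablename_2 varc on arc.Customer = varc.Customer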
Please suggest a custom function or tool that can convert DML statements from MSSQL to Postgres.
Fact
I have around 1,600 such queries to parse, so this is a repetitive job; I have to parse MSSQL queries on a daily basis.
Any help would be much appreciated.

Related

Athena equivalent to information_schema

For background, I come from a SQLServer background and make heavy use of the system tables & information_schema, to tell me all about my tables and columns.
I didn't expect the exact same power in Athena, but I'm currently quite shocked and frustrated by how little seems to be available - unless I've missed something?
For example, 'describe mytable' - just describes 1 table at a time.
How about showing the columns for ALL tables in one result?
It also does not output the table name, nor allow you to manually add that in as a custom column.
All the results of these "show/list/describe" commands seem to produce a text list - not a recordset, so you cannot take the results and join them to other tables or views to make more complex outputs.
Is there any other way to query the contents of my databases?
Thanks in advance
Athena is based on Presto. Presto provides the information_schema schema, and I have checked that it is accessible in Athena.
You can run, for example, a query like:
SELECT * FROM information_schema.columns;
to get a list of columns of all tables.
You can filter this by "database":
SELECT * FROM information_schema.columns WHERE table_schema = '<databasename>';
Note however that these types of queries are not necessarily very performant.
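For example, to get a recordset with the table name alongside every column (these are standard information_schema fields):

SELECT table_schema, table_name, column_name, data_type
FROM information_schema.columns
WHERE table_schema = '<databasename>'
ORDER BY table_name, ordinal_position;

Because this comes back as a regular query result rather than a text list, you can join it to other tables or views like any other recordset.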

The processing of large data sets in SAS

I am looking for solutions or ideas on how to speed up the processing of large data sets in SAS.
What would you recommend?
Which is better: a data step or the Proc SQL procedure?
Speeding up your data processing depends on where your data is saved.
Your data can be either in:
a SAS table, or
a database table (Microsoft SQL Server, Oracle, DB2, MySQL, etc.)
Use a SAS Data Step when:
You are querying/processing SAS tables,
You want to do iterative processing (e.g. retaining values or using arrays) - a short sketch follows this list.
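A minimal sketch of that kind of iterative processing (the data set and variable names here are hypothetical):

data want;
    set have;      /* assumed already sorted by customer */
    by customer;
    if first.customer then running_total = 0;
    running_total + amount;   /* sum statement: running_total is retained across rows */
run;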
Use Proc SQL when:
You are querying a large database table,
You can do a SQL "Pass-Through", where you send SQL code to be executed on the DB server and only the output is sent back to SAS (instead of bringing the entire tables over the network to SAS and then filtering them) - a pass-through sketch follows this list,
You want to query SAS tables but prefer SQL joins to data step merges.
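A minimal pass-through sketch, assuming an ODBC-style connection (the connection options, schema, and table names here are all placeholders):

proc sql;
    connect to odbc (dsn=mydb user=&user. password=&pw.);
    create table work.filtered as
    select * from connection to odbc
    (   /* this inner query runs on the DB server; only its result travels back to SAS */
        select id, amount
        from dbo.big_table
        where amount > 1000
    );
    disconnect from odbc;
quit;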
Another topic you should consider is efficiency programming, where you optimise your queries and look-ups.
I find Proc SQL to be better for my use cases. We may need some more specifics on the size and variety of the data you're trying to join/export, etc.
Give us some info on that and we can try to help.
Tips:
Limit the fields you're pulling over
Subset data
Anecdotally, in my experience Proc SQL seems faster.
Here are two tips on speeding up queries with Proc SQL:
In general, you want to rule out as much data as possible when querying. If you are using Proc SQL, the order of the restrictions in the where clause matters. Put the most restrictive parts first.
For example, if I'm querying a database for teachers with the last name "JONES" who were hired after Jan 2005, I would structure my where clause like this: where last_name = 'JONES' and hire_date > 200501. I would do this because the last name is likely to exclude more records than the hire date restriction.
When possible, don't use Select *; instead, list out the specific columns that you need. Remember, even if you are doing a calculation with a column, you don't have to include that column in your select statement.
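A minimal sketch combining both tips (the table and column names are hypothetical):

proc sql;
    create table jones_hires as
    select teacher_id, last_name, hire_date   /* only the columns actually needed */
    from mydb.teachers
    where last_name = 'JONES'                 /* most restrictive condition first */
      and hire_date > 200501;
quit;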
Here is a very useful resource for understanding how to use Proc SQL efficiently. I recommend reading it in its entirety if you do a lot of work with large data sets in SAS.
http://www2.sas.com/proceedings/sugi29/127-29.pdf

SAS, SQL explicit passthrough, multiple Teradata databases

I have inherited a sizeable Teradata SQL query which runs across 3 Teradata databases.
Preferring not to get bogged down in the functional aspects of the query (with its various windowing statements), I would like to pass the query explicitly through to Teradata (same server).
The construct that I am familiar with connects to only one database, e.g.:
proc sql;
connect to teradata (user="userid" password="password1" mode=teradata
database=DB1 tdpid="MyServer");
create table TD_Results as
select * from connection to TERADATA
(
... TD SQL CODE
... TD SQL CODE
);
quit;
Does anyone have an idea as to how the original TD SQL query referencing 3 databases could be used via passthrough?
Thanks.
Q.
What Teradata calls a DATABASE is what Oracle calls a SCHEMA. You just use a two-level name to reference the tables.
select a.x,b.y,c.z
from db1.table1 a
, db2.table2 b
, db3.table3 c
If you mean that you need to select from multiple servers then I think you need to look into using QueryGrid syntax. In that syntax you can add the server name with a trailing # on the table reference.
select a.x,b.y,c.z
from db1.table1 a
, db2.table2#server2 b
, db3.table3 c
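Putting that together with the pass-through construct from the question, a single connection can reference all three databases by their two-level names (the join keys here are hypothetical):

proc sql;
    connect to teradata (user="userid" password="password1" mode=teradata
        database=DB1 tdpid="MyServer");
    create table TD_Results as
    select * from connection to TERADATA
    (
        select a.x, b.y, c.z
        from db1.table1 a
        join db2.table2 b on a.id = b.id   /* hypothetical join keys */
        join db3.table3 c on b.id = c.id
    );
quit;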

Reading (even joining) a very large (1.1bn row) table in Enterprise Guide from Teradata

Hopefully you guys can help with what should be quite a simple question for those in the know!
I live (well, work) in SAS Enterprise Guide and am trying to perform a simple left join against a table in Teradata.
The table is extremely large (700+ columns, 1.1bn rows) and so far I have been connecting via a LIBNAME statement at the top of my program, followed by the usual PROC SQL to read the data.
The issue I am having is that it is extremely slow. I performed the join successfully using 90 rows in the left table and it took 3 hours to complete. The real table I want to use has something like 15,000 rows.
I have tried to connect via the SQL Pass-Through method, but this throws a hosts file error, which I can't fix due to corporate security limitations.
Has anyone had any experience performing this kind of task?
I should mention that I can run a simple select * query in Teradata SQL Assistant in just over 1 minute (16,666,666 obs/s!), so the limitation must be somewhere between SAS and Teradata, or even in SAS itself.
I'm sorry I haven't posted actual code snippets as they're on my work machine but this has been bugging me for ages so thought I'd see if I'm missing any tricks.
Thanks in advance for your help.
So you're joining a SAS data set to a Teradata table and want to return the matching records. You'll want to use SAS's DBMASTER= data set option. It designates which of the tables is larger. By telling SAS this, it knows which table to move.
Here I assume librefs have already been assigned and that the Teradata table is larger--more obs--than the SAS data set.
proc sql threads;
    select tdTable.*
    from sastables.sasTable1, td.tdTable(dbmaster=yes)
    where tdTable.idNum = sasTable1.idNum;
quit;
If by chance your SAS data set is larger, you'll want to use the MULTI_DATASRC_OPT= option. Either google these terms or look in the SAS/Access to Relational Databases manual. It's pretty good.
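A sketch of that option, which goes on the LIBNAME statement (the connection details here are placeholders):

libname td teradata server="MyServer" user=&user. password=&pw.
    multi_datasrc_opt=in_clause;   /* pass the smaller table's key values to Teradata as an IN (...) list */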
Good luck.
Have you considered creating a volatile table in Teradata? Since this is created in your spool allocation you shouldn't need explicit permissions to create the table. Once created you can load the SAS data set into the Volatile table and collect statistics on the table's join columns and filter columns. This will help the optimizer understand the demographics about your "small" table. The volatile table will only persist for the duration of your session and is not accessible across multiple sessions.
Then rewrite your SAS code to push-down the SQL to Teradata joining the large table to your volatile table. The results can be returned to SAS and loaded into another data set.
CREATE VOLATILE TABLE MyTable, NO FALLBACK
( ColA SMALLINT NOT NULL,
ColB VARCHAR(10) NOT NULL
) PRIMARY INDEX (ColA)
ON COMMIT PRESERVE ROWS /* This is important */
;
The primary index is how Teradata distributes the data and accesses the data. Tables distributed on the same column will join "AMP local" and will not require a redistribution. This is not always possible, as your primary index selection has to consider even distribution as well as access path. The primary index does not have to be unique, but can be.
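A sketch of that push-down step, assuming the volatile table above has already been loaded, and using CONNECTION=GLOBAL so the pass-through shares the session in which the volatile table lives (the large table's name and join key are placeholders):

proc sql;
    connect to teradata (user="userid" password="password1" tdpid="MyServer"
        connection=global);   /* same session, so MyTable is still visible */
    create table work.results as
    select * from connection to teradata
    (
        select big.*
        from BigDB.BigTable big
        join MyTable v
          on big.ColA = v.ColA   /* the volatile table created above */
    );
quit;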
Hope this helps.

Limiting results in PROC SQL

I am trying to use PROC SQL to query a DB2 table with hundreds of millions of records. During the development stage, I want to run my query on an arbitrarily small subset of those records (say, 1000). I've tried using INOBS to limit the observations, but I believe that this parameter is simply limiting the number of records which SAS is processing. I want SAS to only fetch an arbitrary number of records from the database (and then process all of them).
If I were writing a SQL query myself, I would simply use SELECT * FROM x FETCH FIRST 1000 ROWS ONLY ... (the equivalent of SELECT TOP 1000 * FROM x in SQL Server). But PROC SQL doesn't seem to have any option like this. It's taking an extremely long time to fetch the records.
The question: How can I instruct SAS to arbitrarily limit the number of records to return from the database?
I've read that PROC SQL uses ANSI SQL, which doesn't have any specification for a row-limiting keyword. Perhaps SAS didn't feel like making the effort to translate its SQL syntax to vendor-specific keywords? Is there no workaround?
Have you tried using the outobs option in your proc sql?
For example,
proc sql outobs=10;
    create table test as
    select * from schema.HUGE_TABLE
    order by n;
quit;
Alternatively, you can use SQL passthrough to write a query using DB2 syntax (FETCH FIRST 10 ROWS ONLY), although this requires you to store all your data in the database, at least temporarily.
Passthrough looks something like this:
proc sql;
connect to db2 (user=&userid. password=&userpw. database=MY_DB);
create table test as
select * from connection to db2 (
select * from schema.HUGE_TABLE
order by n
FETCH FIRST 10 ROWS ONLY
);
quit;
It requires more syntax and can't access your SAS datasets, so if outobs works for you, I would recommend that.
When SAS is talking to a database via SAS syntax, part of the query can be translated to its DBMS language equivalent - this is called implicit pass-through. The rest of the query is "post-processed" by SAS to produce the final result.
Depending on the SAS version, the DBMS vendor and version, and in some cases even some connection/libname options, different parts of SAS syntax are considered translatable/compatible between SAS and the DBMS and thus sent to be performed by the DBMS instead of SAS.
As for the SAS SQL options INOBS and OUTOBS - I've worked a lot with MS SQL and Oracle via different versions of SAS, but I haven't seen those ever translated to TOP xxx type queries, so this is probably not supported yet, although when a query touches just DBMS data (no joins to SAS data, etc.) it should be quite doable.
So I think you're left with so-called explicit pass-through - specific SAS SQL syntax to connect to a database. This type of query looks like this:
proc sql;
connect to oracle as db1 (user=user1 pw=pasw1 path=DB1);
create table test_table as
select *
from connection to db1
( /* here we're in oracle */
select * from test.table1 where rownum <20
)
;
disconnect from db1;
quit;
In SAS 9.3 the syntax can be simplified - if there's already a LIBNAME connection, you can reuse it for explicit pass-through:
LIBNAME ORALIB ORACLE user=...;
PROC SQL;
    connect to oracle using ORALIB;
    create table work.test_table as
    select *
    from connection to ORALIB (
        ....
    );
quit;
When connecting via LIBNAME, be sure to use the READBUFF option (I usually set it to 5000 or so) when reading from the database, or the INSERTBUFF option (1000 or more) when loading data into the database.
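For example (hypothetical connection details):

LIBNAME ORALIB ORACLE user=&user. password=&pw. path=DB1 readbuff=5000;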
To see if implicit pass-through takes place, set sastrace option:
option sastrace=',,,ds' sastraceloc=saslog nostsuffix;