Informatica ISO-8601 date - informatica

I am using Informatica PowerCenter and need to insert strings/dates like this one from a flat file into Oracle.
2013-07-23T13:55:33.0000000-04:00
In Oracle the solution is
to_timestamp_tz('2013-07-23T13:55:33.0000000-04:00','YYYY-MM-DD"T"HH24:MI:SS.FF7TZR')
What is the best way to convert this string/date so that Informatica will interpret it correctly? None of the Informatica expressions I see handle the "-04:00" offset correctly.

This can be done; what is your target table? For Teradata I know the approach below:
Create the column in your target table as TIMESTAMP(6) WITH TIME ZONE.
Keep the value as a string in a VARCHAR port and connect it to the target column mentioned above.
Use TPT as your load utility and it will load the data.
If you are using an Oracle table, find the equivalent.
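For illustration, a rough sketch of that Teradata approach with made-up table and column names (an Oracle target would use a TIMESTAMP WITH TIME ZONE column together with the TO_TIMESTAMP_TZ expression from the question):
/* Hypothetical Teradata staging table: the zone-aware column receives the
   value that the mapping carries through as a plain string. */
CREATE TABLE stg_events
(
    event_id INTEGER NOT NULL,
    event_ts TIMESTAMP(6) WITH TIME ZONE
)
PRIMARY INDEX (event_id);

/* Outside the load, the conversion can be tested with an explicit cast; note
   that Teradata's default format expects a space between date and time, so the
   'T' separator and the seventh fractional digit may need trimming in the mapping. */
SELECT CAST('2013-07-23 13:55:33.000000-04:00' AS TIMESTAMP(6) WITH TIME ZONE);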

Related

Retrieve a column name from the underlying dataset (before it got renamed in Power BI)

I'm trying to build a dynamic data dictionary for my Power BI data set. To do that, I am querying the DMVs in DAX Studio to get the object names and descriptions from the model directly.
Query used for the column details:
SELECT * from $SYSTEM.TMSCHEMA_COLUMNS
However, when I run this query, I'm getting ExplicitName = SourceColumn. I had assumed that the SourceColumn would contain the column name before any transformation in Power Query. Does anyone have any idea how to get the original column name (the name of the column in the SQL Server DB, for example)?
I have found a solution for this. You can find the technical column names in:
select * from $SYSTEM.DISCOVER_STORAGE_TABLE_COLUMNS where [COLUMN_TYPE] = 'BASIC_DATA'
If building this type of dynamic data dictionary interests anyone, do let me know. I can share the end result when I'm done.

Reading (even joining) a very large (1.1bn row) table in Enterprise Guide from Teradata

Hopefully you guys can help with what I'm hoping is quite a simple question for those in the know!
I live (well, work) in SAS Enterprise Guide and am trying to perform a simple left join against a table in Teradata.
The table is extremely large (700+ columns, 1.1bn rows) and so far I have been connecting via a LIBNAME statement at the top of my program, followed by the usual PROC SQL to read the data.
The issue I am having is that it is extremely slow. I performed the join successfully using 90 rows in the left table and it took 3 hours to complete. The real table I want to use has something like 15,000 rows.
I have tried to connect via the SQL Pass-Through method, but this throws a hosts file error, which I can't fix due to corporate security limitations.
Has anyone had any experience performing this kind of task?
I should mention that I can run a simple select * query in Teradata SQL Assistant in just over 1 minute (16,666,666 obs/s!), so the limitation must be somewhere between SAS and Teradata, or even in SAS itself.
I'm sorry I haven't posted actual code snippets as they're on my work machine, but this has been bugging me for ages, so I thought I'd see if I'm missing any tricks.
Thanks in advance for your help.
So you're joining a SAS data set to a Teradata table and want to return the matching records. You'll want to use SAS's DBMASTER= data set option. It designates which of the tables is larger. By telling SAS this, it knows which table to move.
Here I assume librefs have already been assigned and that the Teradata table is larger--more obs--than the SAS data set.
proc sql threads;
  select tdTable.*
  from sastables.sasTable1, td.tdTable(dbmaster=yes)
  where tdTable.idNum=sasTable1.idNum;
quit;
If by chance your SAS data set is larger, you'll want to use the MULTI_DATASRC_OPT= option. Either google these terms or look in the SAS/Access to Relational Databases manual. It's pretty good.
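For what it's worth, a hedged one-liner showing where that option would go (libref, server, and credentials are placeholders); MULTI_DATASRC_OPT=IN_CLAUSE asks SAS to build an IN clause from the join-key values and pass it to Teradata rather than pulling a whole table across:
/* hypothetical library assignment with the option set */
libname td teradata server='tdprod' user=myuser password=XXXXXXXX multi_datasrc_opt=in_clause;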
Good luck.
Have you considered creating a volatile table in Teradata? Since this is created in your spool allocation you shouldn't need explicit permissions to create the table. Once created you can load the SAS data set into the Volatile table and collect statistics on the table's join columns and filter columns. This will help the optimizer understand the demographics about your "small" table. The volatile table will only persist for the duration of your session and is not accessible across multiple sessions.
Then rewrite your SAS code to push-down the SQL to Teradata joining the large table to your volatile table. The results can be returned to SAS and loaded into another data set.
CREATE VOLATILE TABLE MyTable, NO FALLBACK
( ColA SMALLINT NOT NULL,
ColB VARCHAR(10) NOT NULL
) PRIMARY INDEX (ColA)
ON COMMIT PRESERVE ROWS /* This is important */
;
The primary index is how Teradata distributes the data and accesses the data. Tables distributed on the same column will join "AMP local" and will not require a redistribution. This is not always possible, as your primary index selection has to consider even distribution as well as access path. The primary index does not have to be unique, but can be.
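Purely as a sketch of the load-and-join step described above, assuming CONNECTION=GLOBAL and DBMSTEMP=YES so the libref and the pass-through share the session that holds the volatile table (server, credentials, and all table/column names are placeholders):
/* shared, global connection so the volatile table is visible to every step */
libname td teradata server='tdprod' user=myuser password=XXXXXXXX connection=global dbmstemp=yes;

proc sql;
connect to teradata (server='tdprod' user=myuser password=XXXXXXXX connection=global);
/* create the volatile table from the DDL above in the shared session */
execute (create volatile table MyTable, no fallback
         ( ColA smallint not null,
           ColB varchar(10) not null
         ) primary index (ColA)
         on commit preserve rows) by teradata;
/* load the SAS data set into the volatile table through the libref */
insert into td.MyTable select ColA, ColB from sastables.sasTable1;
/* collect statistics on the join column, then push the join down to Teradata */
execute (collect statistics on MyTable column (ColA)) by teradata;
create table work.results as
select * from connection to teradata
  ( select big.*
      from LargeSchema.LargeTable big
      join MyTable vt
        on big.ColA = vt.ColA );
disconnect from teradata;
quit;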
Hope this helps.

Read large tables from Teradata into SAS via DI Studio for further processing

I am creating an ETL job in SAS DI Studio to read a Teradata table into SAS, apply user-written code on top of it, and then load the SAS work table back into Teradata using the Teradata Table Loader.
What is the best way to read a large Teradata table (50 crore, i.e. 500 million, records or more) into the SAS work space? I am currently using the following Base SAS code to create a work table:
data work.out_table;
set db.in_table;
<sas-statements>;
run;
Try adding the FASTEXPORT=YES option to the Teradata library reference, or adding the corresponding data set option for the input table within the Table Options tab of the Append or Table Loader transformation.
I'm assuming you are on Teradata 13 or greater. If you are using an older version, then use the DBSLICEPARM=ALL parameter instead of FASTEXPORT=YES, since FASTEXPORT uses the TPT API, which was introduced in Teradata 13.
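A hedged sketch of both variants, with placeholder server, credentials, libref, and table names:
/* Teradata 13+: read through the TPT export path */
libname db teradata server='tdprod' user=myuser password=XXXXXXXX fastexport=yes;

/* On releases before Teradata 13, request parallel reads instead: */
libname db teradata server='tdprod' user=myuser password=XXXXXXXX dbsliceparm=all;

/* DBSLICEPARM can also be set per table rather than on the whole library: */
data work.out_table;
set db.in_table (dbsliceparm=all);
run;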

Parquet: read particular columns into memory

I have exported a MySQL table to a Parquet file (Avro-based). Now I want to read particular columns from that file. How can I read particular columns completely? I am looking for Java code examples.
Is there an API where I can pass the columns I need and get back a 2D array of the table?
If you can use Hive, creating a Hive table and issuing a simple select query would be by far the easiest option.
create external table tbl1(<columns>) location '<file_path>' stored as parquet;
select col1,col2 from tbl1;
//this works in hive 0.14
You can use the Hive JDBC driver to do that from a Java program as well.
Otherwise, if you want to stay completely in Java, you need to modify the Avro schema by excluding all the fields but the ones you want to fetch. Then, when you read the file, supply the modified schema as the reader schema and it will only read the included columns. But you will get your original Avro record back with the excluded fields nullified, not a 2D array.
To modify the schema, look at org.apache.avro.Schema and org.apache.avro.SchemaBuilder. Make sure that the modified schema is compatible with the original schema.
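For what it's worth, a rough Java sketch of that projection approach using the parquet-avro module; the file path, record name, and field names are made up, and class/package names can differ between Parquet releases:
import org.apache.avro.Schema;
import org.apache.avro.SchemaBuilder;
import org.apache.avro.generic.GenericRecord;
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.parquet.avro.AvroParquetReader;
import org.apache.parquet.avro.AvroReadSupport;
import org.apache.parquet.hadoop.ParquetReader;

public class ReadParquetColumns {
    public static void main(String[] args) throws Exception {
        // Reader schema listing only the columns we want; the record and
        // field names must match the Avro schema embedded in the Parquet file.
        Schema projection = SchemaBuilder.record("mytable").fields()
                .optionalString("col1")
                .optionalLong("col2")
                .endRecord();

        Configuration conf = new Configuration();
        AvroReadSupport.setRequestedProjection(conf, projection);

        // Only the projected columns are materialized; the other fields of
        // the returned records come back as null.
        try (ParquetReader<GenericRecord> reader =
                 AvroParquetReader.<GenericRecord>builder(new Path("/path/to/file.parquet"))
                                  .withConf(conf)
                                  .build()) {
            GenericRecord rec;
            while ((rec = reader.read()) != null) {
                System.out.println(rec.get("col1") + "\t" + rec.get("col2"));
            }
        }
    }
}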
Options:
Use a Hive table defined over the file with all columns and Parquet storage format, and read the required columns by specifying the column names
Create a Thrift definition for the table and use the Thrift fields to read the data from code (Java or Scala)
You can also use Apache Drill, which natively parses Parquet files.
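As a hedged example, in Drill the column pruning is just a matter of selecting the columns you need from the file (the dfs path is a placeholder):
select col1, col2
from dfs.`/path/to/file.parquet`;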

NULL TO NOT NULL ALTER TABLE IN SAS

I'm having difficulties running a SAS Data Integration job.
One column needs to be removed from the target's structure,
but cannot be removed because of the NULL constraint.
Do I need to remove the constraint first?
How do I do that?
Thank you in advance,
Gal.
Does the physical table exist without the column? If so, then the constraint is only in the metadata. Recreate the metadata and you should be fine.
If the physical table exists with the column, then you need to recreate that table without the column. You will still need to refresh the table metadata for DI Studio to pick it up.
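If the physical table is a SAS table, a minimal sketch of dropping the column in place before refreshing the metadata (library and column names are hypothetical):
/* drop the unwanted column from the physical SAS table */
proc sql;
alter table mylib.target_table
drop obsolete_col;
quit;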