Select for Teradata Primary/Foreign key relationship - foreign-keys

I'm in the process of learning to properly pull appropriate metadata from a Teradata database, and a large part of what I need is to pull all existing primary/foreign keys within a database. I am still very much a beginner with Teradata as well as big data in general, so a simplified explanation would be nice.
A simplified version of a select statement would also be incredibly helpful. Thanks in advance.

Foreign Keys: dbc.All_RI_ParentsV[X]
PK/Unique: dbc.IndicesV[X]. Unique indexes have UniqueFlag 'Y'; if the index was defined as a PK in the CREATE TABLE, IndexType will be 'K' ('P' denotes a primary index, not a primary key). Multi-column indexes get one row per column, all sharing the same IndexNumber; IndexNumber 1 is always the PI.
But as Teradata is a DWH you might have tables without defined PK and you will hardly find any defined FKs.
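A minimal sketch of the two SELECT statements, assuming you have read access to the DBC views named above. 'MyDatabase' is a placeholder for your own database name, and the column lists are only a reasonable starting subset, not everything the views expose.

```sql
-- Primary keys and other unique indexes/constraints in one database.
-- 'MyDatabase' is a placeholder.
SELECT DatabaseName,
       TableName,
       IndexNumber,      -- columns of one index share the same IndexNumber
       IndexType,        -- e.g. 'K' = primary key, 'U' = unique constraint
       IndexName,
       ColumnPosition,
       ColumnName
FROM dbc.IndicesV
WHERE DatabaseName = 'MyDatabase'
  AND UniqueFlag = 'Y'
ORDER BY TableName, IndexNumber, ColumnPosition;

-- Foreign keys: one row per child column / parent column pair.
SELECT IndexID,
       IndexName,
       ChildDB,
       ChildTable,
       ChildKeyColumn,
       ParentDB,
       ParentTable,
       ParentKeyColumn
FROM dbc.All_RI_ParentsV
WHERE ChildDB = 'MyDatabase'
ORDER BY ChildTable, IndexID;
```

Add `AND IndexType = 'K'` to the first query if you only want indexes that were declared as PRIMARY KEY. Expect the second query to return few or no rows on a typical warehouse.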

Related

Join 2 table in power bi

I need help with this issue as I don't have any experience in Power BI. I want to join two tables in Power BI that share the same column, Part_Number. How can I match these two tables by Part Number and return the values?
Recon Table
Inventory Table
I would like to have Part Number, Part Name, QTY and Total Quantity as the result. I hope I can get the clarification I need. Thanks a lot!
For this case you simply must merge the tables. It doesn't look like you have done a lot of research on the matter though, so it's hard to understand exactly what you need help with.
To merge your two tables in Power Query, I would right click in the left hand side menu and select Merge Queries as New.
After that you simply follow the on-screen instructions and select your two tables and their respective key columns. After merging you can choose to disable load of your two original tables to save space in your data model, but this depends on your requirements.
If this were my data model, I would consider why joining these tables is necessary, instead of using the two tables as fact tables and creating a third table to handle the part number dimension with its associated part metadata.
Read the docs: Merge queries in Power Query

Understanding Dimension Tables - Best Approach - Power BI

I wanted to know what would be the best approach for creating the dim tables. Can I maintain them as a single table with all fields and use them as required, or should I create separate dim tables and use them individually?
Can someone please help me out here
PS: I'm a beginner here.
Creating one table per dimension is the best practice. In data warehousing you will come across four types of schema:
Star Schema
Snowflake Schema
Galaxy Schema
Combined Schema
People choose among these based on the nature of their data, their requirements and other parameters. In every case, though, there is a single table per dimension. This is easier to maintain and gives better performance.
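To make "one table per dimension" concrete, here is a small star schema sketched in generic SQL. All table and column names are hypothetical, and in Power BI you would model the same shape with relationships between tables rather than with DDL; this is only an illustration of the layout.

```sql
-- Hypothetical star schema: one fact table, one table per dimension.
CREATE TABLE dim_date (
    date_id       INTEGER PRIMARY KEY,
    calendar_date DATE NOT NULL
);

CREATE TABLE dim_product (
    product_id   INTEGER PRIMARY KEY,
    product_name VARCHAR(100) NOT NULL
);

-- The fact table holds only keys into the dimensions plus the measures.
CREATE TABLE fact_sales (
    date_id    INTEGER NOT NULL REFERENCES dim_date (date_id),
    product_id INTEGER NOT NULL REFERENCES dim_product (product_id),
    quantity   INTEGER NOT NULL
);

-- A report joins the fact table to whichever dimensions it needs.
SELECT d.calendar_date,
       p.product_name,
       SUM(f.quantity) AS total_quantity
FROM fact_sales AS f
JOIN dim_date    AS d ON d.date_id    = f.date_id
JOIN dim_product AS p ON p.product_id = f.product_id
GROUP BY d.calendar_date, p.product_name;
```

Keeping each dimension in its own table means a product rename touches one row in dim_product instead of every fact row.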

Redshift DISTSTYLE KEY. Deciding what's the best column to define as KEY

Well I recently got into this area of Redshift, trying to optimize disk usage and performance of my database, and having read lots of information on AWS about the topic, I still have some doubts.
First of all, my database structure. Per schema, I have 3 master tables with 3 different IDs. These are currently DISTSTYLE ALL tables, being small in size.
Each master table has different amounts of IDs,
the date table --> largest one (#1 most joined)
the store table --> medium one (#3 most joined)
the item table --> smallest one (#2 most joined)
Then I have my core table, which holds the needed combinations of these IDs to display additional information about them. This table should be a DISTSTYLE KEY table, based on my knowledge. So which of the 3 IDs should I select as my DISTKEY?
What are the criteria for this decision? I understand that for joins I need to look at the sort key; that has been set to ID_date, because the date table is the most joined one. So what about the distribution of this table across nodes?
I'm sorry if I'm rambling, I don't want to leave any information out. If I have, feel free to ask! Thanks for taking the time to read!
You'll find the best advice on Amazon Redshift best practices for designing tables. It goes into quite a bit of detail.
However, my rule of thumb is:
The DISTKEY should be the column most used in JOINs between tables
The SORTKEY should be the column most used in WHERE statements
Use DISTSTYLE ALL for small lookup tables
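The rule of thumb could be sketched as the following Redshift DDL. Table names, column names and data types are hypothetical, and using ID_date for both keys is only an assumption taken from the question (it is described there as the most joined ID and the chosen sort key); substitute whichever columns your own JOINs and WHERE clauses actually use.

```sql
-- Small lookup table: a full copy is kept on every node.
CREATE TABLE store_master (
    id_store   INTEGER NOT NULL,
    store_name VARCHAR(64)
)
DISTSTYLE ALL;

-- Large core table: rows are placed according to the DISTKEY value.
CREATE TABLE core_table (
    id_date  INTEGER NOT NULL,
    id_store INTEGER NOT NULL,
    id_item  INTEGER NOT NULL,
    quantity INTEGER
)
DISTSTYLE KEY
DISTKEY (id_date)    -- assumed: the column most used in JOINs
SORTKEY (id_date);   -- assumed: the column most used in WHERE clauses

-- Rough skew check: a few values holding most of the rows suggests
-- the column would spread data unevenly as a distribution key.
SELECT id_date, COUNT(*) AS row_count
FROM core_table
GROUP BY id_date
ORDER BY row_count DESC
LIMIT 10;
```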

Reading (even joining) a very large (1.1bn row) table in Enterprise Guide from Teradata

Hopefully you guys can help with what I'm hoping is quite a simple question for those in the know!
I live (well, work) in SAS Enterprise Guide and am trying to perform a simple left join against a table in Teradata.
The table is extremely large (700+ columns, 1.1bn rows) and so far I have been connecting via a LIBNAME statement at the top of my program, followed by the usual PROC SQL to read the data.
The issue I am having is that it is extremely slow. I performed the join successfully using 90 rows on the left table and it took 3 hours to complete. The real table I want to use has something like 15,000 rows.
I have tried to connect via the SQL Pass-Through method, but this throws a hosts file error, which I can't fix due to corporate security limitations.
Has anyone had any experience performing this kind of task?
I should mention that I can run a simple select * query in Teradata SQL Assistant in just over 1 minute (16,666,666 obs/s!), so the limitation must be somewhere between SAS and Teradata, or even in SAS itself.
I'm sorry I haven't posted actual code snippets as they're on my work machine but this has been bugging me for ages so thought I'd see if I'm missing any tricks.
Thanks in advance for your help.
So you're joining a SAS data set to a Teradata table and want to return the matching records. You'll want to use SAS's DBMASTER= data set option. It designates which of the tables is larger. By telling SAS this, it knows which table to move.
Here I assume librefs have already been assigned and that the Teradata table is larger--more obs--than the SAS data set.
proc sql threads;
  select tdTable.*
  from sastables.sasTable1, td.tdTable(dbmaster=yes)
  where tdTable.idNum = sasTable1.idNum;
quit;
If by chance your SAS data set is larger, you'll want to use the MULTI_DATASRC_OPT= option. Either google these terms or look in the SAS/Access to Relational Databases manual. It's pretty good.
Good luck.
Have you considered creating a volatile table in Teradata? Since this is created in your spool allocation you shouldn't need explicit permissions to create the table. Once created you can load the SAS data set into the Volatile table and collect statistics on the table's join columns and filter columns. This will help the optimizer understand the demographics about your "small" table. The volatile table will only persist for the duration of your session and is not accessible across multiple sessions.
Then rewrite your SAS code to push-down the SQL to Teradata joining the large table to your volatile table. The results can be returned to SAS and loaded into another data set.
CREATE VOLATILE TABLE MyTable, NO FALLBACK
( ColA SMALLINT NOT NULL,
ColB VARCHAR(10) NOT NULL
) PRIMARY INDEX (ColA)
ON COMMIT PRESERVE ROWS /* This is important */
;
The primary index is how Teradata distributes the data and accesses the data. Tables distributed on the same column will join "AMP local" and will not require a redistribution. This is not always possible, as your primary index selection has to consider even distribution as well as access path. The primary index does not have to be unique, but can be.
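Once the SAS rows are loaded into the volatile table, the remaining steps could look roughly like this in Teradata SQL. BigDb.BigTable and SomeColumn are hypothetical names for the large table and one of its columns, and the join on ColA is an assumption; MyTable, ColA and ColB come from the definition above.

```sql
-- Give the optimizer demographics for the join column of the small table.
COLLECT STATISTICS ON MyTable COLUMN (ColA);

-- Left join run entirely inside Teradata; only the result goes back to SAS.
-- List just the columns you need rather than all 700+.
SELECT small.ColA,
       small.ColB,
       big.SomeColumn
FROM MyTable AS small
LEFT JOIN BigDb.BigTable AS big
       ON big.ColA = small.ColA;
```

Everything has to run in the same Teradata session, since the volatile table is not visible from any other session.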
Hope this helps.

Can I set up relations between data set in SAS?

I am moving from Relational Database and new to SAS. Now I need to import several CSV files into SAS, and there are relationships between those files. Just like the relationships between tables in database. I am wondering, does the same concept exist in SAS such as the foreign key which I need to set up, or should I just import those files directly regardless of the relationships because no such things in SAS?
Since the concept of a foreign key exists in your head, it also exists in SAS. But it is (generally) not "a supported attribute" that you'd actually use to tag data fields. SAS is low overhead in terms of having to do much upfront data definition, especially for ad hoc work.
Just import your files as they are.
And coming from relational databases, you should probably look at "proc SQL" as the fastest & best way of using your data manipulation skills in SAS.
In SAS, you have the concept of Integrity Constraint, as mjsqu referred to in comments. It is how you enforce relations between datasets. It is quite simple to use, and the syntax should be relatively familiar to someone coming from a strong SQL background.
Integrity constraints include:
Check (list of valid values written into the dataset)
Not Null (may not have a missing value)
Unique (may not have duplicate values)
Primary Key
Foreign Key (also known as a "referential constraint", as it is the only one that checks another table)
Here's an example of some of SAS's Integrity Constraints in action:
proc sql;
  create table employees (
    employee_id num primary key,
    name char(16),
    gender char(1),
    dob num format=date9.,
    division num not null);
  create table division_table (
    division num primary key,
    division_name char(16)
  );
  alter table employees
    add constraint gender check(gender in ('M','F'))
    add constraint division foreign key(division) references division_table
      on delete restrict on update restrict;
  *this insert fails because of Division not containing a value for 1;
  insert into employees (employee_id,name,gender,dob,division) values (1,'Johnny','M','06JAN1945'd,1);
  insert into division_table (division,division_name) values (1,'Division 1');
  *Now it succeeds;
  insert into employees (employee_id,name,gender,dob,division) values (1,'Johnny','M','06JAN1945'd,1);
quit;
You can also add constraints in PROC DATASETS, which will be more familiar for SAS users (and is probably slightly easier syntax for those unfamiliar with SQL).