Redshift column names - amazon-web-services

I'm new to redshift. I have some tables in 'abc' schema whose column names and primary key information needs to be extracted. Can someone guide.
Assume schema name is 'abc' and table name is 'xyz' whose columns are required to be listed in a single row.

Use v_generate_tbl_ddl.sql provided by AWS Labs, you can use table level or schema level filters.

You can query the SVV_COLUMNS table. It includes:
Scheme Name
Table Name
Column Name

I used this query to get a list of primary keys. A lot of additional info can be added. Postres official documentation to pg_ tables helps a lot.
SELECT
f.attname AS column_name
FROM
pg_catalog.pg_namespace n
JOIN pg_catalog.pg_class c ON
n.oid = c.relnamespace
JOIN pg_catalog.pg_attribute f ON
c.oid = f.attrelid
JOIN pg_catalog.pg_constraint p ON
p.conrelid = c.oid
AND f.attnum = ANY (p.conkey)
WHERE
n.nspname = 'schema_name'
AND c.relkind = 'r'
AND c.relname = 'table_name'
AND p.contype = 'p'
AND f.attnum > 0
ORDER BY
f.attnum;

Related

Why does left join in redshift not working?

We are facing a weird issue with Redshift and I am looking for help to debug it please. Details of the issue are following:
I have 2 tables and I am trying to perform left join as follows:
select count(*)
from abc.orders ot
left outer join abc.events e on **ot.context_id = e.context_id**
where ot.order_id = '222:102'
Above query returns ~7000 records. Looks like it is performing default join as we have only 1 record in [Orders] table with Order ID = ‘222:102’
select count(*)
from abc.orders ot
left outer join abc.events e on **ot.event_id = e.event_id**
where ot.order_id = '222:102'
Above query returns 1 record correctly. If you notice, I have just changed column for joining 2 tables. Event_ID in [Events] table is identity column but I thought I should get similar records even if I use any other column like Context_ID.
Further, I tried following query under the impression it should return all the ~7000 records as I am using default join but surprisingly it returned only 1 record.
select count(*)
from abc.orders ot
**join** abc.events e on ot.event_id = e.event_id
where ot.order_id = '222:102'
Following are the Redshift database details:
Cutdown version of table metadata:
CREATE TABLE abc.orders (
order_id character varying(30) NOT NULL ENCODE raw,
context_id integer ENCODE raw,
event_id character varying(21) NOT NULL ENCODE zstd,
FOREIGN KEY (event_id) REFERENCES events_20191014(event_id)
)
DISTSTYLE EVEN
SORTKEY ( context_id, order_id );
CREATE TABLE abc.events (
event_id character varying(21) NOT NULL ENCODE raw,
context_id integer ENCODE raw,
PRIMARY KEY (event_id)
)
DISTSTYLE ALL
SORTKEY ( context_id, event_id );
Database: Amazon Redshift cluster
I think, I am missing something essential while joining the tables. Could you please guide me in right direction?
Thank you

how to insert/update data in sql database using azure databricks notebook jdbc

I got lots of example to append/overwrite table in sql from AZ Databricks Notebook. But no single way to directly update, insert data using query or otherway.
ex. I want to update all row where (identity column)ID = 1143, so steps which I need to taken care are
val srMaster = "(SELECT ID, userid,statusid,bloburl,changedby FROM SRMaster WHERE ID = 1143) srMaster"
val srMasterTable = spark.read.jdbc(url=jdbcUrl, table=srMaster,
properties=connectionProperties)
srMasterTable.createOrReplaceTempView("srMasterTable")
val srMasterTableUpdated = spark.sql("SELECT userid,statusid,bloburl,140 AS changedby FROM srMasterTable")
import org.apache.spark.sql.SaveMode
srMasterTableUpdated.write.mode(SaveMode.Overwrite)
.jdbc(jdbcUrl, "[dbo].[SRMaster]", connectionProperties)
Is there any other sufficient way to achieve the same.
Note : Above code is also not working as SQLServerException: Could not drop object 'dbo.SRMaster' because it is referenced by a FOREIGN KEY constraint. , so it look like it drop table and recreate...not at all the solution.
You can use insert using a FROM statement.
Example: update values from another table in this table where a column matches.
INSERT INTO srMaster
FROM srMasterTable SELECT userid,statusid,bloburl,140 WHERE ID = 1143;
or
insert new values to rows where one of the existing column value matches
UPDATE srMaster SET userid = 1, statusid = 2, bloburl = 'https://url', changedby ='user' WHERE ID = '1143'
or just insert multiple values
INSERT INTO srMaster VALUES
(1, 10, 'https://url1','user1'),
(2, 11, 'https://url2','user2');
In SQL Server, you cannot drop a table if it is referenced by a FOREIGN KEY constraint. You have to either drop the child tables before removing the parent table, or remove foreign key constraints.
For a parent table, you can use the below query to get foreign key constraint names and the referencing table names:
SELECT name AS 'Foreign Key Constraint Name',
OBJECT_SCHEMA_NAME(parent_object_id) + '.' + OBJECT_NAME(parent_object_id) AS 'Child Table'
FROM sys.foreign_keys
WHERE OBJECT_SCHEMA_NAME(referenced_object_id) = 'dbo' AND
OBJECT_NAME(referenced_object_id) = 'PARENT_TABLE'
Then you can alter the child table and drop the constraint by its name using the below statement:
ALTER TABLE dbo.childtable DROP CONSTRAINT FK_NAME;

How to add a partition boundary only when not exists in SQL Data Warehouse?

I am using Azure SQL Data Warehouse Gen 1, and I create a partition table like this
CREATE TABLE [dbo].[StatsPerBin1](
[Bin1] [varchar](100) NOT NULL,
[TimeWindow] [datetime] NOT NULL,
[Count] [int] NOT NULL,
[Timestamp] [datetime] NOT NULL)
WITH
(
DISTRIBUTION = HASH ( [Bin1] ),
CLUSTERED INDEX([Bin1]),
PARTITION
(
[TimeWindow] RANGE RIGHT FOR VALUES ()
)
)
How should I split a partition only when there is no such boundary?
First I think if I can get partition boundaries by table name, then I can write a if statement to determine add partition boundary or not.
But I cannot find a way to associate a table with its corresponding partition values, the partition values of all partitions can be retrieved by
SELECT * FROM sys.partition_range_values
But it only contains function_id as identifier which I don't know how to join other tables so that I can get partition boundaries by table name.
Have you tried joining sys.partition_range_values with sys.partition_functions view?
Granted we cannot create partition functions in SQL DW, but the view seems to be still supported.
I know this is an out of date question, but I was having the same problem. Here is a query I ended up with that can get you started. It is modified slightly from a query for SQL Server documentation:
SELECT s.[name] AS [schema_name]
, t.[name] AS [table_name]
, p.[partition_number] AS [partition_number]
, rv.[value] AS [partition_boundary_value]
, p.[data_compression_desc] AS [partition_compression_desc]
FROM sys.schemas s
JOIN sys.tables t ON t.[schema_id] = s.[schema_id]
JOIN sys.partitions p ON p.[object_id] = t.[object_id]
JOIN sys.indexes i ON i.[object_id] = p.[object_id]
AND i.[index_id] = p.[index_id]
JOIN sys.data_spaces ds ON ds.[data_space_id] = i.[data_space_id]
LEFT JOIN sys.partition_schemes ps ON ps.[data_space_id] = ds.[data_space_id]
LEFT JOIN sys.partition_functions pf ON pf.[function_id] = ps.[function_id]
LEFT JOIN sys.partition_range_values rv ON rv.[function_id] = pf.[function_id]
AND rv.[boundary_id] = p.[partition_number]

How do you query table names and row counts for all tables in a schema using HP NonStop SQL/MX?

How do you query table names and row counts for all tables in a schema using HP NonStop SQL/MX?
Thanks!
This might help you, althought this is more standard SQL and im not sure how much variation comes into sqlmx
SELECT
TableName = t.NAME,
TableSchema = s.Name,
RowCounts = p.rows
FROM
sys.tables t
INNER JOIN
sys.schemas s ON t.schema_id = s.schema_id
INNER JOIN
sys.indexes i ON t.OBJECT_ID = i.object_id
INNER JOIN
sys.partitions p ON i.object_id = p.OBJECT_ID AND i.index_id = p.index_id
WHERE
t.is_ms_shipped = 0
GROUP BY
t.NAME, s.Name, p.Rows
ORDER BY
s.Name, t.Name
Obviously this is an example, replace example data and table info with yours
Here is how to list the tables in a sql/mx schema, note that the system catalog name given here is an example, replace NONSTOP_SQLMX_SYSNAME with NONSTOP_SQLMX_xxxx where xxxx is the Expand node name of your system.
Also the definition schema name includes the schema version number, this example uses 3600. This example lists all the base table names in schema JDFCAT.T.
See chapter 10 of the SQL/MX reference manual for information on the metadata tables.
The table row counts are not stored in the system metadata, so you can't get them from there. For a table do SELECT ROW COUNT FROM TABLE;
SELECT
O.OBJECT_NAME
FROM
NONSTOP_SQLMX_SYSNAME.SYSTEM_SCHEMA.CATSYS C
INNER JOIN NONSTOP_SQLMX_SYSNAME.SYSTEM_SCHEMA.SCHEMATA S
ON (S.CAT_UID = C.CAT_UID)
INNER JOIN JDFCAT.DEFINITION_SCHEMA_VERSION_3600.OBJECTS O
on S.SCHEMA_UID = o.SCHEMA_UID
WHERE C.CAT_NAME = 'JDFCAT' AND
S.SCHEMA_NAME = 'T' AND
O.OBJECT_TYPE = 'BT'
READ UNCOMMITTED ACCESS;

Show tables, describe tables equivalent in redshift

I'm new to aws, can anyone tell me what are redshifts' equivalents to mysql commands?
show tables -- redshift command
describe table_name -- redshift command
All the information can be found in a PG_TABLE_DEF table, documentation.
Listing all tables in a public schema (default) - show tables equivalent:
SELECT DISTINCT tablename
FROM pg_table_def
WHERE schemaname = 'public'
ORDER BY tablename;
Description of all the columns from a table called table_name - describe table equivalent:
SELECT *
FROM pg_table_def
WHERE tablename = 'table_name'
AND schemaname = 'public';
Update:
As pointed by #Kishan Pandey 's answer, if you are looking for details of a schema different by public, you need to set search_path to my_schema. (show search_path display current search path)
Listing tables in my_schema schema:
set search_path to my_schema;
select * from pg_table_def;
I had to select from the information schema to get details of my tables and columns; in case it helps anyone:
SELECT * FROM information_schema.tables
WHERE table_schema = 'myschema';
SELECT * FROM information_schema.columns
WHERE table_schema = 'myschema' AND table_name = 'mytable';
Or simply:
\dt to show tables
\d+ <table name> to describe a table
Edit: Works using the psql command line client
Tomasz Tybulewicz answer is good way to go.
SELECT * FROM pg_table_def WHERE tablename = 'YOUR_TABLE_NAME' AND schemaname = 'YOUR_SCHEMA_NAME';
If schema name is not defined in search path , that query will show empty result.
Please first check search path by below code.
SHOW SEARCH_PATH
If schema name is not defined in search path , you can reset search path.
SET SEARCH_PATH to '$user', public, YOUR_SCEHMA_NAME
You can use - desc / to see the view/table definition in Redshift. I have been using Workbench/J as a SQL client for Redshift and it gives the definition in the Messages tab adjacent to Result tab.
In the following post, I documented queries to retrieve TABLE and COLUMN comments from Redshift.
https://sqlsylvia.wordpress.com/2017/04/29/redshift-comment-views-documenting-data/
Enjoy!
Table Comments
SELECT n.nspname AS schema_name
, pg_get_userbyid(c.relowner) AS table_owner
, c.relname AS table_name
, CASE WHEN c.relkind = 'v' THEN 'view' ELSE 'table' END
AS table_type
, d.description AS table_description
FROM pg_class As c
LEFT JOIN pg_namespace n ON n.oid = c.relnamespace
LEFT JOIN pg_tablespace t ON t.oid = c.reltablespace
LEFT JOIN pg_description As d
ON (d.objoid = c.oid AND d.objsubid = 0)
WHERE c.relkind IN('r', 'v') AND d.description > ''
ORDER BY n.nspname, c.relname ;
Column Comments
SELECT n.nspname AS schema_name
, pg_get_userbyid(c.relowner) AS table_owner
, c.relname AS table_name
, a.attname AS column_name
, d.description AS column_description
FROM pg_class AS c
INNER JOIN pg_attribute As a ON c.oid = a.attrelid
INNER JOIN pg_namespace n ON n.oid = c.relnamespace
LEFT JOIN pg_tablespace t ON t.oid = c.reltablespace
LEFT JOIN pg_description As d
ON (d.objoid = c.oid AND d.objsubid = a.attnum)
WHERE c.relkind IN('r', 'v')
AND a.attname NOT
IN ('cmax', 'oid', 'cmin', 'deletexid', 'ctid', 'tableoid','xmax', 'xmin', 'insertxid')
ORDER BY n.nspname, c.relname, a.attname;
Shortcut
\d for show all tables
\d tablename to describe table
\? for more shortcuts for redshift
redshift now support show table
show table analytics.dw_users
https://forums.aws.amazon.com/ann.jspa?annID=8641
You can simply use the command below to describe a table.
desc table-name
or
desc schema-name.table-name