Conditionally drop temporary table in Redshift - amazon-web-services

We are using http://aws.amazon.com/redshift/ and I am creating/dropping temporary tables in reports. Occasionally we encounter cases where someone has created a temporary table and failed to drop it.
In other databases, for instance PostgreSQL which Redshift is based on, I could simply:
DROP TEMP TABLE IF EXISTS tblfoo;
But that is a syntax error in Redshift. I can check for the existence of temporary tables myself using http://docs.aws.amazon.com/redshift/latest/dg/r_STV_TBL_PERM.html but that only works if I am a superuser and I am not running as a superuser. I could also go and swallow exceptions, but with my reporting framework I'd prefer not to go there.
So how can I, as a regular user and without generating database errors, conditionally drop a temporary table if it exists?

The test I ran showed that I could see other users' temp tables in stv_tbl_perm using a non-super user id. The cluster version I tested in is 1.0.797. Note that no users can see other users' temp tables in pg_class.

Related

GCP CLOUD SQL denies permission for pre aggregation

I am trying to use pre aggregations over CLOUD SQL on Google Cloud Platform but the database is denying access and giving error Statement violates GTID consistency.
Any help is appreciated.
Cube.js done pre-aggregation by CREATE TABLE ... SELECT, but you are using MySQL on top of Google SQL with --enforce-gtid-consistency (has limitations).
Since only transactionally safe statements can be logged, there is a limitation to use CREATE TABLE ... SELECT (and some another SQL), because this statement is actually logged as two separate events.
There are two ways how to solve this issue:
1. Use pre-aggregations to an external database. (recommended way).
https://cube.dev/docs/pre-aggregations/#read-only-data-source-pre-aggregations
2. Use not documented flag loadPreAggregationWithoutMetaLock
Attention: This flag is an experimental and can be removed or changed in the feature..
Take a look at the source code
You can pass it directly in the driver constructor. This will produce two SQL statements to pass the limitation:
CREATE TABLE
INSERT INTO
Thanks

Redshift Table Ownership And Drop Query

Users & Scope -
write_user - All Access to all Tables
read_user - Read access to all Tables
backup_pruner - All GRANTS to all tables in schema backup.
My Problem Statement -
I have written an automation that has to drop tables of a schema called backup where tables are created by write_user.
Now, for dropping tables I have to user the backup_pruner user and here is the problem.
Since write_user creates table here, it is the owner of all tables in backup & Only Owners/Super-Users can drop tables.
How to Proceed from here ?
To answer the question WHY to use a separate user to DROP Tables -
To tighten the accessibility of tables as DROP if not used properly/any corner case, then can be disastrous for other tables too.
Consider using a Stored Procedure created with the SECURITY DEFINER to drop the tables. An SP created this way runs with the permissions of the creator.
You can define a list of table names allowed to be dropped that the SP checks before taking action.
I created an example of this approach on GitHub: sp_controlled_access

Is it possible to change a database (schema) name in AWS Athena?

I created a database and some tables with Data on AWS Athena and would like to rename the database without deleting and re-creating the tables and database. Is there a way to do this? I tried the standard SQL alter database but it doesn't seem to work.
thanks!
I'm afraid there is no way to do this according to this official forum thread. You would need to remove the database and re-create it. However, since Athena does not store any data by itself, deleting a table or a database won't impact your data stored on S3. Therefore, if you kept all the scripts that create external tables, re-creating a database should be fairly quick thing to do.
Athena doesn't support renaming database. You need to recreate database with a new name.
You can use Presto which is an open source version of Athena and Presto supports more DDL queries.

How RedShift Sessions are handled from a Server Connection for TEMP tables

I'm using ColdFusion to connect to a RedShift database and I'm trying to understand how to test/assume myself of how the connections work in relation to TEMP tables in RedShift.
In my CFADMIN for the datasource I have unchecked Maintain connections across client requests. I would assume then each user who is using my website would have their own "Connection" to the DB? Is that correct?
Per the RedShift docs about temp tables:
TEMP: Keyword that creates a temporary table that is visible only within the current session. The table is automatically dropped at the end of the session in which it is created. The temporary table can have the same name as a permanent table. The temporary table is created in a separate, session-specific schema. (You cannot specify a name for this schema.) This temporary schema becomes the first schema in the search path, so the temporary table will take precedence over the permanent table unless you qualify the table name with the schema name to access the permanent table.
Am I to understand that if #1 is true and each user has their own connection to the database and thereby their own session then per #2 any tables that are created will be only in that session even though the "user" is the same as it's a connection from my server that is using the same credentials.
3.If my assumptions in #1 and #2 are correct then if I have ColdFusion code that runs a query like so:
drop if exists tablea
create temp table tablea
insert into tablea
select * from realtable inner join
drop tablea
And multiple users are using that same function that does this. They should never run into any conflicts where one table gets dropped as another request is trying to use it correct?
How do I test that this is the case? Besides throwing it into production and waiting for an error how can I know. I tried running a few windows side by side in different browsers and stuff and didn't notice an issue, but I don't know how to know if the temp tables truly are different between clients. (as they should be.) I imagine I could query some meta data but what meta data about the table would tell me that?
I have a similar situation, but with redbrick database software. I handle it by creating unique table names. The general idea is:
Create a table name something like this:
<cfset tablename = TableText & randrange(1, 100000)>
Try to create a table with that name. If you fail try again with a different name.
If you fail 3 times stop trying and mail the cfcatch information to someone.
I have all this code in a custom tag.
Edit starts here
Based on the comments, here is some more information about my situation. In CFAdmin, for the datasource being discussed, the Maintain Connections box is checked.
I put this code on a ColdFusion page:
<cfquery datasource="dw">
create temporary table dan (f1 int)
</cfquery>
I ran the page and then refreshed it. The page executed successfully the first time. When refreshed, I got this error.
Error Executing Database Query.
** ERROR ** (7501) Name defined by CREATE TEMPORARY TABLE already exists.
That's why I use unique tablenames. I don't cache the queries though. Ironically, my most frequent motivation for using temporary tables is because there are situations where they make things run faster than using the permanent tables.

PostgreSQL: update table with new records from the same table on remote server

We have a PostgreSQL server running in production and a plenty of workstations with an isolated development environments. Each one has its own local PostgreSQL server (with no replication with the production server). Developers need to receive updates stored in production server periodically.
I am trying to figure out how to dump the contents of several selected tables from server in order to update the tables on development workstations. The biggest challenge is that the tables I'm trying to synchronize may be diverged (developers may add - but not delete - new fields to the tables through the Django ORM, while schema of the production database remains unchanged for a long time).
Therefore the updated records and new fields of the tables stored on workstations must be preserved against the overwriting.
I guess that direct dumps (e.g. pg_dump -U remote_user -h remote_server -t table_to_copy source_db | psql target_db) are not suitable here.
UPD: If possible I would also like to avoid the use of third (intermediate) database while transferring the data from production database to the workstations.
I would recommend the following approach.
I'll outline example based on a single table customer.
We want to copy some entries from this table on production. Obviously, full table dump will break new stuff that exists on development envs;
Therefore, create a table with the similar structure, but a different name, say customer_$. Another way is to create a dedicated schema for such “copying” tables. You might also want to include a couple of extra columns there, like copy_id and/or copy_stamp;
Now you can INSERT INTO customer_$ SELECT ... to populate your copying table with wanted data. You might need to think of the way how to do this, though. In the tool we use here we can supply predicate data via the -w switch, like -w "customer_id IN (SELECT id FROM cust2copy)";
After you've populated your copying table(s), you can dump them. Make sure to use the following switches to the pg_dump:
--column-inserts to explicitly list target columns, for on development env copying table might have changed it's structure. This might be “slow” for big volumes though;
--table / -t to specify tables to dump.
On the target env, make sure to (1) empty copying tables and (2) prevent parallel activities of similar nature;
Load date into the copying tables;
The most interesting part comes: you need to check, that data you're bout to INSERT into the main tables will not conflict with any of the constraints defined on the tables. You might have:
PRIMARY KEY violations. You can (1) replace existing entries or (2) merge entries together or (3) skip entries from the copying tables or (4) choose to assign different ID in the copying tables;
UNIQUE KEY violations, most likely you'll have to UPDATE some columns in the copying tables;
FOREIGN KEY violations, you'll have either to give up on such entries, or to copy over missing stuff from the production as well;
CHECK violations, you'll have to investigate this ones manually.
After checks are done and data in the copying tables is fixed, you can copy it into the main tables.
This is a very formal description of the approach. Say, for step #7 we have a huge pile of extra tools to do ID or ID ranges remapping, to manipulate data in the copying tables, adjusting security settings, ownership, some defaults, etc.
Also, we have a so-called catalogue for this tool, which allows us to group logically tied tables under common names. Say, to copy customers from production we have to check round 50 tables in order to satisfy all possible dependencies.
I haven't seen similar tools in the wild though so far.