Users & Scope -
write_user - full access to all tables
read_user - read access to all tables
backup_pruner - all grants on all tables in schema backup
My Problem Statement -
I have written an automation that has to drop tables in a schema called backup, where the tables are created by write_user.
To drop the tables I have to use the backup_pruner user, and here is the problem:
Since write_user creates the tables, it owns everything in backup, and only owners/superusers can drop tables.
How do I proceed from here?
To answer the question of WHY a separate user is used to DROP tables -
To tighten access to the tables: DROP, if misused or hit by a corner case, can be disastrous for other tables too.
Consider using a stored procedure created with SECURITY DEFINER to drop the tables. An SP created this way runs with the permissions of its creator.
You can define a list of table names that are allowed to be dropped, and have the SP check it before taking action.
I created an example of this approach on GitHub: sp_controlled_access
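A minimal sketch of the idea, assuming the procedure is created by write_user (the table owner); the allow-list and table names are placeholders:

```sql
-- Created by write_user, so SECURITY DEFINER makes it run as the owner.
CREATE OR REPLACE PROCEDURE backup.sp_drop_backup_table(p_table VARCHAR)
AS $$
BEGIN
  -- Only tables on this allow-list may be dropped (placeholder names).
  IF p_table IN ('snapshot_2023_01', 'snapshot_2023_02') THEN
    EXECUTE 'DROP TABLE IF EXISTS backup.' || quote_ident(p_table);
  ELSE
    RAISE EXCEPTION 'table % is not in the allowed drop list', p_table;
  END IF;
END;
$$ LANGUAGE plpgsql
SECURITY DEFINER;

-- backup_pruner can now drop only allow-listed tables, via the SP.
GRANT EXECUTE ON PROCEDURE backup.sp_drop_backup_table(VARCHAR) TO backup_pruner;
```

backup_pruner then calls `CALL backup.sp_drop_backup_table('snapshot_2023_01');` without ever holding DROP rights on the schema itself.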
I am deploying Athena external tables and want to update their definitions without downtime. Is there a way?
The ways I thought about are:
Create a new table, rename the old one, and then rename the new one to the old name. But this involves a small downtime, and renaming tables doesn't seem to be supported (nor does altering the definition).
The other way is to drop the table and recreate it, which obviously involves downtime.
If you use the Glue UpdateTable API call you can change a table definition without first dropping the table. Athena uses the Glue Catalog APIs behind the scenes when you do things like CREATE TABLE … and DROP TABLE ….
Please note that if you make changes to partitioned tables you also need to update all partitions to match.
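As an illustration, here is a hedged sketch of that flow using boto3. The database/table names and the column change are placeholders; the actual API calls are shown in comments so the helper stays self-contained. One wrinkle the sketch handles: GetTable returns read-only fields that UpdateTable's TableInput rejects, so they must be stripped first.

```python
# Fields returned by glue.get_table that UpdateTable's TableInput rejects.
READ_ONLY_KEYS = {
    "DatabaseName", "CreateTime", "UpdateTime", "CreatedBy",
    "IsRegisteredWithLakeFormation", "CatalogId", "VersionId",
}

def to_table_input(table):
    """Convert a GetTable response's Table dict into a valid TableInput."""
    return {k: v for k, v in table.items() if k not in READ_ONLY_KEYS}

# Assumed usage (requires boto3 and AWS credentials; names are placeholders):
# import boto3
# glue = boto3.client("glue")
# table = glue.get_table(DatabaseName="my_db", Name="my_table")["Table"]
# table_input = to_table_input(table)
# table_input["StorageDescriptor"]["Columns"].append(
#     {"Name": "new_col", "Type": "string"})
# glue.update_table(DatabaseName="my_db", TableInput=table_input)
```

Because the table is updated in place, queries keep resolving the same table name throughout.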
Another way that does not involve using Glue directly would be to create a view for each table and only use these views in queries. When you need to replace a table you create a new table with the new schema, then recreate the view (with CREATE OR REPLACE) to use the new table, then drop the old table. I haven't checked, but it would make sense if replacing a view used the UpdateTable API behind the scenes.
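The view-swap sequence could look like this sketch (all names and the column list are placeholders):

```sql
-- 1. Create the replacement table with the new schema.
CREATE EXTERNAL TABLE events_v2 (id string, ts timestamp)
LOCATION 's3://my-bucket/events_v2/';

-- 2. Repoint the view queries actually use; this is atomic from
--    the querying side.
CREATE OR REPLACE VIEW events AS SELECT * FROM events_v2;

-- 3. Only now remove the old table.
DROP TABLE events_v1;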
The AWS Redshift team recommends using TRUNCATE to clean up a large table.
I have a continuous EC2 service that keeps adding rows to a table. I would like to apply a purging mechanism, so that when the cluster is near full it auto-deletes old rows (say, using the index column).
Is there a best practice for doing that?
Do I need to write my own code to handle it? (If so, is there already a Python script for this that I can use, e.g. in a Lambda function?)
A common practice when dealing with continuous data is to create a separate table for each month, e.g. sales_2018_01, sales_2018_02 (hyphens are not valid in unquoted identifiers).
Then create a VIEW that combines the tables, using UNION ALL so the view doesn't force an unnecessary de-duplication step:
CREATE VIEW sales AS
SELECT * FROM sales_2018_01
UNION ALL
SELECT * FROM sales_2018_02;
Then, create a new table each month and remove the oldest month from the View. This effectively gives a 12-month rolling view of the data.
The benefit is that data does not have to be deleted from tables (which would then require a VACUUM). Instead, the old table can simply be dropped, or kept around for historical reporting with a different View.
See: Using Time Series Tables - Amazon Redshift
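The monthly rotation itself might look like this sketch (table names are placeholders):

```sql
-- Add the new month's table, cloned from an existing month's definition.
CREATE TABLE sales_2018_03 (LIKE sales_2018_02);

-- Repoint the view to the months we want to keep.
CREATE OR REPLACE VIEW sales AS
SELECT * FROM sales_2018_02
UNION ALL
SELECT * FROM sales_2018_03;

-- Drop the expired month; no DELETE + VACUUM needed.
DROP TABLE sales_2018_01;
```

The view is replaced before the old table is dropped, so queries against `sales` never see a broken dependency.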
I'm using ColdFusion to connect to a Redshift database, and I'm trying to understand how connections work in relation to TEMP tables in Redshift.
In CFADMIN, for the datasource, I have unchecked "Maintain connections across client requests". I would assume, then, that each user of my website gets their own connection to the DB? Is that correct?
Per the RedShift docs about temp tables:
TEMP: Keyword that creates a temporary table that is visible only within the current session. The table is automatically dropped at the end of the session in which it is created. The temporary table can have the same name as a permanent table. The temporary table is created in a separate, session-specific schema. (You cannot specify a name for this schema.) This temporary schema becomes the first schema in the search path, so the temporary table will take precedence over the permanent table unless you qualify the table name with the schema name to access the permanent table.
Am I to understand that, if #1 is true and each user has their own connection (and thereby their own session), then per #2 any tables created exist only in that session, even though the "user" is the same, since every connection from my server uses the same credentials?
3. If my assumptions in #1 and #2 are correct, suppose I have ColdFusion code that runs a query like this:
drop table if exists tablea;
create temp table tablea (...);
insert into tablea
select * from realtable inner join ...;
drop table tablea;
And multiple users use the same function that does this. They should never run into a conflict where one table gets dropped while another request is trying to use it, correct?
How do I test that this is the case? Besides throwing it into production and waiting for an error, how can I know? I tried running a few windows side by side in different browsers and didn't notice an issue, but I don't know how to verify that the temp tables truly are different between clients (as they should be). I imagine I could query some metadata, but what metadata about the table would tell me that?
I have a similar situation, but with redbrick database software. I handle it by creating unique table names. The general idea is:
Create a table name something like this:
<cfset tablename = TableText & randrange(1, 100000)>
Try to create a table with that name. If you fail try again with a different name.
If you fail 3 times stop trying and mail the cfcatch information to someone.
I have all this code in a custom tag.
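The retry pattern above, sketched in Python for illustration (the ColdFusion custom tag follows the same shape; the `create_table` callable stands in for the actual database call and is an assumption of this sketch):

```python
import random

def create_with_unique_name(create_table, base="tmp", max_tries=3):
    """Try to create a table under a randomized name, retrying on collisions.

    `create_table` is a callable that raises on a name collision; it stands
    in for the real CREATE TABLE call in this sketch.
    """
    for _ in range(max_tries):
        name = f"{base}{random.randrange(1, 100000)}"
        try:
            create_table(name)
            return name          # success: hand the caller the chosen name
        except Exception:
            continue             # collision: pick another random suffix
    # After max_tries failures, give up (the original mails cfcatch info here).
    raise RuntimeError(f"could not create a unique table in {max_tries} tries")
```

The caller is responsible for dropping the table under the returned name when done.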
Edit starts here
Based on the comments, here is some more information about my situation. In CFAdmin, for the datasource being discussed, the Maintain Connections box is checked.
I put this code on a ColdFusion page:
<cfquery datasource="dw">
create temporary table dan (f1 int)
</cfquery>
I ran the page and then refreshed it. The page executed successfully the first time. When refreshed, I got this error.
Error Executing Database Query.
** ERROR ** (7501) Name defined by CREATE TEMPORARY TABLE already exists.
That's why I use unique table names. I don't cache the queries, though. Ironically, my most frequent motivation for using temporary tables is that there are situations where they make things run faster than the permanent tables.
I think this is a recurring question on the Internet, but unfortunately I'm still unable to find a successful answer.
I'm using Ruby on Rails 4 and I would like to create a model that is backed by a SQL query rather than an actual table in the database. For example, suppose I have two tables in my database: Questions and Answers. I want to build a report containing statistics on both tables. I have a complex SQL statement that takes data from these tables to build the statistics, but its SELECT does not read directly from either the Answers or Questions tables; it reads from nested SELECTs.
So far I've been able to create the StatItem model without any migration, but when I try StatItem.find_by_sql("...nested selects...") the system complains about a nonexistent stat_items table in the database.
How can I create a model whose instances' data is retrieved from a complex query and not from a table? If that's not possible, I could create a temporary table to store the data. In that case, how can I tell the migration file not to create the table (it would be created by the query)?
How about creating a materialized view from your complex query and following this tutorial:
ActiveRecord + PostgreSQL Materialized Views
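For example, a materialized view over the two tables might look like this sketch (PostgreSQL; the statistics query body is a placeholder, since the real one uses nested SELECTs):

```sql
-- Placeholder statistics query; substitute the real nested SELECTs.
CREATE MATERIALIZED VIEW stat_items AS
SELECT q.id, q.title, count(a.id) AS answer_count
FROM questions q
LEFT JOIN answers a ON a.question_id = q.id
GROUP BY q.id, q.title;

-- Re-run the underlying query whenever the report data should be updated.
REFRESH MATERIALIZED VIEW stat_items;
```

A model pointed at `stat_items` can then read from it like any table.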
Michael Kohl's proposal of materialized views gave me an idea, which I initially discarded because I wrongly thought a single database connection could be shared by two processes. After reading about how Rails processes requests, I think my solution is fine.
STEP 1 - Create the model without migration
rails g model StatItem --migration=false
STEP 2 - Create a temporary table called stat_items
# First, drop any table left over from older requests (database connections are kept open by the server process(es)).
ActiveRecord::Base.connection.execute('DROP TABLE IF EXISTS stat_items')
# Second, create the temporary table with the desired columns (notice: a dummy 'id' integer column should exist in the table).
ActiveRecord::Base.connection.execute('CREATE TEMP TABLE stat_items (id integer, ...)')
STEP 3 - Execute an SQL statement that inserts rows in stat_items
STEP 4 - Access the table using the model, as usual
For example:
StatItem.find_by_...
Any comments/improvements are highly appreciated.
We are using http://aws.amazon.com/redshift/ and I create/drop temporary tables in reports. Occasionally we encounter cases where someone has created a temporary table and failed to drop it.
In other databases, for instance PostgreSQL which Redshift is based on, I could simply:
DROP TEMP TABLE IF EXISTS tblfoo;
But that is a syntax error in Redshift. I can check for the existence of temporary tables myself using http://docs.aws.amazon.com/redshift/latest/dg/r_STV_TBL_PERM.html, but that only works if I am a superuser, and I am not running as one. I could also swallow exceptions, but with my reporting framework I'd prefer not to go there.
So how can I, as a regular user and without generating database errors, conditionally drop a temporary table if it exists?
The test I ran showed that I could see other users' temp tables in stv_tbl_perm using a non-superuser id. The cluster version I tested was 1.0.797. Note that no user can see other users' temp tables in pg_class.
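Based on that, a session could check for a leftover temp table before touching it; a hedged sketch ('tblfoo' is a placeholder, and the `name` column is fixed-width, hence the TRIM):

```sql
-- List temp tables (temp = 1) matching the name in question.
SELECT DISTINCT id, name, db_id
FROM stv_tbl_perm
WHERE temp = 1
  AND TRIM(name) = 'tblfoo';
```

If the query returns a row, the report can issue a plain DROP TABLE before recreating the temp table.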