I recently faced a problem on a Redshift cluster where a table stopped responding.
My guess was that there was a lock on that table, but the query
select * from stl_tr_conflict order by xact_start_ts;
gives me nothing, though judging by the AWS documentation, the stl_tr_conflict table should have records of all transaction issues, including locks. But maybe records only live there while the lock is alive; I am not sure.
Searching the useractivitylog in S3 for the words violation and ERROR also gives no results, so I still can't figure out why one of the tables was not accessible.
I am new to database management, so I would appreciate any advice on how to troubleshoot this issue.
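For what it's worth, a common first step (assuming the lock was still held at the time) is to look at the live lock views rather than the conflict log; these only show locks while they are alive, which would be consistent with stl_tr_conflict coming back empty afterwards:

```sql
-- locks currently held (visible only while the lock is alive)
select * from stv_locks;

-- open transactions and the relations they lock
select *
from svv_transactions
where lockable_object_type = 'relation';
```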
I have a setup with a few crawlers crawling a few buckets and generating tables in Glue, which can then be queried from the Athena query engine.
I have noticed recently that a lot of tables have been popping up, none located in the buckets I am crawling, and all looking like system-generated tables:
temp_table_*
There appear to be over 10 created every day, and when I look at their details, they are generated from the system query results location s3://aws-athena-query-results-*
Is there a reason these are created? Do I have to clean them up manually? Is there a way to stop them being generated, or to ignore them?
They clutter the view of the tables that matter in Athena.
Thanks in advance for any assistance.
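In case a bulk cleanup helps in the meantime, here is a sketch (not a definitive fix; the database name and the temp_table_* pattern are assumptions based on the names above) that lists the Glue tables and batch-deletes the matching ones:

```python
# Hypothetical cleanup sketch -- database name and name pattern are assumptions.
import fnmatch

def temp_tables(names):
    """Return the table names that look like Athena's temp_table_* leftovers."""
    return [n for n in names if fnmatch.fnmatch(n, "temp_table_*")]

def delete_temp_tables(database="default"):
    import boto3  # imported here so temp_tables() is usable without the AWS SDK
    glue = boto3.client("glue")
    names = []
    for page in glue.get_paginator("get_tables").paginate(DatabaseName=database):
        names.extend(t["Name"] for t in page["TableList"])
    doomed = temp_tables(names)
    # batch_delete_table accepts at most 100 names per call
    for i in range(0, len(doomed), 100):
        glue.batch_delete_table(DatabaseName=database,
                                TablesToDelete=doomed[i:i + 100])
    return doomed
```

This only removes the Glue catalog entries, not any query-result files in S3; those live under the aws-athena-query-results bucket and can be expired with an S3 lifecycle rule instead.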
I am connecting to a local DynamoDB and was able to create two tables and add data to each. However, I am unable to see the second table under "Operation builder".
I have tried to commit the table again and get an error saying the table exists. See below.
From "Operation builder" I've clicked the "Table" refresh icon, but the second table, "grades", will not show up.
I have tried closing and re-running NoSQL Workbench, but I still have the same problem. Committing the missing table again gives me the error that it already exists.
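One way to confirm the table really exists in the local DynamoDB (ruling out a Workbench display bug) is to list the tables directly against the local endpoint. A sketch, where the port, region, and dummy credentials are assumptions:

```python
def missing_tables(expected, actual):
    """Names in expected that the endpoint did not report."""
    return sorted(set(expected) - set(actual))

def check_local(expected=("grades",), endpoint="http://localhost:8000"):
    import boto3  # imported here so missing_tables() is testable without the AWS SDK
    # DynamoDB Local accepts any credentials; the region is arbitrary
    ddb = boto3.client(
        "dynamodb",
        endpoint_url=endpoint,
        region_name="us-east-1",
        aws_access_key_id="dummy",
        aws_secret_access_key="dummy",
    )
    return missing_tables(expected, ddb.list_tables()["TableNames"])
```

Note that unless DynamoDB Local is started with -sharedDb, it scopes tables by access key and region, so a table committed under one credential set can be invisible under another.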
As stated in this thread, NoSQL Workbench is definitely buggy. I was having similar problems with my recently created GSIs, and I am pretty sure I had already tried restarting the app. However, I probably did that by closing and reopening the app with the "X" button, and I still did not see my updated GSIs. I have just done the same using the "Quit" option from the "Application" menu, and now it works. Not sure if that was the reason or it was just a matter of time syncing... but you can always try.
I am currently writing a framework to transfer Hive tables to AWS. We can't do that in one shot; we need to do it over a period of time, so there are lots of tables which need to stay in sync between AWS and our on-prem Hadoop.
Tables which are small and only need truncate-and-load are not an issue; we have a framework which refreshes those tables daily using Spark.
The problem is with huge tables, where we need to append only newly added/updated/deleted rows to AWS. Finding newly added rows is a fairly simple task, but how do I get the updated or deleted records?
40% of our tables are transactional tables, so updates and deletes are frequent.
For the other 60% of tables, updates/deletes are not frequent. However, sometimes, due to data issues, people delete a past batch and reload the data.
My questions are:
Is there a way I can get change data capture (CDC) for a Hive table?
How do I figure out which records were updated/deleted in a transactional table?
How do I figure out which records were updated/deleted in an external table?
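Absent real CDC, one common workaround for non-transactional and external tables is to diff consecutive snapshots on the primary key. A sketch, where snapshot_prev, snapshot_curr, pk, and row_hash are assumed names (row_hash being a hash computed over the non-key columns at load time):

```sql
-- rows deleted since the previous snapshot
select p.*
from snapshot_prev p
left join snapshot_curr c on p.pk = c.pk
where c.pk is null;

-- rows updated since the previous snapshot (same key, different content)
select c.*
from snapshot_curr c
join snapshot_prev p on c.pk = p.pk
where c.row_hash <> p.row_hash;
```

This costs a full scan of both snapshots, so it fits the infrequently changing 60% better than the transactional 40%.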
I am trying to rename a large table in Redshift, and the process always hangs whenever I try to do so. I've also tried an alternate approach, altering the table to add a column instead, and that hangs as well. This is happening on two of my larger tables in Redshift. I've checked for lock conflicts, etc., and nothing seems to be blocking the session. Any help would be greatly appreciated.
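In case it helps others hitting the same hang: ALTER TABLE needs an exclusive lock, so any session still holding even a read lock on the table will block it indefinitely. A troubleshooting sketch (the pid below is a placeholder for whatever the lock view reports; terminate with care, since it rolls back that session's open transaction):

```sql
-- any session still holding a lock on the table will block ALTER TABLE
select * from stv_locks;

-- terminate the offending session by pid (12345 is a placeholder)
select pg_terminate_backend(12345);
```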
In Redshift, there's an STL_QUERY table that stores queries that were run over roughly the last 5 days. I'm trying to find a way to keep more than 5 days' worth of records. Here are some things that I've considered:
1. Is there a Redshift setting for this? It would appear not.
2. Could I use a trigger? Triggers are not available in Redshift, so this is a no-go.
3. Could I create an Amazon Data Pipeline job to periodically "scrape" the STL_QUERY table? I could, so this is an option. Unfortunately, I would have to give the pipeline some EC2 instance to use to run this work. It seems like a waste to have an instance sitting around to scrape this table once a day.
4. Could I use an Amazon Simple Workflow job to scrape the table? I could, but it suffers from the same issues as option 3.
Are there any other options/ideas that I'm missing? I would prefer an option that does not involve dedicating an EC2 instance, even if it means paying for an additional service (provided that it's cheaper than the EC2 instance I would have used in its stead).
Keep it simple, do it all in Redshift.
First, use "CREATE TABLE … AS" to save all current history into a permanent table.
CREATE TABLE admin.query_history AS SELECT * FROM stl_query;
Second, schedule a job on a machine you control to run the following every day, using psql:
INSERT INTO admin.query_history SELECT * FROM stl_query WHERE query > (SELECT MAX(query) FROM admin.query_history);
Done. :)
Notes:
You need a PostgreSQL 8.x-compatible version of psql if you haven't set this up yet.
Even if your job doesn't run for a few days, stl_query keeps enough history that you'll be covered.
As per your comment, it might be safer to use starttime instead of query as the criteria.
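For example, the daily job from the second step could be a cron entry like this (host, database, and user are placeholders; the password would come from ~/.pgpass):

```
0 2 * * * psql -h mycluster.example.redshift.amazonaws.com -p 5439 -d mydb -U admin -c "INSERT INTO admin.query_history SELECT * FROM stl_query WHERE query > (SELECT MAX(query) FROM admin.query_history);"
```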