I am trying to rename a large table in Redshift and the process hangs every time I try. I've also tried an alternate approach, altering the table to add a column instead, and that hangs as well. This is happening on two of my larger tables in Redshift. I've checked for lock conflicts, etc. and nothing seems to be blocking the session. Any help would be greatly appreciated.
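For reference, this is roughly how I checked for lock conflicts (a sketch; the pid in the comment would come from the first query):

```sql
-- Sessions currently holding table locks (superuser visibility may apply)
select table_id, lock_owner_pid, lock_status
from stv_locks;

-- If a blocking session did show up, it could be terminated with:
-- select pg_terminate_backend(<lock_owner_pid>);
```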
Because of company policies, a lot of the information we need as input is inserted into a BigQuery table that we then need to SELECT from.
My problem is that selecting directly from this table and running a process over the results (on a virtual machine, etc.) is prone to errors and rework. If my process stops, I need to run the query again and reprocess everything.
Is there a way to export data from BigQuery to a Kinesis-like stream (I'm more familiar with AWS)?
Dataflow + Pub/Sub seems to be the way to go for this kind of problem.
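At its simplest, the pattern is reading the query results once and republishing each row as a Pub/Sub message that downstream consumers can process (and re-consume after a failure). A minimal sketch under assumptions: the project, dataset, and topic names in the comments are placeholders, and the `google-cloud-bigquery` / `google-cloud-pubsub` client libraries are required for the commented part.

```python
import json

def row_to_message(row):
    """Serialize one BigQuery result row into a Pub/Sub message payload."""
    return json.dumps(row, default=str).encode("utf-8")

# Sketch of the publish loop (needs GCP credentials; names are placeholders):
# from google.cloud import bigquery, pubsub_v1
# bq = bigquery.Client()
# publisher = pubsub_v1.PublisherClient()
# topic = publisher.topic_path("my-project", "my-topic")
# for row in bq.query("SELECT * FROM `my-project.my_dataset.my_table`").result():
#     publisher.publish(topic, row_to_message(dict(row)))
```

A Dataflow job subscribed to the topic can then do the heavy processing with checkpointing, so a crash doesn't force re-running the original query.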
Thank you jamiet!
Hi, I'm trying to query a table in DynamoDB. However, from what I read I can only do it from code or from the CLI. Is there a way to do complex queries from the GUI? I tried playing with it but can't seem to figure out how to do a simple COUNT(*). Please help.
1. Go to the DynamoDB console.
2. Select the table that you want to count.
3. Go to the "Overview" page/tab.
4. In the table properties, click on "Manage Live Count".
5. Click "Start Scan".
This will give you the count of items in the table at that moment. Just be warned that this count is eventually consistent, which means that if someone is making changes to the table at that exact moment, your end result will not be exact (but probably very close to reality).
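The console scan has a programmatic equivalent: a paginated Scan with `Select=COUNT`, summing the per-page counts. A minimal sketch, assuming boto3 and a hypothetical table named "my-table":

```python
def total_count(pages):
    """Sum the item counts reported by each page of a paginated Scan."""
    return sum(page["Count"] for page in pages)

# With boto3 (needs AWS credentials and a real table; the name is a placeholder):
# import boto3
# paginator = boto3.client("dynamodb").get_paginator("scan")
# pages = paginator.paginate(TableName="my-table", Select="COUNT")
# print(total_count(pages))
```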
Digressing a little bit (only in case you're new to DynamoDB):
DynamoDB is a NoSQL database. It doesn't support the same commands that are common in SQL databases, mainly because it doesn't support the same consistency model that SQL databases provide.
In SQL databases, when you send a COUNT(*) query your RDBMS makes some very educated guesses and takes some shortcuts to discover the number of rows in the table. It does that because reading your entire table just to give you this answer would take too much time.
DynamoDB has no way to make these educated guesses. When you want to know how many items a table has, its only option is to read all of them, counting one by one. That is exactly what the command mentioned at the beginning of this answer does: it scans the entire table, counting all the items one by one.
Because of that, when you perform this task you will be billed for reading the entire table (DynamoDB bills per read and write). And maybe after you start the scan someone puts another item in the table while you are still counting. In that case the count will not restart, because by design DynamoDB is eventually consistent.
I recently faced a problem on a Redshift cluster where a table stopped responding.
The guess was that there was a lock on that table, but the query
select * from stl_tr_conflict order by xact_start_ts;
gives me nothing, though judging by the AWS documentation, the stl_tr_conflict table should contain records of all transaction issues, including locks. But maybe records live there only while a lock is alive; I am not sure.
Searching the useractivitylog in S3 for the words "violation" and "ERROR" also gives no results. So I still can't figure out why one of the tables was not accessible.
I am new to database management so I would appreciate any advice on how to troubleshoot this issue.
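Should I also be checking the more general lock views? For example, something like this (the table name is a placeholder):

```sql
-- All current table locks
select table_id, lock_owner_pid, lock_status
from stv_locks;

-- Open transactions and the locks they hold or wait for on the table
select txn_owner, xid, pid, txn_start, lock_mode, granted
from svv_transactions
where relation = (select oid from pg_class where relname = 'my_table');
```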
When issuing the MSCK REPAIR TABLE statement, is the table still accessible for querying during the update?
I ask because I'm trying to figure out the best update schedule for a relatively large S3 hive table that is used to drive some reports in QuickSight. Will issuing this command break anyone who happens to simultaneously be running a QuickSight report based on this table?
Yes, the table will be available for queries while MSCK REPAIR TABLE is running; it's a background process. Queries run while the command is in progress may see different sets of partitions, though, since the partitions the command discovers are added as they are found.
Be aware that MSCK REPAIR TABLE is a very inefficient process: with many partitions it runs for a very long time, and it is not incremental. This doesn't matter for query performance, but if it takes a long time now it will only take longer over time, so it might not be a viable long-term strategy. There are other questions here on Stack Overflow about strategies for keeping your tables up to date.
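One common alternative, assuming the S3 layout is predictable, is to register each new partition explicitly instead of rescanning everything. A sketch (table, bucket, and partition values are placeholders):

```sql
-- Athena / Hive DDL: add a single new partition incrementally
ALTER TABLE my_table ADD IF NOT EXISTS
  PARTITION (dt = '2020-01-01')
  LOCATION 's3://my-bucket/my_table/dt=2020-01-01/';
```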
When you load data to your Amazon Redshift tables, you can check the load status using the table STV_LOAD_STATE.
I would like to know if there's a way to achieve the same, but with the unload operation. In other words, I'd like to know if there's a way to find out the current stage of an unload process.
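For reference, the load-side status check I mentioned looks something like this:

```sql
-- Progress of COPY commands currently in flight
select query, slice, bytes_loaded, pct_complete
from stv_load_state;
```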
Unlike loading data into Redshift, unloading actually has to run a SELECT statement, so Redshift can't report a status for it the way it does while loading.
For example, if the SELECT statement has to join and scan many tables to generate the output, the query itself might take a long time even though the actual unload part is not the slow part.
So I usually check the query's execution steps in the AWS console to get a rough idea of where the unload is.
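For example, something like this (run from another session) shows queries currently in flight; the UNLOAD's SELECT should show up here:

```sql
-- Currently running queries on the cluster
select pid, user_name, starttime, duration,
       substring(query, 1, 60) as query_text
from stv_recents
where status = 'Running';
```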
I also check the S3 folder I am unloading to, to see whether files have started arriving yet. They usually come in batches, so that can give you an idea as well.
2021, and we have a solution
STL_UNLOAD_LOG
https://docs.aws.amazon.com/redshift/latest/dg/r_STL_UNLOAD_LOG.html
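For example, a query like this shows the files written so far by a given unload (the query id is a placeholder):

```sql
-- One row per file written by the UNLOAD so far
select query, path, line_count, transfer_size, start_time, end_time
from stl_unload_log
where query = 123456
order by path;
```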