Is it safe to truncate ACT_RU_METER_LOG table in Camunda BPM? - camunda

The ACT_RU_METER_LOG table contains 10 million rows. I want to upgrade Camunda from 7.10.0 to 7.17, and the upgrade includes a few ALTER TABLE statements on this table. As expected, these ALTER TABLEs take a very long time, so I am wondering whether I can simply truncate the table. I am aware that metrics can be disabled, but the existing data still has to be cleaned up explicitly.
Thanks in advance.
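For reference, what I have in mind is something along these lines (assuming direct database access, that no engine node is writing metrics at the time, and that losing the historical metric data is acceptable):
-- Hypothetical cleanup before running the 7.17 migration scripts;
-- TRUNCATE is usually much faster than DELETE because it does not log individual rows.
TRUNCATE TABLE ACT_RU_METER_LOG;
-- Fallback if TRUNCATE is not permitted in this environment:
-- DELETE FROM ACT_RU_METER_LOG;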

Related

Schedule the creation of a partitioned table overwriting an existing table in BigQuery GCP

Yesterday I scheduled the daily overwriting of a table. The new table is partitioned, as is the table being overwritten... It did not run at the corresponding time, nor did it give an error... It just did not start.
My feeling is that it has to do with the partitioning option. Note that the cast of the field date_formatted, which is used as the partition field, works fine.
As far as I know, when scheduling a query you can't use create or replace table T partitioned by column C as select...
You start from the select... clause, as shown in the image, and I don't know if the problem comes from there.
PS: I had no trouble scheduling appends to a day-partitioned table with this same procedure.
The destination table is in the same dataset.
If the very same query is scheduled to deliver the results to a table with the same name, but in a different dataset (located in the same project), it works.
By the way, the input table and the output table were never the same.
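For reference, the partitioned create or replace form mentioned above would look roughly like this (the dataset, source table, and non-date columns are placeholders; only date_formatted comes from the real query):
-- Illustrative only: a partitioned CREATE OR REPLACE ... AS SELECT in BigQuery
CREATE OR REPLACE TABLE my_dataset.my_table
PARTITION BY date_formatted
AS
SELECT
  CAST(raw_timestamp AS DATE) AS date_formatted,
  some_metric
FROM my_dataset.source_table;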

Amazon Redshift vacuum reindex

I am running a script that picks tables to run VACUUM REINDEX on, based on the interleaved_skew value from svv_interleaved_columns, which represents the skew ratio of the interleaved columns (interleaved_skew > 1.4), as described in the AWS guide.
A value of 1.00 for interleaved_skew means that all the rows are in sorted order and no reindex is required.
Now that I have run a VACUUM REINDEX on a table with 8 GB of data, I expect the interleaved_skew value to go down, but it is behaving oddly and sometimes even increases. Since my script picks the tables to reindex based on interleaved_skew, and the value is not going down to 1.00, the same tables keep being picked and reindexed, which wastes most of my time.
I expect that once a table has gone through VACUUM REINDEX and no new data flows into it, that table should not need VACUUM REINDEX again, since there should be no remaining skew.
But the tables are being picked again.
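For context, the selection logic in my script is essentially the following (the 1.4 threshold is from the AWS guide; the table name in the VACUUM statement is a placeholder for whatever the first query returns):
-- Pick candidate tables whose interleaved skew exceeds the threshold
SELECT tbl, col, interleaved_skew
FROM svv_interleaved_columns
WHERE interleaved_skew > 1.4;
-- For each table found (after resolving the tbl id to a table name), the script then runs
VACUUM REINDEX my_candidate_table;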
Thanks in advance.
Any explanation of the stv_interleaved_counts table, and of how and when the values in svv_interleaved_columns change, would help me greatly.
Please have a look at our "AnalyzeVacuumUtility" on GitHub. It may provide all the functionality you are looking for.
As far as interleaved sort keys go, I recommend this style of sort key only for large tables that are not regularly updated. Compound sort keys will perform better in most circumstances.
Please review our "Advanced Table Design Playbook: Compound and Interleaved Sort Keys" to help with choosing the right style.
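As a rough illustration of that recommendation (table and column names are made up), a compound sort key is declared like this:
-- Compound sort key: rows are sorted by event_date first, then by customer_id within each date
CREATE TABLE my_events (
  event_date  date,
  customer_id bigint,
  payload     varchar(256)
)
COMPOUND SORTKEY (event_date, customer_id);
Queries that filter on the leading sort key column benefit most, and a compound key never needs VACUUM REINDEX; a plain VACUUM keeps it in order.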

Amazon Athena scans lots of data when query involves only partitions

I have a table on Athena partitioned by day (huge table, TB of data). There's no day column on the table, at least not explicitly. I would expect that a query like the following:
select max(day) from my_table
would scan virtually no data. However, Athena reports that several hundreds of GB are scanned. Any idea why?
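For clarity, the table is declared roughly like this (bucket, columns, and storage format are placeholders), so day exists only as a partition key and not as a column inside the data files:
-- Illustrative DDL: day is a partition column, not stored in the underlying files
CREATE EXTERNAL TABLE my_table (
  id string,
  payload string
)
PARTITIONED BY (day string)
STORED AS PARQUET
LOCATION 's3://my-bucket/my_table/';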
===== EDIT 2021-01-14 =====
I've recently bumped into this issue again. It turns out that when the underlying data is Parquet, operations on partitions don't consume data. For the other data formats I've tried (including ORC) there is an associated data cost. It doesn't make any sense to me.
I don't know the answer for a fact but I guesstimate:
Athena just does not have the optimization of looking only at the partition names when only they are queried; this is clear from its behaviour, so it scans everything.
Parquet has min/max for every column whereas ORC does it only if an index is present, AFAIU. Thus for Parquet Athena's query optimizer directs it to look directly at these rollup values, i.e., no scan is performed. It's different for ORC.
I know it is a little late to answer this question for you, Nicolas, but it is important to keep some possible solutions here as well.
Unfortunately, this is the way Athena works: Athena will read all the data as a table scan just to list the partition values.
A possible workaround that works well here is to use the partition metadata instead of the data itself, for example:
Instead of using this syntax:
select max(day) from my_table
Try to use this syntax:
SELECT day FROM my_schema."my_table$partitions" ORDER BY day DESC LIMIT 1
This second statement reads just metadata information and returns the same data you need.
It does not depend on the format but on the compression algorithm used: mostly Snappy for ORC and GZIP for Parquet. That is what makes the difference.

Reading (even joining) a very large (1.1bn row) table in Enterprise Guide from Teradata

Hopefully you guys can help with what I'm hoping is quite a simple question for those in the know!
I live (well, work) in SAS Enterprise Guide and am trying to perform a simple left join against a table in Teradata.
The table is extremely large (700+ columns, 1.1bn rows) and so far I have been connecting via a LIBNAME statement at the top of my program, followed by the usual PROC SQL to read the data.
The issue I am having is that it is extremely slow. I performed the join successfully using 90 rows in the left table and it took 3 hours to complete. The real table I want to use has something like 15,000 rows.
I have tried to connect via the SQL Pass-Through method, but this throws a hosts file error, which I can't fix due to corporate security limitations.
Has anyone had any experience performing this kind of task?
I should mention that a simple select * query in Teradata SQL Assistant runs in just over 1 minute (16,666,666 obs/s!), so the limitation must be somewhere between SAS and Teradata, or even in SAS itself.
I'm sorry I haven't posted actual code snippets as they're on my work machine but this has been bugging me for ages so thought I'd see if I'm missing any tricks.
Thanks in advance for your help.
So you're joining a SAS data set to a Teradata table and want to return the matching records. You'll want to use SAS's DBMASTER= data set option. It designates which of the tables is larger. By telling SAS this, it knows which table to move.
Here I assume librefs have already been assigned and that the Teradata table is larger--more obs--than the SAS data set.
proc sql threads;
  select tdTable.*
    from sastables.sasTable1, td.tdTable(dbmaster=yes)
    where tdTable.idNum=sasTable1.idNum;
quit;
If by chance your SAS data set is larger, you'll want to use the MULTI_DATASRC_OPT= option. Either google these terms or look in the SAS/Access to Relational Databases manual. It's pretty good.
Good luck.
Have you considered creating a volatile table in Teradata? Since this is created in your spool allocation you shouldn't need explicit permissions to create the table. Once created you can load the SAS data set into the Volatile table and collect statistics on the table's join columns and filter columns. This will help the optimizer understand the demographics about your "small" table. The volatile table will only persist for the duration of your session and is not accessible across multiple sessions.
Then rewrite your SAS code to push-down the SQL to Teradata joining the large table to your volatile table. The results can be returned to SAS and loaded into another data set.
CREATE VOLATILE TABLE MyTable, NO FALLBACK
( ColA SMALLINT NOT NULL,
ColB VARCHAR(10) NOT NULL
) PRIMARY INDEX (ColA)
ON COMMIT PRESERVE ROWS /* This is important */
;
The primary index is how Teradata distributes the data and accesses the data. Tables distributed on the same column will join "AMP local" and will not require a redistribution. This is not always possible, as your primary index selection has to consider even distribution as well as access path. The primary index does not have to be unique, but can be.
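Once the volatile table is loaded and statistics are collected, the SQL that gets pushed down to Teradata could look something like this (BigSchema.BigTable and the join column are placeholders):
/* Join the large permanent table to the volatile "small" table entirely inside Teradata */
SELECT big.*
FROM BigSchema.BigTable AS big
JOIN MyTable AS small
  ON big.ColA = small.ColA;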
Hope this helps.

NULL TO NOT NULL ALTER TABLE IN SAS

I'm having difficulties running a SAS Data Integration job.
One column needs to be removed from the target's structure,
but cannot be removed because of the NULL constraint.
Do I need to remove the constraint first?
How do I do that?
Thank you in advance,
Gal.
Does the physical table exist without the column? If so, then the constraint is only in the metadata. Recreate the metadata and you should be fine.
If the physical table exists with the column, then you need to recreate that table without the column. You will still need to refresh the table metadata for DI Studio to pick it up.
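If the physical table does have to be rebuilt, a minimal sketch (library, table, and column names are placeholders; run it via PROC SQL or directly in the target database, adjusting syntax as needed) would be:
/* Recreate the table without the unwanted column, then swap it in */
CREATE TABLE target_new AS
SELECT keep_col1, keep_col2   /* every column except the one being removed */
FROM target;
DROP TABLE target;
/* rename target_new back to target using the rename syntax of the DBMS in use */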