I'm having difficulties running a SAS Data Integration job.
One column needs to be removed from the target's structure,
but cannot be removed because of the NULL constraint.
Do I need to remove the constraint first?
How do I do that?
Thank you in advance,
Gal.
Does the physical table exist without the column? If so, then the constraint is only in the metadata. Recreate the metadata and you should be fine.
If the physical table exists with the column, then you need to recreate that table without the column. You will still need to refresh the table metadata for DI Studio to pick it up.
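To make the second case concrete, here is a minimal sketch of the "recreate the table without the column" step. It uses Python's built-in sqlite3 module purely as a stand-in for the real target database. The table name target, the column obsolete and the helper rebuild_without_column are all made up for illustration, and the exact DDL on your database will probably differ.

```python
import sqlite3


def rebuild_without_column(conn, table, drop_col):
    """Recreate `table` without `drop_col`, keeping the remaining data.

    Identifiers are interpolated directly, so only pass trusted names.
    """
    # PRAGMA table_info returns (cid, name, type, notnull, dflt_value, pk).
    cols = [row[1] for row in conn.execute(f"PRAGMA table_info({table})")]
    keep = [c for c in cols if c != drop_col]
    col_list = ", ".join(keep)
    # Copy the surviving columns into a new table, then swap it in.
    conn.execute(f"CREATE TABLE {table}_new AS SELECT {col_list} FROM {table}")
    conn.execute(f"DROP TABLE {table}")
    conn.execute(f"ALTER TABLE {table}_new RENAME TO {table}")
    conn.commit()
    return keep


conn = sqlite3.connect(":memory:")
# Hypothetical target table with a NOT NULL column that has to go.
conn.execute(
    "CREATE TABLE target (id INTEGER NOT NULL, obsolete TEXT NOT NULL, amount REAL)"
)
conn.execute("INSERT INTO target VALUES (1, 'x', 9.5)")

remaining = rebuild_without_column(conn, "target", "obsolete")
print(remaining)
print(conn.execute("SELECT * FROM target").fetchall())
```

In SQLite, CREATE TABLE ... AS SELECT does not carry constraints over, so any NOT NULL rules you still want on the remaining columns would have to be declared again. After a change like this, the table metadata still needs to be refreshed in DI Studio, as described above.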
The ACT_RU_METER_LOG table contains 10 million rows. I want to upgrade Camunda from 7.10.0 to 7.17, and as part of the upgrade there are a few ALTER TABLE statements on this table. As expected, these statements take a very long time, so I am wondering whether I can truncate the table. I am aware that metrics can be disabled, but the existing data has to be cleaned up explicitly.
Thanks in advance.
Yesterday I scheduled a daily overwrite of a table. The new table will be partitioned, as is the one being overwritten... It did not run at the scheduled time, nor did it give an error... It just did not start.
My feeling is that it has to do with the partitioning option. Note that casting the field date_formatted, which will be used as the partition field, works fine.
As far as I know, when scheduling a query you can't use create or replace table T partitioned by column C as select...
You start from the select... clause, as shown in the image, and I don't know if the problem comes from here.
PS: I had no trouble scheduling an append to a table partitioned by day with this same procedure.
The destination table is in the same dataset.
If the very same query is scheduled to deliver the results to a table with the same name, but in a different dataset (located in the same project), it works.
By the way, the input table and the output table were never the same.
Hopefully you guys can help with what I'm hoping is quite a simple question for those in the know!
I live (well, work) in SAS Enterprise Guide and am trying to perform a simple left join against a table in Teradata.
The table is extremely large (700+ columns, 1.1bn rows) and so far I have been connecting via a LIBNAME statement at the top of my program, followed by the usual PROC SQL to read the data.
The issue I am having is that it is extremely slow. I performed the join successfully using 90 rows on the left table and it took 3 hours to complete. The real table I want to use has something like 15,000 rows.
I have tried to connect via the SQL Pass-Through method, but this throws a hosts file error, which I can't fix due to corporate security limitations.
Has anyone had any experience performing this kind of task?
I should mention that I can run a simple select * query in Teradata SQL Assistant in just over 1 minute (16,666,666 obs/s!), so the limitation must be somewhere between SAS and Teradata, or even in SAS itself.
I'm sorry I haven't posted actual code snippets as they're on my work machine but this has been bugging me for ages so thought I'd see if I'm missing any tricks.
Thanks in advance for your help.
So you're joining a SAS data set to a Teradata table and want to return the matching records. You'll want to use SAS's DBMASTER= data set option. It designates which of the tables is larger. By telling SAS this, it knows which table to move.
Here I assume librefs have already been assigned and that the Teradata table is larger--more obs--than the SAS data set.
proc sql threads;
  select tdTable.* from sastables.sasTable1, td.tdTable(dbmaster=yes)
  where tdTable.idNum = sasTable1.idNum;
quit;
If by chance your SAS data set is larger, you'll want to use the MULTI_DATASRC_OPT= option. Either google these terms or look in the SAS/ACCESS for Relational Databases manual. It's pretty good.
Good luck.
Have you considered creating a volatile table in Teradata? Since this is created in your spool allocation you shouldn't need explicit permissions to create the table. Once created you can load the SAS data set into the Volatile table and collect statistics on the table's join columns and filter columns. This will help the optimizer understand the demographics about your "small" table. The volatile table will only persist for the duration of your session and is not accessible across multiple sessions.
Then rewrite your SAS code to push-down the SQL to Teradata joining the large table to your volatile table. The results can be returned to SAS and loaded into another data set.
CREATE VOLATILE TABLE MyTable, NO FALLBACK
( ColA SMALLINT NOT NULL,
ColB VARCHAR(10) NOT NULL
) PRIMARY INDEX (ColA)
ON COMMIT PRESERVE ROWS /* This is important */
;
The primary index is how Teradata distributes the data and accesses the data. Tables distributed on the same column will join "AMP local" and will not require a redistribution. This is not always possible, as your primary index selection has to consider even distribution as well as access path. The primary index does not have to be unique, but can be.
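If it helps to see the overall pattern end to end, here is a minimal sketch: load the small key list into a session-scoped table, join inside the database, and bring back only the result. It uses Python's built-in sqlite3 module with a TEMP table as a stand-in for a Teradata volatile table. The names big_table, small_keys, id_num and payload are made up, and statistics collection and primary index choice have no equivalent here.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# Stand-in for the large database table (hypothetical names and data).
conn.execute("CREATE TABLE big_table (id_num INTEGER, payload TEXT)")
conn.executemany(
    "INSERT INTO big_table VALUES (?, ?)",
    [(i, f"row{i}") for i in range(1000)],
)

# The small "left" table that currently lives on the client side.
client_keys = [(3,), (42,), (5000,)]

# 1. Create a session-scoped table and load the small key list into it.
conn.execute("CREATE TEMP TABLE small_keys (id_num INTEGER PRIMARY KEY)")
conn.executemany("INSERT INTO small_keys VALUES (?)", client_keys)

# 2. Run the left join inside the database, so only the result comes back.
rows = conn.execute(
    "SELECT s.id_num, b.payload "
    "FROM small_keys AS s "
    "LEFT JOIN big_table AS b ON b.id_num = s.id_num "
    "ORDER BY s.id_num"
).fetchall()
print(rows)
```

The point of the design is that the small table travels to the large one, not the other way round, so the 1.1bn rows never have to leave the database.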
Hope this helps.
I need to use an append object after a series of joins that run conditionally... So a join step may not execute if its condition is not met, and its physical work dataset will not be created.
The problem is that the append step throws an error if one or more of its input physical datasets have not been created.
Is there a smart way to create an empty physical table from the metadata structure of the joins' work tables, or to use the append with some datasets that were not created?
A create table listing all the fields is not a real solution, because I would have to replicate it for 8 different joins and then replicate the job 10 times...
Thanks to all
Roberto
Thank you for your comments.
What you should do:
Amend your conditional node so that, when the condition is met, it creates a global macro variable with the value MAX, and when the condition is not met, it creates the same variable with the value 0.
Replace the offending SQL step with a "CREATE TABLE" node.
In the options for "CREATE TABLE", specify the macro variable for "MAXIMUM OUTPUT ROWS (OUTOBS)". See the picture below for an example of those options.
So now, when your condition is not met, you will always end up with an empty table. When the condition is met, the step executes normally.
I must say my version of DI Studio is a bit old. In my version the SQL node doesn't allow passing macro variables to SQL options; only integers can be typed in. Check whether your version allows it, because if it does, you can amend the existing SQL step and avoid replacing it with another node.
One more thing: you will get a warning when the OUTOBS option is lower than the number of rows the resulting dataset would otherwise have.
Let me know if you have any questions.
See the picture for create table options:
In the end I created another step that extracts 0 rows from the source table using the condition 1=0 in the Where tab. This way I have an empty table that I can use with a data/set step in the post SQL of the conditional run if the work table of the join does not exist.
This is not a solution but a valid workaround.
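For anyone reading along, the 1=0 trick can be sketched outside DI Studio as well. The example below uses Python's built-in sqlite3 module as a stand-in; the table names source, join_out and appended and their columns are made up for illustration. It shows that a query with an always-false condition yields a table with the full column structure and no rows, which can then be appended safely.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# Hypothetical source table whose structure the skipped join would have produced.
conn.execute("CREATE TABLE source (cust_id INTEGER, name TEXT, balance REAL)")
conn.execute("INSERT INTO source VALUES (1, 'a', 10.0)")

# WHERE 1 = 0 never matches, so only the column structure is copied.
conn.execute("CREATE TABLE join_out AS SELECT * FROM source WHERE 1 = 0")

empty_cols = [row[1] for row in conn.execute("PRAGMA table_info(join_out)")]
row_count = conn.execute("SELECT COUNT(*) FROM join_out").fetchone()[0]

# The append now works even though the "join" contributed nothing.
conn.execute(
    "CREATE TABLE appended AS "
    "SELECT * FROM join_out UNION ALL SELECT * FROM source"
)
appended_count = conn.execute("SELECT COUNT(*) FROM appended").fetchone()[0]

print(empty_cols, row_count, appended_count)
```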
The default behaviour when importing data from a database table (such as SQL Server) is to bring in all columns and then select which columns you would like to remove.
Is there a way to do the reverse, i.e. select which columns you want from a table? Preferably without using a Native SQL solution.
M:
let
db = Sql.Databases("sqlserver.database.url"){[Name="DatabaseName"]}[Data],
Sales_vDimCustomer = db{[Schema="Sales",Item="vDimCustomer"]}[Data],
remove_columns = Table.RemoveColumns(Sales_vDimCustomer,{"Key", "Code","Column1","Column2","Column3","Column4","Column5","Column6","Column7","Column8","Column9","Column10"})
in
remove_columns
The snippet above shows the connection and subsequent removal.
Compared to the native SQL way:
= Sql.Database("sqlserver.database.url", "DatabaseName", [Query="
SELECT Name,
Representative,
Status,
DateLastModified,
UserLastModified,
ExtractionDate
FROM Sales.vDimCustomer
"])
I can't see much documentation on the }[Data], value in the step, so I was hoping that I could hijack that field to specify which fields I want from that data.
Any ideas would be great! :)
My first concern is that when this gets compiled down to SQL, it gets sent as two queries (as watched in ExpressProfiler).
The first query removes the selected columns and the second selects all columns.
My second concern is that if a column is added to or removed from the database, it could break my report (additional columns in Excel Tables shift your structured table language formulas to the wrong column). This is not a problem using Native SQL, as it just won't select the new column and would actually crash if the column was removed, which is something I would want to know about.
Ouch that was actually easy after I had another think and a look at the docs.
let
db = Sql.Databases("sqlserver.database.url"){[Name="DatabaseName"]}[Data],
Sales_vDimCustomer = Table.SelectColumns(
db{[Schema="Sales",Item="vDimCustomer"]}[Data],
{
"Name",
"Representative",
"Status",
"DateLastModified",
"UserLastModified",
"ExtractionDate"
}
)
in
Sales_vDimCustomer
This also loaded much faster than the other way and only generated one SQL request instead of two.
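On the earlier concern about columns being added or removed, the difference between the two approaches can be sketched in a few lines of Python. The helpers remove_columns and select_columns are hypothetical stand-ins for the idea behind Table.RemoveColumns and Table.SelectColumns, not the M functions themselves, and the sample row is invented.

```python
def remove_columns(rows, unwanted):
    """Drop the listed columns; anything not listed passes through."""
    return [{k: v for k, v in row.items() if k not in unwanted} for row in rows]


def select_columns(rows, wanted):
    """Keep only the listed columns; raises KeyError if one is missing."""
    return [{k: row[k] for k in wanted} for row in rows]


# Hypothetical row after someone added "NewColumn" to the database view.
rows = [{"Name": "Acme", "Status": "Active", "Key": 1, "NewColumn": "surprise"}]

removed = remove_columns(rows, {"Key"})
selected = select_columns(rows, ["Name", "Status"])

# A column that no longer exists is reported instead of silently ignored.
try:
    select_columns(rows, ["Name", "ExtractionDate"])
    missing_detected = False
except KeyError:
    missing_detected = True

print(removed)
print(selected)
print(missing_detected)
```

With the removal list, the new column leaks into the result and can shift the Excel table. With the selection list, it is ignored, and a column that disappears fails loudly, which matches the behaviour I wanted from the Native SQL version.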