Redshift - DMS user fails to load data from S3

I prepared a schema and tables using the AWS SCT tool so my DMS job will have a landing place for data.
Even though access to the database and schema has been granted:
GRANT ALL ON DATABASE my_db TO "dms_user";
GRANT ALL ON SCHEMA my_schema TO "dms_user";
GRANT ALL ON ALL TABLES IN SCHEMA my_schema TO "dms_user";
ALTER DEFAULT PRIVILEGES IN SCHEMA my_schema GRANT ALL ON TABLES TO "dms_user";
I'm getting error:
2022-03-25T22:26:48 [TARGET_LOAD ]E: RetCode: SQL_ERROR SqlState: XX000 NativeError: 30 Message: [Amazon][Amazon Redshift] (30) Error occurred while trying to execute a query: [SQLState XX000] ERROR: Load into table 'table_test' failed. Check 'stl_load_errors' system table for details. [1022502] (ar_odbc_stmt.c:4815)
2022-03-25T22:26:48 [TARGET_LOAD ]E: Failed to load schema.table_test from S3, file name: LOAD00000001.csv [1022509] (cloud_imp.c:2386)
2022-03-25T22:26:48 [TARGET_LOAD ]E: Failed to load ims_suretyradm_publish.dimaccount from S3, file name: LOAD00000001.csv [1022509] (cloud_imp.c:2386)
The stl_load_errors table is empty...
I'll greatly appreciate any help/guidance on this.

I hope the issue is not the mismatch between "my_schema" and "my_schema_name"; I assume those only differ because the names were obfuscated for the question.
There are a number of places things can go sideways. Have you checked the permissions after the grant?
select HAS_SCHEMA_PRIVILEGE('dms_user', 'my_schema', 'create');
select HAS_SCHEMA_PRIVILEGE('dms_user', 'my_schema', 'usage');
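If the schema checks pass, it is also worth confirming the table-level privilege the load needs; a minimal sketch, using the table_test table from your error message as the target:
-- returns true if dms_user can INSERT into the target table (table name taken from the error message)
select has_table_privilege('dms_user', 'my_schema.table_test', 'insert');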
Another resource is the awslabs Redshift GitHub repo - https://github.com/awslabs/amazon-redshift-utils - there are a number of admin views there that explore permissions. Knowing which step in the process is not doing what you expect will narrow things down.
Also, remember that you will want to change the default ACL for the schema so that new objects created there will be usable by the correct people. For example:
ALTER DEFAULT PRIVILEGES IN SCHEMA my_schema GRANT ALL ON TABLES TO dms_user;
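Once the load fails again, stl_load_errors can also be queried directly; keep in mind that regular users only see their own rows in STL tables, so run it as a superuser or as dms_user itself. A minimal sketch:
-- most recent load errors first; filename, line_number and err_reason point at the offending row/column
select starttime, filename, line_number, colname, err_code, err_reason
from stl_load_errors
order by starttime desc
limit 20;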

Related

Unable to query AWS datashare

Today I created a datashare, but when I try to run a query on it I get the error below:
Query: select * from postgress_home_db.staging_datashare.site limit 100
ERROR: Publicly accessible consumer cannot access object in the database.
I tried to search for the reason behind this but did not find anything.
Below are the queries I used to create the datashare.
Producer cluster:
1. CREATE DATASHARE postgres_home_ds;
2. ALTER DATASHARE postgres_home_ds ADD SCHEMA postgres_home_pod;
3. GRANT USAGE ON DATASHARE postgres_home_ds to NAMESPACE 'xyz'
Consumer Cluster:
CREATE DATABASE postgress_home_db from DATASHARE postgres_home_ds of NAMESPACE 'abc'
CREATE EXTERNAL SCHEMA postgress_home_datashare FROM REDSHIFT DATABASE 'postgress_home_db' SCHEMA 'staging_datashare'
I was able to fix the issue by setting PUBLICACCESSIBLE=True
ALTER DATASHARE datashare_name SET PUBLICACCESSIBLE=True
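To confirm the change took effect, the datashare's properties can be checked on the producer cluster. A minimal sketch, assuming the SVV_DATASHARES view exposes an is_publicaccessible column (adjust the column list if your version differs):
-- lists datashares and whether they are publicly accessible (column names assumed from the SVV_DATASHARES docs)
select share_name, share_type, is_publicaccessible
from svv_datashares;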

Permission denied for relation stl_load_errors on Redshift Serverless

I use Amazon Redshift Serverless and Query Editor v2, and I'm having trouble with user permissions.
The following error occurred when importing data (.csv) from S3:
ERROR: Load into table 'x' failed. Check 'sys_load_error_detail' system table for details.
Therefore, I executed Select * From stl_load_errors to check the error, but that did not work either:
ERROR: permission denied for relation stl_load_errors
I checked my user permissions using select * from pg_user; and they are presented as follows.
However, I don't see any problem there, so what is wrong?
(My user is hoge.)
usename | usesysid | usecreatedb | usesuper | usecatupd | passwd   | valuntil | useconfig
rdsdb   | 1        | true        | true     | true      | ******** | infinity | NULL
hoge    | 101      | true        | true     | false     | ******** | NULL     | NULL
I have tried to look at the query in the Amazon Redshift Serverless (Preview) dashboard under "Query and Database Monitoring", but could not find any details about the error.
What should I do?
Amazon Redshift Serverless does not expose the stl_load_errors table.
Perhaps you should try SYS_LOAD_HISTORY and SYS_LOAD_ERROR_DETAIL instead.
https://docs.aws.amazon.com/redshift/latest/mgmt/serverless-monitoring.html
You can't query STL, STV, SVCS, SVL, and some SVV system tables and views with Amazon Redshift Serverless, except the following:
...see link above....
try
select * from sys_load_error_detail
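For a more targeted look, the query can be filtered to the failing table; a sketch assuming the documented SYS_LOAD_ERROR_DETAIL column names (adjust if your version differs):
-- most recent COPY errors for the failing table first (column names assumed from the SYS_LOAD_ERROR_DETAIL docs)
select start_time, file_name, line_number, column_name, error_code, error_message
from sys_load_error_detail
where table_name = 'x'
order by start_time desc
limit 20;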

Error BigQuery/Dataflow "Could not resolve table in Data Catalog"

I'm having trouble with a job I've set up on Dataflow.
Here is the context: I created a dataset on BigQuery using the following path
bi-training-gcp:sales.sales_data
In the properties I can see that the data location is "US".
Now I want to run a job on Dataflow, and I enter the following command into the Google Cloud Shell:
gcloud dataflow sql query ' SELECT country, DATE_TRUNC(ORDERDATE , MONTH),
sum(sales) FROM bi-training-gcp.sales.sales_data group by 1,2 ' --job-name=dataflow-sql-sales-monthly --region=us-east1 --bigquery-dataset=sales --bigquery-table=monthly_sales
The query is accepted by the console and returns a sort of acceptance message.
After that I go to the Dataflow dashboard. I can see the new job as queued, but after 5 minutes or so the job fails and I get the following error messages:
Error
2021-09-29T18:06:00.795Z Invalid/unsupported arguments for SQL job launch: Invalid table specification in Data Catalog: Could not resolve table in Data Catalog: bi-training-gcp.sales.sales_data
Error
2021-09-29T18:10:31.592036462Z Error occurred in the launcher container: Template launch failed. See console logs.
My guess is that it cannot find my table, maybe because I specified the wrong location/region. Since my table's location is "US", I thought it would be on a US server (which is why I specified us-east1 as the region), but I tried all US regions with no success...
Does anybody know how I can solve this?
Thank you
This error occurs if the Dataflow service account doesn't have access to the Data Catalog API. To resolve this issue, enable the Data Catalog API in the Google Cloud project that you're using to write and run queries. Alternatively, assign the roles/datacatalog.viewer role to the account that runs the queries.
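If you go the API-enablement route, this is the usual command (assuming the active gcloud project is the one running the Dataflow job):
gcloud services enable datacatalog.googleapis.com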

Issue connecting to Databricks table from Azure Data Factory using the Spark odbc connector

We have managed to get a valid connection from Azure Data Factory to our Azure Databricks cluster using the Spark (ODBC) connector. In the list of tables we get the expected list, but when querying a specific table we get an exception.
ERROR [HY000] [Microsoft][Hardy] (35) Error from server: error code:
'0' error message:
'com.databricks.backend.daemon.data.common.InvalidMountException:
Error while using path xxxx for resolving path xxxx within mount at
'/mnt/xxxx'.'.. Activity ID:050ac7b5-3e3f-4c8f-bcd1-106b158231f3
In our case the Databricks tables are mounted Parquet files stored in Azure Data Lake Storage Gen2, which is related to the above exception. Any suggestions on how to solve this issue?
PS: the same error appears when connecting from Power BI Desktop.
Thanks
Bart
In your configuration to mount the lake, can you add this setting:
"fs.azure.createRemoteFileSystemDuringInitialization": "true"
I haven't tried your exact scenario - however this solved a similar problem for me using Databricks-Connect.

Dataflow needs bigquery.datasets.get permission for the underlying table in authorized view

In a Dataflow pipeline, I'm reading from a BigQuery authorized view:
beam.io.Read(beam.io.BigQuerySource(query = "SELECT col1 FROM proj2.dataset2.auth_view1", use_standard_sql=True))
This is the error which I'm getting:
Error:
Message: Access Denied: Dataset proj1:dataset1: The user xxxxxx-compute#developer.gserviceaccount.com does not have bigquery.datasets.get permission for dataset proj1:dataset1.
proj1:dataset1 has the base table for the view auth_view1.
According to this issue in DataflowJavaSDK, Dataflow seems to be directly executing some metadata query against the underlying table.
Is there a fix available for this issue in Apache Beam SDK?
Explicitly setting the query location is also a solution in the Apache Beam Java SDK, using the withQueryLocation option of BigQueryIO.
It looks like setting the query location is not possible in the Python SDK yet.