Why do I get the S3ServiceException error when loading AWS Redshift from S3?

I'm getting an error when trying to load a table in Redshift from a CSV file in S3. The error is:
error: S3ServiceException:All access to this object has been disabled,Status 403,Error AllAccessDisabled,Rid FBC64D9377CF9763,ExtRid o1vSFuV8SMtYDjkgKCYZ6VhoHlpzLoBVyXaio6hdSPZ5JRlug+c9XNTchMPzNziD,CanRetry 1
code: 8001
context: Listing bucket=amazonaws.com prefix=els-usage/simple.txt
query: 1122
location: s3_utility.cpp:540
process: padbmaster [pid=6649]
The copy statement used is:
copy public.simple from 's3://amazonaws.com/mypath/simple.txt' CREDENTIALS 'aws_access_key_id=xxxxxxx;aws_secret_access_key=xxxxxx' delimiter ',';
As this is my first attempt at using Redshift and S3, I've kept the simple.txt file (and its destination table) to a single-field record. I've run the copy in both Aginity Workbench and SQL Workbench with the same results.
I've clicked the link in the S3 file's property tab and it downloads the simple.txt file - so it appears the input file is accessible. Just to be sure, I've given it public access.
Unfortunately, I don't see any additional information in the Redshift Loads tab that would be helpful in debugging this.
Can anyone see anything I'm doing incorrectly?

Removing the amazonaws.com from the URL fixed the problem. The resulting COPY statement is now:
copy public.simple from 's3://mypath/simple.txt' CREDENTIALS 'aws_access_key_id=xxxxxxx;aws_secret_access_key=xxxxxx' delimiter ',';

You can receive the same error code if you are running on an IAM role and use the instance metadata's credentials for your aws_access_key_id and aws_secret_access_key. Per the documentation, the pattern to follow in this case also includes a token from the instance. Both the IAM role's temporary access keys and the token can be found in the instance metadata at http://169.254.169.254/latest/meta-data/iam/security-credentials/{{roleName}}.
copy table_name
from 's3://objectpath'
credentials 'aws_access_key_id=<temporary-access-key-id>;aws_secret_access_key=<temporary-secret-access-key>;token=<temporary-token>';
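As an illustration, here is a rough Python sketch of pulling those temporary credentials from the instance metadata service and building the COPY statement. The role name, table, and object path are placeholders, and it assumes IMDSv1 is reachable (IMDSv2 requires fetching a session token header first):
import json
import urllib.request

# Hypothetical role name -- substitute the role attached to your instance.
role_name = "MyRedshiftLoadRole"
metadata_url = ("http://169.254.169.254/latest/meta-data/iam/security-credentials/"
                + role_name)

# The metadata service returns AccessKeyId, SecretAccessKey and Token as JSON.
# Note that these credentials are temporary and expire periodically.
with urllib.request.urlopen(metadata_url) as response:
    creds = json.load(response)

copy_sql = (
    "copy table_name "
    "from 's3://objectpath' "
    "credentials 'aws_access_key_id={};aws_secret_access_key={};token={}';"
).format(creds["AccessKeyId"], creds["SecretAccessKey"], creds["Token"])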

Related

'Cannot COPY into nonexistent table' error but table exists in Amazon Redshift

I set up a table in Redshift and now want to populate it with data from an S3 bucket in a different region. I'm using the COPY command, but I get the error:
"psycopg2.errors.InternalError_: Cannot COPY into nonexistent table customcontent_table"
I can't figure out how to fix it since the table clearly already exists. Is there an error in my syntax? My code:
sql = "copy customcontent_table from 'test/2021/03/29/20/20/CustomContent.snappy.parquet' credentials 'aws_access_key_id=AA;aws_secret_access_key=zz' format parquet region 'us-west-2';"
cur = con.cursor()
cur.execute("begin;")
cur.execute(sql)
cur.execute("commit;")
con.close()
So your reference to the S3 object doesn't look correct. It should be something like this (per the AWS docs):
copy listing
from 's3://mybucket/data/listings_pipe.txt'
access_key_id '<access-key-id>'
secret_access_key '<secret-access-key>'
...;
You seem to have only the object key, but not the s3:// prefix and the bucket name. I don't think this is the cause of the error, but you will want to get it fixed.
My initial thought on why you are getting this error message is that the table is not being found by this session. Redshift sessions have a concept of a "search path", which tells the current session which schemas to look in for tables. If this is the case, then the easiest solution (or at least the simplest to explain) is just to add the table's schema to the COPY command:
copy schema_name.customcontent_table from ...
This will tell Redshift exactly where to find the table. If you want to set the search path you can read about it here - https://docs.aws.amazon.com/redshift/latest/dg/r_search_path.html
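For reference, here is a minimal psycopg2 sketch of both fixes, reusing the details from the question; the schema name my_schema, the bucket name, and the connection settings are placeholders:
import psycopg2

# Connection details are placeholders.
con = psycopg2.connect(host="my-cluster.xxxxxx.us-west-2.redshift.amazonaws.com",
                       port=5439, dbname="dev", user="awsuser", password="...")
cur = con.cursor()

# Option 1: put the table's schema on the session's search path.
cur.execute("set search_path to my_schema, public;")

# Option 2 (more explicit): schema-qualify the table in the COPY itself, and use
# the full S3 path (s3://<bucket>/<key>) rather than just the object key.
sql = """
copy my_schema.customcontent_table
from 's3://my-bucket/test/2021/03/29/20/20/CustomContent.snappy.parquet'
credentials 'aws_access_key_id=AA;aws_secret_access_key=zz'
format as parquet region 'us-west-2';
"""
cur.execute("begin;")
cur.execute(sql)
cur.execute("commit;")
con.close()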
If this isn't the issue then we'll need to dig deeper.

Redshift copy from Parquet manifest in S3 fails and says MANIFEST parameter requires full path of an S3 object

I'm using Firehose to put records in Parquet format in an S3 bucket. I've manually defined a glue table.
So I've got a manifest like
{
  "entries": [
    {"url": "s3://my-bucket/file1.parquet"},
    {"url": "s3://my-bucket/file2.parquet"}
  ]
}
And a copy command like
COPY schema_name.table_name
FROM 's3://my-bucket/manifest.json'
CREDENTIALS 'aws_iam_role=arn:aws:iam::123456:role/RoleWithPermissionToRedshiftAndBucket'
PARQUET
MANIFEST;
And it gives this mysterious error that has 0 results on Google.
[XX000][500310] [Amazon](500310) Invalid operation: COPY with MANIFEST parameter requires full path of an S3 object.
Details:
-----------------------------------------------
error: COPY with MANIFEST parameter requires full path of an S3 object.
code: 8001
context:
query: 23514459
location: scan_range_manager.cpp:795
process: padbmaster [pid=108497]
-----------------------------------------------;
It seems to me that I am definitely specifying the full path, so I'm not sure what's up.
One thing that was wrong was that the bucket was in a different region, which would also prevent it from working.
One reason you might get this error message is if the bucket is in another AWS account.
But what actually fixed it for me was adding content_length to the manifest, since it is required for Parquet.
{
  "entries": [
    {
      "url": "s3://my-bucket/file1.parquet",
      "mandatory": true,
      "meta": {
        "content_length": 2893394
      }
    },
    {
      "url": "s3://my-bucket/file2.parquet",
      "mandatory": true,
      "meta": {
        "content_length": 2883626
      }
    }
  ]
}
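If you'd rather not hard-code the sizes, here is a rough boto3 sketch of generating such a manifest; the bucket, keys, and manifest location are placeholders, and the size comes from head_object's ContentLength field:
import json
import boto3

s3 = boto3.client("s3")
bucket = "my-bucket"                       # placeholder
keys = ["file1.parquet", "file2.parquet"]  # placeholder object keys

entries = []
for key in keys:
    # head_object returns object metadata, including its size in bytes.
    size = s3.head_object(Bucket=bucket, Key=key)["ContentLength"]
    entries.append({
        "url": "s3://{}/{}".format(bucket, key),
        "mandatory": True,
        "meta": {"content_length": size},
    })

# Write the manifest alongside the data files.
s3.put_object(Bucket=bucket, Key="manifest.json",
              Body=json.dumps({"entries": entries}).encode("utf-8"))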
Apparently, if you leave content_length out, you'll get an unrelated error message. Someone made the same mistake in the question "Error while loading parquet format file into Amazon Redshift using copy command and manifest file" and got an error saying:
File has an invalid version number
Make sure you have the correct CREDENTIALS or IAM_ROLE set.
I fixed this exact same error, COPY with MANIFEST parameter requires full path of an S3 object, by changing my IAM_ROLE from one that didn't have permission to load into this table.
(Redshift error messages in this area are not very helpful.)

Azure SAS token for container throws 'invalid Signature Size' error

I am trying to list and download blobs from a container on Azure. It works perfectly fine when I use the storage account access key; however, it fails when I use a SAS token. I generated the SAS token with the following PowerShell script:
$storageContext = New-AzureStorageContext -StorageAccountName "myAccount" -StorageAccountKey "<account key>"
$permission = "rwdl"
$sasToken = New-AzureStorageContainerSASToken -Name "myContainer" -Policy "testPolicy" -Context $storageContext >>sastoken.txt
"
I get the following result:
?sv=2017-04-17&sr=c&si=testPolicy&sig=dbS680%2FXgPp4o%2BQCCzpYzGZszCnDHVjCkdHZRf6KDeg%3D
I appended the SAS token to the resource URI to get:
https://myAccount.blob.core.windows.net/myContainer?sv=2017-04-17&sr=c&si=testPolicy&sig=dbS680%2FXgPp4o%2BQCCzpYzGZszCnDHVjCkdHZRf6KDeg%3D
and ran the following CLI command:
az storage blob list --container-name myContainer --account-name myAccount --auth-mode key --debug --sas-token "https://myAccount.blob.core.windows.net/myContainer?sv=2017-04-17&sr=c&si=testPolicy&sig=dbS680%2FXgPp4o%2BQCCzpYzGZszCnDHVjCkdHZRf6KDeg%3D" >> bloblist.txt
I get the following error:
azure.multiapi.storage.v2018_03_28.common.storageclient : Client-Request-ID=0f7a7762-3729-11e9-8b32-ffc4c9592d0a Retry policy did not allow for a retry: Server-Timestamp=Sat, 23 Feb 2019 05:08:30 GMT, Server-Request-ID=21f07a6a-f01e-00e9-3235-cb7d5c000000, HTTP status code=403, Exception=Server failed to authenticate the request. Make sure the value of Authorization header is formed correctly including the signature. ErrorCode: AuthenticationFailed
AuthenticationFailed
Server failed to authenticate the request. Make sure the value of Authorization header is formed correctly including the signature.
RequestId:21f07a6a-f01e-00e9-3235-cb7d5c000000
Time:2019-02-23T05:08:30.7149353Z
Signature size is invalid.
You do not have the required permissions needed to perform this operation.
Depending on your operation, you may need to be assigned one of the following roles:
"Storage Blob Data Contributor (Preview)"
"Storage Blob Data Reader (Preview)"
"Storage Queue Data Contributor (Preview)"
"Storage Queue Data Reader (Preview)"
If you want to use the old authentication method and allow querying for the right account key, please use the "--auth-mode" parameter and "key" value.
Event: CommandInvoker.OnFilterResult [] 'CommandResultItem' object is not iterable
Traceback (most recent call last):
  File "C:\Users\VSSADM~1\AppData\Local\Temp\pip-install-r8nye8gm\knack\knack\cli.py", line 212, in invoke
  File "C:\Users\VSSADM~1\AppData\Local\Temp\pip-install-r8nye8gm\knack\knack\output.py", line 132, in out
  File "C:\Users\VSSADM~1\AppData\Local\Temp\pip-install-r8nye8gm\knack\knack\output.py", line 38, in format_json
TypeError: 'CommandResultItem' object is not iterable
telemetry.save : Save telemetry record of length 2499 in cache
I have also tried generating a storage-account-level SAS token from the portal, but didn't have any luck.
Please help!
For anyone else who comes along with the same azcopy error, 403 AuthenticationFailed where the detail shows Signature size is invalid: I had the same problem when trying to script azcopy from a Windows .bat file. When you get the SAS URL, there will be percent signs in the string. You must double up the percent signs to "escape" them when running from a .bat file, i.e. wherever you see a % in the URL, make it %%. Hope this helps!
Funny thing is, I remembered to do this in the first three azcopy scripts I wrote, then a few weeks later made a fourth one for a new storage account and couldn't figure out why I kept getting 403. I suppose this post will be a reminder to myself the next time I forget again :)
The reason you're getting this error is that you're using the full SAS URL instead of just the SAS token.
Please change the following:
az storage blob list --container-name myContainer --account-name myAccount --auth-mode key --debug --sas-token "https://myAccount.blob.core.windows.net/myContainer?sv=2017-04-17&sr=c&si=testPolicy&sig=dbS680%2FXgPp4o%2BQCCzpYzGZszCnDHVjCkdHZRf6KDeg%3D" >> bloblist.txt
to
az storage blob list --container-name myContainer --account-name myAccount --auth-mode key --debug --sas-token "?sv=2017-04-17&sr=c&si=testPolicy&sig=dbS680%2FXgPp4o%2BQCCzpYzGZszCnDHVjCkdHZRf6KDeg%3D" >> bloblist.txt
And you should be able to list blobs.

use SQL Workbench import csv file to AWS Redshift Database

I'm looking for both a manual and an automatic way to use SQL Workbench to import/load a LOCAL csv file into an AWS Redshift database.
The manual way could be clicking through a navigation bar and selecting an option.
The automatic way could be some query code to load the data that I can just run.
Here's my attempt:
There's an error saying my target table in AWS is not found, but I'm sure the table exists. Does anyone know why?
WbImport -type=text
-file ='C:\myfile.csv'
-delimiter = ,
-table = public.data_table_in_AWS
-quoteChar=^
-continueOnError=true
-multiLine=true
You can use WbImport in SQL Workbench/J to import data.
For more info: http://www.sql-workbench.net/manual/command-import.html
As mentioned in the comments, the COPY command provided by Redshift is the optimal solution. You can use COPY from S3, EC2, etc.
S3 Example:
copy <your_table>
from 's3://<bucket>/<file>'
access_key_id 'XXXX'
secret_access_key 'XXXX'
region '<your_region>'
delimiter '\t';
For more examples:
https://docs.aws.amazon.com/redshift/latest/dg/r_COPY_command_examples.html
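If you want to automate the local-file route end to end, one common pattern is to upload the CSV to S3 with boto3 first and then run a COPY like the one above against the uploaded object. A rough sketch, with the bucket and key as placeholders:
import boto3

bucket = "my-bucket"         # placeholder
key = "uploads/myfile.csv"   # placeholder

# Upload the local CSV to S3; the COPY example above can then read from
# 's3://my-bucket/uploads/myfile.csv'.
s3 = boto3.client("s3")
s3.upload_file(r"C:\myfile.csv", bucket, key)
print("Uploaded to s3://{}/{}".format(bucket, key))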

Copying data from S3 to Redshift hangs

I've been trying to load data into Redshift for the last couple of days with no success. I have provided the correct IAM role to the cluster, I have given access to S3, and I am using the COPY command with either the AWS credentials or the IAM role, but so far no success. What could be the reason for this? It has come to the point where I don't have many options left.
So the code is pretty basic, nothing fancy there. See below:
copy test_schema.test from 's3://company.test/tmp/append.csv.gz'
iam_role 'arn:aws:iam::<rolenumber>/RedshiftCopyUnload'
delimiter ',' gzip;
I didn't include any error messages because there are none. The code simply hangs, and I have left it running for well over 40 minutes with no results. If I go into the Queries section in Redshift, I don't see anything abnormal. I am using Aginity and SQL Workbench to run the queries.
I also tried manually running INSERT queries in Redshift, and that seems to work. COPY and UNLOAD do not work, and even though I have created roles with access to S3 and associated them with the cluster, I still get this problem.
Thoughts?
EDIT: The solution has been found. Basically it was a connectivity problem within our VPC: a VPC endpoint had to be created and associated with the subnet used by Redshift.
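For anyone hitting the same symptom, here is a rough boto3 sketch of the fix described in the edit, i.e. creating a gateway VPC endpoint for S3. All IDs and the region are placeholders; creating it once through the console or CLI works just as well:
import boto3

ec2 = boto3.client("ec2", region_name="us-east-1")  # placeholder region

# Create a gateway endpoint so traffic from the subnets used by Redshift can
# reach S3 without leaving the VPC. IDs below are placeholders.
response = ec2.create_vpc_endpoint(
    VpcEndpointType="Gateway",
    VpcId="vpc-0123456789abcdef0",
    ServiceName="com.amazonaws.us-east-1.s3",
    RouteTableIds=["rtb-0123456789abcdef0"],
)
print(response["VpcEndpoint"]["VpcEndpointId"])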
I agree with JohnRotenstein that more information is needed to provide a definitive answer. I would suggest you start with simple data points and a simple table.
Here is a step-by-step walkthrough; by following it, you should be able to resolve your issue.
Assume this is your table structure.
Here I'm including most of the common data types to prove my point.
create table sales(
salesid integer,
commission decimal(8,2),
saledate date,
description varchar(255),
created_at timestamp default sysdate,
updated_at timestamp);
Just to keep it simple, here is the data file that resides in S3.
Content of sales-example.txt (pipe-delimited):
salesid,commission,saledate,description,created_at,updated_at
1|3.55|2018-12-10|Test description|2018-05-17 23:54:51|2018-05-17 23:54:51
2|6.55|2018-01-01|Test description|2018-05-17 23:54:51|2018-05-17 23:54:51
4|7.55|2018-02-10|Test description|2018-05-17 23:54:51|2018-05-17 23:54:51
5|3.55||Test description|2018-05-17 23:54:51|2018-05-17 23:54:51
7|3.50|2018-10-10|Test description|2018-05-17 23:54:51|2018-05-17 23:54:51
Run the following two commands using the psql terminal or any SQL connector. Make sure to run the second command (the commit) as well.
copy sales(salesid,commission,saledate,description,created_at,updated_at) from 's3://example-bucket/foo/bar/sales-example.txt' credentials 'aws_access_key_id=************;aws_secret_access_key=***********' IGNOREHEADER 1;
commit;
I hope this helps you debug your issue.