Copying txt file to Redshift - amazon-web-services

I am trying to copy a text file from S3 to Redshift using the command below, but I keep getting the following error.
Error:
Missing newline: Unexpected character 0xffffffe2 found at location 177
copy table from 's3://abc_def/txt_006'
credentials '1234567890'
DELIMITER '|'
NULL AS 'NULL'
NULL AS '' ;
The text file has no header, and the field delimiter is |.
I tried passing the ACCEPTINVCHARS parameter, but Redshift shows the same error:
1216 error code: invalid input line.
Can anyone suggest how to resolve this issue?
Thanks in advance.

Is your file in UTF-8 format? If not, convert it and try reloading.
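For example, a minimal Python sketch of the conversion (the latin-1 source encoding and the file names are assumptions; use whatever your export tool actually produced):
import codecs

# re-encode the file to UTF-8 before uploading to S3; 'latin-1' is an
# assumed source encoding, adjust it to match your export tool
with codecs.open('txt_006', 'r', encoding='latin-1') as src:
    with codecs.open('txt_006_utf8', 'w', encoding='utf-8') as dst:
        for line in src:
            dst.write(line)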

I am assuming the path to the text file is correct, and that you generated the text file with some tool and uploaded it to S3 manually.
I faced the same issue, and the problem was whitespace. I recommend generating the text file after nulling empty values and trimming whitespace.
Your query should be select RTRIM(LTRIM(NULLIF({columnname}, ''))), ..., from {table}; generate the output of this query into the text file.
If you are using SQL Server, export the table with BCP.exe, passing the above query with all the columns and functions.
Then, after uploading the txt file to S3, use the COPY command below (you can alternatively use the credentials clause as mentioned above):
copy {table}
from 's3://{path}.txt'
access_key_id '{value}'
secret_access_key '{value}'
delimiter '|'
COMPUPDATE ON
removequotes
acceptinvchars
emptyasnull
trimblanks
blanksasnull
fillrecord
;
commit;
This solved my problem. Please let us know if you are facing anything else.

Related

DELIMITER Not found during Amazon Copy

I have specified DELIMITER ',' but I am still getting an error.
Code:
"copy %s.%s_tmp
from '%s'
CREDENTIALS 'aws_access_key_id=%s;aws_secret_access_key=%s'
REMOVEQUOTES
ESCAPE
ACCEPTINVCHARS
ENCODING AS UTF8
DELIMITER ','
GZIP
ACCEPTANYDATE
region '%s'"
% (schema, table, s3_path, access_key, secret_key, region)
Error:
InternalError: Load into table 'my_table' failed. Check 'stl_load_errors' system table for details.
In that table the Redshift error is 'Delimiter not found'.
How can I fix this?
One of the raw lines is in this format:
1122,"",4332345,"2016-07-28 15:00:09","2032-09-28
15:00:09",19.00,"","some string","","som string","abc","abc","abc"
Try using the MAXERROR parameter in the COPY command. It lets the load succeed partially even if some records are in error.
Also try this version of COPY, listing the target columns explicitly:
copy tblname(col1,col2,col3...) from s3 path
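For example, a sketch that folds both suggestions into the question's format string (col1, col2, col3 and the MAXERROR threshold of 10 are placeholders for your actual columns and error tolerance):
"copy %s.%s_tmp (col1, col2, col3)
from '%s'
CREDENTIALS 'aws_access_key_id=%s;aws_secret_access_key=%s'
REMOVEQUOTES
ESCAPE
ACCEPTINVCHARS
ENCODING AS UTF8
DELIMITER ','
GZIP
ACCEPTANYDATE
MAXERROR 10
region '%s'"
% (schema, table, s3_path, access_key, secret_key, region)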

Error: unknown dialect

I'm using csv.reader from the csv module to read a file in the following format:
Filename, Foo, Label
Each record looks as follows:
file1.wav,"[ 1.92849546e+02 2.86156126e+00 -7.96250116e+00
7.29509485e+02 4.79000000e+02 5.51000000e+02]",1
I get the following error when reading the file.
set_ = csv.reader(open(foo), 'rb', delimiter = ',')
Error: unknown dialect
Also I am using python 2.7 on a windows machine.
You are using the csv.reader API incorrectly.
As per the documentation, the second argument to csv.reader is the dialect, so passing "rb" there does not make sense.
Instead, you probably intend to do something along these lines:
with open(foo, 'rb') as input_file:
    reader = csv.reader(input_file)
    # etc.
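A fuller sketch under the same assumptions (Python 2.7; foo is the path from the question; the parsing of the bracketed Foo field is illustrative):
import csv

with open(foo, 'rb') as input_file:  # binary mode lets quoted fields span lines
    reader = csv.reader(input_file)
    header = next(reader)            # skip the "Filename, Foo, Label" header row
    for filename, foo_field, label in reader:
        # the Foo column is a bracketed, whitespace-separated list of floats
        values = [float(v) for v in foo_field.strip('[] ').split()]
        print filename, values, label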

Weka 3-7 CSVLoader does not work with ";" (semicolon) as field separator

I think that I found a bug in Weka 3.7.
When I try to load a CSV file using weka.core.converters.CSVLoader with separator ";", I get the following error:
Exception in thread "main" java.io.IOException: number expected, read Token[1;2], line 1
at weka.core.converters.ArffLoader$ArffReader.errorMessage(ArffLoader.java:294)
at weka.core.converters.ArffLoader$ArffReader.getInstanceFull(ArffLoader.java:656)
at weka.core.converters.ArffLoader$ArffReader.getInstance(ArffLoader.java:477)
at weka.core.converters.ArffLoader$ArffReader.readInstance(ArffLoader.java:445)
at weka.core.converters.ArffLoader$ArffReader.readInstance(ArffLoader.java:430)
at weka.core.converters.ArffLoader$ArffReader.<init>(ArffLoader.java:202)
at weka.core.converters.CSVLoader.getDataSet(CSVLoader.java:803)
at de.tuhh.thesis.repower.pcanalysis.BinningWindSpeed.from_CSV_to_ARFF(BinningWindSpeed.java:99)
at de.tuhh.thesis.repower.pcanalysis.Main.main(Main.java:49)
My csv file is:
a;b
1;2
My code is:
CSVLoader loader = new CSVLoader();
File inputFile = new File(csvFileName);
loader.setSource(inputFile);
loader.setFieldSeparator(";");
data = loader.getDataSet();
If I try the same code but change ";" to "," and use the following file, the program succeeds:
a,b
1,2
I really need to work with ";"
Thanks and regards
There is (at least by now) an option to set the field separator:
CSVLoader loader = new CSVLoader();
loader.setFieldSeparator(";");
Just in case someone else stumbles upon this question.

django - docfile adding %0D to url?

I have a weird issue where the docfile.url of a file on our server has a %0D (carriage return) appended to the end of the URL. This only happens to files that I manually linked. What I mean is: there were about 1,000 files in a directory, so I created a CSV file with the id and filename of each file and added them to the MySQL database with some code. All files uploaded through my Django app's interface link normally: clicking their link opens the file properly.
Here's a sample of the CSV file:
792,asbuilts/C0010.pdf
793,asbuilts/C0011.pdf
794,asbuilts/C0012.pdf
795,asbuilts/C0013.pdf
796,asbuilts/C0014.pdf
797,asbuilts/C0015.pdf
798,asbuilts/C0016.pdf
799,asbuilts/C0017.pdf
I have all these asbuilt files in the directory static_media/asbuilts/. In MySQL I ran this command:
load data local infile '/srv/www/cpm/CPM_CSV_Files/comm_asbuilts.csv' into table systems_asbuilt fields terminated by ',' lines terminated by '\n' (id, docFile);
A sample output of select * from systems_asbuilt is like this:
|846 | asbuilts/C0057.pdf
|847 | asbuilts/C0059.pdf
|848 | asbuilts/C0060.pdf
|849 | asbuilts/C0061.pdf
|850 | asbuilts/C0062.pdf
|851 | asbuilts/C0063.pdf
|852 | asbuilts/C0064.pdf
Everything looks good, right?
But the link created looks like this:
www.ourdomain.com/static_media/asbuilts/R0546.pdf%0D
If I manually delete the %0D from the link, the file opens as expected. Any idea why the extra %0D is there? Where is it coming from?
Thanks
My guess is that this:
lines terminated by '\n'
should be:
lines terminated by '\r\n'
Your result "looks" right because of the client you are using to browse it, but when the record is retrieved it still has the \r appended.
So you can strip it before you load it into the database, or .strip() it before you generate the link.
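If the rows are already loaded, a hypothetical Django cleanup sketch (the Asbuilt model and docfile field names are assumptions based on the systems_asbuilt table above):
from systems.models import Asbuilt  # assumed app and model names

for ab in Asbuilt.objects.all():
    cleaned = ab.docfile.name.strip()  # drops the trailing '\r'
    if cleaned != ab.docfile.name:
        ab.docfile.name = cleaned
        ab.save()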

Getting "new-line character seen in unquoted field" when parsing csv document using django-storages

I am trying to parse CSV files that have been uploaded to Amazon S3 using django-storages. I keep getting the error "new-line character seen in unquoted field - do you need to open the file in universal-newline mode?". The normal workaround is to open the file with "rU", but that does not seem to work with django-storages. If I drop the file directly on the server and open it from there, it works; I just want to avoid storing the files directly on the server if possible. Here is the code I am using:
import csv
from django.core.files.storage import default_storage as s3_storage
n = 'csvdumps/130331548894.csv'
csvf = s3_storage.open(n, "rU")
csvReader = csv.reader(csvf)
for item in csvReader:
    print item
I can see that this is a reported django-storages bug (http://jgrid.org/david/django-storages/issue/80/trying-to-parse-csv-file-from-django), but perhaps you can try this:
csvf = s3_storage.open(n, "r")
csvReader = csv.reader(csvf.read().splitlines())
It would also be great if you could share a link to some of your (sample) S3 CSV files so I can open them and check the line endings.
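Putting that together with the code from the question, a minimal end-to-end sketch (it reads the whole object into memory, which should be fine for dumps of this size):
import csv
from django.core.files.storage import default_storage as s3_storage

n = 'csvdumps/130331548894.csv'
# read the whole object and split on any line ending,
# so csv never sees a bare '\r'
content = s3_storage.open(n, 'r').read()
for item in csv.reader(content.splitlines()):
    print item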