AWS Sagemaker unable to parse csv - amazon-web-services

I'm trying to run a training job on AWS SageMaker, but it keeps failing with the following error:
ClientError: Unable to parse csv: rows 1-5000, file /opt/ml/input/data/train/KMeans_data.csv
I've selected 'text/csv' as the content type, and my CSV file contains 5 columns of numerical data with text headers.
Can anyone point out what could be going wrong here?
Thanks!

From https://docs.aws.amazon.com/sagemaker/latest/dg/cdf-training.html, the CSV must not have headers:
Amazon SageMaker requires that a CSV file doesn't have a header record ...
Try removing the header row.
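A minimal sketch of stripping the header before re-uploading, assuming pandas and boto3 are available; the bucket and key names below are placeholders:

import boto3
import pandas as pd

# Read the original file (with headers), then rewrite it without them.
df = pd.read_csv("KMeans_data.csv")
df.to_csv("KMeans_data_noheader.csv", header=False, index=False)

# Upload the header-free file to the S3 prefix used as the training channel.
s3 = boto3.client("s3")
s3.upload_file("KMeans_data_noheader.csv", "my-bucket", "train/KMeans_data.csv")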

Also make sure that the training folder in the S3 bucket contains no files other than the training file.
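A quick way to check what the training job will actually see under the prefix (a sketch assuming boto3; bucket and prefix are placeholders):

import boto3

s3 = boto3.client("s3")
resp = s3.list_objects_v2(Bucket="my-bucket", Prefix="train/")
for obj in resp.get("Contents", []):
    print(obj["Key"])  # every object listed here is fed to the training job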

Related

remove backslash from a .csv file to load data to redshift from s3

I am getting an error when loading my file: my CSV file contains backslashes. What delimiter or options can I use in my COPY command so that loading the data from S3 to Redshift doesn't fail?
I tried the QUOTE parameter, but it gave me a syntax error, so it seems this format doesn't accept the QUOTE keyword.
Could anyone provide a correct command, or do I need to clean or preprocess my data before uploading it to S3? If the data size is too big, that might not be a very feasible solution. If I have to process it, should I use PySpark or Python (pandas)?
Below is the COPY command I am using to copy data from S3 to Redshift. I tried passing a QUOTE option in the COPY command, but it doesn't seem to accept it anymore, and there is no example in the Amazon docs on how to achieve this. Could someone suggest a command that can replace special characters while loading the data?
COPY redshifttable from 'mys3filelocation'
CREDENTIALS 'aws_access_key_id=myaccess_key;aws_secret_access_key=mysecretID'
region 'us-west-2'
CSV
DATASET:
US063737,2019-11-07T10:23:25.000Z,richardkiganga,536737838,Terminated EOs,"",f,Uganda,Richard,Kiganga,Business owner,Round Planet DTV Uganda,richardkiganga,0.0,4,7.0,2021-06-1918:36:05,"","",panama-
Disc.s3.amazon.com/photos/…,\"\",Mbale,Wanabwa p/s,Eastern,"","",UACE Certificate,"",drive.google.com/file/d/148dhf89shh499hd9303-JHBn38bh/… phone,Mbale,energy_officer's_id_type,letty
mainzi,hakuna Cell,Agent,8,"","",4,"","","",+647739975493,Feature phone,"",0,Boda goda,"",1985-10-12,Male,"",johnatlhnaleviski,"",Wife
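As for the preprocessing route the question mentions, a minimal pandas sketch (file names are placeholders; it simply strips backslashes from every text field before upload):

import pandas as pd

# Load the raw export; keep everything as strings so nothing is reinterpreted.
df = pd.read_csv("raw_export.csv", dtype=str, keep_default_na=False)

# Remove backslashes from every column before writing the cleaned file.
df = df.apply(lambda col: col.str.replace("\\", "", regex=False))
df.to_csv("cleaned_export.csv", index=False)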

Loading data into Redshift database

I have five JSON files in one folder in Amazon S3. I am trying to load all five files from S3 into Redshift using the COPY command. I am getting an error while loading one of the files. Is there any way in Redshift to skip loading that file and load the next one?
Use the MAXERROR parameter in the COPY command to increase the number of errors permitted. This will skip over any lines that produce errors.
Then, use the STL_LOAD_ERRORS table to view the errors and diagnose the data problem.
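A hedged sketch of what that looks like when run from Python with psycopg2 (the connection details, table, S3 path, and IAM role are placeholders):

import psycopg2

conn = psycopg2.connect(host="my-cluster.example.us-west-2.redshift.amazonaws.com",
                        port=5439, dbname="dev", user="awsuser", password="...")
cur = conn.cursor()

# MAXERROR lets the load continue past a limited number of bad rows.
cur.execute("""
    COPY mytable
    FROM 's3://my-bucket/json-folder/'
    IAM_ROLE 'arn:aws:iam::123456789012:role/MyRedshiftRole'
    FORMAT AS JSON 'auto'
    MAXERROR 100;
""")
conn.commit()

# Inspect what was rejected and why.
cur.execute("SELECT filename, line_number, err_reason FROM stl_load_errors ORDER BY starttime DESC LIMIT 20;")
for row in cur.fetchall():
    print(row)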

SageMaker RCF Data

I have a DynamoDB table filled with nice data. I use Data Pipeline to export it to S3, and it generates a folder with 3 files.
1) "139xx-x911-407x-83xx-06x5x659xx16" that contains all DB data in this format:
{"TimeStamp":{"s":"1539699960"},"SystemID":{"n":"1001"},"AccMin":{"n":"497"},"AccMax":{"n":"509"},"CustomerID":{"n":"10001"},"SensorID":{"n":"101"}}
2) "manifest"
{"name":"DynamoDB-export","version":3,
entries: [
{"url":"s3://cxxxx/2018-10-18-15-25-02/139xx-x911-407x-83xx-06x5x659xx16","mandatory":true}
]}
3) "_SUCCESS" No data inside.
I then go to SageMaker -> Training Jobs -> Create Training Job. Here I fill in everything to create a Random Cut Forest model and point it towards the above data (I have tried both the manifest file and the bigger data file).
The training fails with error:
"ClientError: No data was found. Please make sure training data is
provided."
What am I doing wrong?
Thank you for your interest in SageMaker.
The manifest is optional, but if provided it should conform to the schema described at https://docs.aws.amazon.com/sagemaker/latest/dg/API_S3DataSource.html . Also, RandomCutForest does not support input data in JSON format. Only protobuf and CSV are supported, see https://docs.aws.amazon.com/sagemaker/latest/dg/randomcutforest.html
In order to get training working, you have to convert the input data to CSV or protobuf format and set the content_type value appropriately. If you want to use a manifest file, the S3 location should point to that file and its contents have to be fixed to conform to the schema. You can, however, remove the manifest and point the S3 location to s3://bucket/path/to/data/.
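A minimal sketch of converting the exported JSON lines into a headerless CSV that RCF can read (field names follow the sample record above, the file names are placeholders, and which columns you keep depends on the features you actually want to train on):

import csv
import json

# Each line of the export is one JSON record in DynamoDB's attribute-value format.
fields = ["TimeStamp", "SystemID", "AccMin", "AccMax", "CustomerID", "SensorID"]

with open("139xx-export.json") as src, open("train.csv", "w", newline="") as dst:
    writer = csv.writer(dst)
    for line in src:
        record = json.loads(line)
        # Each attribute is wrapped as {"s": ...} or {"n": ...}; unwrap the value.
        writer.writerow([next(iter(record[f].values())) for f in fields])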
I hope this helps.
Regards,
Yury

Parsing Error while uploading CSV files to AWS Neptune

I want to solve a parsing error that occurs when uploading CSV files to AWS Neptune.
The problem may be caused by the column names and their types, but I do not know which types are correct to write in the header.
I converted all the data to strings before uploading the CSVs.
Problem does not occur: "~id","pv_time:String","order_num:String","staff_num:String","~label"
Problem occurs: "order_num","order_from:String","order_to:String","station_name:String","~label"
The ~id and ~label headers are required. In the failing header the first column is named "order_num" rather than "~id", so Neptune cannot find an ~id column.
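A small pandas sketch of that fix (file names are placeholders; it renames the identifier column to the required ~id before writing the load file):

import pandas as pd

df = pd.read_csv("orders.csv")
# Neptune's bulk loader requires an ~id column and a ~label column.
df = df.rename(columns={"order_num": "~id"})
df.to_csv("orders_fixed.csv", index=False)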

.csv upload not working in Amazon Web Services Machine Learning - AWS

I have uploaded a simple 10-row CSV file (via S3) into the AWS ML website. It keeps giving me the error:
"We cannot find any valid records for this datasource."
There are records there, and the Y variable is continuous (not binary). I am pretty much stuck at this point because there is only one button to move forward and build the machine learning model. Does anyone know what I should do to fix it? Thanks!
The only way I have been able to upload .csv files I created myself to S3 is by downloading an existing .csv file from my S3 bucket, modifying the data, uploading it, and then changing the name in the S3 console.
Could you post the first few lines of the .csv file? I am able to upload my own .csv file along with a schema that I created, and it works. However, I did have issues in that Amazon ML was unable to create the schema for me.
Also, did you try saving the data in something like Sublime, Notepad++, etc. in order to get a different format? On my Mac with Microsoft Excel, the CSV did not work, but when I tried LibreOffice on Windows, the same file worked perfectly.
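That kind of difference between editors often comes down to line endings or encoding; a hedged sketch of normalizing a CSV to plain UTF-8 with Unix newlines before upload (file names are placeholders):

# Re-write the file with UTF-8 encoding (BOM stripped) and "\n" line endings.
with open("original.csv", "r", encoding="utf-8-sig", newline="") as src:
    text = src.read().replace("\r\n", "\n").replace("\r", "\n")

with open("normalized.csv", "w", encoding="utf-8", newline="\n") as dst:
    dst.write(text)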