When I try to create a model with the "decision-tree" example dataset, the error below is generated. The WSO2 Machine Learner version is 1.2.2.
[2017-01-11 18:21:02,284] ERROR {org.wso2.carbon.ml.core.impl.MLModelHandler} - Failed to build the model [id] 9
org.wso2.carbon.ml.core.exceptions.MLModelBuilderException: An error occurred while building logistic regression model: For input string: ",06"
at org.wso2.carbon.ml.core.spark.algorithms.SupervisedSparkModelBuilder.buildLogisticRegressionModel(SupervisedSparkModelBuilder.java:322)
Any suggestions?
Thanks,
Emanuele
Resolved - the problem was caused by the Italian operating system locale, which uses a comma as the decimal separator instead of a dot. I resolved it by changing the locale settings.
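For anyone hitting the same thing, here is a minimal Python sketch (not WSO2 code, just an illustration of the root cause) showing how a comma-decimal locale produces strings like ",06" that a dot-decimal parser then rejects; the locale name is an assumption and must be installed on your system:

    import locale

    # Hypothetical Italian locale name; adjust to one installed on your machine.
    locale.setlocale(locale.LC_NUMERIC, "it_IT.UTF-8")
    formatted = locale.format_string("%.2f", 0.06)
    print(formatted)        # "0,06" - comma used as the decimal separator

    # A parser that expects a dot then rejects the string, much like the
    # "For input string: ',06'" error above.
    try:
        float(formatted)    # float() always expects a dot, regardless of locale
    except ValueError as e:
        print("parse error:", e)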
I have a Jupyter notebook in SageMaker in which I want to run the XGBoost algorithm. The data has to match 3 criteria:
-No header row
-Outcome variable in the first column, features in the rest of the columns
-All columns need to be numeric
The error I get is the following:
Error for Training job xgboost-2019-03-13-16-21-25-000:
Failed Reason: ClientError: Blankspace and colon not found in firstline
'0.0,0.0,99.0,314.07,1.0,0.0,0.0,0.0,0.48027846,0.0...' of file 'train.csv'
In the error itself it can be seen that there are no headers, the outcome is the first column (it only takes 1.0 and 0.0 values), and all features are numeric. The data is stored in its own bucket.
I have seen a related question on GitHub, but there is no solution there. Also, Amazon's example notebook does not change the default separator or anything else when saving a dataframe to CSV for later use.
The error message indicates that XGBoost expected the input dataset in libsvm format instead of CSV. SageMaker XGBoost assumes by default that the input dataset is in libsvm format. To use a CSV input dataset, explicitly specify the content type as text/csv.
For more information: https://docs.aws.amazon.com/sagemaker/latest/dg/xgboost.html#InputOutput-XGBoost
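A minimal sketch of specifying the content type with the SageMaker Python SDK (v2 names); the role ARN, bucket, paths, and instance type are placeholders, not values from the question:

    import sagemaker
    from sagemaker.estimator import Estimator
    from sagemaker.inputs import TrainingInput

    session = sagemaker.Session()
    role = "arn:aws:iam::123456789012:role/MySageMakerRole"           # hypothetical role
    image_uri = sagemaker.image_uris.retrieve(
        "xgboost", session.boto_region_name, version="1.0-1")         # built-in XGBoost image

    estimator = Estimator(
        image_uri=image_uri,
        role=role,
        instance_count=1,
        instance_type="ml.m5.xlarge",
        output_path="s3://my-bucket/xgboost/output",                  # hypothetical bucket
        sagemaker_session=session,
    )
    estimator.set_hyperparameters(objective="binary:logistic", num_round=100)

    # The key part: declare the training channel as text/csv so the
    # container does not assume libsvm when it reads train.csv.
    train_input = TrainingInput(
        "s3://my-bucket/xgboost/train.csv", content_type="text/csv")  # hypothetical path
    estimator.fit({"train": train_input})

On the older v1 SDK (which the 2019-era example notebooks used), the equivalent input wrapper is sagemaker.session.s3_input with the same content_type='text/csv' argument.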
One of my source table (Oracle) date columns has the value 5/3/2013 6:00:51.134000000 AM. I am trying to pull the same value into my target (Oracle), but my target converts the microseconds to zeros and loads the value 5/3/2013 06.00.51.000000000 AM. Both my source and target columns are declared as timestamp. I have set the date format to MM/DD/YYYY HH24:MI:SS.US in the session properties.
Can anyone help me get the date with microseconds? I am using Informatica 10.2.0. Thanks.
You can try the workaround suggested at the link below to process high-precision dates. You will need to modify the source and target definition field lengths to (29,9).
Link
The way to resolve this is to increase the source or target definition to precision 29 and scale 9 after the source/target is imported into Informatica. This preserves the sub-second digits instead of converting them all to zeros.
How do I resolve this error: Read: Data overflow/conversion error for [some field]? I am getting this error after running the mapping in Informatica Data Quality 9.1.0.
Please try the steps below:
1) Check the columns that may have date values in them. If the datatypes are not compatible in any of the transformations, this error can occur.
2) Always debug or run the Data Viewer on each transformation before you run the IDQ mapping. It will give you an overview of the data and any issues.
I'm currently doing a bulk load from Greenplum to SAS. Initially there was one field with a backslash "\" at the end of the column, which caused an error during loading. To resolve it I changed the format from TEXT to CSV and it worked fine. But when loading more data I encountered this error:
gpfdist error - line too long in file
I've been searching but couldn't determine whether the cause is the max_length setting used when starting the gpfdist service. I also saw that there is a 1 MB limit on Windows? I would greatly appreciate your help.
By the way, here is some additional info which might help:
-Greenplum version: 4.2.1.0 build 3
-Gpfdist installed in Windows along with SAS Applications
-Script submitted to Greenplum based on SAS Logs:
CREATE EXTERNAL TABLE ( ) LOCATION ('gpfdist://:8081/fileout.dat')
FORMAT 'CSV' ( DELIMITER '|' NULL '\N') ENCODING 'LATIN1'
Thanks!
"Line too long" sorts of errors usually indicate that you've got extra delimiters buried in VARCHAR/TEXT columns that throw the parsing of the file off.
Another possibility is that you've got hidden control characters, extra linebreaks or other nasty stuff hidden in your file that again is throwing your formatting off. Gpfdist can handle a lot of different data errors and keep going, but extra delimeters throws it for a loop.
Scan your load file looking for extra pipe characters in a line.
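A rough Python sketch of such a scan; the file name, expected column count, and length threshold are assumptions you would adjust for your own layout:

    # Flag lines whose pipe count or byte length looks wrong before handing
    # the file to gpfdist.
    EXPECTED_DELIMS = 10          # number of '|' per good line (columns - 1), adjust as needed
    MAX_LINE_BYTES = 1_000_000    # threshold for "too long"; match your gpfdist row-length setting

    with open("fileout.dat", "rb") as f:
        for lineno, raw in enumerate(f, start=1):
            if len(raw) > MAX_LINE_BYTES:
                print(f"line {lineno}: too long ({len(raw)} bytes)")
            pipes = raw.count(b"|")
            if pipes != EXPECTED_DELIMS:
                print(f"line {lineno}: {pipes} pipe delimiters instead of {EXPECTED_DELIMS}")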
Another option would be to re-export your data, picking a different delimiter.
Please try an alternate solution: select the input format as Text and the client encoding as ISO_8859_5 in the session and see if that helps. In my case it worked.
I'm working on a project that uses the TransposeTupleToBag UDF from LinkedIn's DataFu UDF collection, found here: https://github.com/linkedin/datafu/tree/master/src/java/datafu/pig/util. I execute the following commands in the Grunt shell:
REGISTER jar-file;
DEFINE Transpose datafu.pig.util.TransposeTupleToBag();
a = load 'file' using PigStorage(',') as (schema);
b = foreach a generate select_columns_from_schema;
c = foreach b generate col1, col2, Transpose(col3, col4...coln);
When I execute the last line, I get this error:
[main] ERROR org.apache.pig.tools.grunt.Grunt - ERROR 1200: Instance name is null.
This should not happen unless UDFContextSignature was not set.
What am I doing wrong? How can I avoid it? I have not changed any of their code, and I'm only using TransposeTupleToBag, FieldNotFound, and AliasableEvalFunc, as those were the classes required to run Transpose successfully. I even tried the same thing with all classes loaded and it still gave me the same error. What's going on? Please help. Thanks!
TransposeTupleToBag requires a feature in Pig 0.11 where setUDFContextSignature is called. This is used to distinguish each invocation of the UDF. This method doesn't exist in Pig 0.10.
It turns out LinkedIn's DataFu is tested against Pig 0.11.1 and nothing else. I was running Pig 0.10, so it wouldn't work, since there is some property that is probably not being set in Pig 0.10 but presumably is in Pig 0.11.1.