I have a dataset with fields targeted and opens, and I need to add a calculated field, opens per targeted, which essentially means doing a simple division of those two values.
My calculated field is as follows:
{opens}/{targeted}
but when I display a simple table with the values, they are completely incorrect.
If I try any other operator, like + or *, the calculations are correct.
I'm completely out of ideas on how to debug this. I've simplified the dataset down to just the targeted and opens columns; it can't get any simpler.
I had the same problem and fixed it by wrapping the columns with the sum() function, like this:
sum({opens})/sum({targeted})
I think you need to make AWS understand that you are working with floating-point numbers:
1.0*{opens}/{targeted}
If that still doesn't work, also try
(1.0*{opens})/({targeted}*1.0)
It should give you the desired output (not tested, so let me know if it doesn't work).
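For what it's worth, the symptom here (only / misbehaving while + and * stay correct) is the classic signature of integer division, where the quotient of two whole numbers is truncated toward zero. I haven't verified that this is exactly what QuickSight does internally, so treat that as an assumption, but the same effect is easy to reproduce in C++, which is why forcing one operand to a float fixes it:

#include <iostream>

int main() {
    int opens = 3, targeted = 10;

    // Integer division truncates toward zero: 3 / 10 == 0, not 0.3.
    std::cout << opens / targeted << '\n';        // prints 0

    // Promoting an operand to floating point gives the expected ratio,
    // which is what multiplying by 1.0 does in the calculated field.
    std::cout << 1.0 * opens / targeted << '\n';  // prints 0.3
}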
One of my source table (Oracle) date columns has the value 5/3/2013 6:00:51.134000000 AM. I am trying to pull the same value into my target (Oracle), but the target converts the microseconds to zeros and loads the value 5/3/2013 06.00.51.000000000 AM. Both my source and target columns are declared as timestamp. I have set the date format to MM/DD/YYYY HH24:MI:SS.US in the session properties.
Can anyone help me get the date with the microseconds? I am using Informatica 10.2.0. Thanks.
You can try the workaround suggested at the link below to process high-precision dates. You will need to modify the source and target definition field lengths to (29,9).
Link
The way to resolve this is to increase the precision of the source or target definition to precision 29 and scale 9 after the source/target is imported into Informatica. This preserves the fractional-second digits instead of converting them all to zeros.
I am new to Weka. I have a CSV dataset with 5000 samples; here are 20 of them. When I load this dataset into Weka it looks OK, but when I run the kNN algorithm it gives a result it is not supposed to give. Here is the sample data:
a,b,c,d
74,85,123,1
73,84,122,1
72,83,121,1
70,81,119,1
70,81,119,1
69,80,118,1
70,81,119,1
70,81,119,1
76,87,125,1
76,87,125,1
82,92,146,2
74,86,140,2
68,80,134,2
64,76,130,2
64,75,132,2
83,96,152,2
72,85,141,2
71,83,141,2
69,81,139,2
65,79,137,2
Here is the result:
=== Cross-validation ===
=== Summary ===
Correlation coefficient 0.6148
Mean absolute error 0.2442
Root mean squared error 0.4004
Relative absolute error 50.2313 %
Root relative squared error 81.2078 %
Total Number of Instances 5000
It is supposed to give this kind of result instead:
Correctly classified instances: 69 92%
Incorrectly classified instances: 6 8%
What could the problem be? What am I missing? I tried all the other algorithms and they give the same kind of output. The sample Weka datasets all work as expected.
The IBk algorithm can be used for regression (predicting the value of a numeric response for each instance) as well as for classification (predicting which class each instance belongs to).
It looks like all the values of the class attribute in your dataset (column d in your CSV) are numbers. When you load this data into Weka, Weka therefore guesses that this attribute should be treated as a numeric one, not a nominal one. You can tell this has happened because the histogram for that attribute in the Preprocess tab shows a continuous numeric distribution instead of discrete bars coloured by class.
The result you're seeing when you run IBk is the result of a regression fit (predicting a numeric value of column d for each instance) instead of a classification (selecting the most likely nominal value of column d for each instance).
To get the result you want, you need to tell Weka to treat this attribute as nominal. When you load the CSV file in the Preprocess tab, check Invoke options dialog in the file dialog window. Then when you click Open, you'll get the loader's options window.
The field nominalAttributes is where you can give Weka a list of which attributes are nominal ones even if they look numeric. Entering 4 here will specify that the fourth attribute (column) in the input is a nominal attribute. Now IBk should behave as you expect.
You could also do this by applying the NumericToNominal unsupervised attribute filter to the already loaded data, again specifying attribute 4; otherwise the filter will apply to all the attributes.
The ARFF format used for the Weka sample datasets includes a specification of which attributes are which type. After you've imported (or filtered) your dataset as above, you can save it as ARFF and you'll then be able to reload it without having to go through the same process.
I am using KNIME to run the Weka node AttributeSelectedClassifier.
But I keep getting an exception claiming that my attribute is nominal and has duplicate values.
But it is numeric, and duplicate values are entirely expected in this dataset!
AttributeSelectedClassifier - How to deal with error "A nominal attribute (likes) cannot have duplicate labels ('(0.045455-0.045455]')"
I found similar topics to this one, but none of them covers how to choose the scalar to scale the values with.
1st question: I would be happy if someone could explain why this behavior occurs. I mean, why are duplicate values bad?
Anyway,
One of the threads on a similar topic recommended scaling the values by a large enough number (a scalar).
Based on that, I multiplied the values by 10^6 and got an error about this value: 27027.027027-27027.027027
I multiplied by 10^7 and then got an error about this value: 270270.27027-270270.27027
When I multiplied by 10^8, it succeeded.
2nd question: What is the right way to deal with this, and how can I programmatically choose the scalar to scale with?
The full error:
ERROR AttributeSelectedClassifier - Execute failed: IllegalArgumentException in Weka during training. Please verify your settings. A nominal attribute (Meanlikes) cannot have duplicate labels ('(0.045455-0.045455]').
I am running an optimisation of two sets of data against each other and am after some assistance with looking up the settings of a run based on the calculated results. I'll explain...
I run two data lines against each other (think graph lines) - Line A and Line B. These lines have crossing points, upward and downward, depending on the direction of each line: e.g. Line A going up while Line B is going down is an 'upward cross', and Line A going down while Line B is going up is a 'downward cross'. The program performs financial analysis.
I analyze the crossing points and derive a 'Rank' from the analysis based on a set of rules. The rank is a single integer.
Line A has a number of settings for the optimisation run, e.g. Window 1 ranging from 10 to 20 and Window 2 ranging from 30 to 40. Line B also has its own settings.
When I run the optimisation I iterate through the parameters available for each line and calculate the rank. The result of the optimisation run is a list of ranks whose size is the number of available permutations.
So my question is this:
What is the best way to look up the line settings from the calculated rank using a position (index) in the rank list? The optimisation settings used to create the run will be stored for that rank run and can be used for the look-up.
I will also be adding additional line parameters to the system in the future, so I want the program to accommodate future line settings without affecting any rank files created before the new parameter was added.
In addition, I want to be able to find an index based on a particular setting included in the optimisation run (the reverse of the previous look-up).
I want to avoid versioning for backward compatibility if at all possible, so that the lookup algorithm is self-sufficient.
Is a hash table suitable for this purpose or do you have any implementation techniques that would fit better? Do you have any examples of this type of operation in action in C++?
Thanks,
Chris.
If I understand correctly, you have a bunch of associated data (settings + rank), on which you would like to be able to perform lookups with different key types. If so, then Boost.MultiIndex sounds like what you're looking for.
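To give a rough idea, here is a minimal sketch of the two look-up directions over a single container. The RankRun struct and its window1/window2 fields are hypothetical placeholders for your actual run settings, not names taken from your code:

#include <boost/multi_index_container.hpp>
#include <boost/multi_index/ordered_index.hpp>
#include <boost/multi_index/member.hpp>
#include <iostream>

// Hypothetical record: one optimisation run, its settings, and its resulting rank.
struct RankRun {
    int index;    // position of the run in the rank list
    int rank;     // calculated rank
    int window1;  // example Line A setting
    int window2;  // example Line A setting
};

namespace bmi = boost::multi_index;

// One container, two indices: by run index (unique) and by rank (non-unique),
// so you can go from index -> settings and from rank -> matching runs.
using RankRunSet = bmi::multi_index_container<
    RankRun,
    bmi::indexed_by<
        bmi::ordered_unique<bmi::member<RankRun, int, &RankRun::index>>,
        bmi::ordered_non_unique<bmi::member<RankRun, int, &RankRun::rank>>
    >
>;

int main() {
    RankRunSet runs;
    runs.insert(RankRun{0, 7, 10, 30});
    runs.insert(RankRun{1, 3, 11, 30});
    runs.insert(RankRun{2, 7, 12, 31});

    // Forward look-up: the settings of the run at index 2.
    const auto& byIndex = runs.get<0>();
    auto it = byIndex.find(2);
    if (it != byIndex.end())
        std::cout << "run 2: window1=" << it->window1
                  << " window2=" << it->window2 << '\n';

    // Reverse look-up: every run that produced rank 7.
    const auto& byRank = runs.get<1>();
    for (auto range = byRank.equal_range(7); range.first != range.second; ++range.first)
        std::cout << "rank 7 at index " << range.first->index << '\n';
}

Each index behaves like a separate ordered map over the same underlying records, so if you add a new setting later you only extend the struct (and, if you want to search on it, add another entry to indexed_by) rather than maintaining parallel containers.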