Add a new attribute with a default value in RapidMiner

I am very new to the tool "RapidMiner". What I want to know is how to add a new attribute with a default value to a data set in RapidMiner. I tried using "Generate Attributes", but how do I set a default value for the new attribute? Do I have to use "Generate Empty Attribute"?
There are two data sets in my process, and one of them has an additional attribute called "other". I want to get the union of both sets. Do I have to use the Append operator?
Thanks in advance.

The Generate Attributes operator is the right one to create new attributes. The value of the new attribute for each example can be generated from other attributes in the same example as well as from constant values (which is probably what you mean by default value) that you enter directly or from the values of macros. There are also functions that can be used.
The Join operator or possibly Union is likely to be the one you need to create a new example set with attributes from both inputs. The Append operator is used to add more examples whilst keeping the attributes the same.
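To make the difference between the two operators concrete, here is a plain-Java sketch (not RapidMiner code) that models an example set as a list of rows keyed by attribute name; the class and method names are hypothetical. Append stacks the rows of two sets that share the same attributes, while an inner join on a shared key attribute widens each matching row with the other set's attributes.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical in-memory stand-in for a RapidMiner example set:
// each Map is one example (row), keyed by attribute name.
class ExampleSetOps {

    // Append-like: stacks the examples of b under those of a.
    // Assumes both inputs already have the same attributes.
    static List<Map<String, Object>> append(List<Map<String, Object>> a,
                                            List<Map<String, Object>> b) {
        List<Map<String, Object>> out = new ArrayList<>(a);
        out.addAll(b);
        return out;
    }

    // Join-like (inner join on a key attribute): merges the attributes
    // of matching examples from a and b into one wider example.
    static List<Map<String, Object>> join(List<Map<String, Object>> a,
                                          List<Map<String, Object>> b,
                                          String key) {
        List<Map<String, Object>> out = new ArrayList<>();
        for (Map<String, Object> left : a) {
            for (Map<String, Object> right : b) {
                Object k = left.get(key);
                if (k != null && k.equals(right.get(key))) {
                    Map<String, Object> merged = new LinkedHashMap<>(left);
                    merged.putAll(right);
                    out.add(merged);
                }
            }
        }
        return out;
    }

    public static void main(String[] args) {
        Map<String, Object> rowA = new LinkedHashMap<>();
        rowA.put("id", 1);
        rowA.put("name", "x");
        Map<String, Object> rowB = new LinkedHashMap<>();
        rowB.put("id", 1);
        rowB.put("other", 5);

        System.out.println(append(List.of(rowA), List.of(rowB)).size()); // 2 examples
        System.out.println(join(List.of(rowA), List.of(rowB), "id"));    // [{id=1, name=x, other=5}]
    }
}
```

Append only makes sense once both sets have the same attributes, which is why the other thread below first adds the missing attribute and then appends.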

Related

Weka GUI: add attribute is-missing-value

I have a couple of attributes with missing values.
This is a survey, so the fact that the person refused to answer is, by itself, useful information!
I would like to create a new attribute called is-missing-value that is 1 if a given value in an attribute is missing and 0 otherwise.
Things I have tried:
I have tried using AddExpression, but this seems to only perform arithmetic operations such as 2*attribute.
I know that MathExpression allows if-else expressions, such as ifelse(A < 3.0, 1, 0)... Do you know if/how I can test whether a value is NaN?
MakeIndicator (or NominalToBinary) should be able to do what I want, but I think I need (i) to convert my missing values to a nominal value so that (ii) I can then convert this new nominal value to binary. The problem is that ReplaceMissingValues only fills in the mode or mean; I need to be able to define a new value. One solution could be to edit the data directly, but I'd rather avoid this.
Please notice that I need to do this using the Weka GUI, not the Java interface.
I think I have a solution for you:
1. Copy the attribute (if you want to keep the original one): apply the Copy filter (this and the following filters are all in the unsupervised/attribute folder) with the index of the attribute.
2. Convert the attribute to nominal using the NumericToNominal filter (set the attribute index).
3. Fill the missing values with a new value using ReplaceMissingWithUserConstant. Here you need to specify the nominalStringReplacementValue parameter (e.g. "missing") in addition to the index of your attribute.
4. Apply the NominalToBinary filter to your attribute. This creates several new attributes (as many as there are distinct values in the dataset, including the new missing value). You can remove the attributes you don't need and keep only the "missing" one.
Hope it helps.
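On the NaN question: Weka represents a missing numeric value internally as Double.NaN (weka.core.Utils.missingValue() returns it), and because NaN never equals itself, an ordinary comparison cannot detect it. The plain-Java sketch below has no Weka dependency and uses a hypothetical class name; it only shows the 0/1 mapping that the filter chain above is meant to produce for one attribute.

```java
import java.util.Arrays;

// Sketch of the is-missing indicator for one attribute, assuming the values
// are held as doubles with missing values encoded as Double.NaN (as Weka does).
class MissingIndicator {

    // Returns 1.0 where the value is missing (NaN) and 0.0 otherwise.
    static double[] isMissing(double[] values) {
        double[] flags = new double[values.length];
        for (int i = 0; i < values.length; i++) {
            // NaN != NaN, so Double.isNaN is the reliable test
            flags[i] = Double.isNaN(values[i]) ? 1.0 : 0.0;
        }
        return flags;
    }

    public static void main(String[] args) {
        double[] answers = {3.0, Double.NaN, 5.0};
        System.out.println(Arrays.toString(isMissing(answers))); // [0.0, 1.0, 0.0]
    }
}
```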

Updating ElasticSearch mappings field type with existing data

I'm storing a few fields; for the sake of simplicity, let's call the field in question 'age'. Initially, ES created the index for me, and it ended up choosing the wrong field type for 'age': it's a string type right now instead of a numeric type. I'm aware that I should have defined the mappings myself to begin with and forced the values being sent to be consistently all strings or all numbers.
What I have right now is an index with a ton of data that uses a 'string' type for age, with values such as 1, 10, 'na', etc.
Now my question is: if I were to change the mapping from string to integer, would indexing have any issues with existing values such as 'na' when they are updated?
I just wanted to ask first, before I start creating a playground environment to test with a sample data set.
According to the docs, this is what you can update in an existing mapping:
new properties can be added to Object datatype fields.
new multi-fields can be added to existing fields.
doc_values can be disabled, but not enabled.
the ignore_above parameter can be updated.
Otherwise, I am afraid you will have to create a new index with the new mapping and reindex your data; see this post for an example.
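On the 'na' question: as far as I know, an integer field coerces numeric strings such as "10" by default (the coerce mapping parameter), but a value like 'na' is rejected at index time unless the field sets ignore_malformed. A common alternative is to clean the value while reindexing into the new index. The hypothetical helper below sketches that cleaning step in Java: it keeps values that parse as integers and turns everything else into null, so the field is simply left empty for 'na'.

```java
// Hypothetical per-document conversion to apply while reindexing 'age'
// into a new index whose mapping declares it as an integer.
class AgeConverter {

    // Returns the parsed integer, or null for values such as "na" that
    // cannot be represented in an integer field.
    static Integer toIntegerAge(String raw) {
        if (raw == null) {
            return null;
        }
        try {
            return Integer.valueOf(raw.trim());
        } catch (NumberFormatException e) {
            return null; // e.g. "na": leave the field empty instead
        }
    }

    public static void main(String[] args) {
        for (String raw : new String[] {"1", "10", "na"}) {
            System.out.println(raw + " -> " + toIntegerAge(raw)); // na -> null
        }
    }
}
```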

How to create a new attribute with a default value in RapidMiner?

I am new to the RapidMiner tool. There are two data sets in my process. I want to create a process that does the following:
The process should use the Generate Attributes, Append, and type conversion operators in RapidMiner.
The first data set has a car name attribute, whereas the second data set has a name attribute; name should be renamed to car name.
The second data set has an additional other attribute which is not present in the first data set. Update the first data set to add an additional other attribute, with a default value of 1. This attribute should also have a type of Integer.
Append the modified second data set to the modified first data set
Export the new data to a new Excel spreadsheet.
I found the solution; I hope it helps others.
Use the process flow below:
http://i.stack.imgur.com/omfDe.png
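For readers who want the data flow spelled out, here is a plain-Java sketch (not RapidMiner code, and not necessarily what the screenshot shows) of the steps above: rename name to car name in the second set, add other = 1 as an Integer to the first set, then append. Example sets are modeled as lists of rows keyed by attribute name; all names are hypothetical, and the Excel export step is left out.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Plain-Java sketch of the task (not RapidMiner code): rename "name" to
// "car name", add "other" = 1 (Integer) to the first set, then append.
class CarPipeline {

    // Renaming step: rename one attribute in every example, keeping order.
    static List<Map<String, Object>> rename(List<Map<String, Object>> rows,
                                            String from, String to) {
        List<Map<String, Object>> out = new ArrayList<>();
        for (Map<String, Object> row : rows) {
            Map<String, Object> copy = new LinkedHashMap<>();
            for (Map.Entry<String, Object> e : row.entrySet()) {
                copy.put(e.getKey().equals(from) ? to : e.getKey(), e.getValue());
            }
            out.add(copy);
        }
        return out;
    }

    // Generate Attributes step: add a constant-valued attribute to every example.
    static List<Map<String, Object>> withConstant(List<Map<String, Object>> rows,
                                                  String attribute, Integer value) {
        List<Map<String, Object>> out = new ArrayList<>();
        for (Map<String, Object> row : rows) {
            Map<String, Object> copy = new LinkedHashMap<>(row);
            copy.put(attribute, value);
            out.add(copy);
        }
        return out;
    }

    public static void main(String[] args) {
        Map<String, Object> first = new LinkedHashMap<>();
        first.put("car name", "civic");
        Map<String, Object> second = new LinkedHashMap<>();
        second.put("name", "golf");
        second.put("other", 7);

        // Append step: modified second set goes after the modified first set.
        List<Map<String, Object>> all = new ArrayList<>(withConstant(List.of(first), "other", 1));
        all.addAll(rename(List.of(second), "name", "car name"));
        System.out.println(all); // [{car name=civic, other=1}, {car name=golf, other=7}]
    }
}
```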

Set Mapping variable in Expression and use it in Source Filter

I have two tables in different databases. Table A holds the data; table B holds the information for the incremental load of the data from table A. I want to read from table B and store the date of the last successful load of table A in a mapping variable $$LOAD_DATE. To achieve this, I read a date from table B and use the SETVARIABLE() function in an expression to set the $$LOAD_DATE variable. The port in which I do this is marked as output and writes into a dummy flat file. I read only one row from this source!
Then I use this $$LOAD_DATE variable in the Source Filter of the Source Qualifier for table A to load only new records, i.e. those newer than the date stored in the $$LOAD_DATE variable.
My problem is that I am not able to set the $$LOAD_DATE variable correctly. It is always 1753-1-1-00.00.00, which is the default value for mapping variables of type date/time.
How do I solve this? How can I store a date in that variable and use it later in a Source Qualifier's source filter? Is it even possible?
EDIT: Table A has too many records to read them all and filter them later. That would be too expensive, so they have to be filtered at the source filter level.
Yes, it's possible.
In the first mapping, you have to initialize the variable, like this:
In the first session's configuration, you have to define the Post-session on success variable assignment:
The second mapping (with your table A) then receives the variable through the Pre-session variable assignment in its session configuration:
It will work.
It is not possible to set a mapping variable and use its value elsewhere in the same run, because the variable is actually set when the session completes.
If you really want to implement this with mapping variables, you have to create two mappings: one that sets the mapping variable and another that does the actual incremental load. You can pass a mapping variable value from one session to another in a workflow using a workflow variable. https://stackoverflow.com/a/26849639/2626813
Another solution could be a lookup on B followed by a filter.
You can also write a script that queries table B and updates the parameter file with the latest $$LOAD_DATE value before the mapping is executed.
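A minimal sketch of that script idea, written in Java and assuming the load date has already been read from table B. It writes one parameter file section in the common [folder.WF:workflow.ST:session] layout followed by a $$LOAD_DATE line. The folder, workflow, and session names are placeholders, and the date pattern is an assumption that must match whatever the mapping expects for a date/time variable; check the section layout against your PowerCenter version.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.time.LocalDateTime;
import java.time.format.DateTimeFormatter;

// Hypothetical pre-session script: writes the last successful load date
// (already read from table B) into an Informatica parameter file.
class ParamFileWriter {

    // Assumed date pattern; it must match the format the mapping expects.
    static final DateTimeFormatter FORMAT =
            DateTimeFormatter.ofPattern("MM/dd/yyyy HH:mm:ss");

    // Builds the parameter file text for one session section.
    // Folder, workflow and session names are placeholders.
    static String render(String folder, String workflow, String session,
                         LocalDateTime lastLoad) {
        return "[" + folder + ".WF:" + workflow + ".ST:" + session + "]\n"
                + "$$LOAD_DATE=" + FORMAT.format(lastLoad) + "\n";
    }

    public static void main(String[] args) throws IOException {
        // In practice this value would come from a query against table B.
        LocalDateTime lastLoad = LocalDateTime.of(2015, 1, 31, 23, 0, 0);
        String text = render("MyFolder", "wf_incremental", "s_load_table_a", lastLoad);
        // Writes to a temporary file here; a real script would overwrite
        // the parameter file that the session is configured to use.
        Path file = Files.createTempFile("load_date", ".param");
        Files.writeString(file, text, StandardCharsets.UTF_8);
        System.out.print(text);
    }
}
```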
Since the tables are in two different databases, use two sessions: get the values in the first one and pass them as parameters to the second one.

JasperReports: Passing in a list of lists as a datasource

I need to populate a few subreports with lists of different objects. Basically, let's say I have the following:
Subreport on used Vehicles
Subreport on new Vehicles
I created a Vehicle bean class with String fields and getter and setter methods for them. Then, in my data source, I pass in a List<List<String>> as detailRows. detailRows contains one list for new vehicles and one for used vehicles. So, let's say I pass detailRows in the data source.
The question is: how do I pass these two lists to the two subreports? Can I use
new net.sf.jasperreports.engine.data.JRBeanCollectionDataSource($F{newVehiclesList}) as a datasource for sub report 1 and
new net.sf.jasperreports.engine.data.JRBeanCollectionDataSource($F{usedVehiclesList}) as datasource for sub report 2?
Is there anything else that needs to be done apart from what I mentioned? Do I need to create and pass any variables? Is the list of lists used appropriately as above, or should it be $F{detailRows}.get(0)?
I created a field detailRows of type List in the main report. I then pass the following as the subreport data source expression: new net.sf.jasperreports.engine.data.JRBeanCollectionDataSource($F{detailRows})
Is there any way I can pass the newVehiclesList from detailRows to the subreport?
Thanks!
Select your subreport, set the property "Connection type" to "Use a data source expression", and in the property "Data Source Expression" set this:
new net.sf.jasperreports.engine.data.JRBeanCollectionDataSource($F{yourFieldHere})
where "yourFieldHere" is a list (don't forget to set the "Field Class" in your field properties to java.util.List as well).
OK, then you need to create two fields with the Field Class java.util.List, one for each list (newVehiclesList and usedVehiclesList).
Put your two subreports wherever you want, then click on each one and do the following:
Change the "Connection type" to "Use a datasource expression" then change the "Data Source Expression" to new net.sf.jasperreports.engine.data.JRBeanCollectionDataSource($F{yourField})
Done.
P.S.: To use the fields inside newVehiclesList and usedVehiclesList, you have to create them inside their own subreports.
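Instead of a List<List<String>> where the two lists are only reachable by index, one option is to give the main report a bean whose getters expose the two lists by name. A minimal sketch with hypothetical class names Vehicle and VehicleReportBean: a JRBeanCollectionDataSource over a collection of such beans resolves $F{newVehiclesList} and $F{usedVehiclesList} through the matching getters, so each subreport can wrap its own list as described above. The classes are package-private only so the snippet fits in one file; bean property lookup generally needs public classes, so in a real project make each one a public class in its own file.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical row bean for the subreports; exposed there as $F{name}.
class Vehicle {
    private final String name;

    Vehicle(String name) {
        this.name = name;
    }

    public String getName() {
        return name;
    }
}

// Hypothetical bean for the main report: one getter per list, so the main
// report can declare fields newVehiclesList and usedVehiclesList (java.util.List).
class VehicleReportBean {
    private final List<Vehicle> newVehiclesList = new ArrayList<>();
    private final List<Vehicle> usedVehiclesList = new ArrayList<>();

    public List<Vehicle> getNewVehiclesList() {
        return newVehiclesList;
    }

    public List<Vehicle> getUsedVehiclesList() {
        return usedVehiclesList;
    }

    public static void main(String[] args) {
        VehicleReportBean bean = new VehicleReportBean();
        bean.getNewVehiclesList().add(new Vehicle("new sedan"));
        bean.getUsedVehiclesList().add(new Vehicle("used hatchback"));
        // The main report would be filled from a collection containing this
        // bean; each subreport then wraps one list in its own data source.
        System.out.println(bean.getNewVehiclesList().size() + " new, "
                + bean.getUsedVehiclesList().size() + " used"); // 1 new, 1 used
    }
}
```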
I had the same problem and solved it using Jasper's List element. I passed the data source from my Java class, for example:
parameter.put("MyList", new JRBeanCollectionDataSource(ListObjects));
In the JRXML:
In the Jasper palette, choose the List element and drag and drop it into your report.
Then choose:
Create new dataset
Create new dataset from a connection…
In Data Adapter, choose New Data Adapter - Collection of JavaBeans
Use a JRDatasource expression
Go to the list of parameters and choose your list of objects (MyList)
Now go to the Jasper outline:
- Dataset properties
- Edit query and filter...
- JavaBean
- Search for your class (I use Eclipse, so it's easy to find my class)
- Add the fields to use