How to create a new attribute with a default value in RapidMiner? - data-mining

I am new to the RapidMiner tool. There are two data sets in my process. The process should use the Generate Attributes, Append and type conversion operators in RapidMiner, and it should do the following:
The first data set has a car name attribute, whereas the second data set has a name attribute. name should be renamed to car name.
The second data set has an additional attribute, other, which is not present in the first data set. Update the first data set to add the other attribute with a default value of 1. This attribute should also have the type Integer.
Append the modified second data set to the modified first data set.
Export the new data to a new Excel spreadsheet.

I found the solution; I hope it helps others.
Use the process flow below:
http://i.stack.imgur.com/omfDe.png
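To sanity-check the result of the process outside RapidMiner, the same steps can be sketched in plain Python. This is not RapidMiner code: the sample rows, the mpg attribute and the helper names are hypothetical, and the Excel export step is left out.

```python
def rename_attribute(rows, old, new):
    """Return copies of the rows with the attribute `old` renamed to `new`."""
    return [{(new if key == old else key): value for key, value in row.items()}
            for row in rows]


def add_default_attribute(rows, name, default):
    """Return copies of the rows with a new attribute set to a constant value."""
    return [{**row, name: default} for row in rows]


def append(first, second):
    """Stack two example sets; both must expose the same attributes."""
    if first and second and set(first[0]) != set(second[0]):
        raise ValueError("attribute sets differ; rename or generate attributes first")
    return first + second


# Hypothetical sample data standing in for the two data sets.
first = [{"car name": "ford pinto", "mpg": 21.0}]
second = [{"name": "vw rabbit", "mpg": 29.0, "other": 3}]

second_renamed = rename_attribute(second, "name", "car name")
first_extended = add_default_attribute(first, "other", 1)  # integer default of 1
combined = append(first_extended, second_renamed)
print(combined)
```

The check inside append mirrors why the rename and the new attribute are needed before appending: the two sets must have matching attributes.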

Related

Updating ElasticSearch mappings field type with existing data

I'm storing a few fields, and for the sake of simplicity let's call the field in question 'age'. Initially ES created the index for me and it ended up choosing the wrong field type for 'age'. It's a string type right now instead of a numeric type. I'm aware that I should have defined the mappings myself to begin with and forced the values being sent to be consistently all strings or all numeric values.
What I have right now is an index with a ton of data that uses a 'string' type for age with values such as: 1, 10, 'na', etc.
Now my question is: if I were to change the mapping from string to integer, would indexing have any issues with existing data values such as 'na' when they are updated?
I just wanted to ask first before I start creating a playground environment to test with a sample data set.
What you can update according to the doc:
new properties can be added to Object datatype fields.
new multi-fields can be added to existing fields.
doc_values can be disabled, but not enabled.
the ignore_above parameter can be updated.
Otherwise, I am afraid you will have to create a new mapping and reindex your data; see this post for example.
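As a rough sketch of the reindex route, the snippet below builds the two request bodies and checks which of the sample values would not fit an integer field. It assumes a recent Elasticsearch version with typeless mappings; the index names people_v1 and people_v2 are hypothetical, and the Python int() check is only an approximation of how Elasticsearch parses integers. The ignore_malformed setting is optional: with it, a value such as 'na' is skipped for that field instead of causing the document to be rejected.

```python
import json


def parses_as_integer(value):
    """Rough approximation of whether a value could be stored in an integer field."""
    try:
        int(str(value).strip())
        return True
    except ValueError:
        return False


# Sample values taken from the question.
existing_values = ["1", "10", "na"]
rejected = [v for v in existing_values if not parses_as_integer(v)]

# Body for creating the new index (PUT /people_v2), hypothetical index name.
new_index_body = {
    "mappings": {
        "properties": {
            "age": {"type": "integer", "ignore_malformed": True}
        }
    }
}

# Body for copying the documents across (POST /_reindex).
reindex_body = {
    "source": {"index": "people_v1"},
    "dest": {"index": "people_v2"},
}

print("values that will not parse as integer:", rejected)
print(json.dumps(new_index_body, indent=2))
print(json.dumps(reindex_body, indent=2))
```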

Set Mapping variable in Expression and use it in Source Filter

I have two tables in different databases. Table A holds the data; table B holds the information for the incremental load of the data from table A. I want to read from table B and store the date of the last successful load of table A in a mapping variable $$LOAD_DATE. To achieve this, I read a date from table B and use the SETVARIABLE() function in an expression to set the $$LOAD_DATE variable. The port in which I do this is marked as output and writes into a dummy flat file. I only read one row of this source!
Then I use this $$LOAD_DATE variable in the Source Filter of the Source Qualifier of table A to only load new records which are younger than the date stored in the $$LOAD_DATE variable.
My problem is that I am not able to set the $$LOAD_DATE variable correctly. It is always the date 1753-1-1-00.00.00, which is the default value for mapping variables of the type date/time.
How do I solve this? How can I store a date in that variable and use it later in a Source Qualifier's source filter? Is it even possible?
EDIT: Table A has too many records to read them all and filter them later. This would be too expensive, so they have to be filtered at source filter level.
Yes, it's possible.
In the first map you have to initialize the variable, like this:
In the first session's configuration you have to define the Post-session on success variable assignment:
The second map (with your table A) will then get the variable through the Pre-session variable assignment in its session configuration:
It will work.
It is not possible to set a mapping variable and use its value somewhere else in the same run, because the variable is actually set when the session completes.
If you really want to implement it using mapping variables you have to create two mappings, one for setting the mapping variable and another for actual incremental load. You can pass a mapping variable value from one session to another in a workflow using a workflow variable. https://stackoverflow.com/a/26849639/2626813
Other solutions could be to use a lookup on B and a filter after that.
You can also write some scripts to query table B and modify the parameter file with the latest $$LOAD_DATE value prior to executing the mapping.
Since there are two different databases, use two sessions. Get the values in the first one and pass them as parameters to the second one.
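The parameter-file approach suggested above could look roughly like this in Python. It is a minimal sketch: the section header, the workflow and session names, and the date format are all assumptions and must match your own parameter file, and the query against table B is not shown.

```python
import re


def set_parameter(text, name, value):
    """Return the parameter-file text with `name=value` set.

    An existing `name=...` line is replaced; otherwise the line is appended.
    """
    line = f"{name}={value}"
    pattern = re.compile(rf"^{re.escape(name)}=.*$", re.MULTILINE)
    if pattern.search(text):
        # A callable replacement avoids any special handling of backslashes.
        return pattern.sub(lambda _match: line, text)
    return text.rstrip("\n") + "\n" + line + "\n"


# Hypothetical parameter file content; folder, workflow and session names are made up.
original = (
    "[MyFolder.WF:wf_incremental.ST:s_load_table_a]\n"
    "$$LOAD_DATE=1753-01-01 00:00:00\n"
)

# In a real script this value would come from a query against table B.
updated = set_parameter(original, "$$LOAD_DATE", "2016-03-01 00:00:00")
print(updated)
```

Run such a script as a step before the session starts, so that the Source Qualifier filter sees the new value.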

How do I rename a SharePoint file to include a date using Nintex

I'm trying to use two workflows to archive any files when they are created or updated. The first simply moves a copy to a separate document library; no issues there.
The second should rename the file once it arrives, appending a date (and possibly a timestamp) to the end of the file name so that it is a unique record.
I am trying to set a variable called Archive_Name and then set the field value to Archive_Name before committing the change.
I am using this formula to set the variable:
Name-fn-FormatDate(Current Date,yyyy-MM-dd)
Both Name and Current Date are recognised variables.
When I run this, the Name stays the same and no date is appended. If I run it as
fn-FormatDate(Current Date,yyyy-MM-dd)
the Name changes to my desired date, proving that the formula is working, the text is being assigned to the Archive_Name variable and the variable is being applied to the field value.
What am I doing wrong?
I believe you need to concatenate the two values. The & operator can be used in place of the CONCATENATE function. Thus: Name&fn-FormatDate(Current Date,yyyy-MM-dd)
Hope this helps
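For comparison, the same concatenation written in Python (not Nintex syntax) looks like the sketch below. The function name and the hyphen separator are assumptions for illustration; the formula above joins the two values with no separator.

```python
from datetime import date


def archive_name(name, current_date, separator="-"):
    """Join the file name and the formatted date into one string."""
    # "%Y-%m-%d" is the strftime counterpart of the yyyy-MM-dd pattern.
    return name + separator + current_date.strftime("%Y-%m-%d")


print(archive_name("Report", date(2016, 3, 1)))
```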

Add new attribute with default value in RapidMiner

I am very new to the RapidMiner tool. What I want to know is how to add a new attribute with a default value to one data set in RapidMiner. I tried using "Generate Attributes", but how do I set a default value for the new attribute? Do I have to use "Generate Empty Attribute"?
There are two data sets in my process and one of them has an additional attribute called "other". I want to get the union of both sets. Do I have to use the Append operator?
Thanks in advance.
The Generate Attributes operator is the right one to create new attributes. The value of the new attribute for each example can be generated from other attributes in the same example as well as from constant values (which is probably what you mean by default value) that you enter directly or from the values of macros. There are also functions that can be used.
The Join operator or possibly Union is likely to be the one you need to create a new example set with attributes from both inputs. The Append operator is used to add more examples whilst keeping the attributes the same.
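The difference between appending and joining can be illustrated with a small plain-Python sketch. This is not RapidMiner code; the helper names, the id attribute and the sample rows are hypothetical.

```python
def append(first, second):
    """Add more examples (rows); the attributes stay the same."""
    return first + second


def join(left, right, key):
    """Combine attributes (columns) of rows that share the same key value."""
    right_by_key = {row[key]: row for row in right}
    return [{**row, **right_by_key[row[key]]}
            for row in left if row[key] in right_by_key]


# Hypothetical example sets.
cars_a = [{"id": 1, "car name": "ford pinto"}]
cars_b = [{"id": 2, "car name": "vw rabbit"}]
extras = [{"id": 1, "other": 5}]

print(append(cars_a, cars_b))      # two rows, same attributes
print(join(cars_a, extras, "id"))  # one row, attributes from both inputs
```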

In RapidMiner, once I import a data set, how do I change the type of a column?

I've imported a data set into RapidMiner 5, and one of the columns that was supposed to be nominal or polynominal was set as numeric. My data set has over 500 attributes, so I don't really want to have to reimport my data every time I realize I've made a mistake like this. Is there some way to either automate the import process so that it saves the column types I set each time, or can I go back and edit the attribute types of my already imported data set?
Add this operator to your process, after you load the data:
Data Transformation > Type Conversion > Numerical to Polynominal
On the operator, select:
attribute filter type = single
attribute = [name of your attribute]
here you go: http://i.stack.imgur.com/ov5yn.png
Select "Numerical to Polynominal".
Then change "attribute filter type" to 'subset' and select the attributes that you want to change.
One more suggestion: store this output in your local repository so you don't need to repeat the conversion every time you need the data. That way you will have both the original and the converted copy. :)
Happy Data Mining...
Apply the Set Role operator:
It's listed under Operators -> Data Transformation -> Name and Role Modification -> Set Role