In RapidMiner, once I import a data set, how do I change the type of a column?

I've imported a dataset into RapidMiner 5, and one of the columns that was supposed to be nominal or polynominal was set as numeric. My data set has over 500 attributes, so I don't want to have to reimport my data every time I realize I've made a mistake like this. Is there some way to either automate the import process so that it saves the column types I set each time, or can I go back and edit the attribute types of my already imported data set?

Add this operator to your process, after you load the data:
Data Transformation > Type Conversion > Numerical to Polynominal
On the operator, set:
attribute filter type = single
attribute = [name of your attribute]
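
For comparison outside RapidMiner, here is the same numeric-to-nominal conversion sketched in Python/pandas (the column name is a placeholder):

import pandas as pd

df = pd.DataFrame({"mycol": [1, 2, 2, 3]})                # imported as numeric
df["mycol"] = df["mycol"].astype(str).astype("category")  # now nominal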

Here you go: http://i.stack.imgur.com/ov5yn.png
Select "Numerical to Polynominal".
Then change "attribute filter type" to 'subset' and select the attributes that you want to change.
One more suggestion: you'd better store this output in your local repository so you don't need the conversion every time you need the data. That way you will have both the original and the converted data set in your basket. :)
Happy Data Mining...

Apply the 'Set Role' operator:
It's listed under Operators -> Data Transformation -> Name and Role Modification -> Set Role

Related

ColumnsOfType with type record not returning any columns

I've got a table full of different data types, including records, and I want to extract all the column names of record columns to then use in an expand function. I've included a screenshot of a column containing records; however, when I use this = Table.ColumnsOfType(#"Expanded fields", {type record}), it returns an empty list.
I've tried looking through the entire column to see if there was anything different, but it's all record types. Any help, please.
EDIT:
Error using Table.TransformColumnTypes
Record is not a valid type to search for. And judging by your image, your type is Type.Any, as denoted by the ABC123 icon.
Your best bet is to unpivot all the columns (perhaps those starting with a certain prefix), then expand the new Value column like so:
#"PriorStepNameHere" = .... ,
ExpandList= List.Distinct(List.Combine(List.Transform(Table.Column(#"PriorStepNameHere", "Value"), each if _ is record then Record.FieldNames(_) else {}))),
Expand= Table.ExpandRecordColumn(#"PriorStepNameHere", "Value", ExpandList,ExpandList)
It sounds like the Table.ColumnsOfType function is not properly identifying the columns in your table that contain records. One possible reason is that the column's data type is not set to 'record'. Another possible reason is that the data in the columns is not structured properly, so it is not being identified as records. You can try to use the Table.TransformColumnTypes function to convert the column's data type and see if that resolves the issue.
If the issue still persists, please share the sample data and the code you are using.
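
For readers outside Power Query, the same expand logic can be sketched in pandas, treating dicts as records (column and field names are placeholders):

import pandas as pd

df = pd.DataFrame({"Value": [{"a": 1}, {"a": 2, "b": 3}, None]})

# collect the distinct field names across every record in the Value column
fields = sorted({k for v in df["Value"] if isinstance(v, dict) for k in v})

# expand the Value column into one column per collected field name
records = [v if isinstance(v, dict) else {} for v in df["Value"]]
expanded = df.drop(columns="Value").join(pd.DataFrame(records)[fields])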

Updating ElasticSearch mappings field type with existing data

I'm storing a few fields, and for the sake of simplicity let's call the field in question 'age'. Initially ES created the index for me, and it ended up choosing the wrong field type for 'age': it's a string type right now instead of a numeric type. I'm aware that I should have defined the mappings myself to begin with and forced the data values being sent to be consistently all strings or all numeric values.
What I have right now is an index with a ton of data that uses a 'string' type for age, with values such as: 1, 10, 'na', etc.
Now my question is: if I were to change the mapping from string to integer, would indexing have any issues with existing data values such as 'na' when documents are updated?
I just wanted to ask first before I start creating a playground environment to test with a sample data set.
What you can update, according to the docs:
new properties can be added to Object datatype fields.
new multi-fields can be added to existing fields.
doc_values can be disabled, but not enabled.
the ignore_above parameter can be updated.
Otherwise, I am afraid you will have to create a new mapping and reindex your data; see this post for an example.
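
If you go the reindex route, here is a minimal sketch with Python and the requests library, assuming Elasticsearch 7+ mapping syntax (older versions also need a document type in the mapping) and hypothetical index names people/people_v2:

import requests

ES = "http://localhost:9200"  # assumed local cluster

# 1. Create a new index whose mapping declares age as an integer.
#    ignore_malformed skips values like 'na' (the field is simply not indexed
#    for that document) instead of failing the whole indexing request.
requests.put(ES + "/people_v2", json={
    "mappings": {
        "properties": {
            "age": {"type": "integer", "ignore_malformed": True}
        }
    }
})

# 2. Copy the existing documents from the old index into the new one.
requests.post(ES + "/_reindex", json={
    "source": {"index": "people"},
    "dest": {"index": "people_v2"}
})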

How to create a new attribute with a default value in RapidMiner?

I am new to the RapidMiner tool. There are two data sets in my process. What I want to do is generate a process which does the following:
To create this process, I should use the Generate Attributes, Append, and type conversion operators in RapidMiner.
The first data set has a car name attribute, whereas the second data set has a name attribute; name should be renamed to car name.
The second data set has an additional other attribute which is not present in the first data set. Update the first data set to add an additional other attribute with a default value of 1. This attribute should also have a type of Integer.
Append the modified second data set to the modified first data set.
Export the new data to a new Excel spreadsheet.
I found the solution. Hope it will help others.
Please use the process flow below:
http://i.stack.imgur.com/omfDe.png
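
For readers who want to check the logic outside RapidMiner, the same pipeline can be sketched in Python/pandas (the file names are hypothetical; the column names are the ones from the question):

import pandas as pd

first = pd.read_excel("first.xlsx")    # placeholder file; has "car name"
second = pd.read_excel("second.xlsx")  # placeholder file; has "name" and "other"

second = second.rename(columns={"name": "car name"})      # rename name -> car name
first["other"] = 1                                        # add "other" with integer default 1
combined = pd.concat([first, second], ignore_index=True)  # append second to first
combined.to_excel("combined.xlsx", index=False)           # export to a new spreadsheet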

Set Mapping variable in Expression and use it in Source Filter

I have two tables in different databases. Table A holds the data; table B holds information for the incremental load of the data from table A. I want to read from table B and store the date of the last successful load from table A in a mapping variable $$LOAD_DATE. To achieve this, I read a date from table B and use the SETVARIABLE() function in an Expression transformation to set the $$LOAD_DATE variable. The port in which I do this is marked as output and writes into a dummy flat file. I only read one row of this source!
Then I use this $$LOAD_DATE variable in the Source Filter of the Source Qualifier of table A to only load new records which are younger than the date stored in the $$LOAD_DATE variable.
My problem is that I am not able to set the $$LOAD_DATE variable correctly. It always holds 1753-01-01 00:00:00, which is the default value for mapping variables of type date/time.
How do I solve this? How can I store a date in that variable and use it later in a Source Qualifier's source filter? Is it even possible?
EDIT: Table A has too many records to read them all and filter them later. That would be too expensive, so they have to be filtered at the source-filter level.
Yes, it's possible.
In the first mapping you have to initialize the variable, like this:
In the first session's configuration you have to define the post-session-on-success variable assignment:
The second mapping (with your table A) will get the variable after you configure the pre-session variable assignment in its session:
It will work.
It is not possible to set a mapping variable and use its value somewhere else in the same run, because the variable is actually set when the session completes.
If you really want to implement it using mapping variables, you have to create two mappings: one for setting the mapping variable and another for the actual incremental load. You can pass a mapping variable value from one session to another in a workflow using a workflow variable: https://stackoverflow.com/a/26849639/2626813
Another solution could be to use a Lookup on table B and a Filter after that.
You can also write a script that queries table B and updates the parameter file with the latest $$LOAD_DATE value prior to executing the mapping, for example:
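
A minimal sketch of that script idea in Python, assuming a pyodbc DSN for the database that holds table B and placeholder folder/workflow/session and column names (Informatica reads $$LOAD_DATE from the parameter-file section that matches the session):

import pyodbc

# hypothetical DSN, table, and column names; adjust to your environment
conn = pyodbc.connect("DSN=source_b")
load_date = conn.cursor().execute(
    "SELECT MAX(LAST_LOAD_DATE) FROM TABLE_B"
).fetchone()[0]

# write the parameter file the session is configured to read
with open("incr_load.par", "w") as f:
    f.write("[MyFolder.WF:wf_incremental.ST:s_load_table_a]\n")
    f.write("$$LOAD_DATE=%s\n" % load_date.strftime("%m/%d/%Y %H:%M:%S"))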
Since we're dealing with two different DBs, use two sessions: get the value in the first one and pass it as a parameter to the second one.

How can I query to get multiple values in SimpleDB (AWS)?

[screenshot: domain items with the deviceModel attribute highlighted]
In that picture I have colored one part: I have an attribute called "deviceModel" which contains more than one value. I want to query my domain for the items whose itemName() has more than one value for the deviceModel attribute.
Thanks,
Senthil Raja
There is no direct approach to get what you are asking; you need to handle it by writing your own piece of code. By running a SELECT query you will get the item attribute-value pairs, so you need to traverse each itemName() and count the values of your desired attribute.
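
For illustration, here is a hedged sketch of that traversal with Python and boto 2 (boto3 does not cover SimpleDB); the domain name is a placeholder, and boto returns a multi-valued attribute as a Python list:

import boto

conn = boto.connect_sdb()             # credentials come from the environment
domain = conn.get_domain("mydomain")  # hypothetical domain name

for item in domain.select('select deviceModel from `mydomain`'):
    values = item.get("deviceModel", [])
    if not isinstance(values, list):  # a single value comes back as a string
        values = [values]
    if len(values) > 1:               # keep only items with multiple values
        print(item.name, values)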
I think what you are referring to is called multi-valued attributes. When you put a value into an attribute, if you don't replace the existing attribute value, the values multiply, giving you an array of values connected to that attribute name.
How you create them will depend on the SDK/language you are using for your REST calls; however, look for the Replace=true/false flag when you set the attribute's value.
Here is the documentation page on retrieving them: http://docs.amazonwebservices.com/AmazonSimpleDB/latest/DeveloperGuide/ (look under Using Amazon SimpleDB -> Using Select to Create Amazon SimpleDB Queries -> Queries on Attributes with Multiple Values)
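
For example, with boto 2 in Python (domain and item names hypothetical), leaving replace=False makes each put add another value instead of overwriting:

import boto

conn = boto.connect_sdb()
# each call appends a value because replace=False
conn.put_attributes("mydomain", "item-001", {"deviceModel": "iPhone"}, replace=False)
conn.put_attributes("mydomain", "item-001", {"deviceModel": "iPad"}, replace=False)
# "item-001" now has deviceModel = ["iPhone", "iPad"]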