Updating an Elasticsearch mapping field type with existing data - amazon-web-services

I'm storing a few fields; for the sake of simplicity, let's call the field in question 'age'. Initially ES created the index for me, and it ended up choosing the wrong field type for 'age': it's a string type right now instead of a numeric type. I'm aware that I should have defined the mappings myself to begin with and forced the values being sent to be consistently either all strings or all numbers.
What I have right now is an index with a ton of data that uses a 'string' type for age, with values such as 1, 10, 'na', etc.
Now my question is: if I were to change the mapping from string to integer, would indexing have any issues with existing values such as 'na' when they are updated?
I just wanted to ask first before I start setting up a playground environment to test with a sample data set.

According to the documentation, these are the only changes you can make to an existing mapping:
New properties can be added to Object datatype fields.
New multi-fields can be added to existing fields.
doc_values can be disabled, but not enabled.
The ignore_above parameter can be updated.
Otherwise, I'm afraid you will have to create a new index with the correct mapping and reindex your data into it; see this post for an example.
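To the original question: as far as I know, an integer field accepts a string like "10" (numeric fields coerce strings by default), but a value like 'na' is rejected at index time unless `ignore_malformed` is enabled on that field. So if you reindex, it can help to clean the values on the way into the new index. The sketch below assumes 'na' and similar placeholders should simply become "no age"; `coerce_age` and `clean_docs` are hypothetical helpers, not part of any Elasticsearch client, and you would feed their output to whatever bulk/reindex mechanism you use.

```python
from typing import Any, Iterable, Iterator, Optional


def coerce_age(raw: Any) -> Optional[int]:
    """Convert a stored 'age' value to int, or None if it is not an integer."""
    if raw is None or isinstance(raw, bool):  # bool is a subclass of int; reject it
        return None
    if isinstance(raw, int):
        return raw
    try:
        return int(str(raw).strip())
    except ValueError:
        return None  # e.g. 'na', '', '1.5'


def clean_docs(docs: Iterable[dict]) -> Iterator[dict]:
    """Yield copies of the documents with 'age' coerced (hypothetical pre-reindex step)."""
    for doc in docs:
        cleaned = dict(doc)
        age = coerce_age(doc.get("age"))
        if age is None:
            cleaned.pop("age", None)  # drop the field rather than send 'na'
        else:
            cleaned["age"] = age
        yield cleaned


if __name__ == "__main__":
    old = [{"id": 1, "age": "1"}, {"id": 2, "age": "10"}, {"id": 3, "age": "na"}]
    for doc in clean_docs(old):
        print(doc)
```

Dropping the field (instead of sending null) is a design choice here; either way, the new index never sees 'na'.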


set `exclude_from_indexes` for Array datatype

I'm storing a list of strings using the Array datatype in Datastore (e.g. ["name1", "name2", ...]). As the list grows, I find myself unable to upsert the entry:
INVALID_ARGUMENT: Too many indexed properties
According to https://cloud.google.com/datastore/docs/concepts/entities#array, even if I set exclude_from_indexes on the property, it gets ignored. The Datastore web UI also doesn't have an Index checkbox for me to uncheck.
So the only option I came up with is to convert the array into a string, parse it into a JSON object every time I read from the DB, and write it back stringified.
I was wondering if this is the right approach, or if there are better ways to do this that I'm not aware of.
Thanks
You should set exclude_from_indexes on each value in the array. That is what "For a property to be unindexed, the exclude_from_indexes field of Value must be set to true." means: for an array property, the flag belongs on each element Value, not on the array as a whole.
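A minimal sketch of what "set it on each value" looks like, assuming you write entities through the Datastore v1 REST/JSON API, where the flag is spelled `excludeFromIndexes` in camelCase (client libraries expose the same flag under their own names). `unindexed_string_array` and the property name `names` are hypothetical.

```python
import json
from typing import Any, Dict, List


def unindexed_string_array(items: List[str]) -> Dict[str, Any]:
    """Build a v1 Value for a list of strings with every element unindexed."""
    return {
        "arrayValue": {
            "values": [
                # The flag goes on each element Value...
                {"stringValue": s, "excludeFromIndexes": True}
                for s in items
            ]
        }
        # ...and deliberately not on the outer array Value.
    }


if __name__ == "__main__":
    properties = {"names": unindexed_string_array(["name1", "name2"])}
    print(json.dumps(properties, indent=2))
```

With the elements unindexed, the array should no longer count toward the indexed-property limit, so the JSON-string workaround shouldn't be necessary.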

Weka GUI: add attribute is-missing-value

I have a couple of attributes with missing values.
This is a survey, so the fact that the person refused to answer is, by itself, useful information!
I would like to create a new attribute called is-missing-value = 1 if a given value in an attribute is a missing value and 0 otherwise.
Things I have tried:
I have tried using AddExpression, but this seems to only perform arithmetic operations such as 2*attribute.
I know that MathExpression allows using if-elses, such as ifelse(A < 3.0, 1, 0)... Do you guys know if/how I can test whether a value is NaN?
MakeIndicator (or NominalToBinary) should be able to do what I want, but I think I need (i) to convert my missing values to a nominal value, so that (ii) I can then convert this new nominal value to binary. The problem is that ReplaceMissingValues only works with the mode or mean; I need to be able to define a new value. One solution could be to edit the data directly, but I'd rather avoid this.
Please notice that I need to do this using the Weka GUI, not the Java interface.
I think I have a solution for you:
Copy the attribute (if you want the original one to remain): apply the Copy filter (this and the following filters are all under the unsupervised/attribute folder) with the index of the attribute.
Convert your attribute to nominal using the NumericToNominal filter (set the attribute index).
Fill the missing values with a new value using ReplaceMissingWithUserConstant. Here you need to specify the nominalStringReplacementValue parameter (e.g. "missing") in addition to the index of your attribute.
Apply the NominalToBinary filter to your attribute. This will create several new attributes (one for each distinct value in the dataset, plus one for the new "missing" value). You can remove the attributes you don't need and keep only the one for "missing".
Hope it helps.
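To see why this filter chain yields an is-missing indicator, here is the same sequence modelled outside Weka in plain Python. It is only an illustration: `None` stands in for Weka's "?" missing marker, and the function names are hypothetical analogues of the filters, not Weka's API.

```python
from typing import Dict, List, Optional, Sequence

MISSING = "missing"  # the nominalStringReplacementValue chosen in the steps above


def numeric_to_nominal(values: Sequence[Optional[float]]) -> List[Optional[str]]:
    """Turn numbers into string labels; None (missing) stays missing."""
    return [None if v is None else str(v) for v in values]


def replace_missing_with_constant(values: Sequence[Optional[str]],
                                  constant: str = MISSING) -> List[str]:
    """Fill missing entries with a user-chosen label."""
    return [constant if v is None else v for v in values]


def nominal_to_binary(values: Sequence[str]) -> Dict[str, List[int]]:
    """Roughly what NominalToBinary does: one 0/1 column per distinct label."""
    return {label: [int(v == label) for v in values] for label in sorted(set(values))}


def is_missing_indicator(values: Sequence[Optional[float]]) -> List[int]:
    columns = nominal_to_binary(replace_missing_with_constant(numeric_to_nominal(values)))
    # Keep only the "missing" column; if nothing was missing it is all zeros.
    # Caveat: a real value whose label is literally "missing" would collide.
    return columns.get(MISSING, [0] * len(values))


if __name__ == "__main__":
    print(is_missing_indicator([25, None, 40, None]))
```

The end result is simply 1 where the original value was missing and 0 elsewhere, which is exactly the is-missing-value attribute asked for.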

How to create a new attribute with a default value in RapidMiner?

I am new to the RapidMiner tool. There are two data sets in my process. I want to build a process that does the following:
The process should use the Generate Attribute, Append and type conversion operators in RapidMiner.
The first data set has a "car name" attribute, whereas the second data set has a "name" attribute; "name" should be renamed to "car name".
The second data set has an additional "other" attribute that is not present in the first data set. Update the first data set to add an "other" attribute with a default value of 1; this attribute should also be of type Integer.
Append the modified second data set to the modified first data set.
Export the new data to a new Excel spreadsheet.
I found the solution; I hope it helps others.
Please use the process flow below:
http://i.stack.imgur.com/omfDe.png
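The linked screenshot is the actual RapidMiner process. As a rough, tool-independent illustration of what those steps do to the data, here is a plain-Python sketch; the helpers are hypothetical analogues of the operators (not RapidMiner code), the "mpg" attribute and sample rows are invented, and the Excel export step is left out.

```python
from typing import Dict, List

Row = Dict[str, object]


def rename_attribute(rows: List[Row], old: str, new: str) -> List[Row]:
    """Rename one attribute in every row ("name" -> "car name")."""
    return [{(new if k == old else k): v for k, v in row.items()} for row in rows]


def add_attribute(rows: List[Row], name: str, default: int) -> List[Row]:
    """Add a constant-valued attribute (what Generate Attribute is used for here)."""
    return [{**row, name: default} for row in rows]


def to_integer(rows: List[Row], name: str) -> List[Row]:
    """Cast one attribute to int (the type-conversion step)."""
    return [{**row, name: int(row[name])} for row in rows]


def append(first: List[Row], second: List[Row]) -> List[Row]:
    """Stack the second example set under the first (Append)."""
    return first + second


if __name__ == "__main__":
    first = [{"car name": "civic", "mpg": 30}]
    second = [{"name": "golf", "mpg": 28, "other": "3"}]
    merged = append(add_attribute(first, "other", 1),
                    to_integer(rename_attribute(second, "name", "car name"), "other"))
    print(merged)
```

Append expects both sets to have the same attributes, which is why the rename and the new "other" attribute come before it.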

Convert String attributes to numeric values in WEKA

I am new to Weka. My data contains a column of student names. I want to convert these names to numeric values across the whole column.
E.g., suppose there are 10 names: abcd, cdef, xyz, etc. I want to preprocess the data so that each name maps to a distinct numeric value, e.g. abcd becomes 1, cdef becomes 2, etc.
Also, two or more rows can have the same name; in that case, the same name should get the same value.
Please help me...
Weka supports 4 non-relational attribute types: nominal, numeric, string and date. You can find out more about them in the Weka Manual (it can be found in the same folder where you downloaded Weka), chapter "The ARFF Header Section".
You should find out what is the type of the "student's name" attribute (probably string, but could be nominal), and decide what should be the type of the attribute with converted values (numeric, nominal, or string).
There can be 2 scenarios:
(1) If types of the existing and desired attributes are the same (string-string or nominal-nominal, i.e. you only want to change values, not attribute type), you could do so
(a) manually - open the data file in Weka Explorer, and click Edit... button, or
(b) write a small program using Weka's Attribute class functions value and setValue.
(2) Types are different - Weka attribute types cannot be converted, so you will have to create and insert a new attribute with the converted values, and delete the old attribute. An example of how to create a new attribute can be found at
http://weka.wikispaces.com/Programmatic+Use#Step.
As far as I understand, strictly converting names into a "numeric" type doesn't seem like the best approach within the context of Weka: Weka treats numeric attributes differently from "string" or "nominal" attributes (for example, certain "attribute selection" algorithms cannot use "numeric" attributes; they need to be "discretized" or converted into nominal form).
So, for your case, I think you can convert your "string" names into the "nominal" type using the StringToNominal class (a Weka "filter" that converts a given "string" attribute into an attribute of type "nominal"). This also takes care of repeated names: the list of nominal values generated by the filter contains each name only once, no matter how many times it appears.
"Nominal" attributes also have the advantage that implicitly, they do have a numeric representation (the index of the value within the set of values; similar to how the "enums" in Java have a numeric index). So, you can utilize that as the "numeric" information corresponding to the names (though as I said earlier, it's probably best to just use it as "nominal" attribute; really depends on your particular use case).
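The idea of using the index of each distinct value as its numeric code can be sketched in a few lines of Python. This is an illustration of the encoding, not Weka code: `encode_labels` is a hypothetical helper, codes follow order of first appearance (the label order Weka assigns may differ), and `start=1` gives the 1-based numbering from the question, while Weka's own nominal indices are 0-based.

```python
from typing import Dict, List, Sequence, Tuple


def encode_labels(names: Sequence[str], start: int = 0) -> Tuple[List[int], List[str]]:
    """Map each distinct name to an integer code; repeated names share a code."""
    index: Dict[str, int] = {}
    codes: List[int] = []
    for name in names:
        if name not in index:
            index[name] = start + len(index)  # next unused code
        codes.append(index[name])
    return codes, list(index)  # dicts keep insertion order, so labels match codes


if __name__ == "__main__":
    codes, labels = encode_labels(["abcd", "cdef", "abcd", "xyz"], start=1)
    print(codes, labels)
```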
I had the same problem as the one mentioned in the question, and I could "address" it in the following way.
I first applied the StringToNominal filter as mentioned above (don't forget to change the attribute range from "last" to "first-last"). Once that was done, I saved the dataset in LibSVM format, which converts the nominal values to numeric ones.
Then, if you close Weka, open it again and load the saved file, you will have the same dataset with the same number of features, but they will be numeric. A few more changes are needed: first, normalize all the numeric values in the dataset using the Normalize filter; after that, apply the NumericToNominal filter to the last attribute.
Then, you will have a similar dataset with numeric values.
Hope this helps.

SharePoint UserData and the ;# Syntax in returned data

Can a SharePoint expert explain to me the ;# in data returned by the GetListItems() call to the Lists web service?
I think I understand what they are doing here. The ;# is almost like a syntax for making a comment... or better yet, including the actual data (string) and not just the ID. This way you can use either, but they are nicely paired together in the same column.
Am I way off base? I just can't figure out the slightly different uses. For example,
I have a list with:
ows_Author
658;#Tyndall, Bruno
*in this case the 658 seems to be an ID for me in a users table somewhere*
ows_CreatedDate (note: a custom field, not ows_Created)
571;#2009-08-31 23:41:58
*in this case the 571 seems to be an ID of the row I'm already in. Why the repetition?*
Can anyone out there shed some light on this aspect of SharePoint?
The string ;# is used as a delimiter by SharePoint's lookup fields, including user fields. When working with the object model, you can use SPFieldLookupValue and SPFieldUserValue to convert the delimited string into a strongly-typed object. When working with the web services, however, I believe you'll need to parse the string yourself.
You are correct that the first part is an integer ID: ID in the site user list, or ID of the corresponding item in the lookup list. The second part is the user name or value of the lookup column.
Nicolas correctly notes that this delimiter is also used for other composite field values, including...
SPFieldLookupValueCollection
SPFieldMultiColumnValue
SPFieldMultiChoiceValue
SPFieldUserValueCollection
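For the web-services case, a minimal parsing sketch in Python (hypothetical helpers, not a SharePoint API). It assumes the ID;#value layout described above and values that do not themselves contain ";#"; some of the composite types listed above may use a different layout, so this only covers lookup/user-style ID;#value pairs.

```python
from typing import List, Tuple

DELIM = ";#"


def parse_lookup(raw: str) -> Tuple[int, str]:
    """Split a single lookup/user value like '658;#Tyndall, Bruno'."""
    lookup_id, _, value = raw.partition(DELIM)
    return int(lookup_id), value  # int() raises ValueError if there is no numeric ID


def parse_lookup_multi(raw: str) -> List[Tuple[int, str]]:
    """Split a multi-value string like '1;#a;#2;#b' into (ID, value) pairs."""
    parts = raw.split(DELIM)
    if len(parts) % 2 != 0:
        raise ValueError(f"expected ID;#value pairs, got {raw!r}")
    return [(int(parts[i]), parts[i + 1]) for i in range(0, len(parts), 2)]


if __name__ == "__main__":
    print(parse_lookup("658;#Tyndall, Bruno"))
    print(parse_lookup("571;#2009-08-31 23:41:58"))
    print(parse_lookup_multi("1;#a;#2;#b"))
```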
SPFieldUser inherits from SPFieldLookup, which uses the ;# notation. You can easily parse the value by creating a new instance of the SPFieldLookupValue class:
using Microsoft.SharePoint; // namespace containing SPFieldLookupValue
string rawValue = "1;#value";
SPFieldLookupValue lookupValue = new SPFieldLookupValue(rawValue);
int id = lookupValue.LookupId;          // returns 1
string value = lookupValue.LookupValue; // returns "value"