I am new to Weka. My data contains a column of student names. I want to convert these names to numeric values across the whole column.
E.g., suppose there are 10 names: abcd, cdef, xyz, etc. I want to preprocess the data so that each name corresponds to a distinct numeric value, e.g. abcd changes to 1, cdef changes to 2, etc.
Also, two or more rows can have the same name. In that case, the same name should get the same value.
Please help me...
Weka supports 4 non-relational attribute types: nominal, numeric, string and date. You can find out more about them in the Weka Manual (it can be found in the same folder where you downloaded Weka), chapter "The ARFF Header Section".
You should find out the type of the "student's name" attribute (probably string, but it could be nominal), and decide what the type of the attribute with the converted values should be (numeric, nominal, or string).
There can be 2 scenarios:
(1) If types of the existing and desired attributes are the same (string-string or nominal-nominal, i.e. you only want to change values, not attribute type), you could do so
(a) manually - open the data file in Weka Explorer, and click Edit... button, or
(b) write a small program using Weka's Attribute class functions value and setValue.
(2) Types are different - Weka attribute types cannot be converted, so you will have to create and insert a new attribute with the converted values, and delete the old attribute. An example of how to create a new attribute can be found at
http://weka.wikispaces.com/Programmatic+Use#Step.
As far as I understand, strictly converting names into a "numeric" type doesn't seem like the best approach, within the context of WEKA - WEKA will treat numeric attributes differently than it does "string" or "nominal" attributes (for example, for running certain "attribute selection" algorithms, you can not use "numeric" types - they need to be "discretized" or converted into nominal form).
So, for your case, I think you can convert your "string" names into the "nominal" type using the StringToNominal class (this class acts as a WEKA "filter" that converts a given "string" attribute into an attribute of type "nominal"). This also takes care of repeated names: the list of "nominal" values generated by the filter contains each name only once, no matter how many times it appears in the data.
"Nominal" attributes also have the advantage that implicitly, they do have a numeric representation (the index of the value within the set of values; similar to how the "enums" in Java have a numeric index). So, you can utilize that as the "numeric" information corresponding to the names (though as I said earlier, it's probably best to just use it as "nominal" attribute; really depends on your particular use case).
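To make the "index within the set of values" idea concrete, here is a minimal plain-Python sketch. It is not Weka code; the function name encode_names is made up for illustration. Codes start at 1 to match the example in the question, whereas Weka's own nominal indices are zero-based.

```python
def encode_names(names):
    """Map each distinct name to an integer code, in order of first appearance."""
    codes = {}      # name -> code, one entry per distinct name
    encoded = []    # one code per input row
    for name in names:
        if name not in codes:
            # First time this name is seen: assign the next free code (1-based).
            codes[name] = len(codes) + 1
        encoded.append(codes[name])
    return encoded, codes


if __name__ == "__main__":
    encoded, codes = encode_names(["abcd", "cdef", "xyz", "abcd"])
    print(encoded)
    print(codes)
```

Repeated names reuse the code assigned on first appearance, which is the behaviour the question asks for.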
I had the same problem as the one mentioned in the question, and I could "address" it in the following way.
I first applied the StringToNominal filter as mentioned before (don't forget to change the attribute range from "last" to "first-last"). Once that was done, I saved the dataset in LibSVM format, which changes the nominal values to numeric ones.
Then, if you close Weka and open the saved file again, you will have the same dataset with the same number of features, but they will be numeric. Some further changes are needed at this point: first, normalize all the numeric values in the dataset using the Normalize filter; after that, apply the NumericToNominal filter to the last attribute.
You will then have a similar dataset with numeric values.
Hope this helps.
I've got a table full of different data types, including records, and I want to extract the names of all columns that contain records so I can use them in an expand function. I've included a screenshot of a column containing records. However, when I use = Table.ColumnsOfType(#"Expanded fields", {type record}), it returns an empty list.
I've tried looking through the entire column to see if there was anything different, but it's all record types. Any help, please.
EDIT:
Error using Table.TransformColumnTypes
Record is not a valid type to search for. And judging by your image, your type is Type.Any as denoted by the ABC123
Your best bet is to unpivot all the columns (perhaps those starting with a certain prefix) and then expand the new Value column like so:
#"PriorStepNameHere" = .... ,
ExpandList = List.Distinct(List.Combine(List.Transform(Table.Column(#"PriorStepNameHere", "Value"), each if _ is record then Record.FieldNames(_) else {}))),
Expand = Table.ExpandRecordColumn(#"PriorStepNameHere", "Value", ExpandList, ExpandList)
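For readers less familiar with M, the logic of the ExpandList step can be sketched in plain Python, with dicts standing in for records. This is only an illustration of the idea (collect the field names of every record in the column, skip non-records, drop duplicates); the function name distinct_field_names is hypothetical.

```python
def distinct_field_names(values):
    """Collect the distinct keys of all dict ("record") values in a column."""
    names = []
    seen = set()
    for value in values:
        # Non-record cells (numbers, text, None) contribute no field names.
        if not isinstance(value, dict):
            continue
        for key in value:
            if key not in seen:
                seen.add(key)
                names.append(key)
    return names


if __name__ == "__main__":
    column = [{"id": 1, "name": "a"}, 42, {"name": "b", "email": "b@x"}, None]
    print(distinct_field_names(column))
```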
It sounds like the Table.ColumnsOfType function is not properly identifying the columns in your table that contain records. One possible reason for this is that the column's data type is not set to 'record'. Another possible reason is that the data in the columns is not structured properly and hence is not being identified as a record. You can try using the Table.TransformColumnTypes function to convert the column's data type to 'record' and see if that resolves the issue.
If the issue still persists, please share the sample data and the code you are using.
I have a column with dates (in a string format) in Dataprep: yyyymmdd. I would like it to become a datetime object. Which function/transformation should I apply to achieve this result automatically?
In this case, you actually don't need to apply a transformation at all—you can just change column type to Date/Time and select the appropriate format options.
Note: This is one of the least intuitive parts of Dataprep as you have to select an incorrect format (in this case yy-mm-dd) before you can drill-down to the correct format (yyyymmdd).
Here's a screenshot of the Date / Time type window to illustrate this:
While it's unintuitive, this will correctly treat the column as a date in future operations, including assigning the correct type in export operations (e.g. BigQuery).
Through the UI, this will generate the following Wrangle Script:
settype col: YourDateCol customType: 'Datetime','yy-mm-dd','yyyymmdd' type: custom
According to the documentation, this should also work (and is more succinct):
settype col: YourDateCol type: 'Datetime','yy-mm-dd','yyyymmdd'
Note that if you absolutely needed to do this in a function context, you could extract the date parts using SUBSTRING/LEFT/RIGHT and pass them to the DATE or DATETIME function to construct a datetime object. As you've probably already found, DATEFORMAT will return NULL if the source column isn't already of type Datetime.
(From a performance standpoint, though, it would probably be far more efficient for a large dataset to either just change the type or create a new column with the correct type, rather than perform those extra operations on so many rows.)
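The "extract the date parts and build a datetime" idea can be sketched in plain Python. This is not Wrangle code; it only illustrates the substring approach for a yyyymmdd string, and the function name parse_yyyymmdd is made up for the example.

```python
from datetime import datetime


def parse_yyyymmdd(text):
    """Build a datetime from a 'yyyymmdd' string, or return None if it is invalid."""
    if text is None or len(text) != 8:
        return None
    try:
        # Slice out year, month and day, mirroring SUBSTRING/LEFT/RIGHT.
        year = int(text[:4])
        month = int(text[4:6])
        day = int(text[6:])
        return datetime(year, month, day)
    except ValueError:
        # Non-numeric parts or impossible dates (e.g. month 13) end up here.
        return None


if __name__ == "__main__":
    print(parse_yyyymmdd("20180315"))
    print(parse_yyyymmdd("20181340"))
```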
I have a couple of attributes with missing values.
This is a survey, so the fact that the person refused to answer is, by itself, useful information!
I would like to create a new attribute called is-missing-value = 1 if a given value in an attribute is a missing value and 0 otherwise.
Things I have tried:
I have tried using AddExpression, but this seems to only perform arithmetic operations such as 2*attribute.
I know that MathExpression allows using if-elses, such as ifelse(A < 3.0, 1, 0)... Do you guys know if/how I can test if a value is nan?
MakeIndicator (or NominalToBinary) should be able to do what I want, but I think I need (i) to convert my missing values to a nominal value, so that (ii) I can then convert this new nominal value to binary. The problem is that ReplaceMissingValues only fills in the mode or mean; I need to be able to define a new value. One solution could be to edit the data directly, but I'd rather avoid this.
Please notice that I need to do this using the Weka GUI, not the Java interface.
I think I have a solution for you:
(1) Copy the attribute (if you want the original one to remain): apply the Copy filter with the index of the attribute. This and the following filters are all under the unsupervised/attribute folder.
(2) Convert the attribute to nominal using the NumericToNominal filter (set the attribute index).
(3) Fill the missing values with a new value using ReplaceMissingWithUserConstant. Here you need to specify the nominalStringReplacementValue parameter (e.g. "missing") in addition to the index of your attribute.
(4) Apply the NominalToBinary filter to the attribute. This will create several new attributes (one per unique value in the dataset, plus one for the missing value). You can remove the attributes you don't need and keep only the one for the missing value.
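What the filter chain above produces is a 0/1 indicator column. As a sanity check outside the Weka GUI, the same result can be sketched in plain Python. The assumption here is that a missing value shows up as None, NaN, or the ARFF "?" marker; the function names is_missing and missing_indicator are made up for the example.

```python
import math


def is_missing(value):
    """Return True for None, NaN, or the ARFF missing-value marker '?'."""
    if value is None:
        return True
    if isinstance(value, float) and math.isnan(value):
        return True
    return value == "?"


def missing_indicator(column):
    """Return 1 where the value is missing and 0 otherwise."""
    return [1 if is_missing(v) else 0 for v in column]


if __name__ == "__main__":
    answers = [3.0, "?", 5.0, None, float("nan")]
    print(missing_indicator(answers))
```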
Hope it helped.
I'm storing a few fields, and for the sake of simplicity let's call the field in question 'age'. Initially ES created the index for me and it ended up choosing the wrong field type for 'age'. It's a string type right now instead of a numeric type. I'm aware that I should have defined the mappings myself to begin with and forced the data values being sent to be consistently all strings or all numeric values.
What I have right now is an index with a ton of data that uses a 'string' type for age, with values such as: 1, 10, 'na', etc.
Now my question is: if I were to change the mapping from string to integer, would indexing have any issues with existing data values such as 'na' when they are updated?
I just wanted to ask first before I start creating a playground environment to test with a sample data set.
What you can update according to the doc:
new properties can be added to Object datatype fields.
new multi-fields can be added to existing fields.
doc_values can be disabled, but not enabled.
the ignore_above parameter can be updated.
Otherwise, I am afraid you will have to create a new mapping and reindex your data; see this post for an example.
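When reindexing into an integer field, a value such as 'na' would most likely be rejected, so it is worth deciding up front how to convert it. The sketch below is plain Python and does not call any Elasticsearch API; it only shows one possible cleaning rule (non-numeric values become None) that could be applied to each document before it is sent to the new index. The function name to_int_or_none is made up for the example.

```python
def to_int_or_none(raw):
    """Convert a stored 'age' value to int, or None when it is not numeric."""
    try:
        return int(str(raw).strip())
    except ValueError:
        # Placeholders such as 'na' cannot be parsed as integers.
        return None


if __name__ == "__main__":
    stored = ["1", "10", "na", 7]
    print([to_int_or_none(v) for v in stored])
```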
I am trying to display field titles above the appropriate columns in a name value list in Sitecore.
e.g. So instead of this
The name value list would look like this
Is there an easy method of achieving this apart from writing a custom control?
There is no out-of-the-box support for applying a label to the values in a name value list, as @jammykam mentioned.
Since what you are storing would not typically be handled as key/value data, the name value list type might not be the best fit for what you are doing - think what you would have to do if you needed to add extra information e.g. title. I would suggest creating a simple template for 'person details' and then add 'people' items as sub-items of your existing item.
Seems like you want to give the authors a hint regarding the input fields and the best way to do that is using the "Short Description" field in "Help" section of the template under Standard Values. You can possibly enter something like as a hint.
A less optimal option would be to set up standard values for that field so the authors always have a value that suggests the type of input value for key and value.