Extracting full Attribute Name from Weka PCA - weka

I am writing some code to analyse the UCI mushroom data set using Weka. I am trying to get the values (i.e. coefficients) of the attributes in each principal component, but the transformed attribute name is truncated (indicated by the "..."), so I cannot get the full set of coefficients.
e.g.
@attribute -0.251a=e+0.242m=k+0.241n=k-0.224t=p+0.213f=f... numeric
Any help would be greatly appreciated.

I believe your attribute names are being truncated because of an option in the PCA filter.
-A
Maximum number of attributes to include in
transformed attribute names.
(-1 = include all, default: 5)
Using the following code I change the value of this option to -1 and print an attribute name from the transformed data.
import weka.core.Instances;
import weka.filters.Filter;
import weka.filters.unsupervised.attribute.PrincipalComponents;

Instances originalTrain = ...; // load the training data
PrincipalComponents pca = new PrincipalComponents(); // new PCA filter
pca.setMaximumAttributeNames(-1); // -1 = include all attributes in the transformed names
pca.setInputFormat(originalTrain); // inform the filter about the dataset
Instances newData = Filter.useFilter(originalTrain, pca); // apply the filter
System.out.println(newData.attribute(0).name()); // look at the new name
An example of the resulting untruncated attribute name:
0.257stalksurfacebelowring=k+0.256stalksurfaceabovering=k+0.234ringtype=l+0.231odor=f-0.215ringtype=p-0.212stalksurfaceabovering=s+0.206sporeprintcolor=h-0.195stalksurfacebelowring=s+0.185bruises+0.18 stalkroot=b-0.176stalkcolorbelowring=w-0.175stalkcolorabovering=w-0.173odor=n-0.139sporeprintcolor=n-0.134sporeprintcolor=k+0.133habitat=p+0.133gillcolor=b+0.13 stalkcolorbelowring=b+0.13 stalkcolorabovering=b+0.129population=v+0.128stalkcolorabovering=n-0.125population=s-0.124stalkroot=e+0.121stalkcolorbelowring=n-0.119capcolor=w+0.119stalkcolorbelowring=p+0.119stalkcolorabovering=p-0.11gillspacing-0.105stalkroot=c-0.101gillcolor=n+0.094sporeprintcolor=w-0.087capshape=b-0.085gillcolor=k-0.082odor=l-0.082odor=a-0.082habitat=m+0.08 capcolor=y-0.08gillcolor=w+0.078gillcolor=h-0.076population=n-0.073habitat=g-0.072gillsize+0.068odor=y+0.068odor=s-0.067population=a-0.065capsurface=s-0.064odor=p+0.063gillcolor=g-0.059stalksurfaceabovering=f+0.057capsurface=y-0.057ringnumber=t-0.057stalksurfacebelowring=f+0.055ringnumber=o+0.051population=y-0.05habitat=u-0.048stalkcolorabovering=o-0.048stalkcolorbelowring=o+0.047veilcolor=w-0.046population=c+0.046capshape=k+0.046ringtype=e-0.046gillattachment-0.045stalkcolorabovering=g-0.045stalkcolorbelowring=g+0.043capcolor=e-0.041stalkroot=r-0.039gillcolor=u+0.039capcolor=g+0.034habitat=l-0.034veilcolor=n-0.034veilcolor=o-0.033habitat=w-0.031capcolor=p-0.031odor=c-0.031stalksurfacebelowring=y-0.031sporeprintcolor=r+0.03 
capshape=f-0.029capcolor=n-0.028gillcolor=o-0.024stalkshape-0.024sporeprintcolor=o-0.024sporeprintcolor=y-0.024sporeprintcolor=b-0.024gillcolor=y-0.023gillcolor=e-0.023capcolor=b-0.023stalkcolorabovering=e-0.023stalkcolorbelowring=e-0.019gillcolor=r-0.018capshape=s-0.018sporeprintcolor=u-0.015capshape=x+0.012habitat=d+0.009gillcolor=p-0.006capsurface=g+0.005capsurface=f-0.004capshape=c+0.003stalkcolorbelowring=y-0.003stalkcolorabovering=y-0.003veilcolor=y+0.001stalksurfaceabovering=y+0.001capcolor=u+0.001capcolor=r-0.001capcolor=c+0 stalkcolorabovering=c+0 odor=m+0 ringtype=n+0 stalkcolorbelowring=c+0 ringnumber=n+0 ringtype=f
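Once the full name is available, the coefficients can be pulled out of it as text. The following is only a minimal sketch in Python, not part of Weka: the function name `parse_component_name` and the regular expression are hypothetical, and they assume the layout seen in the example above (an optional sign, a number, an optional space, then a label that contains no "+" or "-"). Attribute names that themselves contain hyphens would need a different pattern.

```python
import re

# Assumed term layout, taken from the example name above: an optional sign,
# a number, an optional space, then a label that contains no '+' or '-'.
TERM = re.compile(r"([+-]?\d+(?:\.\d+)?)\s*([^\s\d+\-][^+\-]*)")


def parse_component_name(name):
    """Split a transformed attribute name into (coefficient, label) pairs."""
    return [(float(coef), label.strip()) for coef, label in TERM.findall(name)]


# A few terms copied from the untruncated name shown above.
excerpt = (
    "0.257stalksurfacebelowring=k+0.256stalksurfaceabovering=k"
    "+0.185bruises+0.18 stalkroot=b-0.176stalkcolorbelowring=w+0 odor=m"
)

for coef, label in parse_component_name(excerpt):
    print(f"{coef:+.3f}  {label}")
```

The name returned by `newData.attribute(0).name()` could be passed to the same function after being written to a file or printed from Java.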

Related

How to make a pivot table using xlwings?

Could you let me know how to make a pivot table in Excel using xlwings? Please provide some sample code.
I have tried to find out how to do it, but I couldn't find anything.
Thanks for your help.
Creating a PivotTable with xlwings is not currently straightforward, and requires the use of .api to access the VBA functions.
The example below creates a PivotTable from my mock data, held in a Table called Table1 with 3 columns of data headed "Colour", "Type" and "Data".
import xlwings as xw

# assumption: the workbook is already open in Excel and Table1 is on Sheet1
wb = xw.books.active
ws = wb.sheets["Sheet1"]

# set the kwarg values for creating PivotTable
source_type = xw.constants.PivotTableSourceType.xlDatabase
source_data = ws["Table1[#All]"].api  # pass the .api object, not an xlwings Range or a string
table_destination = ws["A20"].api  # pass the .api object, not an xlwings Range or a string
table_name = "PTable1"

# create PivotTable
wb.api.PivotCaches().Create(
    SourceType=source_type,
    SourceData=source_data,
).CreatePivotTable(
    TableDestination=table_destination,
    TableName=table_name,
)
pt = ws.api.PivotTables(table_name)

# Set Row Field (Rows) as Colour column of table
pt.PivotFields("Colour").Orientation = xw.constants.PivotFieldOrientation.xlRowField
# Set Column Field (Columns) as Type column of table
pt.PivotFields("Type").Orientation = xw.constants.PivotFieldOrientation.xlColumnField
# Set Data Field (Values) as Data, with name "Sum of Data" and calculation type sum
pt.AddDataField(
    pt.PivotFields("Data"),
    "Sum of Data",
    xw.constants.ConsolidationFunction.xlSum,
)
Additional Detail
For why the source_type is xlDatabase, see the VBA documentation on the XlPivotTableSourceType enumeration.
The parameters are described in the VBA documentation for PivotCaches.Create and PivotCache.CreatePivotTable. The linked tutorial explains these in detail, along with how to change them for different scenarios (dynamic range, in a new workbook, etc.); it also covers a couple of additional changes to the formatting of the values, such as the position of fields and the number format.
The options for the type of data calculation are listed in the VBA documentation on the XlConsolidationFunction enumeration.
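To check the values Excel produces, the same aggregation (sum of Data, grouped by Colour on rows and Type on columns) can be reproduced in plain Python. This is only an illustrative sketch: the rows and the helper `pivot_sum` are hypothetical and do not touch Excel or xlwings.

```python
from collections import defaultdict

# Hypothetical rows standing in for Table1 (Colour, Type, Data).
rows = [
    ("Red", "A", 10),
    ("Red", "B", 5),
    ("Blue", "A", 7),
    ("Red", "A", 3),
]


def pivot_sum(records):
    """Sum Data for each (Colour, Type) pair, as an xlSum data field would."""
    totals = defaultdict(lambda: defaultdict(int))
    for colour, kind, value in records:
        totals[colour][kind] += value
    # Convert the nested defaultdicts to plain dicts for easy comparison.
    return {colour: dict(by_type) for colour, by_type in totals.items()}


expected = pivot_sum(rows)
print(expected)
```

Each outer key corresponds to a row label in the PivotTable and each inner key to a column label.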

How to replace the value in same column - power bi query editor

I have the query table below. Now I need one more column that shows the latest value of the unit id, which is the true value of the Custom and currentUnit columns.
[image: data table]
The expected result is shown in this image:
[image: expected result]
Any help please!
Copy currentUnit to New
Replace false with null
Transform - Fill up
Replace value with null where custom = True
Provide sample data in copyable format if you need more detailed instructions
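The logic of those four steps can be sketched outside Power Query. The Python below is an illustration only: the function name `add_new_column` and the sample rows are hypothetical (the real data was only posted as an image), and it assumes currentUnit holds either a unit id or the boolean false.

```python
def add_new_column(rows):
    """Apply the four steps to a list of dicts with 'Custom' and 'currentUnit' keys."""
    # Steps 1-2: copy currentUnit into New, turning false into None (null).
    new = [None if r["currentUnit"] is False else r["currentUnit"] for r in rows]
    # Step 3: fill up, so each null takes the nearest non-null value below it.
    below = None
    for i in range(len(new) - 1, -1, -1):
        if new[i] is None:
            new[i] = below
        else:
            below = new[i]
    # Step 4: set New back to null on rows where Custom is true.
    return [dict(r, New=None if r["Custom"] else v) for r, v in zip(rows, new)]


# Hypothetical rows; the column names follow the question.
sample = [
    {"Custom": False, "currentUnit": False},
    {"Custom": False, "currentUnit": False},
    {"Custom": True, "currentUnit": "U-17"},
    {"Custom": False, "currentUnit": False},
    {"Custom": True, "currentUnit": "U-23"},
]
result = add_new_column(sample)
for row in result:
    print(row)
```

If the real column stores the text "false" rather than a boolean, the comparison in the first step would need to change accordingly.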

Adding label in AutoML for text classification

I am trying to create a text dataset in a pipeline for text classification, but I believe I am doing it the wrong way, or at least I don't understand it. The CSV I am passing contains only two columns, message and label, where label is true or false.
Inside my pipeline I create the dataset like this, but I am not sure how the dataset recognizes that the label column is the dependent (target) variable.
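# imports assumed from the aliases used below
from google.cloud import aiplatform
from google_cloud_pipeline_components import aiplatform as gcp_aip

dataset = gcp_aip.TextDatasetCreateOp(
    project=project,  # my project id
    display_name=display_name,  # reference name
    gcs_source=src_uris,  # path to my data in gcs
    import_schema_uri=aiplatform.schema.dataset.ioformat.text.single_label_classification,
)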
Once the dataset is created, I run the training like this within the pipeline:
# training
model = gcp_aip.AutoMLTextTrainingJobRunOp(
    project=project,
    display_name=display_name,
    prediction_type="classification",
    multi_label=False,
    dataset=dataset.outputs["dataset"],
)
I am not sure whether creation and training are done correctly, since I never specified that label is my label column and that message should be used as the feature.
In Vertex AI the created dataset looks like this:
But in the training section, the AutoML results look like this. I don't know why a label named "label" with 0% is there, which makes me doubt that the data was imported correctly.
When preparing the CSV file, you don't need to specify which column is the feature and which is the label. Vertex AI's AutoML automatically reads the first column as the feature and the second column as the label. You may refer to the Vertex AI documentation on preparing text data for more details about the CSV format.
Below is a sample CSV file: all values in the first column (column A) are detected as the feature and all values in the second column (column B) are the labels.
You might need to check your CSV file for the word "label" in the second column and replace it with either "True" or "False", since based on your data you are only trying to have two labels, "True" and "False". In addition, if you find the word "label" in the second column with no value in the first column, just remove the word "label".
In your screenshot, there is a count of 1 for the word "label", which means a "label" value exists in the second column of your CSV data.
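A quick way to confirm this before uploading is to count the values in the second column locally. The following is a minimal sketch using only the Python standard library; the helper `label_counts` and the sample contents are hypothetical, with a header row deliberately left in to reproduce the stray "label" value.

```python
import csv
import io
from collections import Counter


def label_counts(csv_text):
    """Count the values found in the second column of a two-column CSV."""
    reader = csv.reader(io.StringIO(csv_text))
    return Counter(row[1].strip() for row in reader if len(row) >= 2)


# Hypothetical file contents: a header row was left in by mistake.
sample = (
    "message,label\n"
    "free prize inside,True\n"
    "see you at noon,False\n"
    "call now,True\n"
)

counts = label_counts(sample)
# Anything other than the two intended labels is suspicious.
unexpected = sorted(set(counts) - {"True", "False"})
print(counts)
print(unexpected)
```

For a real file, the text could be read with `open(path, newline="")` and passed to the same function; any entry in `unexpected` points at rows to fix or remove.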

Converting nominal attribute to numeric value using Weka

Suppose the nominal attribute is Outlook, which contains three values: Sunny, Overcast and Rainy. I want to convert the values of the Outlook attribute to numeric form, i.e. 1, 2, 3 (the order can change). I saw the NominalToBinary filter in Weka, but this creates three columns. I don't want to create a separate column for each value. How can I do this using Weka?
If you are using an ARFF file, you can add a comment that specifies what the values of the "Outlook" attribute are.
For example, your ARFF can contain this comment at the top -
%% Numeric values for the "Outlook" Attribute
%% Sunny = 1
%% Overcast = 2
%% Rainy = 3
%% Windy = 4
Then you can define the attribute as -
@attribute Outlook {1,2,3,4}
I don't think there is a way to do this in the UI, but you can use a text editor to edit the ARFF itself.
For this you can use the "RenameNominalValues" filter under unsupervised -> attribute.
Then under "selectedAttributes" type the attribute and
under "valueReplacements" type Sunny:1,Overcast:2,Rainy:3,Windy:4
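For attributes with many values, typing the replacement list by hand is error-prone. A small sketch that builds the same "label:number" string in Python follows; the helper name `value_replacements` is hypothetical and the numbering simply follows the order of the list.

```python
def value_replacements(values, start=1):
    """Build a 'label:number' list in the form typed into valueReplacements."""
    return ",".join(f"{v}:{i}" for i, v in enumerate(values, start))


spec = value_replacements(["Sunny", "Overcast", "Rainy", "Windy"])
print(spec)
```

The resulting string can be pasted into the filter's "valueReplacements" field. Note that the filter renames the labels, so the attribute itself remains nominal.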

weka- replace null value in a nominal attribute with a string

I am cleaning a data set with Google OpenRefine and then trying to use it in Weka to do some cluster analysis. I am dealing with a nominal column that stores salary ranges.
I've specified the attribute as below
@ATTRIBUTE Income {'0-30000','30000-50000','50000-75000','75000-150000','>150000'}
In the data set there are rows in which the 'Income' column is null, and I suppose that is the reason why I get the error:
'nominal value not declared in header, read Token line 13'
Is there a way I can replace null values with a string (and then specify the string in the attribute)? If so, how do I specify it in the @ATTRIBUTE row?
Or would it be possible to include the null in the set of attribute values?
Thanks