I am using the new collect command in Stata 17.0 to create a table. I worked through the walkthrough at https://blog.stata.com/2021/09/02/customizable-tables-in-stata-17-part-6-tables-for-multiple-regression-models/ and was able to get a table with p-values, standard errors, and CIs.
I created it using this code:
collect _r_b _r_p _r_se _r_ci, ///
name(model3) ///
tag(model[(2)]) ///
: stcox i.tertile age_at_baseline initial_dose female_sex
It is a Cox proportional hazards model, and I would like to add the number of failures to the table.
After running the above code, I can type collect label list result, all and see it listed among the level labels as N_fail. However, when I add it to my code, it does not work:
collect create model3
collect _r_b _r_p _r_se _r_ci N_fail, /// THIS TELLS STATA WHAT VARS YOU WANT
name(model3) ///
tag(model[(1)]) /// THIS STORES THE OUTPUT INTO A NEW DIMENSION
: stcox i.tertile
The error:
N_fail not found
r(111);
I would like to visualise event occurrence changes in time.
Use case:
Let's say my logs contain 2 types of events (eventA, eventB).
I'm interested in a line graph that shows the number of events per hour, with one line per event type (line#1: dataA1, dataA2... ; line#2: dataB1, dataB2...).
What I'm aware of:
Query the logs: fields @timestamp, eventName | stats count() by bin(1h), eventName | sort bin(1h) asc
The above query gives all the data needed for the desired graph (e.g. [bin(1h)], [count()], [eventName]).
If I remove the eventName field from the display, I get a log table with the correct data, but the line graph shows the data points of both event types mixed into one line (e.g. dataA1, dataA2, dataB3, dataA4, dataB5).
The question:
Is it possible to generate a line graph with more series in it?
If yes, what parametrization do I need?
See https://docs.aws.amazon.com/AmazonCloudWatch/latest/logs/CWL_Insights-Visualizing-Log-Data.html
Visualizing time series data
Time series visualizations work for queries with the following characteristics:
The query contains one or more aggregation functions. For more information, see Aggregation Functions in the Stats Command.
The query uses the bin() function to group the data by one field.
These queries can produce line charts, stacked area charts, bar charts, and pie charts.
You can't use a line chart for your example, because a time series can only be produced when the query groups by a single bin() field, and your query also groups by eventName. You can, however, use e.g. a pie chart for your use case.
Alternatively, if applicable to your use case, you can start producing logs in a different format, with one numeric field per event type:
{
"eventA": 1,
"eventB": 0
}
Then you can write the query as
stats sum(eventA), sum(eventB) by bin(1h)
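To make the idea concrete, here is a minimal Python sketch of what that log format and query amount to. It is only an illustration: make_log_line and hourly_sums are hypothetical helper names, the timestamps are made up, and the aggregation is done locally rather than by CloudWatch.

```python
import json
from collections import defaultdict
from datetime import datetime, timezone


def make_log_line(event_name):
    """Return a JSON log line with one numeric flag per event type."""
    return json.dumps({
        "eventA": 1 if event_name == "eventA" else 0,
        "eventB": 1 if event_name == "eventB" else 0,
    })


def hourly_sums(records):
    """Sum the eventA/eventB flags per one-hour bin.

    records: iterable of (timestamp, json_line) pairs.
    """
    totals = defaultdict(lambda: {"eventA": 0, "eventB": 0})
    for timestamp, line in records:
        # Truncate the timestamp to the start of its hour (the 1h bin).
        hour = timestamp.replace(minute=0, second=0, microsecond=0)
        fields = json.loads(line)
        totals[hour]["eventA"] += fields["eventA"]
        totals[hour]["eventB"] += fields["eventB"]
    return dict(sorted(totals.items()))


# Made-up sample records, for illustration only.
records = [
    (datetime(2021, 1, 1, 10, 5, tzinfo=timezone.utc), make_log_line("eventA")),
    (datetime(2021, 1, 1, 10, 40, tzinfo=timezone.utc), make_log_line("eventB")),
    (datetime(2021, 1, 1, 11, 15, tzinfo=timezone.utc), make_log_line("eventA")),
    (datetime(2021, 1, 1, 11, 20, tzinfo=timezone.utc), make_log_line("eventA")),
]

for hour, sums in hourly_sums(records).items():
    print(hour.isoformat(), sums)
```

Each hour ends up with one value per event type, which is the shape a chart with two series needs.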
I am new to Weka. I have a dataset in CSV with 5000 samples; 20 of them are shown below. When I load this dataset into Weka, it looks OK, but when I run the kNN algorithm, it gives a result it is not supposed to give. Here is the sample data:
a,b,c,d
74,85,123,1
73,84,122,1
72,83,121,1
70,81,119,1
70,81,119,1
69,80,118,1
70,81,119,1
70,81,119,1
76,87,125,1
76,87,125,1
82,92,146,2
74,86,140,2
68,80,134,2
64,76,130,2
64,75,132,2
83,96,152,2
72,85,141,2
71,83,141,2
69,81,139,2
65,79,137,2
Here is the result:
=== Cross-validation ===
=== Summary ===
Correlation coefficient 0.6148
Mean absolute error 0.2442
Root mean squared error 0.4004
Relative absolute error 50.2313 %
Root relative squared error 81.2078 %
Total Number of Instances 5000
It is supposed to give a result like this:
Correctly classified instances: 69 92%
Incorrectly classified instances: 6 8%
What could be the problem? What am I missing? I tried all the other algorithms, but they all give the same kind of output. I have used the sample Weka datasets, and they all work as expected.
The IBk algorithm can be used for regression (predicting the value of a numeric response for each instance) as well as for classification (predicting which class each instance belongs to).
It looks like all the values of the class attribute in your dataset (column d in your CSV) are numbers. When you load this data into Weka, Weka therefore guesses that this attribute should be treated as a numeric one, not a nominal one. You can tell this has happened because the histogram in the Preprocess tab looks something like this:
instead of like this (coloured by class):
The result you're seeing when you run IBk is the result of a regression fit (predicting a numeric value of column d for each instance) instead of a classification (selecting the most likely nominal value of column d for each instance).
To get the result you want, you need to tell Weka to treat this attribute as nominal. When you load the csv file in the Preprocess tab, check Invoke options dialog in the file dialog window. Then when you click Open, you'll get this window:
The field nominalAttributes is where you can give Weka a list of which attributes are nominal ones even if they look numeric. Entering 4 here will specify that the fourth attribute (column) in the input is a nominal attribute. Now IBk should behave as you expect.
You could also do this by applying the NumericToNominal unsupervised attribute filter to the already loaded data, again specifying attribute 4; otherwise the filter will apply to all the attributes.
The ARFF format used for the Weka sample datasets includes a specification of which attributes are which type. After you've imported (or filtered) your dataset as above, you can save it as ARFF and you'll then be able to reload it without having to go through the same process.
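If you would rather produce the ARFF file outside Weka, the sketch below shows roughly what such a file looks like when column d is declared nominal. It is a rough illustration, not Weka's own converter: csv_to_arff is a hypothetical helper, the relation name "sample" is made up, and only a few rows of the question's data are used.

```python
import csv
import io


def csv_to_arff(csv_text, relation, nominal_columns):
    """Convert CSV text to ARFF text, declaring the named columns as nominal."""
    rows = list(csv.reader(io.StringIO(csv_text.strip())))
    header, data = rows[0], rows[1:]
    lines = ["@relation " + relation, ""]
    for index, name in enumerate(header):
        if name in nominal_columns:
            # A nominal attribute lists its allowed values in braces.
            values = sorted({row[index] for row in data})
            lines.append("@attribute %s {%s}" % (name, ",".join(values)))
        else:
            lines.append("@attribute %s numeric" % name)
    lines += ["", "@data"]
    lines += [",".join(row) for row in data]
    return "\n".join(lines) + "\n"


# A few rows taken from the sample data in the question.
sample = """a,b,c,d
74,85,123,1
73,84,122,1
82,92,146,2
74,86,140,2
"""

print(csv_to_arff(sample, "sample", {"d"}))
```

The line that matters is the declaration of d with its values in braces; that is what tells Weka to treat it as a class with two labels rather than as a number.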
I am classifying the iris data using a decision tree (C4.5), random forest, and naive Bayes. I am using the dataset downloaded from iris-train and iris-test. When I train all the classifiers, everything is fine: I get proper results in the classifier output, including 'Detailed accuracy by class' and the confusion matrix. But when I choose the supplied test set option in the Classify tab of the Weka Explorer, select the iris-test file, set 'output predictions' to 'csv' under 'More options', and click Start, I get the result shown in the figure below. The classifier output shows the classified samples correctly, but 'Detailed accuracy by class' and the confusion matrix contain only zeros. Any suggestions as to which option I am selecting wrongly? Thank you.
The confusion matrix shows you how well your trained classifier performs by comparing the actual class of the instances in the test set with the class that was predicted by the classifier. But you are supplying a test set with no class information, so there's nothing to compare against. This is why you see
Total Number of Instances 0
Ignored Class Unknown Instances 120
in the output in your screenshot.
Typically you would first evaluate the performance of your classifier using cross-validation, or a test set that has class information. Then you can use the trained classifier to classify unknown data, for example using the Re-evaluate model on current test set right-click option as described in the help.
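As a rough illustration of why the counts are zero (this is a sketch of the idea, not Weka's actual evaluation code; evaluate is a hypothetical helper and the labels are made up), consider:

```python
from collections import Counter


def evaluate(actual, predicted, missing="?"):
    """Count (actual, predicted) pairs, skipping instances with no known class."""
    matrix = Counter()
    ignored = 0
    for a, p in zip(actual, predicted):
        if a == missing:
            ignored += 1  # nothing to compare the prediction against
        else:
            matrix[(a, p)] += 1
    return matrix, ignored


predicted = ["setosa", "versicolor", "virginica"]

# Test set without class labels: every instance is ignored.
matrix, ignored = evaluate(["?", "?", "?"], predicted)
print(sum(matrix.values()), ignored)

# Test set with class labels: the matrix can be filled in.
matrix, ignored = evaluate(["setosa", "virginica", "virginica"], predicted)
print(dict(matrix), ignored)
```

The predictions themselves are produced either way; only the comparison against the actual class needs labels.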
I use Weka for text classification. I have a training data set, to which I apply the StringToWordVector and NumericToNominal filters, and a test data set, to which I applied the same filters.
When I try to apply my model to the test data, it gives me the following error:
Train and test set are not compatible
I searched for a solution. The error occurs because the number of attributes differs between the two sets, and it will always differ because the texts in the two sets are different.
How can I solve this error, please?
The best thing you can do is combine your training and test sets into one file, apply the filter to it all in one go, and then split them up again, copying the @attribute declarations from the combined file into both the training and test files. This way the attributes will be consistent across both files.
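A toy Python sketch of the underlying issue, assuming a plain bag-of-words representation (build_vocabulary and vectorize are hypothetical helpers, not Weka's StringToWordVector, and the texts are made up):

```python
def build_vocabulary(texts):
    """Return the sorted list of distinct lowercase words in the texts."""
    return sorted({word for text in texts for word in text.lower().split()})


def vectorize(texts, vocabulary):
    """Turn each text into a list of word counts, one column per vocabulary word."""
    return [[text.lower().split().count(word) for word in vocabulary]
            for text in texts]


train = ["great conference talk", "lost internet again"]
test = ["great internet speed"]

# Filtering each set separately gives different attribute lists.
print(build_vocabulary(train) == build_vocabulary(test))  # False

# One vocabulary built from the combined texts gives matching columns.
shared = build_vocabulary(train + test)
train_vectors = vectorize(train, shared)
test_vectors = vectorize(test, shared)
print(len(train_vectors[0]) == len(test_vectors[0]))  # True
```

Because the word list is derived from the text, two separately filtered files almost never end up with the same attributes; building it once over everything avoids that.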
I'm trying to classify some web posts using weka and naive bayes classifier.
First I manually classified many posts (about 100 negative and 100 positive) and I created an .arff file with this form:
@relation classtest
@attribute 'post' string
@attribute 'class' {positive,negative}
@data
'RT @burnreporter: Google has now indexed over 30 trillion URLs. Wow. #LeWeb',positive
'A special one for me Soundcloud at #LeWeb ',positive
'RT @dianaurban: Lost Internet for 1/2 hour at a conference called #LeWeb. Ironic, yes?',negative
.
.
.
Then I open the Weka Explorer, load that file, and apply the StringToWordVector filter to split the posts into single-word attributes.
Then, after doing the same with my dataset, selecting the naive Bayes classifier (in the Classify tab of Weka) and choosing Supplied test set, it returns Train and test set are not compatible. What can I do? Thanks!
Probably the ordering of the attributes is different in train and test sets.
You can use batch filtering as described in http://weka.wikispaces.com/Batch+filtering
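The idea behind batch filtering, as I understand it, is that the filter is set up on the training data and then reused unchanged on the test data. Here is a toy Python sketch of that idea only; WordCountFilter is a hypothetical stand-in, not Weka's implementation, and the texts are made up.

```python
class WordCountFilter:
    """Toy stand-in for a word-vector filter; not Weka's implementation."""

    def fit(self, texts):
        # The attribute list is fixed from the first (training) batch only.
        self.vocabulary = sorted(
            {word for text in texts for word in text.lower().split()}
        )
        return self

    def transform(self, texts):
        # Words that were not seen during fit are simply dropped.
        return [[text.lower().split().count(word) for word in self.vocabulary]
                for text in texts]


train = ["great conference talk", "lost internet again"]
test = ["great internet speed"]

word_filter = WordCountFilter().fit(train)
train_vectors = word_filter.transform(train)
test_vectors = word_filter.transform(test)

print(word_filter.vocabulary)
print(test_vectors)
```

Both outputs then have the same attributes in the same order, at the cost of ignoring words that only occur in the test set.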
I used the batch filter but still have a problem. Here is what I did:
java -cp /usr/share/java/weka.jar weka.filters.unsupervised.attribute.NumericToNominal -R last -b -i trainData.arff -o trainDataProcessed.csv.arff -r testData.arff -s testDataProcessed.csv.arff
I then get the error below:
Input file formats differ.
Later, I figured out two ways to make the trained model work on a supplied test set.
Method 1.
Use the Knowledge Flow. For example, build two flows that share one classifier instance:
CSVLoader (for the training set) -> ClassAssigner -> TrainingSetMaker -> (classifier of your choice) -> ClassifierPerformanceEvaluator -> TextViewer
CSVLoader (for the test set) -> ClassAssigner -> TestSetMaker -> (the same classifier instance as above) -> PredictionAppender -> CSVSaver
First load the data from the CSVLoader or ArffLoader for the training set; the model will be trained. Then load the data from the loader for the test set. The model (the classifier, for example) is evaluated on the supplied test set; you can see the result in the TextViewer (connected to the ClassifierPerformanceEvaluator) and get the saved result from the CSVSaver or ArffSaver connected to the PredictionAppender. An additional column, "classified as", is added to the output file. In my case, I used "?" in the class column of the supplied test set, since the class labels were not available.
Method 2.
Combine the training and test sets into one file, so that exactly the same filter is applied to both. Then separate the training and test sets again by applying an instance filter. Since I use "?" as the class label in the test set, it does not appear among the indices offered by the instance filter. Hence, just select the indices that you can see in the attribute values as the ones to remove when applying the instance filter; only the test data will be left. Save it and load it under Supplied test set on the Classify page. This time it will work. I guess it is the class attribute that causes the incompatible train and test set issue, as many classifiers require a nominal class attribute, whose value is converted to an index into the available values of the class attribute, according to http://weka.wikispaces.com/Why+do+I+get+the+error+message+%27training+and+test+set+are+not+compatible%27%3F