WEKA: What does the number after the '/' represent in these leaves? - weka

"0(607.0/60.0)"
"1(149.0/14.0)"
I know that 607 and 149 represent the total number of examples covered by each leaf.
What do the numbers "60" and "14" after the '/' represent?

The first number is the total number of instances (weight of instances) reaching the leaf. The second number is the number (weight) of those instances that are misclassified.
https://weka.wikispaces.com/What+do+those+numbers+mean+in+a+J48+tree%3F
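To make the two numbers concrete, here is a minimal Python sketch that splits a leaf label into its class, the weight of instances reaching the leaf, and the misclassified weight. The helper name `parse_leaf` and the regular expression are illustrative assumptions, not part of WEKA. The sketch also assumes that a label with no "/" part means no misclassified instances.

```python
import re

# Hypothetical helper (not a WEKA API): matches labels such as
# "0(607.0/60.0)" or "democrat (253.41/3.75)".
LEAF_RE = re.compile(r"^\s*([^\s(]+)\s*\(([\d.]+)(?:/([\d.]+))?\)")

def parse_leaf(label):
    """Return (predicted_class, total_weight, misclassified_weight)."""
    m = LEAF_RE.match(label)
    if m is None:
        raise ValueError("not a leaf label: %r" % label)
    total = float(m.group(2))
    # Assumption: a missing "/x" part is treated as 0 misclassified.
    wrong = float(m.group(3)) if m.group(3) else 0.0
    return m.group(1), total, wrong

for leaf in ["0(607.0/60.0)", "1(149.0/14.0)"]:
    cls, total, wrong = parse_leaf(leaf)
    # Share of the instances at this leaf that are misclassified.
    print(cls, total, wrong, round(wrong / total, 3))
```

The ratio of the second number to the first is the error rate at that leaf on the training data.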

For the sample dataset, the decision tree result is:
physician-fee-freeze = n: democrat (253.41/3.75)
The first number indicates the number of correct instances that reach that node (in this case, democrats), and the second number, after the “/”, shows the number of incorrect instances that reach that node (in this case, republicans).
Total number of instances: 435
Total number of “no” (also the integral number of correct instances): 253
Probability of having “no”: 253/435 = 0.58
Total number of missing values: 11
Number of times a missing value comes with “no”: 8
Probability: 8/11 = 0.73
Total probability that missing data could be “no”: 0.58 × 0.73 = 0.42
Total number of correct instances: 253 + 0.42 = 253.42 ~ 253.41
The number after the “/” shows the number of incorrect instances that reach that node. In this data there are five incorrect instances where the result is “republican” while “physician-fee-freeze” is “n” (or “?”).
These five can be split as follows:
Total number of incorrect instances with “n”: 2
Total number of incorrect instances with “?”: 3
A similar formula gives:
2 + (253/435) × 3 = 3.74 ~ 3.75

Dax Query- To Calculate MAXID and Sum of Amount

I have the sample data below:
IncidentID  TransactionID  Recordeddate  Name      Amount
10          1              13/01/2023    Recovery  1000
10          2              13/01/2023    Reserve   2000
10          3              13/01/2023    Reserve   3000
10          4              14/01/2023    Reserve   4000
11          5              14/01/2023    Recovery  3000
11          6              14/01/2023    Payment   4000
12          7              14/01/2023    Reserve   5000
13          8              14/01/2023    Reserve   4000
13          9              14/01/2023    Payment   2000
I need to calculate the sum of Amount for each IncidentID. For each date, only the row with the maximum TransactionID should count, and Name must be Reserve.
I tried the expression below:
Table2 = summarize(Table,Table[IncidentID],Table[RecordedDate],Table[Name],"MAXID",
calculate(max(TransactionID),AllExcept(IncidentID,RecordDate,Name)))
This gives me the required filtered table, but I am not able to add the Amount column.
If I add the Amount column, the grouping no longer works and the results are wrong. I need Amount included so that I can calculate its sum.
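The intended result can be pinned down outside DAX first. The sketch below is plain Python, not a DAX solution, and it rests on one reading of the requirement that is an assumption: for each (IncidentID, Recordeddate) pair, keep the Reserve row with the highest TransactionID, then sum Amount per IncidentID. The function name `reserve_totals` is hypothetical.

```python
# Sample rows from the question: (IncidentID, TransactionID, Recordeddate, Name, Amount)
rows = [
    (10, 1, "13/01/2023", "Recovery", 1000),
    (10, 2, "13/01/2023", "Reserve", 2000),
    (10, 3, "13/01/2023", "Reserve", 3000),
    (10, 4, "14/01/2023", "Reserve", 4000),
    (11, 5, "14/01/2023", "Recovery", 3000),
    (11, 6, "14/01/2023", "Payment", 4000),
    (12, 7, "14/01/2023", "Reserve", 5000),
    (13, 8, "14/01/2023", "Reserve", 4000),
    (13, 9, "14/01/2023", "Payment", 2000),
]

def reserve_totals(rows):
    """Sum, per incident, the Amount of the latest Reserve row of each date."""
    latest = {}  # (incident, date) -> (transaction_id, amount)
    for incident, txn, date, name, amount in rows:
        if name != "Reserve":
            continue  # assumption: non-Reserve rows are ignored before taking the max
        key = (incident, date)
        if key not in latest or txn > latest[key][0]:
            latest[key] = (txn, amount)
    totals = {}
    for (incident, _date), (_txn, amount) in latest.items():
        totals[incident] = totals.get(incident, 0) + amount
    return totals

print(reserve_totals(rows))
```

Under this reading, incident 10 contributes 3000 (transaction 3) plus 4000 (transaction 4), and incident 11 has no Reserve rows at all. If the maximum TransactionID should instead be taken over all names before filtering on Reserve, incident 13 would drop out, so the reading needs to be confirmed before translating it to DAX.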

Determine Maximum Profit Algorithm C++

Consider the following problem:
The Searcy Wood Shop has a backlog of orders for its world famous rocking chair (1 chair per order). The total time required to make a chair is 1 week. However, since the chairs are sold in different regions and various markets, the amount of profit for each order may differ. In addition, there is a deadline associated with each order. The company will only earn a profit if they meet the deadline; otherwise, the profit is 0.
Write a program that will determine an optimal schedule for the orders that will maximize profit.
The first line in a test case will contain an integer, n (0 ≤ n ≤ 1000), that represents the number of orders that are pending. A value of 0 for n indicates the end of the input file.
The next n lines contain 3 positive integers each. The first integer, i, is an order number. All order numbers for a given test case are unique. The second integer represents the number of weeks from now until the deadline for order number i. The third integer represents the amount of profit that the company will earn if the deadline is met for order number i.
Example input:
7
1 3 40
2 1 35
3 1 30
4 3 25
5 1 20
6 3 15
7 2 10
4
3054 2 30
4099 1 35
3059 2 25
2098 1 40
0
Output:
100
70
The output is the optimal total profit for each test case.
The problem that I am having is that I am struggling to come up with an algorithm that consistently finds this optimal sum.
My first idea was to go through the input week by week and choose the chair with the highest profit for that week. This fails when one week has two chairs that both have a higher profit than anything due the week before.
My next idea was to sort the list from highest to lowest profit, then walk through it from the top, compare the current entry to the next entry, and choose the one with the earlier deadline.
Neither of these works consistently. Can anyone help me?
I would first sort the list by the second column (number of weeks before the deadline) in increasing order, and then by the third column (profit) in decreasing order.
For example, in your file:
2098 1 40
2 1 35
4099 1 35
3 1 30
5 1 20
3054 2 30
3059 2 25
7 2 10
1 3 40
4 3 25
6 3 15
Among orders with the same deadline, I would pick the highest profit to execute. If the deadline is 1 week, take the top order; for 2 weeks, the top 2 orders; for 3 weeks, the top 3 orders; and so on.
First, think about which orders are eligible to be completed in the i-th week: all the orders with a deadline greater than or equal to i. So iterate over the orders in decreasing order of their deadline.
Let's say the last deadline week is 'x'. Push all the profit values with deadline 'x' into a priority queue. The max value among them is your optimal profit for week 'x'. Remove the selected profit from the priority queue and add it to your answer. The remaining values are still eligible for the earlier weeks, so add the profit values with deadline 'x-1' to the priority queue, take the max again, and repeat until the week reaches 0.
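The priority-queue approach described above can be sketched as follows. This is a minimal Python illustration rather than the C++ program the question asks for. The function name `max_profit` and the `(deadline, profit)` tuple layout are assumptions, and input parsing is left out.

```python
import heapq

def max_profit(orders):
    """orders: list of (deadline_in_weeks, profit); one chair takes one week."""
    # Visit orders from the latest deadline to the earliest.
    by_deadline = sorted(orders, reverse=True)
    heap = []   # max-heap of profits, stored negated because heapq is a min-heap
    total = 0
    idx = 0
    last_week = by_deadline[0][0] if by_deadline else 0
    for week in range(last_week, 0, -1):
        # Every order whose deadline is >= week may be built in this week.
        while idx < len(by_deadline) and by_deadline[idx][0] >= week:
            heapq.heappush(heap, -by_deadline[idx][1])
            idx += 1
        if heap:
            total += -heapq.heappop(heap)  # take the best eligible order
    return total

# The two test cases from the question, as (deadline, profit) pairs.
case1 = [(3, 40), (1, 35), (1, 30), (3, 25), (1, 20), (3, 15), (2, 10)]
case2 = [(2, 30), (1, 35), (2, 25), (1, 40)]
print(max_profit(case1))  # 100
print(max_profit(case2))  # 70
```

Walking the weeks backwards matters: an order left in the queue after week 'x' can still be built in any earlier week, so nothing eligible is ever lost.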

T 103 - Negative Marking

Raju is giving his JEE Main exam. The exam has Q questions and Raju needs S marks to pass. Giving the correct answer to a question awards the student with 4 marks whereas giving the incorrect answer to a question awards the student with negative 3 (-3) marks. If a student chooses to not answer a question at all, he is awarded 0 marks.
Write a program to calculate the minimum accuracy that Raju will need in order to pass the exam.
Input
Input consists of multiple test cases.
Each test case consists of two integers Q and S
Output
Print the minimum accuracy up to 2 decimal places
Print -1 if it is impossible to pass the exam
Sample Input 0
2
10 40
10 33
Sample Output 0
100.00
90.00
Think of this as a simultaneous equation problem.
4x - 3y = S
x + y = Q
For the second scenario, your equations will be:
4x - 3y = 33
x + y = 10
After solving, 'x' will be equal to the minimum number of questions he has to answer correctly. Calculate what percentage of 'Q' is 'x'.
That's the concept; think about how you would approach it programmatically :)
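As a rough sketch of that idea in Python: substituting y = Q - x into 4x - 3y = S gives 7x = S + 3Q, so the smallest workable x is that value rounded up. The sketch assumes, as the equations above do, that every question is attempted and that accuracy means x as a percentage of Q. It also assumes Q is positive. The function name `min_accuracy` is hypothetical.

```python
def min_accuracy(q, s):
    """Return the minimum accuracy as a string with 2 decimals, or "-1"."""
    # Smallest integer x with 4x - 3(q - x) >= s, i.e. 7x >= s + 3q.
    # -(-a // b) is integer ceiling division.
    needed = max(0, -(-(s + 3 * q) // 7))
    if needed > q:
        return "-1"  # even answering everything correctly is not enough
    return f"{100 * needed / q:.2f}"

print(min_accuracy(10, 40))  # 100.00
print(min_accuracy(10, 33))  # 90.00
```

Rounding up handles targets where 7x = S + 3Q has no integer solution. Whether skipping questions should change the definition of accuracy is not settled by the problem statement, so that case is not modelled here.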

formula for integer less than 15

In one field I want to accept a weight, which may be a decimal number but must not be over 15. Previously I had the following regex:
[1-9]\d*(\.\d+)?$
This is to be entered in Google Forms. In other words, all these numbers are OK:
0.05
1.5
2
3.56
But these are not ok:
2 kg
0
15.1
16
This should work for values greater than 0 and up to 15:
^(15(\.0+)?|(1[0-4]|[1-9])(\.\d+)?|0\.\d*[1-9]\d*)$
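A quick way to check a candidate pattern against the examples in the question is to run them through Python's `re` module. The pattern in this sketch is one candidate formulation, stated here as an assumption: greater than 0, at most 15, with optional decimals. The helper name `is_valid_weight` is hypothetical, and Google Forms uses its own regex engine, so this only checks the logic of the pattern.

```python
import re

# Candidate pattern (an assumption): exactly 15 (optionally 15.0...),
# or 1-14 with optional decimals, or 0.x with at least one non-zero digit.
PATTERN = re.compile(r"^(15(\.0+)?|(1[0-4]|[1-9])(\.\d+)?|0\.\d*[1-9]\d*)$")

def is_valid_weight(text):
    return PATTERN.match(text) is not None

accepted = ["0.05", "1.5", "2", "3.56"]   # should match, per the question
rejected = ["2 kg", "0", "15.1", "16"]    # should not match, per the question

for value in accepted + rejected:
    print(repr(value), is_valid_weight(value))
```

Listing the accepted and rejected inputs side by side makes it easy to rerun the check whenever the pattern is adjusted.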

Best way to sort two results from string

I've got the results back from a function stored as a string:
TF00, 24 percent complete
TF01, 100 percent complete
TF02, 0 percent complete
TF03, 5 percent complete
but I need to sort it (reverse numerically) by the second item, so it looks like this:
TF01, 100 percent complete
TF00, 24 percent complete
TF03, 5 percent complete
TF02, 0 percent complete
What's the most Pythonic way of doing this?
Assume s is the str, then:
print('\n'.join(sorted(s.split('\n'), key=lambda x: int(x.split()[1]), reverse=True)))