how to classify the whole data set in weka

how to classify the whole data set in weka - weka

I've got a supervised data set with 6836 instances, and I need to know the predictions of my model for all the instances, not only for a test set.
I followed the approach train-test (2/3-1/3) to know about my rates TPR and FPR, and I've got the predictions about my test (1/3), but I need to know the predcitions about all the 6836 instances.
How can I do it?
Thanks!

In the classify tab in Weka Explorer there should be a button that says 'More options...' if you go in there you should be able to output predictions as plain text. If you use cross validation rather than a percentage split you will get predictions for all instances in a table like this:
+-------+--------+-----------+-------+------------+
| inst# | actual | predicted | error | prediction |
+-------+--------+-----------+-------+------------+
| 1 | 2:no | 1:yes | + | 0.926 |
| 2 | 1:yes | 1:yes | | 0.825 |
| 1 | 2:no | 1:yes | + | 0.636 |
| 2 | 1:yes | 1:yes | | 0.808 |
| ... | ... | ... | ... | ... |
+-------+--------+-----------+-------+------------+

If you don't want to do cross validation you also can create a data set containing all your data (training + test) and add it as test data. Then you can go to more options and show the results as Campino already answered.

Related

DynamoDB with daily/weekly/monthly aggregated values

My application is creating a log file every 10min, which I want to store in DynamoDB in an aggregated way, e.g. 144 log files per day, 1008 log files per week or ~4400 log files per month.
I have different partition keys, but for sake of simplicity I have used only a single partition key in the following examples.
The straight forward solution would be to have different tables, e.g.
Table "TenMinLogsDay":
id (=part.key) | date (=sort key) | cntTenMinLogs | data
-------------- | ---------------- | ------------- | -------------------------------
1 | 2017-04-30 | 144 | some serialized aggregated data
1 | 2017-05-01 | 144 | some serialized aggregated data
1 | 2017-05-02 | 144 | some serialized aggregated data
1 | 2017-05-03 | 144 | some serialized aggregated data
Table "TenMinLogsWeek":
id (=part.key) | date (=sort key) | cntTenMinLogs | data
-------------- | ---------------- | ------------- | -------------------------------
1 | 2017-05-01 | 1008 | some serialized aggregated data
1 | 2017-05-08 | 1008 | some serialized aggregated data
1 | 2017-05-15 | 1008 | some serialized aggregated data
Table "TenMinLogsMonth":
id (=part.key) | date (=sort key) | cntTenMinLogs | data
-------------- | ---------------- | ------------- | -------------------------------
1 | 2017-05-01 | 4464 | some serialized aggregated data
1 | 2017-06-01 | 4320 | some serialized aggregated data
1 | 2017-07-01 | 4464 | some serialized aggregated data
I would prefer however a combined table. Out of the box DynamoDB does not seem to support this.
Also, I want to query either the daily OR the weekly OR the monthly aggregated items, thus I don't want to use the filter feature for this.
The following solution would be possible, but seems like a poor hack:
Table "TenMinLogsCombined":
id (=part.key) | date (=sort key) | week (=LSI sort key) | month (=LSI sort key) | cntTenMinLogs | data
-------------- | ---------------- | -------------------- | --------------------- | ------------- | -----
1 | 2017-04-30 | (empty) | (empty) | 144 | ...
1 | 2017-05-01 | (empty) | (empty) | 144 | ...
1 | 0017-05-01 | 2017-05-01 | (empty) | 1008 | ...
1 | 1017-05-01 | (empty) | 2017-05-01 | 4464 | ...
1 | 2017-05-02 | (empty) | (empty) | 144 | ...
1 | 2017-05-03 | (empty) | (empty) | 144 | ...
Explanation:
By using the year "0017" and "1017" instead of "2017" I can query the date range for, e.g. 2017-05-01 to 2017-05-04 and DynamoDB won't read the items starting with 0017 or 1017
For week or month range queries, such a hack is not required, as empty LSI sort keys are possible.
Does anybody know of a better way to achieve this?

unique column value per row

I find it best to use an example, so here we go:
Say I have a table with chores and a table with a weekly schedule like this:
CHORES:
|----+---------------+----------+-------|
| id | name | type | hours |
|----+---------------+----------+-------|
| 1 | clean kitchen | cleaning | 4 |
|----+---------------+----------+-------|
| 2 | clean toilet | cleaning | 3 |
etc
SCHEDULE:
|------+---------------+---------------+-----|
| week | monday | tuesday | etc |
|------+---------------+---------------+-----|
| 1 | clean kitchen | clean toilet | etc |
|------+---------------+---------------+-----|
| 2 | clean toilet | clean kitchen | etc |
etc
I want to make sure that for one week, you can't have duplicate cells, so this wouldn't be allowed:
SCHEDULE:
|------+---------------+--------------+-----|
| week | monday | tuesday | etc |
|------+---------------+--------------+-----|
| 1 | clean toilet | clean toilet | etc |
etc
What would I have to do in my models.py to get this behaviour?

Try django unique-together in model meta option.
https://docs.djangoproject.com/en/1.11/ref/models/options/#unique-together

I'd better user ManyToMany through another table like that:
SCHEDULE:
------+------------------------+
| week | chores |
|------+------------------------+
| 1 | many to many to chores |
|------+------------------------+
| 2 | many to many to chores |
And trough table like that
THROUGH TABLE:
|---------+---------------+---------------+
| week_id | day of week | chores_id |
|---------+---------------+---------------+
| 1 | Monday | clean toilet |
|---------+---------------+---------------+
| 1 | Tuesday | clean kitchen |
And in that table make unique together for week_id and chores_id

How to run raw query with a model with dynamic fields in Django 1.9?

I have a complex result that requires writing raw sql queries.
See https://stackoverflow.com/a/38548462/80353
The expected result is a table showing several columns.
The first column header is simply Product and the other column headers are store names.
The values are simply the product names and the aggregated sales values of the product in these stores.
Which stores will be shown is entirely dynamic. Maximum should be 9 stores.
The same in text format:
Store table
------------------------------
| id | code | address |
|-----|------|---------------|
| 1 | S1 | Kings Row |
| 2 | S2 | Queens Street |
| 3 | S3 | Jacks Place |
| 4 | S4 | Diamonds Alley|
| 5 | S5 | Hearts Road |
------------------------------
Product table
------------------------------
| id | code | name |
|-----|------|---------------|
| 1 | P1 | Saucer 12 |
| 2 | P2 | Plate 15 |
| 3 | P3 | Saucer 13 |
| 4 | P4 | Saucer 14 |
| 5 | P5 | Plate 16 |
| and many more .... |
|1000 |P1000 | Bowl 25 |
|----------------------------|
Sales table
----------------------------------------
| id | product_id | store_id | amount |
|-----|------------|----------|--------|
| 1 | 1 | 1 |7.05 |
| 2 | 1 | 2 |9.00 |
| 3 | 2 | 3 |1.00 |
| 4 | 2 | 3 |1.00 |
| 5 | 2 | 5 |1.00 |
| and many more .... |
| 1000| 20 | 4 |1.00 |
|--------------------------------------|
The relationships are:
Sales belongs to Store
Sales belongs to Product
Store has many Sales
Product has many Sales
What I want to achieve
I want to display by pagination in the following manner:
Given the stores S1-S3:
-------------------------
| product | S1 | S2 | S3 |
|---------|----|----|----|
|Saucer 12|7.05|9 | 0 |
|Plate 15 |0 |0 | 2 |
| and many more .... |
|------------------------|
For more details of the schema, check the question in How to get back aggregate values across 2 dimensions using Python Cubes?
My question
The schema is not super important to my question which is:
Since I am going to write a complex raw query, is there a way to map the query result to a model where the fields are dynamic?
I found documentation about how to execute raw queries in Django and how to execute raw queries to existing models with fixed fields and matching table.
My question is is it possible to do that for a model that has no matching table and dynamic fields?
If so, how?
Or if I choose to use materialised view in postgresql, how do I match it with a model class?

Rescale Dataset using Power BI

I'm trying to rescale a dataset in using PowerBI Desktop. I've imported a dataset full of raw data, but I can't use row context together with an aggregate. I'm trying to accomplish this:
Data:
+---------+-----+
| Name | Bar |
+---------+-----+
| Alfred | 0 |
| Alfred | -1 |
| Alfred | 1 |
| Burt | 1 |
| Burt | 0 |
| Charlie | 1 |
| Charlie | 1 |
| Charlie | 0 |
+---------+-----+
Calculations:
Foo: = SUM(Bar) / COUNT(Bar) GROUP BY Name
Which would Generate this dataset:
+---------+-----+
| Name | Foo |
+---------+-----+
| Alfred | 0 |
| Burt | .5 |
| Charlie | .67 |
+---------+-----+
Final Calculation:
Score: = (#Foo - MIN(Foo)) / (MAX(Foo)-MIN(Foo))
The goal is to grade on a curve with a set of data. I can do it in excel, but was hoping that Power BI could handle all the heavy lifting.
At this point it might be easier to do it all in SQL before bringing it into PowerBI, but that would make it significantly less dynamic (with date filters and the like). Thanks for any insight you might have!

I think you're looking for the GROUPBY DAX function. https://support.office.com/en-us/article/GROUPBY-Function-DAX-d6d064b2-fd8b-4c1b-97f8-c6d03cdf8ad0
You then would GROUPBY on the Name field and proceed from there. If need to use the measure outside of a visual that groups by each Name (like show me the average score after applying the curve), then you'll need to wrap that in a calculate table where you include the names, your measure projected as a column, and then do your aggregates (min/max/average) over that calculated table.

Omnigraffle templates: Avoid scaling of some parts of grouped shapes

I would like to create my own templates to mockup my web applications. Now imagine I had a Panel shape. It consist of a title and a body devided by a line. When I scale the template in its height later on I would not like to see the title bar scaled.
First of all, is that possible? And if so, how?
My panel should look something like that:
-------------------
| Title | -> should not change its height on scaling
-------------------
| Some Text in |
| here.. |
| |
| |
| | |
| ˇ |
| Only the body |
| should scale... |
| |
-------------------
I am using Omnigraffle 5.4.

We Keep Coding

c++ django amazon-web-services regex python-2.7 google-cloud-platform list unit-testing opengl ember.js

how to classify the whole data set in weka - weka

If you don't want to do cross validation you also can create a data set containing all your data (training + test) and add it as test data. Then you can go to more options and show the results as Campino already answered.

Related

DynamoDB with daily/weekly/monthly aggregated values

unique column value per row

How to run raw query with a model with dynamic fields in Django 1.9?

Rescale Dataset using Power BI

Omnigraffle templates: Avoid scaling of some parts of grouped shapes

Categories

Resources