Top user of day, week, all time - best way to implement? - django

Suppose each user on my site has a score which increases as they use the site (like Stackoverflow) and I have a field score stored in the user profile table for each user.
Getting the top 10 users of all time is easy, just order by the score column.
I want to have "top 10 today", "top 10 this week", "top 10 of all time".
What's the best way to implement this? Do I need to store every single score change with a timestamp?

You would have to have a table that stored the increments and use a timestamp. I.E.
CREATE TABLE ScoreIncreases (
PrimaryKey UNIQUEIDENTIFIER,
UserId UNIQUEIDENTIFIER,
ScoreIncrease INT,
CreatedDate DATETIME)
Your query would then be something like
SELECT TOP 1 u.PrimaryKey, SUM(ScoreIncrease)
FROM Users u
INNER JOIN ScoreIncreases si ON si.Userid=u.PrimaryKey
WHERE DATEDIFF(day,si.CreatedDate,GETDATE()) = 0
GROUP BY u.PrimaryKey
ORDER BY SUM(ScoreIncrease) DESC

Related

Tableau: percentage of total for specific dimension / COUNTD IF statement

I have an employee dataset and I'm trying to display the proportion of BAME staff only as a KPI. My dataset has duplicate lines (staff who have more than 1 role) and so I have to use a distinct count of their staff IDs (i.e. cannot use measure values or SUM) as I want to measure ethnicity per person not per role. The only way I can get the % of BAME across the population is by having all other categories visible however I am looking for a way to display the % of BAME across total only. For example in a table calc BAME = 12%, Non-BAME=85% and Unknown=3% - I want to keep the 12% only to put on my dashboard.
I've not needed to do this in Tableau before and I'm quite new to the software. Is there a way to do this? I thought that perhaps I would need to do an IF statement similar to Excel but the below syntax returns an error.
IF [Ethnic Group]="BAME" THEN COUNTD([Staff ID]) / COUNTD([Staff ID]) END
Thank you so much in advance!
COUNTD(IF [Ethnic Group]="BAME" THEN [Staff ID] END)/COUNTD([Staff ID])

Power BI Dashboard where the core filter condition is a disjunction on numeric fields

We are trying to implement a dashboard that displays various tables, metrics and a map where the dataset is a list of customers. The primary filter condition is the disjunction of two numeric fields. We want to the user to be able to select a threshold for [field 1] and a separate threshold for [field 2] and then impose the condition [field 1] >= <threshold> OR [field 2] >= <threshold>.
After that, we want to also allow various other interactive slicers so the user can restrict the data further, e.g. by country or account manager.
Power BI naturally imposes AND between all filters and doesn't have a neat way to specify OR. Can you suggest a way to define a calculation using the two numeric fields that is then applied as a filter within the same interactive dashboard screen? Alternatively, is there a way to first prompt the user for the two threshold values before the dashboard is displayed -- so when they click Submit on that parameter-setting screen they are then taken to the main dashboard screen with the disjunction already applied?
Added in response to a comment:
The data can be quite simple: no complexity there. The complexity is in getting the user interface to enable a disjunction.
Suppose the data was a list of customers with customer id, country, gender, total value of transactions in the last 12 months, and number of purchases in last 12 months. I want the end-user (with no technical skills) to specify a minimum threshold for total value (e.g. $1,000) and number of purchases (e.g. 10) and then restrict the data set to those where total value of transactions in the last 12 months > $1,000 OR number of purchases in last 12 months > 10.
After doing that, I want to allow the user to see the data set on a dashboard (e.g. with a table and a graph) and from there select other filters (e.g. gender=male, country=Australia).
The key here is to create separate parameter tables and combine conditions using a measure.
Suppose we have the following Sales table:
Customer Value Number
-----------------------
A 568 2
B 2451 12
C 1352 9
D 876 6
E 993 11
F 2208 20
G 1612 4
Then we'll create two new tables to use as parameters. You could do a calculated table like
Number = VALUES(Sales[Number])
Or something more complex like
Value = GENERATESERIES(0, ROUNDUP(MAX(Sales[Value]),-2), ROUNDUP(MAX(Sales[Value]),-2)/10)
Or define the table manually using Enter Data or some other way.
In any case, once you have these tables, name their columns what you want (I used MinNumber and MinValue) and write your filtering measure
Filter = IF(MAX(Sales[Number]) > MIN(Number[MinCount]) ||
MAX(Sales[Value]) > MIN('Value'[MinValue]),
1, 0)
Then put your Filter measure as a visual level filter where Filter is not 0 and use MinCount and MinValues column as slicers.
If you select 10 for MinCount and 1000 for MinValue then your table should look like this:
Notice that E and G only exceed one of the thresholds and tha A and D are excluded.
To my knowledge, there is no such built-in slicer feature in Power BI at the time being. There is however a suggestion in the Power BI forum that requests a functionality like this. If you'd be willing to use the Power Query Editor, it's easy to obtain the values you're looking for, but only for hard-coded values for your limits or thresh-holds.
Let me show you how for a synthetic dataset that should fit the structure of your description:
Dataset:
CustomerID,Country,Gender,TransactionValue12,NPurchases12
51,USA,M,3516,1
58,USA,M,3308,12
57,USA,M,7360,19
54,USA,M,2052,6
51,USA,M,4889,5
57,USA,M,4746,6
50,USA,M,3803,3
58,USA,M,4113,24
57,USA,M,7421,17
58,USA,M,1774,24
50,USA,F,8984,5
52,USA,F,1436,22
52,USA,F,2137,9
58,USA,F,9933,25
50,Canada,F,7050,16
56,Canada,F,7202,5
54,Canada,F,2096,19
59,Canada,F,4639,9
58,Canada,F,5724,25
56,Canada,F,4885,5
57,Canada,F,6212,4
54,Canada,F,5016,16
55,Canada,F,7340,21
60,Canada,F,7883,6
55,Canada,M,5884,12
60,UK,M,2328,12
52,UK,M,7826,1
58,UK,M,2542,11
56,UK,M,9304,3
54,UK,M,3685,16
58,UK,M,6440,16
50,UK,M,2469,13
57,UK,M,7827,6
Desktop table:
Here you see an Input table and a subset table using two Slicers. If the forum suggestion gets implemented, it should hopefully be easy to change a subset like below to an "OR" scenario:
Transaction Value > 1000 OR Number or purchases > 10 using Power Query:
If you use Edit Queries > Advanced filter you can set it up like this:
The last step under Applied Steps will then contain this formula:
= Table.SelectRows(#"Changed Type2", each [NPurchases12] > 10 or [TransactionValue12] > 1000
Now your original Input table will look like this:
Now, if only we were able to replace the hardcoded 10 and 1000 with a dynamic value, for example from a slicer, we would be fine! But no...
I know this is not what you were looking for, but it was the best 'negative answer' I could find. I guess I'm hoping for a better solution just as much as you are!

PowerBI DAX function to count number of occurrences using DISTINCT (Countif)

I am trying to create a column in PowerBI that counts how times a customer name occurs in a list.
I may be going in the wrong direction, but what I have so far is;
My SQL query returns a table of customer site visits (Query1), from which I have created a new table (Unique) and column (Customer) that lists the distinct names (cust_company_descr) from Query1;
Unique = DISTINCT(Query1[cust_company_descr])
What I need is a new column in the Unique table (Count) that will give me a count of how many times each customer name in the Query1 table appears.
If I were working in Excel, the solution would be to generate a list of unique values and do a COUNTIF, but I can't find a way to replicate this.
I have seen a few solutions that involve this sort of thing;
CountValues =
CALCULATE ( COUNTROWS ( TableName ); TableName[ColumnName] = " This Value " )
The issue I have with this is the Unique table contains over 300 unique entries, so I can't make this work.
Example (The Count column is what I'm trying to create)
Query 1
cust_company_descr
Company A
Company B
Company A
Company C
Company B
Company A
Unique
Company_____Count
Company A_____3
Company B_____2
Company C_____1
Any help is gratefully received.

Analyzing tweeter with hive, regex extract

I am trying to analyze what are the most popular hashtags of July. So far I am able to select tweets from July, or display the most popular tweets, but I didn't sucess in putting them together. I am thinking about creating a intermediate table with july tweets, then display the popular hashtags, but I don't know how, can you help me? What about a 2 level select (select a from select b from table) ?
SELECT hashtags.text, count(*) as total FROM tweets
WHERE regexp_extract(created_at, "(Tue) (Jul)*", 2) = "Jul"
LATERAL VIEW EXPLODE(entities.hashtags) t1 AS hashtags
GROUP BY LOWER(hashtags.text), created_at
ORDER BY total_count DESC
LIMIT 200
Regards, K.
So far, I did this, which is pretty much what I want, but is there any mean to achieve this differently ?
Working nested query:
SELECT
LOWER(hashtags.text),
COUNT(*) AS total_count
FROM (
SELECT * FROM tweets WHERE regexp_extract(created_at,"(Tue Jul)*",1) = "Tue Jul"
) tweets
LATERAL VIEW EXPLODE(entities.hashtags) t1 AS hashtags
GROUP BY LOWER(hashtags.text)
ORDER BY total_count DESC
LIMIT 15
EDIT:
Ok, so if you want you can also do it by a temporary table:
CREATE TABLE tmpdb (
id BIGINT,
created_at STRING,
source STRING,
favorited BOOLEAN,
retweet_count INT,
retweeted_status STRUCT<
text:STRING,
user:STRUCT<screen_name:STRING,name:STRING>>,
entities STRUCT<
urls:ARRAY<STRUCT<expanded_url:STRING>>,
user_mentions:ARRAY<STRUCT<screen_name:STRING,name:STRING>>,
hashtags:ARRAY<STRUCT<text:STRING>>>,
text STRING,
user STRUCT<
screen_name:STRING,
name:STRING,
friends_count:INT,
followers_count:INT,
statuses_count:INT,
verified:BOOLEAN,
utc_offset:INT,
time_zone:STRING>,
in_reply_to_screen_name STRING
)
ROW FORMAT SERDE 'com.cloudera.hive.serde.JSONSerDe'
Then you update it:
INSERT OVERWRITE TABLE tmpdb
SELECT * FROM tweets WHERE regexp_extract(created_at,"(Tue Jul)*",1) = "Tue Jul"
And the request become as simple as this:
SELECT
LOWER(hashtags.text),
COUNT(*) AS total_count
FROM tmpdb
LATERAL VIEW EXPLODE(entities.hashtags) t1 AS hashtags
GROUP BY LOWER(hashtags.text)
ORDER BY total_count DESC
LIMIT 15
The pro/cons about the second method:
You need to update the table if you want accurate requests, so it is not suited for one-shot request, but if you need to do multiple requests on the current state of the database, then this method is better.
Don't forget that, copying a database is a costly operation ! So know when to use it :)

Sorting events to show this week, month etc

I have a list of events but I'm struggling to work out how to show specific date ranges in the index view.
I would like to list the events by showing events today, this week, this month etc.
I'm new to rails so I've tried to use this site and I've come up with the following which works for today's events.
#events_today = Event.find(:all, :conditions => ["date between ? and ?", Date.today, Date.tomorrow])
But I'm not sure how to set the page to automatically update and show only this weeks events and this month.
Your basic query should do something like this:
Event.where(date: date_range)
Now before calling this query you can set the date range variable. If you only want this week:
date_range = Date.today.beginning_of_week..Date.today
Event.where(date: date_range)
Now there are all sorts of things you can do. You can select a start and end date or a custom period using either a form or a dropdown select. In this case date_range is set based on your params. You could also always use one predefined period.
If you want to work with several date range periods it could be nice to have a last_week, last_month, etc. scope in your model (or concern). Or you could simply define date_range constants in your initializers.
As per my understanding, you want to show all the event of a week or month on click of tab 'Week' or 'Month' from view.
When you clicked on month or week for getting events, you send simply month or week in params(assuming params[:events_in] hold 'week' or 'month')
apply check on this attributes of params
def get_events_in_week_or_month
if params[:events_in] =='week'
start_date_of_time_period = Date.today.beginning_of_week
end_date_of_time_period = Date.today.end_of_week
else
start_date_of_time_period = Date.today.beginning_of_month
end_date_of_time_period = Date.today.end_of_month
end
#events in descending order
#events = Event.where("date between ? and ? ", start_date_of_time_period, end_date_of_time_period).order("created_at DESC")
#events in ascending order
#events = Event.where("date between ? and ? ", start_date_of_time_period, end_date_of_time_period)
end
for getting more methods of date class, you should run following command on rails console :
Date.public_methods