Superset Cache Warmup not working (Docker version) - apache-superset

I have the Docker version of Superset, version 1.2.0, on Ubuntu 18.04.
I have enabled the data cache and it is working as expected.
The data cache expires every 2 minutes and cache warmup is set up for every three minutes, but cache warmup is not working. Any suggestions to enable cache warmup?
Where can I find the Celery beat or worker logs?
My superset_config.py:
from celery.schedules import crontab

DATA_CACHE_CONFIG = {
    'CACHE_TYPE': 'redis',
    'CACHE_DEFAULT_TIMEOUT': 120,
    'CACHE_KEY_PREFIX': 'superset_results',
    'CACHE_REDIS_URL': 'redis://redis:6379/0',
}

CELERYBEAT_SCHEDULE = {
    'cache-warmup-hourly': {
        'task': 'cache-warmup',
        'schedule': crontab(minute='*/3'),
        'kwargs': {
            'strategy_name': 'top_n_dashboards',
            'top_n': 5,
            'since': '7 days ago',
        },
    },
}
My Superset Docker process status:
CONTAINER ID IMAGE COMMAND CREATED STATUS PORTS NAMES
0c4cd8aa4ddc apache/superset:latest-dev "/usr/bin/docker-ent…" 57 minutes ago Up 42 minutes (unhealthy) 8080/tcp superset_worker
df9506be2f84 apache/superset:latest-dev "/usr/bin/docker-ent…" 57 minutes ago Up 57 minutes (healthy) 8080/tcp, 0.0.0.0:8088->8088/tcp superset_app
f42c7aee5f3f node:14 "docker-entrypoint.s…" 57 minutes ago Up 57 minutes superset_node
5318e34d1607 apache/superset:latest-dev "/usr/bin/docker-ent…" 57 minutes ago Exited (1) 57 minutes ago superset_tests_worker
4874a4f53776 apache/superset:latest-dev "/usr/bin/docker-ent…" 57 minutes ago Exited (0) 55 minutes ago superset_init
fd50a927cfb5 apache/superset:latest-dev "/usr/bin/docker-ent…" 57 minutes ago Up 42 minutes (unhealthy) 8080/tcp superset_worker_beat
b4b160ecedf5 redis:latest "docker-entrypoint.s…" 57 minutes ago Up 57 minutes 127.0.0.1:6379->6379/tcp superset_cache
a9e5e4f4e938 postgres:10 "docker-entrypoint.s…" 57 minutes ago Up 57 minutes 127.0.0.1:5432->5432/tcp superset_db
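For reference on the log question: Celery beat and the worker run in their own containers (superset_worker_beat and superset_worker in the listing above), so one way to tail their logs is with docker directly:

docker logs -f superset_worker_beat
docker logs -f superset_worker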

This issue looks like an outstanding defect in Superset:
https://github.com/apache/superset/issues/9597
https://github.com/apache/superset/pull/15713
I will wait for the fix.

Related

Proxy server status capturing

My goal is to pull the key items for my servers that we are tracking for KPIs. My plan is to run this daily via a cron job and then have it email me once a week, so the results can be put in an Excel sheet to derive the monthly KPIs. Here is what I have so far.
#!/bin/bash
server=server1
ports=({8400..8499})            # ports to poll
for l in "${ports[@]}"          # note: [@], not [#]
do
    echo "checking on '$l'"
    # Pull the status page, keep only the lines of interest, strip HTML tags.
    sp=$(curl -k --silent "https://${server}:${l}/server-status" \
        | grep -E "Apache Server|Total accesses|CPU Usage|second|uptime" \
        | sed 's/<[^>]*>//g')
    echo "$l: $sp" >> kpi.tmp
done
# Drop blank lines once, after the loop.
grep -v '^$' kpi.tmp > kpi.out
The output looks like this:
8400:
8401: Apache Server Status for server1(via x.x.x.x)
Server uptime: 18 days 4 hours 49 minutes 37 seconds
Total accesses: 545 - Total Traffic: 15.2 MB
CPU Usage: u115.57 s48.17 cu0 cs0 - .0104% CPU load
.000347 requests/sec - 10 B/second - 28.6 kB/request
8402: Apache Server Status for server 1(via x.x.x.x)
Server uptime: 20 days 2 hours 20 minutes 26 seconds
Total accesses: 33 - Total Traffic: 487 kB
CPU Usage: u118.64 s49.41 cu0 cs0 - .00968% CPU load
1.9e-5 requests/sec - 0 B/second - 14.8 kB/request
8403:
8404:
8405: Apache Server Status for server1(via x.x.x.x)
Server uptime: 20 days 2 hours 20 minutes 28 seconds
Total accesses: 35 - Total Traffic: 545 kB
CPU Usage: u133.04 s57.48 cu0 cs0 - .011% CPU load
2.02e-5 requests/sec - 0 B/second - 15.6 kB/request
I am having a hard time figuring out how to filter the output the way I would like. As you can see from my desired output, ports that return no data should not be put in the file, and some of the info should be cut out of the returned data.
I would like my output to look like this:
8401:server1(via x.x.x.x)
Server uptime: 18 days 4 hours 49 minutes 37 seconds
Total accesses: 545 - Total Traffic: 15.2 MB
CPU Usage: .0104% CPU load
.000347 requests/sec - 10 B/second - 28.6 kB/request
8402: server1(via x.x.x.x)
Server uptime: 20 days 2 hours 20 minutes 26 seconds
Total accesses: 33 - Total Traffic: 487 kB
CPU Usage: .00968% CPU load
1.9e-5 requests/sec - 0 B/second - 14.8 kB/request
8405: server1(via x.x.x.x)
Server uptime: 20 days 2 hours 20 minutes 28 seconds
Total accesses: 35 - Total Traffic: 545 kB
CPU Usage: .011% CPU load
2.02e-5 requests/sec - 0 B/second - 15.6 kB/request
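One possible way to post-process the raw output into the desired shape is sketched below in Python (illustrative only; the kpi.tmp path and the block format are taken from the script and samples above):

import re

# Reshape the raw kpi.tmp produced by the script above:
# drop empty ports and trim the unwanted fields.
with open("kpi.tmp") as f:
    for line in f:
        line = line.strip()
        if not line:
            continue
        # Skip "NNNN:" lines that carry no data after the colon.
        if re.fullmatch(r"\d+:", line):
            continue
        # "8401: Apache Server Status for server1(via x.x.x.x)"
        #   -> "8401: server1(via x.x.x.x)"
        line = re.sub(r"Apache Server Status for\s*", "", line)
        # "CPU Usage: u115.57 s48.17 cu0 cs0 - .0104% CPU load"
        #   -> "CPU Usage: .0104% CPU load"
        line = re.sub(r"(CPU Usage:).*?-\s*", r"\1 ", line)
        print(line)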

Is there a recommended Power BI DAX pattern for calculating monthly Days Sales Outstanding (a.k.a. Debtor Days) using the Countback method?

Is there a recommended Power BI DAX pattern for calculating monthly Days Sales Outstanding (a.k.a. DSO or Debtor Days) using the Countback method?
I have been searching for a while, and although many people have asked about this, I cannot find a recommended working solution. I think that is perhaps because nobody has set out the problem properly, so I am going to try to explain it as fully as possible.
DSO is a widely-used management accounting measure of the average number of days that it takes a business to collect payment for its credit sales. More background info on the metric here: https://www.investopedia.com/terms/d/dso.asp
There are various options for defining the calculation; I believe my requirement is known as the countback method. My data set is a fairly large star schema with a separate date dimension, but a solution built on the simplified data set below would totally point me in the right direction.
Input data set as follows:
Month No  Month  Days in Month  Debt Balance  Gross Income
1         Jan    31             1000          700
2         Feb    28             1100          500
3         Mar    31             900           400
4         Apr    30             950           600
5         May    31             1000          400
6         Jun    30             1100          550
7         Jul    31             900           700
8         Aug    31             950           500
9         Sep    30             1000          400
10        Oct    31             1100          600
11        Nov    30             900           400
12        Dec    31             950           550
The aim is to create a measure for debtor days: the number of days of income, counted back month by month at each month's average daily income, needed to match the debt balance.
Starting with Dec as an example, in 3 steps:
1. Debt balance = 950, income = 550. Dec has 31 days, so we take all 31 days of income, reduce the debt balance to 400 (i.e. 950 - 550), and go back to the previous month.
2. Remaining Dec debt balance = 400. Nov income = 700. We don't need all of the daily income from Nov to match the rest of the Dec debt balance: 400/700 x 30 days in Nov = 17.14 days.
3. We have finished counting back days. 31 + 17.14 = 48.14 debtor days.
Nov has a higher balance, so we need 1 more step:
1. Debt balance = 1500, income = 700. Nov has 30 days, so we take all 30 days of income, reduce the debt balance to 800 (i.e. 1500 - 700), and go back to the previous month.
2. Remaining Nov debt balance = 800. Oct income = 600. Oct has 31 days, so we take all 31 days of income from Oct and reduce the Nov debt balance to 200 (i.e. 1500 - 700 - 600).
3. Remaining Nov debt balance = 200. Sep income = 400. We don't need all of the daily income from Sep to match the rest of the Nov debt balance: 200/400 x 30 days in Sep = 15 days.
4. We have finished counting back days. 30 + 31 + 15 = 76 debtor days.
Apr has a lower balance, so it can be resolved in one step:
Debt balance = 400, income = 600. Apr has 30 days. We don't need all of the Apr income, as income exceeds debt in this month: 400/600 x 30 = 20 debtor days.
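To make the countback arithmetic concrete before any DAX, here is a minimal Python sketch of the logic described above (illustrative only; it hardcodes the Days, Debt Balance and Gross Income columns from the result table further below):

# Minimal Python sketch of the countback method (not DAX).
days   = [31, 28, 31, 30, 31, 30, 31, 31, 30, 31, 30, 31]
debt   = [1000, 1100, 900, 400, 600, 800, 900, 950, 1000, 1100, 1500, 950]
income = [700, 500, 400, 600, 400, 550, 700, 500, 400, 600, 400, 550]

def debtor_days(m):
    """Count back from month index m (0 = Jan) until income covers the balance."""
    remaining = debt[m]
    total = 0.0
    for i in range(m, -1, -1):        # walk backwards through the months
        if remaining <= income[i]:
            # only a fraction of this month's income is needed
            return total + remaining / income[i] * days[i]
        remaining -= income[i]        # consume the whole month's income
        total += days[i]
    return None                       # history exhausted (e.g. Jan)

print(round(debtor_days(11), 2))  # Dec -> 48.14
print(round(debtor_days(10), 2))  # Nov -> 76.0
print(round(debtor_days(3), 2))   # Apr -> 20.0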
The required solution for debtor days in the simplified data set is therefore shown in the right-most "Debtor Days" column, as follows:
Month No  Month  Days  Debt Balance  Gross Income  Debtor Days
1         Jan    31    1000          700
2         Feb    28    1100          500           54.57
3         Mar    31    900           400           59.00
4         Apr    30    400           600           20.00
5         May    31    600           400           41.00
6         Jun    30    800           550           49.38
7         Jul    31    900           700           41.91
8         Aug    31    950           500           50.93
9         Sep    30    1000          400           65.43
10        Oct    31    1100          600           67.20
11        Nov    30    1500          700           76.00
12        Dec    31    950           550           48.14
I hope the above explains the required calculation sufficiently. Of course, it needs to be implemented as a measure rather than a calculated column, as in the real world it must work in more complex scenarios, with the user defining the filter context at runtime by filtering and slicing in Power BI.
If anyone can recommend a DAX calculation for Debtor Days, that would be great!
This works on a small example, but it may not work well on a large model.
There is no easy way to do this: DAX is not a procedural programming language, and we cannot use loops, recursive statements, etc., so we have many limitations.
We can only mimic this behaviour with a bulk/forced calculation (which is a resource-consuming task). The most interesting part is the variable _zz, where for each row we calculate 3 versions of the main table limited to 1/2/3 rows (as you can see, some values are hardcoded - I assume we can find the result in at most 3 iterations). You can investigate this, if you want, by adding a New Table from this code:
FILTER(
    GENERATE(
        SELECTCOLUMNS(
            GENERATE(Sheet1, GENERATESERIES(1, 3, 1)),
            "MYK", [MonthYearKey], "MonthToCheck", [Value], "Debt", [Debt Balance]),
        VAR _tmp = TOPN([MonthToCheck], FILTER(ALL(Sheet1), Sheet1[MonthYearKey] <= [MYK]), Sheet1[MonthYearKey], DESC)
        RETURN ROW("IncomAgg", SUMX(_tmp, Sheet1[Gross Income]))),
    [IncomAgg] >= [Debt])
Next, I extract two pieces of information from that table variable: how many months back we must go, and the backward running income total.
Full code (I use MonthYearKey for time navigation):
Mes =
VAR __currRowDebt = SELECTEDVALUE(Sheet1[Debt Balance])
VAR _zz =
    TOPN(1,
        FILTER(
            GENERATE(
                SELECTCOLUMNS(
                    GENERATE(Sheet1, GENERATESERIES(1, 3, 1)),
                    "MYK", [MonthYearKey], "MonthToCheck", [Value], "Debt", [Debt Balance]),
                VAR _tmp = TOPN([MonthToCheck], FILTER(ALL(Sheet1), Sheet1[MonthYearKey] <= [MYK]), Sheet1[MonthYearKey], DESC)
                RETURN ROW("IncomAgg", SUMX(_tmp, Sheet1[Gross Income]))),
            [IncomAgg] >= [Debt]),
        [MonthToCheck], ASC)
VAR __monthinscoop = SUMX(_zz, [MonthToCheck]) - 2
VAR __backwardrunningIncom = SUMX(_zz, [IncomAgg])
VAR _calc =
    CALCULATE(SUM(Sheet1[Days]),
        FILTER(ALL(Sheet1),
            Sheet1[MonthYearKey] <= SELECTEDVALUE(Sheet1[MonthYearKey])
                && Sheet1[MonthYearKey] >= SELECTEDVALUE(Sheet1[MonthYearKey]) - __monthinscoop))
VAR __twik =
    SWITCH(TRUE(),
        __monthinscoop < 0, -1,
        __monthinscoop = 0, 1,
        __monthinscoop = 1, 3,
        0)
VAR __GetRowValue =
    CALCULATE(SUM(Sheet1[Gross Income]),
        FILTER(ALL(Sheet1), Sheet1[MonthYearKey] = SELECTEDVALUE(Sheet1[MonthYearKey]) + __monthinscoop - __twik))
VAR __GetRowDays =
    CALCULATE(SUM(Sheet1[Days]),
        FILTER(ALL(Sheet1), Sheet1[MonthYearKey] = SELECTEDVALUE(Sheet1[MonthYearKey]) + __monthinscoop - __twik))
RETURN
    _calc + DIVIDE(__GetRowValue - (__backwardrunningIncom - __currRowDebt), __GetRowValue) * __GetRowDays

Django query to remove older values grouped by id?

I'm trying to remove records from a table that have a duplicate value, dropping the ones with the oldest timestamps and grouping by ID, so the result has unique (id, value) pairs with the newest timestamp kept for each. Hopefully the samples below make sense.
sample data:
id  value  timestamp
10  10     9/4/20 17:00
11  17     9/4/20 17:00
21  50     9/4/20 17:00
10  10     9/4/20 16:00
10  10     9/4/20 15:00
10  11     9/4/20 14:00
11  41     9/4/20 16:00
11  41     9/4/20 15:00
21  50     9/4/20 16:00
So I'd like to remove any rows that duplicate a value for the same id, keeping the newest timestamps, so the above data would become:
id  value  timestamp
10  10     9/4/20 17:00
11  17     9/4/20 17:00
21  50     9/4/20 17:00
10  11     9/4/20 14:00
11  41     9/4/20 16:00
EDIT:
The query is just:
SampleData.objects.all()
One approach could be using Subquery expressions as documented here.
Suppose your SampleData model looks like this:
from django.db import models

class SampleData(models.Model):
    id2 = models.IntegerField()
    value = models.IntegerField()
    timestamp = models.DateTimeField()
(I renamed id to id2 to avoid a conflict with the model's automatic id field.)
Then you could delete your duplicates like this:
from django.db.models import F, OuterRef, Subquery

# For each row, find the newest row with the same (id2, value) pair...
newest = SampleData.objects.filter(id2=OuterRef('id2'), value=OuterRef('value')).order_by('-timestamp')
# ...and delete every row that is not the newest of its pair.
SampleData.objects.annotate(newest_id=Subquery(newest.values('pk')[:1])).exclude(pk=F('newest_id')).delete()
Edit:
It seems MySQL has some issues handling deletions combined with subqueries, as documented in this SO post.
In that case a two-step approach should help: first get the ids of the objects to delete, then delete them:
newest = SampleData.objects.filter(id2=OuterRef('id2'), value=OuterRef('value')).order_by('-timestamp')
# list() forces the first query to run on its own, avoiding the MySQL limitation.
ids2delete = list(SampleData.objects.annotate(newest_id=Subquery(newest.values('pk')[:1])).exclude(pk=F('newest_id')).values_list('pk', flat=True))
SampleData.objects.filter(pk__in=ids2delete).delete()

ColdFusion 10 scheduled task cron time for every 15 minutes but only on Tuesdays

I'm trying to set up a scheduled task in CF10 (Standard) to run every 15 minutes, but only on Tuesdays. A cron generator said this would do the job:
*/15 * * * 2
But that gives the error "An error occurred scheduling the task. Unexpected end of expression." I also tried
15 * * * 2
The notes say 6 or 7 space-separated fields - what am I missing? Minute, hour, day of month, month, day of week is only 5 fields.
CF10's scheduler uses Quartz cron expressions, which take the format below:
Seconds Minutes Hours Day-of-Month Month Day-of-Week Year (optional field)
So, for a task to run every 15 minutes but only on Tuesdays, the cron expression is:
0 0/15 * ? * TUE
You can refer to the link below for more details:
http://www.quartz-scheduler.org/documentation/quartz-2.x/tutorials/tutorial-lesson-06

I am getting Error R14 (Memory quota exceeded) on Heroku with a Django app

I am running 3 dynos on the Heroku platform for my Django application. It had been working properly for 1.5 years, but for the past 2 weeks I have been getting Error R14 (Memory quota exceeded) frequently.
What should I do to avoid this error? How can I monitor the problem?
2015-01-27T10:34:01.855731+00:00 app[web.3]: Starting development server at http://0.0.0.0:43181/
2015-01-27T10:34:02.042166+00:00 heroku[web.3]: State changed from starting to up
2015-01-27T10:34:15.626327+00:00 heroku[web.2]: Error R14 (Memory quota exceeded)
2015-01-27T10:34:15.626241+00:00 heroku[web.2]: Process running mem=662M(129.4%)
2015-01-27T10:34:28.151622+00:00 heroku[router]: at=info method=GET path="/api/shop/651/?format=json&&account=(null)" request_id=2d904167-3a7d-4c8c-9b2c-ae845d0fffa9 fwd="88.247.106.124" dyno=web.1 connect=0ms service=3009ms status=200 bytes=282437
2015-01-27T10:34:28.146392+00:00 app[web.1]: [27/Jan/2015 12:34:28] "GET /api/shop/651/?format=json&&account=(null) HTTP/1.1" 200 282077
2015-01-27T10:34:35.480951+00:00 heroku[web.2]: Process running mem=662M(129.4%)
2015-01-27T10:34:35.481269+00:00 heroku[web.2]: Error R14 (Memory quota exceeded)
2015-01-27T10:34:55.511625+00:00 heroku[web.2]: Process running mem=662M(129.4%)
2015-01-27T10:34:55.511625+00:00 heroku[web.2]: Error R14 (Memory quota exceeded)
These are the logs.
And guppy results:
>>> hp.setref()
>>> hp.heap()
Partition of a set of 40 objects. Total size = 6632 bytes.
Index Count % Size % Cumulative % Kind (class / dict of class)
0 5 12 3160 48 3160 48 unicode
1 25 62 2200 33 5360 81 __builtin__.weakref
2 6 15 496 7 5856 88 list
3 1 2 488 7 6344 96 types.FrameType
4 2 5 184 3 6528 98 tuple
5 1 2 104 2 6632 100 urlparse.SplitResult
>>> hp.heap()
Partition of a set of 24479 objects. Total size = 12695072 bytes.
Index Count % Size % Cumulative % Kind (class / dict of class)
0 7212 29 7558176 60 7558176 60 dict of tastypie.fields.CharField
1 265 1 866008 7 8424184 66 dict (no owner)
2 232 1 777664 6 9201848 72 dict of 0x7fe18acb9360
3 696 3 729408 6 9931256 78 dict of tastypie.fields.DecimalField
4 567 2 594216 5 10525472 83 dict of tastypie.fields.BooleanField
5 517 2 541816 4 11067288 87 dict of tastypie.fields.IntegerField
6 7212 29 461568 4 11528856 91 tastypie.fields.CharField
7 260 1 272480 2 11801336 93 dict of tastypie.fields.DateTimeField
8 1255 5 223952 2 12025288 95 unicode
9 53 0 96248 1 12121536 95 dict of tastypie.fields.ToManyField
It's likely you either have a memory leak or are running too many concurrent processes with your server. Are you using gunicorn? If so, look at your Procfile and see how many workers you're running, then lower it by one.
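For context, a gunicorn Procfile entry typically looks something like the line below (the module name and numbers are placeholders, not taken from the question); --workers controls how many processes share the dyno's memory, and --max-requests recycles each worker after a fixed number of requests, which can keep slow leaks in check:

web: gunicorn myproject.wsgi --workers 2 --max-requests 1000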
To track the issue, try running the following on the command line to view your web logs and see when the memory errors start kicking in:
$ heroku logs --tail
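Two more monitoring options that may help (both documented Heroku features; exact flag names can vary by CLI version):

$ heroku logs --tail --dyno web.2
$ heroku labs:enable log-runtime-metrics

The first follows a single dyno; the second makes the runtime emit periodic memory-usage lines into the same log stream.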
In my case, I got the error after heavy request processing, and the app just needed a heroku restart to get a new clean state.
According to the documentation, heroku restart creates entirely new, clean dynos; see the Heroku CLI Commands reference.