Unit test for z-score with Prometheus

I have been experimenting a lot with writing unit tests for alerts as per this: https://prometheus.io/docs/prometheus/latest/configuration/unit_testing_rules/#alerts-yml
I have some simple cases working, but now I am tackling rules that are less trivial. For example, this one:
abs(
  avg_over_time(my_metrics{service_name="aService"}[1m])
  -
  avg_over_time(my_metrics{service_name="aService"}[3m])
)
/ stddev_over_time(my_metrics{service_name="aService"}[3m])
> 3
I have one file with the above rule and then this is in my test:
- interval: 1m
  # Series data.
  input_series:
    - series: 'my_metrics{service_name="aService"}'
      values: '0 0 0 0 1 0 0 0 0'
  alert_rule_test:
    - eval_time: 3m
      alertname: myalert
      exp_alerts:
        - exp_labels:
            severity: warning
            service_name: aService
          exp_annotations:
            summary: "some text"
            description: "some other text"
I am not sure what my series should look like in order to test deviation from the mean. Is it even possible to test such a rule?
Thank you
EDIT
I can get a successful test if I set the threshold to > 0 instead of > 3. I have tried to set a series of this sort:
'10+10x2 30+1000x1000'
but I cannot understand what the correct setup would be to have it triggered.

This isn't a direct answer, rather a tip from someone who spent quite some time on these tests. Did you know that, apart from testing alert expressions, you can unit test PromQL expressions as well? See how it can be useful:
evaluation_interval: 1m
tests:
  - interval: 1m
    input_series:
      - series: test_metric
        values: 1 1 1 10 1 1 1
    promql_expr_test:
      - expr: avg_over_time(test_metric[1m])
        eval_time: 4m
        exp_samples:
          - value: #5.5
      - expr: avg_over_time(test_metric[3m])
        eval_time: 4m
        exp_samples:
          - value: #3.25
      - expr: stddev_over_time(test_metric[3m])
        eval_time: 4m
        exp_samples:
          - value: #3.897114317029974
I've split your alert expression into three separate, simple parts. If you run this unit test, you will see the commented-out values in the error messages. From here it is not difficult to join the pieces together and see why the alert is not firing, and you can use that to build a working sequence of values.
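For example, joining the pieces into a fourth promql_expr_test entry with the full z-score expression (a sketch in the same commented-out style; the expected value is just 2.25 / 3.897114317029974 ≈ 0.577) shows how far this series is from the threshold:

      - expr: abs(avg_over_time(test_metric[1m]) - avg_over_time(test_metric[3m])) / stddev_over_time(test_metric[3m])
        eval_time: 4m
        exp_samples:
          - value: #0.5773502691896258

The same spike that shifts the 1m average also inflates the 3m standard deviation, which is why a lone outlier has a hard time pushing this ratio past 3.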


QAbstractProxyModel: How to implement methods?

I have a TreeModel with a database like structure:
+Table1
-"Table1_key" -"name"
- 1 -"John"
- 2 -"Peter"
-...
+Table2
-"Table2_key" -"Table1_key" -"value1" -"value2"
- 1 - 1 - 1000 - 20000
- 2 - 1 - 3000 - 4000
- 3 - 2 - 1000 -2000
-...
+...
So the data comes from an .xml file and is displayed in a TreeView, which works just fine.
However, I want to display some of the tables in different views and resolve the keys in those views.
For the example model, the required view could look like this:
+"John"
-"value1" -"value2"
- 1000 - 20000
- 3000 - 4000
+"Peter"
- 1000 -2000
I guess using a QAbstractProxyModel would be the proper way.
So my question is: how do I implement this?
I can't find any examples, and I have no idea how to map between the indexes of the source model and the proxy model in the mapToSource/mapFromSource methods.

Why do I get multiple tables for a single measurement in InfluxDB

When I use this SELECT I get the following output.
SELECT integral("value",1h) / 1000 FROM /(Klima|NAS)_Power/ WHERE time > now()-1w AND time <= now() GROUP BY time(1d) fill(null)
name: Klima_Power
time integral
---- --------
2019-07-11T00:00:00Z 0.0028576888333333326
2019-07-12T00:00:00Z 0.05559535705833335
2019-07-13T00:00:00Z 0.055475250270833325
2019-07-14T00:00:00Z 0.0551049064541667
2019-07-15T00:00:00Z 0.055454312898611136
2019-07-16T00:00:00Z 0.05580957162916666
2019-07-17T00:00:00Z 0.05551291632777774
name: NAS_Power
time integral
---- --------
2019-07-11T00:00:00Z 0
2019-07-12T00:00:00Z 0
2019-07-13T00:00:00Z 0
2019-07-14T00:00:00Z 0
2019-07-15T00:00:00Z 0
2019-07-16T00:00:00Z 0.1073428686286408
2019-07-17T00:00:00Z 0.7449990083701262
2019-07-18T00:00:00Z 0.756581078140122
name: Klima_Power
time integral
---- --------
2019-07-18T00:00:00Z 0.05271264777916669
I want to create a graph in Grafana that shows stacked bars for multiple measurements.
It works, but some measurements are listed multiple times at the same time interval.
I guess I somehow need to "group" the output so the values of the same measurement are listed in the same table.
You are getting multiple blocks (tables) because you are executing the SELECT statement with a GROUP BY clause. If you prefer to get distinct records in the output, you can use the DISTINCT function.
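For example (a sketch only, reusing the measurement and field names from the question), DISTINCT is applied to a single field like this:
SELECT DISTINCT("value") FROM "Klima_Power" WHERE time > now() - 1w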

Regex to parse Cisco log messages

I am trying to write a regex that will parse the following Cisco log messages correctly:
<191>45902: DC-SWITCH2: Aug 30 18:15:16.478: %SFF8472-3-THRESHOLD_VIOLATION: Te0/2: Rx power high warning; Operating value: -0.8 dBm, Threshold value: -1.0 dBm.
Desired output:
Te0/2: Rx power high warning; Operating value: -0.8 dBm, Threshold value: -1.0 dBm.
And:
<191>45902: DC-SWITCH2: Aug 31 19:17:30.147: sensor num : 10 sensor_value :33, high :110 low:85
Desired output:
sensor num : 10 sensor_value :33, high :110 low:85
I have developed the following regex for the first case, but I cannot fathom how to make the mnemonic %STRING section optional:
>\d+:\s.+?:\s.+?(?=:\s):\s%.+?(?=:\s):?\s(.+)
It returns the desired result for the first example, but for the second I get:
10 sensor_value :33, high :110 low:85
You want to wrap the part that checks for the %STRING in an optional, non-capturing group.
Something like this:
>\d+:\s.+?:\s.+?(?=:\s):\s(?:%.+?:)?\s(.+)
See https://regex101.com/r/F30ALK/1
Why not try something generic like
\d{2}:\d{2}:\d{2}.\d{3}.*? (\b[A-Za-z].*)
where the required output will be in Group 1.
Example shown here
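A quick way to verify the generic pattern (a minimal sketch; the two messages are copied from the question, and the dot before the milliseconds is escaped so it matches literally):
import re

# Anchor on the hh:mm:ss.mmm timestamp, then skip lazily to the first
# space that is followed by a letter; group 1 captures the payload.
pattern = re.compile(r'\d{2}:\d{2}:\d{2}\.\d{3}.*? (\b[A-Za-z].*)')

messages = [
    '<191>45902: DC-SWITCH2: Aug 30 18:15:16.478: %SFF8472-3-THRESHOLD_VIOLATION: '
    'Te0/2: Rx power high warning; Operating value: -0.8 dBm, Threshold value: -1.0 dBm.',
    '<191>45902: DC-SWITCH2: Aug 31 19:17:30.147: '
    'sensor num : 10 sensor_value :33, high :110 low:85',
]

for msg in messages:
    m = pattern.search(msg)
    print(m.group(1) if m else 'no match')
This prints exactly the two desired payloads from the question.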

Weka Document Clustering: Doc ID not visible in the output

I have to crawl Wikipedia to get HTML pages of countries, which I have done successfully. Now, to build clusters, I have to run KMeans. I am using Weka for that.
I have used this code to convert my directory into arff format:
https://weka.wikispaces.com/file/view/TextDirectoryToArff.java
Here is its output: [screenshot of the generated ARFF file omitted]
Then I opened that file in Weka and applied the StringToWordVector filter [parameter screenshot omitted].
Then I performed KMeans. The output I am getting is:
=== Run information ===
Scheme:weka.clusterers.SimpleKMeans -N 2 -A "weka.core.EuclideanDistance -R first-last" -I 5000 -S 10
Relation: text_files_in_files-weka.filters.unsupervised.attribute.StringToWordVector-R1,2-W1000-prune-rate-1.0-C-T-I-N1-L-S-stemmerweka.core.stemmers.SnowballStemmer-M0-O-tokenizerweka.core.tokenizers.WordTokenizer -delimiters " \r\n\t.,;:\'\"()?!"-weka.filters.unsupervised.attribute.StringToWordVector-R-W1000-prune-rate-1.0-C-T-I-N1-L-S-stemmerweka.core.stemmers.SnowballStemmer-M0-O-tokenizerweka.core.tokenizers.WordTokenizer -delimiters " \r\n\t.,;:\'\"()?!"
Instances: 28
Attributes: 1040
[list of attributes omitted]
Test mode:evaluate on training data
=== Model and evaluation on training set ===
kMeans
Number of iterations: 2
Within cluster sum of squared errors: 1915.0448503841326
Missing values globally replaced with mean/mode
Cluster centroids:
Cluster#
Attribute Full Data 0 1
(28) (22) (6)
====================================================================================
...
bolsheviks 0.3652 0.3044 0.5878
book 0.3229 0.3051 0.3883
border 0.4329 0.5509 0
border-left-style 0.4329 0.5509 0
border-left-width 0.3375 0.4295 0
border-spacing 0.3124 0.3304 0.2461
border-width 0.5128 0.2785 1.372
boundary 0.309 0.3007 0.3392
brazil 0.381 0.3744 0.4048
british 0.4387 0.2232 1.2288
brown 0.2645 0.2945 0.1545
cache-control=max-age=87840 0.4913 0.4866 0.5083
california 0.5383 0.5085 0.6478
called 0.4853 0.6177 0
camp 0.4591 0.5451 0.1437
canada 0.3176 0.3358 0.251
canadian 0.2976 0.1691 0.7688
capable 0.2475 0.315 0
capita 0.388 0.1188 1.375
carbon 0.3889 0.445 0.1834
caribbean 0.4275 0.5441 0
carlsbad 0.548 0.5339 0.5998
caspian 0.4737 0.5345 0.2507
category 0.2216 0.2821 0
censorship 0.2225 0.0761 0.7596
center 0.4829 0.4074 0.7598
central 0.211 0.0805 0.6898
century 0.2645 0.2041 0.4862
chad 0.3636 0.0979 1.3382
challenger 0.5008 0.6374 0
championship 0.6834 0.8697 0
championships 0.2891 0.1171 0.9197
characteristics 0.237 0 1.1062
charon 0.5643 0.4745 0.8934
china
...
Time taken to build model (full training data) : 0.05 seconds
=== Model and evaluation on training set ===
Clustered Instances
0 22 ( 79%)
1 6 ( 21%)
How do I check which Doc ID is in which cluster? I have searched a lot but didn't find anything.
Also, is there any other good Java library for KMeans and agglomerative clustering?

How to write a program to rename mp4 files to match the names of srt files?

So I have downloaded the mp4 and srt files for the "Introduction to computer networks" course from Coursera. But there is a slight discrepancy between the names of the mp4 and srt files.
Here are some sample file names:
1 - 1 - 1-1 Goals and Motivation (1253).mp4
1 - 1 - 1-1 Goals and Motivation (12_53).srt
1 - 2 - 1-2 Uses of Networks (1316).mp4
1 - 2 - 1-2 Uses of Networks (13_16).srt
1 - 3 - 1-3 Network Components (1330).mp4
1 - 3 - 1-3 Network Components (13_30).srt
1 - 4 - 1-4 Sockets (1407).mp4
1 - 4 - 1-4 Sockets (14_07).srt
1 - 5 - 1-5 Traceroute (0736).mp4
1 - 5 - 1-5 Traceroute (07_36).srt
1 - 6 - 1-6 Protocol Layers (2225).mp4
1 - 6 - 1-6 Protocol Layers (22_25).srt
1 - 7 - 1-7 Reference Models (1409).mp4
1 - 7 - 1-7 Reference Models (14_09).srt
1 - 8 - 1-8 Internet History (1239).mp4
1 - 8 - 1-8 Internet History (12_39).srt
1 - 9 - 1-9 Lecture Outline (0407).mp4
1 - 9 - 1-9 Lecture Outline (04_07).srt
2 - 1 - 2-1 Physical Layer Overview (09_27).mp4
2 - 1 - 2-1 Physical Layer Overview (09_27).srt
2 - 2 - 2-2 Media (856).mp4
2 - 2 - 2-2 Media (8_56).srt
2 - 3 - 2-3 Signals (1758).mp4
2 - 3 - 2-3 Signals (17_58).srt
2 - 4 - 2-4 Modulation (1100).mp4
2 - 4 - 2-4 Modulation (11_00).srt
2 - 5 - 2-5 Limits (1243).mp4
2 - 5 - 2-5 Limits (12_43).srt
2 - 6 - 2-6 Link Layer Overview (0414).mp4
2 - 6 - 2-6 Link Layer Overview (04_14).srt
2 - 7 - 2-7 Framing (1126).mp4
2 - 7 - 2-7 Framing (11_26).srt
2 - 8 - 2-8 Error Overview (1745).mp4
2 - 8 - 2-8 Error Overview (17_45).srt
2 - 9 - 2-9 Error Detection (2317).mp4
2 - 9 - 2-9 Error Detection (23_17).srt
2 - 10 - 2-10 Error Correction (1928).mp4
2 - 10 - 2-10 Error Correction (19_28).srt
I want to rename the mp4 files to match the srt files so that VLC can automatically load the subtitles when I play the videos. Would someone discuss algorithms to do this? You can also provide solution code in any language, as I am familiar with many programming languages, but Python and C++ are preferable.
Edit:
Thanks to everyone who replied. I know it is easier to rename the srt files than the other way around. But I think it will be more interesting to rename the mp4 files. Any suggestions?
Here's a quick solution in Python.
The job is simple if you make the following assumptions:
- all files are in the same folder
- you have the same number of srt and mp4 files in the directory
- all srt files are ordered alphabetically, and all mp4 files are ordered alphabetically
Note that I do not assume anything about the actual names (e.g. that you only need to remove underscores).
So you don't need any special logic for matching the files; just go one by one.
import os
import re
import sys
from glob import glob

def mv(src, dest):
    print('mv "%s" "%s"' % (src, dest))
    #os.rename(src, dest)  # uncomment this to actually rename the files

directory = sys.argv[1]
vid_files = sorted(glob(os.path.join(directory, '*.mp4')))
sub_files = sorted(glob(os.path.join(directory, '*.srt')))
assert len(sub_files) == len(vid_files), "lists of different lengths"

# The sorted lists line up pairwise, so derive each video's new name
# from the matching subtitle's name.
for vidf, subf in zip(vid_files, sub_files):
    new_vidf = re.sub(r'\.srt$', '.mp4', subf)
    if vidf == new_vidf:
        print('%s OK' % (vidf,))
        continue
    mv(vidf, new_vidf)
Again, this is just a quick script. Suggested improvements:
- support different file extensions
- use a better CLI, e.g. argparse
- support taking multiple directories
- support test mode (don't actually rename the files)
- better error reporting (instead of using assert)
- more advanced: support undoing
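Assuming the script above is saved as rename_videos.py (a hypothetical name) and given the course directory as its only argument, it currently acts as a dry run, printing the planned renames, because the os.rename call is commented out:
python rename_videos.py "/path/to/course"
(the path is a placeholder for wherever you keep the downloads)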
If all those files really follow this scheme, the Python implementation is almost trivial:
import glob
import os

for subfile in glob.glob("*.srt"):
    os.rename(subfile, subfile.replace("_", ""))
If your mp4 files also contain underscores, you want to add an additional loop for them.
for f in *.srt; do mv "$f" "${f//_/}"; done
(The quotes matter because the names contain spaces, and ${f//_/} removes every underscore; ${f%_} would only strip a single trailing one.)
This is just an implementation of what Zeta said :)
import os
from path import path  # third-party "path.py" module

for filename in os.listdir('.'):
    extension = os.path.splitext(path(filename).abspath())[1][1:]
    if extension == 'srt':
        new_name = filename.replace('_', '')
        print('Changing\t' + filename + ' to\t' + new_name)
        os.rename(filename, new_name)
print('Done!')