AWS Elasticsearch free storage drops abruptly - amazon-web-services

We have a social-like app and recently started using the AWS Elasticsearch Service in production, but we have run into a problem with it. The ES version is 2.3.
The cluster configuration is:
Data node: 2
Data node types: m3.medium.elasticsearch
Dedicated master instance count: 3
Dedicated master instance type: t2.small.elasticsearch.
Capacity of each data node: 50GB.
The problem is that in less than thirty minutes the free storage on one of the nodes went from 9 GB to 0 GB, and we do not know how this happened.
We have 4 types of documents, and one of them is a dynamic type; let's call it the Group type. It is dynamic because every Group document can have N fields that represent the friends of a Group.
Something like:
{
  "13": [1,2,3,4],
  "5": [1,3,4],
  "user_ids": [1,2,3,4,6,7],
  "id": 1
}
This means that the users with IDs 13 and 5 are friends with some of the users of the Group with ID 1.
So this document can grow with the number of users.
If anyone has had the same problem, or simply understands the Elasticsearch architecture well, any help would be appreciated.
Indices info:
curl -XGET 'http://host/_cat/indices?v'
health status index     pri rep docs.count docs.deleted store.size pri.store.size
green  open   .kibana-4   1   1          5            0      1.9mb       1017.3kb
green  open   X           1   1    2259502        29575     57.5gb         28.7gb
green  open   Y           1   1     113156            0     21.7mb         10.8mb
curl -XGET 'http://host/_cat/nodes?v&h=host,id,ip,rp,hp,d,cpu,v,r,m,n'
host    id   ip      rp hp d      cpu v     r m n
x.x.x.x tIgm x.x.x.x 95  5  5.7gb   0 2.3.2 - m Shatter
x.x.x.x puUF x.x.x.x 95  6  5.7gb   0 2.3.2 - m Justice
x.x.x.x 1qZi x.x.x.x 97 54 17.7gb   7 2.3.2 d - Allatou
x.x.x.x lcty x.x.x.x 97 60 17.7gb   8 2.3.2 d - Amergin
x.x.x.x Nq1H x.x.x.x  5 15  5.7gb   0 2.3.2 - * Arkus
Thanks a lot!

I managed to resolve the problem.
My problem is known as a mapping explosion.
Having variable keys in the mapping, as I had in the Group document type, adds a new field to the mapping for every new key and results in an ever-growing index.
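The answer does not say how the document was restructured. One common way to avoid variable keys is to move the user ID out of the key and into a value, so every document carries the same field names. The sketch below assumes that approach; the field names friends, user_id and friend_ids are hypothetical and not taken from the original post.

```python
def to_fixed_schema(group):
    """Move per-user keys into values so every document has the same field names."""
    fixed_keys = {"id", "user_ids"}
    # Each variable key (a user ID) becomes one entry in a single "friends" list.
    friends = [
        {"user_id": int(key), "friend_ids": list(value)}
        for key, value in group.items()
        if key not in fixed_keys
    ]
    friends.sort(key=lambda entry: entry["user_id"])
    return {
        "id": group["id"],
        "user_ids": list(group["user_ids"]),
        "friends": friends,  # hypothetical field name
    }


# The Group document from the question, with keys written as JSON strings.
group = {"13": [1, 2, 3, 4], "5": [1, 3, 4], "user_ids": [1, 2, 3, 4, 6, 7], "id": 1}
fixed = to_fixed_schema(group)
print(fixed)
```

With this shape the mapping only ever contains id, user_ids and the fields under friends, however many users a Group has. If queries need to match user_id and friend_ids within the same entry, the friends field would likely need the nested type in Elasticsearch.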


Weka Question: Which cluster do these Iris attributes belong to?

I'm totally new to Weka and data science. I got an assignment to detect which of the following Iris attributes (SW, SL, PW, PL) belongs to which cluster. Can you assist me? Thanks!
The iris dataset that comes with Weka has three classes (Iris-setosa, Iris-versicolor, Iris-virginica).
If you want to see how well clusters determined by your cluster algorithm align with the class labels, you need to select Classes to clusters evaluation in the Weka Explorer or via the -c <class_att_index> option on the command-line.
The following command uses SimpleKMeans with three clusters on the iris dataset that comes with Weka (-c last uses the last attribute as the class and performs the classes to clusters evaluation):
java -cp weka.jar weka.clusterers.SimpleKMeans -N 3 -c last -t data/iris.arff
Which will result in this output:
=== Clustering stats for training data ===
kMeans
======
Number of iterations: 6
Within cluster sum of squared errors: 6.998114004826762
Initial starting points (random):
Cluster 0: 6.1,2.9,4.7,1.4
Cluster 1: 6.2,2.9,4.3,1.3
Cluster 2: 6.9,3.1,5.1,2.3
Missing values globally replaced with mean/mode
Final cluster centroids:
                         Cluster#
Attribute    Full Data          0          1          2
               (150.0)     (61.0)     (50.0)     (39.0)
=========================================================
sepallength     5.8433     5.8885      5.006     6.8462
sepalwidth       3.054     2.7377      3.418     3.0821
petallength     3.7587     4.3967      1.464     5.7026
petalwidth      1.1987      1.418      0.244     2.0795
Clustered Instances
0 61 ( 41%)
1 50 ( 33%)
2 39 ( 26%)
Class attribute: class
Classes to Clusters:
  0  1  2  <-- assigned to cluster
  0 50  0 | Iris-setosa
 47  0  3 | Iris-versicolor
 14  0 36 | Iris-virginica
Cluster 0 <-- Iris-versicolor
Cluster 1 <-- Iris-setosa
Cluster 2 <-- Iris-virginica
Incorrectly clustered instances : 17.0 11.3333 %
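
To see where the 17 incorrectly clustered instances come from, the classes-to-clusters table can be re-evaluated by hand. The sketch below is only an illustration of the idea, not Weka's implementation: it tries every one-to-one mapping of clusters to classes and keeps the one with the most matching instances. The function name best_assignment is made up for this example.

```python
from itertools import permutations

# Rows: Iris-setosa, Iris-versicolor, Iris-virginica; columns: clusters 0, 1, 2.
counts = [
    [0, 50, 0],
    [47, 0, 3],
    [14, 0, 36],
]
classes = ["Iris-setosa", "Iris-versicolor", "Iris-virginica"]


def best_assignment(matrix):
    """Try every one-to-one cluster-to-class mapping; keep the one with most matches."""
    n = len(matrix)
    # perm[c] is the class index assigned to cluster c.
    best = max(
        permutations(range(n)),
        key=lambda perm: sum(matrix[perm[c]][c] for c in range(n)),
    )
    correct = sum(matrix[best[c]][c] for c in range(n))
    return best, correct


mapping, correct = best_assignment(counts)
total = sum(map(sum, counts))
incorrect = total - correct
for cluster, cls in enumerate(mapping):
    print(f"Cluster {cluster} <-- {classes[cls]}")
print(f"Incorrectly clustered instances: {incorrect} ({100 * incorrect / total:.4f} %)")
```

The 3 versicolor instances placed in cluster 2 and the 14 virginica instances placed in cluster 0 are the ones counted as incorrect.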

PowerBI Hierarchy RANKX Against Everyone at Each Level

I'm having a lot of trouble figuring out how to get RANKX to behave at different levels of a hierarchy.
I have a hierarchy structure as follows:
Region
Manager
Supervisor
Agent
For simplicity's sake, let's rank each level on the total number of Requests Handled. I want to rank each level of the hierarchy against everyone at that level. For example, at the Agent level, each agent's total Requests Handled should be ranked against every other agent, regardless of which supervisor, manager, or region they are in.
I can get the agent level to work just fine with the following, but I can't figure out how to get the upper levels to rank against each other. The same RANKX statement works at any level, but only when a single level is shown in the visual. Adding any additional level breaks it.
Requests Handled Measure =
SUMX('Work Done', [Requests Handled])

Rank Measure =
IF(
    ISINSCOPE(Roster[Associate Name]) && NOT(ISBLANK([Requests Handled Measure])),
    RANKX(
        ALL(Roster[Associate Name], Roster[Supervisor Name], Roster[Manager Name], Roster[Region]),
        [Requests Handled Measure],, DESC, Dense
    )
)
The ideal result would be something like:
- Region 1 240 Requests Rank 2
  - Manager A 122 Requests Rank 2
    - Supervisor A 65 Requests Rank 3
      - Agent A 30 Requests Rank 9
      - Agent B 35 Requests Rank 5
    - Supervisor B 57 Requests Rank 4
      - Agent C 29 Requests Rank 10
      - Agent D 28 Requests Rank 11
  - Manager B 118 Requests Rank 3
    - Supervisor C 65 Requests Rank 3
      - Agent E 33 Requests Rank 6
      - Agent F 32 Requests Rank 7
    - Supervisor D 53 Requests Rank 6
      - Agent G 26 Requests Rank 13
      - Agent H 27 Requests Rank 12
- Region 2 250 Requests Rank 1
  - Manager C 99 Requests Rank 4
    - Supervisor E 56 Requests Rank 5
      - Agent I 25 Requests Rank 14
      - Agent J 31 Requests Rank 8
    - Supervisor F 43 Requests Rank 7
      - Agent K 20 Requests Rank 16
      - Agent L 23 Requests Rank 15
  - Manager D 151 Requests Rank 1
    - Supervisor G 78 Requests Rank 1
      - Agent M 40 Requests Rank 1
      - Agent N 38 Requests Rank 2
    - Supervisor H 73 Requests Rank 2
      - Agent O 36 Requests Rank 4
      - Agent P 37 Requests Rank 3
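
To make the target ranking explicit, here is a small Python sketch (not DAX, and not a fix for the measure) of what "dense rank against everyone at the same level" means for the Supervisor level in the list above. The helper name dense_rank_desc is made up for this illustration.

```python
def dense_rank_desc(totals):
    """Return {name: rank}: highest total gets rank 1, ties share a rank, no gaps."""
    distinct = sorted(set(totals.values()), reverse=True)
    rank_of = {value: position for position, value in enumerate(distinct, start=1)}
    return {name: rank_of[value] for name, value in totals.items()}


# Supervisor totals from the ideal result, pooled across all managers and regions.
supervisors = {
    "Supervisor A": 65, "Supervisor B": 57, "Supervisor C": 65, "Supervisor D": 53,
    "Supervisor E": 56, "Supervisor F": 43, "Supervisor G": 78, "Supervisor H": 73,
}
ranks = dense_rank_desc(supervisors)
for name in sorted(ranks, key=ranks.get):
    print(name, supervisors[name], ranks[name])
```

Supervisors A and C both total 65 and share rank 3, and Supervisor B follows at rank 4 with no gap. The measure would need to reproduce this behavior at each level.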

Amazon QuickSight - Working out size of network

I have a database table with one record for each connected IoT device. Each device has a unique device id and a network id associated with it.
For example:
device_id  network_id
1          1
2          1
3          1
4          2
5          2
6          3
7          3
8          3
9          3
10         4
I would like to be able to visualise the size of each network based on its id. Based on the above data, the output would look like this:
network_id  size
1           3
2           2
3           4
4           1
I'm not currently sure how to do this.
I found that using the countOver function worked for this.
I made a calculated field called NetworkSize, defined as:
countOver
(
  {device_id},
  [{network_id}]
)
This gives the output I was looking for.
However, I have to include device_id in the visual, which is a bit inconvenient.
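
As a rough illustration of what the calculated field computes, the Python sketch below counts devices per network_id and then attaches that count to every row. This is only a model of the partitioned count as I understand it, not QuickSight code; the variable names are made up for the example.

```python
from collections import Counter

# (device_id, network_id) rows from the example table.
rows = [(1, 1), (2, 1), (3, 1), (4, 2), (5, 2), (6, 3), (7, 3), (8, 3), (9, 3), (10, 4)]

# Count devices within each network_id partition.
network_size = Counter(network_id for _device_id, network_id in rows)

# A partitioned count attaches the partition's size to every device row.
per_row = [(device_id, network_id, network_size[network_id]) for device_id, network_id in rows]

# Grouping by network_id alone gives the one-row-per-network output.
for network_id, size in sorted(network_size.items()):
    print(network_id, size)
```

The grouped result matches the desired network_id and size table, while per_row shows the same size repeated on each device row.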

Apply character set in regex lookbehind

I need to get the subnet count for bgp 7029, using a regexp like
(?<=bgp 7029[\s]+\d[\s+])\d
but this doesn't work with a positive lookbehind.
sh ip route vrf vrf-dnoc-mpls-test summary
IP routing table name is vrf-dnoc-mpls-test (0x2)
IP routing table maximum-paths is 32
Route Source Networks Subnets Replicates Overhead Memory (bytes)
static 0 0 0 0 0
connected 0 1 0 60 172
bgp 7029 0 1686 0 101160 289992
External: 0 Internal: 1686 Local: 0
internal 36 73652
Total 36 1687 0 101220 363816
You don't really need a lookbehind; a capture group will work just as well.
bgp[ \t]7029[ \t]+\d+[ \t]+(\d+)
The subnet count is in group 1.
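
The question does not say which regex engine is in use. Many engines, Python's re module among them, only accept fixed-width lookbehind, which would explain the failure. The sketch below uses Python purely as an example engine: it shows the original pattern being rejected and the capture-group pattern applied to the bgp line from the sample output.

```python
import re

line = "bgp 7029 0 1686 0 101160 289992"

# Python's re module rejects variable-width lookbehind, so the original
# attempt fails at compile time rather than simply not matching.
try:
    re.compile(r"(?<=bgp 7029[\s]+\d[\s+])\d")
    lookbehind_ok = True
except re.error:
    lookbehind_ok = False

# The capture-group pattern from the answer: group 1 holds the Subnets column.
match = re.search(r"bgp[ \t]7029[ \t]+\d+[ \t]+(\d+)", line)
subnets = int(match.group(1)) if match else None
print(lookbehind_ok, subnets)
```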

Way to get SCSI disk names in Linux C++ application

In my Linux C++ application I want to get the names of all SCSI disks present on the system, e.g. /dev/sda, /dev/sdb, and so on.
Currently I am getting them from the output of the file /proc/scsi/sg/devices using the code below:
host  chan  SCSI id  lun  type  opens  qdepth  busy  online
0     0     0        0    0     1      128     0     1
1     0     0        0    0     1      128     0     1
1     0     0        1    0     1      128     0     1
1     0     0        2    0     1      128     0     1
// If the SCSI device Id is >= 26, the device name has two letters, e.g. /dev/sdaa or /dev/sdab.
if (MAX_ENG_ALPHABETS <= scsiId)
{
    // Device name order is: aa, ab, ..., az, ba, bb, ..., bz, ..., zy, zz.
    deviceName.append(1, 'a' + (char)(index / MAX_ENG_ALPHABETS) - 1);
    deviceName.append(1, 'a' + (char)(index % MAX_ENG_ALPHABETS));
}
// If the SCSI device Id is < 26, the device name has one letter, e.g. /dev/sda or /dev/sdb.
else
{
    deviceName.append(1, 'a' + index);
}
But the file /proc/scsi/sg/devices also contains information about disks that were previously present on the system. E.g. if I detach the disk (LUN) /dev/sdc from the system, the file still contains the entry for /dev/sdc, which is no longer valid.
Is there a different way to get the SCSI disk names, such as a system call?
Thanks
You can simply read the list of all files matching /dev/sd* (in C, you would need to use opendir/readdir/closedir) and filter it by sdX (where X is one or two letters).
Alternatively, you can get the list of all partitions by reading the single file /proc/partitions and then filtering the 4th field by sdX:
$ cat /proc/partitions
major minor  #blocks  name
    8     0 52428799  sda
    8     1   265041  sda1
    8     2        1  sda2
    8     5  2096451  sda5
    8     6 50066541  sda6
This gives you the list of all physical disks together with their capacity (3rd field).
After getting the disk name list from /proc/scsi/sg/devices, you can verify in code that each disk still exists. For example, install sg3-utils and use sg_inq to query whether the disk is active.
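
The /proc/partitions filtering can be sketched as follows. The example is in Python for brevity and runs against the sample text above. The same split-and-filter logic carries over to C++ with std::ifstream and std::istringstream. The function name scsi_disks and the one-or-two-letter pattern are assumptions made for this sketch.

```python
import re

SAMPLE = """major minor  #blocks  name

   8        0   52428799 sda
   8        1     265041 sda1
   8        2          1 sda2
   8        5    2096451 sda5
   8        6   50066541 sda6
"""

# Whole disks only: "sd" followed by one or two letters and no partition number.
WHOLE_DISK = re.compile(r"^sd[a-z]{1,2}$")


def scsi_disks(partitions_text):
    """Return (device path, block count) for each whole sdX disk in the text."""
    disks = []
    for line in partitions_text.splitlines()[1:]:  # skip the header line
        fields = line.split()
        if len(fields) != 4:  # skips blank or malformed lines
            continue
        _major, _minor, blocks, name = fields
        if WHOLE_DISK.match(name):
            disks.append(("/dev/" + name, int(blocks)))
    return disks


print(scsi_disks(SAMPLE))
# On a live Linux system: scsi_disks(open("/proc/partitions").read())
```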