Aerospike losing documents when node goes down - amazon-web-services

I've been doing some tests using Aerospike and I noticed a behavior different from what is advertised.
I have a cluster of 4 nodes running on AWS in the same AZ. The instances are t2.micro (1 CPU, 1 GB RAM, 25 GB SSD) running Amazon Linux with the Aerospike AMI.
aerospike.conf:
heartbeat {
    mode mesh
    port 3002
    mesh-seed-address-port XXX.XX.XXX.164 3002
    mesh-seed-address-port XXX.XX.XXX.167 3002
    mesh-seed-address-port XXX.XX.XXX.165 3002
    # internal AWS IPs
...
namespace teste2 {
    replication-factor 2
    memory-size 650M
    default-ttl 365d
    storage-engine device {
        file /opt/aerospike/data/bar.dat
        filesize 22G
        data-in-memory false
    }
}
What I did was a test to see whether I would lose documents when a node goes down. For that I wrote a little Python script:
from __future__ import print_function
import aerospike
import time
import sys

config = {
    'hosts': [('XX.XX.XX.XX', 3000), ('XX.XX.XX.XX', 3000),
              ('XX.XX.XX.XX', 3000), ('XX.XX.XX.XX', 3000)]
}  # external AWS IPs
client = aerospike.client(config).connect()
for i in range(1, 10000):
    key = ('teste2', 'setTest3', ''.join(('p', str(i))))
    try:
        client.put(key, {'id11': i})
        print(i)
    except Exception as e:
        print("error: {0}".format(e), file=sys.stderr)
    time.sleep(1)
I used this code just to insert a sequence of integers that I could check afterwards. I ran it, and after a few seconds I stopped the Aerospike service on one node for 10 seconds, using sudo service aerospike stop and then sudo service aerospike coldstart to restart it.
I waited a few seconds until the nodes finished all the migrations and then executed the following Python script:
import numpy as np
import pandas as pd

query = client.query('teste2', 'setTest3')
query.select('id11')
te = []

def save_result(result_tuple):
    key, metadata, record = result_tuple
    te.append(record)

query.foreach(save_result)
d = pd.DataFrame(te)
d2 = d.sort_values(by='id11')
te2 = np.array(d2.id11)
for i in range(1, len(te2)):
    if te2[i] != te2[i - 1] + 1:
        print('no %d' % int(te2[i - 1] + 1))
print(te2)
And got as response:
no 3
no 6
no 8
no 11
no 13
no 17
no 20
no 22
no 24
no 26
no 30
no 34
no 39
no 41
no 48
no 53
[ 1 2 5 7 10 12 16 19 21 23 25 27 28 29 33 35 36 37 38 40 43 44 45 46 47 51 52 54]
Is my cluster configured wrong, or is this normal?
PS: I tried to include as much information as I could; if you suggest anything else to include, I would appreciate it.

Actually I found a solution, and it is pretty simple and, to be honest, a little foolish.
In the configuration file we have some parameters for network communication between nodes, such as:
interval 150 # Number of milliseconds between heartbeats
timeout 10 # Number of heartbeat intervals to wait
# before timing out a node
These two parameters set the time it takes the cluster to realize a node is down and out of the cluster (in this case 150 ms × 10 = 1.5 s).
What we found useful was to tune the write policies on the client to work along with these parameters.
Depending on the client, you will have policies such as the number of retries before the operation fails, the operation timeout, and the time between retries.
You just need to adapt the client parameters. For example: set the number of retries to 4 (each executed after 500 ms) and the timeout to 2 s. That way the client will recognize that the node is down and redirect the operation to another node.
This setup can put significant extra load on the cluster, but it worked for us.
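As a sketch, those client settings can be expressed with the Python client's policy dictionaries. The key names (max_retries, sleep_between_retries, total_timeout) follow the aerospike Python client; the host address is a placeholder and the exact values are assumptions to tune for your cluster:

```python
# Hedged sketch: a write policy tuned to the ~1.5 s heartbeat failure
# window described above. The host address is a placeholder.
write_policy = {
    'max_retries': 4,              # retry up to 4 times
    'sleep_between_retries': 500,  # wait 500 ms between attempts
    'total_timeout': 2000,         # fail the operation after 2 s overall
}
config = {
    'hosts': [('10.0.0.1', 3000)],
    'policies': {'write': write_policy},
}
# client = aerospike.client(config).connect()  # then client.put() as in the test script
```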

Related

Fetching TMYs from the NSRDB with a 5-minute interval size

I am using pvlib (https://pvlib-python.readthedocs.io/en/v0.9.0/generated/pvlib.iotools.get_psm3.html) to retrieve NSRDB PSM3 time-series data from the PSM3 API. I was able to retrieve data at a 30-minute interval, but changing the interval to 5 minutes doesn't work. Any help?
my_metadata, my_data = pvlib.iotools.get_psm3(
    latitude=xxx, longitude=xxx,
    interval=5,
    api_key=My_key, email='xxx',  # <-- any email works fine here
    # names='tmy'
    names='xxx')
Perhaps you're using an older version of pvlib? I would try updating to the latest version with pip install -U pvlib or similar. You can check your installed version with print(pvlib.__version__).
This snippet successfully retrieves 5-minute data for me with pvlib 0.9.1:
import pvlib

lat, lon = 40, -80
key = '<redacted>'
email = '<redacted>'
df, metadata = pvlib.iotools.get_psm3(lat, lon, key, email, names='2020',
                                      interval=5, map_variables=True)
It's worth pointing out that TMYs are always hourly; the 5-minute data is only for single (calendar) years like the above 2020 request. The get_psm3 docstring mentions this:
interval : int, {60, 5, 15, 30}
interval size in minutes, must be 5, 15, 30 or 60. Only used for
single-year requests (i.e., it is ignored for tmy/tgy/tdy requests).

Run a Celery task for a maximum of 6 hours; if it takes more than 6 hours, rerun the same task?

I have a Django project using Celery + RabbitMQ. Some of my tasks take 6 hours or more, or even get stuck, so I want to re-run a task if it takes more than 6 hours. How do I do that? I'm new to Celery.
You could try:
from celery.exceptions import SoftTimeLimitExceeded

@celery.task(soft_time_limit=60 * 60 * 6)  # <--- set the soft time limit to 6 hours
def mytask():
    try:
        return do_work()
    except SoftTimeLimitExceeded:
        mytask.retry()  # <--- retry the task once the 6-hour limit is exceeded

How do I run my Python application three times daily

I have a Python application which creates JSON files. I need to execute the application three times per day: at midnight, noon, and evening.
import json

def add():
    a = 10
    b = 20
    c = a + b
    with open("add.json", "w") as f:
        f.write(str(c))

def subtract():
    a = 10
    b = 20
    c = b - a
    with open("subtract.json", "w") as f:
        f.write(str(c))

add()
subtract()
I need to run the application at the specified times automatically.
You have two possibilities:
1- Let your code run continuously with some timer.
2- Create three scheduled tasks: on Windows, use the AT command; on Linux, use cron, which seems to be the best solution in your case.
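For the cron route, a sketch of the three crontab entries (the interpreter and the script path are placeholders to adapt; "evening" is assumed to mean 6 pm):

```shell
# added via `crontab -e`
# m  h   dom mon dow  command
0    0   *   *   *    /usr/bin/python3 /path/to/app.py   # midnight
0    12  *   *   *    /usr/bin/python3 /path/to/app.py   # noon
0    18  *   *   *    /usr/bin/python3 /path/to/app.py   # evening (6 pm)
```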

How do I set the column width of a pexpect ssh session?

I am writing a simple Python script to connect to a SAN via SSH and run a set of commands. Ultimately each command's output will be logged to a separate log along with a timestamp, and then the script will exit. This is because the device we are connecting to doesn't support certificate-based SSH connections and doesn't have decent logging capabilities on its current firmware revision.
The issue that I seem to be running into is that the SSH session that is created seems to be limited to 78 characters wide. The results generated from each command are significantly wider - 155 characters. This is causing a bunch of funkiness.
First, the results in their current state are significantly more difficult to parse. Second, because the buffer is significantly smaller, the final volume command won't execute properly because the pexpect-launched SSH session actually gets prompted to "press any key to continue".
How do I change the column width of the pexpect session?
Here is the current code (it works but is incomplete):
#!/usr/bin/python
import pexpect
import time

PASS = 'mypassword'
HOST = '1.2.3.4'
LOGIN_COMMAND = 'ssh manage@' + HOST
CTL_COMMAND = 'show controller-statistics'
VDISK_COMMAND = 'show vdisk-statistics'
VOL_COMMAND = 'show volume-statistics'
VDISK_LOG = 'vdisk.log'
VOLUME_LOG = 'volume.log'
CONTROLLER_LOG = 'controller.log'
DATE = time.strftime('%Y%m%d%H%M%S')  # os.system() returns the exit status, not the output

child = pexpect.spawn(LOGIN_COMMAND)
child.setecho(True)
child.logfile = open('FetchSan.log', 'w+')
child.expect('Password: ')
child.sendline(PASS)
child.expect('# ')
child.sendline(CTL_COMMAND)
print child.before
child.expect('# ')
child.sendline(VDISK_COMMAND)
print child.before
child.expect('# ')
print "Sending " + VOL_COMMAND
child.sendline(VOL_COMMAND)
print child.before
child.expect('# ')
child.sendline('exit')
child.expect(pexpect.EOF)
print child.before
The output expected:
# show controller-statistics
Durable ID CPU Load Power On Time (Secs) Bytes per second IOPS Number of Reads Number of Writes Data Read Data Written
---------------------------------------------------------------------------------------------------------------------------------------------------------
controller_A 0 45963169 1573.3KB 67 386769785 514179976 6687.8GB 5750.6GB
controller_B 20 45963088 4627.4KB 421 3208370173 587661282 63.9TB 5211.2GB
---------------------------------------------------------------------------------------------------------------------------------------------------------
Success: Command completed successfully.
# show vdisk-statistics
Name Serial Number Bytes per second IOPS Number of Reads Number of Writes Data Read Data Written
------------------------------------------------------------------------------------------------------------------------------------------------
CRS 00c0ff13349e000006d5c44f00000000 0B 0 45861 26756 3233.0MB 106.2MB
DATA 00c0ff1311f300006dd7c44f00000000 2282.4KB 164 23229435 76509765 5506.7GB 1605.3GB
DATA1 00c0ff1311f3000087d8c44f00000000 2286.5KB 167 23490851 78314374 5519.0GB 1603.8GB
DATA2 00c0ff1311f30000c2f8ce5700000000 0B 0 26 4 1446.9KB 65.5KB
FRA 00c0ff13349e000001d8c44f00000000 654.8KB 5 3049980 15317236 1187.3GB 1942.1GB
FRA1 00c0ff13349e000007d9c44f00000000 778.7KB 6 3016569 15234734 1179.3GB 1940.4GB
------------------------------------------------------------------------------------------------------------------------------------------------
Success: Command completed successfully.
# show volume-statistics
Name Serial Number Bytes per second IOPS Number of Reads Number of Writes Data Read Data Written
-----------------------------------------------------------------------------------------------------------------------------------------------------
CRS_v001 00c0ff13349e0000fdd6c44f01000000 14.8KB 5 239611146 107147564 1321.1GB 110.5GB
DATA1_v001 00c0ff1311f30000d0d8c44f01000000 2402.8KB 218 1701488316 336678620 33.9TB 3184.6GB
DATA2_v001 00c0ff1311f3000040f9ce5701000000 0B 0 921 15 2273.7KB 2114.0KB
DATA_v001 00c0ff1311f30000bdd7c44f01000000 2303.4KB 209 1506883611 250984824 30.0TB 2026.6GB
FRA1_v001 00c0ff13349e00001ed9c44f01000000 709.1KB 28 25123082 161710495 1891.0GB 2230.0GB
FRA_v001 00c0ff13349e00001fd8c44f01000000 793.0KB 34 122052720 245322281 3475.7GB 3410.0GB
-----------------------------------------------------------------------------------------------------------------------------------------------------
Success: Command completed successfully.
The output as printed to the terminal (as mentioned, the 3rd command won't execute in its current state):
show controller-statistics
Durable ID CPU Load Power On Time (Secs) Bytes per second
IOPS Number of Reads Number of Writes Data Read
Data Written
----------------------------------------------------------------------
controller_A 3 45962495 3803.1KB
73 386765821 514137947 6687.8GB
5748.9GB
controller_B 20 45962413 5000.7KB
415 3208317860 587434274 63.9TB
5208.8GB
----------------------------------------------------------------------
Success: Command completed successfully.
Sending show volume-statistics
show vdisk-statistics
Name Serial Number Bytes per second IOPS
Number of Reads Number of Writes Data Read Data Written
----------------------------------------------------------------------------
CRS 00c0ff13349e000006d5c44f00000000 0B 0
45861 26756 3233.0MB 106.2MB
DATA 00c0ff1311f300006dd7c44f00000000 2187.2KB 152
23220764 76411017 5506.3GB 1604.1GB
DATA1 00c0ff1311f3000087d8c44f00000000 2295.2KB 154
23481442 78215540 5518.5GB 1602.6GB
DATA2 00c0ff1311f30000c2f8ce5700000000 0B 0
26 4 1446.9KB 65.5KB
FRA 00c0ff13349e000001d8c44f00000000 1829.3KB 14
3049951 15310681 1187.3GB 1941.2GB
FRA1 00c0ff13349e000007d9c44f00000000 1872.8KB 14
3016521 15228157 1179.3GB 1939.5GB
----------------------------------------------------------------------------
Success: Command completed successfully.
Traceback (most recent call last):
File "./fetchSAN.py", line 34, in <module>
child.expect('# ')
File "/Library/Python/2.7/site-packages/pexpect-4.2.1-py2.7.egg/pexpect/spawnbase.py", line 321, in expect
timeout, searchwindowsize, async)
File "/Library/Python/2.7/site-packages/pexpect-4.2.1-py2.7.egg/pexpect/spawnbase.py", line 345, in expect_list
return exp.expect_loop(timeout)
File "/Library/Python/2.7/site-packages/pexpect-4.2.1-py2.7.egg/pexpect/expect.py", line 107, in expect_loop
return self.timeout(e)
File "/Library/Python/2.7/site-packages/pexpect-4.2.1-py2.7.egg/pexpect/expect.py", line 70, in timeout
raise TIMEOUT(msg)
pexpect.exceptions.TIMEOUT: Timeout exceeded.
<pexpect.pty_spawn.spawn object at 0x105333910>
command: /usr/bin/ssh
args: ['/usr/bin/ssh', 'manage@10.254.27.49']
buffer (last 100 chars): '-------------------------------------------------------------\r\nPress any key to continue (Q to quit)'
before (last 100 chars): '-------------------------------------------------------------\r\nPress any key to continue (Q to quit)'
after: <class 'pexpect.exceptions.TIMEOUT'>
match: None
match_index: None
exitstatus: None
flag_eof: False
pid: 19519
child_fd: 5
closed: False
timeout: 30
delimiter: <class 'pexpect.exceptions.EOF'>
logfile: <open file 'FetchSan.log', mode 'w+' at 0x1053321e0>
logfile_read: None
logfile_send: None
maxread: 2000
ignorecase: False
searchwindowsize: None
delaybeforesend: 0.05
delayafterclose: 0.1
delayafterterminate: 0.1
searcher: searcher_re:
0: re.compile("# ")
And here is what is captured in the log:
Password: mypassword
HP StorageWorks MSA Storage P2000 G3 FC
System Name: Uninitialized Name
System Location:Uninitialized Location
Version:TS230P008
# show controller-statistics
show controller-statistics
Durable ID CPU Load Power On Time (Secs) Bytes per second
IOPS Number of Reads Number of Writes Data Read
Data Written
----------------------------------------------------------------------
controller_A 3 45962495 3803.1KB
73 386765821 514137947 6687.8GB
5748.9GB
controller_B 20 45962413 5000.7KB
415 3208317860 587434274 63.9TB
5208.8GB
----------------------------------------------------------------------
Success: Command completed successfully.
# show vdisk-statistics
show vdisk-statistics
Name Serial Number Bytes per second IOPS
Number of Reads Number of Writes Data Read Data Written
----------------------------------------------------------------------------
CRS 00c0ff13349e000006d5c44f00000000 0B 0
45861 26756 3233.0MB 106.2MB
DATA 00c0ff1311f300006dd7c44f00000000 2187.2KB 152
23220764 76411017 5506.3GB 1604.1GB
DATA1 00c0ff1311f3000087d8c44f00000000 2295.2KB 154
23481442 78215540 5518.5GB 1602.6GB
DATA2 00c0ff1311f30000c2f8ce5700000000 0B 0
26 4 1446.9KB 65.5KB
FRA 00c0ff13349e000001d8c44f00000000 1829.3KB 14
3049951 15310681 1187.3GB 1941.2GB
FRA1 00c0ff13349e000007d9c44f00000000 1872.8KB 14
3016521 15228157 1179.3GB 1939.5GB
----------------------------------------------------------------------------
Success: Command completed successfully.
# show volume-statistics
show volume-statistics
Name Serial Number Bytes per second
IOPS Number of Reads Number of Writes Data Read
Data Written
----------------------------------------------------------------------
CRS_v001 00c0ff13349e0000fdd6c44f01000000 11.7KB
5 239609039 107145979 1321.0GB
110.5GB
DATA1_v001 00c0ff1311f30000d0d8c44f01000000 2604.5KB
209 1701459941 336563041 33.9TB
3183.3GB
DATA2_v001 00c0ff1311f3000040f9ce5701000000 0B
0 921 15 2273.7KB
2114.0KB
DATA_v001 00c0ff1311f30000bdd7c44f01000000 2382.8KB
194 1506859273 250871273 30.0TB
2025.4GB
FRA1_v001 00c0ff13349e00001ed9c44f01000000 1923.5KB
31 25123006 161690520 1891.0GB
2229.1GB
FRA_v001 00c0ff13349e00001fd8c44f01000000 2008.5KB
37 122050872 245301514 3475.7GB
3409.1GB
----------------------------------------------------------------------
Press any key to continue (Q to quit)%
As a starting point: According to the manual, that SAN has a command to disable the pager. See the documentation for set cli-parameters pager off. It may be sufficient to execute that command. It may also have a command to set the terminal rows and columns that it uses for formatting output, although I wasn't able to find one.
Getting to your question: When an ssh client connects to a server and requests an interactive session, it can optionally request a PTY (pseudo-tty) for the server side of the session. When it does that, it informs the server of the lines, columns, and terminal type which the server should use for the TTY. Your SAN may honor PTY requests and use the lines and columns values to format its output. Or it may not.
The ssh client gets the rows and columns for the PTY request from the TTY for its standard input. This is the PTY which pexpect is using to communicate with ssh.
This question discusses how to set the terminal size for a pexpect session. ssh doesn't honor the LINES or COLUMNS environment variables as far as I can tell, so I doubt that would work. However, calling child.setwinsize() after spawning ssh ought to work:
child = pexpect.spawn(cmd)
child.setwinsize(400,400)
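For reference, setwinsize() boils down to a TIOCSWINSZ ioctl on the PTY that pexpect controls, and that size is what ssh passes along when it requests a PTY from the server. A minimal stdlib-only sketch of the same mechanism (the 50×200 size is an arbitrary example):

```python
import fcntl
import pty
import struct
import termios

# Open a PTY pair and set its window size via TIOCSWINSZ, which is
# what pexpect's child.setwinsize(rows, cols) does under the hood.
master, slave = pty.openpty()
winsize = struct.pack('HHHH', 50, 200, 0, 0)   # rows, cols, xpixel, ypixel
fcntl.ioctl(master, termios.TIOCSWINSZ, winsize)

# Read the size back from the child-facing side to confirm it took effect.
raw = fcntl.ioctl(slave, termios.TIOCGWINSZ, b'\x00' * 8)
rows, cols, _, _ = struct.unpack('HHHH', raw)
print(rows, cols)  # → 50 200
```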
If you have trouble with this, you could try setting the terminal size by invoking stty locally before ssh:
child = pexpect.spawn('stty rows x cols y; ssh user@host')
Finally, you need to make sure that ssh actually requests a PTY for the session. It does this by default in some cases, which should include the way you are running it. But it has a command-line option -tt to force it to allocate a PTY. You could add that option to the ssh command line to make sure:
child = pexpect.spawn('ssh -tt user@host')
or
child = pexpect.spawn('stty rows x cols y; ssh -tt user@host')

How to create a DS to store the accumulated result of another DS with rrdtool

I want to create an RRD file with two data sources included. One stores the original value of the data; call it 'dc'. The other stores the accumulated result of 'dc'; call it 'total'. The expected formula is current(total) = previous(total) + current(dc). For example, if I update the data sequence (2, 3, 5, 4, 9) to the RRD file, I want 'dc' to be (2, 3, 5, 4, 9) and 'total' to be (2, 5, 15, 19, 28).
I tried to create the RRD file with the command line below. The command fails and says that PREV is not supported with a COMPUTE DS.
rrdtool create test.rrd --start 920804700 --step 300 \
DS:dc:GAUGE:600:0:U \
DS:total:COMPUTE:PREV,dc,ADDNAN \
RRA:AVERAGE:0.5:1:1200 \
RRA:MIN:0.5:12:2400 \
RRA:MAX:0.5:12:2400 \
RRA:AVERAGE:0.5:12:2400
Is there an alternative way to define the DS 'total' (DS:total:COMPUTE:PREV,dc,ADDNAN)?
rrdtool does not store 'original' values ... rather, it resamples the signal you provide via the update command at the rate you defined when you set up the database ... in your case 1/300 Hz.
That said, a total does not make much sense ...
What you can do with a single DS, though, is build the average value over a time range, multiply the result by the number of seconds in the time range, and thus arrive at the 'total'.
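A sketch of that approach at graph time (hedged: it assumes the test.rrd layout from the question; VDEF's TOTAL operator integrates the rate over the graphed range, i.e. average rate × seconds):

```shell
# Compute the accumulated 'total' when graphing instead of storing it
# in a COMPUTE DS: TOTAL sums dc's rate times the step width over the
# graphed time range.
rrdtool graph total.png --start end-1d --end now \
    DEF:dc=test.rrd:dc:AVERAGE \
    VDEF:total=dc,TOTAL \
    GPRINT:total:"accumulated total\: %6.2lf" \
    LINE1:dc#0000FF:dc
```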
Sorry, a bit late, but this may be helpful for someone else.
It is better to use RRDtool's mrtg-traffic-sum package: with an RRD using a GAUGE DS and LAST RRAs, it allows me to collect monthly traffic volumes and quota limits.
eg: Here is a basic Traffic chart with no traffic quota.
root@server:~# /usr/bin/mrtg-traffic-sum --range=current --units=MB /etc/mrtg/R4.cfg
Subject: Traffic total for '/etc/mrtg/R4.cfg' (1.9) 2022/02
Start: Tue Feb 1 01:00:00 2022
End: Tue Mar 1 00:59:59 2022
Interface In+Out in MB
------------------------------------------------------------------------------
eth0 0
eth1 14026
eth2 5441
eth3 0
eth4 15374
switch0.5 12024
switch0.19 151
switch0.49 1
switch0.51 0
switch0.92 2116
root@server:~#
From this you can then write a script that generates a new RRD which stores these values, and presto: you have a traffic volume / quota graph.
Example fixed traffic volume chart using GAUGE
This thread reminded me to fix this collector, which had stopped; I just got around to posting ;)