Redis keys deletion (AWS ElastiCache) - amazon-web-services

I used the script below to delete keys from my Redis node (using the AWS ElastiCache service). The BytesUsedForCache metric dropped from 100 GB to 80 GB, which makes sense since we deleted around 160,000 keys. A few minutes later, however, BytesUsedForCache increased rapidly and hit the maximum (106 GB). Is this because of the delete operation?
Is there any fault with the script?
In addition to the above, after reaching 106 GB it dropped drastically back to 80 GB within a few minutes and stabilized there.
count=0
# Read one key name per line from "filename" and delete it.
while read -r delkeys
do
((count=count+1))
echo "KEYNAME:$delkeys"
redis-cli -h "$REDIS_HOST" -p "$REDIS_PORT" DEL "$delkeys"
# Pause for 5 seconds after every 1000 deletions to avoid hammering the node.
if [[ $count -eq 1000 ]]
then
sleep 5
count=0
fi
done < filename
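For reference, a more compact way to issue the same deletions, as a sketch only: it assumes the same filename and REDIS_HOST/REDIS_PORT variables as above, and that key names contain no spaces. DEL accepts several keys per call, so batching them cuts round trips:
# Delete keys in batches of 100 per DEL call; adjust -n to taste.
xargs -n 100 redis-cli -h "$REDIS_HOST" -p "$REDIS_PORT" DEL < filename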
Engine Version : 2.8.21
Engine : Redis
In addition to the above: the previous day I had read the values of all 160,000 keys using LRANGE "$dumpkeys" 0 -1 in the same script, but did not face any performance issue such as high CPU or RAM utilisation.

Related

In AWS clustered Redis, EngineCPUUtilization of one shard spikes up to 100% during a load test

I have a clustered Redis in AWS with 3 shards and no replica. While running a load test, EngineCPUUtilization of one shard spiked up to 100%, while the other two shards sat at 4% and 5% respectively. Running MONITOR, I got the line below very frequently on the 1st shard, where the count of my GET commands is 9318186 and grep "evalsha" | wc -l returns 18479850:
"evalsha" "0f0f23b9048b36752b5be114f35ish083449f908" "1" "undefined" "4db174ac2c7ah6386687727bccdad00f" 16835506623.1783958 [0 lua] "get" "undefined"
On another shard the count of my GET commands is 78527 and grep "evalsha" | wc -l returns 12889.
I'm using the npm package ioredis to connect to all my shards, like below, and redlock to acquire locks on keys:
const cluster = new Redis.Cluster([
  { port: 6380, host: "127.0.0.1" },
  { port: 6381, host: "127.0.0.1" },
]);
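One hypothesis worth checking (an assumption, not something the logs above confirm): the MONITOR output shows GET/EVALSHA against a key literally named "undefined", and every command on that one key name hashes to the same slot, so all of that traffic lands on a single shard. A quick sketch to see which slot that key maps to (replace the host with your cluster's configuration endpoint):
# Print the hash slot (0-16383) that the key "undefined" maps to
redis-cli -c -h <your-cluster-endpoint> -p 6379 CLUSTER KEYSLOT "undefined"
# Show which node owns which slot ranges, to match the slot to a shard
redis-cli -c -h <your-cluster-endpoint> -p 6379 CLUSTER SLOTS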

AWS EC2 terminal session terminated with "Plugin with name Standard_Stream not found"

I was streaming Kafka on AWS EC2 CentOS 7. My Session Manager Idle Timeout is set to 60 min. And yet, after running for much less than that, the terminal froze, saying my session had been terminated. Of course, the Kafka streaming was disrupted as well.
When I tried to start a new session in a new terminal, I got this error popup:
Your session has been terminated for the following reasons: Plugin with name Standard_Stream not found. Step name: Standard_Stream
and I am still unable to start a terminal.
What does this error mean and how do I resolve it? Thanks.
To debug this, you first need to access the EC2 instance over SSH with the key .pem
(ask your admin).
Running tail -f hit this issue:
tail: inotify resources exhausted
tail: inotify cannot be used, reverting to polling
Restarting the ssm-agent service also failed with "No space left on device",
but it's not about disk space:
[root@env-test ec2-user]# systemctl restart amazon-ssm-agent.service
Error: No space left on device
[root@env-test ec2-user]# df -h |grep dev
devtmpfs 32G 0 32G 0% /dev
tmpfs 32G 0 32G 0% /dev/shm
/dev/nvme0n1p1 100G 82G 18G 83% /
So the error itself means that the system is running low on inotify
watches, which enable programs to monitor file/directory changes. To see
the currently set limit (including the output on my machine):
$ cat /proc/sys/fs/inotify/max_user_watches
8192
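For completeness, inotify has two per-user limits under /proc; max_user_watches is the one whose exhaustion produces the ENOSPC ("No space left on device") error seen above:
# Limit on watched files/directories per user (the one relevant here)
cat /proc/sys/fs/inotify/max_user_watches
# Limit on inotify instances per user, worth checking as well
cat /proc/sys/fs/inotify/max_user_instances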
Check which processes are using inotify, so you can either improve those apps or increase max_user_watches:
for foo in /proc/*/fd/*; do readlink -f "$foo"; done | grep inotify | sort | uniq -c | sort -nr
5 /proc/1/fd/anon_inode:inotify
2 /proc/7126/fd/anon_inode:inotify
2 /proc/5130/fd/anon_inode:inotify
1 /proc/4497/fd/anon_inode:inotify
1 /proc/4437/fd/anon_inode:inotify
1 /proc/4151/fd/anon_inode:inotify
1 /proc/4147/fd/anon_inode:inotify
1 /proc/4028/fd/anon_inode:inotify
1 /proc/3913/fd/anon_inode:inotify
1 /proc/3841/fd/anon_inode:inotify
1 /proc/31146/fd/anon_inode:inotify
1 /proc/2829/fd/anon_inode:inotify
1 /proc/21259/fd/anon_inode:inotify
1 /proc/1934/fd/anon_inode:inotify
Notice that the inotify list above includes the PIDs of the ssm-agent
processes, which explains why we hit issues with SSM once
max_user_watches reached its limit:
ps -ef | grep ssm-ag
root 3841 1 0 00:02 ? 00:00:05 /usr/bin/amazon-ssm-agent
root 4497 3841 0 00:02 ? 00:00:33 /usr/bin/ssm-agent-worker
Final solution (permanent, preserved across restarts):
echo "fs.inotify.max_user_watches=1048576" >> /etc/sysctl.conf
sysctl -p
Verify:
$ aws ssm start-session --target i-123abc456efd789xx --region ap-northeast-2
Starting session with SessionId: userdev-03ccb1a04a6345bf5
sh-4.2$
This issue comes from the EC2 instance, not from the SSM agent itself. See the linked documentation for background on the SSM agent
(optional link).
In my case, extending the disk space worked!
(syslog had filled the disk in my case)
In my case too, extending the disk space worked, as my /var/log was huge.

AWS DOCKER dm.basesize in /etc/sysconfig/docker doesn't work

I want to change dm.basesize for my containers (the base device size) to 20 GB.
These are the block devices on the instance:
NAME MAJ:MIN RM SIZE RO TYPE MOUNTPOINT
xvda 202:0 0 8G 0 disk
`-xvda1 202:1 0 8G 0 part /
xvdf 202:80 0 8G 0 disk
xvdg 202:96 0 8G 0 disk
I have this shell script:
#cloud-boothook
#!/bin/bash
cloud-init-per once docker_options echo 'OPTIONS="${OPTIONS} --storage-opt dm.basesize=20G"' >> /etc/sysconfig/docker
I executed this script, then stopped the docker service:
[ec2-user@ip-172-31-41-55 ~]$ sudo service docker stop
Redirecting to /bin/systemctl stop docker.service
[ec2-user@ip-172-31-41-55 ~]$
and started the docker service again:
[ec2-user@ip-172-31-41-55 ~]$ sudo service docker start
Redirecting to /bin/systemctl start docker.service
[ec2-user@ip-172-31-41-55 ~]$
But the container size doesn't change.
This is the /etc/sysconfig/docker file:
# The max number of open files for the daemon itself, and all
# running containers. The default value of 1048576 mirrors the value
# used by the systemd service unit.
DAEMON_MAXFILES=1048576
# Additional startup options for the Docker daemon, for example:
# OPTIONS="--ip-forward=true --iptables=true"
# By default we limit the number of open files per container
OPTIONS="--default-ulimit nofile=1024:4096"
# How many seconds the sysvinit script waits for the pidfile to appear
# when starting the daemon.
DAEMON_PIDFILE_TIMEOUT=10
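For what it's worth, if the cloud-boothook above had actually run, its appended OPTIONS line should appear at the end of this file; a quick sanity check (assuming the path above):
# Should print the appended storage option if the boothook ran; no output means it never reached the file.
grep 'dm.basesize' /etc/sysconfig/docker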
I read in the AWS documentation that I can execute scripts on the instance when I start it. I don't want to restart my instance because I would lose my data.
Is there a way to update my container size without restarting the instance?
In the AWS documentation I can't find how to set a script to run when I launch the instance.
I followed this tutorial:
https://docs.aws.amazon.com/AmazonECS/latest/developerguide/launch_container_instance.html
but I can't find an example of how to set a script to run when I launch the instance.
UPDATED
I configured the file
/etc/docker/daemon.json
{
  "storage-driver": "devicemapper",
  "storage-opts": [
    "dm.directlvm_device=/dev/xdf",
    "dm.thinp_percent=95",
    "dm.thinp_metapercent=1",
    "dm.thinp_autoextend_threshold=80",
    "dm.thinp_autoextend_percent=20",
    "dm.directlvm_device_force=false"
  ]
}
When I start Docker, I get:
Error starting daemon: error initializing graphdriver: /dev/xdf is not available for use with devicemapper
How can I configure the parameter dm.directlvm_device=/dev/xdf?
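As a side note, lsblk above lists xvdf and xvdg but no xdf, so the error message is at least consistent with a device-name typo. A hedged sketch of the same daemon.json pointing at /dev/xvdf (an assumption: that this is the empty disk you intend to dedicate to devicemapper; direct-lvm will wipe it):
# Write the config, then restart the daemon. Double-check the device name first; the disk is wiped.
sudo tee /etc/docker/daemon.json <<'EOF'
{
  "storage-driver": "devicemapper",
  "storage-opts": [
    "dm.directlvm_device=/dev/xvdf",
    "dm.thinp_percent=95",
    "dm.thinp_metapercent=1",
    "dm.thinp_autoextend_threshold=80",
    "dm.thinp_autoextend_percent=20",
    "dm.directlvm_device_force=false"
  ]
}
EOF
sudo systemctl restart docker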

intermittent issues with ClamAV clamd INSTREAM on socket

I've got an AWS Lambda function running Node.js code to stream files from S3 to ClamAV running on an EC2 instance.
Generally (about 75% of the time) the system works, but often (especially when multiple files are being scanned from different Lambda containers) clamd threads get stuck on INSTREAM.
Once a thread has been in INSTREAM for 25-30 seconds it does not seem to be able to recover. When it has been QUEUEDSINCE for 350 seconds it is killed off. I can't figure out how either of these numbers relates to any value in my config.
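It may help to compare those timings against the limits clamd is actually running with; a small sketch (the clamd.conf path varies by distro, /etc/clamd.d/scan.conf is common on CentOS/Amazon Linux, and clamconf ships with ClamAV):
# Show the thread/queue/timeout settings from the config file
grep -E '^(MaxThreads|MaxQueue|ReadTimeout|CommandReadTimeout|SendBufTimeout|IdleTimeout)' /etc/clamd.d/scan.conf
# Print the effective configuration, including defaults
clamconf | grep -E 'MaxThreads|MaxQueue|ReadTimeout|CommandReadTimeout|IdleTimeout'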
I'm struggling to find any sign of an error in the logs - the number of INSTREAM requests matches the number of complete scans:
$ sudo grep -c "got command INSTREAM" /var/log/clamav/clamav.log
129
$ sudo grep -c "Chunks complete" /var/log/clamav/clamav.log
129
$ sudo grep -c "Scanthread: connection shut down" /var/log/clamav/clamav.log
129
...okay, now that I look a little more deeply into the logs, it just takes a lot longer for some files to be scanned. When I do a batch of 16 files with Lambda concurrency restricted to 7, the first 7 files are scanned within a few seconds. The next file begins scanning soon after, gets to "Chunks complete" within a second, but takes 23 seconds before "Scanthread: connection shut down". From here on it just gets worse - 1:24, 1:45... and then the 3rd batch of 7 files takes over 3 minutes to scan.
If I give the system a few minutes to settle down and let all the threads die off, the same files that took over 3 minutes now take about 5-7 seconds.
If I run the same test on a faster machine the performance improves, but the issue is still there.
When threads get stuck at INSTREAM I can see that the files are still there:
$ ls -al /tmp
drwx------ 2 clamav clamav 4096 Aug 29 16:52 clamav-493bdf893ce4d8d7763c00fee22d9d69.tmp
-rwx------ 1 clamav clamav 25683921 Aug 29 16:52 clamav-5cdefd83d5531a03c7cf22fda37d133f.tmp
https://github.com/yongtang/clamav.js/issues/6
https://github.com/yongtang/clamav.js/issues/7
https://bugzilla.clamav.net/show_bug.cgi?id=12181

Resizing the default 10GB boot drive Google Cloud Platform

How do I increase the default 10GB boot drive when I create an instance on the Google Cloud Platform? I've read through different answers regarding this with nothing super clear. I'm sort of a beginner to the platform and I'd really appreciate it if someone could tell me how to do this in simple terms.
Use the following steps to increase the boot disk size with CentOS on the Google Cloud Platform.
SSH into the VM instance:
[user@user-srv ~]$ df -h
Filesystem Size Used Avail Use% Mounted on
/dev/sda1 9.9G 898M 8.5G 10% /
tmpfs 296M 0 296M 0% /dev/shm
[user@user-srv ~]$ sudo fdisk /dev/sda
The device presents a logical sector size that is smaller than
the physical sector size. Aligning to a physical sector (or optimal
I/O) size boundary is recommended, or performance may be impacted.
WARNING: DOS-compatible mode is deprecated. It's strongly recommended to
switch off the mode (command 'c') and change display units to
sectors (command 'u').
Command (m for help): p
Disk /dev/sda: 53.7 GB, 53687091200 bytes
4 heads, 32 sectors/track, 819200 cylinders
Units = cylinders of 128 * 512 = 65536 bytes
Sector size (logical/physical): 512 bytes / 4096 bytes
I/O size (minimum/optimal): 4096 bytes / 4096 bytes
Disk identifier: 0x0004a990
Device Boot Start End Blocks Id System
/dev/sda1 17 163825 10483712+ 83 Linux
Command (m for help): c
DOS Compatibility flag is not set
Command (m for help): u
Changing display/entry units to sectors
Command (m for help): p
Disk /dev/sda: 53.7 GB, 53687091200 bytes
4 heads, 32 sectors/track, 819200 cylinders, total 104857600 sectors
Units = sectors of 1 * 512 = 512 bytes
Sector size (logical/physical): 512 bytes / 4096 bytes
I/O size (minimum/optimal): 4096 bytes / 4096 bytes
Disk identifier: 0x0004a990
Device Boot Start End Blocks Id System
/dev/sda1 2048 20969472 10483712+ 83 Linux
Command (m for help): p
Disk /dev/sda: 53.7 GB, 53687091200 bytes
4 heads, 32 sectors/track, 819200 cylinders, total 104857600 sectors
Units = sectors of 1 * 512 = 512 bytes
Sector size (logical/physical): 512 bytes / 4096 bytes
I/O size (minimum/optimal): 4096 bytes / 4096 bytes
Disk identifier: 0x0004a990
Device Boot Start End Blocks Id System
/dev/sda1 2048 20969472 10483712+ 83 Linux
Command (m for help): n
Command action
e extended
p primary partition (1-4)
p
Partition number (1-4): 1
Partition 1 is already defined. Delete it before re-adding it.
Command (m for help): d
Selected partition 1
Command (m for help): n
Command action
e extended
p primary partition (1-4)
p
Partition number (1-4): 1
First sector (2048-104857599, default 2048):
Using default value 2048
Last sector, +sectors or +size{K,M,G} (2048-104857599, default 104857599):
Using default value 104857599
Command (m for help): w
The partition table has been altered!
Calling ioctl() to re-read partition table.
WARNING: Re-reading the partition table failed with error 16: Device or resource busy.
The kernel still uses the old table. The new table will be used at
the next reboot or after you run partprobe(8) or kpartx(8)
Syncing disks.
[user@user-srv ~]$ df -h
Filesystem Size Used Avail Use% Mounted on
/dev/sda1 9.9G 898M 8.5G 10% /
tmpfs 296M 0 296M 0% /dev/shm
[user@user-srv ~]$ sudo reboot
Broadcast message from user@user-srv
(/dev/pts/0) at 3:48 ...
The system is going down for reboot NOW!
[user@user-srv ~]$ Connection to 23.251.144.204 closed by remote host.
Connection to 23.251.144.204 closed.
Robetus-Mac:~ tomassiro$ gcutil listinstances --project="project-name"
+-------+---------------+---------+----------------+----------------+
| name | zone | status | network-ip | external-ip |
+-------+---------------+---------+----------------+----------------+
| srv-1 | us-central1-a | RUNNING | 10.230.224.112 | 107.168.216.20 |
+-------+---------------+---------+----------------+----------------+
SSH into the VM instance again:
[user@user-srv ~]$ df -h
Filesystem Size Used Avail Use% Mounted on
/dev/sda1 9.9G 898M 8.5G 10% /
tmpfs 296M 0 296M 0% /dev/shm
[user@user-srv ~]$ sudo resize2fs /dev/sda1
resize2fs 1.41.12 (17-May-2010)
Filesystem at /dev/sda1 is mounted on /; on-line resizing required
old desc_blocks = 1, new_desc_blocks = 4
Performing an on-line resize of /dev/sda1 to 13106944 (4k) blocks.
The filesystem on /dev/sda1 is now 13106944 blocks long.
[user@user-srv ~]$ df -h
Filesystem Size Used Avail Use% Mounted on
/dev/sda1 50G 908M 46G 2% /
tmpfs 296M 0 296M 0% /dev/shm
[user@user-srv ~]$ exit
logout
Connection to 23.251.144.204 closed.
The steps are easy:
Create a new disk from an existing source image with a bigger size (see the gcloud sketch below)
Create a new instance using the disk you just created (select existing disk)
After the system boots up, using the command "df -h" you can see the storage is still 9.9 GB
Follow steps 4-12 of the "Repartitioning a root persistent disk" section in https://developers.google.com/compute/docs/disks
Finished!!
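For reference, the listing earlier in this answer uses the old gcutil tool; a hedged equivalent of step 1 with the current gcloud CLI (disk name, image family, size and zone below are placeholders to adapt):
# Create a new, larger boot disk from a public CentOS image
gcloud compute disks create my-bigger-boot-disk \
  --image-family=centos-7 --image-project=centos-cloud \
  --size=50GB --zone=us-central1-a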
To increase the boot disk size on a GCP VM (Google Compute Engine) without a reboot or restart:
First check disk usage with df -h; if /dev/sda1 is more than 80% used, be careful.
You can then grow the disk on the fly without restarting the VM:
Increase the disk size from the console first
SSH into the VM: sudo growpart /dev/sda 1
Resize your file system: sudo resize2fs /dev/sda1 (see the XFS note after these steps)
Verify: df -h
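One caveat about the resize2fs step (an assumption about your image, not something stated above): newer CentOS/RHEL images often use XFS for the root filesystem, and resize2fs only handles ext2/3/4. Check the filesystem type first and use xfs_growfs for XFS roots:
# Show the filesystem type of /
df -hT /
# ext4 root
sudo resize2fs /dev/sda1
# XFS root (resize2fs will refuse); xfs_growfs takes the mount point
sudo xfs_growfs /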
A safer method than editing the partition directly, and one which doesn't require maintaining your own images, is dracut's growroot module together with cloud-init.
I've used this with CentOS 6 & 7 on Google Compute Engine, AWS & Azure.
## you'll need to be root or use sudo
yum -y install epel-release
yum -y install cloud-init cloud-initramfs-tools dracut-modules-growroot cloud-utils-growpart
rpm -qa kernel | sed -e 's/^kernel-//' | xargs -I {} dracut -f /boot/initramfs-{}.img {}
# reboot for the resize to take effect
The partition will be resized automatically during the next boot.
Notes:
This is built into Ubuntu, which is why you don't see the problem there.
The partition size problem is seen with Red Hat & CentOS on most pre-built images, not only on Google Cloud. This method should work anywhere.
Note that you cannot unmount /dev/sda1 because it's running your OS. But you can create another partition as follows:
See the available space:
sudo cfdisk
Move with the arrow keys and select Free space, then:
Press enter and a new partition will be created
Write the changes to disk
Quit
Format the partition (replace sdb1 with yours):
sudo mkfs -t ext4 /dev/sdb1
Check the changes: lsblk -f
Mount the new partition:
sudo mount /dev/sda3 /mnt
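If the new mount should survive a reboot, a minimal sketch (assuming the partition you mounted is /dev/sda3 and it was formatted ext4 as above; substitute whatever lsblk -f shows for your new partition, and prefer the UUID over the device name):
# Find the partition's UUID
sudo blkid /dev/sda3
# Append an fstab entry (replace <UUID> with the value printed above)
echo 'UUID=<UUID>  /mnt  ext4  defaults  0 2' | sudo tee -a /etc/fstab
# Mount everything listed in fstab to confirm the entry is valid
sudo mount -a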