I was streaming Kafka on AWS EC2 (CentOS 7). My Session Manager idle timeout is set to 60 minutes, and yet, after running for much less than that, the terminal froze, saying my session had been terminated. Of course, the Kafka streaming was disrupted as well.
When I tried to start a new session in a new terminal, I got this error popup:
Your session has been terminated for the following reasons: Plugin with name Standard_Stream not found. Step name: Standard_Stream
and I am still unable to restart a terminal.
What does this error mean, and how do I resolve it? Thanks.
So far: you need to access the EC2 instance over SSH with the key (.pem) file to debug
(ask your admin).
Running tail -f showed an issue:
tail: inotify resources exhausted
tail: inotify cannot be used, reverting to polling
Restarting the ssm-agent service also failed with "No space left on device",
but it is not about disk space:
[root@env-test ec2-user]# systemctl restart amazon-ssm-agent.service
Error: No space left on device
[root@env-test ec2-user]# df -h |grep dev
devtmpfs 32G 0 32G 0% /dev
tmpfs 32G 0 32G 0% /dev/shm
/dev/nvme0n1p1 100G 82G 18G 83% /
So the error itself means that the system is running low on inotify watches, which allow programs to monitor changes to files and directories. To see the currently configured limit (output from my machine included):
$ cat /proc/sys/fs/inotify/max_user_watches
8192
Check which processes are using inotify, so you can either improve your apps or increase max_user_watches:
for foo in /proc/*/fd/*; do readlink -f $foo; done | grep inotify | sort | uniq -c | sort -nr
5 /proc/1/fd/anon_inode:inotify
2 /proc/7126/fd/anon_inode:inotify
2 /proc/5130/fd/anon_inode:inotify
1 /proc/4497/fd/anon_inode:inotify
1 /proc/4437/fd/anon_inode:inotify
1 /proc/4151/fd/anon_inode:inotify
1 /proc/4147/fd/anon_inode:inotify
1 /proc/4028/fd/anon_inode:inotify
1 /proc/3913/fd/anon_inode:inotify
1 /proc/3841/fd/anon_inode:inotify
1 /proc/31146/fd/anon_inode:inotify
1 /proc/2829/fd/anon_inode:inotify
1 /proc/21259/fd/anon_inode:inotify
1 /proc/1934/fd/anon_inode:inotify
Notice that the inotify list above includes the PIDs of the ssm-agent processes, which explains why SSM breaks once the max_user_watches limit is reached:
ps -ef | grep ssm-ag
root 3841 1 0 00:02 ? 00:00:05 /usr/bin/amazon-ssm-agent
root 4497 3841 0 00:02 ? 00:00:33 /usr/bin/ssm-agent-worker
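If you also want to see how many watches each process has actually registered (not just how many inotify instances it holds), here is a rough sketch that counts the watch entries under /proc/<pid>/fdinfo (run it as root so every process is readable):
for f in /proc/[0-9]*/fdinfo/*; do
    pid=${f#/proc/}; pid=${pid%%/*}
    n=$(grep -c '^inotify' "$f" 2>/dev/null)
    [ "$n" -gt 0 ] 2>/dev/null && echo "$pid $n"
done | awk '{w[$1] += $2} END {for (p in w) print w[p], "watches held by PID", p}' | sort -nr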
Final solution (permanent, preserved across reboots):
echo "fs.inotify.max_user_watches=1048576" >> /etc/sysctl.conf sysctl -p
Verify:
$ aws ssm start-session --target i-123abc456efd789xx --region ap-northeast-2
Starting session with SessionId: userdev-03ccb1a04a6345bf5
sh-4.2$
This issue comes from the EC2 instance, not from the SSM agent. Go to the link for a better understanding of the SSM agent.
(optional link)
In my case, extending the disk space worked!
(the syslog had filled the disk in my case)
In my case too, extending the disk space worked, as my /var/log was huge.
I use a squid.service file such as:
[Service]
Type=forking
ExecStart=/usr/local/squid/bin/squid start
ExecReload=/usr/local/squid/bin/squid reload
ExecStop=/usr/local/squid/bin/squid stop
KillMode=none
'/usr/local/squid/bin/squid' is a shell script whose stop function checks squid.conf before stopping squid:
stop) {
    $SQUID -k check >/dev/null 2>&1
    RETVAL=$?
    if [ $RETVAL -eq 0 ] ; then
        do shutdown
    else
        echo "squid.conf is error"
    fi
    return $RETVAL
}
We do not stop the squid processes directly because of this concern: suppose squid.conf is correct at first and squid is running correctly, then we modify squid.conf with a bad entry and restart squid. If we stopped the squid processes successfully, starting squid again would fail because of the bad config, and the service would be unavailable. But if we check squid.conf first, the processes are not stopped, so the service stays available.
But on a CentOS 7 system, when I go through the following steps, something goes wrong:
1. systemctl start squid with a correct squid.conf; systemctl status squid shows active (running).
2. Modify squid.conf so it is wrong, then run 'systemctl stop squid'; 'systemctl status squid' shows failed because the stop script checks the conf and exits with code 1.
3. I fix squid.conf again, but when I run 'systemctl stop squid', the stop script is not invoked at all (I tested by echoing in the stop function, but nothing is echoed), even though the unit still has "ExecStop=/usr/local/squid/bin/squid stop".
So I cannot restart squid after that. How can I make 'systemctl stop squid' call '/usr/local/squid/bin/squid stop' again in step 3?
I'm writing an expect script to start an SSH tunnel.
It gets run on EC2 when the instance starts, as part of the deployment which creates the script from a .ebextensions config file.
When the script is run, it always gets stuck at this point:
Enter passphrase for key '/home/ec2-user/id_data_app_rsa':
If I run the same script manually on the server, it succeeds and I can see the tunnel process running.
ps aux | grep ssh
root 19046 0.0 0.0 73660 1068 ? Ss 16:58 0:00 ssh -i /home/ec2-user/id_data_app_rsa -p222 -vfN -L 3306:X.X.X.X:3306 root@X.X.X.X
I can verify that the script is reading the SSH_PASSPHRASE correctly by printing it to the console.
set password $::env(SSH_PASSPHRASE)
send_user "retrieved env variable : $password "
This is the debug output I get from the EC2 logs:
Enter passphrase for key '/home/ec2-user/id_data_app_rsa':
interact: received eof from spawn_id exp0
I'm baffled as to why it's getting no further here when the EC2 deployer runs, but it continues normally when run manually.
This is the script in .ebextensions; the script itself starts at #!/usr/bin/expect:
files:
  "/scripts/createTunnel.sh":
    mode: "000755"
    owner: root
    group: root
    content: |
      #!/usr/bin/expect
      exp_internal 1
      set timeout 60
      # set variables
      set password $::env(SSH_PASSPHRASE)
      send_user "retrieved env variable : $password "
      spawn -ignore HUP ssh -i /home/ec2-user/id_data_app_rsa -p222 -vfN -L 3306:X.X.X.X:3306 root@X.X.X.X
      expect {
        "(yes/no)?"            { send "yes\n" }
        -re "(.*)assphrase"    { sleep 1; send -- "$password\n" }
        -re "(.*)data_app_rsa" { sleep 1; send -- "$password\n" }
        -re "(.*)assword:"     { sleep 1; send -- "$password\n" }
        timeout                { send_user "un-able to login: timeout\n"; return }
        "denied"               { send_user "\nFatal Error: denied \n" }
        eof                    { send_user "Closed\n" ; return }
      }
      interact
We finally resolved this. There were two things that seemed to be at issue:
1. Changing the final interact to expect eof.
2. Trimming down the expect pattern matching as much as possible.
We noticed in testing that expect seemed to be matching falsely, sending a password, for example, when it should have sent "yes" in response to the 'yes/no' prompt.
This is the final script we ended up with in case it's useful to anyone else:
#!/usr/bin/expect
exp_internal 1
set timeout 60
# set variables
set password $::env(SSH_TUNNEL_PASSPHRASE)
spawn -ignore HUP ssh -i /home/ec2-user/id_data_rsa -p222 -vfN -L 3306:X.X.X.X:3306 root@X.X.X.X
expect {
  "(yes/no)?"        { send "yes\r" }
  "Enter passphrase" { sleep 2; send -- "$password\r"; sleep 2; exit }
}
expect eof
Your problem is here:
set password $::env(SSH_PASSPHRASE)
and the way the shell works with environment variables. When the script is invoked, you assume your environment variables are set. Depending on how the script is invoked, $::env(SSH_PASSPHRASE) may not be set, resulting in the variable being null/blank. When init scripts (or cloud-init) are run, they do not run with the environment of a login shell, so you should not assume that .profile or /etc/profile environment variables are set; source or set them explicitly instead.
A possible solution may be:
. ~ec2-user/.profile
/path/to/above.script
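Alternatively, a minimal wrapper sketch along the same lines; the file that exports SSH_PASSPHRASE is an assumption here, so adjust it to wherever you actually define the variable:
#!/bin/bash
# hypothetical wrapper: load the environment the expect script needs, then run it
set -a                          # auto-export everything the sourced file defines
. /home/ec2-user/.bash_profile  # assumption: this is where SSH_PASSPHRASE is exported
set +a
exec /scripts/createTunnel.sh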
I'm a bit stuck with this backup script I have been writing. The goal of the script is to:
Wake up a sleeping PC on my LAN.
Run Microsoft's SyncToy (cmd version) to sync all of the paired folders I have set up and write the results to a log file.
If there is an error, write it to the log file and then send an email to me via mailsend.exe.
The batch file is set to run every night with Windows 7's Task Scheduler.
Contents of batch file:
@ECHO OFF
SET /a RETRY=0
SET /a RETRIES=5
SET MAC=000c76******
SET IP=192.168.0.8
SET SUBNET=255.255.255.0
SET PORT=7
ECHO %date% - %time% - Started sync.
:CHECK
PING -n 1 %IP% | find "bytes=">NUL
IF %ERRORLEVEL%==0 (
GOTO SYNC
)
IF %ERRORLEVEL%==1 (
GOTO WAKE
)
:WAKE
SET /a RETRY=%RETRY%+1
IF %RETRY% GEQ 6 (
SET ERR_VAL=RETRY
GOTO ERROR
)
ECHO Waking up \\NAS Attempt %RETRY%\5...
START C:\sync\wolcmd.exe %MAC% %IP% %SUBNET% %PORT%
timeout /T 30 /NOBREAK>NUL
GOTO CHECK
:SYNC
ECHO SyncToy is running...
"C:\Program Files\SyncToy 2.1\SyncToyCmd.exe" -R>C:\sync\synctoy_log.txt
IF %ERRORLEVEL% == 0 (
ECHO %date% - %time% - Success: Sync completed.>>C:\sync\synctoy_error_log.txt
GOTO END
) ELSE (
SET ERR_LEV=%ERRORLEVEL%
SET ERR_VAL=SYNC
GOTO ERROR
)
:ERROR
IF %ERR_VAL%==RETRY (
ECHO Error: Failed to sync, retries exceeded.
ECHO %date% - %time% - Error: Failed to sync, retries exceeded.>>C:\sync\synctoy_error_log.txt
)
IF %ERR_VAL%==SYNC (
ECHO Error: SyncToy error (%ERR_LEV%).
ECHO %date% - %time% - Error: SyncToy error (%ERR_LEV%).>>C:\sync\synctoy_error_log.txt
)
START C:\sync\mailsend.exe -to example.email@googlemail.com -from example.email@gmail.com -ssl -attach synctoy_error_log.txt,text/plain,i -smtp smtp.googlemail.com -port 465 -sub SyncToy_log +cc +bc -v -auth-login -user example.email@gmail.com -pass examplepass
GOTO END
:END
EXIT
Contents of synctoy_error_log.txt
18/02/2013 - 6:02:16.40 - Success: Sync completed.
20/02/2013 - 6:05:25.71 - Success: Sync completed.
21/02/2013 - 6:07:14.27 - Success: Sync completed.
22/02/2013 - 6:02:56.34 - Success: Sync completed.
24/02/2013 - 6:01:49.97 - Success: Sync completed.
25/02/2013 - 6:01:35.14 - Success: Sync completed.
As you can see, there have been a couple of days where the log was not written. The PC running the scheduled task and the PC I want to wake up should both have been accessible at those times.
Is there anything I am doing wrong here in my error checking, or something I have missed?
I don't get an email saying there was a problem either, but if I disconnect the sleeping PC from the LAN and force the script to start, I do get an email saying it couldn't wake it up.
Thanks for any advice you can give me; it's greatly appreciated. I know this isn't the most efficient script, but I've been trying to pick up everything I can to get by.
Under what account does the script run as a scheduled task? Normally the NT SYSTEM account is not able to use any network shares; try running it under a user account that is allowed to log on as a batch job (Local Security Policy, secpol.msc).
I'm testing out using memcached to cache django views. How can I tell if memcached is actually caching anything from the Linux command line?
You could use the official perl script:
memcached-tool 127.0.0.1:11211 stats
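Depending on the memcached-tool version shipped with your package, the same script can also show slab usage or dump the stored items, for example:
memcached-tool 127.0.0.1:11211 display   # slab allocation overview
memcached-tool 127.0.0.1:11211 dump      # dump keys and values (can be heavy on a busy cache)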
Or just use telnet and the stats command, e.g.:
# telnet localhost [memcacheport]
Trying 127.0.0.1...
Connected to localhost.
Escape character is '^]'.
stats
STAT pid 2239
STAT uptime 10228704
STAT time 1236714928
STAT version 1.2.3
STAT pointer_size 32
STAT rusage_user 2781.185813
STAT rusage_system 2187.764726
STAT curr_items 598669
STAT total_items 31363235
STAT bytes 37540884
STAT curr_connections 131
STAT total_connections 8666
STAT connection_structures 267
STAT cmd_get 27
STAT cmd_set 30694598
STAT get_hits 16
STAT get_misses 11
STAT evictions 0
STAT bytes_read 2346004016
STAT bytes_written 388732988
STAT limit_maxbytes 268435456
STAT threads 4
END
I know this question is old, but here is another useful approach for testing memcached with django:
As @Jacob mentioned, you can start memcached in very verbose mode (not as a daemon):
memcached -vv
To test your django cache config, you can use the low-level cache api.
First, start up the python interpreter and load your django project settings:
python manage.py shell
From the shell, you can use the low-level cache api to test your memcache server:
from django.core.cache import cache
cache.set('test', 'test value')
If your cache configuration is correct, you should see output in memcache similar to this:
<32 set :1:test 0 300 10
>32 STORED
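You can also read the value back from the command line; a small sketch using nc, assuming the default Django KEY_PREFIX and version so that the key is ':1:test' as in the verbose output above:
printf 'get :1:test\r\nquit\r\n' | nc 127.0.0.1 11211
# expected reply (when the value was stored as a plain string, as in the log above):
# VALUE :1:test 0 10
# test value
# END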
Start memcached not as a daemon but in the foreground: just run memcached -vv for very verbose output. You will see when gets and sets come in to the memcached server.
A simple way to test whether memcache was working was to sneak a commented-out timestamp into every page served. If the timestamp stayed the same across multiple requests to a page, then the page was being cached by memcache.
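For example, a quick check from the shell might look like this (the URL and the comment marker are made up for illustration):
# fetch the same page twice and compare the embedded timestamp comment
curl -s http://localhost:8000/some-page/ | grep 'rendered-at'
curl -s http://localhost:8000/some-page/ | grep 'rendered-at'
# identical timestamps on both runs suggest the page came out of the cache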
In the Django settings, I also set up the cache mechanism to use a file cache on the filesystem (really slow), but after hitting the pages I could see actual cache files being placed in the file path, so I could confirm caching was active in Django.
I used both of these steps to work out my caching problem. It turned out I did not have caching turned on correctly in Django. The newer way to activate caching is the 'django.middleware.cache.CacheMiddleware' middleware (not the variant with two middleware pieces that have to be the first/last entries in the middleware settings).
From the command line, try the command below:
echo stats | nc 127.0.0.1 11211
If it doesn't return anything, memcached isn't running. Otherwise it should return a bunch of stats, including uptime (and hit and miss counts).
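For example, to watch just the hit/miss counters from that output:
echo stats | nc 127.0.0.1 11211 | grep -E 'get_hits|get_misses'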
The reference article is here,
https://www.percona.com/blog/2008/11/26/a-quick-way-to-get-memcached-status/
To see changes every 2 seconds:
watch "echo stats | nc 127.0.0.1 11211"
In Bash, you can check the memcached statistics with this command:
exec 3<>/dev/tcp/localhost/11211; printf "stats\nquit\n" >&3; cat <&3
To flush the cache, use the memflush command (from libmemcached), or write the command straight to the socket:
echo flush_all >/dev/tcp/localhost/11211
and check if the stats increased.
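To convince yourself that the cache really stores and drops data, you can also drive the text protocol directly (a sketch using nc as in the earlier answers; 'foo' and 'bar' are arbitrary test values):
printf 'set foo 0 60 3\r\nbar\r\n' | nc 127.0.0.1 11211   # expect: STORED
printf 'get foo\r\n' | nc 127.0.0.1 11211                 # expect: VALUE foo 0 3, bar, END
printf 'flush_all\r\n' | nc 127.0.0.1 11211               # expect: OK
printf 'get foo\r\n' | nc 127.0.0.1 11211                 # expect: END (nothing cached any more)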
To dump all the cached objects, use memdump or memcdump command (part of memcached/libmemcached package):
memcdump --servers=localhost:11211
or:
memdump --servers=localhost:11211
If you're using PHP, to see whether the memcached extension is enabled, check with: php -i | grep memcached.
Tracing
To check what exactly the memcached process is doing, you can use network sniffers or debuggers (e.g. strace on Linux or dtrace/dtruss on Unix/OS X). Check some examples below.
Strace
sudo strace -e read,write -fp $(pgrep memcached)
To format output in a better way, check: How to parse strace in shell into plain text?
Dtruss
Dtruss is a dtrace wrapper which is available on Unix systems. Run it as:
sudo dtruss -t read -fp $(pgrep memcached)
Tcpdump
sudo tcpdump -i lo0 -s1500 -w- -ln port 11211 | strings -10
Memcached can actually write to a logfile on its own, without having to resort to restarting it manually. The /etc/init.d/memcached init script (/usr/lib/systemd/system/memcached.service on EL7+; ugh) can call memcached with the options specified in /etc/memcached.conf (or /etc/sysconfig/memcached on EL5+). Among these options are verbosity and log file path.
In short, you just need to add (or uncomment) these two lines to that conf/sysconfig file...
-vv
logfile /path/to/log
...and restart the daemon with service memcached restart (EL3-7) or /etc/init.d/memcached restart (Debian/Ubuntu).
And then you can monitor this log in the traditional way, like tail -f /path/to/log, for example.
To extend Node's response, you can use socat UNIX-CONNECT:/var/run/memcached.sock STDIN to debug a Unix socket.
Example:
$ socat UNIX-CONNECT:/var/run/memcached.sock STDIN
stats
STAT pid 931
STAT uptime 10
STAT time 1378574384
STAT version 1.4.13
STAT libevent 2.0.19-stable
STAT pointer_size 32
STAT rusage_user 0.000000
STAT rusage_system 0.015625
STAT curr_connections 1
STAT total_connections 2
STAT connection_structures 2
You can test memcached (or any server) with the script below:
lsof -i :11211 | grep 'LISTEN'>/dev/null 2>/dev/null;echo $?
If it returns 0, the server is running; if it returns 1, it is not. If you want to print whether the server is actually running on some port, use the following script:
lsof -i :11211 | grep 'LISTEN'>/dev/null 2>/dev/null;
if [ $? -eq 0 ]; then
    echo "Your memcache server is running"
else
    echo "No it's not running"
fi
Can you use curl to fetch a page a few hundred times and time the results? You could also look at running a process on the server that simulates heavy CPU/disk load while doing this.
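A crude way to do that from the shell (the URL is hypothetical; adjust the request count to taste):
# fetch the page 200 times and time the whole run
time for i in $(seq 1 200); do
    curl -s -o /dev/null http://localhost:8000/some-page/
done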
I wrote an expect script is-memcached-running that tests if memcached is running on a host/port combination (run as is-memcached-running localhost 11211):
#! /usr/bin/env expect
set timeout 1
set ip [lindex $argv 0]
set port [lindex $argv 1]
spawn telnet $ip $port
expect "Escape character is '^]'."
send stats\r
expect "END"
send quit\r
expect eof
If you run your system from a Makefile rule, you could make your startup depend on a make target that asserts memcached is up and running (or helps you get to that state). It is verbose when the check fails, to make it easy to debug failed CI runs, installs memcached when it's missing, and is brief and to the point otherwise:
#! /bin/bash
if [[ "$(type -P memcached)" ]]; then
  echo 'memcached installed; checking if it is running'
  memcached_debug=`mktemp memcache-check.XXXXX`
  if is-memcached-running localhost 11211 >$memcached_debug 2>&1; then
    echo 'Yep; memcached online'
  else
    cat $memcached_debug
    echo
    echo '****** Error: memcached is not running! ******'
    if [[ "$OSTYPE" =~ ^darwin ]]; then
      echo
      echo 'Instructions to auto-spawn on login (or just start now) are shown'
      echo 'at the end of a "brew install memcached" run (try now, if you did'
      echo 'not do so already) or, if you did, after a "brew info memcached".'
      echo
    fi
    exit 1
  fi
  rm -f $memcached_debug
else
  echo memcached was not found on your system.
  if [[ "$OSTYPE" =~ ^darwin ]]; then
    brew install memcached
  elif [[ "$OSTYPE" =~ ^linux ]]; then
    sudo apt-get install memcached
  else
    exit 1
  fi
fi
Following Aryashree's post, this helped me raise an error if memcached is not running locally:
import subprocess

port = 11211
res = subprocess.Popen(f"echo stats | nc 127.0.0.1 {port}",
                       shell=True, stdout=subprocess.PIPE, stderr=subprocess.PIPE)
out = res.stdout.read().decode()  # the stats come back as bytes; decode before splitting
if out:
    lines = out.split('\r\n')
    pidlineArr = lines[0].split(' ')  # first line looks like "STAT pid 931"
    pid = pidlineArr[-1]
    print(f"[MemCached] pid {pid} Running on port {port}")
else:
    raise RuntimeError(f"No Memcached is present on port {port}")
I'm using Mezzanine, and the only answer that worked for me was Jacob's: stopping the daemon and running memcached -vv.
If you're using RHEL or CentOS 8, to get memcached to log to /var/log/messages (quick, without rotation), see:
https://serverfault.com/questions/208538/how-to-specify-the-log-file-for-memcached-on-rhel-centos/1054741#1054741