How to take data backup using cassandra-snapshotter? - amazon-web-services

I have to take the data backup of my cassandra nodes and upload it to amazon AWS s3. When I execute the following command,
cassandra-snapshotter --aws-access-key-id=**** --aws-secret-access-key=**** --s3-bucket-name=inblox-exp-buck --s3-bucket-region=ap-southeast-2 --s3-base-path=test1 backup --hosts=52.64.45.152,52.64.28.145 --user=ubuntu
I get the following error,
[52.64.45.152] Executing task 'node_start_backup'
[52.64.28.145] Executing task 'node_start_backup'
Fatal error: Needed to prompt for a connection or sudo password (host: 52.64.28.145), but input would be ambiguous in parallel mode
Aborting.
Needed to prompt for a connection or sudo password (host: 52.64.28.145), but input would be ambiguous in parallel mode
Fatal error: Needed to prompt for a connection or sudo password (host: 52.64.45.152), but input would be ambiguous in parallel mode
Aborting.
Needed to prompt for a connection or sudo password (host: 52.64.45.152), but input would be ambiguous in parallel mode
Fatal error: One or more hosts failed while executing task 'node_start_backup'
Aborting.
[52.64.45.152] Executing task 'clear_node_snapshot'
[52.64.28.145] Executing task 'clear_node_snapshot'
[52.64.28.145] sudo: /usr/bin/nodetool clearsnapshot -t "20150416144918"
[52.64.45.152] sudo: /usr/bin/nodetool clearsnapshot -t "20150416144918"
Fatal error: Needed to prompt for a connection or sudo password (host: 52.64.28.145), but input would be ambiguous in parallel mode
Aborting.
Needed to prompt for a connection or sudo password (host: 52.64.28.145), but input would be ambiguous in parallel mode
Fatal error: Needed to prompt for a connection or sudo password (host: 52.64.45.152), but input would be ambiguous in parallel mode
Aborting.
Needed to prompt for a connection or sudo password (host: 52.64.45.152), but input would be ambiguous in parallel mode
Fatal error: One or more hosts failed while executing task 'clear_node_snapshot'
Aborting.
One or more hosts failed while executing task 'clear_node_snapshot'
What is happening here? How do I fix this problem ?

Cassandra-snapshotter does ssh to the host, so make sure you have your 'ubuntu' user's rsa-pub key listed in .ssh/authorized-keys file. Optionally, you can turn-off ssh option in code.

Only if the password is required to access the hosts,
same problem I found and I solved it by passing password as well
for help:
$cassandra-snapshotter backup -h
in your command should be like
cassandra-snapshotter --aws-access-key-id=**** --aws-secret-access-key=**** --s3-bucket-name=inblox-exp-buck --s3-bucket-region=ap-southeast-2 --s3-base-path=test1 backup --hosts=xx.xx.xx.xx,xx.xx.xx.xx --user=ubuntu --password=*****

I am able to take backup.
Setup
3 Node cluster
Separate machine that runs backup command
All are aws ec2 machines.
I have used below command.
cassandra-snapshotter --s3-bucket-name=BUCKET_NAME \
--s3-bucket-region=us-east-1 \
--s3-base-path=CLUSTER_BACKUP \
--aws-access-key-id=KEY \
--aws-secret-access-key=SECRET \
backup \
--hosts=PUBLIC_IP_1,PUBLIC_IP_2,PUBLIC_IP_3 \
--sshkey=YOUR_PEM_FILE.pem

Related

Cannot use vscode-cpptools in docker without root permission

I defined a simple docker here : https://github.com/htool-ddm/htool_testing_environments/blob/master/ubuntu/Dockerfile where I defined a user with
ARG USER=mpi
ENV USER ${USER}
ENV USER_HOME /home/${USER}
RUN useradd -s /bin/bash --user-group --system --create-home --no-log-init ${USER}
and when I used this image as a devcontainer with
"image": "pierremarchand/htool_testing_environments:ubuntu_gcc_openmpi",
"workspaceFolder": "/home/mpi",
"workspaceMount": "source=${localWorkspaceFolder},target=/home/mpi/,type=bind",
I get the following error:
Error - 4:48:52 PM] cpptools client: couldn't create connection to server.
Launching server using command /home/mpi/.vscode-server/extensions/ms-vscode.cpptools-1.13.9-linux-x64/bin/cpptools failed. Error: spawn /home/mpi/.vscode-server/extensions/ms-vscode.cpptools-1.13.9-linux-x64/bin/cpptools EACCES
I guess this is an issue with permission because it works when running the devcontainer with root permission (using "remoteUser": "root"). Is there an issue in the way I defined my docker image ? or is this an issue in the way I define my devcontainer ?

Subprocess Error when Launching ec2 cluster instance

I am getting subprocess error when launching ec2 cluster instance.
The terminal lags on
Waiting for cluster to enter 'ssh-ready' state'
when running
./spark-ec2 --key-pair=ru_spark --identity-file=ru_spark.pem --region=us-east-1 --zone=us-east-1a launch mycluster
Console:
Warning: Permanently added 'ec2-52-87-225-32.compute-1.amazonaws.com,52.87.225.32' (RSA) to the list of known hosts.
Connection to ec2-52-87-225-32.compute-1.amazonaws.com closed.
Warning: Permanently added 'ec2-52-87-225-32.compute-1.amazonaws.com,52.87.225.32' (RSA) to the list of known hosts.
Transferring cluster's SSH key to slaves...
ec2-34-207-153-79.compute-1.amazonaws.com
Warning: Permanently added 'ec2-34-207-153-79.compute-1.amazonaws.com,34.207.153.79' (RSA) to the list of known hosts.
Cloning spark-ec2 scripts from https://github.com/amplab/spark-ec2/tree/branch-1.6 on master...
Warning: Permanently added 'ec2-52-87-225-32.compute-1.amazonaws.com,52.87.225.32' (RSA) to the list of known hosts.
Cloning into 'spark-ec2'...
error: Peer reports incompatible or unsupported protocol version. while accessing https://github.com/amplab/spark-ec2/info/refs?service=git-upload-pack
fatal: HTTP request failed
Connection to ec2-52-87-225-32.compute-1.amazonaws.com closed.
Error executing remote command, retrying after 30 seconds: Command '['ssh', '-o', 'StrictHostKeyChecking=no', '-o', 'UserKnownHostsFile=/dev/null', '-i', 'ru_spark.pem', '-t', '-t', u'root#ec2-52-87-225-32.compute-1.amazonaws.com', 'rm -rf spark-ec2 && git clone https://github.com/amplab/spark-ec2 -b branch-1.6 spark-ec2']' returned non-zero exit status 128
I updated to curl ssl, changed file permissions to 400 and 600 for ru_spark.pem but neither have helped solve the issue.

Unable to start the Amazon SSM Agent - failed to start message bus

When registering an Amazon SSM Agent, it registers successfully in the SSM Managed Instances console, but the connection shows "Connection Lost".
When I try to start the service manually, I get the following error:
Error occurred fetching the seelog config file path: open /etc/amazon/ssm/seelog.xml: no such file or directory
Initializing new seelog logger
New Seelog Logger Creation Complete
2020-12-09 10:20:01 ERROR error occurred when starting amazon-ssm-agent: failed to start message bus, failed to start health channel: failed to listen on the channel: ipc:///var/lib/amazon/ssm/ipc/health, address in use
How exactly do I solve this? I've tried to restart the service a few times but no luck.
I was able to fix this issue by stopping the agent and purging the /var/lib/amazon/ssm/ipc directory
service amazon-ssm-agent stop
rm -rf /var/lib/amazon/ssm/ipc
service amazon-ssm-agent start

Connect to VPN with Podman

Having this Dockerfile:
FROM fedora:30
ENV LANG C.UTF-8
RUN dnf upgrade -y \
&& dnf install -y \
openssh-clients \
openvpn \
slirp4netns \
&& dnf clean all
CMD ["openvpn", "--config", "/vpn/ovpn.config", "--auth-user-pass", "/vpn/ovpn.auth"]
Building the image with:
podman build -t peque/vpn .
If I try to run it with (note $(pwd), where the VPN configuration and credentials are stored):
podman run -v $(pwd):/vpn:Z --cap-add=NET_ADMIN --device=/dev/net/tun -it peque/vpn
I get the following error:
ERROR: Cannot open TUN/TAP dev /dev/net/tun: Permission denied (errno=13)
Any ideas on how could I fix this? I would not mind changing the base image if that could help (i.e.: to Alpine or anything else as long as it allows me to use openvpn for the connection).
System information
Using Podman 1.4.4 (rootless) and Fedora 30 distribution with kernel 5.1.19.
/dev/net/tun permissions
Running the container with:
podman run -v $(pwd):/vpn:Z --cap-add=NET_ADMIN --device=/dev/net/tun -it peque/vpn
Then, from the container, I can:
# ls -l /dev/ | grep net
drwxr-xr-x. 2 root root 60 Jul 23 07:31 net
I can also list /dev/net, but will get a "permission denied error":
# ls -l /dev/net
ls: cannot access '/dev/net/tun': Permission denied
total 0
-????????? ? ? ? ? ? tun
Trying --privileged
If I try with --privileged:
podman run -v $(pwd):/vpn:Z --privileged --cap-add=NET_ADMIN --device=/dev/net/tun -it peque/vpn
Then instead of the permission-denied error (errno=13), I get a no-such-file-or-directory error (errno=2):
ERROR: Cannot open TUN/TAP dev /dev/net/tun: No such file or directory (errno=2)
I can effectively verify there is no /dev/net/ directory when using --privileged, even if I pass the --cap-add=NET_ADMIN --device=/dev/net/tun parameters.
Verbose log
This is the log I get when configuring the client with verb 3:
OpenVPN 2.4.7 x86_64-redhat-linux-gnu [SSL (OpenSSL)] [LZO] [LZ4] [EPOLL] [PKCS11] [MH/PKTINFO] [AEAD] built on Feb 20 2019
library versions: OpenSSL 1.1.1c FIPS 28 May 2019, LZO 2.08
Outgoing Control Channel Authentication: Using 160 bit message hash 'SHA1' for HMAC authentication
Incoming Control Channel Authentication: Using 160 bit message hash 'SHA1' for HMAC authentication
TCP/UDP: Preserving recently used remote address: [AF_INET]xx.xx.xx.xx:1194
Socket Buffers: R=[212992->212992] S=[212992->212992]
UDP link local (bound): [AF_INET][undef]:0
UDP link remote: [AF_INET]xx.xx.xx.xx:1194
TLS: Initial packet from [AF_INET]xx.xx.xx.xx:1194, sid=3ebc16fc 8cb6d6b1
WARNING: this configuration may cache passwords in memory -- use the auth-nocache option to prevent this
VERIFY OK: depth=1, C=ES, ST=XXX, L=XXX, O=XXXXX, emailAddress=email#domain.com, CN=internal-ca
VERIFY KU OK
Validating certificate extended key usage
++ Certificate has EKU (str) TLS Web Server Authentication, expects TLS Web Server Authentication
VERIFY EKU OK
VERIFY OK: depth=0, C=ES, ST=XXX, L=XXX, O=XXXXX, emailAddress=email#domain.com, CN=ovpn.server.address
Control Channel: TLSv1.2, cipher TLSv1.2 ECDHE-RSA-AES256-GCM-SHA384, 2048 bit RSA
[ovpn.server.address] Peer Connection Initiated with [AF_INET]xx.xx.xx.xx:1194
SENT CONTROL [ovpn.server.address]: 'PUSH_REQUEST' (status=1)
PUSH: Received control message: 'PUSH_REPLY,route xx.xx.xx.xx 255.255.255.0,route xx.xx.xx.0 255.255.255.0,dhcp-option DOMAIN server.net,dhcp-option DNS xx.xx.xx.254,dhcp-option DNS xx.xx.xx.1,dhcp-option DNS xx.xx.xx.1,route-gateway xx.xx.xx.1,topology subnet,ping 10,ping-restart 60,ifconfig xx.xx.xx.24 255.255.255.0,peer-id 1'
OPTIONS IMPORT: timers and/or timeouts modified
OPTIONS IMPORT: --ifconfig/up options modified
OPTIONS IMPORT: route options modified
OPTIONS IMPORT: route-related options modified
OPTIONS IMPORT: --ip-win32 and/or --dhcp-option options modified
OPTIONS IMPORT: peer-id set
OPTIONS IMPORT: adjusting link_mtu to 1624
Outgoing Data Channel: Cipher 'AES-128-CBC' initialized with 128 bit key
Outgoing Data Channel: Using 160 bit message hash 'SHA1' for HMAC authentication
Incoming Data Channel: Cipher 'AES-128-CBC' initialized with 128 bit key
Incoming Data Channel: Using 160 bit message hash 'SHA1' for HMAC authentication
ROUTE_GATEWAY xx.xx.xx.xx/255.255.255.0 IFACE=tap0 HWADDR=0a:38:ba:e6:4b:5f
ERROR: Cannot open TUN/TAP dev /dev/net/tun: No such file or directory (errno=2)
Exiting due to fatal error
Error number may change depending on whether I run the command with --privileged or not.
It turns out that you are blocked by SELinux: after running the client container and trying to access /dev/net/tun inside it, you will get the following AVC denial in the audit log:
type=AVC msg=audit(1563869264.270:833): avc: denied { getattr } for pid=11429 comm="ls" path="/dev/net/tun" dev="devtmpfs" ino=15236 scontext=system_u:system_r:container_t:s0:c502,c803 tcontext=system_u:object_r:tun_tap_device_t:s0 tclass=chr_file permissive=0
To allow your container configuring the tunnel while staying not fully privileged and with SELinux enforced, you need to customize SELinux policies a bit. However, I did not find an easy way to do this properly.
Luckily, there is a tool called udica, which can generate SELinux policies from container configurations. It does not provide the desired policy on its own and requires some manual intervention, so I will describe how I got the openvpn container working step-by-step.
First, install the required tools:
$ sudo dnf install policycoreutils-python-utils policycoreutils udica
Create the container with required privileges, then generate the policy for this container:
$ podman run -it --cap-add NET_ADMIN --device /dev/net/tun -v $PWD:/vpn:Z --name ovpn peque/vpn
$ podman inspect ovpn | sudo udica -j - ovpn_container
Policy ovpn_container created!
Please load these modules using:
# semodule -i ovpn_container.cil /usr/share/udica/templates/base_container.cil
Restart the container with: "--security-opt label=type:ovpn_container.process" parameter
Here is the policy which was generated by udica:
$ cat ovpn_container.cil
(block ovpn_container
(blockinherit container)
(allow process process ( capability ( chown dac_override fsetid fowner mknod net_raw setgid setuid setfcap setpcap net_bind_service sys_chroot kill audit_write net_admin )))
(allow process default_t ( dir ( open read getattr lock search ioctl add_name remove_name write )))
(allow process default_t ( file ( getattr read write append ioctl lock map open create )))
(allow process default_t ( sock_file ( getattr read write append open )))
)
Let's try this policy (note the --security-opt option, which tells podman to run the container in newly created domain):
$ sudo semodule -i ovpn_container.cil /usr/share/udica/templates/base_container.cil
$ podman run -it --cap-add NET_ADMIN --device /dev/net/tun -v $PWD:/vpn:Z --security-opt label=type:ovpn_container.process peque/vpn
<...>
ERROR: Cannot open TUN/TAP dev /dev/net/tun: Permission denied (errno=13)
Ugh. Here is the problem: the policy generated by udica still does not know about specific requirements of our container, as they are not reflected in its configuration (well, probably, it is possible to infer that you want to allow operations on tun_tap_device_t based on the fact that you requested --device /dev/net/tun, but...). So, we need to customize the policy by extending it with few more statements.
Let's disable SELinux temporarily and run the container to collect the expected denials:
$ sudo setenforce 0
$ podman run -it --cap-add NET_ADMIN --device /dev/net/tun -v $PWD:/vpn:Z --security-opt label=type:ovpn_container.process peque/vpn
These are:
$ sudo grep denied /var/log/audit/audit.log
type=AVC msg=audit(1563889218.937:839): avc: denied { read write } for pid=3272 comm="openvpn" name="tun" dev="devtmpfs" ino=15178 scontext=system_u:system_r:ovpn_container.process:s0:c138,c149 tcontext=system_u:object_r:tun_tap_device_t:s0 tclass=chr_file permissive=1
type=AVC msg=audit(1563889218.937:840): avc: denied { open } for pid=3272 comm="openvpn" path="/dev/net/tun" dev="devtmpfs" ino=15178 scontext=system_u:system_r:ovpn_container.process:s0:c138,c149 tcontext=system_u:object_r:tun_tap_device_t:s0 tclass=chr_file permissive=1
type=AVC msg=audit(1563889218.937:841): avc: denied { ioctl } for pid=3272 comm="openvpn" path="/dev/net/tun" dev="devtmpfs" ino=15178 ioctlcmd=0x54ca scontext=system_u:system_r:ovpn_container.process:s0:c138,c149 tcontext=system_u:object_r:tun_tap_device_t:s0 tclass=chr_file permissive=1
type=AVC msg=audit(1563889218.947:842): avc: denied { nlmsg_write } for pid=3273 comm="ip" scontext=system_u:system_r:ovpn_container.process:s0:c138,c149 tcontext=system_u:system_r:ovpn_container.process:s0:c138,c149 tclass=netlink_route_socket permissive=1
Or more human-readable:
$ sudo grep denied /var/log/audit/audit.log | audit2allow
#============= ovpn_container.process ==============
allow ovpn_container.process self:netlink_route_socket nlmsg_write;
allow ovpn_container.process tun_tap_device_t:chr_file { ioctl open read write };
OK, let's modify the udica-generated policy by adding the advised allows to it (note, that here I manually translated the syntax to CIL):
(block ovpn_container
(blockinherit container)
(allow process process ( capability ( chown dac_override fsetid fowner mknod net_raw setgid setuid setfcap setpcap net_bind_service sys_chroot kill audit_write net_admin )))
(allow process default_t ( dir ( open read getattr lock search ioctl add_name remove_name write )))
(allow process default_t ( file ( getattr read write append ioctl lock map open create )))
(allow process default_t ( sock_file ( getattr read write append open )))
; This is our new stuff.
(allow process tun_tap_device_t ( chr_file ( ioctl open read write )))
(allow process self ( netlink_route_socket ( nlmsg_write )))
)
Now we enable SELinux back, reload the module and check that the container works correctly when we specify our custom domain:
$ sudo setenforce 1
$ sudo semodule -r ovpn_container
$ sudo semodule -i ovpn_container.cil /usr/share/udica/templates/base_container.cil
$ podman run -it --cap-add NET_ADMIN --device /dev/net/tun -v $PWD:/vpn:Z --security-opt label=type:ovpn_container.process peque/vpn
<...>
Initialization Sequence Completed
Finally, check that other containers still have no these privileges:
$ podman run -it --cap-add NET_ADMIN --device /dev/net/tun -v $PWD:/vpn:Z peque/vpn
<...>
ERROR: Cannot open TUN/TAP dev /dev/net/tun: Permission denied (errno=13)
Yay! We stay with SELinux on, and allow the tunnel configuration only to our specific container.

AWS connection error

I am trying to connect to the Ubuntu VM on AWS, it was working fine but now shows:-
Permission denied (publickey).
when I run ssh-add /Users/username/Desktop/SSH_KEYS/* , I can connect although i get a warning WARNING: UNPROTECTED PRIVATE KEY FILE!
but after connection I get
1 update is a security update.
System restart required
and each time I reboot my mac machine I need to run ssh-add /Users/username/Desktop/SSH_KEYS/* again
Also when I am connected if I try to run
$ nvidia-smi
I get the following error:-
Failed to initialize NVML: Driver/library version mismatch
Can you please explain what is the reason for all this?
Thanks