Program crashes on VM just when finishing

I am running samtools on a Google Cloud VM with 8 CPUs. When the process is about to finish, the program crashes with the error below. At the same time, the bucket directory becomes inaccessible, as the second message shows. Any ideas? Is this a problem with saving the file?
Error:
username@instance-1:~/my_bucket$ /usr/local/bin/bin/samtools view -@20 -O sam -f 4 file_dedup.realigned.cram > file.unmapped.sam
samtools view: error closing standard output: -1
This also comes up when typing ls in the bucket directory:
ls: cannot open directory '.': Transport endpoint is not connected

As we discovered in the comments, this issue comes from the difference between a FUSE-mounted bucket and a POSIX file system.
You can solve this issue in two ways:
Increase the disk space on your VM instance (following the documentation: Resize the disk, and Resize the file system and partitions) and stop using the Google Cloud Storage bucket mounted via FUSE.
Save the samtools output to the VM's disk first, and then move it to the Google Cloud Storage bucket mounted via FUSE.
You can estimate the cost of each scenario with the Google Cloud Pricing Calculator.
Keep in mind that persistent disks have limits, among them:
Each persistent disk can be up to 64 TB in size, so there is no need to manage arrays of disks to create large logical volumes.
Most instances can have up to 128 persistent disks and up to 257 TB of total persistent disk space attached. Total persistent disk space for an instance includes the size of the boot persistent disk.
In addition, please have a look at Quotas & limits for Google Cloud Storage.
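The second option above (write to local disk first, then move the result to the bucket) could be sketched roughly as follows. This is only an illustration: stage_then_move is a hypothetical helper name, and the paths in the usage comment are assumptions, not the asker's actual layout.

```shell
# Run a command with its stdout captured on local disk, then move the
# result to the final destination only if the command succeeded.
# Usage: stage_then_move DEST COMMAND [ARGS...]
stage_then_move() {
    dest=$1
    shift
    # mktemp places the staging file under TMPDIR (default /tmp), i.e. on
    # the VM's own disk rather than on the FUSE-mounted bucket.
    tmp=$(mktemp) || return 1
    if "$@" > "$tmp"; then
        mv "$tmp" "$dest"
    else
        status=$?
        rm -f "$tmp"
        return "$status"
    fi
}

# Hypothetical usage (input file and bucket mount path are placeholders):
# stage_then_move "$HOME/my_bucket/output.sam" \
#     samtools view -O sam -f 4 input.cram
```

With this shape, a failure while closing standard output is reported against a local file, and the bucket only ever sees a complete file.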

Related

How to mitigate storage that has grown to 10 TB on a GCP instance

I have a GCP instance.
My database size was 300 GB,
but the instance storage has grown to 10 TB.
Where is the remaining space going?
I have ongoing replication running from AWS DMS to GCP.
Please help me.
Thanks

Your binary logs or WAL logs have probably grown a lot without you noticing. You can check this in Metrics Explorer by looking at the detailed storage usage of your instance. There is no way in GCP to shrink the disk on your own, so you will have to either migrate to a new instance with a smaller disk, or open a ticket with Cloud Support and ask them to shrink your disk once you have configured your instance properly.

Decreasing disk size on Google Virtual Machine Web Server

I am using Google Compute Engine VM for a web server. I currently have 3TB as disk space but want to bring it down to 1TB. Can someone tell me where I can do this from? Thanks.

As @John Hanley noted, only increasing disk size is supported by Google Cloud:
gcloud compute disks resize resizes a Compute Engine disk(s).
Only increasing disk size is supported. Disks can be resized regardless of whether they are attached.
I also found an answer on Server Fault that could help you with this topic.
It uses the fsarchiver tool to handle resizing the boot disk:
If the disk is not a boot disk, you can do the following:
Add a new disk with the required size and format it.
Mount the new disk.
cp -r --preserve=all /mnt/disk1/. /mnt/disk2/
Edit /etc/fstab to mount the new disk instead of the old one.
If you have a standard disk and want to shorten the cp time, you can first create a new SSD disk from a snapshot and copy it to a 2T SSD disk. Then make a snapshot of the 2T disk and create a new 2T standard disk from it.
If your disk is a boot disk, you can use a tool like fsarchiver:
Create an archive from the boot disk:
fsarchiver savefs /mnt/backup/boot_disk.fsa /dev/sda
Restore the archive on the new disk:
fsarchiver restfs /mnt/backup/boot_disk.fsa id=0,dest=/dev/sdb

What's the equivalent to shared Docker volumes in GCE?

When developing with containers locally, docker-compose lets you create shared volumes that all of your containers can access. You can easily drop small credential files onto these volumes from one container, and have another container use them.
I'm trying to find something similar in Google Compute Engine, but I haven't been able to find anything analogous.
Compute Engine disks cannot be shared between instances
Filestore instances start at a minimum of 1 TB and are expensive overkill
Is there anything similar in Google Compute Engine to the concept of shared volumes in Docker, in terms of how it can be mounted to the instances, shared among instances, and small/cheap?
Does such a concept not exist in GCE, or is it perhaps available only in Google Kubernetes Engine (GKE)?

Actually Compute Engine disks can be shared between instances, but at this time this feature is in beta.
According to Google terminology, Persistent Disk in Multi-writer Mode is called Shared PD or PD multi-writer. Shared PD is a persistent disk created with multiWriter option set to True. Shared PD can be attached to up to 2 VMs in read-write mode.
Google Cloud > Cloud SDK: CLI > Doc > Reference > gcloud beta compute disks create:
gcloud beta compute disks create --multi-writer
create Compute Engine persistent disk in multi-writer mode so that it can be attached with read-write access to multiple VMs. Can only be used with Zonal SSD persistent disks. Disks in multi-writer mode do not support resize and snapshot operations.
As for GKE, it supports disk sharing as well. You can share a persistent disk between multiple Pods in read-only mode.
See Google Cloud > GKE > Doc > Using persistent disks with multiple readers for more details.

An alternative solution is to use Cloud Storage for this. If you have only a few MB of data and can accept slower I/O, you can use gcsfuse. The principle is simple: mount a Cloud Storage bucket in your file system and write to it like any other directory on your system.
gcsfuse converts read/write operations into API calls, and you are charged per API call (a few dollars for millions of calls, but if your app is I/O intensive, it can get expensive!). In addition, because these are API calls rather than local disk operations, latency (due to the network, HTTPS handshake, ...) is higher than with a local disk.
So, keep in mind that gcsfuse is simply a wrapper around the Google Cloud Storage APIs.
Note: If you want to share credentials, why not use Google Secret Manager?

Syncing is slow from persistent Disk to Google Bucket

We have around 11 TB of images in local storage, and the same data has been copied to a Google Cloud Storage bucket. We need to sync the images incrementally, i.e. only updated files. Currently we are syncing files using the gsutil command below.
gsutil -m rsync -r -C /mnt/Test/ gs://test_images/test-H/
The issue we are facing is that it takes around 6 days to copy, and most of that time is spent scanning the disk. Please let me know if there is any method to copy at least the data updated in the last 24 hours.

To increase the transfer speed, here are some tips:
Use regional storage, the closest to your VM
Use a VM with at least 8 vCPUs to maximise the bandwidth, as described in the quotas
Depends on the machine type of the VM:
All shared-core machine types are limited to 1 Gbps.
2 Gbps per vCPU, up to 32 Gbps per VM for machine types that use the Skylake or later CPU platforms with 16 or more vCPUs. This egress rate is also available for ultramem machine types.
2 Gbps per vCPU, up to 16 Gbps per VM for all other machine types with eight or more vCPUs.
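To make the quoted limits easier to apply, here is a small sketch that turns them into a calculation. It relies only on the figures quoted above (which may no longer match current Google Cloud limits) and assumes that the 2 Gbps per vCPU rate also applies below eight vCPUs. The function name egress_cap_gbps and its flag arguments are made up for illustration.

```shell
# Estimate the per-VM egress cap in Gbps from the rules quoted above.
# Usage: egress_cap_gbps VCPUS SKYLAKE_OR_LATER SHARED_CORE
#        (the last two arguments are flags: 1 = yes, 0 = no)
egress_cap_gbps() {
    vcpus=$1
    skylake=$2
    shared=$3
    # All shared-core machine types are limited to 1 Gbps.
    if [ "$shared" -eq 1 ]; then
        echo 1
        return
    fi
    # Default ceiling is 16 Gbps; Skylake or later with 16+ vCPUs gets 32.
    cap=16
    if [ "$skylake" -eq 1 ] && [ "$vcpus" -ge 16 ]; then
        cap=32
    fi
    # 2 Gbps per vCPU, limited by the ceiling (assumed to hold below
    # eight vCPUs as well).
    gbps=$((vcpus * 2))
    if [ "$gbps" -gt "$cap" ]; then
        gbps=$cap
    fi
    echo "$gbps"
}

# Example: a 4-vCPU non-shared machine
# egress_cap_gbps 4 0 0
```

Under these assumptions a 4-vCPU machine is capped at half the rate of an 8-vCPU one, which is why the tip above recommends at least 8 vCPUs.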

We increased the size of the VM instance to n1-standard-4, as it provides more CPU power and network performance on the GCP network. We noticed in Stackdriver that the server was hitting 100% CPU utilization at times, and that it was limited to the maximum speed allowed for GCP network transfers because of the compute sizing. We also mounted the bucket on the same server and executed the script. Below are the commands we used to mount the bucket and sync the files.
Authenticate to the Google bucket with the command below.
gcloud auth application-default login
Mount the bucket using the command below.
gcsfuse --implicit-dirs Bucketname Mountpoint
Sync the files using the rsync command.

Mounting a NVME disk on AWS EC2

So I created i3.large instances with an NVMe disk on each node; here was my process:
lsblk -> nvme0n1 (check if nvme isn't yet mounted)
sudo mkfs.ext4 -E nodiscard /dev/nvme0n1
sudo mount -o discard /dev/nvme0n1 /mnt/my-data
/dev/nvme0n1 /mnt/my-data ext4 defaults,nofail,discard 0 2
sudo mount -a (check if everything is OK)
sudo reboot
So all of this works, and I can connect back to the instance. I have 500 GiB on my new partition.
But after I stop and restart the EC2 machines, some of them randomly become inaccessible (AWS reports that only 1/2 status checks passed).
When I look at the logs to see why it is inaccessible, they point to the NVMe partition (but I ran sudo mount -a to check that this was OK, so I don't understand).
I don't have the exact AWS logs, but I got some lines from them:
Bad magic number in super-block while trying to open
then the superblock is corrupt, and you might try running e2fsck with an alternate superblock:
/dev/fd/9: line 2: plymouth: command not found

I have been using "c5" type instances for almost a month, mostly "c5d.4xlarge" with NVMe drives. So, here's what has worked for me on Ubuntu instances:
First, find where the NVMe drive is located:
lsblk
Mine was always at nvme1n1. Then check whether it is an empty volume without any file system (it mostly is, unless you are remounting). For empty drives the output should be /dev/nvme1n1: data:
sudo file -s /dev/nvme1n1
Then format it (if the last step showed that your drive already has a file system and isn't empty, skip this and go to the next step):
sudo mkfs -t xfs /dev/nvme1n1
Then create a directory and mount the NVMe drive:
sudo mkdir /data
sudo mount /dev/nvme1n1 /data
You can now check that it is mounted by running:
df -h

Stopping and starting an instance erases the ephemeral disks, moves the instance to new host hardware, and gives you new empty disks... so the ephemeral disks will always be blank after stop/start. When an instance is stopped, it doesn't exist on any physical host -- the resources are freed.
So, the best approach, if you are going to be stopping and starting instances is not to add them to /etc/fstab but rather to just format them on first boot and mount them after that. One way of testing whether a filesystem is already present is using the file utility and grep its output. If grep doesn't find a match, it returns false.
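The file-and-grep check described above might look something like the sketch below. The helper name is_blank_device_output, the device and the mount point are assumptions for illustration. The pattern is anchored on purpose: a formatted volume can also produce a description containing the word "data" (for example "SGI XFS filesystem data ..."), so only a description that is exactly "data" is treated as blank.

```shell
# Return success (0) when the given `file -s` output describes a device
# with no recognizable filesystem, i.e. the description is exactly "data".
is_blank_device_output() {
    printf '%s\n' "$1" | grep -q ': data$'
}

# Hypothetical first-boot usage (device and mount point are assumptions):
# dev=/dev/nvme0n1
# if is_blank_device_output "$(sudo file -s "$dev")"; then
#     sudo mkfs.ext4 "$dev"      # format only when the volume is blank
# fi
# sudo mkdir -p /mnt/my-data
# sudo mount "$dev" /mnt/my-data
```

Run from a boot-time script instead of /etc/fstab, this formats the fresh ephemeral disk after a stop/start and simply mounts it after an ordinary reboot.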
The NVMe SSD on the i3 instance class is an example of an Instance Store Volume, also known as an Ephemeral [ Disk | Volume | Drive ]. They are physically inside the instance and extremely fast, but not redundant and not intended for persistent data... hence, "ephemeral." Persistent data needs to be on an Elastic Block Store (EBS) volume or an Elastic File System (EFS), both of which survive instance stop/start, hardware failures, and maintenance.
It isn't clear why your instances are failing to boot, but nofail may not be doing what you expect when a volume is present but has no filesystem. My impression has been that eventually it should succeed.
But, you may need to apt-get install linux-aws if running Ubuntu 16.04. Ubuntu 14.04 NVMe support is not really stable and not recommended.
Each of these three storage solutions has its advantages and disadvantages.
The Instance Store is local, so it's quite fast... but, it's ephemeral. It survives hard and soft reboots, but not stop/start cycles. If your instance suffers a hardware failure, or is scheduled for retirement, as eventually happens to all hardware, you will have to stop and start the instance to move it to new hardware. Reserved and dedicated instances don't change ephemeral disk behavior.
EBS is persistent, redundant storage that can be detached from one instance and moved to another (and this happens automatically across a stop/start). EBS supports point-in-time snapshots, and these are incremental at the block level, so you don't pay for storing the data that didn't change across snapshots... but through some excellent witchcraft, you also don't have to keep track of "full" vs. "incremental" snapshots -- the snapshots are only logical containers of pointers to the backed-up data blocks, so they are, in essence, all "full" snapshots, but only billed as incremental. When you delete a snapshot, only the blocks no longer needed to restore either that snapshot or any other snapshot are purged from the back-end storage system (which, transparent to you, actually uses Amazon S3).
EBS volumes are available as both SSD and spinning platter magnetic volumes, again with tradeoffs in cost, performance, and appropriate applications. See EBS Volume Types. EBS volumes mimic ordinary hard drives, except that their capacity can be manually increased on demand (but not decreased), and can be converted from one volume type to another without shutting down the system. EBS does all of the data migration on the fly, with a reduction in performance but no disruption. This is a relatively recent innovation.
EFS uses NFS, so you can mount an EFS filesystem on as many instances as you like, even across availability zones within one region. The size limit for any one file in EFS is 52 terabytes, and your instance will actually report 8 exabytes of free space. The actual free space is for all practical purposes unlimited, but EFS is also the most expensive -- if you did have a 52 TiB file stored there for one month, that storage would cost over $15,000. The most I ever stored was about 20 TiB for 2 weeks, cost me about $5k but if you need the space, the space is there. It's billed hourly, so if you stored the 52 TiB file for just a couple of hours and then deleted it, you'd pay maybe $50. The "Elastic" in EFS refers to the capacity and the price. You don't pre-provision space on EFS. You use what you need and delete what you don't, and the billable size is calculated hourly.
A discussion of storage wouldn't be complete without S3. It's not a filesystem, it's an object store. At about 1/10 the price of EFS, S3 also has effectively infinite capacity, and a maximum object size of 5TB. Some applications would be better designed using S3 objects, instead of files.
S3 can also be easily used by systems outside of AWS, whether in your data center or in another cloud. The other storage technologies are intended for use inside EC2, though there is an undocumented workaround that allows EFS to be used externally or across regions, with proxies and tunnels.

I just had a similar experience! My c5.xlarge instance detects an EBS volume as nvme1n1. I added this line to fstab.
/dev/nvme1n1 /data ext4 discard,defaults,nofail 0 2
After a couple of reboots, it looked like it was working, and it kept running for weeks. But today I got an alert that the instance could not be reached. I tried rebooting it from the AWS console, with no luck. It looks like the culprit is the fstab entry: the disk mount failed.
I raised a ticket with AWS support; no feedback yet. I had to start a new instance to recover my service.
On another test instance, I tried using the UUID (obtained with the blkid command) instead of /dev/nvme1n1. So far it still looks to be working... I will see if it causes any issues.
I will update here if I get any feedback from AWS support.
================ EDIT with my fix ===========
AWS hasn't given me feedback yet, but I found the issue. Actually, in fstab, it doesn't matter whether you mount by /dev/nvme1n1 or by UUID. My issue was that my EBS volume had some file system errors. I attached it to an instance and then ran
fsck.ext4 /dev/nvme1n1
After it fixed a couple of file system errors, I put it back in fstab and rebooted; no problems anymore!
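For the UUID-based fstab entry tried earlier in this answer, a minimal sketch could look like this. The helper name fstab_line_for_uuid is hypothetical; the mount options are copied from the fstab line shown above, and the device path in the usage comment is the one from this answer.

```shell
# Print an /etc/fstab entry that identifies the filesystem by UUID.
# Usage: fstab_line_for_uuid UUID MOUNTPOINT FSTYPE
fstab_line_for_uuid() {
    printf 'UUID=%s %s %s discard,defaults,nofail 0 2\n' "$1" "$2" "$3"
}

# Hypothetical usage:
# uuid=$(sudo blkid -s UUID -o value /dev/nvme1n1)
# fstab_line_for_uuid "$uuid" /data ext4 | sudo tee -a /etc/fstab
# sudo mount -a    # check the new entry before rebooting
```

As the edit above says, this would not have fixed the file system errors, but it does keep the entry pointing at the same volume if the device name changes.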

You may find the new EC2 instance family equipped with local NVMe storage useful: C5d.
See announcement blog post: https://aws.amazon.com/blogs/aws/ec2-instance-update-c5-instances-with-local-nvme-storage-c5d/
Some excerpts from the blog post:
You don’t have to specify a block device mapping in your AMI or during the instance launch; the local storage will show up as one or more devices (/dev/nvme*1 on Linux) after the guest operating system has booted.
Other than the addition of local storage, the C5 and C5d share the same specs.
You can use any AMI that includes drivers for the Elastic Network Adapter (ENA) and NVMe
Each local NVMe device is hardware encrypted using the XTS-AES-256 block cipher and a unique key.
Local NVMe devices have the same lifetime as the instance they are attached to and do not stick around after the instance has been stopped or terminated.
