Syncing is slow from persistent Disk to Google Bucket - google-cloud-platform

We have around 11TB of images in local storage, and the same data has been copied to a Google Cloud Storage bucket. We have a requirement to sync all images incrementally, i.e. only updated files. Currently we are syncing files using the gsutil command below.
gsutil -m rsync -r -C /mnt/Test/ gs://test_images/test-H/
The issue we are facing is that it takes around 6 days to copy, and most of that time is spent scanning the disk. Please let me know if there is any method to copy the updated data within 24 hours.

To increase the transfer speed, here are some tips:
Use regional storage, the closest to your VM
Use a VM with at least 8 vCPUs to maximise the bandwidth, as described in the quota documentation
The egress cap depends on the machine type of the VM:
All shared-core machine types are limited to 1 Gbps.
2 Gbps per vCPU, up to 32 Gbps per VM for machine types that use the Skylake or later CPU platforms with 16 or more vCPUs. This egress rate is also available for ultramem machine types.
2 Gbps per vCPU, up to 16 Gbps per VM for all other machine types with eight or more vCPUs.
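If the bottleneck is the scan rather than the transfer, one workaround is to let find select only recently modified files and hand that list to gsutil, instead of having rsync walk the whole tree. A minimal sketch, using a temp directory as a stand-in for /mnt/Test (the gsutil step is shown commented, since it needs real credentials):

```shell
# Sketch: copy only files changed in the last 24 hours instead of letting
# gsutil rsync re-scan the whole 11 TB tree on every run. The temp
# directory stands in for /mnt/Test.
SRC=$(mktemp -d)
touch "$SRC/new.img"                      # modified now
touch -d '2 days ago' "$SRC/old.img"      # outside the 24 h window

# -mmin -1440: modification time within the last 1440 minutes (24 h).
find "$SRC" -type f -mmin -1440 > /tmp/changed_files.txt
cat /tmp/changed_files.txt                # lists only new.img

# Real transfer step (bucket from the question); -I reads names from stdin:
# gsutil -m cp -I gs://test_images/test-H/ < /tmp/changed_files.txt
```

Note that mtime-based selection misses files restored with old timestamps, so a periodic full gsutil rsync remains a sensible backstop.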

We have increased the size of the VM instance to n1-standard-4, as it provides more CPU power and network performance on the GCP network. We noticed in Stackdriver that the server was at times hitting 100% CPU utilization, and that it was limited to the maximum speeds allowed for GCP network transfers due to the compute sizing. We also mounted the bucket on the same server and executed the script. Below are the commands we used to mount the bucket and sync the files.
Below is the command used to authenticate to the Google bucket.
gcloud auth application-default login
Mount the bucket using the command below.
gcsfuse --implicit-dirs Bucketname Mountpoint
Sync the files using the rsync command.

Related

How to Make AWS Infrastructure perform comparable to local server running on a MacBook Pro

I have a web application that is caching data in CSV files, and then in response to an HTTP request, reading the CSV files into memory, constructing a JavaScript Object, then sending that Object as JSON to the client.
When I run this on my Local Server on my Macbook Pro (2022, Chip: Apple M1 Pro, 16GB Memory, 500GB Hard Drive), the 24 CSV files at about 15MB each are all read in about 2.5 seconds, and then the subsequent processing takes another 3 seconds, for a total execution time of about 5.5 seconds.
When I deploy this application to AWS, however, I am struggling to create a comparably performant environment.
I am using AWS Elastic Beanstalk to spin up an EC2 instance, and then attaching an EBS volume to store the CSV files. I know that since EBS is not local to the instance there is possible network latency, but my understanding is that it is typically negligible as far as overall effect on performance.
What I have tried thus far:
Using a Compute focused instance (c5.4xlarge) which is automatically EBS optimized. Then using a Provisioned IOPS (io2) with 1000 GiB storage and 400 IOPS. (Performance, about 10 seconds total)
Using a High Throughput EBS volume, which is supposed to offer greater performance for sequential read jobs (like what I imagined reading a CSV file would be), but that actually performed a little worse than the Provisioned IOPS EBS instance. (Performance, about 11 seconds total)
Can anyone offer any recommendations for which EC2 Instance and EBS Volume should be configured to achieve comparable performance with my local machine? I don't expect to get it matching exactly, but do expect that it can be closer than about twice as slow as the local server.
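As a quick sanity check on the numbers in the question, assuming EBS counts sequential I/O up to 256 KiB as a single operation, 400 provisioned IOPS caps the volume well below what the local run achieves:

```shell
# Back-of-envelope check: can 400 provisioned IOPS match the laptop?
# Assumes EBS merges sequential I/O up to 256 KiB into one operation.
summary=$(awk 'BEGIN {
  ebs_mibs  = 400 * 256 / 1024     # 400 IOPS x 256 KiB -> MiB/s ceiling
  local_mbs = 24 * 15 / 2.5        # 24 files x 15 MB read in 2.5 s
  printf "EBS cap: %d MiB/s, local: %d MB/s", ebs_mibs, local_mbs
}')
echo "$summary"                    # -> EBS cap: 100 MiB/s, local: 144 MB/s
```

That suggests raising provisioned IOPS (or the volume's throughput figure) before experimenting further with instance types.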

AWS Lambda - mounting RAM disk

AWS Lambda is limited to storing 512 MB of ephemeral data in /tmp
For a particular use case I need to process more than this - up to several GB in a few hundred files.
I could mount an EFS drive but that then requires mucking about with VPC and NAT Gateway which I am trying to avoid.
Am using various executables (via layers) on these files so I can't just load files into memory and process.
Is there a way of setting up a ramdisk in Lambda? (I understand that I would have to provision and pay for a large amount of memory.)
I have tried executing
mount -t tmpfs -o size=2G myramdisk /tmp/ramdisk
but receive the error mount: command not found
As of 24 March 2022, Lambda supports a configurable ephemeral storage up to 10 GB of space.
Reference: https://aws.amazon.com/blogs/aws/aws-lambda-now-supports-up-to-10-gb-ephemeral-storage/
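Per that announcement, the limit is now a per-function setting; a configuration sketch with a placeholder function name (the size is given in MB):

```shell
# Raise /tmp to the new 10 GB maximum for an existing function.
# "my-function" is a placeholder name.
aws lambda update-function-configuration \
    --function-name my-function \
    --ephemeral-storage '{"Size": 10240}'
```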

Program crashes on VM just when finishing

I am running samtools on a Google VM with 8 vCPUs. It seems that when the process finishes, the program crashes with the error below. At the same time, there is a problem with the bucket (see the ls error below). Any ideas? A problem with saving the file?
Error:
username@instance-1:~/my_bucket$ /usr/local/bin/bin/samtools view -@20 -O sam -f 4 file_dedup.realigned.cram > file.unmapped.sam
samtools view: error closing standard output: -1
This also comes up when typing ls in the bucket directory:
ls: cannot open directory '.': Transport endpoint is not connected
As we discovered in the comments, this issue is related to the difference between FUSE and POSIX file systems.
You can solve this issue in two ways:
Increase disk space on your VM instance (by following the documentation Resize the disk and Resize the file system and partitions) and stop using Google Cloud Storage Bucket mounted via FUSE.
Save the data produced by samtools to the VM's disk first, and then move it to the Google Cloud Storage bucket mounted via FUSE.
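The second option can be sketched as below, with temp directories standing in for the local scratch disk and the FUSE mount point, and printf standing in for the samtools run:

```shell
# Sketch of option 2: write output to the VM's local disk first, then
# move the finished file into the FUSE-mounted bucket. Temp dirs stand
# in for the scratch disk and gcsfuse mount; printf stands in for samtools.
SCRATCH=$(mktemp -d)    # fast local persistent disk
MOUNT=$(mktemp -d)      # gcsfuse mount of the bucket

# Real command: samtools view -@20 -O sam -f 4 in.cram > "$SCRATCH/out.sam"
printf 'placeholder SAM output\n' > "$SCRATCH/out.sam" \
  && mv "$SCRATCH/out.sam" "$MOUNT/out.sam"
ls "$MOUNT"             # -> out.sam
```

This way samtools only ever sees a fully POSIX-compliant filesystem, and the FUSE mount just receives a finished file.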
You can estimate cost for each scenario with Google Cloud Pricing Calculator.
Keep in mind that persistent disks have restrictions, among them:
Each persistent disk can be up to 64 TB in size, so there is no need to manage arrays of disks to create large logical volumes.
Most instances can have up to 128 persistent disks and up to 257 TB of total persistent disk space attached. Total persistent disk space for an instance includes the size of the boot persistent disk.
In addition, please have a look at Quotas & limits for Google Cloud Storage.

Is there any way to increase the upload speed of Google Compute Engine?

I'm setting up a Plex server on my Google Cloud Platform instance, but media files are stalling since the upload rate to Google Cloud does not exceed 10 Mbps. I live in Timon, Maranhão, and the nearest Google server is in São Paulo, Brazil; the download rate reaches 1 Gbps and the latency is 60 ms, but the upload is only 10 Mbps (measured with the Speedtest.net site). Could someone help me improve the upload speed?
I think that the limitation is not on Google's side, since Google Cloud Platform does not impose bandwidth caps on ingress traffic; the amount of ingress traffic a GCE instance can handle depends on the machine type and operating system.
On the other hand, the outbound or egress traffic from a virtual machine is subject to maximum network egress throughput caps. These caps are dependent on the number of vCPUs that a virtual machine instance has. Each core is subject to a 2 Gbits/second (Gbps) cap for peak performance. Each additional core increases the network cap, up to a theoretical maximum of 16 Gbps for each virtual machine.
You can see this yourself by setting up a range of instance types and logging their iPerf performance. As @John Hanley states, Google does not guarantee performance over the Internet, since upload speeds can vary based on a number of conditions, including the ISP in use on-premises.
One of the (many) limits to the performance of a TCP connection is:
Throughput <= WindowSize / RoundTripTime
So it is possible that the window size your local system and the GCP instance advertise in the upload direction needs to be increased to accommodate the round-trip time between the two. What do you see for the round-trip time? For example, if you ping your instance from your local system, what does it report?
Also, it is not enough for the receiver to advertise that much window, the sender must be willing/able to send that much. So, both sides may need to be tweaked.
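Plugging the question's numbers into that formula, an assumed default 64 KiB window over the reported 60 ms round trip caps a single TCP stream at roughly the observed upload speed:

```shell
# Throughput <= WindowSize / RoundTripTime, with the question's 60 ms
# latency and an assumed default 64 KiB TCP window.
mbits=$(awk 'BEGIN {
  window_bytes = 64 * 1024
  rtt_s = 0.060
  printf "%.1f", window_bytes / rtt_s * 8 / 1e6
}')
echo "$mbits Mbit/s"    # -> 8.7 Mbit/s
```

That this lands so close to the observed 10 Mbps is a hint that window scaling and socket buffer sizes are worth checking on both ends.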
Further, there has been a change in the computation of the per-VM network egress cap since Raul's answer. It is still computed as 2 Gbit/s multiplied by the number of vCPUs in the instance, but the upper bound (if you specify Skylake or better for the CPU family) is now 32 Gbit/s, and there is now a lower bound of 10 Gbit/s for instances with 2 or more vCPUs. That cap is applied to the VM as a whole. As before, those are "guaranteed not to exceed", not "guaranteed to achieve".
For future readers:
Alternatively, you can upload files to Google Cloud Storage, then SSH to the server and download the file: gcloud storage cp gs://BUCKET_NAME/OBJECT_NAME SAVE_TO_LOCATION
Optionally, you can zip those files before uploading to Google Cloud Storage.
gcloud storage CLI:
https://cloud.google.com/storage/docs/downloading-objects#cli-download-object

Mounting a NVME disk on AWS EC2

So I created an i3.large with an NVMe disk on each node; here was my process:
lsblk -> nvme0n1 (check if nvme isn't yet mounted)
sudo mkfs.ext4 -E nodiscard /dev/nvme0n1
sudo mount -o discard /dev/nvme0n1 /mnt/my-data
/dev/nvme0n1 /mnt/my-data ext4 defaults,nofail,discard 0 2
sudo mount -a (check if everything is OK)
sudo reboot
So all of this works, I can connect back to the instance. I have 500 GiB on my new partition.
But after I stop and start the EC2 machines, some of them randomly become inaccessible (AWS warns that only 1/2 status checks passed).
When I look at the logs to see why it is inaccessible, they point to the nvme partition (but I did sudo mount -a to check that it was OK, so I don't understand).
I don't have the AWS logs exactly, but I got some lines of it :
Bad magic number in super-block while trying to open
then the superblock is corrupt, and you might try running e2fsck with an alternate superblock:
/dev/fd/9: line 2: plymouth: command not found
I have been using "c5" type instances for almost a month, mostly "c5d.4xlarge" with NVMe drives. So, here's what has worked for me on Ubuntu instances:
first get the location nvme drive is located at:
lsblk
Mine was always at nvme1n1. Then check whether it is an empty volume that doesn't have any file system (it mostly doesn't, unless you are remounting). For empty drives, the output should be /dev/nvme1n1: data:
sudo file -s /dev/nvme1n1
Then format it (if the last step showed that your drive already has a file system and isn't empty, skip this and go to the next step):
sudo mkfs -t xfs /dev/nvme1n1
Then create a folder in current directory and mount the nvme drive:
sudo mkdir /data
sudo mount /dev/nvme1n1 /data
You can then verify its existence by running:
df -h
Stopping and starting an instance erases the ephemeral disks, moves the instance to new host hardware, and gives you new empty disks... so the ephemeral disks will always be blank after stop/start. When an instance is stopped, it doesn't exist on any physical host -- the resources are freed.
So, the best approach, if you are going to be stopping and starting instances, is not to add them to /etc/fstab but rather to just format them on first boot and mount them after that. One way of testing whether a filesystem is already present is to use the file utility and grep its output; if grep doesn't find a match, it returns false.
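That format-on-first-boot check could be sketched as follows, with a file-backed image standing in for /dev/nvme1n1:

```shell
# Sketch: format the device only when `file -s` reports no existing
# filesystem. A small file-backed image stands in for /dev/nvme1n1.
DEV=$(mktemp)
dd if=/dev/zero of="$DEV" bs=1M count=8 status=none

if ! file -s "$DEV" | grep -q filesystem; then
    echo "no filesystem found, formatting"
    mkfs -t ext4 -q -F "$DEV"    # -F: needed for a regular file here
fi
file -s "$DEV"    # now reports an ext4 filesystem
```

On a real instance you would replace $DEV with /dev/nvme1n1 (and drop -F), then mount the device after the check.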
The NVMe SSD on the i3 instance class is an example of an Instance Store Volume, also known as an Ephemeral [ Disk | Volume | Drive ]. They are physically inside the instance and extremely fast, but not redundant and not intended for persistent data... hence, "ephemeral." Persistent data needs to be on an Elastic Block Store (EBS) volume or an Elastic File System (EFS), both of which survive instance stop/start, hardware failures, and maintenance.
It isn't clear why your instances are failing to boot, but nofail may not be doing what you expect when a volume is present but has no filesystem. My impression has been that eventually it should succeed.
But, you may need to apt-get install linux-aws if running Ubuntu 16.04. Ubuntu 14.04 NVMe support is not really stable and not recommended.
Each of these three storage solutions has its advantages and disadvantages.
The Instance Store is local, so it's quite fast... but, it's ephemeral. It survives hard and soft reboots, but not stop/start cycles. If your instance suffers a hardware failure, or is scheduled for retirement, as eventually happens to all hardware, you will have to stop and start the instance to move it to new hardware. Reserved and dedicated instances don't change ephemeral disk behavior.
EBS is persistent, redundant storage that can be detached from one instance and moved to another (and this happens automatically across a stop/start). EBS supports point-in-time snapshots, and these are incremental at the block level, so you don't pay for storing the data that didn't change across snapshots... but through some excellent witchcraft, you also don't have to keep track of "full" vs. "incremental" snapshots -- the snapshots are only logical containers of pointers to the backed-up data blocks, so they are, in essence, all "full" snapshots, but only billed as incremental. When you delete a snapshot, only the blocks no longer needed to restore either that snapshot or any other snapshot are purged from the back-end storage system (which, transparently to you, actually uses Amazon S3).
EBS volumes are available as both SSD and spinning platter magnetic volumes, again with tradeoffs in cost, performance, and appropriate applications. See EBS Volume Types. EBS volumes mimic ordinary hard drives, except that their capacity can be manually increased on demand (but not decreased), and can be converted from one volume type to another without shutting down the system. EBS does all of the data migration on the fly, with a reduction in performance but no disruption. This is a relatively recent innovation.
EFS uses NFS, so you can mount an EFS filesystem on as many instances as you like, even across availability zones within one region. The size limit for any one file in EFS is 52 terabytes, and your instance will actually report 8 exabytes of free space. The actual free space is for all practical purposes unlimited, but EFS is also the most expensive -- if you did have a 52 TiB file stored there for one month, that storage would cost over $15,000. The most I ever stored was about 20 TiB for 2 weeks, cost me about $5k but if you need the space, the space is there. It's billed hourly, so if you stored the 52 TiB file for just a couple of hours and then deleted it, you'd pay maybe $50. The "Elastic" in EFS refers to the capacity and the price. You don't pre-provision space on EFS. You use what you need and delete what you don't, and the billable size is calculated hourly.
A discussion of storage wouldn't be complete without S3. It's not a filesystem, it's an object store. At about 1/10 the price of EFS, S3 also has effectively infinite capacity, and a maximum object size of 5TB. Some applications would be better designed using S3 objects, instead of files.
S3 can also be easily used by systems outside of AWS, whether in your data center or in another cloud. The other storage technologies are intended for use inside EC2, though there is an undocumented workaround that allows EFS to be used externally or across regions, with proxies and tunnels.
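The EFS cost figures above check out, assuming roughly $0.30 per GiB-month for standard storage and about 730 hours in a month:

```shell
# Sanity-check the EFS cost figures, assuming ~$0.30 per GiB-month for
# standard storage and ~730 hours in a month.
costs=$(awk 'BEGIN {
  per_month = 52 * 1024 * 0.30          # 52 TiB stored for a month
  per_2h    = per_month / 730 * 2       # same file kept for two hours
  printf "month: $%.0f, two hours: $%.0f", per_month, per_2h
}')
echo "$costs"    # -> month: $15974, two hours: $44
```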
I just had a similar experience! My C5.xlarge instance detects an EBS as nvme1n1. I have added this line in fstab.
/dev/nvme1n1 /data ext4 discard,defaults,nofail 0 2
After a couple of reboots, it seemed to be working, and it kept running for weeks. But today I got an alert that the instance could not be connected to. I tried rebooting it from the AWS console with no luck; it looks like the culprit is the fstab entry. The disk mount failed.
I raised the ticket to AWS support, no feedback yet. I have to start a new instance to recover my service.
In another test instance, I tried using the UUID (obtained with the blkid command) instead of /dev/nvme1n1. So far it still seems to work... I will see if it causes any issue.
I will update here if any AWS support feedback.
================ EDIT with my fix ===========
AWS hasn't given me feedback yet, but I found the issue. Actually, whether you mount by /dev/nvme1n1 or by UUID in fstab doesn't matter. My issue was that my EBS volume had some file system errors. I attached it to an instance and then ran
fsck.ext4 /dev/nvme1n1
After it fixed a couple of file system errors, I put the entry back in fstab and rebooted: no problem anymore!
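For reference, building the UUID-based fstab line might look like the sketch below, with a small file-backed ext4 image standing in for /dev/nvme1n1:

```shell
# Get the filesystem UUID with blkid and build a stable fstab line from
# it. A file-backed ext4 image stands in for the real /dev/nvme1n1.
IMG=$(mktemp)
dd if=/dev/zero of="$IMG" bs=1M count=8 status=none
mkfs -t ext4 -q -F "$IMG"

uuid=$(blkid -s UUID -o value "$IMG")
echo "UUID=$uuid /data ext4 discard,defaults,nofail 0 2"
```

Unlike /dev/nvme1n1, the UUID stays attached to the filesystem, so the entry keeps working even if the kernel enumerates the devices in a different order after a stop/start.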
You may find useful new EC2 instance family equipped with local NVMe storage: C5d.
See announcement blog post: https://aws.amazon.com/blogs/aws/ec2-instance-update-c5-instances-with-local-nvme-storage-c5d/
Some excerpts from the blog post:
You don’t have to specify a block device mapping in your AMI or during the instance launch; the local storage will show up as one or more devices (/dev/nvme*1 on Linux) after the guest operating system has booted.
Other than the addition of local storage, the C5 and C5d share the same specs.
You can use any AMI that includes drivers for the Elastic Network Adapter (ENA) and NVMe
Each local NVMe device is hardware encrypted using the XTS-AES-256 block cipher and a unique key.
Local NVMe devices have the same lifetime as the instance they are attached to and do not stick around after the instance has been stopped or terminated.