When developing with containers locally, docker-compose lets you create shared volumes that all of your containers can access. You can easily drop small credential files onto these volumes from one container, and have another container use them.
I'm trying to find something similar in Google Compute Engine, but I haven't been able to find anything analogous.
Compute Engine disks cannot be shared between instances
Filestore instances start at a minimum of 1 TB and are expensive overkill
Is there anything similar in Google Compute Engine to the concept of shared volumes in Docker, in terms of how it can be mounted to the instances, shared among instances, and small/cheap?
Does such a concept not exist in GCE, or is such a feature available only in Google Kubernetes Engine (GKE)?
Actually, Compute Engine disks can be shared between instances, but at the time of writing this feature is in beta.
In Google terminology, a Persistent Disk in multi-writer mode is called a Shared PD or PD multi-writer. A Shared PD is a persistent disk created with the multiWriter option set to True, and it can be attached to up to 2 VMs in read-write mode.
Google Cloud > Cloud SDK: CLI > Doc > Reference > gcloud beta compute disks create:
gcloud beta compute disks create --multi-writer
Create a Compute Engine persistent disk in multi-writer mode so that it can be attached with read-write access to multiple VMs. This can only be used with zonal SSD persistent disks. Disks in multi-writer mode do not support resize and snapshot operations.
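As a rough sketch of how that looks in practice (the disk name, size, zone and instance names below are placeholders, not anything from the question), you would create the SSD disk with the multi-writer flag and then attach it to both VMs:
# create the multi-writer SSD disk (beta feature)
gcloud beta compute disks create shared-disk --size=100GB --type=pd-ssd --multi-writer --zone=us-central1-a
# attach it in read-write mode to two instances in the same zone
gcloud compute instances attach-disk vm-1 --disk=shared-disk --zone=us-central1-a
gcloud compute instances attach-disk vm-2 --disk=shared-disk --zone=us-central1-a
Note that a multi-writer disk behaves like a shared block device, not a shared file system, so the VMs need a cluster-aware file system (or their own coordination) on top of it.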
As for GKE, it supports disk sharing as well. You can share persistent disk between multiple Pods in read-only mode.
See Google Cloud > GKE > Doc > Using persistent disks with multiple readers for more details.
An alternative solution is to use Cloud Storage for this. If you only have a few MB and modest I/O requirements, you can use gcsfuse. The principle is simple: mount a Cloud Storage bucket into your file system and read and write to it like any other directory on your system.
gcsfuse converts read/write operations into API calls, and you are charged per API call (a few dollars for millions of calls, but if your app is I/O intensive it can add up). In addition, because these are API calls rather than local disk access, latency (network, HTTPS handshakes, and so on) is higher than with a local disk.
So keep in mind that gcsfuse is simply a wrapper around the Cloud Storage APIs.
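A minimal sketch, assuming gcsfuse is already installed and the VM's service account can access the bucket (the bucket name and mount point are placeholders):
mkdir -p /mnt/shared
gcsfuse my-shared-bucket /mnt/shared
echo "small-credential-value" > /mnt/shared/credential.txt   # stored as an object in the bucket
Any other instance that mounts the same bucket will see the same files.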
Note: if you only want to share credentials, why not use Google Secret Manager?
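A minimal sketch of that approach (the secret name and file are placeholders):
# store the credential once
gcloud secrets create db-password --data-file=./db-password.txt
# read it back from any instance
gcloud secrets versions access latest --secret=db-password
The second command can be run from any VM whose service account has the Secret Manager Secret Accessor role, which avoids sharing a disk at all.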
Related
gcloud compute instances attach-disk wants a disk name, but it doesn't show up on my Disks page. It seems silly to create and pay for another disk when this one has much more storage than I plan to use.
Notice that Cloud Shell is intended for interactive usage and that its disk is meant to be recycled: you can't manage it, and it will be deleted after 120 days of inactivity. If you want the data to persist over time, you'll need a different solution, such as Cloud Storage, or you can create a new disk of your own to store the information. Cloud Shell is a tool meant for rapid testing and prototyping, not a development machine with persistent storage.
As per the GCP documentation, you can attach and detach disks on a VM instance using gcloud.
To detach a disk from an instance:
gcloud compute instances detach-disk [INSTANCE_NAME] --disk=[DISK_NAME]
To attach a disk to another instance:
gcloud compute instances attach-disk [INSTANCE_NAME] --disk=[DISK_NAME] --boot
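For example, to move a data disk between two instances and mount it on the target (all names, the zone and the device path below are placeholders; both instances and the disk must be in the same zone, and the device name inside the VM may differ):
gcloud compute instances detach-disk instance-a --disk=data-disk --zone=us-central1-a
gcloud compute instances attach-disk instance-b --disk=data-disk --zone=us-central1-a
sudo mount /dev/sdb /mnt/data   # run this inside instance-b after attaching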
A simple question for GCE users: are persistent boot disks safe to use, or could data loss occur?
I've seen that I can attach additional persistent disks, but what about the standard boot disks (which should be persistent as well)?
What happens during maintenance, equipment failures and so on? Are these boot disks stored on hardware with built-in redundancy (RAID and so on)?
In other words, is a compute instance with a persistent boot disk similar to a non-cloud VM stored on local RAID (from a data-loss point of view)?
Usually cloud instances are volatile: a crash, shutdown, maintenance and so on will destroy all stored data.
Obviously, I'll have backups.
GCE Persistent Disks are designed to be durable and highly-available:
Persistent disks are durable network storage devices that your instances can access like physical disks in a desktop or a server. The data on each persistent disk is distributed across several physical disks. Compute Engine manages the physical disks and the data distribution to ensure redundancy and optimize performance for you.
(emphasis my own, source: Google documentation)
You have a choice of zonal or regional (currently in public beta) persistent disks, on an HDD or SSD-based platform. For boot disks, only zonal disks are supported as of the time of this writing.
As the name suggests, zonal disks are only guaranteed to persist their data within a single zone; outage or failure of that zone may render the data unavailable. Writes to regional disks are replicated to two zones in a region to safeguard against the outage of any one zone. The "Disks" section of the Google Compute Engine console will show you that the boot disks for your instances are zonal persistent disks.
Irrespective of this durability, it is obviously wise to keep your own backups of your persistent disks in another form of storage, to safeguard against other causes of data loss such as corruption by your application or operator error. Snapshots of persistent disks are replicated to other regions; however, be aware of their lifecycle in the event the parent disk is deleted.
In addition to reviewing the comprehensive page linked above, I recommend reviewing the relevant SLA documentation to ascertain the precise guarantees and service levels offered to you.
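As a concrete sketch of the snapshot-based backups mentioned above (the disk, snapshot and zone names are placeholders):
# take a backup of the disk
gcloud compute disks snapshot my-boot-disk --zone=us-central1-a --snapshot-names=my-boot-disk-backup
# restore it later into a fresh disk
gcloud compute disks create my-restored-disk --source-snapshot=my-boot-disk-backup --zone=us-central1-a
The restored disk can then be attached to any instance in that zone.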
Usually cloud instances are volatile: a crash, shutdown, maintenance and so on will destroy all stored data.
The cloud model does indeed prefer instances which are stateless and can be replaced at will. This offers many scalability and robustness advantages, which can be achieved using managed instance groups, for example. However, you can use VMs for persistent storage if desired.
Normally the boot disk data survives restarts and other maintenance operations, but by default the boot disk is deleted along with the instance.
If you use managed instance groups, preemptible instances and so on, and you want persistent data, you should use another storage system. If you just use a plain instance as is, it should be safe enough with backups.
I still think an additional persistent disk or another storage system is a better way to do things. But it's only my opinion.
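If you do want the boot disk to survive deletion of its instance, you can turn off auto-delete; a sketch with placeholder names (by default the boot disk is named after the instance):
gcloud compute instances set-disk-auto-delete my-instance --disk=my-instance --no-auto-delete --zone=us-central1-a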
I have created a Windows EC2 instance on AWS, and I have loaded it up with all of my needed software. My intention is to use this instance to create an image, so that I can (in the very near future) load up a much more powerful instance type using this image, and run a bunch of computations.
However, I also need to have a centralized location to store data. So, I created an EFS drive on AWS, and now I am trying to connect my instance to the EFS using a symbolic link that will persist to every other instance I load up in the future. I want to eventually have an army of instances, all of which use the centralized EFS drive as their primary storage device so that they can all load and save data, which can then be used by other instances.
I've been running Google searches all morning, but I'm coming up empty on how to do this. Any resources or tips would be greatly appreciated.
Thanks!
EFS is basically a managed NFS server. In order to mount it on a Windows instance, you will need to find an NFS client for Windows.
An alternative would be to mount the EFS file system on a Linux-based instance and export it using Samba, which could then be mounted on your Windows instances. Doing this you would lose out on a lot of the benefits of EFS (your Linux instance is a single point of failure and a bottleneck for high-bandwidth requirements), but it might be possible.
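For reference, mounting EFS from a Linux instance is a one-liner once the NFS client is installed (the file-system ID, region and mount point below are placeholders):
sudo mkdir -p /mnt/efs
sudo mount -t nfs4 -o nfsvers=4.1 fs-12345678.efs.us-east-1.amazonaws.com:/ /mnt/efs
The instance must be in a VPC with a mount target for the file system and a security group that allows NFS traffic (port 2049).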
You don't say what you are trying to accomplish, but I would suggest designing a solution that would pull data from S3 as needed. That would also allow you to run multiple instances in parallel.
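A sketch of that pattern with the AWS CLI (the bucket name and local paths are placeholders): each instance pulls its inputs at startup and pushes its results when the job finishes.
aws s3 sync s3://my-compute-data/input C:\work\input
aws s3 sync C:\work\output s3://my-compute-data/output
Because every instance talks to S3 independently, there is no shared drive to become a bottleneck or single point of failure.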
I am new to the cloud environment. I understand the definitions of the EBS and S3 storage types; what I want to understand is when to use EBS as compared to S3.
I understand that EBS looks like a device for high-throughput operations, but I cannot think of applications where it should be used instead of S3. I could imagine putting server logs on a magnetic EBS volume, since one EBS volume can be attached to one instance.
With S3 you can use its scaling to store large amounts of data and expand in real time, and we can deploy our self-managed databases on this service.
Please correct me if I am wrong, and help me understand what each is best suited for in comparison with the other.
As you stated, they are primarily different types of storage:
Amazon Elastic Block Store (EBS) is a persistent disk-storage service, which provides storage volumes to a virtual machine (similar to VMDK files in VMWare)
Amazon Simple Storage Service (S3) is an object store system that stores files as objects and optionally makes them available across the Internet.
So, how do people choose which to use? It's quite simple... If they need a volume mounted on an Amazon EC2 instance, they need to use Amazon EBS. It gives them a C:, D: drive, etc in Windows and a mountable volume in Linux. Computers traditionally expect to have locally-attached disk storage. Put simply: If the operating system or an application running on an Amazon EC2 instance wants to store data locally, it will use EBS.
EBS Volumes are actually stored on two physical devices in case of failure, but an EBS volume appears as a single volume. The volume size must be selected when the volume is created. The volume exists in a single Availability Zone and can only be attached to EC2 instances in the same Availability Zone. EBS Volumes persist even when the attached EC2 instance is Stopped; when the instance is Started again, the disk remains attached and all data has been preserved.
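For illustration, creating and attaching a volume with the AWS CLI looks roughly like this (the Availability Zone, volume ID and instance ID are placeholders; the real volume ID comes from the output of the first command):
aws ec2 create-volume --availability-zone us-east-1a --size 20 --volume-type gp2
aws ec2 attach-volume --volume-id vol-0123456789abcdef0 --instance-id i-0123456789abcdef0 --device /dev/sdf
After attaching, the operating system still has to create a file system on the device and mount it.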
Amazon S3, however, is something quite different. It is a storage service that allows files to be uploaded/downloaded (PutObject, GetObject) and files are replicated across at least three data centers. Files can optionally be accessed via the Internet via HTTP/HTTPS without requiring a web server. There are no limits on the amount of data that can be stored. Access can be granted per-object, per-bucket via a Bucket Policy, or via IAM Users and Groups.
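A minimal sketch of those two API calls with the AWS CLI (the bucket name, key and file names are placeholders):
aws s3api put-object --bucket my-example-bucket --key backups/db.dump --body ./db.dump
aws s3api get-object --bucket my-example-bucket --key backups/db.dump ./db-restored.dump
There is no mounting or formatting involved; every interaction is an HTTP request against the bucket.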
Amazon S3 is a good option when data needs to be shared (both locally and across the Internet), retained for long periods, backed up (or even used for storing backups) and made accessible to other systems. However, applications need to be specifically coded to use Amazon S3, and many traditional applications expect to store data on a local drive rather than on a separate storage service.
While Amazon S3 has many benefits, there are still situations where Amazon EBS is a better storage choice:
When using applications that expect to store data locally
For storing temporary files
When applications want to partially update files, because the smallest storage unit in S3 is a file and updating a part of a file requires re-uploading the whole file
For very High-IO situations, such as databases (EBS Provisioned IOPS can provide volumes up to 20,000 IOPS)
For creating volume snapshots as backups
For creating Amazon Machine Images (AMIs) that can be used to boot EC2 instances
Bottom line: They are primarily different types of storage, and each has its own sweet spot, just as a database is a good form of storage depending upon the situation.
I'm trying to mount the same volume for a Beanstalk build but can't figure out how to make it work with the volume-id.
I can attach a new volume, and I can attach one based on a snapshot ID but neither are what I'm after.
My current .ebextensions config:
commands:
  01umount:
    command: "umount /dev/sdh"
    ignoreErrors: true
  02mkfs:
    command: "mkfs -t ext3 /dev/sdh"
  03mkdir:
    command: "mkdir -p /media/volume1"
    ignoreErrors: true
  04mount:
    command: "mount /dev/sdh /media/volume1"
option_settings:
  - namespace: aws:autoscaling:launchconfiguration
    option_name: BlockDeviceMappings
    value: /dev/sdh=:20
Which of course will mount a new volume, not attach an existing one. Perhaps snapshot is what I want and I just don't understand the terminology here?
I need the data that was on the volume to be available on each EC2 instance that autoscaling launches... A snapshot would surely just contain the data that existed at the point the snapshot was created?
Any ideas or better approaches?
Elastic Block Store (EBS) allows you to create, snapshot/clone, and destroy virtual hard drives for EC2 instances. These drives ("volumes") can be attached to and detached from EC2 instances, but they are not a "share" or shared volume... so attaching a volume by ID becomes a non-useful idea after the first instance launched.
EBS volumes are hard drives. The analogy is imprecise (because they're on a SAN), but in much the same way that you can't physically install the same hard drive in multiple servers, you can't attach an EBS volume to multiple instances (SAN != NAS).
Designing with a cloud mindset, all of your fixed resources would actually be on the snapshot (disk image) you deploy when you release a new version and then use to spawn each fresh auto-scaled instance... and nothing persistent would be stored there because -- just as important as scaling up, is scaling down. Autoscaled instances go away when not needed.
AWS has Simple Storage Service (S3) which is commonly used for storing things like documents, avatars, images, videos, and other resources that need to be accessible in a distributed environment. It is not a filesystem, and can't properly be compared to a filesystem, because it's an object store... but is a highly scalable and highly available storage service that is well-suited to distributed applications. s3fs allows an S3 "bucket" to be mounted into your machine's filesystem, but this is no panacea. That mechanism should be reserved for back-end process use, if you use it at all, because it's not appropriate for resources like code or templates, and will not perform as well for serving up content as S3 will perform if used as designed, with clients directly accessing it over https. You can secure the content through more than one mechanism, as documented.
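For completeness, a back-end mount with s3fs might look like this (the bucket name and mount point are placeholders, and credentials are assumed to come from the instance's IAM role):
sudo mkdir -p /mnt/assets
sudo s3fs my-app-assets /mnt/assets -o iam_role=auto -o allow_other
Again, treat this as a convenience for background jobs, not as a substitute for accessing S3 directly.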
AWS also now has Elastic File System (EFS), which sets up an array of storage that you can mount from all of your machines using NFS. AWS provides the NFS server and the back-end storage. Unlike EBS, you do not need to know how much storage to provision up front, because it scales up and down based on what you've stored, billing you for what you use. This service is still in "preview" as of this writing, so it should not be used for production data.
Or, you can manually configure your own NFS server and mount it from the autoscaling machines. Making such a setup fail-safe is a bit tricky, though.
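A rough sketch of that manual setup on Debian/Ubuntu (the export path, client CIDR and server hostname are placeholders):
# on the NFS server instance
sudo apt-get install -y nfs-kernel-server
echo "/srv/shared 10.0.0.0/16(rw,sync,no_subtree_check)" | sudo tee -a /etc/exports
sudo exportfs -ra
# on each autoscaled client (needs the nfs-common package)
sudo mount -t nfs nfs-server.internal:/srv/shared /mnt/shared
The server instance then becomes your single point of failure, which is exactly the trade-off mentioned above.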