Mounting an NVMe disk on AWS EC2 - amazon-web-services

So I created an i3.large instance with an NVMe disk on each node; here was my process:
lsblk -> nvme0n1 (check that the NVMe disk isn't mounted yet)
sudo mkfs.ext4 -E nodiscard /dev/nvme0n1
sudo mount -o discard /dev/nvme0n1 /mnt/my-data
Then I added this line to /etc/fstab:
/dev/nvme0n1 /mnt/my-data ext4 defaults,nofail,discard 0 2
sudo mount -a (to check that everything is OK)
sudo reboot
So all of this works, and I can connect back to the instance. I have 500 GiB on my new partition.
But after I stop and restart the EC2 machines, some of them randomly become inaccessible (the AWS console warns that only 1/2 status checks passed).
When I look at the logs to see why an instance is inaccessible, they point to the NVMe partition (but I ran sudo mount -a to check that it was OK, so I don't understand).
I don't have the full AWS logs, but I got some lines from them:
Bad magic number in super-block while trying to open
then the superblock is corrupt, and you might try running e2fsck with an alternate superblock:
/dev/fd/9: line 2: plymouth: command not found

I have been using "c5" type instances for almost a month, mostly "c5d.4xlarge" with NVMe drives. So, here's what has worked for me on Ubuntu instances:
First, find where the NVMe drive is located:
lsblk
Mine always showed up at nvme1n1. Then check whether it is an empty volume without a file system (it usually is, unless you are remounting). For an empty drive the output should be /dev/nvme1n1: data:
sudo file -s /dev/nvme1n1
Then format it (if the last step showed that your drive already has a file system and isn't empty, skip this and go to the next step):
sudo mkfs -t xfs /dev/nvme1n1
Then create a mount point and mount the NVMe drive:
sudo mkdir /data
sudo mount /dev/nvme1n1 /data
You can now verify the mount by running:
df -h

Stopping and starting an instance erases the ephemeral disks, moves the instance to new host hardware, and gives you new empty disks... so the ephemeral disks will always be blank after stop/start. When an instance is stopped, it doesn't exist on any physical host -- the resources are freed.
So the best approach, if you are going to be stopping and starting instances, is not to add them to /etc/fstab but rather to just format them on first boot and mount them after that. One way of testing whether a filesystem is already present is to use the file utility and grep its output; if grep doesn't find a match, it returns a nonzero (false) exit status.
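As a minimal sketch of that idea, assuming the device and mount point from the question (/dev/nvme0n1 and /mnt/my-data), a first-boot script along these lines only formats the disk when file reports no filesystem:
# format only if file -s reports a raw device (output ends in ": data")
if sudo file -s /dev/nvme0n1 | grep -q ': data$'; then
  sudo mkfs.ext4 -E nodiscard /dev/nvme0n1
fi
sudo mkdir -p /mnt/my-data
sudo mount -o discard /dev/nvme0n1 /mnt/my-data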
The NVMe SSD on the i3 instance class is an example of an Instance Store Volume, also known as an Ephemeral [ Disk | Volume | Drive ]. They are physically inside the instance and extremely fast, but not redundant and not intended for persistent data... hence, "ephemeral." Persistent data needs to be on an Elastic Block Store (EBS) volume or an Elastic File System (EFS), both of which survive instance stop/start, hardware failures, and maintenance.
It isn't clear why your instances are failing to boot, but nofail may not be doing what you expect when a volume is present but has no filesystem. My impression has been that eventually it should succeed.
But, you may need to apt-get install linux-aws if running Ubuntu 16.04. Ubuntu 14.04 NVMe support is not really stable and not recommended.
Each of these three storage solutions has its advantages and disadvantages.
The Instance Store is local, so it's quite fast... but, it's ephemeral. It survives hard and soft reboots, but not stop/start cycles. If your instance suffers a hardware failure, or is scheduled for retirement, as eventually happens to all hardware, you will have to stop and start the instance to move it to new hardware. Reserved and dedicated instances don't change ephemeral disk behavior.
EBS is persistent, redundant storage that can be detached from one instance and moved to another (and this happens automatically across a stop/start). EBS supports point-in-time snapshots, and these are incremental at the block level, so you don't pay for storing the data that didn't change across snapshots... but through some excellent witchcraft, you also don't have to keep track of "full" vs. "incremental" snapshots -- the snapshots are only logical containers of pointers to the backed-up data blocks, so they are, in essence, all "full" snapshots, but only billed as incremental. When you delete a snapshot, only the blocks no longer needed to restore either that snapshot or any other snapshot are purged from the back-end storage system (which, transparently to you, actually uses Amazon S3).
EBS volumes are available as both SSD and spinning platter magnetic volumes, again with tradeoffs in cost, performance, and appropriate applications. See EBS Volume Types. EBS volumes mimic ordinary hard drives, except that their capacity can be manually increased on demand (but not decreased), and can be converted from one volume type to another without shutting down the system. EBS does all of the data migration on the fly, with a reduction in performance but no disruption. This is a relatively recent innovation.
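For illustration, a hedged sketch of such an on-the-fly resize with the current AWS CLI (the volume ID is a placeholder, and this assumes an ext4 filesystem written directly to the volume with no partition table):
aws ec2 modify-volume --volume-id <volume-id> --size 200 --volume-type gp2
aws ec2 describe-volumes-modifications --volume-ids <volume-id>   # optional: watch progress
sudo resize2fs /dev/xvdf   # run on the instance once the modification reaches the optimizing/completed state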
EFS uses NFS, so you can mount an EFS filesystem on as many instances as you like, even across availability zones within one region. The size limit for any one file in EFS is 52 terabytes, and your instance will actually report 8 exabytes of free space. The actual free space is for all practical purposes unlimited, but EFS is also the most expensive -- if you did have a 52 TiB file stored there for one month, that storage would cost over $15,000. The most I ever stored was about 20 TiB for 2 weeks, cost me about $5k but if you need the space, the space is there. It's billed hourly, so if you stored the 52 TiB file for just a couple of hours and then deleted it, you'd pay maybe $50. The "Elastic" in EFS refers to the capacity and the price. You don't pre-provision space on EFS. You use what you need and delete what you don't, and the billable size is calculated hourly.
A discussion of storage wouldn't be complete without S3. It's not a filesystem, it's an object store. At about 1/10 the price of EFS, S3 also has effectively infinite capacity, and a maximum object size of 5TB. Some applications would be better designed using S3 objects, instead of files.
S3 can also be easily used by systems outside of AWS, whether in your data center or in another cloud. The other storage technologies are intended for use inside EC2, though there is an undocumented workaround that allows EFS to be used externally or across regions, with proxies and tunnels.

I just had a similar experience! My c5.xlarge instance detects an EBS volume as nvme1n1. I added this line to fstab:
/dev/nvme1n1 /data ext4 discard,defaults,nofail 0 2
After a couple of reboots it looked like it was working, and it kept running for weeks. But today I got an alert that the instance could not be connected to. I tried rebooting it from the AWS console with no luck; it looks like the culprit is the fstab entry: the disk mount is failing.
I raised a ticket with AWS support but have no feedback yet. I had to start a new instance to recover my service.
On another test instance, I tried using the UUID (obtained with the blkid command) instead of /dev/nvme1n1. So far it still seems to work... I'll see if it causes any issues.
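For reference, the UUID approach looks something like this (the UUID is a placeholder; use whatever blkid prints for your volume):
sudo blkid /dev/nvme1n1
# then in /etc/fstab:
UUID=<uuid-from-blkid>  /data  ext4  defaults,nofail  0  2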
I will update here if I get any feedback from AWS support.
================ EDIT with my fix ===========
AWS hasn't given me any feedback yet, but I found the issue. Actually, in fstab, it doesn't matter whether you mount by /dev/nvme1n1 or by UUID. My issue was that my EBS volume had some errors in its file system. I attached it to an instance and then ran
fsck.ext4 /dev/nvme1n1
After it fixed a couple of file system errors, I put the entry back in fstab and rebooted, and there was no problem anymore!

You may find the new EC2 instance family equipped with local NVMe storage useful: C5d.
See announcement blog post: https://aws.amazon.com/blogs/aws/ec2-instance-update-c5-instances-with-local-nvme-storage-c5d/
Some excerpts from the blog post:
You don’t have to specify a block device mapping in your AMI or during the instance launch; the local storage will show up as one or more devices (/dev/nvme*1 on Linux) after the guest operating system has booted.
Other than the addition of local storage, the C5 and C5d share the same specs.
You can use any AMI that includes drivers for the Elastic Network Adapter (ENA) and NVMe
Each local NVMe device is hardware encrypted using the XTS-AES-256 block cipher and a unique key.
Local NVMe devices have the same lifetime as the instance they are attached to and do not stick around after the instance has been stopped or terminated.

Related

Is a volume in AWS like a hard drive that the instance uses?

I am learning about aws and using ec2 instances. I am trying to understand what a volume is.
I have read from the aws site that:
An Amazon EBS volume is a durable, block-level storage device that you can attach to your instances. After you attach a volume to an instance, you can use it as you would use a physical hard drive.
Is it where things are stored when I install things like npm and node? Does it function like the hard drive on my server?
AWS EBS is a block storage volume, and for ease of understanding, yes, you can consider it the same as a hard drive, though with more benefits than a traditional hard drive. A few of them are:
You can increase the size of the storage as your requirements grow (hence the name Elastic); note that volumes can be grown but not shrunk
You can attach multiple EBS volumes to your instance, for example a 20 GB volume1 and a 30 GB volume2
And for the question you asked: yes, you can install npm and node, as the volume is attached to your EC2 instance and your instance can easily use the attached data, modules, etc.
For further explanation you can refer to this user guide from AWS on EBS: https://docs.aws.amazon.com/AWSEC2/latest/UserGuide/ebs-volumes.html
Yes it is exactly like a hard drive on your server and you can have multiple devices.
The cool thing is that you also can expand them if you need extra space.
where things are stored when I install things like npm and node
Yes. Technically, an EBS volume is a virtual storage drive that is connected to your instance over the network (think of a flash drive connected over a network).
Since the network is involved, there will be some latency from transferring data over it.
The data persists even if the instance stops, terminates, or hibernates, or if the underlying hardware fails.
Since it is a network drive, it can be detached and attached to another instance.
In addition, there is another type of storage you will come across, called the instance store.
You can specify instance store volumes for an instance only when you launch it. You can't detach an instance store volume from one instance and attach it to a different instance.
It gives very high IOPS because it is directly (physically) attached to the instance.
The use case for the instance store is data that changes rapidly, such as caches or buffers.
Your data on it will be lost if any of these events happens: the underlying disk drive fails, the instance stops, the instance hibernates, or the instance terminates.
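As a hedged illustration of that launch-time requirement (the AMI ID is a placeholder, and the instance type must be one that actually comes with instance store volumes), a block device mapping with the AWS CLI looks roughly like:
aws ec2 run-instances \
  --image-id <ami-id> \
  --instance-type m3.medium \
  --block-device-mappings '[{"DeviceName":"/dev/sdb","VirtualName":"ephemeral0"}]'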

Google cloud VM reboot and data loss of attached persistent disk

This is the situation:
I attached a disk to a VM. I reboot the VM (for whatever reason). I have to remount the disk, otherwise it is not available (unmounted) after the restart. So I remount the disk with the following command: sudo mount -o discard,defaults /dev/[DEVICE_ID] /mnt/disks/[MNT_DIR]
Does the fact that I have to remount the disk also mean that I have lost all the data inside?
Thanks in advance
The document that you shared with us says:
"If you detach this zonal persistent disk or create a snapshot from the boot disk for this instance, edit the /etc/fstab file and remove the entry for this zonal persistent disk"
Therefore, if you are not creating snapshots from the boot disk, you can reboot your instance without having any issue with your data.
However, if you are using snapshots or scheduled snapshots of your SSD disk, I would recommend following these best practices when creating them:
https://cloud.google.com/compute/docs/disks/snapshot-best-practices
You can also take persistent disk snapshots at any time without unmounting your disk. These recommendations are only for greater reliability and faster snapshot creation (this is also explained in the documentation: https://cloud.google.com/compute/docs/disks/snapshot-best-practices#prepare_for_consistency).
In the document that you linked, there is a description of how to add your mount point to /etc/fstab. Using the command line sudo mount -o ... mounts the disk temporarily, but the mount will be lost across reboots. Editing the /etc/fstab will cause the mount point to persist across reboots because that file is read during startup.
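As a rough sketch of such an /etc/fstab entry (the UUID and [MNT_DIR] are placeholders; get the UUID from blkid, and nofail keeps the VM booting even if the disk is absent):
sudo blkid /dev/[DEVICE_ID]
# then add to /etc/fstab:
UUID=<uuid-from-blkid> /mnt/disks/[MNT_DIR] ext4 discard,defaults,nofail 0 2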
Thanks a lot for your answers. I'm sorry my question was incomplete. I'm a developer and new to sysadmin work.
As you can see here, I added a "zonal persistent disk" (a permanent SSD disk) to my VM on Compute Engine (https://cloud.google.com/compute/docs/disks/add-persistent-disk).
The documentation says that if I have scheduled snapshots, it's not possible to set the disk to mount automatically to my VM after a restart (which I may need for whatever reason). So the question is: how can I be sure that the restart, in addition to unmounting the disk, won't lose the data?
With disk snapshots I would still be able to restore the data, but I would still like to understand what happens in this case. In the meantime, I read your suggestions on Linux mounting and I now understand that I will not lose the data on the disk when the machine restarts.
Thanks

How can one keep the data on a local SSD between stopping and restarting an instance

In my case I need only CPU compute for a while, and then at the end I need GPUs. So I run the instance with only CPUs, then stop and restart it with GPUs added (and CPUs reduced). However, it seems this will lead to the data on the local SSD being erased. Is there any way around that? Could one maybe back it up first, with a snapshot for example, and then restore the data to the local SSD after restarting the instance?
I have not tried out using local SSDs. I want to know what would happen.
Your data may or may not survive a machine restart, depending on how lucky or unlucky you are. Moreover, if your VM crashes (e.g. if the underlying hardware fails) you may also lose the contents of the local SSD at a random time.
I don't think local SSD implements snapshots or any sort of data redundancy functionality. You can however implement your own, e.g. you can partition your SSD using LVM, take LVM snapshots once in a while, and upload them to GCS or store them somewhere else.
In my experience, rebooting is typically fine, while shutting down will always result in data purge.
The easiest way I've found to back up and restore is to copy to/from a persistent disk or Google Cloud Storage. gsutil rsync works well for this. I don't believe snapshots work with local SSDs.
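A minimal sketch of that approach, assuming the local SSD is mounted at /mnt/disks/local-ssd and using a hypothetical bucket name:
# back up before stopping the instance
gsutil -m rsync -r /mnt/disks/local-ssd gs://my-backup-bucket/local-ssd
# restore after the instance is running again
gsutil -m rsync -r gs://my-backup-bucket/local-ssd /mnt/disks/local-ssd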
From google docs:
https://cloud.google.com/compute/docs/disks/local-ssd
Data on local SSDs persists only through the following events:
If you reboot the guest operating system.
If you configure your instance for live migration and the instance goes through a host maintenance event.
If the host system experiences a host error, Compute Engine makes a best effort to reconnect to the VM and preserve the local SSD data, but might not succeed. If the attempt is successful, the VM restarts automatically. However, if the attempt to reconnect fails, the VM restarts without the data. While Compute Engine is recovering your VM and local SSD, which can take up to 60 minutes, the host system and the underlying drive are unresponsive. To configure how your VM instances behave in the event of a host error, see Setting instance availability policies.
Data on Local SSDs does not persist through the following events:
If you shut down the guest operating system and force the instance to stop.
If you configure the instance to be preemptible and the instance goes through the preemption process.
If you configure the instance to stop on host maintenance events and the instance goes through a host maintenance event.
If the host system experiences a host error, and the underlying drive does not recover within 60 minutes, Compute Engine does not attempt to preserve the data on your local SSD. While Compute Engine is recovering your VM and local SSD, which can take up to 60 minutes, the host system and the underlying drive are unresponsive.
If you misconfigure the local SSD so that it becomes unreachable.
If you disable project billing. The instance will stop and your data will be lost.

When I stop and start an EC2 CentOS instance, what data do I lose

I have an EC2 instance that is hosting a CentOS AMI image and the root device is EBS; however, it is not EBS optimized.
I have installed a few packages on it, and now I want to stop and start it again. The Amazon documentation says that the EBS data would be available but the instance store data would be lost.
How do I know where (EBS or instance store) my packages are stored? I see that the package files are in the /opt, /var, and /etc directories.
Will I lose my installed packages if I stop and start the Amazon EC2 instance?
Thanks.
When you create an EBS-backed instance (with or without ephemeral/instance store storage, and it doesn't matter whether it's EBS optimized or not), you don't lose the data in your /opt, /var, or /etc directories or any of the system data. So you are safe to stop and then restart it. Keep in mind that your internal and public IP addresses change once you restart it.
The only data that you lose is on ephemeral volumes, which are generally mounted with device names like /dev/sdb, /dev/xvdb, /dev/xvdc, etc.
If you create an instance-store-"only" instance, then you lose everything. However, you will be able to tell whether your instance is this type because you won't have the option to "stop" it, meaning you can only terminate it. These were the first type of instances that EC2 offered when it started, and up until maybe 3-4 years ago they were the only ones, so they are not used that much AFAIK unless you need an ephemeral volume as your root volume.
[Edit]
This is what it's supposed to look like for an EBS backed instance (non-optimized):
You will not lose your data if the instance is set up as EBS-backed.
EBS optimized is a separate option that provides dedicated throughput between your instance and EBS, useful for busy database applications, etc.
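If you prefer the command line to the console, one way to check (assuming the AWS CLI is configured; the instance ID is a placeholder) is to query the root device type, which comes back as either ebs or instance-store:
aws ec2 describe-instances --instance-ids <instance-id> \
  --query 'Reservations[].Instances[].RootDeviceType' --output text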

Growing Amazon EBS Volume sizes [closed]

Closed. This question is off-topic and is not currently accepting answers. Closed 9 years ago.
I'm quite impressed with Amazon's EC2 and EBS services. I wanted to know if it is possible to grow an EBS Volume.
For example: If I have a 50 GB volume and I start to run out of space, can I bump it up to 100 GB when required?
You can grow the storage, but it can't be done on the fly. You'll need to take a snapshot of the current volume, create a new, larger volume from that snapshot, and re-attach it in place of the old one.
There's a simple walkthrough here based on using Amazon's EC2 command line tools
You can't simply bump up more space on the fly when you need it, but you can resize the partition with a snapshot.
Steps to do this (example commands for the snapshot/volume steps are sketched just after this list):
unmount the ebs volume
create an ebs snapshot
add a new volume with more space
recreate the partition table and resize the filesystem
mount the new ebs volume
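A hedged sketch of the snapshot/volume steps with the modern AWS CLI (all IDs and the zone are placeholders; a later answer uses the older ec2-create-volume tool instead):
aws ec2 create-snapshot --volume-id <old-volume-id> --description "pre-resize"
aws ec2 create-volume --snapshot-id <snapshot-id> --size 100 --availability-zone us-east-1c
aws ec2 detach-volume --volume-id <old-volume-id>
aws ec2 attach-volume --volume-id <new-volume-id> --instance-id <instance-id> --device /dev/xvdf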
Look at http://aws.amazon.com/ebs/ - EBS Snapshot:
Snapshots can also be used to instantiate multiple new volumes, expand the size of a volume or move volumes across Availability Zones. When a new volume is created, there is the option to create it based on an existing Amazon S3 snapshot. In that scenario, the new volume begins as an exact replica of the original volume. By optionally specifying a different volume size or a different Availability Zone, this functionality can be used as a way to increase the size of an existing volume or to create duplicate volumes in new Availability Zones. If you choose to use snapshots to resize your volume, you need to be sure your file system or application supports resizing a device.
I followed all the answers, and with all due respect, each had something missing.
If you follow these steps you can grow your EBS volume and keep your data (this is not for the root volume). For simplicity I am suggesting using the AWS console to create the snapshot, ... you can do that using the AWS command line tools too.
We are not touching the root volume here.
Go to your AWS console:
Shut down your instance (it will only be for a few minutes)
Detach the volume you are planning to grow (say /dev/xvdf)
Create a snapshot of the volume.
Make a new volume with a larger size using the snapshot you just created
Attach the new volume to your instance
Start your instance
SSH to your instance:
$ sudo fdisk -l
This gives your something like:
Disk /dev/xvdf: 21.5 GB, 21474836480 bytes
12 heads, 7 sectors/track, 499321 cylinders, total 41943040 sectors
Units = sectors of 1 * 512 = 512 bytes
Sector size (logical/physical): 512 bytes / 512 bytes
I/O size (minimum/optimal): 512 bytes / 512 bytes
Disk identifier: 0xd3a8abe4
Device Boot Start End Blocks Id System
/dev/xvdf1 2048 41943039 20970496 83 Linux
Write down the Start and Id values (in this case 2048 and 83).
Using fdisk, delete the partition xvdf1 and create a new one that starts at exactly the same sector (2048). We will give it the same Id (83):
$ sudo fdisk /dev/xvdf
Command (m for help): d
Selected partition 1
Command (m for help): n
Partition type:
p primary (0 primary, 0 extended, 4 free)
e extended
Select (default p): p
Partition number (1-4, default 1):
Using default value 1
First sector (2048-41943039, default 2048):
Using default value 2048
Last sector, +sectors or +size{K,M,G} (2048-41943039, default 41943039):
Using default value 41943039
Command (m for help): t
Selected partition 1
Hex code (type L to list codes): 83
Command (m for help): w
The partition table has been altered!
Calling ioctl() to re-read partition table.
Syncing disks.
This step is explained well here: http://litwol.com/content/fdisk-resizegrow-physical-partition-without-losing-data-linodecom
Almost done, we just have to mount the volume and run resize2fs:
Mount the ebs volume: (mine is at /mnt/ebs1)
$ sudo mount /dev/xvdf1 /mnt/ebs1
and resize it:
$ sudo resize2fs -p /dev/xvdf1
resize2fs 1.42 (29-Nov-2011)
Filesystem at /dev/xvdf1 is mounted on /mnt/ebs1; on-line resizing required
old_desc_blocks = 1, new_desc_blocks = 2
Performing an on-line resize of /dev/xvdf1 to 5242624 (4k) blocks.
The filesystem on /dev/xvdf1 is now 5242624 blocks long.
ubuntu@ip-xxxxxxx:~$
Done! Use df -h to verify the new size.
As long as you are okay with a few minutes of downtime, Eric Hammond has written a good article on resizing the root disk on a running EBS instance: http://alestic.com/2010/02/ec2-resize-running-ebs-root
All great recommendations, and I thought I'd add this article I found, which relates to expanding a Windows Amazon EC2 EBS instance using the Amazon Web UI tools to perform the necessary changes. If you're not comfortable using CLI, this will make your upgrade much easier.
http://www.tekgoblin.com/2012/08/27/aws-guides-how-to-resize-a-ec2-windows-ebs-volume/
Thanks to TekGoblin for posting this article.
You can now do this through the AWS Management Console. The process is the same as in the other answers but you no longer need to go to the command line.
BTW: As with physical disks, it might be handy to use LVM; ex:
http://www.davelachapelle.ca/guides/ubuntu-lvm-guide/
http://www.centos.org/docs/5/html/Cluster_Logical_Volume_Manager/
Big advantage: It allows adding (or removing) space dynamically.
It can also easily be moved between/among instances.
Caveats:
it must be configured ahead of time
a simple JBOD setup means you lose everything if you lose one "disk"
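A minimal setup sketch under those caveats, assuming a fresh EBS device at /dev/xvdf and hypothetical volume group / logical volume names:
sudo pvcreate /dev/xvdf
sudo vgcreate data-vg /dev/xvdf
sudo lvcreate -n data-lv -l 100%FREE data-vg
sudo mkfs.ext4 /dev/data-vg/data-lv
sudo mkdir -p /mnt/data
sudo mount /dev/data-vg/data-lv /mnt/data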
My steps:
stop the instance
find the ebs volume attached to the instance and create a snapshot of it
create a new volume with more disk space using the above snapshot. Unfortunately, the UI in the AWS console for this step is almost unusable because it lists all the snapshots on AWS. Using the command line tool is a lot easier, like this:
ec2-create-volume -s 100 --snapshot snap-a31fage -z us-east-1c
detach the existing (smaller) EBS volume from the instance
attach the new (bigger) volume to the instance, and make sure to attach it to the same device the instance is expecting (in my case it is /dev/sda1)
start the instance
You are done!
Other than step 3 above, you can do everything using the AWS Management Console.
Also NOTE, as mentioned here:
https://serverfault.com/questions/365605/how-do-i-access-the-attached-volume-in-amazon-ec2
the device on your EC2 instance might be /dev/xv* while the AWS web console tells you it's /dev/s*.
Use command "diskpart" for Windows OS, have a look here : Use http://support.microsoft.com/kb/300415
Following are the steps I followed for a non-root disk (basic not dynamic disk)
Once you have taken a snapshot, dismounted the old EBS volume (say 600GB) and created a larger EBS volume (say 1TB) and mounted this new EBS volume - you would have to let Windows know of the resizing (from 600GB to 1TB) so at command prompt (run as administrator)
diskpart.exe
select disk=9
select volume=Z
extend
[My disk 9, volume labelled Z, was a 1 TB volume created from an EC2 snapshot of a 600 GB volume. I wanted to resize 600 GB to 1 TB and so could follow the above steps to do this.]
I highly recommend Logical Volume Manager (LVM) for all EBS volumes, if your operating system supports it. Linux distributions generally do. It's great for several reasons.
Resizing and moving of logical volumes can be done live, so instead of the whole offline snapshot thing, which requires downtime, you could just create another, larger EBS volume, add it to the LVM pool as a physical volume (PV), move the logical volume (LV) onto it, remove the old physical volume from the pool, and delete the old EBS volume. Then you simply resize the logical volume and resize the filesystem on it. This requires no downtime at all!
It abstracts your storage from your 'physical' devices. Moving partitions across devices without needing downtime or changes to mountpoints/fstab is very handy.
It would be nice if Amazon would make it possible to resize EBS volumes on-the-fly, but with LVM it's not that necessary.
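A hedged sketch of the live-migration flow described above, assuming the old EBS device is /dev/xvdf, the new larger one is /dev/xvdg, and the volume group / logical volume use the hypothetical names data-vg / data-lv:
sudo pvcreate /dev/xvdg
sudo vgextend data-vg /dev/xvdg
sudo pvmove /dev/xvdf                       # migrate extents off the old PV, online
sudo vgreduce data-vg /dev/xvdf             # then the old EBS volume can be detached/deleted
sudo lvextend -l +100%FREE /dev/data-vg/data-lv
sudo resize2fs /dev/data-vg/data-lv         # online grow for ext filesystems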
If your root volume uses the XFS filesystem, then run this command: xfs_growfs /