A bit of background:
I have an ESXi 5.5 cluster with vCenter HA.
I have multiple iSCSI LUNs hosted on an Ubuntu server running an iSCSI target and software RAID (mdadm).
A few days ago I noticed that a bunch of VMs were inaccessible.
I removed them from the inventory, thinking I'd add them back by browsing the datastore.
The datastore was showing as inactive. The other datastores (on the same server) were fine.
Rescan/refresh didn't work. I removed from the inventory all the VMs hosted on the problem datastore, but still wasn't able to remove the datastore itself.
"HostDatastoreSystem.RemoveDatastore" for object on vCenter Server .
On the ESXi hosts I ran /etc/init.d/storageRM stop, then rescanned and restarted storageRM. This got rid of the datastore in the vCenter console.
I tried removing it from the iSCSI adapter and adding it back; this was fine.
But when I try to add it as a datastore under Configuration/Storage, I get another error: unable to read the partition information for the device.
It's VMFS5 on a mirrored RAID1, 4 TB.
I've logged on to the ESXi shell directly on one of the hosts and used partedUtil to investigate and try to repair it.
I get the following when I run getUsableSectors or getptbl:
Error: The primary GPT table states that the backup GPT is located beyond the end of disk. This may happen if the disk has shrunk or partition table is corrupted. Fix, by writing backup table at the end? This will also fix the last usable sector appropriately as per the new reduced size. diskPath (/dev/disks/t10.94544500000000002318F588822755821C9CFF1605288097) diskSize (7813774720) AlternateLBA (23441323007) LastUsableLBA (23441322974)
Warning: The available space to /dev/disks/t10.94544500000000002318F588822755821C9CFF1605288097 appears to have shrunk. This may happen if the disk size has reduced. The space has been reduced by (15627548288 blocks). You can fix the GPT to correct the available space or continue with the current settings ? This will also move the backup table at the end if it is not at the end already. diskSize (7813774720) AlternateLBA (23441323007) LastUsableLBA (23441322974) NewLastUsableLBA (7813774686)
Error: Can't have a partition outside the disk!
Unable to read partition table for device /vmfs/devices/disks/t10.94544500000000002318F588822755821C9CFF1605288097
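The figures in these messages line up with standard GPT layout arithmetic, assuming 512-byte sectors and the default 128-entry partition array (assumptions, although 7813774720 x 512 bytes is about 4.0 TB, which matches the 4 TB LUN). The Python sketch below recomputes them: the primary header places its backup at LBA 23441323007, i.e. on a device of about 12 TB, whereas the LUN reports about 4 TB, and the difference is exactly the 15627548288 blocks quoted in the warning. The helper names are hypothetical.

```python
# Standard GPT layout, assuming 512-byte sectors and a 128-entry partition array.
SECTOR_BYTES = 512
ENTRY_ARRAY_SECTORS = 32  # 128 entries x 128 bytes = 16 KiB = 32 sectors


def last_usable_lba(disk_sectors: int) -> int:
    """Last LBA a partition may use: the backup header is on the final sector
    and the backup entry array fills the 32 sectors just before it."""
    return (disk_sectors - 1) - 1 - ENTRY_ARRAY_SECTORS


def implied_disk_sectors(alternate_lba: int) -> int:
    """Disk size the primary header assumes, since the backup header
    (AlternateLBA) is supposed to be the very last sector."""
    return alternate_lba + 1


# Values copied from the partedUtil output.
disk_size = 7813774720
alternate_lba = 23441323007

implied = implied_disk_sectors(alternate_lba)
print(f"header assumes {implied} sectors ({implied * SECTOR_BYTES / 1e12:.2f} TB)")
print(f"device reports {disk_size} sectors ({disk_size * SECTOR_BYTES / 1e12:.2f} TB)")
print(f"shrink: {implied - disk_size} sectors")
print(f"last usable LBA, old header: {last_usable_lba(implied)}")
print(f"last usable LBA, real size: {last_usable_lba(disk_size)}")
```

The last two lines reproduce the LastUsableLBA (23441322974) and NewLastUsableLBA (7813774686) values that partedUtil reports.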
Trying to fix it:
partedUtil fixGpt /vmfs/devices/disks/t10.94544500000000002318F588822755821C9CFF1605288097
FixGpt tries to fix any problems detected in GPT table.
Please ensure that you don't run this on any RDM (Raw Device Mapping) disk.
Are you sure you want to continue (Y/N): y
Error: The primary GPT table states that the backup GPT is located beyond the end of disk. This may happen if the disk has shrunk or partition table is corrupted. Fix, by writing backup table at the end? This will also fix the last usable sector appropriately as per the new reduced size. diskPath (/dev/disks/t10.94544500000000002318F588822755821C9CFF1605288097) diskSize (7813774720) AlternateLBA (23441323007) LastUsableLBA (23441322974)
Fix/Ignore/Cancel? fix
Error: Can't have a partition outside the disk!
Unable to read partition table on device /vmfs/devices/disks/t10.94544500000000002318F588822755821C9CFF1605288097
One of the other datastores is identical, with identical disks, so I tried setptbl using the size from that one:
partedUtil setptbl /vmfs/devices/disks/t10.94544500000000002318F588822755821C9CFF1605288097 gpt "1 2048 7813774686 AA31E02A400F11DB9590000C2911D1B8 0"
gpt
0 0 0 0
1 2048 7813774686 AA31E02A400F11DB9590000C2911D1B8 0
Error: The primary GPT table states that the backup GPT is located beyond the end of disk. This may happen if the disk has shrunk or partition table is corrupted. Fix, by writing backup table at the end? This will also fix the last usable sector appropriately as per the new reduced size. diskPath (/dev/disks/t10.94544500000000002318F588822755821C9CFF1605288097) diskSize (7813774720) AlternateLBA (23441323007) LastUsableLBA (23441322974)
Warning: The available space to /dev/disks/t10.94544500000000002318F588822755821C9CFF1605288097 appears to have shrunk. This may happen if the disk size has reduced. The space has been reduced by (15627548288 blocks). You can fix the GPT to correct the available space or continue with the current settings ? This will also move the backup table at the end if it is not at the end already. diskSize (7813774720) AlternateLBA (23441323007) LastUsableLBA (23441322974) NewLastUsableLBA (7813774686)
Error: Can't have a partition outside the disk!
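As a consistency check on the setptbl spec: with 512-byte sectors and a default GPT, the usable area of a 7813774720-sector disk runs from LBA 34 to 7813774686, so the requested partition 1 (2048 to 7813774686) fits, while the output above still quotes the old AlternateLBA (23441323007). A small Python sketch of that check follows; `check_partition_spec` is a hypothetical helper, not part of partedUtil, and the sector size and 128-entry array are assumptions.

```python
def check_partition_spec(spec: str, disk_sectors: int) -> dict:
    """Parse a partedUtil-style "num start end typeGUID attr" string and check
    that it lies inside the usable area of a GPT disk with disk_sectors sectors.
    Assumes 512-byte sectors and a default 128-entry GPT."""
    num, start, end, type_guid, _attr = spec.split()
    start, end = int(start), int(end)
    first_usable = 34                # LBA 0 MBR, LBA 1 header, LBAs 2-33 entries
    last_usable = disk_sectors - 34  # backup entries + backup header at the end
    return {
        "partition": int(num),
        "type_guid": type_guid,
        "sectors": end - start + 1,
        "fits": first_usable <= start <= end <= last_usable,
    }


result = check_partition_spec(
    "1 2048 7813774686 AA31E02A400F11DB9590000C2911D1B8 0", 7813774720)
print(result)
```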
On the iSCSI target host the LUNs show as healthy, and mdstat also shows the RAID and disks as healthy.
Is there anything else I can try to repair this and recover the vm's?
Thanks for helping.
I have set up Apache Geode for caching.
Cluster Configuration:
Locator: 1GB - Mounted volume 2GB
Server1: 1GB - Mounted volume 2GB
Server2: 1GB - Mounted volume 2GB
Region configuration in cache.xml:
<region name="answerCache">
  <region-attributes data-policy="PARTITION_PERSISTENT_OVERFLOW">
    <eviction-attributes>
      <lru-heap-percentage action="overflow-to-disk" />
    </eviction-attributes>
  </region-attributes>
</region>
Geode moves data to disk (based on LRU) when the region fills up.
But I can't find any configuration that lets Geode delete entries from disk when the disk is getting full.
I get an out-of-memory error when the disk gets full.
I want to apply LRU to disk writes as well, so that the least recently used entries can be deleted from disk.
I don't think there's a feature like this built into Apache Geode at the moment and, the way I see it, it wouldn't make much sense to add one either. The overflow feature basically limits the region's size in memory by moving the values of the least recently used (LRU) entries to disk (values only); the keys are kept in memory with a "pointer" to the actual entry on disk, so the values can be recovered whenever needed.
If you want to remove entries from the disk store, you first need to delete them from the actual region in memory (Region.destroy, Region.remove, etc.); Apache Geode will then handle the deletion process and remove the entries from disk as well, automatically.
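To make this concrete, here is a toy model in plain Python of how LRU overflow behaves. It is not Geode's API, and `OverflowRegion` is hypothetical: only the values of the least recently used entries move to "disk", every key stays known to the region, and an entry leaves "disk" only when it is read back or removed from the region itself. Nothing in the model bounds the size of the disk side, which mirrors why overflow alone does not stop a disk store from filling up.

```python
from collections import OrderedDict


class OverflowRegion:
    """Toy LRU-overflow model: at most max_in_memory values stay in memory;
    older values move to a 'disk' dict while their keys remain known."""

    def __init__(self, max_in_memory: int):
        self.max_in_memory = max_in_memory
        self.memory = OrderedDict()  # key -> value, least recently used first
        self.disk = {}               # key -> value overflowed to "disk"

    def put(self, key, value):
        self.disk.pop(key, None)     # a fresh value replaces any overflowed copy
        self.memory[key] = value
        self.memory.move_to_end(key)
        self._evict()

    def get(self, key):
        if key in self.memory:
            self.memory.move_to_end(key)
            return self.memory[key]
        value = self.disk.pop(key)   # fault the value back in from "disk"
        self.memory[key] = value
        self._evict()
        return value

    def remove(self, key):
        # Removing from the region is what removes the on-disk copy.
        self.memory.pop(key, None)
        self.disk.pop(key, None)

    def keys(self):
        return set(self.memory) | set(self.disk)

    def _evict(self):
        while len(self.memory) > self.max_in_memory:
            key, value = self.memory.popitem(last=False)  # least recently used
            self.disk[key] = value
```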
Cheers.
You can use the gfsh disk store commands to manage disk stores. You can even use a GUI (https://github.com/ldom22/GGGUI).
On a physical machine, I can partition a disk with 'fdisk' by following the steps in the link below:
http://puremonkey2010.blogspot.com/2017/01/linux-linux-hard-disk-format-command.html
But on a Google Cloud VM instance, it is not allowed:
Command (m for help): w
The partition table has been altered!
Calling ioctl() to re-read partition table.
WARNING: Re-reading the partition table failed with error 16: Device or resource busy.
The kernel still uses the old table. The new table will be used at the next reboot or after you run partprobe(8) or kpartx(8).
Syncing disks.
So suppose I have the following partitions:
[root#johnwiki Tasks]# lsblk
NAME MAJ:MIN RM SIZE RO TYPE MOUNTPOINT
sda 8:0 0 40G 0 disk
└─sda1 8:1 0 20G 0 part /
How do I create a new partition, sda2, that uses the remaining 20G on sda?
Many thanks!
P.S. I tried looking through Google's documentation here, but found no proper example showing how to do it.
============= Solved =============
It turned out that fdisk works as well. I just needed to reboot the instance for the new partition to take effect; otherwise I couldn't run mkfs on /dev/sda2 (the new partition).
Reference:
- https://blog.gtwang.org/linux/linux-add-format-mount-harddisk/
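Before creating sda2, it can help to confirm how much of sda is still unpartitioned. A minimal Python sketch, assuming the JSON output of util-linux's `lsblk -b -J` (sizes in bytes) is passed in as a string; `unallocated_bytes` is a hypothetical helper, and it ignores alignment and partition-table overhead, so the space usable by a new partition will be slightly smaller.

```python
import json


def unallocated_bytes(lsblk_json: str, disk: str) -> int:
    """Bytes on `disk` not covered by any child partition, computed from the
    output of `lsblk -b -J`. Ignores alignment and partition-table overhead."""
    data = json.loads(lsblk_json)
    for dev in data["blockdevices"]:
        if dev["name"] == disk:
            # Older lsblk versions print sizes as strings, newer ones as numbers.
            used = sum(int(child["size"]) for child in dev.get("children", []))
            return int(dev["size"]) - used
    raise ValueError(f"disk {disk!r} not found")
```

For the listing in the question (a 40G sda with a 20G sda1), this should report about 20 GiB left for the new partition.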
The following documentation describes how to resize the file system and partitions on a persistent disk of a Cloud VM instance:
https://cloud.google.com/compute/docs/disks/add-persistent-disk#resize_partitions
My question is simple:
What happens when I increase the size of the running volume of an EC2 instance?
1) Is all my data wiped?
2) Does the space on my instance also change to the new size?
Actually, my instance has 8GB of storage and it is almost full. I want to increase the space so that I can save more files on my instance.
I have found this option in my console.
I have found the attached EC2 volume. Will modifying the volume size directly be reflected automatically in my instance's space after a reboot?
I know this is quite simple; I am just worried about my existing data.
Thank you for your help!
Assuming you have found the option in the console to modify the size of the volume, and the instance here is a Linux instance, the other answer forgets to mention one important thing. According to the AWS documentation:
Modifying volume size has no practical effect until you also extend the volume's file system to make use of the new storage capacity. For more information, see Extending a Linux File System after Resizing the Volume.
For ext2, ext3, and ext4 file systems, this command is resize2fs. For XFS file systems, this command is xfs_growfs.
Note:
If the volume you are extending has been partitioned, you need to increase the size of the partition before you can resize the file system.
To check if your volume partition needs resizing:
Use the lsblk command to list the block devices attached to your instance. The example below shows three volumes: /dev/xvda, /dev/xvdb, and /dev/xvdf.
If the partition occupies all of the room on the device, it does not need resizing.
However, if /dev/xvdf1 is an 8-GiB partition on a 35-GiB device and there are no other partitions on the volume, the partition must be resized in order to use the remaining space on the volume.
To extend a Linux file system:
Log in to the instance via SSH.
Use the df -h command to report the existing disk space usage on the file system.
Expand the modified partition using growpart (and note the unusual syntax of separating the device name from the partition number):
sudo growpart /dev/xvdf 1
Then use a file system-specific command (resize2fs or xfs_growfs) to resize each file system to the new volume capacity.
Finally, use the df -h command again to confirm the new file system disk space usage.
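The steps above can be sketched as a small Python helper that only builds the command lines and runs nothing. `extend_commands` is hypothetical; it assumes xvd-style device names where partition N of /dev/xvdf is /dev/xvdfN (NVMe names such as /dev/nvme0n1p1 would need different handling), and /data is a made-up mount point. The tool choice follows the note above: resize2fs for ext2/3/4, which takes the partition device, and xfs_growfs for XFS, which takes the mount point.

```python
import shlex

# Resize tool per file system type, as listed in the AWS note above.
RESIZE_TOOLS = {"ext2": "resize2fs", "ext3": "resize2fs", "ext4": "resize2fs", "xfs": "xfs_growfs"}


def extend_commands(device, partition_number, fstype, mountpoint, partitioned=True):
    """Return argv lists for growing a partition and its file system.
    Assumes xvd-style names, where partition N of /dev/xvdf is /dev/xvdfN."""
    tool = RESIZE_TOOLS.get(fstype)
    if tool is None:
        raise ValueError(f"no resize tool known for {fstype!r}")
    cmds = [["df", "-h"]]  # usage before
    target = f"{device}{partition_number}" if partitioned else device
    if partitioned:
        # growpart takes the device and the partition number as separate arguments.
        cmds.append(["sudo", "growpart", device, str(partition_number)])
    if tool == "resize2fs":
        cmds.append(["sudo", "resize2fs", target])             # ext*: pass the device
    else:
        cmds.append(["sudo", "xfs_growfs", "-d", mountpoint])  # XFS: pass the mount point
    cmds.append(["df", "-h"])  # usage after
    return cmds


if __name__ == "__main__":
    for argv in extend_commands("/dev/xvdf", 1, "ext4", "/data"):  # /data is hypothetical
        print(shlex.join(argv))
```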
Note: It is recommended to take a snapshot of the EBS volume before making any changes.
Please refer to this AWS documentation.
Well, you can just modify the volume directly; this will not affect any files. It will take around a minute or so to upgrade the size, or you might want to restart your instance.
To ensure data safety, you can create a snapshot of that volume, create a new volume of whatever size you want from that snapshot, and then delete the old volume.
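If you want to script the safer path (snapshot first, then modify the volume in place), a minimal boto3 sketch follows. It assumes an EC2 client created with `boto3.client("ec2")`; `snapshot_then_grow`, the volume ID, and the size are hypothetical. Growing the volume this way keeps its data, but as the other answer points out, the partition and file system still have to be extended afterwards.

```python
def snapshot_then_grow(ec2, volume_id: str, new_size_gib: int) -> str:
    """Snapshot an EBS volume, wait until the snapshot completes, then grow the
    volume in place. `ec2` is a boto3 EC2 client. Returns the snapshot ID."""
    snap = ec2.create_snapshot(VolumeId=volume_id,
                               Description=f"before resizing {volume_id}")
    snapshot_id = snap["SnapshotId"]
    # Wait so that a finished backup exists before anything is changed.
    ec2.get_waiter("snapshot_completed").wait(SnapshotIds=[snapshot_id])
    ec2.modify_volume(VolumeId=volume_id, Size=new_size_gib)
    return snapshot_id
```

Usage, with a hypothetical volume ID: `snapshot_then_grow(boto3.client("ec2"), "vol-0123456789abcdef0", 20)`.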
I had one disk attached to an instance, and I had taken a snapshot of it.
Now, after a few days, the disk went bad and I want to restore it.
What I have implemented is:
Store the snapshot's metadata when it is taken.
When a restore request comes in, create a new disk from the snapshot.
Detach the original disk (say it is attached inside the host as /dev/sdz).
Attach the newly created disk to the same instance.
This way, the user sees the disk as having been restored from the snapshot they had taken.
The problem I see with this approach is:
since the original disk was attached as /dev/sdz, after detaching it and attaching the NEW disk, the new disk should be seen as /dev/sdz ONLY; otherwise the application or upper layers may break.
So, do the Google Cloud APIs provide any way to handle this?
Please note: I'm using the google-api-python-client library, and my code is in Python.
I believe the name you are referring to is the "index" of the disk, though I am not sure of that. If that is the case, you would just need to make sure the index of the new disk matches the index of the disk you removed.
That being said, there are better ways to do this if you can modify your fstab. For example, you can use the "deviceName" by mounting /dev/disk/by-id/whatever, in which case you would just need to make sure that the new disk has the same deviceName as the old disk.
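Following the deviceName suggestion, here is a minimal sketch with google-api-python-client of the restore flow from the question (create a disk from the snapshot, detach the old disk, attach the new one) that reuses the old disk's deviceName. It assumes a client from `googleapiclient.discovery.build("compute", "v1")`; `restore_requests`, `run`, and every name passed in are hypothetical. It does not pin the kernel name /dev/sdz; it keeps the deviceName stable so that a /dev/disk/by-id mount, as described above, can keep pointing at the same name. You will probably want to unmount the old disk inside the guest before detaching it.

```python
import time


def restore_requests(project, zone, instance, device_name, snapshot, new_disk):
    """Build (method, kwargs) pairs for: create disk from snapshot, detach the
    old disk, attach the new disk under the same deviceName."""
    new_disk_url = f"projects/{project}/zones/{zone}/disks/{new_disk}"
    return [
        ("disks.insert", {"project": project, "zone": zone, "body": {
            "name": new_disk,
            "sourceSnapshot": f"projects/{project}/global/snapshots/{snapshot}",
        }}),
        ("instances.detachDisk", {"project": project, "zone": zone,
                                  "instance": instance, "deviceName": device_name}),
        ("instances.attachDisk", {"project": project, "zone": zone, "instance": instance,
                                  "body": {"source": new_disk_url, "deviceName": device_name}}),
    ]


def run(compute, requests, poll_seconds=2):
    """Execute the requests in order with a Compute Engine v1 client and wait for
    each zonal operation to finish before starting the next one."""
    for method, kwargs in requests:
        collection, name = method.split(".")
        op = getattr(getattr(compute, collection)(), name)(**kwargs).execute()
        while op["status"] != "DONE":
            time.sleep(poll_seconds)
            op = compute.zoneOperations().get(
                project=kwargs["project"], zone=kwargs["zone"], operation=op["name"]
            ).execute()
        if "error" in op:
            raise RuntimeError(f"{method} failed: {op['error']}")
```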
Another option is to mount using the UUID of the file system. Since these new disks are created from snapshots of the old disk, they will have the same UUID.
ls -l /dev/disk/by-uuid/
That should not change unless you reformat the partition entirely. In your fstab, instead of /dev/sdz1, you would use UUID=ef7481ea-a6f9-425b-940f-56e9c93492dd, or whatever your UUID is.
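For the UUID approach, a small Python sketch that finds the UUID of a device by resolving the symlinks in /dev/disk/by-uuid (the same entries `ls -l /dev/disk/by-uuid/` lists) and builds the matching fstab line. Both helpers are hypothetical; ext4 and the /data mount point are assumptions, and the directory is a parameter so the lookup can be tried against a scratch directory.

```python
import os


def uuid_for_device(device, by_uuid_dir="/dev/disk/by-uuid"):
    """Return the file system UUID whose by-uuid symlink resolves to `device`."""
    target = os.path.realpath(device)
    for name in os.listdir(by_uuid_dir):
        if os.path.realpath(os.path.join(by_uuid_dir, name)) == target:
            return name
    raise LookupError(f"no UUID symlink points at {device}")


def fstab_line(uuid, mountpoint, fstype="ext4", options="defaults", dump=0, passno=2):
    """Build an fstab entry that mounts by UUID instead of /dev/sdz1."""
    return f"UUID={uuid}\t{mountpoint}\t{fstype}\t{options}\t{dump}\t{passno}"
```

On the instance, `fstab_line(uuid_for_device("/dev/sdz1"), "/data")` would produce the line to put in /etc/fstab.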
I've tried the default Data Pipeline setup for cross-regional table copy, copying one table to another in the same region (eu-west-1).
On pipeline activation, an EMR cluster is launched, runs for approximately 20 minutes, and is then terminated with the pipeline in the "success" state.
The problem is that only 389 entries are copied from my table (the number is the same across multiple runs). The total number of entries is close to 100000.
I've tried turning on logs (no errors there) and walking through them, launching a 4.1.0 cluster, increasing throughput, etc.; nothing solves it.
Does cross-regional table copy work? What could be the problem? Why is there no error? How do I debug it?
Datapipeline config: https://gist.github.com/mariusgrigaitis/adceb18354b52d845278
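One way to debug is to count the items on both sides independently of the pipeline. Assuming these are DynamoDB tables (Data Pipeline's cross-region table copy template works on DynamoDB), here is a minimal boto3 sketch that counts items with a paginated Scan using Select="COUNT"; `count_items` and the table names are hypothetical. A full scan consumes read capacity on the table; DescribeTable's ItemCount is cheaper but is only refreshed roughly every six hours.

```python
def count_items(dynamodb, table_name: str) -> int:
    """Count items with a paginated Scan that returns only counts.
    `dynamodb` is a boto3 DynamoDB client."""
    paginator = dynamodb.get_paginator("scan")
    total = 0
    for page in paginator.paginate(TableName=table_name, Select="COUNT"):
        total += page["Count"]  # items matched on this page
    return total
```

Usage, with hypothetical table names: create `boto3.client("dynamodb", region_name="eu-west-1")` and compare `count_items(client, "source-table")` with `count_items(client, "destination-table")`.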