Disk cleanup in Apache Geode - disk space

I have set up Apache Geode for caching.
Cluster configuration:
Locator: 1GB - mounted volume 2GB
Server1: 1GB - mounted volume 2GB
Server2: 1GB - mounted volume 2GB
Region configuration in cache.xml:
<region name="answerCache">
  <region-attributes data-policy="PARTITION_PERSISTENT_OVERFLOW">
    <eviction-attributes>
      <lru-heap-percentage action="overflow-to-disk" />
    </eviction-attributes>
  </region-attributes>
</region>
Geode overflows data to disk (based on LRU) when the region fills up.
But I can't find any configuration that lets Geode delete entries from disk when the disk itself is filling up.
I get an out-of-memory error when the disk gets full.
I want to apply LRU to disk writes as well, so that the least recently used entries are deleted from disk.

I don't think there's a feature like this in Apache Geode at the moment and, as I see it, it wouldn't make much sense to add one. The overflow feature limits the region's size in memory by moving the values of the least recently used (LRU) entries to disk (values only); the keys stay in memory with a "pointer" to the actual entry on disk, so the values can be recovered whenever needed.
If you want to remove entries from the disk store, you first need to delete them from the actual Region in memory (Region.destroy, Region.remove, etc.); Apache Geode will then remove the entry from disk as well, automatically.
Cheers.

You can use GFSH disk store commands to manage disk stores. You can even use a GUI (https://github.com/ldom22/GGGUI)

AWS S3 Block Size to calculate total number of mappers for Hive Workload

Does S3 store data in the form of blocks? If yes, what is the default block size? Is there a way to alter the block size?
Block Size is not applicable to Amazon S3. It is an object storage system, not a virtual disk.
There is believed to be some partitioning of uploaded data into the specific blocks it was uploaded in, and if you knew those values then readers might get more bandwidth. But the open source Hive/Spark/MapReduce applications certainly don't know the API calls to find this information or look at these details. Instead, the S3 connector takes a configuration option (for s3a: fs.s3a.block.size) to simulate blocks.
It wouldn't be very beneficial to work out that block size if it took an HTTP GET request against each file to determine the partitioning: that would slow down the (sequential) query planning before tasks on split files are farmed out to the worker nodes. HDFS lets you get the listing and the block partitioning + location in one API call (listLocatedStatus(path)); S3 only has a list call that returns the list of (objects, timestamps, etags) under a prefix (S3 List API v2), so that extra check would slow things down. If someone could fetch that data and show that there'd be benefits, maybe it'd be useful enough to implement. For now, calls to S3AFileSystem.listLocatedStatus() against S3 just get a made-up list of locations, splitting the object into blocks by the fs.s3a.block.size value and using (localhost) as the location. All the apps know that location == localhost means "whatever".
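To make the simulated blocks concrete, here is a minimal sketch of how an object size turns into a split count when the connector reports a fixed block size. It assumes one split per simulated block (plain ceiling division) and ignores any minimum/maximum split-size settings or slop factor that a real input format may apply. The function name and the 128 MiB value are purely illustrative and are not defaults taken from this thread.

```python
def estimate_splits(object_size_bytes: int, block_size_bytes: int) -> int:
    """Rough split count, assuming one split per simulated block."""
    if block_size_bytes <= 0:
        raise ValueError("block size must be positive")
    if object_size_bytes < 0:
        raise ValueError("object size must not be negative")
    # Integer ceiling division: a partial trailing block still needs a split.
    return (object_size_bytes + block_size_bytes - 1) // block_size_bytes


if __name__ == "__main__":
    one_gib = 1024 ** 3
    simulated_block = 128 * 1024 ** 2  # hypothetical fs.s3a.block.size value
    print(estimate_splits(one_gib, simulated_block))  # 8
```

Under these assumptions, raising the simulated block size lowers the number of splits (and so mappers) per object, and lowering it does the opposite.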

Best way to retire an index

I am retiring an old Elasticsearch index in AWS that has not received a new document since 2016. However, something is still trying to search it.
I still want to deprecate this index in a manner where I can get back to the original state quickly. I have created a manual snapshot of the index and it is sitting in S3. I was planning on deleting the domain, but, from what I understand, that deletes everything billable under AWS, including the endpoint. As I mentioned above, I want to be able to get back to the original state of the index. This domain contains a series of indexes. The largest index is 20.5 GB. I was going to delete the large index and resize the cluster to a smaller instance size and footprint. Will this work or will it be unsearchable?
I've no experience using Elasticsearch on AWS, but I have an idea about your index.
You say the index has received no new documents for a long time. If this also means no deletions and no updates, you could theoretically just take this index to a new cluster, using either snapshot + restore, or a cross-cluster reindex. Continue operating your old cluster until you're sure the new one is working well.
Again - not familiar with AWS terminology, but it sounds like this approach translates to using separate "domains". First you fully ensure the new "domain" is working with the right hardware spec and data, and then delete the old "domain".
TL;DR -> yes!
The backup to S3 will work, but the documents will be unsearchable because in order to downsize the storage you have to delete the index.
But if someday you want to restore the data from S3 back to the index, you can.
You can resize instances and storage with no downtime; however, that takes a long time and you pay extra for the machines while they are resizing.
Example:
You change your storage size from 100 GB to 99 GB.
The Elasticsearch service will spin up another instance, copy all your data from the old instance to the new one, and then delete the old one.
The same goes for instance sizes:
machine up, cluster sync, machine down.
While they are syncing, you pay for them.
Your plan will work; Elasticsearch is very flexible.
If you really don't trust AWS, just make a JSON export of the index and keep it on S3 too, in case things go south.

What happens when I increase the size of a running EC2 instance's volume

My question is simple:
What happens when I increase the size of a running EC2 instance's volume?
1) Will all my data be wiped?
2) Will the space available on my instance also change to the new size?
My instance has 8GB of storage and it is almost full. I want to increase the space so that I can save more files on my instance.
I have found this option in my console.
I have found the connected EC2 volume. Will directly modifying the volume size automatically be reflected in my instance's space after a reboot?
I know this is quite simple. I am just worried about my existing data.
Thank you for your help!
Assuming you have found the option in the console to modify the size of the volume, and that the instance is a Linux instance: the other answer forgets to mention an important point from the AWS documentation:
Modifying volume size has no practical effect until you also extend
the volume's file system to make use of the new storage capacity. For
more information, see Extending a Linux File System after Resizing the
Volume.
For ext2, ext3, and ext4 file systems, this command is resize2fs. For XFS file systems, this command is xfs_growfs.
Note:
If the volume you are extending has been partitioned, you need to increase the size of the partition before you can resize the file system
To check if your volume partition needs resizing:
Use the lsblk command to list the block devices attached to your instance. The example below shows three volumes: /dev/xvda, /dev/xvdb, and /dev/xvdf.
If the partition occupies all of the room on the device, it does not need resizing.
However, suppose /dev/xvdf1 is an 8-GiB partition on a 35-GiB device and there are no other partitions on the volume. In this case, the partition must be resized in order to use the remaining space on the volume.
To extend a Linux file system:
Log in to the instance via SSH.
Use the df -h command to report the existing disk space usage on the file system.
Expand the modified partition using growpart (note the unusual syntax of separating the device name from the partition number):
sudo growpart /dev/xvdf 1
Then use a file system-specific command to resize each file system to the new volume capacity.
Finally, use the df -h command again to report the file system disk space usage and confirm the new size.
Note: It is recommended to take a snapshot of the EBS volume before making any changes.
Please refer to the AWS documentation.
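The steps above can be sketched as a small helper that only builds the command list, so you can review it before running anything. This is a sketch under assumptions: the helper name, the mount point argument, and the `/dev/xvdf` + partition-number naming are illustrative (NVMe devices name partitions differently, e.g. with a `p` before the number), and `xfs_growfs` is given the mount point rather than the device.

```python
from typing import List


def extend_commands(device: str, partition: int, fs_type: str,
                    mount_point: str) -> List[List[str]]:
    """Build (but do not run) the commands for growing a partition and its file system."""
    # Assumes xvd-style naming, where partition 1 of /dev/xvdf is /dev/xvdf1.
    partition_device = f"{device}{partition}"
    commands = [
        ["df", "-h"],                                    # usage before the change
        ["sudo", "growpart", device, str(partition)],    # device and number are separate args
    ]
    if fs_type in ("ext2", "ext3", "ext4"):
        commands.append(["sudo", "resize2fs", partition_device])
    elif fs_type == "xfs":
        commands.append(["sudo", "xfs_growfs", "-d", mount_point])
    else:
        raise ValueError(f"no resize command known for file system {fs_type!r}")
    commands.append(["df", "-h"])                        # usage after the change
    return commands


if __name__ == "__main__":
    for cmd in extend_commands("/dev/xvdf", 1, "ext4", "/data"):
        print(" ".join(cmd))
```

Keeping the helper free of side effects is deliberate: take the snapshot first, read the printed commands, and only then run them by hand or via `subprocess.run`.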
You can modify the volume directly and this will not affect any files. The resize takes around a minute or so, and you may need to restart your instance.
To keep your data safe, you can create a snapshot of the volume, create a new volume of whatever size you want from that snapshot, and then delete the old volume, which by then only contains the old data.

Attaching the disk with same device-path or UUID

I had a disk attached to an instance and I had taken a snapshot of it.
Now, after a few days, the disk went bad and I want to restore it.
What I have implemented is:
Store the snapshot's metadata when the snapshot is taken
When a restore request comes in, create a new disk from the snapshot
Detach the original disk (say, attached inside the host as /dev/sdz)
Attach the newly created disk to the same instance
This way, the user gets the impression that the disk has been restored from the snapshot he had taken.
Now, the problem I'm seeing with this approach is:
the original disk was attached as /dev/sdz, so after detaching it and attaching the NEW disk, the new disk must also show up as /dev/sdz;
otherwise the application or upper layers may break.
So, do the Google Cloud APIs provide any way to handle this?
PLEASE NOTE: I'm using the google-api-python-client library and the code is in Python.
I believe the name you are referring to is the "index" of the disk. I am not sure of that however. If that is the case, you would just need to make sure the index of the new disk matches the index of the disk you remove.
That being said, there are better ways to do this if you can modify your fstab. For example, you can use the "deviceName" by mounting /dev/disk/by-id/whatever in which case you would just need to make sure that the new disk has the same deviceName as the old disk.
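As a rough sketch of the deviceName idea with google-api-python-client: the helper below only builds the request body for the Compute Engine `instances.attachDisk` call, reusing the old disk's device name for the restored disk. The project, zone, disk, and device names are hypothetical placeholders, and the actual API call is left commented out because it needs credentials and a real instance. Check the `source` and `deviceName` fields against the Compute Engine API reference before relying on this.

```python
def build_attach_body(project: str, zone: str, disk_name: str,
                      device_name: str) -> dict:
    """Request body for attaching a disk under a chosen device name."""
    return {
        # Partial URL of the disk created from the snapshot.
        "source": f"projects/{project}/zones/{zone}/disks/{disk_name}",
        # Reuse the device name the old disk was attached with.
        "deviceName": device_name,
    }


if __name__ == "__main__":
    body = build_attach_body("my-project", "us-central1-a",
                             "restored-disk", "data-disk")  # hypothetical names
    print(body)
    # With credentials configured, the attach call would look roughly like:
    # from googleapiclient import discovery
    # compute = discovery.build("compute", "v1")
    # compute.instances().attachDisk(
    #     project="my-project", zone="us-central1-a",
    #     instance="my-instance", body=body).execute()
```

If the guest mounts the disk through its /dev/disk/by-id/ entry, keeping the device name stable should mean fstab does not need to change, whatever /dev/sdX letter the kernel assigns.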
Another option is to mount by the UUID of the filesystem. Since these new disks are created from snapshots of the old disk, they will have the same filesystem UUID.
ls -l /dev/disk/by-uuid/
That should not change unless you reformat the partition entirely. In your fstab, instead of /dev/sdz1, you would use UUID=ef7481ea-a6f9-425b-940f-56e9c93492dd or whatever.
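A minimal sketch of that fstab change, done as a pure string transformation so it can be tried on a copy of the file first. The helper name, the /data mount point, and the ext4 options in the example are hypothetical; only the UUID value and /dev/sdz1 come from the answer above.

```python
def use_uuid_in_fstab_line(line: str, device: str, uuid: str) -> str:
    """Replace a device path in the first fstab field with UUID=<uuid>."""
    stripped = line.lstrip()
    if not stripped or stripped.startswith("#"):
        return line  # leave blank lines and comments untouched
    fields = stripped.split()
    if fields[0] != device:
        return line  # entry is for some other device
    # Only the first occurrence (the device field) is rewritten.
    return line.replace(device, f"UUID={uuid}", 1)


if __name__ == "__main__":
    old = "/dev/sdz1 /data ext4 defaults 0 2"  # hypothetical fstab entry
    print(use_uuid_in_fstab_line(
        old, "/dev/sdz1", "ef7481ea-a6f9-425b-940f-56e9c93492dd"))
```

The comparison is against the whole first field, so an entry for /dev/sdz10 is not rewritten when you ask for /dev/sdz1.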

VMware ESXi 5.5 iSCSI GPT repair

A bit of background:
I have an ESXi 5.5 cluster with vCenter HA.
I have multiple iSCSI LUNs hosted on an Ubuntu server running an iSCSI target and software RAID (mdadm).
A few days ago I noticed a bunch of VMs were inaccessible.
I removed them from inventory, thinking I'd add them back by browsing the datastore.
The datastore was showing as inactive. The other datastores (same server) were fine.
Rescan/refresh didn't work. I removed all the VMs hosted on the problem datastore from inventory, but still wasn't able to remove the datastore itself:
"HostDatastoreSystem.RemoveDatastore" for object on vCenter Server .
On the ESXi hosts I ran /etc/init.d/storageRM stop, then rescanned and restarted storageRM. This removed the datastore from the vCenter console.
I tried removing it and adding it back via the iSCSI adapter; this was fine.
But when I try to add it as a datastore under Configuration/Storage, I get another error: unable to read the partition information for the device.
It's VMFS5 on mirrored RAID1, 4 TB.
I've logged on to the ESXi shell directly on one of the hosts and used partedUtil to investigate and try to repair it.
I get the following if I try getUsableSectors or getptbl:
Error: The primary GPT table states that the backup GPT is located beyond the end of disk. This may happen if the disk has shrunk or partition table is corrupted. Fix, by writing backup table at the end? This will also fix the last usable sector appropriately as per the new reduced size. diskPath (/dev/disks/t10.94544500000000002318F588822755821C9CFF1605288097) diskSize (7813774720) AlternateLBA (23441323007) LastUsableLBA (23441322974)
Warning: The available space to /dev/disks/t10.94544500000000002318F588822755821C9CFF1605288097 appears to have shrunk. This may happen if the disk size has reduced. The space has been reduced by (15627548288 blocks). You can fix the GPT to correct the available space or continue with the current settings ? This will also move the backup table at the end if it is not at the end already. diskSize (7813774720) AlternateLBA (23441323007) LastUsableLBA (23441322974) NewLastUsableLBA (7813774686)
Error: Can't have a partition outside the disk!
Unable to read partition table for device /vmfs/devices/disks/t10.94544500000000002318F588822755821C9CFF1605288097
Trying to fix it:
partedUtil fixGpt /vmfs/devices/disks/t10.94544500000000002318F588822755821C9CFF1605288097
FixGpt tries to fix any problems detected in GPT table.
Please ensure that you don't run this on any RDM (Raw Device Mapping) disk.
Are you sure you want to continue (Y/N): y
Error: The primary GPT table states that the backup GPT is located beyond the end of disk. This may happen if the disk has shrunk or partition table is corrupted. Fix, by writing backup table at the end? This will also fix the last usable sector appropriately as per the new reduced size. diskPath (/dev/disks/t10.94544500000000002318F588822755821C9CFF1605288097) diskSize (7813774720) AlternateLBA (23441323007) LastUsableLBA (23441322974)
Fix/Ignore/Cancel? fix
Error: Can't have a partition outside the disk!
Unable to read partition table on device /vmfs/devices/disks/t10.94544500000000002318F588822755821C9CFF1605288097
One of the other datastores is identical, with identical disks, so I tried setptbl using the size from that one:
partedUtil setptbl /vmfs/devices/disks/t10.94544500000000002318F588822755821C9CFF1605288097 gpt "1 2048 7813774686 AA31E02A400F11DB9590000C2911D1B8 0"
gpt
0 0 0 0
1 2048 7813774686 AA31E02A400F11DB9590000C2911D1B8 0
Error: The primary GPT table states that the backup GPT is located beyond the end of disk. This may happen if the disk has shrunk or partition table is corrupted. Fix, by writing backup table at the end? This will also fix the last usable sector appropriately as per the new reduced size. diskPath (/dev/disks/t10.94544500000000002318F588822755821C9CFF1605288097) diskSize (7813774720) AlternateLBA (23441323007) LastUsableLBA (23441322974)
Warning: The available space to /dev/disks/t10.94544500000000002318F588822755821C9CFF1605288097 appears to have shrunk. This may happen if the disk size has reduced. The space has been reduced by (15627548288 blocks). You can fix the GPT to correct the available space or continue with the current settings ? This will also move the backup table at the end if it is not at the end already. diskSize (7813774720) AlternateLBA (23441323007) LastUsableLBA (23441322974) NewLastUsableLBA (7813774686)
Error: Can't have a partition outside the disk!
On the iSCSI target host the LUNs show as healthy. mdstat also shows a healthy RAID and healthy disks.
Is there anything else I can try to repair this and recover the VMs?
Thanks for helping.
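The numbers in the partedUtil output above can be cross-checked with a little arithmetic. The sketch below does not repair anything; it only recomputes what the messages report. It assumes 512-byte sectors and a standard GPT layout with 33 sectors reserved after the last usable one (32 for the backup entry array plus 1 for the backup header). Neither assumption is confirmed in the thread.

```python
SECTOR_BYTES = 512          # assumption: 512-byte logical sectors
GPT_TRAILER_SECTORS = 33    # assumption: 32 entry-array sectors + 1 backup header

# Values copied from the partedUtil messages above.
disk_size = 7813774720        # sectors the device reports now
alternate_lba = 23441323007   # where the primary GPT says the backup header lives
last_usable_lba = 23441322974


def gpt_summary(disk_sectors: int, backup_lba: int) -> dict:
    """Compare the size the GPT was written for with the current device size."""
    implied_sectors = backup_lba + 1  # the backup header sits on the last sector
    return {
        "implied_sectors": implied_sectors,
        "shrunk_by": implied_sectors - disk_sectors,
        "new_last_usable": disk_sectors - 1 - GPT_TRAILER_SECTORS,
        "implied_tb": implied_sectors * SECTOR_BYTES / 1e12,
        "current_tb": disk_sectors * SECTOR_BYTES / 1e12,
    }


if __name__ == "__main__":
    for key, value in gpt_summary(disk_size, alternate_lba).items():
        print(key, value)
```

Under those assumptions the shrink comes out at 15627548288 sectors and the new last usable LBA at 7813774686, matching the warning. The existing GPT was written for a device of roughly 12 TB, while the LUN currently presents roughly 4 TB, which would be consistent with the "disk has shrunk" message.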