I am using VirtualBox and maintaining a regular backup by taking snapshots and storing them on an external hard disk. Now the system on which my VirtualBox was installed has crashed. How can I recover my last work from the snapshots that I have stored on the external hard disk?
Snapshots are essentially "diff files", meaning that each file documents the changes between sessions (or within a session).
You can't apply a diff to a non-existent base.
Example:
Let's look at the following set of commands:
Pick a number
Add 3
Subtract 4
Multiply by 2
Now the outcome would change according to the first number you picked, so if the base is unknown, the set of "diffs" doesn't really help.
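A tiny illustration of this in Python (the starting values are made up): the same sequence of operations applied to two different bases produces two different results.
# Same "diff" (add 3, subtract 4, multiply by 2) applied to two different bases.
def apply_diff(base):
    return (base + 3 - 4) * 2

print(apply_diff(5))   # 8
print(apply_diff(10))  # 18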
Try getting your hands on the vmdk/vhd/vdi file again; this might do the trick more easily.
Kind regards,
Yaron Shahrabani.
The way I did it is to copy the snapshot to the default Snapshots folder of that VM, e.g. for my Windows 2000 VM the "Snapshots" folder is /home/mian/VirtualBox VMs/Windows_Professional_2000_sp4_x86/Snapshots
Once copied, run this command:
vboxmanage snapshot Windows_professional_2000_sp4_x86 restore Snapshot5
If the name contains a space, e.g. Snapshot 5, then run this command:
vboxmanage snapshot Windows_professional_2000_sp4_x86 restore Snapshot\ 5
This is for Linux, but it is almost the same for Windows too, e.g. changing vboxmanage to vboxmanage.exe.
So I'm running into the following error on AWS SageMaker when trying to save:
Unexpected error while saving file: untitled.ipynb [Errno 28] No space left on device
If I remove my notebook, create a new identical one and run it, everything works fine. However, I'm suspecting the Jupyter checkpoint takes up too much space if I save the notebook while it's running and therefore I'm running out of space. Sadly, getting more storage is not an option for me, so I'm wondering if there's any command I can use to clear the storage before running my notebook?
More specifically, clearing the persistent storage in the beginning and at the end of the training process.
I have googled like a maniac but there is no suggestion aside from "just increase the amount of storage bro" and that's why I'm asking the question here.
Thanks in advance!
If you don't want your data to be persistent across multiple notebook runs, just store it in /tmp, which is not persistent. You have at least 10 GB there.
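A minimal Python sketch of that idea, assuming your notebook can write anywhere on the instance (the directory name below is a placeholder): put large intermediate files under /tmp and clean them up at the end of the run.
import os
import shutil

scratch = "/tmp/training_scratch"   # placeholder; /tmp is not persisted across restarts
os.makedirs(scratch, exist_ok=True)

# ... write checkpoints / intermediate artifacts under `scratch` ...

# Optionally free the space explicitly at the end of the run.
shutil.rmtree(scratch, ignore_errors=True)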
I had the exact same problem, and was unable to find a decent answer to it online. However, I was fortunately able to resolve the issue.
I use an R kernel, so the solution might be slightly different.
You can check the storage by going into the terminal and typing df -kh
You are likely mounted on /home/ec2-user/SageMaker and can see its "Size", "Used", "Avail" and "Use%".
There are hidden folders that function as a recycle bin. When I used the R command list.dirs(), it revealed a folder named ./.Trash-1000/ which kept a lot of random things that had supposedly been removed from the storage.
I just deleted the folder with unlink('./.Trash-1000/', recursive = T) and the entire storage was freed.
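If you are on a Python kernel instead, a rough equivalent of the steps above might look like this (the paths are the usual SageMaker defaults, so treat them as assumptions):
import shutil
import subprocess

# Check usage of the persistent volume (same information as df -kh).
print(subprocess.run(["df", "-kh", "/home/ec2-user/SageMaker"],
                     capture_output=True, text=True).stdout)

# Remove the hidden trash folder, if it exists.
shutil.rmtree("/home/ec2-user/SageMaker/.Trash-1000", ignore_errors=True)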
Hope it helps.
I have a network folder that contains sub-folders and files on a network drive. I want to automate copying the files and folders to my 4 local computers. Due to bandwidth issues I have a scheduled task that pulls the update over at night to a single computer. I would like a batch file for the other 3 local computers that can verify when the 2 folders on separate devices (1 local and 1 remote) are in sync and then copy the local files to itself.
I have looked through Robocopy and several of the other compare commands, and I see they give me a report of the differences, but what I am looking for is something conditional to continue batch processing. I would execute it from a scheduled task, but basically it would perform like:
IF "\\remotepc\folder" EQU "\\localpc1\folder" robocopy "\\localpc1\folder" "c:\tasks\updater" /MIR
ELSE GOTO :EOF
Thanks in advance. Any help is appreciated.
A Beyond Compare script can compare two folders, then copy matching files to a third folder.
Script:
load c:\folder1 \\server1\share\folder1
expand all
select left.exact.files
copyto left path:base c:\output
To run the script, use the command line:
"c:\program files\beyond compare 4\bcompare.exe" "#c:\script.txt"
The # character makes Beyond Compare run a file as a script instead of loading it for GUI comparison.
Beyond Compare scripting resources:
Beyond Compare Help - Scripting
Beyond Compare Help - Scripting Reference
Beyond Compare Forums - Scripting
I would like to release some space in HDFS, so I need to find some of the unwanted/unused HDFS blocks/files and delete or archive them. What would be considered an optimal solution as of now? I am using the Cloudera distribution. (My cluster's HDFS capacity is 900 TB and 700 TB is used.)
If you are running a licensed version of Cloudera, you can use Cloudera Navigator to see which files have not been used for a period of time and you can assign a policy to delete them.
If not, you are likely looking at writing scripts to identify the files that haven't been used and you'll have to manually delete them.
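Without Navigator, a rough Python sketch of such a script might look like this. It only looks at modification time (a plain hdfs dfs -ls listing does not report access time), and the root path and age threshold are placeholders:
import subprocess
from datetime import datetime, timedelta

root = "/data"                                    # placeholder starting directory
cutoff = datetime.now() - timedelta(days=365)     # placeholder age threshold

listing = subprocess.run(["hdfs", "dfs", "-ls", "-R", root],
                         capture_output=True, text=True, check=True).stdout

for line in listing.splitlines():
    parts = line.split()
    if len(parts) < 8 or parts[0].startswith("d"):
        continue                                  # skip directories and header lines
    modified = datetime.strptime(parts[5] + " " + parts[6], "%Y-%m-%d %H:%M")
    if modified < cutoff:
        print(parts[7], "is a candidate for archiving or deletion")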
A friend and I are working on a machine learning project together. We've managed to collect about 5,000 tex documents (we hope to get up to around 100,000 soon). We have a python script that we run on each document to do some text manipulation, extract particular parts of the tex code, compile the parts, convert the compiled parts to cropped PNG images, and search a converted PNG of the full tex for the cropped images using OpenCV. The code takes between 30 seconds and 2 minutes on the documents we've tried so far, so we really need to speed it up.
I've been tasked with gaining access to a computer cluster and figuring out how to implement our code on such a cluster. Someone suggested I look into using AWS, so I've made an account and have been trying to figure out how to use EC2 for the past few hours. Am I on the right track, or is there some other part of AWS or something else entirely that would be better suited to my task?
Whatever I use, it has to have access to the various python libraries in our code and to pdflatex and the full set of tex packages. Is this possible on EC2? I have almost no idea how to go about using EC2 (I've managed to start some instances, but how do I use them to run my script? and do I need to change my python script to accommodate the parallel processing, or does EC2 take care of that somehow? is it as easy as starting a linux instance and installing the programs I need like I would on any other linux machine?). None of the tutorials are immediately useful, and I'm still not even sure if EC2 is capable of doing what I'm looking for. Any advice is appreciated.
I wouldn't normally answer this kind of question, but it sounds like you are doing something interesting. So let's have a go.
Q1.
"We have a python script that we run on each document to do some text
manipulation, extract particular parts of the tex code, compile the
parts, convert the compiled parts to cropped PNG images, and search a
converted PNG of the full tex for the cropped images using OpenCV.. we
really need to speed it up"
Probably you could split the 100,000 documents into 10 parts, set up 10 instances of the processing software, and do the run in parallel (see the sketch below).
To set up 10 identical instances, there are many methods, but one of the simpler ways is to set up one machine as desired, take a snapshot, make an AMI, and then use the AMI to launch many more copies.
There might be an extra step of putting the results of the search into some kind of central database.
I don't know anything about OpenCV, but there are several suggestions that with a G3 instance type (this has a GPU) it might go faster. Google for "OpenCV on AWS".
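A minimal sketch of the splitting idea (the file name and chunk count are assumptions): give each of the N instances a chunk id and let it take every N-th document from a shared list.
import sys

N = 10                                  # number of instances (assumption)
chunk_id = int(sys.argv[1])             # 0..N-1, passed to each instance

with open("documents.txt") as f:        # hypothetical list of .tex paths
    docs = [line.strip() for line in f if line.strip()]

for path in docs[chunk_id::N]:          # every N-th document for this worker
    # run the existing per-document pipeline here, e.g. process_document(path)
    pass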
Q2.
"trying to figure out how to use EC2 for the past few hours. Am I on
the right track, or is there some other part of AWS or something else
entirely that would be better suited to my task?"
EC2 is a general purpose virtual machine, so if you already have code that runs on some other machine, it is easy to move it to EC2.
EC2 has many features, but one you might find interesting is "spot instances": these are short-lived but cheap (typically around 10% of the on-demand price) instances.
Q3.
"Whatever I use, it has to have access to the various python libraries in our code and to pdflatex and the full set of tex packages. Is this possible on EC2?"
Yes, you can pip install them or install them from system packages, just like on any other system.
Q4.
"how do I use them to run my script? and do I need to change my python script to accommodate the parallel processing, or does EC2 take care of that somehow? is it as easy as starting a linux instance and installing the programs I need like I would on any other linux machine?"
As described above, your basic task seems to scale well; you may need a step to collate the results. Yes, it is basically the same as any other Linux machine.
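If you also want to use all the cores within a single instance, a hedged sketch using the standard multiprocessing module could look like this (process_document is a placeholder for your existing per-document function, and documents.txt is an assumed input list):
from multiprocessing import Pool

def process_document(path):
    # placeholder for the existing text-manipulation / pdflatex / OpenCV steps
    return path

if __name__ == "__main__":
    with open("documents.txt") as f:          # hypothetical list of .tex paths
        docs = [line.strip() for line in f if line.strip()]

    with Pool() as pool:                      # one worker per CPU core by default
        results = pool.map(process_document, docs)
    # collate `results` here, e.g. write them to a central database or file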
I'm using QDir::drives() to get the list of drives. It works great on Windows, but on Linux and Mac it only returns a single item, "/", i.e. the root. This is expected behavior, but how can I get a list of drives on Mac and Linux?
Non-Qt, native API solutions are also welcome.
Clarification on "drive" definition: I'd like to get a list of mount points that are visble as "drives" in Finder or Linux built-in file manager.
As far as the filesystem is concerned, there is no concept of drives in Unix/Linux (I can't vouch for MacOSX but I'd say it's the same). The closest thing would probably be mount points, but a normal application shouldn't bother about them since all is already available under the filesystem root / (hence the behaviour of QDir::drives() that you observe).
If you really want to see which mount points are in use, you could parse the output of the mount command (without any arguments) or, at least on Linux, the contents of the /etc/mtab file. Beware though, mount points can get pretty hairy real quick (loop devices, FUSE filesystems, network shares, ...) so, again, I wouldn't recommend making use of them unless your application is designed to administer them.
Keep in mind that on Unix-y OSes, mount points are normally a matter for system administrators, not end-users, unless we're speaking of removable media or transient network shares.
Edit: Following your clarifications in the comments, on Linux you should use getmntent or getmntent_r to parse the contents of the /etc/mtab file and thus get a list of all mount points and the corresponding devices.
The trick after that is to determine which ones you want to display (removable? network share?). I know that /sys/block/... can help with that, but I don't know all the details so you'll have to dig a bit more.
For example, to check whether /dev/sdd1 (a USB key) mounted on /media/usb0/ is a removable device, you could do (note how I use the device name sdd, not the partition name sdd1):
$ cat /sys/block/sdd/removable
1
As opposed to my main hard drive:
$ cat /sys/block/sda/removable
0
Hope this puts you on the right track.
For OS X, the Disk Arbitration framework can be used to list and monitor drives and mount points.
Scraping the output of the mount shell command is certainly one option on either platform - although, what is your definition of a drive here? Physical media, removable drives, network volumes? You'll need to do a lot of filtering.
On MacOSX, the mount point for removable media, network volumes, and secondary hard-drives is always under /Volumes/, so simply enumerating items in this directory will do the trick if your definition of a drive is broad. This ought to be fairly safe as they're all automounted.
On Linux, there are a variety of locations depending on the particular distro in use. /mnt/ is the traditional, but there are others.
In Linux, the way to get information about currently mounted drives is to parse the mtab file. glibc provides a macro _PATH_MNTTAB to locate this file. See http://www.gnu.org/software/libc/manual/html_node/Mount-Information.html#Mount-Information
If you know the filesystem type of the drive or drives in question, you can use the df command to output the list of drives from the console, or programmatically as a system command. For example, to find all the ext4 drives:
df -t ext4
You can simply add additional formats onto the same command if you are interested in more than one type:
df -t ext4 -t tmpfs
This is going to return the device, the amount of space it has, the amount of space used, the amount of space free, the use% and where it is mounted on the filesystem.
df will show you all of the drives mounted on the system, but some are going to be things that aren't really what you are looking for like temporary file systems, etc.
Not sure if this will work on OSX or not, but it does work on my Ubuntu 12.04 distribution.
Another way is to check for "Volumes"
df -H | grep "/Volumes"
I know that this is old, but none of the answers mentioned getfsstat, which I ended up using on macOS. You can get a list of mounts (which will include most disks) using getfsstat. See man 2 getfsstat for further information.