What happens under live migration - VMware

I want to understand what happens under the hood in a live migration, for my final year project.
According to my understanding, with two hosts sharing common storage via a SAN:
1) When a VM is migrated from one host to another, the VM files are transferred from one ESXi to the other. But the hosts have shared storage, so why would anything need to be transferred?
2) VMDK and snapshot files are transferred during live migration.
Now I have questions:
1) Are only the VMDK and .vmx files transferred?
2) With vMotion the memory pages are transferred. What are these memory pages? Are they files, or what are they physically?
3) Where does the migration code live: in the hypervisor or in vCenter?
4) Can we get a stack trace for the VM or the hypervisor during a migration, and if yes, how would that be possible? (I tried strace to get a basic idea of how a VM (Ubuntu) would call the hypervisor, but that only shows me the Linux side and nothing beyond that.)
Can anyone please guide me on this?

vMotion overview
Phase 1: Guest Trace Phase
The guest VM is staged for migration during this phase. Traces are
placed on the guest memory pages to track any modifications by the
guest during the migration. Tracing all of the memory can cause a
brief, noticeable drop in workload throughput. The impact is generally
proportional to the overall size of guest memory.
Phase 2: Precopy Phase
Because the virtual machine continues to run and actively modify its
memory state on the source host during this phase, the memory contents
of the virtual machine are copied from the source vSphere host to the
destination vSphere host in an iterative process. The first iteration
copies all of the memory. Subsequent iterations copy only the memory
pages that were modified during the previous iteration. The number of
precopy iterations and the number of memory pages copied during each
iteration depend on how actively the memory is changed on the source
vSphere host, due to the guest’s ongoing operations. The bulk of
vMotion network transfer is done during this phase—without taking any
significant number of CPU cycles directly from the guest. One would
still observe some impact on guest performance, because the write
traces that fire during the precopy phase cause a slight slowdown in
page writes.
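The iterative precopy described above can be sketched as a simple convergence model. This is a toy illustration with made-up rates and thresholds; the real vMotion heuristics are VMware-proprietary.

```python
# Toy model of vMotion's iterative precopy (Phase 2) leading into
# switchover (Phase 3).  All rates and thresholds are hypothetical.

def precopy(total_pages, dirty_pages_per_sec, copy_pages_per_sec,
            switchover_threshold=512, max_iterations=10):
    """Iterate until the remaining dirty set is small enough to copy
    during the brief switchover pause.  Returns (iterations, final_dirty_pages)."""
    to_copy = total_pages  # the first iteration copies all of guest memory
    iterations = 0
    while to_copy > switchover_threshold and iterations < max_iterations:
        copy_time = to_copy / copy_pages_per_sec        # seconds this pass takes
        to_copy = int(dirty_pages_per_sec * copy_time)  # pages re-dirtied meanwhile
        iterations += 1
    return iterations, to_copy

# 4 GiB guest (1M 4-KiB pages), network copies 250k pages/s, guest
# dirties 25k pages/s: the dirty set shrinks 10x per pass.
print(precopy(1_000_000, 25_000, 250_000))  # (4, 100)
```

The model shows why precopy converges only when the copy rate exceeds the dirty rate; otherwise the hypervisor must eventually force the switchover anyway after a bounded number of passes.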
Phase 3: Switchover Phase
During this final phase, the virtual machine is momentarily
quiesced on the source vSphere host, the last set of memory
changes are copied to the target vSphere host, and the virtual
machine is resumed on the target vSphere host. The guest briefly
pauses processing during this step. Although the duration of this
phase is generally less than a second, it is the most likely phase
where the largest impact on guest performance (an abrupt, temporary
increase of latency) is observed. The impact depends on a variety of
factors not limited to but including network infrastructure, shared
storage configuration, host hardware, vSphere version, and dynamic
guest workload.
From my experience, I would say I always lose at least one ping during Phase 3.
Regarding your questions:
1) All data is transferred over the TCP/IP network. No .vmdk is transferred unless it's Storage vMotion. You can find all the details in the documentation.
2) The memory pages are not files: they are regions of the guest's RAM that the hypervisor copies directly over the network. (.nvram, by the way, stores the VM's BIOS/firmware settings; a suspended VM's memory goes into a .vmem file.) The full list of VMware VM file types can be validated here.
3) All the logic is in the hypervisor. vSphere Client and vCenter are management products. VMware's code base is proprietary, so I don't think you can get the actual source code. At the same time, you are welcome to check the ESXi CLI documentation. Due to licensing restrictions, vMotion can be invoked only via the client.
4) The guest OS (in your case Ubuntu) can tell that it runs on virtual hardware (e.g. via CPUID or DMI strings), but there is NO way for the guest OS to track a migration or any other VMware kernel/VMFS activity in general; vMotion is deliberately transparent to the guest.

Related

Choice of VMware ESXi host

Given a vSphere client, I am trying to find a way to determine the ESXi host on which a VM of the provided specs can be spawned. Does anyone know of a formula relating the available CPU, RAM, and disk on an ESXi host, by which one could decide which host is the better choice for spawning a VM of a defined flavor - a flavor here being a specified combination of CPUs, RAM, and disk?
Basically, I want to determine the number of VMs of a given specification (CPU, RAM, and disk) that can be spawned on a host.
You can use the Configuration Maximums page to check the limits your vSphere version supports.
Per VM, the upper limit is 128 virtual CPUs, 6 TB of RAM, and 120 devices.
There are two main ways to go about this:
If you happen to have access to vROps, it has that capability
built into the "Optimize Capacity" section of its UI.
Use your programming language of choice to perform the
calculations manually. For a single host, divide the host's
available CPU MHz by the desired VM MHz, divide the host's available
RAM by the desired VM RAM, and take the host's datastore with the lowest
amount of free space and divide that by the desired disk space of the VM. The
lowest of those three figures is the maximum number of VMs that can
be spawned on that particular host.
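The manual calculation in option 2 can be sketched as below. All numbers are hypothetical; in practice you would read real host figures from the vSphere API (e.g. via pyVmomi), and this ignores the CPU/RAM overcommit that ESXi allows.

```python
# Back-of-the-envelope capacity estimate: how many VMs of one flavor
# fit on a host, bounded by whichever resource runs out first.

def max_vms(host_cpu_mhz, host_ram_mb, smallest_datastore_gb,
            vm_cpu_mhz, vm_ram_mb, vm_disk_gb):
    by_cpu = host_cpu_mhz // vm_cpu_mhz
    by_ram = host_ram_mb // vm_ram_mb
    by_disk = smallest_datastore_gb // vm_disk_gb
    return min(by_cpu, by_ram, by_disk)

# Hypothetical host: 24 cores x 2,400 MHz, 256 GiB RAM, 2,000 GB smallest datastore.
# Hypothetical flavor: 4 vCPUs x 2,400 MHz, 16 GiB RAM, 100 GB disk.
print(max_vms(57_600, 262_144, 2_000, 9_600, 16_384, 100))  # 6 -- CPU-bound
```

Here the CPU ratio (6) is lower than the RAM (16) and disk (20) ratios, so CPU is the limiting resource for this flavor.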

VMware CPU Utilization on a particular host

We are working with VMware, and HA and DRS are enabled on a cluster. We want to set a CPU threshold for every host, i.e. if CPU utilization goes above 80%, VMs move automatically.
Thanks in advance
Details
You receive an event message when the number of physical CPUs (pCPUs) on the host exceeds the limit.
This occurs when clients register more CPUs on an ESXi host than the host can support.
Impact
If the limit is exceeded, the management agent is at risk of running out of system resources. Consequently, VMware vCenter Server might stop managing the corresponding ESXi host.
Solution
To ensure management functionality, restrict the number of pCPUs to the limit indicated by the VMkernel.Boot.maxPCPUS property of the ESXi host.
To lower the maximum number of allowed registered pCPUs:
Edit the VMkernel.Boot.maxPCPUS variable by selecting the host in vCenter Server.
Open the Configuration tab and select Advanced Options in the Software box.
Expand the VMkernel option and click Boot.
Alter the value in the text box to the right of the VMkernel.Boot.maxPCPUS variable.
Hope this is useful ;)

Alternative to VMware Server on Windows Server 2008 R2

Currently we are running VMware Server on Windows Server 2008 R2. The hardware specs of the machine are very good. Nonetheless, performance in the virtual machines is not at all acceptable when two or more of them are running at the same time (just running, not performing any CPU- or disk-intensive tasks).
Hence we are looking for alternatives. VMware's website is full of buzzwords only; I cannot figure out whether they offer a product fitting our requirements. Alternatives from other suppliers are also welcome.
There are some constraints:
The virtualization product must run on Windows 2008 R2 - the server itself will not be virtualized (hence ESX is excluded)
Many virtual machines already exist. They must be usable with the new system, or the conversion process must be simple
The virtualization engine must be able to run without an interactive user session (hence VMware Player and VirtualBox are excluded)
It must be possible to reset a machine to a snapshot and to start a machine via command line from a different (i.e. not the host) machine (something like the vmrun command)
Several machines must be able to run in parallel without causing an enormous drop in performance
Do you have some hints for that?
Have you considered Hyper-V (the native hypervisor in Windows)?
However, I would first suggest troubleshooting the performance issues (the most common cause is not enough RAM for the VMs or the host, which results in paging and poor performance).
Though I could not find a real alternative to VMware Server within the given constraints, I could at least speed things up:
changing the disk policies from "Optimize for safety" to "Optimize for performance" reduced the time of most build projects by a third
installing the IP version 6 protocol on the XP machines typically brought another 10%
The slowest integration-testing project (installation of Dragon NaturallySpeaking 12) is now done in 20 minutes instead of 2 h 20 min.
Still, when copying larger files from the host to the virtual machine, performance is unacceptable - while copying them from a different VM on the same host works far better...
I would still consider ESXi with 2008 on top of it, if I were in your place.
We used VMware Server, and its performance is simply not comparable to ESXi, especially if you are running I/O-intensive applications.

How to keep a VMware VM's clock in sync?

I have noticed that our VMware VMs often have the incorrect time on them. No matter how many times I reset the time, they keep desyncing.
Has anyone else noticed this? What do other people do to keep their VM time in sync?
Edit: these are CLI Linux VMs, btw.
If your host time is correct, you can set the following .vmx configuration file option to enable periodic synchronization:
tools.syncTime = true
By default, this synchronizes the time every minute. To change the periodic rate, set the following option to the desired synch time in seconds:
tools.syncTime.period = 60
For this to work, you need VMware Tools installed in your guest OS.
See http://www.vmware.com/pdf/vmware_timekeeping.pdf for more information
According to VMware's knowledge base, the actual solution depends on the Linux distro and release. On RHEL 5.3 I usually edit /etc/grub.conf and append these parameters to the kernel entry: divider=10 clocksource=acpi_pm
Then enable NTP, disable VMware time synchronization from vmware-toolbox, and finally reboot the VM.
A complete table with guidelines for each Linux distro can be found here:
TIMEKEEPING BEST PRACTICES FOR LINUX GUESTS
http://kb.vmware.com/selfservice/microsites/search.do?language=en_US&cmd=displayKC&externalId=1006427
I'll answer for Windows guests. If you have VMware Tools installed, then the taskbar's notification area (near the clock) has an icon for VMware Tools. Double-click that and set your options.
If you don't have VMware Tools installed, you can still set the clock's option for internet time to sync with some NTP server. If your physical machine serves the NTP protocol to your guest machines then you can get that done with host-only networking. Otherwise you'll have to let your guests sync with a genuine NTP server out on the internet, for example time.windows.com.
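If you want to check from inside a guest how far its clock is from an NTP reference, a minimal SNTP probe can be sketched like this (stdlib only; the server name is just an example, and outbound UDP port 123 is assumed to be open):

```python
import socket
import struct
import time

NTP_EPOCH_OFFSET = 2208988800  # seconds between 1900-01-01 and 1970-01-01

def parse_transmit_time(packet):
    """Extract the server's transmit timestamp (bytes 40-47) from a
    48-byte SNTP response and convert it to a Unix timestamp."""
    if len(packet) < 48:
        raise ValueError("short SNTP packet")
    seconds, frac = struct.unpack("!II", packet[40:48])
    return seconds - NTP_EPOCH_OFFSET + frac / 2**32

def query_ntp(server="pool.ntp.org", timeout=2.0):
    """Send a minimal SNTP client request (LI=0, VN=3, Mode=3) and
    return the server's clock as a Unix timestamp."""
    request = b"\x1b" + 47 * b"\0"
    with socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as s:
        s.settimeout(timeout)
        s.sendto(request, (server, 123))
        packet, _ = s.recvfrom(48)
    return parse_transmit_time(packet)
```

Comparing the returned timestamp with `time.time()` gives the approximate offset; this ignores network round-trip delay, which real NTP clients correct for.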
Something to note here. We had the same issue with Windows VMs running on an ESXi host. Time sync was turned on in VMware Tools on the guest, but the guest clocks were consistently off (by about 30 seconds) from the host clock. The ESXi host was configured to get time updates from an internal time server.
It turns out we had the Internet Time setting turned on in the Windows VMs (Control Panel > Date and Time > Internet Time tab), so the guest was getting time updates from two places and the Internet time was winning. We turned that off and now the guest clocks are good, getting their time exclusively from the ESXi host.
In my case we are running VMware Server 2.02 on Windows Server 2003 R2 Standard. The host is also Windows Server 2003 R2 Standard. I had VMware Tools installed and set to sync the time. I did everything imaginable that I found on various internet sites. We still had horrendous drift, although it had shrunk from 15 minutes or more down to the 3-4 minute range.
Finally, in vmware.log (it resides in the same folder as the .vmx file) I found this entry:
"Your host system does not guarantee synchronized TSCs across different CPUs, so please set the /usepmtimer option in your Windows Boot.ini file to ensure that timekeeping is reliable. See Microsoft KB http://support.microsoft.com/kb... for details and Microsoft KB http://support.microsoft.com/kb... for additional information."
Cause: This problem occurs when the computer has the AMD Cool'n'Quiet technology (AMD dual cores) enabled in the BIOS or some Intel multi core processors. Multi core or multiprocessor systems may encounter Time Stamp Counter (TSC) drift when the time between different cores is not synchronized. The operating systems which use TSC as a timekeeping resource may experience the issue. Newer operating systems typically do not use the TSC by default if other timers are available in the system which can be used as a timekeeping source. Other available timers include the PM_Timer and the High Precision Event Timer (HPET).
Resolution: To resolve this problem check with the hardware vendor to see if a new driver/firmware update is available to fix the issue.
Note The driver installation may add the /usepmtimer switch in the Boot.ini file.
Once this (the /usepmtimer switch) was done, the clock was dead-on.
This documentation solved this problem for me.
The CPU speed varies due to power saving. I originally noticed this because VMware gave me a helpful tip on my laptop, but this page mentions the same thing:
Quote from: VMware tips and tricks
Power saving (SpeedStep, C-states, P-States,...)
Your power saving settings may interfere significantly with vmware's performance. There are several levels of power saving.
CPU frequency
This should not lead to performance degradation, beyond the obviously lower performance when running the CPU at a lower frequency (either manually or via governors like "ondemand" or "conservative"). The only problem with varying the CPU speed while VMware is running is that the Windows clock will gain or lose time. To prevent this, specify your full CPU speed in kHz in /etc/vmware/config:
host.cpukHz = 2167000
VMware experiences a lot of clock drift. This Google search for 'vmware clock drift' links to several articles.
The first hit may be the most useful for you: http://www.fjc.net/linux/linux-and-vmware-related-issues/linux-2-6-kernels-and-vmware-clock-drift-issues
When installing VMware Tools on a Windows guest, "Time Synchronization" is not enabled by default.
However, best practice is to enable time sync on Windows guests.
There are several ways to do this from outside the VM, but I wanted to find a way to enable time sync from within the guest itself, either on or after Tools install.
Surprisingly, this wasn't quite as straightforward as I expected.
(I assumed it would be possible to set this as a parameter / config option during Tools install.)
After a bit of searching I found a way to do this in a VMware article called "Using the VMware Tools Command-Line Interface".
So, if time sync is disabled, you can enable it by running the following command line in the guest:
VMwareService.exe --cmd "vmx.set_option synctime 0 1"
Additional Notes
For some (IMHO stupid) reason, this utility requires you to specify the current as well as the new value
0 = disabled
1 = enabled
So, if you run this command on a machine which already has this set, you will get an error saying "Invalid old value".
Obviously you can “ignore” this error when run (so not a huge deal) but the current design seems a bit dumb.
IMHO it would be much more sensible if you could simply specify the value you want to set and not require the current value to be specified.
i.e.
VMwareService.exe --cmd "vmx.set_option synctime <0|1>"
In an Active Directory environment, it's important to know:
All member machines synchronize with any domain controller.
In a domain, all domain controllers synchronize from the PDC Emulator (PDCe) of that domain.
The PDC Emulator of a domain should synchronize with a local clock source or NTP.
It's important to consider this when setting the time in VMware or configuring time sync.
Extracted from: http://www.sysadmit.com/2016/12/vmware-esxi-configurar-hora.html
I added the following job to crontab. It is hacky, but I think it should work.
*/5 * * * * service ntpd stop && ntpdate pool.ntp.org && service ntpd start
It stops the ntpd service, forces a one-off sync with ntpdate, and starts ntpd again.

My VMware ESX server console volume went readonly. How can I save my VMs?

Two RAID volumes, VMware kernel/console running on a RAID1, vmdks live on a RAID5. Entering a login at the console just results in SCSI errors, no password prompt. Praise be, the VMs are actually still running. We're thinking, though, that upon reboot the kernel may not start again and the VMs will be down.
We have database and disk backups of the VMs, but not backups of the vmdks themselves.
What are my options?
Our current best idea is
Use VMware Converter to create live vmdks from the running VMs, as if it was a P2V migration.
Reboot host server and run RAID diagnostics, figure out what in the "h" happened
Attempt to start ESX again, possibly after rebuilding its RAID volume
Possibly have to re-install ESX on its volume and re-attach VMs
If that doesn't work, attach the "live" vmdks created in step 1 to a different VM host.
It was the backplane. Both drives of the RAID1 and one drive of the RAID5 were inaccessible. Incredibly, the VMware hypervisor continued to run for three days from memory with no access to its host disk, keeping the VMs it managed alive.
At step 3 above we diagnosed the hardware problem and replaced the RAID controller, cables, and backplane. After restart, we re-initialized the RAID by instructing the controller to query the drives for their configurations. Both were degraded and both were repaired successfully.
At step 4, it was not necessary to reinstall ESX; although, at bootup, it did not want to register the VMs. We had to dig up some buried management stuff to instruct the kernel to resignature the VMs. (Search VM docs for "resignature.")
I believe that our fallback plan would have worked, the VMware Converter images of the VMs that were running "orphaned" were tested and ran fine with no data loss. I highly recommend performing a VMware Converter imaging of any VM that gets into this state, after shutting down as many services as possible and getting the VM into as read-only a state as possible. Loading a vmdk either elsewhere or on the original host as a repair is usually going to be WAY faster than rebuilding a server from the ground up with backups.