Can a large amount of page cache lead to OOM in other processes? - hdfs

I run HDFS on Linux machines. I used "-Xmx" and "-Xms" to assign the datanode 25GB of memory. But when HDFS had used only 10GB of memory, it threw an Out-Of-Memory exception, and at the same time "cached memory" was about 50GB (the server I use has 64GB of total memory).
So my guess is that the page cache occupied system memory and led to the OutOfMemory in HDFS. Is this possible?
Also, there seems to be no such parameter as "cache_stop" in CentOS 6.3, which is the OS I use. Is there any other way to limit the page cache?
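For context, the page cache is reclaimable memory: the kernel drops it under pressure rather than letting it starve applications, so a large "cached" figure by itself normally cannot cause a JVM OOM (the -Xmx heap limit is the more likely culprit). A minimal sketch that estimates how much of "cached" is actually reclaimable from /proc/meminfo. The field names are standard Linux, but treat the estimate itself as an approximation:

```python
def parse_meminfo(text):
    """Parse /proc/meminfo-style 'Key:   value kB' lines into a dict of kB values."""
    fields = {}
    for line in text.splitlines():
        key, _, rest = line.partition(":")
        parts = rest.split()
        if parts:
            fields[key.strip()] = int(parts[0])
    return fields

def reclaimable_kb(fields):
    """Rough estimate: page cache plus reclaimable slab, minus shared memory
    (tmpfs pages are counted under 'Cached' but cannot be dropped)."""
    return (fields.get("Cached", 0)
            + fields.get("SReclaimable", 0)
            - fields.get("Shmem", 0))

# hypothetical numbers resembling the 64GB server in the question
sample = """MemTotal:       65536000 kB
MemFree:         2048000 kB
Cached:         52428800 kB
SReclaimable:    1024000 kB
Shmem:            512000 kB"""

fields = parse_meminfo(sample)
print(reclaimable_kb(fields) // 1024, "MB potentially reclaimable")
```

On a real machine you would read `open("/proc/meminfo").read()` instead of the sample string. For diagnosis (not as a fix), CentOS 6 does let root force-drop clean caches with `sync; echo 3 > /proc/sys/vm/drop_caches`.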

Related

How to right-size a cloud instance?

I have a Spring MVC web application which I want to deploy in the cloud. It could be AWS, Azure, or Google Cloud. I have to find out how much RAM and hard disk are needed. Currently I have the application deployed on my local machine in Tomcat. When I go to localhost:8080 and click the server status button, under the OS header it tells me:
Physical memory: 3989.36 MB
Available memory: 2188.51 MB
Total page file: 7976.92 MB
Free page file: 5233.52 MB
Memory load: 45
Under the JVM header it tells me:
Free memory: 32.96 MB
Total memory: 64.00 MB
Max memory: 998.00 MB
How do I infer RAM and hard disk size from these data? There must be some empirical formula, something like memory of OS + factor * jvm_size, where I assume the JVM size covers the memory of the applications. And when deploying to the cloud we will not deploy all the example applications.
These stats from your local machine were taken in the idle state, with no traffic, so the application is consuming far fewer resources than it would in production.
You cannot size a cloud machine purely from local memory stats, but they do help a little: they show the minimum resources the application consumes.
The better way is to perform a load test; if you are expecting a large number of users, size the instance based on the load-test results.
The short way is to read the requirements or recommended system specs of the products you are deploying.
Memory
256 MB of RAM minimum, 512 MB or more is recommended. Each user
session requires approximately 5 MB of memory.
Hard Disk Space
About 100 MB free storage space for the installed product (this does
not include WebLogic Server or Apache Tomcat storage space). Refer to
the database installation instructions for recommendations on database
storage allocation.
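The figures quoted above can be folded into the kind of empirical formula the question asks for. A minimal sketch, where the OS overhead, JVM non-heap fraction, and safety factor are my assumptions rather than vendor numbers:

```python
def estimate_ram_mb(max_heap_mb, concurrent_users, per_session_mb=5.0,
                    os_overhead_mb=1024, safety_factor=1.3):
    """Rough RAM estimate: OS + JVM heap + JVM non-heap overhead + sessions."""
    jvm_non_heap_mb = 0.25 * max_heap_mb       # metaspace, threads, GC (assumed)
    sessions_mb = per_session_mb * concurrent_users
    return safety_factor * (os_overhead_mb + max_heap_mb
                            + jvm_non_heap_mb + sessions_mb)

# e.g. the 998 MB max heap reported above with 200 concurrent users
print(round(estimate_ram_mb(998, 200)))   # roughly 4 GB
```

With those assumptions, a 998 MB heap serving 200 concurrent sessions points at a 4 GB-class instance rather than a t2.micro; rerun with your own load-test numbers.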
So if you are designing for staging or development, you can choose one of these two AWS instances:
t2.micro: 1 vCPU, 1 GB RAM
t2.small: 1 vCPU, 2 GB RAM
https://aws.amazon.com/ec2/instance-types/

Can a database connection leak cause increased CPU usage?

There's a server that might be experiencing PostgreSQL database connection leaks. That server has also maxed out its CPU at times (as indicated by %user being extremely high when running sar -u). Could database connection leaks be causing the abnormally high CPU usage?
This can happen if the connections are busy running queries that take forever and consume CPU.
Use operating system tools on the PostgreSQL server to see which processes consume CPU. On Linux that would be top.
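To connect the two symptoms, you can cross-check the busy PIDs that `top` reports against `pg_stat_activity`, whose `pid` column is the backend's OS process ID. Below is a sketch of just the filtering logic on hypothetical rows; in a real setup you would fetch them with a driver such as psycopg2 via `SELECT pid, state, now() - query_start AS runtime, query FROM pg_stat_activity`:

```python
from datetime import timedelta

def long_running(rows, threshold=timedelta(minutes=5)):
    """Return active backends whose current query has run longer than threshold."""
    return [r for r in rows
            if r["state"] == "active" and r["runtime"] > threshold]

# hypothetical pg_stat_activity rows
rows = [
    {"pid": 4021, "state": "active", "runtime": timedelta(minutes=42),
     "query": "SELECT ... FROM big_table ..."},
    {"pid": 4022, "state": "idle",   "runtime": timedelta(seconds=1),
     "query": "COMMIT"},
]
for r in long_running(rows):
    print(r["pid"], r["runtime"], r["query"])
```

Note the distinction: leaked-but-idle connections show up as `idle` and mostly cost memory and connection slots, not CPU; it is stuck `active` queries that burn CPU.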

What happens under live migration

I want to understand what happens under the hood in a live migration, for my final-year project.
According to my understanding, with two hosts sharing common storage via SAN:
1) When a VM is migrated from one host to another, the VM files are transferred from one ESXi to the other. But the hosts have shared storage, so what exactly gets transferred?
2) Are VMDK and snapshot files transferred during live migration?
Now I have these questions:
1) Are only the VMDK and .vmx files transferred?
2) With vMotion the memory pages are transferred; what are these memory pages physically? Are they files?
3) Where does the migration code live: in the hypervisor or in vCenter?
4) Can we get a stack trace for the VM or hypervisor during a migration, and if yes, how? (I tried strace to get a basic idea of how a VM (Ubuntu) would call the hypervisor, but that only shows the Linux system calls, not anything beyond that.)
Can anyone please guide me on this?
VMotion overview
Phase 1: Guest Trace Phase
The guest VM is staged for migration during this phase. Traces are
placed on the guest memory pages to track any modifications by the
guest during the migration. Tracing all of the memory can cause a
brief, noticeable drop in workload throughput. The impact is generally
proportional to the overall size of guest memory.
Phase 2: Precopy Phase
Because the virtual machine continues to run and actively modify its
memory state on the source host during this phase, the memory contents
of the virtual machine are copied from the source vSphere host to the
destination vSphere host in an iterative process. The first iteration
copies all of the memory. Subsequent iterations copy only the memory
pages that were modified during the previous iteration. The number of
precopy iterations and the number of memory pages copied during each
iteration depend on how actively the memory is changed on the source
vSphere host, due to the guest’s ongoing operations. The bulk of
vMotion network transfer is done during this phase—without taking any
significant number of CPU cycles directly from the guest. One would
still observe an impact on guest performance, because the write traces
that fire during the precopy phase cause a slight slowdown in page
writes.
Phase 3: Switchover Phase
During this final phase, the virtual machine is momentarily
quiesced on the source vSphere host, the last set of memory
changes are copied to the target vSphere host, and the virtual
machine is resumed on the target vSphere host. The guest briefly
pauses processing during this step. Although the duration of this
phase is generally less than a second, it is the most likely phase
where the largest impact on guest performance (an abrupt, temporary
increase of latency) is observed. The impact depends on a variety of
factors not limited to but including network infrastructure, shared
storage configuration, host hardware, vSphere version, and dynamic
guest workload.
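The three phases above reduce to an iterative precopy loop, sketched below in simplified form. The page counts and dirty rates are made-up illustration, not ESXi internals:

```python
def live_migrate(total_pages, dirty_per_round, switchover_threshold=100,
                 max_rounds=10):
    """Iterative precopy: round 1 copies all guest memory, later rounds copy
    only pages dirtied while the previous round ran; switch over when the
    dirty set is small enough (or after max_rounds, to guarantee it ends)."""
    copied = []
    to_copy = total_pages                  # round 1: everything
    for rnd in range(1, max_rounds + 1):
        copied.append(to_copy)
        to_copy = dirty_per_round(rnd)     # pages dirtied during this round
        if to_copy <= switchover_threshold:
            break
    # switchover: VM quiesced, final dirty set copied, VM resumed on target
    return copied, to_copy

# a workload whose dirty set shrinks by 10x per round
rounds, final = live_migrate(
    total_pages=1_000_000,
    dirty_per_round=lambda rnd: 1_000_000 // (10 ** rnd))
print(rounds, final)
```

This also shows why Phase 3's pause is short: by the time the VM is quiesced, only the last small dirty set remains to copy.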
From my experience, I always lose at least one ping during Phase 3.
Regarding your questions:
1) All data is transferred over the TCP/IP network. No .vmdk is transferred unless it's Storage vMotion. You can find all the details in the documentation.
2) The memory pages are simply the guest's RAM contents; during vMotion they are copied over the network, not transferred as files. (Guest memory only lands on disk, in a .vmem file, when a VM is suspended or a snapshot includes memory; the .nvram file stores the VM's BIOS settings, not memory.) The full list of VMware VM file types can be validated here
3) All the logic is in the hypervisor. vSphere Client and vCenter are management products. VMware's code base is proprietary, so I don't think you can get the actual source code. At the same time, you are welcome to check the ESXi CLI documentation. Due to licensing restrictions, vMotion can be invoked only via the client.
4) The guest OS (in your case Ubuntu) is not aware that it is running on virtual hardware at all. There is no way for the guest OS to track a migration, or any other VMware kernel/VMFS activity in general.

How can I monitor memory usage of a guest VM in Hyper-V (non-dynamic)?

So I want to monitor the actual memory used by a VM.
If dynamic memory is enabled, I use the "memory demand" counter.
But what if it isn't? I can get the memory visible to the VM and the memory allocated to the VM, but not the usage.
Without dynamic memory enabled, Hyper-V will allocate exactly as much memory as you have assigned to the VM, whether the VM is actually using it or not.
If you assign 2GB of RAM to a VM, that's how much memory Hyper-V will allocate on the host machine.
To determine the actual amount of memory in use by the VM, you'll probably have to query the OS in the VM directly (via WMI or some other remote management solution).
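To make that last point concrete: inside a Windows guest, the WMI class Win32_OperatingSystem exposes TotalVisibleMemorySize and FreePhysicalMemory (both in KB), and in-use memory follows directly. The arithmetic below is a sketch; in practice you would obtain the two values from the guest, e.g. with `Get-CimInstance Win32_OperatingSystem` over PowerShell remoting:

```python
def guest_memory_used_mb(total_visible_kb, free_physical_kb):
    """In-use guest memory from Win32_OperatingSystem counters (values in KB)."""
    return (total_visible_kb - free_physical_kb) / 1024

# e.g. a guest reporting 2 GB visible with 512 MB free
print(guest_memory_used_mb(2_097_152, 524_288))   # → 1536.0
```

This measures usage as the guest OS sees it, which is exactly what the host-side counters cannot tell you when dynamic memory is off.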

ColdFusion server crashing on hourly basis

I am facing a serious ColdFusion server crashing issue. I have many live sites on that server, so this is serious and urgent.
Following are the system specs:
Windows Server 2003 R2, Enterprise X64 Edition, Service Pack 2
ColdFusion (8,0,1,195765) Enterprise Edition
Following are the hardware specs:
Intel(R) Xeon(R) CPU E7320 @ 2.13 GHz
31.9 GB of RAM
It is crashing on an hourly basis. Can somebody help me find the exact issue? I tried looking through the ColdFusion log files but found nothing there. Every time it crashes, I have to restart the ColdFusion services to get it back.
Edit 1
When I looked at the runtime log file "ColdFusion-out165.log", I found the following errors:
error ROOT CAUSE:
java.lang.OutOfMemoryError: Java heap space
javax.servlet.ServletException: ROOT CAUSE:
java.lang.OutOfMemoryError: Java heap space
04/18 16:19:44 error ROOT CAUSE:
java.lang.OutOfMemoryError: GC overhead limit exceeded
javax.servlet.ServletException: ROOT CAUSE:
java.lang.OutOfMemoryError: GC overhead limit exceeded
Here are my current JVM settings:
Minimum JVM Heap Size (MB): 512
Maximum JVM Heap Size (MB): 1024
JVM Arguments
-server -Dsun.io.useCanonCaches=false -XX:MaxPermSize=512m -XX:+UseParallelGC -Dcoldfusion.rootDir={application.home}/../ -Dcoldfusion.libPath={application.home}/../lib
Note: when I tried to increase the maximum JVM heap size to 1536 and restart the ColdFusion services, they would not start, and I got the following error:
"Windows could not start the ColdFusion MX Application Server on Local Computer. For more information, review the System Event Log. If this is a non-Microsoft service, contact the service vendor, and refer to service-specific error code 2."
Shouldn't I be able to set my maximum heap size to 1.8 GB, since I am using a 64-bit operating system?
How much memory you can give your JVM is predicated on the bitness of your JVM, not your OS. Are you running a 64-bit CF install? It was an uncommon thing to do back in the CF8 days, so it's worth asking.
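A rough address-space budget shows why the 1536 MB heap would fail to start on a 32-bit JVM: Windows gives a 32-bit process 2 GB of user address space by default, and heap, permgen, and native overhead all have to fit inside it. The overhead figure below is an assumption, and since the heap also needs contiguous address space, the practical ceiling is often nearer 1.2 to 1.5 GB:

```python
USER_ADDRESS_SPACE_MB = 2048   # default 32-bit Windows user-mode limit

def jvm_fits(max_heap_mb, max_perm_mb, native_overhead_mb=300):
    """Does heap + permgen + native overhead (thread stacks, JIT code cache,
    DLLs) fit in the 32-bit user address space? Overhead figure is a guess."""
    needed = max_heap_mb + max_perm_mb + native_overhead_mb
    return needed <= USER_ADDRESS_SPACE_MB, needed

print(jvm_fits(1024, 512))   # the current -Xmx1024m / MaxPermSize=512m settings
print(jvm_fits(1536, 512))   # the attempted 1536 MB heap: does not fit
```

That matches the symptom in the question: the service fails at startup because the JVM cannot reserve the requested address space, not because the machine lacks physical RAM.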
Basically the error is stating you're using too much RAM for how much you have available (which you know). I'd be having a look at how much stuff you're putting into session and application scope, and culling back stuff that's not necessary.
Objects in session scope are particularly bad: they have a far bigger footprint than one might think, and cause more trouble than they're worth.
I'd also look at how many inactive but not-yet-timed-out sessions you have, with a view to being far more aggressive with your session time-outs.
Have a look at your queries and get rid of any SELECT * you have; cut them back to just the columns you need. Push data processing back into the DB rather than doing it in CF.
Farm scheduled tasks off onto a different CF instance.
Are you doing anything with large files? Either reading and processing them, or serving them via <cfcontent>? That can chew memory very quickly.
Are all your function-local variables in CFCs properly VARed? Especially ones in CFCs which end up in shared scopes.
Do you accidentally have debugging switched on?
Are you making heavy use of custom tags or files called in with <cfmodule>? I have heard apocryphal stories of custom tags causing memory leaks.
Get hold of Mike Brunt or Charlie Arehart to have a look at your server config / app (they will obviously charge consultancy fees).
I will update this as I think of more things to look out for.
Turn on ColdFusion monitor in the administrator. Use it to observe behavior. Find long running processes and errors.
Also, make sure that memory monitoring is turned off in the ColdFusion Server Monitor. That will bring down a production server easily.
@Adil,
I had the same kind of issue, except it wasn't crashing; instead, CPU usage was spiking up to 100%. Not sure if it's relevant to your issue, but it's at least worth a look.
See question at below URL:
Strange JRUN issue. JRUN eating up 50% of memory for every two hours
My blog entry for this
http://www.thecfguy.com/post.cfm/strange-coldfusion-issue-jrun-eating-up-to-50-of-cpu
For me it was a high-traffic site storing client variables in the registry, which was what made things go wrong.
Hope this helps.