Generating SCOM alerts for web servers - web-services

I have 4-7 sharepoint servers. We have a scom alert already implemented to generate alert if the server is down. But we want to implement scom alert if the website is down.
Can we genearate alert using scom by using ping functionality?
My idea is, we ping the server continuosly and when the website is unresponsive for some time we get alert saying that the website is unresposive.
Can this be implemented? And how much effort is needed? And do we need any other services to be implemented?
Any help would be appreciated.

Ravi,
Forgive me for the post being more philosophy and less answer.
For better or worse, Microsoft has resisted implemented a simple ping monitor in SCOM. Their is a solid reason for this. It would be overly leveraged by folks that don't know any better. The result of which would reflect poorly on the quality of SCOM as a monitoring tool. What I mean by that is that a ping monitor is a terrible idea as it doesn't tell the poor soul that was awoken at 2am much of anything beyond the highest level notion that something is wrong.
If you have 5 minutes to sit in front of the SCOM console to create a ping alert then you would serve your support teams much better if you spent those same 5 minutes creating a Web Application Availability monitor. The reason for this is that the Web App Avail monitor will actually look at the response to ensure that it is logical and successful.
Here is the documentation to create a Web Application Availability Monitor. It looks difficult only until your first implementation. It really is a snap. https://technet.microsoft.com/en-us/library/hh881882(v=sc.12).aspx
Consider that if you had a ping monitor and someone accidentally deleted your index.html file, your ping will happily chug along without telling anyone. Same with a bad code update. Heck, you could even stop your web application server and ping is still going to respond.
Conversely, If you had a Web App Avail monitor pointed at each of the nodes in a load balanced web farm and your load balancer failed, all of your Web monitors will continue to post as healthy while your monitor looking at the load balancer will start to fail. A quick glace at the console will tell your support team that indeed the issue is not with the web servers themselves.
It is a good philosophy to implement your monitors in a way that they testing the target as completely as possible and in the most isolated way possible. You would not want to point a Web App Avail monitor at a load balancer as you would not necessarily know which endpoint did not respond to SCOM to trigger the alert. Some folks go to great lengths to work around this by implementing health check pages that respond with there hostnames. This is usually not necessary, simply create a monitor against each individual node. You are going to want to monitor your load balancer directly so that you know it is up as well.
On another note, there already is a SharePoint management pack (actually one for each version of SharePoint) that you can download from MS for free. This management pack will automatically discovery and monitor all of the components of SharePoint in your infrastructure. It works quite well but if you are new to SCOM then the volume of data and alerts that it creates can be a bit overwhelming at first.
SharePoint 2016 (there is one for each version) management pack: https://blogs.technet.microsoft.com/wbaer/2015/09/08/system-center-operations-management-pack-for-sharepoint-server-2016-it-preview/
There is also a third party management pack that allows you to simply create ping monitors. People REALLY want this. I respectfully will tell you that that they are doing more harm then good in the majority of implementations that use this. But at the end of the day sometimes you just want something that works and you understand so here it is:
Ping management pack: https://www.opslogix.com/ping-management-pack/

Related

Suddenly scheduled tasks are not running in coldfusion 8

I am using Coldfusion MX8 server and one of the scheduled task was running from 2 years but now suddenly from 01/12/2014 scheduled tasks are not running. When i browsed the file in browser then the file is running successfully without error.
I am not sure is there any updatation or license expiration problem. I am aware that mid of this year Adobe closed the support for coldfusion 8.
The first most common problem of this problem is external to the server. When you say you browsed to the file and it worked in a browser, it is very important to know if that test was performed on the server desktop. Knowing that you can browse to the file from your desktop or laptop is of small value.
The most common source of issues like this is a change in the DNS or network stack that is interfereing with resolution. For example, if the internal DNS serving your DMZ suddenly starts serving the "external" address - suddenly your server can't browse to your domain. Or if the IP served by the server for the domain in question goes from being 127.0.0.1 to some other IP that the server can't acces correctly due to reverse proxy or LB or some other rule. Finally, sometimes the Apache or IIS is altered so that an IP that previously was serviced (127.0.0.1 being the most common example) now does not respond.
If it is something intrinsic to the scheduler service then Frank's advice is pretty good - especially look for "proxy schduler" entries in the log - they can give you good clues. I would also log results of a scheduled task to a file. Then check the file. If it exists then your scheduled tasks ARE running - they are just not succeeding. Good luck!
I've seen the cf scheduling service crash in CF8. The rest of CF is unaffected.
Have you tried restarting the server?
Here are your concerns:
Your File (works since you tested it manually).
Your Scheduled Task (failed).
Your Coldfusion Application (Service) (any changes here)?
Your Server (what about here).
To test your problem create a duplicate task and schedule it. Leave the other one in place (maybe set your new one to run earlier). Use the same file too. See if it completes.
If it doesn't then you have a larger problem. Since the Coldfusion Server sits atop of the JVM there could be something happening there. Things just don't stop working unless something got corrupted or you got compromised. If you hardened your server by rearranging/renaming the file structure to make it more secure...It would break your task.
So going back: if your test schedule works then determine what is different between the two. Note you have logging capabilities. Logging abilities for CF8
If you are not directly incharge of maintaining this server, then I would recommend asking around and see if there was recent maintenance, if so, what was done to the server?

2008 R2 Domain EvtSubscribe delays

I use the Windows EvtSubscribe API in my program that runs as a service, generally on Windows Server 2008 R2 Domain Controllers. It is registered for kerberos logon events and it's purpose is to provide single sign on for my application on the network.
I grab the username/IP from the logon event and use them to pre-authenticate an IP address. This has worked well in a large number of sites until it was used recently on an extremely large site (60,000 users logging on and off throughout the day). The Domain Controller isn't under extremely high load as far as I can tell from Process Monitor but the events are not being passed on to my application right away, they delay by what can be 20 minutes to an hour.
I use the PUSH method as described in the API. The code is identical.
In Event Viewer, looking at the security logs the logon events come in immediately when a user logs on to the domain. However the event is not pushed to my application till much, much later.
I have never seen this occur at any of the other sites my application has been installed on and I'm wondering if its a configuration issue on the servers themselves. The site with the delays has 4 clustered domain controllers in total with my application running and reporting on each. All 4 periodically experience extended delays in receiving the events.
Has anyone else come across something similar or have any ideas what could be at play?
I have tried replicating it using VMs and ADTest to generate load without much luck.

BizTalkServerIsolatedHost disappeared from one server in multi-server group

Afternoon all,
We have a group of four BizTalk servers: two orchestration hosts and two adapter hosts. We have a number of orchestrations exposed as web services, and for the purposes of this question, it is important to note that these web services are hosted on the adapter servers, and run under the BizTalkServerIsolatedHost host instance.
This morning, we started seeing odd errors on both of the adapter servers when SOAP calls came into the web services, like this:
The Messaging Engine failed to
register the adapter for “SOAP” for
the receive location blahblahblah.
Please verify that the receive
location exists, and that the isolated
adapter runs under an account that has
access to the BizTalk databases.
We restarted IIS on both servers, which fixed the errors on ONE server, but the other server continued to fail. The errors continued after a reboot as well.
After chasing our tails for a while, we eventually discovered that the BizTalkServerIsolatedHost host instance on the still-failing server was gone. Just... gone. These applications have been in production for months. Everything had been working swimmingly through the morning, until this just happened.
I don't want to muddy the waters, because I think the problems are unrelated, but in the interest of providing enough information, this problem exactly coincided with a problem in our load-balancing network hardware. The load balancer, which provides a single URL to consumers, and round-robins between the two adapter servers, just stopped working. This problem has not been resolved, so I don't know what happened, but it certainly made troubleshooting more interesting...
So, I have two questions:
Has anyone seen this before, where a host instance disappears?
We cannot find anything in the event viewer or anywhere else that says the host instance was deleted. Is this logged somewhere?
Thanks,
Jason

architecture and tools for a remote control application?

I'm working on the design of a remote control application. From my iPhone or a web browser, I'll send a few commands. Soon my home computer will perform the commands and send back results. I know there are remote desktop apps, but I want something programmable, something simpler, and something that I wrote.
My current direction is to use Amazon Simple Queue Service (SQS) as the message bus. The iPhone places some messages in a queue. My local Java/JRuby program notices the messages on the queue, performs the work and sends back status via a different queue.
This will be a very low-volume application. At $1.00 for a million requests (plus a handful of data transfer charges), Amazon SQS looks a lot more affordable than having my own server of any type. And super reliable, that's important for me too.
Are there better/standard toolkits or architectures for this kind of remote control? Cost is not a big issue, but I prefer the tons I learn by doing it myself.
I'm moderately concerned about security, but doubt it will be a problem. The list of commands recognized will be very short, and only recognized in specific contexts. No "erase hard drive" stuff.
update: I'll probably distribute these programs to some other people who want the same function, but who don't have Amazon SQS accounts. For now, they'll use anonymous access to my queues, with random 80-character queue names.
Well, I think it's a clever approach -- and as you said, the costs for your little traffic aren't even worth mentioning.
As I mentioned in the comment, it's a good way to leave your home machine behind your firewall and not have an open port on the internet.
I would suggest using OnlineMQ.com as a start; they have a free package.

First call to web service each day is slow

While building this web service and the app that calls it, we have noticed that the first call to the web service each day is extremely slow. It even will time out on some days. However, every call after that work great. Can anybody shed light on why this might be and how we can get rid of this pain?
Thanks in advance!
If it's an ASP.NET web service, it may be the CLR initializing and loading and verifying the assemblies for the first time. You may want to consider pre-compilation
Agree with the other answers on caching, initialization, etc. As far as a workaround, one possibility may be to set up some sort of daily task (SQL Server job, Windows service, something else?) to simulate a hit to the service each day, so that your users don't experience this first slow request.
If it is an ASP.NET web service, then you might want to check the settings of the application pool the web service is running in, especially the idle timeout which defaults to 20 minutes in IIS7.
Configuring IIS7 idle-timeout
Even if it is not an ASP.NET web service, other web servers will have equivalent configuration settings you have to tweak to keep your web service alive overnight.
Can you duplicate the same behavior on your database? It could just be the db needing to optimise the query for the first run (Maybe the parameter is today's date?).
Are there a lot of static constructors or set up code in the Global.asax class? Because IIS recycles worker processes periodically, the start up code may be running again.
The rule for optimization is: don't guess. Put in profiling to find out exactly what is slow, and then work to make that faster. Everything already posted provides excellent tips on where to start looking for slowness.