EvtSubscribe delays on Windows Server 2008 R2 Domain Controllers - C++

I use the Windows EvtSubscribe API in a program that runs as a service, generally on Windows Server 2008 R2 Domain Controllers. It is registered for Kerberos logon events, and its purpose is to provide single sign-on for my application on the network.
I grab the username/IP from the logon event and use them to pre-authenticate that IP address. This has worked well at a large number of sites, until it was recently deployed at an extremely large one (60,000 users logging on and off throughout the day). The Domain Controller isn't under particularly high load as far as I can tell from Process Monitor, but the events are not being passed to my application right away; they can be delayed by anywhere from 20 minutes to an hour.
I use the push method, as described in the API documentation. The code is identical at every installation.
In Event Viewer, looking at the Security log, the logon events appear immediately when a user logs on to the domain. However, the events are not pushed to my application until much, much later.
I have never seen this occur at any of the other sites where my application is installed, and I'm wondering if it's a configuration issue on the servers themselves. The site with the delays has 4 clustered domain controllers in total, with my application running and reporting on each. All 4 periodically experience extended delays in receiving the events.
Has anyone else come across something similar or have any ideas what could be at play?
I have tried to replicate the problem using VMs and ADTest to generate load, without much luck.
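For context, the push-model subscription looks roughly like this (a simplified sketch; the XPath query for Kerberos ticket events, event ID 4768, and the error handling are illustrative):

```cpp
#include <windows.h>
#include <winevt.h>
#include <cstdio>

#pragma comment(lib, "wevtapi.lib")

// Callback invoked by the event log service for each matching event (push model).
DWORD WINAPI SubscriptionCallback(EVT_SUBSCRIBE_NOTIFY_ACTION action,
                                  PVOID /*context*/, EVT_HANDLE hEvent)
{
    if (action == EvtSubscribeActionError) {
        // On error, the hEvent parameter carries the error code.
        printf("Subscription error: %lu\n", (DWORD)(DWORD_PTR)hEvent);
        return ERROR_SUCCESS;
    }

    // Ask for the required buffer size, then render the event XML and
    // pull out the username/IP fields.
    DWORD used = 0, props = 0;
    EvtRender(NULL, hEvent, EvtRenderEventXml, 0, NULL, &used, &props);
    // ... allocate 'used' bytes, call EvtRender again, parse, pre-authenticate ...
    return ERROR_SUCCESS;
}

int main()
{
    // Event ID 4768 (Kerberos TGT request) is illustrative; adjust as needed.
    EVT_HANDLE hSub = EvtSubscribe(
        NULL, NULL, L"Security",
        L"*[System[(EventID=4768)]]",
        NULL, NULL, SubscriptionCallback,
        EvtSubscribeToFutureEvents);
    if (!hSub) { printf("EvtSubscribe failed: %lu\n", GetLastError()); return 1; }

    Sleep(INFINITE);   // the callback does all the work
    EvtClose(hSub);
    return 0;
}
```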

Related

Scalable login/lobby servers for a multiplayer game

I am developing a multiplayer game (client-server model) and I am stuck when it comes to scaling its servers.
I understand that most games never even reach 10 000+ players, and I don't think mine will either.
However, if I am lucky enough to reach that number, I want to design the servers so they don't become a huge obstacle later.
I have searched a lot for a solution to my problem on the internet, watching GDC talks about it and checking other posts on this website, but none of them seem to solve my specific problem.
My current setup is below and all servers are written in C++ using ENet as my network library.
Game server
This server handles the actual gameplay of the game and requires quite a lot of CPU, with many packets sent between the server and its connected clients.
But this dedicated server is hosted by the players themselves, so I don't have to think about scaling it at all.
Lobby server
This server handles the server list, containing all servers currently up.
All game servers send a UDP packet to this server every 5 seconds to say they are still alive.
This is so the lobby server can keep an updated list of all servers currently online.
All clients send a UDP packet to this server when they want to fetch all servers (which happens only in the server list screen), and the lobby server sends back a list of all servers.
This does not happen very often, and the lobby server is limited to sending 4 servers per second to a client (rather than one huge packet containing all servers).
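A minimal sketch of that heartbeat from the game server's side (using ENet, as in the question; the lobby hostname, port, and one-byte opcode are assumptions):

```cpp
#include <enet/enet.h>

// Game-server side of the lobby keepalive: connect once, then send a tiny
// "still alive" packet every 5 seconds. Address/port and the one-byte
// payload format are placeholders.
int main()
{
    if (enet_initialize() != 0) return 1;

    ENetHost* host = enet_host_create(NULL /*client mode*/, 1, 1, 0, 0);
    ENetAddress lobby;
    enet_address_set_host(&lobby, "lobby.example.com");  // hypothetical host
    lobby.port = 7777;                                   // hypothetical port

    ENetPeer* peer = enet_host_connect(host, &lobby, 1, 0);

    for (;;) {
        ENetEvent event;
        // Pump ENet for up to 5000 ms, then fall through to send a heartbeat.
        while (enet_host_service(host, &event, 5000) > 0) {
            if (event.type == ENET_EVENT_TYPE_RECEIVE)
                enet_packet_destroy(event.packet);
        }
        unsigned char alive = 0x01;  // assumed "I'm alive" opcode
        // Unreliable delivery is fine: the next heartbeat arrives in 5 s anyway.
        enet_peer_send(peer, 0, enet_packet_create(&alive, 1, 0));
        enet_host_flush(host);
    }
}
```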
Login server
This server handles account creation, lost passwords, logins, friends and their current game status, private messages to other logged-in players, and player profiles that specify which in-game items they have.
All clients send a UDP packet to this server every 5 seconds to say they are still alive, while also sending which game they are currently in. The server then sends back their friends' online/offline/in-game statuses.
This is so each player can keep an updated list of which friends are online/offline/in-game.
Otherwise it sends messages only on player actions: creating an account, logging in, changing/resetting a password, adding/removing/ignoring a friend, private messages to friends, etc.
My questions
What I am worried about is that my lobby and login servers might not be scalable and would end up with too much traffic.
1. Could they in theory be hosted on just a single computer? Or would it be too much traffic for 10 000+ players?
2. If they can be hosted on a single computer, won't the servers still have issues for people who live far away?
Would it be better to have lobby and login servers per region of the world in that case?
The downside of that is that players in Europe would not be able to see servers in the US, and their accounts and items would not exist on the other servers.
3. It might be far-fetched, but if I rewrote both servers as a website with a database, and made the client/game server issue web requests instead (such as HTTPS, or calling a PHP script with specific headers), would that help solve my problems somehow?
All of your problems and questions could be solved by a serverless cloud-based solution, e.g. AWS Lambda or similar. In that case scalability is not your problem; just develop the logic. This will save you a lot of time.
If you would rather build the servers as a single app hosted on your own server, consider using something like Go instead of C++. It was designed exactly for this purpose, i.e. heavily loaded web/network services.
Well, this is C++ and I code in Java, but maybe the logic is useful to you anyway, so I will tell you how I ended up implementing something similar, but for a casino.
In my case I have two different sockets in the same server program. One socket is TCP and handles all logins, registrations and payments, while the second socket is UDP and handles the actual game multiple players are playing. You could then group all those UDP connections internally (probably in arrays of sockets) to form the lobbies. Doing that, your whole server is just one class that could run on a single PC using two ports (one for each socket). However, this does not solve the ping problem for people who live far away.
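A rough sketch of that two-socket layout (plain POSIX sockets and select() here rather than a full game loop; the ports and handler logic are placeholders):

```cpp
#include <sys/socket.h>
#include <netinet/in.h>
#include <unistd.h>
#include <algorithm>

// One process, two ports: TCP for logins/payments, UDP for game traffic.
int main()
{
    int tcp = socket(AF_INET, SOCK_STREAM, 0);
    int udp = socket(AF_INET, SOCK_DGRAM, 0);

    sockaddr_in a{};
    a.sin_family = AF_INET;
    a.sin_addr.s_addr = INADDR_ANY;

    a.sin_port = htons(5000);                 // placeholder login port
    bind(tcp, (sockaddr*)&a, sizeof a);
    listen(tcp, 16);

    a.sin_port = htons(5001);                 // placeholder game port
    bind(udp, (sockaddr*)&a, sizeof a);

    for (;;) {
        fd_set fds;
        FD_ZERO(&fds);
        FD_SET(tcp, &fds);
        FD_SET(udp, &fds);
        select(std::max(tcp, udp) + 1, &fds, NULL, NULL, NULL);

        if (FD_ISSET(tcp, &fds)) {
            int client = accept(tcp, NULL, NULL);
            // ... authenticate, register, take payment ...
        }
        if (FD_ISSET(udp, &fds)) {
            char buf[512];
            sockaddr_in from{}; socklen_t len = sizeof from;
            recvfrom(udp, buf, sizeof buf, 0, (sockaddr*)&from, &len);
            // ... route the datagram to the sender's lobby group ...
        }
    }
}
```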
If ping is a problem (not in my casino case), you could host your servers per region, but strip the login, registration and payments out of them and replace that with a connection to a central server. This central server should be TCP, and you could also implement an HTTPS socket so your webpage can connect to it too and let players create accounts or pay you directly from the browser.
Sorry to mess with your life even more, but I hope it helps.

Generating SCOM alerts for web servers

I have 4-7 SharePoint servers. We already have a SCOM alert implemented to fire if a server is down, but we also want a SCOM alert if the website is down.
Can we generate an alert in SCOM using ping functionality?
My idea is that we ping the server continuously, and when the website is unresponsive for some time we get an alert saying that the website is unresponsive.
Can this be implemented? How much effort is needed? And do we need any other services in place?
Any help would be appreciated.
Ravi,
Forgive me for the post being more philosophy and less answer.
For better or worse, Microsoft has resisted implementing a simple ping monitor in SCOM. There is a solid reason for this: it would be overly leveraged by folks who don't know any better, and the result would reflect poorly on the quality of SCOM as a monitoring tool. What I mean is that a ping monitor is a terrible idea because it doesn't tell the poor soul who was woken at 2am much of anything beyond the highest-level notion that something is wrong.
If you have 5 minutes to sit in front of the SCOM console to create a ping alert, then you would serve your support teams much better by spending those same 5 minutes creating a Web Application Availability monitor. The reason is that the Web App Avail monitor will actually look at the response to ensure that it is logical and successful.
Here is the documentation to create a Web Application Availability Monitor. It looks difficult only until your first implementation. It really is a snap. https://technet.microsoft.com/en-us/library/hh881882(v=sc.12).aspx
Consider that if you had a ping monitor and someone accidentally deleted your index.html file, your ping would happily chug along without telling anyone. The same goes for a bad code update. Heck, you could even stop your web application server and ping would still respond.
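To make that concrete, an HTTP-level check of the kind a Web App Avail monitor performs looks roughly like this (a WinHTTP sketch; the hostname is a placeholder and error handling is trimmed):

```cpp
#include <windows.h>
#include <winhttp.h>
#pragma comment(lib, "winhttp.lib")

// A ping only proves the OS network stack answers. This kind of check
// proves the site actually serves a successful response.
bool SiteIsHealthy()
{
    HINTERNET s = WinHttpOpen(L"HealthCheck/1.0",
                              WINHTTP_ACCESS_TYPE_DEFAULT_PROXY,
                              WINHTTP_NO_PROXY_NAME, WINHTTP_NO_PROXY_BYPASS, 0);
    HINTERNET c = WinHttpConnect(s, L"sharepoint.example.com", 443, 0);
    HINTERNET r = WinHttpOpenRequest(c, L"GET", L"/", NULL,
                                     WINHTTP_NO_REFERER,
                                     WINHTTP_DEFAULT_ACCEPT_TYPES,
                                     WINHTTP_FLAG_SECURE);
    bool healthy = false;
    if (WinHttpSendRequest(r, WINHTTP_NO_ADDITIONAL_HEADERS, 0,
                           WINHTTP_NO_REQUEST_DATA, 0, 0, 0) &&
        WinHttpReceiveResponse(r, NULL)) {
        DWORD status = 0, size = sizeof status;
        WinHttpQueryHeaders(r, WINHTTP_QUERY_STATUS_CODE |
                               WINHTTP_QUERY_FLAG_NUMBER,
                            WINHTTP_HEADER_NAME_BY_INDEX,
                            &status, &size, WINHTTP_NO_HEADER_INDEX);
        healthy = (status == 200);  // delete index.html and this turns false
    }
    WinHttpCloseHandle(r); WinHttpCloseHandle(c); WinHttpCloseHandle(s);
    return healthy;
}
```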
Conversely, if you had a Web App Avail monitor pointed at each of the nodes in a load-balanced web farm and your load balancer failed, all of your web monitors would continue to report healthy while the monitor looking at the load balancer would start to fail. A quick glance at the console will tell your support team that the issue is indeed not with the web servers themselves.
It is a good philosophy to implement your monitors in a way that tests the target as completely and in as isolated a way as possible. You would not want to point a Web App Avail monitor at a load balancer, as you would not necessarily know which endpoint failed to respond to SCOM and triggered the alert. Some folks go to great lengths to work around this by implementing health-check pages that respond with their hostnames. This is usually not necessary; simply create a monitor against each individual node. You will also want to monitor your load balancer directly, so that you know it is up as well.
On another note, there is already a SharePoint management pack (actually one for each version of SharePoint) that you can download from Microsoft for free. This management pack will automatically discover and monitor all of the SharePoint components in your infrastructure. It works quite well, but if you are new to SCOM, the volume of data and alerts it creates can be a bit overwhelming at first.
SharePoint 2016 (there is one for each version) management pack: https://blogs.technet.microsoft.com/wbaer/2015/09/08/system-center-operations-management-pack-for-sharepoint-server-2016-it-preview/
There is also a third-party management pack that lets you simply create ping monitors. People REALLY want this. I will respectfully tell you that it does more harm than good in the majority of implementations that use it. But at the end of the day, sometimes you just want something that works and that you understand, so here it is:
Ping management pack: https://www.opslogix.com/ping-management-pack/

Architecture Design for API of Cloud Service

Background:
I have a local application that processes user input for approximately 3 seconds and then returns an answer (output) to the user.
(I am deliberately not going into detail about the application, so as not to complicate the question and to keep it purely architectural.)
My Goal:
I want to turn my application into a cloud service and expose an API
(for the upcoming website, and for clients that will connect to the service without installing the software locally).
Possible Solutions:
Deploy WCF in the cloud and host my application there, so clients can invoke the service and use my application in the cloud (RPC style).
Use a Web API that inserts each request into a queue; a worker role then dequeues requests and posts the results to a DB. The client sends one request to create the job in the queue, and another request to get the result (which the Web API fetches from the DB).
The Problems:
If I go with the WCF solution (#1), I can't handle heavy request loads, maybe 10-20 simultaneous requests.
If I go with the WebAPI-Queue-WorkerRole solution (#2), the client will sometimes need to request the results multiple times, which can be a problem.
If I go with the WebAPI-Queue-WorkerRole solution (#2), the process isn't synchronous: the client does not get the result the moment his request has been processed; he has to ask for it.
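For clarity, the client side of solution #2 ends up looking roughly like this (a libcurl sketch; the endpoint URLs, the job-id handling, and the "PENDING" convention are all assumptions):

```cpp
#include <curl/curl.h>
#include <string>
#include <thread>
#include <chrono>

// Collect the HTTP response body into a std::string.
static size_t onBody(char* data, size_t size, size_t nmemb, void* out)
{
    static_cast<std::string*>(out)->append(data, size * nmemb);
    return size * nmemb;
}

static std::string httpGet(const std::string& url)
{
    std::string body;
    CURL* h = curl_easy_init();
    curl_easy_setopt(h, CURLOPT_URL, url.c_str());
    curl_easy_setopt(h, CURLOPT_WRITEFUNCTION, onBody);
    curl_easy_setopt(h, CURLOPT_WRITEDATA, &body);
    curl_easy_perform(h);
    curl_easy_cleanup(h);
    return body;
}

int main()
{
    // 1. Submit the job (hypothetical endpoint); the API returns a job id.
    std::string jobId = httpGet("https://api.example.com/jobs/submit?input=...");

    // 2. Poll for the result; this loop is the repeated round trip the
    //    question is worried about.
    for (;;) {
        std::string result = httpGet("https://api.example.com/jobs/" + jobId);
        if (result != "PENDING")  // assumed convention for "not done yet"
            break;
        std::this_thread::sleep_for(std::chrono::seconds(1));
    }
}
```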
Questions:
In the WebAPI-Queue-WorkerRole solution (#2), can I somehow notify the client once his request has been processed and is done, so I can save the client multiple requests for the result?
Isn't repeatedly asking for the result an outdated pattern? I remember that 10-15 years ago it was accepted, but now? I know that the VirusTotal API uses this kind of design.
Is there a better solution, one that will handle heavy loads and work synchronously or asynchronously (returning the result to the client as soon as it is done)?
Thank you.
If you're using Azure, why not simply fire up more servers and use load balancing to handle the extra load? That way, as your load increases, you have more servers to handle the requests.
Microsoft recently made available the Azure Service Fabric, which gives you a lot of control over spinning up and shutting down these services.

Should I use MSMQ or IIS

I have a web site that exposes a web service to all my desktop clients.
At random times, these clients invoke the web service, which in turn adds a message (a JPEG in byte-array format) to an MSMQ queue.
I have a service application that reads from this queue, performs an enhancement on the JPEG, and saves it to the hard drive.
The number of clients uploading at any one time is unpredictable.
I chose this method because I do not want to put any strain on IIS. The enhancement work my service application performs is not a huge effort, but it exists nevertheless.
However, after realizing that my service application had stopped for some time and required restarting, I noticed RAM usage leap up as it cleared the backlog. While I have corrected this and the service is now coded to restart automatically on failure, I surmise that a backlog could still build up at busy times, which again would mean higher RAM usage.
Now, should I just do all the processing within my web service and then save to the hard drive, or am I correct in using MSMQ?
I am using C# and ASP.NET.
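For reference, the enqueue step amounts to something like this (shown with the native MSMQ C API to match the C++ snippets elsewhere on this page; in C# it would go through System.Messaging.MessageQueue.Send. The queue path is a placeholder):

```cpp
#include <windows.h>
#include <mq.h>
#pragma comment(lib, "mqrt.lib")

// Enqueue a JPEG byte buffer, roughly what the web service does before the
// worker service picks it up.
HRESULT EnqueueJpeg(const BYTE* bytes, DWORD length)
{
    QUEUEHANDLE hQueue = NULL;
    HRESULT hr = MQOpenQueue(
        L"DIRECT=OS:.\\private$\\jpegQueue",   // hypothetical queue path
        MQ_SEND_ACCESS, MQ_DENY_NONE, &hQueue);
    if (FAILED(hr)) return hr;

    MQMSGPROPS props{};
    MSGPROPID ids[2];
    MQPROPVARIANT vals[2];

    ids[0] = PROPID_M_BODY;                    // the JPEG payload
    vals[0].vt = VT_VECTOR | VT_UI1;
    vals[0].caub.pElems = const_cast<BYTE*>(bytes);
    vals[0].caub.cElems = length;

    ids[1] = PROPID_M_BODY_TYPE;               // mark the body as raw bytes
    vals[1].vt = VT_UI4;
    vals[1].ulVal = VT_ARRAY | VT_UI1;

    props.cProp = 2;
    props.aPropID = ids;
    props.aPropVar = vals;

    hr = MQSendMessage(hQueue, &props, MQ_NO_TRANSACTION);
    MQCloseQueue(hQueue);
    return hr;
}
```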

Service blocks windows startup

We have an automatically started service which in some cases spends a lot of time loading necessary data, let's say 10 minutes. During this time it works as expected (processing some huge data files required at startup). I report progress with the C++ SetServiceStatus function, and that works fine.
This service does not depend on anything, and only one other service (again our own) depends on it. That dependent service is started after those 10 minutes, since it needs the first "server" service to be fully running before it can accept requests.
I thought that Windows would start all the other automatic services (in less than 10 minutes, as usual) and then carry on normally, but the system is completely blocked during startup (I can't log on to the computer or even ping it) until this one specific service has started (i.e. reports SERVICE_RUNNING via SetServiceStatus). Once our service completely starts, the other missing system services (required for networking, remote desktop, whatever; it's quite random) are also started. Is this normal behaviour? Why are non-dependent services (such as remote desktop and network connections) waiting for this process? Am I missing something?
I tried adding some dependencies to postpone the startup of my service, but I ended up with many dependencies and the behaviour was still somewhat random (as the order of services is random). Sometimes I was able to log on, but, for example, the Start button only began working after those 10 minutes, once my service had started. I am not sure which is "the last service" to depend on or which services to include in my dependency list, and on some computers those services may be disabled, which could bring new problems... so I don't like this solution very much.
Another option was the Delayed start setting for our service. This starts the service once all other automatic services are running. Well, this works: Windows boots, the computer runs and responds, our service is started, but the performance is very bad, many times slower than usual; it seems that delayed-start services run at a much lower priority or something like that.
My only current solution is to report to the system that my service is running (via SetServiceStatus) but to continue loading in the background (this works; I have tested it). But then we have a problem with our dependent service, as it needs to start only when the first one is really ready. That can be solved, but I still wonder how this blocking is possible, and whether there is some way to keep the current behaviour of an automatically started service that reports "started" only when it is really fully initialized and prepared to work. Thanks for any ideas.
Set SERVICE_RUNNING as soon as possible, and then continue processing in the background. Make your other service resilient to the first service being in the running state but not yet ready to serve requests.
The longer a service stays in the starting state, the more problems we have seen across different Windows versions.
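A minimal sketch of that approach (service boilerplate such as StartServiceCtrlDispatcher is trimmed; the loader thread and readiness flag are illustrative):

```cpp
#include <windows.h>

static SERVICE_STATUS_HANDLE g_status;
static volatile LONG g_ready = 0;   // the dependent service checks this

static void ReportStatus(DWORD state)
{
    SERVICE_STATUS s{};
    s.dwServiceType  = SERVICE_WIN32_OWN_PROCESS;
    s.dwCurrentState = state;
    s.dwControlsAccepted = (state == SERVICE_RUNNING) ? SERVICE_ACCEPT_STOP : 0;
    SetServiceStatus(g_status, &s);
}

static DWORD WINAPI LoadData(LPVOID)
{
    // The 10-minute data load happens here, after the SCM has already been
    // told we are running, so boot is not held up.
    // ... parse the huge files ...
    InterlockedExchange(&g_ready, 1);   // now really ready to serve
    return 0;
}

static DWORD WINAPI CtrlHandler(DWORD ctrl, DWORD, LPVOID, LPVOID)
{
    if (ctrl == SERVICE_CONTROL_STOP) ReportStatus(SERVICE_STOPPED);
    return NO_ERROR;
}

static void WINAPI ServiceMain(DWORD, LPWSTR*)
{
    g_status = RegisterServiceCtrlHandlerExW(L"MyService", CtrlHandler, NULL);
    // Go to RUNNING immediately; heavy initialization continues in the
    // background so other automatic services are not blocked.
    ReportStatus(SERVICE_RUNNING);
    CloseHandle(CreateThread(NULL, 0, LoadData, NULL, 0, NULL));
}
```

The dependent service can then check the first service's own "really ready" flag (over whatever IPC channel you already use between them) instead of trusting the SCM state.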