JRun ColdFusion service intermittently fails to start


We occasionally have a problem where we try to start the JRun service and it fails with the following errors:
error JRun Naming Service unable to start on port 2902
java.net.BindException: Port in use by another service or process: 2902
info No JDBC data sources have been configured for this server (see jrun-resources.xml)
error java.net.BindException: Port in use by another service or process: 8300
We then have to reboot the machine, after which JRun comes up with no problem. This is very intermittent: it happens perhaps once in every 10 times we restart the JRun services.
I saw another reference on StackOverflow saying that if a Windows service takes longer than 30 seconds to start, Windows shuts down the startup process. Perhaps that is the issue here? The logs do indicate that these errors are thrown about 37+ seconds after the restart command is issued.
We are on a 64-bit platform running Windows Server 2008.
Thanks!
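To test the timing idea, one option is to check whether the two ports from the log are still held when the restart is issued. The sketch below is a hypothetical helper, not part of JRun: it treats a port as busy if a plain TCP bind to it fails. That is only a heuristic (Windows socket-sharing options can make a bind succeed or fail for other reasons), so netstat remains the authoritative check.

```python
import socket
import time

# Ports from the JRun error log above; adjust for your own instance.
JRUN_PORTS = (2902, 8300)


def port_is_free(port, host="0.0.0.0"):
    """Heuristic: True if a TCP socket can be bound to (host, port) right now."""
    with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as s:
        try:
            s.bind((host, port))
        except OSError:  # typically "address already in use"
            return False
    return True


def wait_until_free(ports, timeout=60.0, interval=1.0):
    """Poll until every port is bindable or `timeout` seconds have passed.

    Returns the ports that were still busy when the wait ended; an empty
    list means none of them appeared to be held by another process.
    """
    deadline = time.monotonic() + timeout
    while True:
        busy = [p for p in ports if not port_is_free(p)]
        if not busy or time.monotonic() >= deadline:
            return busy
        time.sleep(interval)
```

For example, if `wait_until_free(JRUN_PORTS, timeout=60)` still returns `[2902]` a minute after stopping the service, the old process (or something else) is probably still holding the naming-service port.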

We've been experiencing a similar problem on some of our servers. Unfortunately, netstat never indicated any sort of actual port conflict for us. My suspicion is that it's related to our recent deployment of a ColdFusion "cumulative hotfix" to our servers. We use the multi-server edition of CF 8.0.1 enterprise with a large number of instances on each machine -- each with its own JVM and its own distinct set of ports. Each CF instance is attached to its own IIS website and runs as its own Windows Service.
Within the past few weeks, we started getting similar "port in use" exceptions on startup, on our 32-bit machines as well as our 64-bit machines, all of which are running Windows Server 2003. I found several possible culprits and tried the following:
In jrun-jms.xml for each CF instance, there's an entry for the RMI transport layer that reads <port>0</port> -- which, according to the JRun documentation, means "choose a random port." I made that non-random and distinct per instance (in the 2600-2650 range) and restarted each instance. Things improved temporarily, perhaps coincidentally.
In the same file, under the entry for the TCPIP transport layer, every instance defaulted to <port>2522</port> -- so I changed those to distinct ports per instance in the 2500-2550 range and restarted each instance. That didn't seem to help at all.
I tried researching whether ports in the 2500-3000 range might be used for any other purpose and couldn't find anything obvious; besides, netstat wasn't telling me that any of my choices were in use.
I found something online about Windows designating ports from 1025 to 5000 as the "dynamic port" range, so I added 10000 to the port numbers I had set in jrun-jms.xml and restarted each instance again. Still didn't help.
I tried changing the port in jndi.properties, also by adding 10000 to the port numbers. Unfortunately this meant wiping out all my wsconfig connections to IIS and creating them again from scratch. I had to edit wsconfig_jvm.config as well, adding -DWSConfig.PortScanStartPort=12900 to java.args, so it could detect my CF instances. (By default it only scans ports 2900-3000. See bpurcell.org for details. It's an old post but still relevant.) So far so good!
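The port scheme described above (distinct ports per instance, shifted by 10000 to move them out of the Windows dynamic port range, 1025-5000 by default on older Windows versions) can be sanity-checked before editing jrun-jms.xml and jndi.properties. This is a hypothetical planning helper: the instance names, base ports and port "kinds" are illustrative assumptions, and it does not read or write any JRun file.

```python
# Default legacy dynamic (ephemeral) port range on older Windows versions.
LEGACY_DYNAMIC_RANGE = range(1025, 5001)


def plan_ports(instances, base_ports, offset=10000):
    """Assign each instance base + offset + index for every port kind."""
    return {
        name: {kind: base + offset + i for kind, base in base_ports.items()}
        for i, name in enumerate(instances)
    }


def check_plan(plan, dynamic_range=LEGACY_DYNAMIC_RANGE):
    """Return human-readable problems: ports in the dynamic range or reused."""
    problems = []
    owner = {}  # port -> first "instance.kind=port" label that claimed it
    for name, ports in plan.items():
        for kind, port in ports.items():
            label = f"{name}.{kind}={port}"
            if port in dynamic_range:
                problems.append(f"{label} is inside the dynamic port range")
            if port in owner:
                problems.append(f"{label} collides with {owner[port]}")
            else:
                owner[port] = label
    return problems
```

With base ports `{"jndi": 2900, "rmi": 2600, "tcpip": 2500}` the first instance gets JNDI port 12900, which is consistent with moving the WSConfig scan start to 12900.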
My best guess is that Adobe (or MS Windows) changed the way some of its code grabs "random" ports. But all I know for sure so far is that the steps outlined above appear to have fixed the problem.

Have you verified that the services are in fact stopping? Task Manager should show no instances of jrun.exe. You can also check what is bound to that port by opening a command window (as administrator, since -b requires elevation on Windows Server 2008) and running
netstat -a -b
This will list all your open ports, plus what program is using them. You can also use
netstat -a -o
This does the same thing as the command above, but lists the process ID (PID) instead of the program name, which you can then cross-reference in Task Manager. To show PIDs in Task Manager, go to View > Select Columns and make sure PID is checked. My guess is that the JRun processes are not shutting down in a timely fashion.
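If you save the netstat output to a file, matching a port to its owning PID can be scripted instead of done by eye. A minimal sketch, assuming numeric output from `netstat -a -n -o` (without `-n`, netstat may print service names instead of port numbers, which this parser would not match):

```python
def owners_of_port(netstat_output, port):
    """Return (protocol, local address, PID) for rows whose local port matches.

    Expects the table printed by `netstat -a -n -o`: TCP rows have a State
    column, UDP rows do not, and the PID is always the last column.
    """
    matches = []
    for line in netstat_output.splitlines():
        parts = line.split()
        if len(parts) < 4 or parts[0] not in ("TCP", "UDP"):
            continue  # skip headers, blank lines and anything unexpected
        local_address = parts[1]
        # rpartition handles both "0.0.0.0:2902" and IPv6 "[::]:2902".
        local_port = local_address.rpartition(":")[2]
        if local_port == str(port) and parts[-1].isdigit():
            matches.append((parts[0], local_address, int(parts[-1])))
    return matches
```

If `owners_of_port(text, 2902)` returns a PID that Task Manager shows as jrun.exe, an old JRun process is still holding the port.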

Related

How exactly does the WiX 'Service Install' work internally?

I have a problem with a web service that is installed and started with a .msi that is created with the WiX toolset.
The service can be installed and started on all the machines I tested so far (shown as running in the Services Manager) but on some machines it is not reachable (for example via a browser) and not shown in the list of listening ports on that machine (displayed with 'netstat -a').
I am trying to figure out what's going wrong but I am not really familiar with web service development and configuration. It's a third party service, thus I don't know how it works internally.
A good starting point would be to find out exactly what happens when a service is installed and started during the execution of the .msi file.
Maybe I could then tackle the problem at a lower level.
Below is my ServiceInstall element:
<ServiceInstall
    Id="ServiceID"
    Type="ownProcess"
    Vital="yes"
    Name="ServiceName"
    DisplayName="ServiceDisplayName"
    Description="Lorem Ipsum"
    Start="auto"
    Account="LocalSystem"
    ErrorControl="normal"
    Interactive="no"
    Arguments="action=run">
</ServiceInstall>
The argument is important - without it, the service won't start or run.
Maybe someone else has encountered the same or a similar problem and can help me out.
Thanks in advance; every hint is appreciated.
EDIT I (15.04.18):
As it might be a problem with the specific service, I will add some further information here:
It's a third party software called CryptoLicensing:
http://www.ssware.com/cryptolicensing/cryptolicensing_net.htm
Part of this software is the program in question, which serves as a license server and handles license registration, for example in a customer's network.
The service can be run as a Windows application or installed and run as a Windows service. In both cases it should be listening on a (pre-)specified port on the installed machine.
Whenever I start the .exe as an application, everything works as intended. The service is reachable (for example with the browser) and can be accessed from other machines in the network.
When the .exe is installed and started as a service, it does not work as intended on every machine. For example, if I install and start the service on my laptop, it is shown as running in the Services Manager but is not reachable at its assigned URL (not even on localhost), nor is its port shown among the active listening ports, for example with 'netstat -a'.
The service itself starts without any error messages and does not log any errors or exceptions; as far as it can tell, it is running without any problems.
I contacted the vendor, but he does not always reply quickly and his replies are not very specific.
Before asking the question I assumed that it was a problem with Windows user rights and the WiX installer, but during the discussion here I got the feeling that it might be a problem with the service itself.
I hope this 'new' information helps in isolating and locating the problem.
Thanks to everyone who helped so far!

Hopefully not stating the obvious here, but WiX doesn't do much except populate the ServiceInstall table in the MSI file, so this is about why Windows Installer won't start the service. ServiceInstall table:
https://msdn.microsoft.com/en-us/library/windows/desktop/aa371637(v=vs.85).aspx
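To make that concrete, here is a rough sketch of how the attributes of the ServiceInstall element in the question map to numeric columns of that table. It is an approximation based on the Windows service constants and the ServiceInstall table documentation, not WiX's actual code; only the attributes used in the question are handled, and the function name is hypothetical.

```python
# Windows service constants as stored in the MSI ServiceInstall table.
SERVICE_TYPES = {"ownProcess": 0x10, "shareProcess": 0x20}
SERVICE_INTERACTIVE_PROCESS = 0x100
START_TYPES = {"boot": 0, "system": 1, "auto": 2, "demand": 3, "disabled": 4}
ERROR_CONTROL = {"ignore": 0x0, "normal": 0x1, "critical": 0x3}
ERROR_CONTROL_VITAL = 0x8000  # msidbServiceInstallErrorControlVital


def service_install_row(attrs):
    """Approximate the ServiceInstall table row for a WiX <ServiceInstall>."""
    service_type = SERVICE_TYPES[attrs["Type"]]
    if attrs.get("Interactive") == "yes":
        service_type |= SERVICE_INTERACTIVE_PROCESS
    error_control = ERROR_CONTROL[attrs["ErrorControl"]]
    if attrs.get("Vital") == "yes":
        error_control |= ERROR_CONTROL_VITAL  # install fails if the service fails to install
    return {
        "Name": attrs["Name"],
        "DisplayName": attrs.get("DisplayName"),
        "ServiceType": service_type,
        "StartType": START_TYPES[attrs["Start"]],
        "ErrorControl": error_control,
        "StartName": attrs.get("Account"),
        "Arguments": attrs.get("Arguments"),
        "Description": attrs.get("Description"),
    }
```

For the element in the question this gives ServiceType 0x10, StartType 2 and ErrorControl 0x8001. Nothing in this row starts the service; that is the job of ServiceControl.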
Also, this isn't really about ServiceInstall; it's probably about the ServiceControl element in your WiX source, but it's not clear whether that's how you're starting the service or whether you start it manually later on. That does make a difference. What is the error message, where are you getting it, and is it a 1920 or 1921 error (in the context of ServiceControl)?
The main reason a service will start on one system but not another is missing dependencies. If your service is C++ based (the post doesn't say) then there are probably dependencies on C runtimes, UCRT runtimes, MFC or ATL runtimes and so on.

First: are you sure this service is intended to run as LocalSystem? (MSDN, SO).
Second: did you check the event logs in detail for anything obvious? If the service is well behaved, you should at least find a hint there, something to start with. I sometimes miss the relevant entries in the Event Viewer because it is so "crowded". My approach: clear the log, then stop and restart the service.
Something locking / blocking: If the service installs and runs OK, I would suspect other factors such as firewalls (hardware & software), security software in general (anti-virus, malware scanners), and network configuration issues (proxies, WINS, DNS and all the complexities involved in networking). Is the service trying to reach a UNC path?
Diverse Machines: What are the target machines? Are they virtual, are they physical, are they test machines, are they operative SOE machines in corporate networks? Are they the same OS version and edition?
Further Ideas: It is not quite related, but maybe skim this list of debugging suggestions from another answer (I am not sure why it was down-voted; I think it is an OK list to inspire debugging ideas): Windows Application Startup Error Exception code: 0xe0434352 (maybe just skim the bolded words for ideas - Recommended).
sc.exe: And finally, perhaps check the sc.exe tool (Service Control) and see if it can provide you with some useful information for debugging.
sc.exe in the context of killing hung services (sample use).
sc.exe from MSDN
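For example, `sc query ServiceName` reports the type, state and exit code of a service, and its output is easy to check from a script. A minimal parsing sketch, assuming the usual `FIELD : value` layout of `sc query` (TYPE printed in hex, STATE as a code plus a name); the helper names are hypothetical.

```python
import re

# One "FIELD : value" line of `sc query` output, e.g. "STATE : 4  RUNNING".
_FIELD = re.compile(r"^\s*([A-Z0-9_]+)\s*:\s*(.*?)\s*$")
SERVICE_INTERACTIVE_PROCESS = 0x100


def parse_sc_query(text):
    """Turn `sc query <name>` output into {FIELD: raw value text}."""
    fields = {}
    for line in text.splitlines():
        match = _FIELD.match(line)
        if match:  # continuation lines such as "(STOPPABLE, ...)" are skipped
            fields[match.group(1)] = match.group(2)
    return fields


def summarize(fields):
    """Extract the fields most useful for debugging a service that won't listen."""
    service_type = int(fields["TYPE"].split()[0], 16)  # sc prints TYPE in hex
    _state_code, state_name = fields["STATE"].split()[:2]
    return {
        "name": fields.get("SERVICE_NAME"),
        "state": state_name,
        "interactive": bool(service_type & SERVICE_INTERACTIVE_PROCESS),
        "win32_exit_code": int(fields["WIN32_EXIT_CODE"].split()[0]),
    }
```

A service that reports RUNNING with exit code 0 but has no listening port (per netstat) points at the service's own behaviour rather than at the installer.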
Some further links:
Windows Services Frequently Asked Questions (FAQ). Content seems to be up to date - at face value at least. These guys claim to be experts on services. I have no idea who they are.
Essential Tools for Windows Services: SC.EXE
Run Service Control (sc.exe) command on secure port

After almost 20 months we finally (and accidentally) found a solution to the problem! On the few machines on which the service did not run properly, setting the NoInteractiveServices value in the registry to 0 did the trick. A value of 1 (the default) means that no service is allowed to run interactively, regardless of whether it has the SERVICE_INTERACTIVE_PROCESS property. More information on Interactive Services.
I am not completely satisfied with the solution, because on all the other machines NoInteractiveServices is set to 1 AND the service runs properly anyway. However, on the machines where the service did not run interactively this solution worked for us. Thus I will accept this as an answer.
If anyone has more information on this issue and can explain why this works, feel free to add it. I would be very interested!
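For anyone trying to reproduce this, the value lives under HKLM\SYSTEM\CurrentControlSet\Control\Windows. The sketch below reads it with Python's standard `winreg` module (Windows only, imported inside the function so the rest runs elsewhere) and encodes the rule exactly as stated in the answer; the helper names are hypothetical.

```python
SERVICE_INTERACTIVE_PROCESS = 0x100
REG_PATH = r"SYSTEM\CurrentControlSet\Control\Windows"


def read_no_interactive_services():
    """Return the NoInteractiveServices DWORD from HKLM (Windows only)."""
    import winreg  # standard library, but only available on Windows
    with winreg.OpenKey(winreg.HKEY_LOCAL_MACHINE, REG_PATH) as key:
        value, _value_type = winreg.QueryValueEx(key, "NoInteractiveServices")
    return value


def may_run_interactively(service_type, no_interactive_services):
    """Rule from the answer: the service type must include
    SERVICE_INTERACTIVE_PROCESS AND the registry value must be 0."""
    return bool(service_type & SERVICE_INTERACTIVE_PROCESS) and no_interactive_services == 0
```

The `service_type` argument can be taken from the TYPE field of `sc query` (0x110 for an interactive own-process service).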

Suddenly scheduled tasks are not running in coldfusion 8

I am using ColdFusion MX8 server. One of our scheduled tasks had been running for 2 years, but since 01/12/2014 scheduled tasks have suddenly stopped running. When I browse to the file in a browser, it runs successfully without error.
I am not sure whether this is caused by an update or a license expiration. I am aware that Adobe ended support for ColdFusion 8 in the middle of this year.

The most common cause of this problem is external to the server. When you say you browsed to the file and it worked, it is very important to know whether that test was performed on the server's desktop. Knowing that you can browse to the file from your own desktop or laptop is of little value.
The most common source of issues like this is a change in DNS or the network stack that interferes with name resolution. For example, if the internal DNS serving your DMZ suddenly starts serving the "external" address, your server suddenly can't browse to your domain. Or the IP the server resolves for the domain in question may change from 127.0.0.1 to some other IP that the server can't access correctly because of a reverse proxy, load balancer or some other rule. Finally, sometimes the Apache or IIS configuration is altered so that an IP that was previously serviced (127.0.0.1 being the most common example) no longer responds.
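One way to rule the DNS scenarios in or out is to run the lookup on the server itself and compare it with the address the web server actually answers on. A minimal sketch using only the standard library; the hostname and expected addresses are placeholders, and it checks IPv4 only.

```python
import socket


def check_resolution(hostname, expected_ips):
    """Resolve `hostname` from this machine; return (matches_expected, addresses)."""
    try:
        _canonical, _aliases, addresses = socket.gethostbyname_ex(hostname)
    except OSError:  # gaierror/herror: the name does not resolve at all
        return False, []
    return bool(set(addresses) & set(expected_ips)), addresses
```

Run on the ColdFusion server, e.g. `check_resolution("www.example.com", {"127.0.0.1"})`; a `False` result with a public address in the list fits the "DNS suddenly serves the external address" case.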
If it is something intrinsic to the scheduler service, then Frank's advice is pretty good; in particular, look for "proxy scheduler" entries in the log, as they can give you good clues. I would also log the results of a scheduled task to a file and then check the file. If it exists, your scheduled tasks ARE running; they are just not succeeding. Good luck!
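The log-to-a-file idea separates "never fires" from "fires but fails". In ColdFusion this would be done inside the task itself (for example with cflog or cffile); the sketch below shows the same logic in Python purely as an illustration, with hypothetical function names and a tab-separated log format chosen for the example.

```python
from datetime import datetime, timedelta, timezone
from pathlib import Path


def record_run(log_path, status):
    """Append one tab-separated line per run: UTC timestamp, then status."""
    stamp = datetime.now(timezone.utc).isoformat()
    with open(log_path, "a", encoding="utf-8") as f:
        f.write(f"{stamp}\t{status}\n")


def last_run(log_path):
    """Return (timestamp, status) of the latest run, or None if nothing was logged."""
    path = Path(log_path)
    if not path.exists():
        return None
    lines = path.read_text(encoding="utf-8").splitlines()
    if not lines:
        return None
    stamp, status = lines[-1].split("\t", 1)
    return datetime.fromisoformat(stamp), status


def diagnose(log_path, expected_interval, now=None):
    """Classify the task as never run, stale, failing, or healthy."""
    now = now or datetime.now(timezone.utc)
    entry = last_run(log_path)
    if entry is None:
        return "never ran: the scheduler is not firing the task"
    stamp, status = entry
    if now - stamp > expected_interval:
        return "stale: the task stopped running"
    if status != "ok":
        return f"running but failing: {status}"
    return "running and succeeding"
```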

I've seen the CF scheduling service crash in CF8 while the rest of CF was unaffected.
Have you tried restarting the server?

Here are your concerns:
Your File (works since you tested it manually).
Your Scheduled Task (failed).
Your ColdFusion application (service): any changes here?
Your server: any changes here?
To test your problem create a duplicate task and schedule it. Leave the other one in place (maybe set your new one to run earlier). Use the same file too. See if it completes.
If it doesn't, then you have a larger problem. Since the ColdFusion server sits atop the JVM, something could be happening there. Things don't just stop working unless something got corrupted or you got compromised. If you hardened your server by rearranging or renaming the file structure to make it more secure, that could break your task.
So, going back: if your test schedule works, determine what is different between the two. Note that you have logging capabilities (see "Logging abilities for CF8").
If you are not directly in charge of maintaining this server, I would recommend asking around to see whether there was recent maintenance and, if so, what was done to the server.

Multiple NetApi calls failing inconsistently

We have a multi-threaded process that calls NetApi functions (e.g. NetServerGetInfo, LsaOpenPolicy, NetShareEnum, NetWkstaGetInfo, NetWkstaUserEnum) from a source machine against multiple target machines. We make a significant number of calls and have observed that over time these calls start to fail. For example, NetServerGetInfo starts returning error 53 (ERROR_BAD_NETPATH, "The network path was not found") after a while. The issue persists until we restart the Workstation service or the machine. Accessing the target shares directly also fails once our process starts getting this error.
The source machine is running Windows Server 2008 R2 and the target machines are Windows Server 2003.
We suspect some kind of issue with the NetApi calls, or some kind of handle leak.
Has anyone faced similar issues while using these APIs and managed to figure out a solution?
I found a few references online for similar issues:
http://social.technet.microsoft.com/Forums/en-US/windowsserver2008r2networking/thread/9f93508c-71fa-4807-b41a-8f558563afe3/
Snippet from above link:
Experiencing the exact same issue as stated above, except we have 2 Windows Server 2008 R2's acting as Terminal Servers connecting to Server 2003 shares. Rebooting the terminal servers seems to resolve the problem for about 2-4 days and then it re-appears. The XP/Vista/Win7 workstations on the network have no problem accessing the shares on the 2003 Server, only the 2008 R2 servers.
Connecting to the 2003 shares using the FQDN or IP address works, but using \\servername returns network path not found. Setting up WINS on the network did not resolve this, nor did adding a static entry for the server to the hosts file.
There is no firewall software installed on the servers and we don't use Symantec products on the network (No Symantec Endpoint security).
Viewing the event log also turned up Event ID 1006, could not validate DNS server, even though name resolution appears to be functioning without a problem.
http://support.microsoft.com/kb/816621
http://technet.microsoft.com/en-us/library/dd296694%28WS.10%29.aspx
https://serverfault.com/questions/205043/windows-share-the-specified-network-name-is-no-longer-available
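To test the handle-leak theory, it can help to count every buffer or handle the process acquires against every release. The documented pairings are NetApiBufferFree for the buffers returned by NetServerGetInfo, NetShareEnum, NetWkstaGetInfo and NetWkstaUserEnum, and LsaClose for handles from LsaOpenPolicy. The Python sketch below only illustrates the bookkeeping: `acquire` and `release` are hypothetical callables standing in for those API calls (for example via ctypes), not real wrappers.

```python
import threading
from contextlib import contextmanager


class HandleTracker:
    """Counts acquired-but-not-released handles or buffers, per API name."""

    def __init__(self):
        self._lock = threading.Lock()  # the process in question is multi-threaded
        self._open = {}

    @contextmanager
    def track(self, api_name, acquire, release):
        """Call acquire(), yield its result, and always call release() afterwards."""
        handle = acquire()
        with self._lock:
            self._open[api_name] = self._open.get(api_name, 0) + 1
        try:
            yield handle
        finally:
            release(handle)
            with self._lock:
                self._open[api_name] -= 1

    def outstanding(self):
        """Return {api_name: count} for everything still open."""
        with self._lock:
            return {name: n for name, n in self._open.items() if n}
```

If `outstanding()` keeps growing for one API over the hours before error 53 appears, that call site is the first place to look.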

Is there a way for the cache to stay up without timeout after crash in AppFabric Cache?

First, my setup, which is used for testing purposes:
3 Virtual Machines running with the following configuration:
MS Windows 2008 Server Standard Edition
Latest version of AppFabric Cache
Each one has a local network share where the config file is stored (I have added all the machines in each config)
The cache is distributed but not highly available (we don't have the Enterprise edition of Windows)
Each host is configured as a lead host, so according to the documentation at least one host should be allowed to crash.
Each machine has the website I am testing installed, with local cache configured
One Linux machine that is used as a proxy (running Varnish) to distribute the traffic for testing purposes.
That's the setup; now on to the problem. The scenario I am testing simulates one of the servers crashing and then bringing it back into the cluster. I have problems both with the server crashing and with bringing it back up. These are the steps I use to test it:
Direct the traffic with Varnish on the linux machine to one server only.
Log in to make sure there is something in the cache.
Unplug the network cable for one of the other servers (simulates that server crashing)
Now I get a cache timeout and a service error. I want the application to stay up on the servers that didn't crash, but it takes some time for the cache to come back up on the remaining servers. Is that how it should be? Plugging the network cable back in and starting the host causes a similar problem.
So my question is: have I missed something? What I would like to happen is that if one server crashes, the cache stays up, since a majority of the lead hosts are still running, and that starting the crashed server again brings it back into the cluster gracefully without causing any problems on the other hosts. But maybe that is not how it works?
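The majority rule the question relies on is simple arithmetic: with lead-host cluster management, AppFabric keeps the cluster running only while more than half of the lead hosts are up. A small sketch of that arithmetic (illustrative, not AppFabric code):

```python
def leads_required(total_leads):
    """Smallest majority of the lead hosts: floor(n / 2) + 1."""
    return total_leads // 2 + 1


def cluster_stays_up(total_leads, leads_down):
    """True while the remaining lead hosts still form a majority."""
    return total_leads - leads_down >= leads_required(total_leads)
```

With three leads, one host can fail (2 of 3 remain) but not two, so losing one of the three servers should not, by itself, take the cluster down.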

I ran through a similar test scenario a few months ago where I had a test client generating load on a 3-lead-server cluster with a variety of Puts, Gets, and Removes. I rebooted one of the servers multiple times while the load test was running and the cache stayed online. If I remember correctly, there were a limited number of errors as that server rebooted, but overall the cache appeared to remain healthy.
I'm not sure why you're not seeing similar results, but I would try removing the Varnish proxy from your test and see if that helps.

how to read list of running processes on a remote computer in C++

What can be done to know and list all running processes on a remote computer?
One idea is to have a server listening to our request on the remote machine and the other one is to use ssh.
The problem is that I don't know whether such a server will be running on the remote machine, and I cannot use ssh because it needs authentication.
Is there any other way out ?

If you
cannot install a server program on the remote machine
cannot use anything that requires authentication
then you should not be allowed to know the list of all running processes on a machine. That request would be a security nightmare!
You can do something much simpler without (as many) security problems: scan the publicly available ports for programs that are running. Tools like nmap (nmap.org) can tell you a fair amount about the publicly exposed programs on a machine.
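The simplest form of what nmap does for TCP is a connect scan: try to open a connection to each port and note which ones accept. A minimal sketch with the standard socket module; the port list is up to you, and it only reveals programs listening on the network, not the full process list.

```python
import socket


def open_tcp_ports(host, ports, timeout=0.5):
    """Return the ports on `host` that accept a TCP connection (a connect scan)."""
    found = []
    for port in ports:
        with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as s:
            s.settimeout(timeout)
            try:
                if s.connect_ex((host, port)) == 0:  # 0 means the connect succeeded
                    found.append(port)
            except OSError:  # timeouts or an unresolvable host
                pass
    return found
```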

I have done something similar in the past using SNMP. I don't have the specifics in front of me, but something like "snmpwalk -v2c -c public hostname prTable" got me the process table. I recall later configuring SNMP to generate errors when the number of processes didn't meet our specified requirements, e.g. httpd must have at least 1 and fewer than 50.
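The error state mentioned above shows up in the walk output itself: each monitored process gets a prTable row with prNames, prCount, prErrorFlag and prErrMessage (on Net-SNMP the thresholds come from a `proc` line in snmpd.conf, e.g. `proc httpd 50 1` for at most 50 and at least 1). A minimal parsing sketch, assuming Net-SNMP's default output format such as `UCD-SNMP-MIB::prNames.1 = STRING: httpd`; the function names are hypothetical.

```python
import re
from collections import defaultdict

# Assumed Net-SNMP output line: "UCD-SNMP-MIB::prNames.1 = STRING: httpd"
_PR_ROW = re.compile(r"^UCD-SNMP-MIB::pr(\w+)\.(\d+) = \w+:\s*(.*)$")


def parse_pr_table(walk_output):
    """Group prTable columns by row index: {1: {"Names": "httpd", "Count": "8", ...}}."""
    rows = defaultdict(dict)
    for line in walk_output.splitlines():
        match = _PR_ROW.match(line.strip())
        if match:
            column, index, value = match.groups()
            rows[int(index)][column] = value.strip().strip('"')
    return dict(rows)


def failing_processes(rows):
    """Return {process name: error message} for rows whose prErrorFlag is not noError."""
    return {
        row["Names"]: row.get("ErrMessage", "")
        for row in rows.values()
        if not row.get("ErrorFlag", "").startswith("noError")
    }
```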

I suggest you look at the code for a remote login, rlogin. You could remotely login to an account that has the privileges that you need. Once logged in, you can fetch a list of processes.
This looks like a good application for a script rather than a C or C++ program.
