We have a multi-threaded process that makes many calls to multiple target machines from a source machine using the NetApi functions, e.g. NetServerGetInfo, LsaOpenPolicy, NetShareEnum, NetWkstaGetInfo, NetWkstaUserEnum, etc. We make quite a significant number of calls and have observed that over time these calls start to fail. For example, NetServerGetInfo starts returning error 53 (ERROR_BAD_NETPATH, "The network path was not found") after a while. The issue persists until we restart the Workstation service or the machine. Accessing the target shares directly also stops working once our process starts getting this error.
The source machine from which we make the calls runs Windows Server 2008 R2, and the target machines are Windows Server 2003 servers.
We suspect some kind of issue with the NetApi calls, or some kind of handle or buffer leak.
Has anyone faced similar issues while using these APIs and managed to figure out a solution?
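For reference, here is a minimal sketch of the per-target call pattern we use (the host name is a placeholder). One thing we are double-checking in code like this is that every buffer the NetApi functions allocate is released with NetApiBufferFree, since leaking those across a large number of calls would exhaust resources:

#include <windows.h>
#include <lm.h>                       // NetServerGetInfo, NetApiBufferFree
#include <cstdio>
#pragma comment(lib, "netapi32.lib")

// Query level-101 info for one target and always free the API buffer.
NET_API_STATUS QueryServer(LPWSTR host)      // e.g. L"\\\\TARGET01" (placeholder)
{
    SERVER_INFO_101* info = nullptr;
    NET_API_STATUS status = NetServerGetInfo(host, 101,
                                             reinterpret_cast<LPBYTE*>(&info));
    if (status == NERR_Success)
        wprintf(L"%s type flags 0x%08lx\n", info->sv101_name, info->sv101_type);
    else if (status == ERROR_BAD_NETPATH)    // the error 53 we eventually see
        wprintf(L"network path to %s not found\n", host);

    if (info != nullptr)
        NetApiBufferFree(info);              // required for every returned buffer
    return status;
}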
I found a few references online to similar issues:
http://social.technet.microsoft.com/Forums/en-US/windowsserver2008r2networking/thread/9f93508c-71fa-4807-b41a-8f558563afe3/
Snippet from the above link:
Experiencing the exact same issue as stated above, except we have 2 Windows Server 2008 R2 machines acting as Terminal Servers connecting to Server 2003 shares. Rebooting the terminal servers seems to resolve the problem for about 2-4 days, and then it re-appears. The XP/Vista/Win7 workstations on the network have no problem accessing the shares on the 2003 server, only the 2008 R2 servers do.
Connecting to the 2003 shares using the FQDN or IP address works, but using \\servername returns "network path not found". Setting up WINS on the network did not resolve this, nor did adding a static entry for the server to the hosts file.
There is no firewall software installed on the servers and we don't use Symantec products on the network (no Symantec Endpoint Security).
Viewing the event log also turned up Event ID 1006, "could not validate DNS server", even though name resolution appears to be functioning without a problem.
http://support.microsoft.com/kb/816621
http://technet.microsoft.com/en-us/library/dd296694%28WS.10%29.aspx
https://serverfault.com/questions/205043/windows-share-the-specified-network-name-is-no-longer-available
Related
I have an older dedicated PC running on my home network as a web server. I'm trying to retire it by replacing it with a VM on a brand new workstation Santa brought me. My simple home hobbyist network consists of a router, a 10Gb switch, and of course the computing devices hanging off of that.
The new machine is running Windows 11 Pro, and via Hyper-V I have a CentOS 7 VM. I've configured the VM's firewall to allow the http service (port 80), and I'm running httpd. From behind my router/switch I can access the web server with no problem, from the host machine and from other machines on my network. Alas, I'm unable to access this web server remotely/externally, even after turning off the VM's firewall and ensuring port forwarding on my router was properly pointing to IP:80. I have been scouring the web/forums/etc. for days now and nothing I've tried seems to work.
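For reference, the in-VM setup was the standard CentOS 7 steps, roughly the following (zone name is the default one; adjust if yours differs):

firewall-cmd --permanent --zone=public --add-service=http
firewall-cmd --reload
systemctl enable httpd
systemctl start httpd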
Also, I was careful in ensuring the Hyper-V settings for the virtual switch are pointing to my actual hardware NIC and are set accordingly, as found here and in other forums (see attached image for details).
From all the "experimentations" I've tried, it seems like the port is just not being forwarded from my router properly. So it's really pointing towards my router at this point. BUT I can and have configured real hardware before (over decades) with no problem. Since I'm a newb to Hyper-V and VMs, I'm worried some setting may not be correct.
Thus - reaching out to anyone with similar experience who's solved this problem. Thanks in advance.
Here's a graphic in which I captured some of the many things I've tried (settings, etc.), all to no avail.
I have successfully built the client and server modules from the Getting Started with Winsock tutorial.
I have a desktop and a laptop both connected to my wireless router – both running Windows 10.
Running the client module on the laptop, I am able to successfully transmit data back-and-forth to the desktop (running the server module) using the desktop's IP address.
Running the client module on the desktop with the laptop's IP address as the command line argument, I get an "Unable to connect to server!" message after a ten second delay.
If I try to run both modules on the desktop in separate console windows using the "localhost" command line argument, the client console displays "Bytes sent: 14" and hangs waiting for a response from the server – however this works if I use either the desktop name or the desktop IP address in place of "localhost".
I am able to run both modules on the laptop using either "localhost", the laptop name, or the laptop IP address as arguments.
I have gone through the same motions with port 27015 forwarded on the router and incoming and outgoing firewall exceptions added to both the desktop and the laptop – there is no difference either way.
Any assistance would be greatly appreciated as I cannot figure out why this works in one direction but not the other.
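For context, the client side is essentially the stock tutorial flow; here is a trimmed-down sketch of what I'm running (port 27015 hard-coded, server name taken from the command line):

#include <winsock2.h>
#include <ws2tcpip.h>
#include <cstdio>
#include <cstring>
#pragma comment(lib, "Ws2_32.lib")

int main(int argc, char** argv)
{
    if (argc != 2) { printf("usage: client <server>\n"); return 1; }

    WSADATA wsa;
    if (WSAStartup(MAKEWORD(2, 2), &wsa) != 0) return 1;

    addrinfo hints = {};
    hints.ai_family   = AF_UNSPEC;       // getaddrinfo may return IPv6 first
    hints.ai_socktype = SOCK_STREAM;
    hints.ai_protocol = IPPROTO_TCP;

    addrinfo* result = nullptr;
    if (getaddrinfo(argv[1], "27015", &hints, &result) != 0) { WSACleanup(); return 1; }

    SOCKET s = INVALID_SOCKET;
    for (addrinfo* p = result; p != nullptr; p = p->ai_next)
    {
        s = socket(p->ai_family, p->ai_socktype, p->ai_protocol);
        if (s == INVALID_SOCKET) continue;
        if (connect(s, p->ai_addr, (int)p->ai_addrlen) == 0) break;  // connected
        closesocket(s);
        s = INVALID_SOCKET;              // all candidates failing ends in the error below
    }
    freeaddrinfo(result);
    if (s == INVALID_SOCKET) { printf("Unable to connect to server!\n"); WSACleanup(); return 1; }

    const char* msg = "this is a test"; // 14 bytes, hence "Bytes sent: 14"
    printf("Bytes sent: %d\n", send(s, msg, (int)strlen(msg), 0));

    char buf[512];
    int n = recv(s, buf, sizeof(buf), 0); // this is where it hangs when the reply is filtered
    if (n > 0) printf("Bytes received: %d\n", n);

    closesocket(s);
    WSACleanup();
    return 0;
}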
Thank you for the suggestions Karsten and Andriy. I first tried getting the two computers to ping each other and neither was successful. After researching online, I was able to get them to ping after turning on "echo requests" in the firewall settings, but my original problem persisted. I then tried turning off both firewalls and I was able to get my server and client programs to work both ways. That wasn't a great long-term solution, so I tried selectively disabling the firewalls and realized it was an issue on the laptop's end. I noticed that my "server.exe" program was in the allowed apps list twice – one instance granting private access and one granting public access – but only one instance was active. I deleted both and added "server.exe" again with both public and private access boxes checked, which solved my issue.
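In case it helps anyone else hitting this: the equivalent one-shot fix from an elevated prompt would be something like the following (the rule name is arbitrary and the path is wherever your binary actually lives):

netsh advfirewall firewall add rule name="winsock server" dir=in action=allow program="C:\path\to\server.exe" profile=private,public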
First, here is the setup used for testing:
3 Virtual Machines running with the following configuration:
MS Windows 2008 Server Standard Edition
Latest version of AppFabric Cache
Each one has a local network share where the config file is stored (I have added all the machines in each config)
The cache is distributed but not high availability (we don't have the Enterprise version of Windows)
Each host is configured as lead, so according to the documentation at least one host should be allowed to crash.
Each machine has the website I'm testing installed, with local cache configured
One Linux machine is used as a proxy (running Varnish) to distribute the traffic for testing purposes.
That's the setup, and now on to the problem. The scenario I am testing simulates one of the servers crashing and then bringing it back into the cluster. I have problems both when the server crashes and when bringing it back up. The steps I use to test it:
Direct the traffic with Varnish on the linux machine to one server only.
Log in to make sure there is something in the cache.
Unplug the network cable for one of the other servers (simulates that server crashing)
Now I get a cache timeout and a service error. I want the application to stay up on the servers that didn't crash, but it takes some time for the cache to come back up on the remaining servers. Is that how it should be? Plugging the network cable back in and starting the host causes a similar problem.
So my question is: have I missed something? What I would like to see is that if one server crashes, the cache stays up, since a majority of the lead hosts are still running, and that starting the crashed server again brings it back gracefully into the cluster without causing any problems on the other hosts. But maybe that is not how it works?
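For reference, I check the cluster's view of each host from the Caching Administration Windows PowerShell console before and after pulling the cable; stopping and starting a host via the cmdlets (rather than the cable pull) is also possible. The host name below is a placeholder:

Use-CacheCluster
Get-CacheHost                                      # lists each host as UP or DOWN
Stop-CacheHost  -HostName server2 -CachePort 22233
Start-CacheHost -HostName server2 -CachePort 22233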
I ran through a similar test scenario a few months ago where I had a test client generating load on a 3 lead-server cluster with a variety of Puts, Gets, and Removes. I rebooted one of the servers multiple times while the load test was running and the cache stayed online. If I remember correctly, there were a limited number of errors as that server rebooted, but overall the cache appeared to remain healthy.
I'm not sure why you're not seeing similar results, but I would try removing the Varnish proxy from your test and see if that helps.
We occasionally have a problem where we attempt to start the JRun service and it fails with the following two errors:
error JRun Naming Service unable to start on port 2902
java.net.BindException: Port in use by another service or process: 2902
info No JDBC data sources have been configured for this server (see jrun-resources.xml)
error java.net.BindException: Port in use by another service or process: 8300
We then have to reboot the machine and JRun comes up with no problem. This is very intermittent - it happens perhaps one out of every 10 times we restart the JRun services.
I saw another reference on Stack Overflow saying that if a Windows service takes longer than 30 seconds to start, Windows shuts down the startup process. Perhaps that is the issue here? The logs indeed indicate that these errors are thrown 37+ seconds after the restart command is issued.
We are on a 64-bit platform on Windows Server 2008.
Thanks!
We've been experiencing a similar problem on some of our servers. Unfortunately, netstat never indicated any sort of actual port conflict for us. My suspicion is that it's related to our recent deployment of a ColdFusion "cumulative hotfix" to our servers. We use the multi-server edition of CF 8.0.1 enterprise with a large number of instances on each machine -- each with its own JVM and its own distinct set of ports. Each CF instance is attached to its own IIS website and runs as its own Windows Service.
Within the past few weeks, we started getting similar "port in use" exceptions on startup, on our 32-bit machines as well as our 64-bit machines, all of which are running Windows Server 2003. I found several possible culprits and tried the following:
In jrun-jms.xml for each CF instance, there's an entry for the RMI transport layer that reads <port>0</port> -- which, according to the JRun documentation, means "choose a random port." I made that non-random and distinct per instance (in the 2600-2650 range) and restarted each instance. Things improved temporarily, perhaps coincidentally.
In the same file, under the entry for the TCPIP transport layer, every instance defaulted to <port>2522</port> -- so I changed those to distinct ports per instance in the 2500-2550 range and restarted each instance. That didn't seem to help at all.
I tried researching whether ports in the 2500-3000 range might be used for any other purpose, and I couldn't find anything obvious, and besides, netstat wasn't telling me that any of my choices were in use.
I found something online about Windows designating ports from 1024 to 5000 as the "dynamic port" range, so I added 10000 to the port numbers I had set in jrun-jms.xml and restarted each instance again. Still didn't help.
I tried changing the port in jndi.properties, also by adding 10000 to the port numbers. Unfortunately this meant wiping out all my wsconfig connections to IIS and creating them again from scratch. I had to edit wsconfig_jvm.config as well, adding -DWSConfig.PortScanStartPort=12900 to java.args, so it could detect my CF instances. (By default it only scans ports 2900-3000. See bpurcell.org for details. It's an old post but still relevant.) So far so good!
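To summarize the edits that finally stuck (the port values are examples; each instance gets its own):

jrun-jms.xml, RMI transport entry:    <port>12601</port>    (was <port>0</port>, i.e. random)
jrun-jms.xml, TCPIP transport entry:  <port>12522</port>    (was <port>2522</port> on every instance)
jndi.properties:                      port numbers raised by 10000 to match
wsconfig_jvm.config:                  java.args=... -DWSConfig.PortScanStartPort=12900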
My best guess is that Adobe (or MS Windows) changed the way some of its code grabs "random" ports. But all I know for sure so far is that the steps outlined above appear to have fixed the problem.
Have you verified that the services are in fact stopping? Task manager should show no instances of jrun.exe. You can also check to see what is bound to that port by opening a command window and running
netstat -a -b
This will list all your open ports, plus what program is using them. You can also use
netstat -a -o
Which does the same thing as the above, but will list the process id instead of the program name. You can then cross-reference those with task manager. You'll need to enable showing the PIDs in task manager by going to View->Select Columns and making sure PID is checked. My guess would be that the jrun processes are not shutting down in a timely fashion.
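For example, to find what owns a suspect port and map it to a process (the PID shown is hypothetical):

netstat -a -n -o | findstr :2902
tasklist /FI "PID eq 1234"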
I'm pretty perplexed... I've got 5 different test computers, all relatively blank Windows XP machines with similar hardware specs. I run a silent install of the Firebird (Classic) database and my application. Some computers require "localhost:" (or 127.0.0.1) before the database location to make a connection, and some simply don't work at all! This is running the exact same software across the board. Does anybody have any suggestions as to what needs to happen to make the connection string universal, or what I could be doing wrong?
It's firebird version 2.1.1.17910 Classic
By the way, I tried connecting to the same database using FlameRobin (a small DB management tool) and it worked just fine on the computers that otherwise don't connect.
If any more information is necessary, just let me know! Thanks a lot in advance.
For anybody's future reference, the answer is in the 'services' file. Apparently the gds_db entry is not being registered for some reason; on the working computers it was registered at some point, probably through some far earlier tests of InterBase, is my best guess.
Going to C:\Windows\System32\drivers\etc, opening up the file 'services', and adding the following line allows the server to run properly:
gds_db 3050/tcp
I'm not sure whether you are aware of that, but a connection string without "localhost:" or "127.0.0.1:" in front of the database name or alias will use the local protocol, which can't be used when connecting to Firebird Classic Server (see this link for more information). If a host name or IP address is given, then TCP port 3050 will be used for the connection.
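To illustrate (the path and alias below are made up), given a database at C:\data\mydb.fdb:

C:\data\mydb.fdb               (local protocol; fails against Classic)
localhost:C:\data\mydb.fdb     (TCP to 127.0.0.1 on port 3050; works)
192.168.1.10:mydbalias         (TCP to a remote host, using a database alias)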
If you have registered a server in FlameRobin, and did not leave the hostname field in the registration dialog blank, then the host name will be part of the connection string. That would explain why you can connect using FlameRobin.
As for the differences between the machines: You should first go to the Firebird Server Manager applet and make sure that the server is indeed running on all machines, and that the version is the same.
Does it have something to do with the hosts file on some of the computers? Or is that what you're referring to with your
Some computers require "localhost:" (or 127.0.0.1) before the database location...
comment?