I have a really odd issue with WMI that I'm running into on a few machines on our network.
I have a piece of software (.NET/C#) that scans an IP range on a local network and then uses WMI to query certain data about the machines (computer names, .NET Framework versions, among other things). One issue I've run into recently is that a small subset of these machines will not respond to WMI connections made via their IP address; they simply throw an "RPC Server is Unavailable" exception, as if WMI weren't running to begin with.
This occurs both with the C# application and with a vbscript application that attempts a simple query to return the computer's name:
if wscript.arguments.count >= 1 then
    host = wscript.arguments(0)
end if
if host = "" or isnull(host) then host = "."
connectionStr = "winmgmts:{impersonationLevel=impersonate}!\\" & host & "\root\cimv2"
wscript.echo connectionStr
set objWMIService = GetObject(connectionStr)
set objCompName = objWMIService.ExecQuery("Select * from Win32_ComputerSystem")
for each x in objCompName
    wscript.echo x.Name
next
This returns the following as results:
C:\>nslookup BROKENCOMPUTER
Address: 192.168.1.123
C:\>cscript testwmi.vbs 192.168.1.123
winmgmts:{impersonationLevel=impersonate}!\\192.168.1.123\root\cimv2
C:\testwmi.vbs(9, 1) Microsoft VBScript runtime error: The remote server machine does not exist or is unavailable: 'GetObject'
C:\>cscript testwmi.vbs BROKENCOMPUTER
winmgmts:{impersonationLevel=impersonate}!\\BROKENCOMPUTER\root\cimv2
BROKENCOMPUTER
I can still open a WMI connection if I refer to the computer by its host/computer name. I can also connect to other services running on the machine via IP address (such as HTTP or RDP); a request to http://192.168.1.123 returns successfully.
To make things even weirder, the behavior isn't even consistent. Sometimes the connection to the IP will work correctly, and it happens in batches. To test this, I set up a script that sent a WMI request to the computer in question every 5 seconds and recorded the result (and trends of results). What I found was that requests would all fail or all succeed in runs of roughly 180 requests (at 5 seconds per request, a 15-minute interval) or a multiple of that. Example:
- Start script
- 35 successful requests in a row
- 180 failed requests in a row
- 180 successful requests
- 360 failed requests
- 180 successful requests
- 180 failed requests
- 900 successful requests
- etc etc
I then ran this script on two machines at the same time. What I found was that the behavior on the two was similar (each had intervals of several minutes of being able to connect and not being able to connect) but did not sync up between them; there were periods where both could connect, periods where only one (or the other) could connect, and periods where neither could connect.
I know this is an incredibly weird and specific problem, and I don't really expect anyone to be able to insta-solve it, but I was wondering if anyone had any hints or direction? I've spoken to the network guys here and they're just as puzzled over the issue as I am.
I can add some perspective on this, in addition to the fine answer from MisterZimbu. Assuming Microsoft doesn't remove my comments on this article, see http://msdn.microsoft.com/en-us/library/windows/desktop/aa393720%28v=vs.85%29.aspx. Basically, Microsoft seems to be doing a reverse DNS lookup when IP addresses are passed into WMI. If your DNS isn't squeaky clean, you will get "unpredictable results", which is to say that you will be connecting to machines you didn't expect to connect to.
Adding the period to the IP address forces the reverse (or forward) lookup to fail, and then by some miracle, they actually use the IP address, and not the (potentially incorrect) hostname returned from DNS. It appears that appending a period to the IP address can be used in many contexts (UNC paths, browsers, etc.), but there are caveats and other failures you might encounter. Note that if you look at your DNS cache (ipconfig /displaydns) you will see the failed lookups when the period is appended, so it doesn't stop the OS from doing the lookup; it just ensures that the stale DNS entries won't be used.
Oddly enough, adding a "." to the end of the IP address when making the query corrects the issue. I assume this forces it to go through DNS resolution or something like that.
So connecting via
winmgmts:{impersonationLevel=impersonate}!\\192.168.1.123.\root\cimv2
seems to work correctly 100% of the time.
Still would be great to know what actually is the underlying cause of the issue though.
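As a small illustration of the workaround above, here is a Python sketch that builds the same moniker string and appends the trailing "." only when the target is an IPv4 literal. The function name wmi_moniker and its namespace parameter are made up for this example; it only builds the string and does not open a WMI connection.

```python
import ipaddress


def wmi_moniker(host, namespace=r"root\cimv2"):
    """Build a winmgmts moniker, adding a trailing '.' to IPv4 literals.

    Hypothetical helper: it mirrors the connection string from the script
    above and applies the trailing-dot workaround described in this answer.
    """
    target = host.strip() or "."  # empty host means the local machine
    try:
        is_ipv4 = isinstance(ipaddress.ip_address(target), ipaddress.IPv4Address)
    except ValueError:
        # Host names, ".", and addresses that already end in "." land here.
        is_ipv4 = False
    if is_ipv4:
        target += "."
    return "winmgmts:{impersonationLevel=impersonate}!\\\\" + target + "\\" + namespace
```

Host names are passed through unchanged, so wmi_moniker("BROKENCOMPUTER") yields the same string the original script printed.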
I am currently developing a tool that automatically connects and authenticates users to certain wireless hotspots under given circumstances.
To test whether the device is behind a captive portal, I send an HTTP request via WinINet and check whether it gets redirected (yes, I am aware of NCSI, but it does not work correctly in this case).
If I do that directly after I get the callback for a successful WLAN connection, I receive error 12007 (name not resolved), which I assume is because the IP configuration has not been fully applied at that point. If I put in a Sleep() of 2-3 seconds, I don't receive the error (since I have one of the faster devices in our hardware lineup, the required delay might vary on other target devices).
Is there a way I can programmatically check whether the configuration has been fully applied to the interface?
Target-OS is Windows 7
Retrying like Jon suggests is not really a feasible option in this case, since I have to enable a hotspot registration mode in the firewall, which closes again after a certain number of network operations, so I would like to avoid that.
Normally in a situation like this, if your error is catchable, you would retry for a certain amount of time and then give up (time out) with the most recent error. This is simpler, and it is the same logic the OS would be implementing anyway.
So, in this case I would:
For X(default 30) seconds at most {
test if I can get a dns resolution
delay 1 second
}
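The loop above could be sketched in Python roughly as follows. This is only an illustration of the retry-until-timeout idea, not the WinINet code from the question; the function name wait_for_dns and the injectable resolve, sleep and clock parameters are assumptions made so the sketch can be exercised without a network.

```python
import socket
import time


def wait_for_dns(hostname, timeout=30.0, delay=1.0,
                 resolve=socket.getaddrinfo, sleep=time.sleep,
                 clock=time.monotonic):
    """Retry a DNS lookup until it succeeds or `timeout` seconds pass.

    Returns True on success; re-raises the most recent lookup error
    once another attempt would not fit before the deadline.
    """
    deadline = clock() + timeout
    while True:
        try:
            resolve(hostname, None)  # any successful answer is enough
            return True
        except OSError as exc:       # socket.gaierror is a subclass of OSError
            last_error = exc
        if clock() + delay > deadline:
            raise last_error         # give up with the most recent error
        sleep(delay)
```

A call such as wait_for_dns("www.example.com") would then block for at most about 30 seconds with the defaults.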
Quite possibly the easiest solution would be to do a DNS lookup, using a randomly generated name within a domain that you control. E.g. 79BF2DA7-EE45-4E11-89A4-45EEF2838003.guid.example.com. This should of course fail, but it has to fail by returning a negative response from the DNS server. And that DNS server has to be reachable to return a negative response.
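Generating such a probe name is straightforward; a minimal Python sketch follows. The zone guid.example.com is the placeholder from the text above and would have to be replaced by a zone you actually control, and make_probe_name is a hypothetical helper name.

```python
import uuid


def make_probe_name(zone="guid.example.com"):
    """Return a fresh, random host name under `zone`.

    Because the label is a new GUID on every call, the answer cannot come
    from a local cache; the lookup has to reach a DNS server.
    """
    return f"{str(uuid.uuid4()).upper()}.{zone}"
```

How to tell a negative response apart from "no server reachable" depends on the resolver API in use and is not shown here.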
I have a principal database (server_A), mirror database (server_B), and a witness database (server_C). The databases are set up for automatic failover, that is, when server_A goes down or fails over, server_B assumes the role of the new principal database. The database quorum is set up correctly to the best of my knowledge.
I have written an application in C++ that connects to the database and gets a value to ensure a true connection. The application detects when a failure occurs on the GetValue call and attempts to reconnect when the error occurs.
The issue is this:
When I have MULTIPLE connections to the database (two threads, each of which connects and then gets a value in a loop) and a failover occurs (I stop SQL Server on server_A so that server_B takes over as principal), I detect the connection failure, destroy my connection, and attempt to reconnect using the same connection string:
"Driver={SQL Native Client};Server=tcp:Server_A;Failover_Partner=tcp:Server_B;Database=SomeDatabase;Uid=SomeUser;Pwd=SomePassword;"
** NOTE **
I have verified that the failover has taken place by monitoring the databases.
Even though the connection to the database has been properly disposed of, I cannot reconnect to the database until I restart the application. Alternatively, if I bring server_A back online (now acting as the mirror database) and then fail over server_B (by shutting down SQL Server on it) so that server_A becomes the principal database again, the application can reconnect without having to be completely closed.
Though I could manipulate the connection string to make server_B the new principal and server_A the new Failover_Partner, this is not an ideal solution as many more connections will be utilized.
Keep in mind, this ONLY happens with multiple connections to the database. If I run the application with only one connection, all is fine and I can reconnect just fine when the failover occurs.
EDIT: If I connect in the beginning with multiple threads, all is fine. When I shutdown SQL Server, and therefore a failover occurs, I can reconnect only when I go through and delete ALL objects and re-instantiate new objects. Also, I am using SQL Native Client 11.0 (ODBC). Thoughts?
A lot of what you're describing is consistent with the issue described in KB 2605597 "Time-out error when a mirrored database connection is created by the .NET Framework data provider for SQLClient."
The KB describes problems when the connection timeout is set to 15 seconds. I have anecdotally heard of similar problems when the connection timeout is set to 0 (which isn't a good idea for other reasons; mentioning it just in case).
This hotfix is applied to the application servers. If you want to rule this out as a possible cause, you could test raising the timeout (like it says in the workaround sections of the post) to make sure it's not the issue.
Later thought: The other thing I notice that is unusual is that you're specifying the TCP protocol in the connection string and the failover partner name. It's not clear to me from the documentation that it's supported in the failover partner name. You might want to try removing that and specifying the network attribute instead. (Recommended here.)
I do understand that you believe the issue isn't these things due to the single / multiple connections issue you've tested out.
However, I think you're better off simplifying the connection string so it's as consistent as possible with the published examples and making sure it's not the issues that people have commonly hit with this first. (The retry issue happens when there is latency, which can make it somewhat sporadic.)
Ok I have found the answer.
I had to modify the hosts file because my application did not reside in the same domain as the databases. Therefore when trying to fail over, I could not reach the database with the instance name (which is what the failover partner was cached as). I changed the hosts file to resolve the instance name to the ip address of the machine and it all works now.
In a Linux/C++ TCP server I need to prevent a malicious client from opening multiple sockets, otherwise they could just open thousands of connections until the server crashes.
What is the standard way for checking if the same computer already has a connection on the server? If I do it based off of IP address wouldn't that mean two people in the same house couldn't connect to the server at the same time even if they are on different computers?
Any info helps!
Thanks in advance.
TCP in itself doesn't really provide anything other than the IP address for identifying clients. A couple of (non-exclusive) options:
1) Limit the number of connections from any IP address to a reasonable number, like 10 or 20 (depending on what your system actually does.) This way, it will prevent malicious DoS attacks, but still allow for reasonable usability.
2) Limit the maximum number of connections to something reasonable.
3) You could delegate this to a higher-layer solution. As a part of your protocol, have the client send a unique identifier that is generated only once (per installation, etc). This could be easily spoofed, however.
I believe 1 and 2 are how many servers handle it. Put them in config files, so they can be tuned depending on the scenario.
The IP address is the only thing you can use to decide whether two connections come from the same sender, unless you have some sort of subscription/login system (and even then someone can try to log in a gazillion times at once, since there must be some sort of handshake for logging in).
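Options 1 and 2 boil down to a pair of counters checked at accept time. The sketch below shows that bookkeeping in Python for brevity (the question is about a C++ server, where the same logic would sit around accept() and close()). The class name ConnectionLimiter and the default limits are illustrative assumptions, and the sketch is not thread-safe as written.

```python
from collections import Counter


class ConnectionLimiter:
    """Track open connections per IP address and in total."""

    def __init__(self, max_per_ip=20, max_total=1000):
        self.max_per_ip = max_per_ip   # option 1: cap per client address
        self.max_total = max_total     # option 2: cap for the whole server
        self._per_ip = Counter()
        self._total = 0

    def try_acquire(self, ip):
        """Call after accept(); returns False if the connection should be dropped."""
        if self._total >= self.max_total or self._per_ip[ip] >= self.max_per_ip:
            return False
        self._per_ip[ip] += 1
        self._total += 1
        return True

    def release(self, ip):
        """Call when a previously accepted connection is closed."""
        if self._per_ip[ip] > 0:
            self._per_ip[ip] -= 1
            self._total -= 1
            if self._per_ip[ip] == 0:
                del self._per_ip[ip]   # keep the table from growing forever
```

Both limits would normally come from the config file mentioned above rather than being hard-coded.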
If two clients are using the same router (that uses NAT or some similar scheme), your server will see the same IP address, so allowing only one connection per IP address wouldn't work very well for "multiple users from the same home". This also applies if they are for example using a university network or a company network.
So depending on what you are supplying and how many clients you can expect from the same place, you may need to go a fair bit higher than 10. Of course, if you log when this happens, and you see a fair number of "looks like valid real users failing to get in", you can adjust the number.
It may also make sense to have some sort of "rolling average", so you accept X new connections per Y seconds from each IP address, rather than having a fixed maximum number. This is meaningful if connections last quite some time... For short duration connections, it's pretty pointless...
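One way to implement "X new connections per Y seconds" is a sliding window of recent accept timestamps per address, which is a slightly different technique from a true rolling average but serves the same purpose. A minimal Python sketch under that assumption follows; the class name RateLimiter, its defaults, and the injectable clock are all hypothetical.

```python
import time
from collections import defaultdict, deque


class RateLimiter:
    """Allow at most `max_new` new connections per `window` seconds per IP."""

    def __init__(self, max_new=10, window=60.0, clock=time.monotonic):
        self.max_new = max_new
        self.window = window
        self.clock = clock
        self._recent = defaultdict(deque)  # ip -> timestamps of recent accepts

    def allow(self, ip):
        now = self.clock()
        stamps = self._recent[ip]
        # Forget accepts that have fallen out of the window.
        while stamps and now - stamps[0] >= self.window:
            stamps.popleft()
        if len(stamps) >= self.max_new:
            return False
        stamps.append(now)
        return True
```

A long-running server would also want to evict addresses whose deque has gone empty, which this sketch leaves out.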
I need to write a win32 c/c++ application which will be able to determine whether the PC it's running on is connected to one of 2 networks. The first network is the company LAN (which has no internet connection) and the second network is a standalone switch with a single PC connected to it (the PC that the program is running on).
I'm pretty new to network programming, but so far I have tried testing whether a network drive that is hosted on our LAN can be mapped. This works fine if the PC is connected to the LAN: the drive mapping succeeds, so LAN detection is successful. However, if the PC is connected to the switch, this results in a VERY long timeout, which is not suitable as it delays the program so much as to make it unusable.
Does anyone have any alternative suggestions?
I'm using c/c++ in VS 6.0
[Update]
Whilst trying a few different ideas and looking at some of the suggestions below, I thought I should update with some additional information, as I don't think many (if any) of the suggestions will work.
(1) The aforementioned LAN has no external connections at all, it is completely isolated so no resolving of external DNS or pinging websites is possible.
(2) Hostname, MAC address, IP, Default Gateway, Subnet etc etc (basically everything you see in ipconfig -all) are all manually configured (not dynamic from the router) so checking any of these settings will return the same whether connected to the LAN or the switch.
(3) Due to point (2), any attempts to communicate with the switch seem to be unsuccessful, in fact almost all networking commands (ping, arp etc) seem to fail - I think due to the machine trying to connect to the LAN when it isn't there :-(
One thing I have found which works is pinging the default gateway IP which times out when connected to the switch. This is sort of ok as I can reduce the timeout of ping so it doesn't just hang for ages but it feels like a bit of a hack and I would certainly appreciate any better solutions.
Thanks
As far as TCP/IP is concerned there is no such thing as a LAN or a WAN. There is a set of non-Internet-routable address ranges such as 192.168.x.x and 10.x.x.x, but these are sometimes also used by ISPs that are short of IP addresses.
Your best bet is to use asynchronous APIs when making TCP/IP connections. Win32 defines a whole bunch of OVERLAPPED APIs for this purpose. This will prevent your application from grinding to a halt while waiting for a remote connection.
Alternatively put the socket stuff into another thread and then only notify the UI when the operation is done.
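The "other thread" variant can be sketched as follows. This is Python rather than the Win32/VS 6.0 code the question asks about, so treat it as an outline of the structure only; can_connect and probe_in_background are hypothetical names, and the host and port to probe would be some service known to exist on the company LAN.

```python
import socket
from concurrent.futures import ThreadPoolExecutor


def can_connect(host, port, timeout=2.0):
    """Blocking check: True if a TCP connection succeeds within `timeout`."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:  # refused, unreachable, or timed out
        return False


def probe_in_background(executor, host, port, timeout=2.0):
    """Start the check on a worker thread and return a Future immediately."""
    return executor.submit(can_connect, host, port, timeout)
```

The caller keeps its UI responsive and either polls future.done() or attaches a callback with future.add_done_callback(...).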
I would first try to differentiate between the two using information available locally--that is, from your computer. Does the output of ipconfig /all differ depending on which network you're connected to? If so, exploit that difference if you can.
Is it possible to get the MAC address of the standalone switch? Of the switch that controls the company LAN? That would be a sure way to tell. Unless somebody cloned the MAC address.
If you try using the existence or non-existence of some network service to determine which network you're connected to, you can never be sure. For example, if you failed to map that network drive, all you know is that the network drive isn't available. You can't say for certain that you're not connected to the company LAN. Same is true if you use ping. Lack of response from a particular machine means only that the machine didn't respond.
Various things you can look at for differentiation:
- DNS domain name (GetComputerNameEx)
- MAC address of the gateway (ping it, then GetIpNetTable)
- Routing table (do you have a gateway and default route on the company LAN?)
- WNet discovered network resources (WNetOpenEnum, WNetEnumResource)
- Ability to resolve external hostnames (try 5-10 names like www.google.com, www.microsoft.com and so on; if one resolves you should have internet)
You'll have to decide how many indicators are "enough" to conclude that you're on one LAN or the other, even if some of the tests fail. Then keep retrying until you have a definite result.
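Combining several indicators can be as simple as counting votes. The Python sketch below is one hypothetical way to do it: each test reports "lan", "switch", or None when it was inconclusive, and a verdict is returned only when one side reaches the required count and leads the other. The names classify_network and required are illustrative, not part of any Windows API.

```python
def classify_network(indicators, required=2):
    """Decide between 'lan' and 'switch' from a dict of indicator results.

    `indicators` maps a test name to 'lan', 'switch', or None (inconclusive).
    Returns 'lan', 'switch', or None when the evidence is not yet sufficient,
    in which case the caller should retry the tests later.
    """
    lan = sum(1 for v in indicators.values() if v == "lan")
    switch = sum(1 for v in indicators.values() if v == "switch")
    if lan >= required and lan > switch:
        return "lan"
    if switch >= required and switch > lan:
        return "switch"
    return None
```

For example, classify_network({"gateway_mac": "lan", "dns_domain": "lan", "wnet": None}) reports "lan" with the default threshold of two.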
http://msdn.microsoft.com/en-us/library/aa366071%28v=VS.85%29.aspx has a lot of network related functions that you can experiment with to create further indicators.
I'm using getaddrinfo to do DNS queries from C++ on Windows. I used to use the Windows API DnsQuery and that worked fine, but when adding IPv6 support to my software I switched to getaddrinfo. Since then, I've seen the following:
My problem is that getaddrinfo sometimes takes a very long time to complete. A typical call takes just a few milliseconds, but roughly 1 time out of 10000 it takes much longer: in some cases around 15 seconds, and there have been several cases where it took several minutes.
I've run Wireshark on the server and analyzed my applications debug logs and see the following:
I call the function getaddrinfo.
15 seconds later, my machine queries the DNS server.
Some milliseconds later, I get the response from the DNS server.
The weird thing here is that the actual DNS query only takes a tenth of a second, but the time getaddrinfo actually executes is much longer.
The problem has been reported by many users, so it's not something specific to my machine.
So what does getaddrinfo do more than contact the DNS server?
Edit:
The problem has occurred with several addresses. If I try to reproduce the problem using these addresses, the problem does not occur.
I have done something stupid. Upon every DNS query, the etc/services file is parsed. However, that doesn't explain a delay of several minutes. (thanks D.Shawley)
Edit 2
One type of DNS queries made by my software is anti-spam DNSBL queries. The log from one user showed me that the lookup for ip.address1.example.com always seemed to take exactly 2039 seconds, while the lookup for another.ip.address.example.com always took exactly 1324 seconds. The day after that, the lookups for those addresses were just fine. At first I thought that the DNS BL authors had put some kind of timeout on their side. But if this was the core problem, getaddrinfo should have timed out earlier?
Windows has a local daemon that does DNS caching. Your call to getaddrinfo() is getting routed to that daemon, which presumably is checking its cache before submitting the query to your DNS server.
See Windows Knowledge Base article 318803 for more information on disabling the cache.
[Edited]
It sounds to me as though your Windows Server 2003 instance is not configured correctly for IPv6. Once the IPv6 lookups time out, it will fall back to IPv4. Knowledge Base articles that might help include:
Windows Server 2003 Deployment Guide >>> Configuring DNS for IPv6/IPv4 Coexistence
TechNet Library >>> Internet Protocol Version 6
TechNet Library >>> Using Windows Tools to Obtain IPv6 Configuration Information
Unfortunately, I don't have access to any Windows Servers, so I can't test/replicate this myself.
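One way to check whether the delay is tied to the IPv6 part of the lookup is to time getaddrinfo once with an unspecified address family and once restricted to IPv4, and compare. The sketch below does this in Python, whose socket.getaddrinfo wraps the same system call; timed_lookup is a hypothetical helper, and the resolve and clock parameters exist only so it can be tested in isolation.

```python
import socket
import time


def timed_lookup(host, family=socket.AF_UNSPEC,
                 resolve=socket.getaddrinfo, clock=time.perf_counter):
    """Return (elapsed_seconds, result) for a single getaddrinfo call.

    `result` is the list of address tuples on success, or the
    socket.gaierror instance if the lookup failed.
    """
    start = clock()
    try:
        result = resolve(host, None, family)
    except socket.gaierror as exc:
        result = exc
    return clock() - start, result
```

If timed_lookup(name, socket.AF_INET) is consistently fast while timed_lookup(name) is occasionally slow for the same name, that would point toward the IPv6 path rather than the DNS server itself.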