SQL Server Connection Fails after Network Reconnect - C++

I am working on an update to an application that uses DAO to access a SQL Server database. I know, but let's consider DAO a requirement for now.
The application runs all the time in the system tray and periodically performs SQL Server operations. Since it is running all the time, and users of the application will be on laptops transitioning between buildings, I've designed it to quietly transition between active and inactive states. When the database connection succeeds, operations resume.
I have one last issue before I release this update: when a connection is dropped, then reestablished, the SQL operations fail. This occurs only if I have specified the hostname in my connection string. If I use the IP address, everything is fine (but I need to be able to use the hostname).
Here is the behavior:
1) Everything working. Good network connection, database operations are fine.
2) Lost connection. A little 'x' appears on the taskbar icon, and nothing else. All OK.
3) Reconnect.
At step 3, I get an 'ODBC--call failed' error when I run the first query. Interestingly, the database is first opened without error.
If I skip step 1, and start the application when the connection is down, everything works fine in step 3, hostname or not.
I expect this is an issue with the DAO engine caching the DNS entry after the first connection, although the destination IP does not change, so I'm not sure about that. I have tried flushing the Windows DNS cache (from a cmd prompt) to no effect. The same behavior occurs even when I'm using my local hostname with a local SQL Server I set up for development. 127.0.0.1 has no problems.
I also tried to CoUninitialize() the DAO interface between active periods, but I had trouble getting this to work. If someone thinks that would help, I will work harder at it.
This behavior is the same on Windows XP and 7.
Thanks for anything you've got!
Edit: I should have mentioned - I am closing the database connection between the attempts, then reopening it with
m_pDb = m_pDaoEngine->OpenDatabase()

I ended up biting the bullet and converting the application to ADO. Everything works nicely now, and database operations are much faster to boot.

Related

Amazon Redshift: Queries never finish running after an idle period

I am working on a new Amazon Redshift database that I recently started.
I am experiencing an issue where, after I connect to the database, I can run queries without any issue. However, if I spend some time without running anything (say, 5 minutes), when I try running another query or command, it never finishes.
I am using DBeaver Community 21.2.2 to interact with the connection, and it stays "Executing query" forever. The only way I can get it to work is by cancelling, disconnecting from Redshift, and connecting again; then it executes correctly. Until I stop using it for a few minutes, and then it happens all over again.
I thought this was a DBeaver issue, as we have a Metabase instance connected to this same cluster without any issues. But today I tried manipulating this cluster with R using RJDBC, and the same thing happens: I can run queries until I stop, and then when I try running something else it never finishes, until I disconnect and connect again.
I'm sorry if I wasn't able to explain it clearly; I tried searching for similar issues but couldn't find any.
I suspect that the queries in question are not even being launched on the database. You can check this by reviewing svl_statementtext to see whether the query ever reaches the database. Put a unique comment in the query to help determine whether it is actually the query in question.
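For example, a rough sketch of that check from Python, assuming a direct psycopg2 connection to the cluster (the endpoint, credentials, and debug-tag comment are all made up for illustration):
import psycopg2
# Hypothetical endpoint and credentials
conn = psycopg2.connect(host="my-cluster.example.redshift.amazonaws.com", port=5439,
                        dbname="dev", user="awsuser", password="...")
cur = conn.cursor()
# Look for the tagged query; the unique comment makes it easy to spot
cur.execute(
    "select starttime, pid, text from svl_statementtext "
    "where text like %s order by starttime desc;",
    ("%/* debug-tag-1234 */%",),
)
for starttime, pid, text in cur.fetchall():
    print(starttime, pid, text)
If nothing comes back for the tag, the statement never reached the cluster at all.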
Since I've seen similar behavior before, I'll write up a possible way this can happen. In that case the queries were either not being seen by the database, or the connection to the database was being dropped mid-execution. The cause was network switches and their configurations.
Typical network connections are fairly quick - you ask for a web page and it is given to you. The connection is complete. When you click on a link, a new connection is established and also ends quickly. These network actions are atomic from a network connection point of view. However, database connections are different. One connection is made, and many back-and-forth transmissions of data happen while the connection is open. No problem - with the right set of network configurations, these connections can stay open and idle for days.
The problem comes in when the operators of the network equipment decide that connections with no data flowing are "stale" after some fixed amount of time. They do this so that the network equipment can "forget" about these connections and focus on "active" connections. ISPs drop idle connections a lot so that they can handle the load of traffic and connections that flow through their equipment. This doesn't cause any issues for web pages and APIs, but database connections get clobbered.
When this happens, it looks exactly like what you describe. Both sides (client and database) think that the connection is still active, but the network equipment has dropped the connection. Nothing gets through, but no notification is sent to either party. You will likely see corresponding open sessions on the Redshift side for these dropped connections, and the database is just waiting for the client to give a command on each of them. An administrator will need to go through and close (terminate) these sessions for them to go away.
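Cleaning those up looks roughly like this (a sketch assuming an administrative psycopg2 connection; the endpoint, credentials, and pid are placeholders):
import psycopg2
admin = psycopg2.connect(host="my-cluster.example.redshift.amazonaws.com", port=5439,
                         dbname="dev", user="admin_user", password="...")  # hypothetical
cur = admin.cursor()
# List open sessions; stale ones show up here with no corresponding activity
cur.execute("select process, user_name, db_name, starttime from stv_sessions;")
for row in cur.fetchall():
    print(row)
# Terminate a session known to be dead (12345 is a placeholder pid from the list above)
cur.execute("select pg_terminate_backend(%s);", (12345,))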
Now, the thing that doesn't align with my experience is the speed at which these connections are being marked as "stale". In my case my ISP was closing connections that were idle for more than 30 minutes. You seem to be timing out much faster than this. In some cases corporate firewalls will be configured with short idle-connection timeouts for routes out of the private network to the internet, so there are cases where the timeouts can be short. The networks at AWS do not have these timeouts, so if your connections are completely within AWS then this isn't your answer.
To address this, there are a few ways to go. The easy way is to set up a tunnel into AWS with "keep alive" packets sent every 30 seconds or so. You will need an EC2 instance at AWS, so it isn't cost-free. SSH tunneling is the usual tool for this, and there are write-ups online for setting it up.
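Another client-side option, assuming the client connects over the Postgres protocol directly (psycopg2 in this sketch, rather than DBeaver's JDBC driver), is to have the driver itself send TCP keepalives so the connection never looks idle to the equipment in between:
import psycopg2
# libpq-level keepalive settings; endpoint, credentials, and values are illustrative only
conn = psycopg2.connect(
    host="my-cluster.example.redshift.amazonaws.com", port=5439,
    dbname="dev", user="awsuser", password="...",
    keepalives=1,            # enable TCP keepalives
    keepalives_idle=30,      # start probing after 30 seconds of idle time
    keepalives_interval=10,  # probe every 10 seconds
    keepalives_count=3,      # give up after 3 failed probes
)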
The hard way (but likely the most correct way) is to work with network experts to understand when the timeout is happening and why. If the timeout cannot be changed, then it may be possible to configure a different network topology for your use case. Network peering or a VPN could address this.
In some cases you may be able to avoid JDBC or ODBC connections altogether. These protocols are valid, but they are old, and most networking doesn't work this way anymore, which is why they suffer from these issues. The Redshift Data API lets you issue SQL to Redshift in a single request and check on completion later. These API calls are each independent connections, so there is no possibility of "timing out" between them. The downside is that this process is not interactive and therefore not supported by workbenches.
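A minimal sketch of that pattern with boto3 (the cluster identifier, database, user, and table are placeholders):
import boto3
client = boto3.client("redshift-data")
# Each call is an independent request, so there is no idle connection to be dropped
resp = client.execute_statement(
    ClusterIdentifier="my-cluster",
    Database="dev",
    DbUser="awsuser",
    Sql="select count(*) from some_table;",
)
# Poll for completion later, from the same or an entirely different process
status = client.describe_statement(Id=resp["Id"])["Status"]
if status == "FINISHED":
    rows = client.get_statement_result(Id=resp["Id"])["Records"]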
So does this match what you have going on?

SQLAlchemy/Flask MSSQL query hangs indefinitely when run under Apache

My app uses a couple of different DBs on the same MSSQL server (read only). My problem is that one of the two MSSQL connections always works fine, whereas the other hangs indefinitely on the first query until Flask cuts off the connection. This, however, happens only when the app is running under Apache (and even then, only most of the time). When I run the Flask test server, everything is fine.
I've surrounded the MSSQL query with logging messages and am therefore positive that the bug is in that particular query. It is just a simple lookup by primary key, like this:
db.query(Record).get(id)
The DBs are accessed through different engines whose URIs only differ by the database name.
My problem is that I have no idea how to start debugging this. Any tips?
[EDIT] I've managed to get SQLAlchemy logging going under Apache. I've set echo=True on the engine, and it doesn't output anything at all. It just hangs.
Turns out that the problem doesn't have to do with Apache but with connection timeout. Whereas the MySQL engine gives an error message when trying to execute a query on an expired server connection, the MSSQL engine just silently stalls forever. Adding pool_recycle=3600 to create_engine() solved the issue.
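For reference, a minimal sketch of the fix (the MSSQL URI is a placeholder):
from sqlalchemy import create_engine
engine = create_engine(
    "mssql+pyodbc://user:password@my_dsn",  # placeholder connection URI
    pool_recycle=3600,  # replace pooled connections older than an hour
    echo=True,          # the SQL logging used while debugging
)
On newer SQLAlchemy versions, pool_pre_ping=True is another way to guard against silently dead pooled connections, though pool_recycle alone was enough here.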

Suddenly scheduled tasks are not running in ColdFusion 8

I am using ColdFusion MX 8 server, and one of the scheduled tasks had been running for 2 years, but suddenly, since 01/12/2014, scheduled tasks are not running. When I browse to the file in a browser, the file runs successfully without error.
I am not sure whether there is an update or license expiration problem. I am aware that in the middle of this year Adobe ended support for ColdFusion 8.
The most common cause of this problem is external to the server. When you say you browsed to the file and it worked in a browser, it is very important to know whether that test was performed on the server's desktop. Knowing that you can browse to the file from your own desktop or laptop is of little value.
The most common source of issues like this is a change in the DNS or network stack that is interfering with resolution. For example, if the internal DNS serving your DMZ suddenly starts serving the "external" address, suddenly your server can't browse to your domain. Or the IP served for the domain in question goes from being 127.0.0.1 to some other IP that the server can't access correctly due to a reverse proxy, load balancer, or some other rule. Finally, sometimes Apache or IIS is altered so that an IP that previously was serviced (127.0.0.1 being the most common example) no longer responds.
If it is something intrinsic to the scheduler service, then Frank's advice is pretty good - especially look for "proxy scheduler" entries in the log; they can give you good clues. I would also log the results of the scheduled task to a file, then check the file. If it exists, then your scheduled tasks ARE running - they are just not succeeding. Good luck!
I've seen the CF scheduling service crash in CF8. The rest of CF is unaffected.
Have you tried restarting the server?
Here are your concerns:
Your File (works, since you tested it manually).
Your Scheduled Task (failed).
Your ColdFusion Application (Service) (any changes here?).
Your Server (what about here?).
To test your problem, create a duplicate task and schedule it. Leave the other one in place (maybe set your new one to run earlier). Use the same file too. See if it completes.
If it doesn't, then you have a larger problem. Since the ColdFusion server sits atop the JVM, there could be something happening there. Things just don't stop working unless something got corrupted or you got compromised. If you hardened your server by rearranging/renaming the file structure to make it more secure, that would break your task.
So going back: if your test schedule works, then determine what is different between the two. Note that you have logging capabilities: Logging abilities for CF8
If you are not directly in charge of maintaining this server, then I would recommend asking around to see if there was recent maintenance and, if so, what was done to the server.

Auto failover multiple connections to mirror database when principal goes down

I have a principal database (server_A), mirror database (server_B), and a witness database (server_C). The databases are set up for automatic failover, that is, when server_A goes down or fails over, server_B assumes the role of the new principal database. The database quorum is set up correctly to the best of my knowledge.
I have written an application in C++ to connect to the database and get a value to ensure a true connection. The application detects when a failure occurs on the GetValue call and attempts to reconnect when the error occurs.
The issue is this:
When I have MULTIPLE connections to the database (two threads connected; once connected, each gets a value in a loop), when the failover occurs (stopping SQL Server on server_A so server_B will take over as principal), I detect the connection failure, destroy my connection, and attempt to reconnect using the same connection string:
"Driver={SQL Native Client};Server=tcp:Server_A;Failover_Partner=tcp:Server_B;Database=SomeDatabase;Uid=SomeUser;Pwd=SomePassword;"
** NOTE **
I have verified that the failover has taken place by monitoring the databases.
Even though the connection to the database has been properly disposed of, I cannot reconnect to the database until I restart the application. Alternatively, if I bring server_A back online (now acting as the mirror database) and then fail over server_B (shutting down SQL Server), making server_A the principal database again, the application can reconnect without having to completely close out.
Though I could manipulate the connection string to make server_B the new principal and server_A the new Failover_Partner, this is not an ideal solution as many more connections will be utilized.
Keep in mind, this ONLY happens with multiple connections to the database. If I run the application with only one connection, all is fine and I can reconnect just fine when the failover occurs.
EDIT: If I connect in the beginning with multiple threads, all is fine. When I shut down SQL Server, and therefore a failover occurs, I can reconnect only when I go through, delete ALL objects, and re-instantiate new objects. Also, I am using SQL Native Client 11.0 (ODBC). Thoughts?
A lot of what you're describing is consistent with the issue described in KB 2605597 "Time-out error when a mirrored database connection is created by the .NET Framework data provider for SQLClient."
The KB describes problems when the connection timeout is set to 15 seconds; I have anecdotally heard of similar problems when the connection timeout is set to 0 (which isn't a good idea for other reasons - mentioning it just in case).
This hotfix is applied to the application servers. If you want to rule this out as a possible cause, you could test raising the timeout (like it says in the workaround sections of the post) to make sure it's not the issue.
Later thought: the other thing I notice that is unusual is that you're specifying the TCP protocol prefix both for the server and for the failover partner name in the connection string. It's not clear to me from the documentation that this is supported for the failover partner name. You might want to try removing it and specifying the Network attribute instead. (Recommended here.)
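For example, something along these lines (untested; it assumes dbmssocn, the TCP/IP sockets library, is the Network value you want):
"Driver={SQL Native Client};Server=Server_A;Failover_Partner=Server_B;Network=dbmssocn;Database=SomeDatabase;Uid=SomeUser;Pwd=SomePassword;"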
I do understand that you believe the issue isn't these things due to the single / multiple connections issue you've tested out.
However, I think you're better off simplifying the connection string so it's as consistent as possible with the published examples, and first making sure it's not one of the issues that people commonly hit with this. (The retry issue happens when there is latency, which can make it somewhat sporadic.)
OK, I have found the answer.
I had to modify the hosts file because my application did not reside in the same domain as the databases. Therefore, when trying to fail over, I could not reach the database by its instance name (which is what the failover partner was cached as). I changed the hosts file to resolve the instance name to the IP address of the machine, and it all works now.
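The hosts entry looks something like this (the IP address is made up; the name is whatever the failover partner was cached as):
10.10.5.42    server_B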

How to Stop A ColdFusion MX-7 Scheduled Task that has already started

I inherited a ColdFusion MX7 application that has a long-running process, which a user kicked off by accident.
By my calculations, at the current rate, the job will run for 3 days.
There doesn't seem to be a way through the administrator interface to stop a job that is running.
The table that is being filled can easily be repopulated, so I would think stopping the ColdFusion service won't affect anything except the table, which isn't a problem.
Am I right? Is that safe? Is there another way?
A one-time restart of the service should be fine. For the future, you may want to add a required URL parameter or some other safety mechanism to keep this long process from accidentally going off.
Check to see if the task already has an explicit timeout set
Explicit Timeout
Otherwise, the default page timeout is used
Server Settings
For ColdFusion 8 and newer versions, you can kill a running page with the Server Monitor, in the section labeled "Aborting unresponsive or troublesome requests/threads"
Using server monitor.
It may also be possible to stop the processing by killing the SQL Server task:
Is there a task manager of sorts for SQL Server 2008 and on?