Has anyone come across memory leaks involving the macromedia.jdbc.sqlserverbase classes? Using MAT, I can see 30,000+ instances of those classes with a retained size of 1.3 GB, and each one appears to map to a single instance of a (different) CFQuery. I.e., I can see the query SQL in there, and I can see the results (in a TDS object).
The app is kind of busy, but there is no way there are 30,000+ live CFQuery tags.
We are not caching the CFQuery tags.
I'm really struggling to see the GC root that is holding onto these.
We use the driver that ships with CF for some very high traffic sites with heavy DB usage and do not see issues with it. The only issue I have seen is related to networking - where a port will auto-negotiate to a different speed or duplex and leave connections sort of hanging. It only happens with certain switches, but when it does it behaves much like what you describe (a bunch of hanging connections).
FYI - on a busy server you might have 30k+ unreferenced objects (instances of this or that) hanging around waiting for GC. That wouldn't be unusual. Does GC recover these objects? Do you get your 1.3 GB back?
Related
I have an application that I have been writing and working on for the past few years. It is the server side of a strategy MMO.
The application uses the MySQL C++ connector.
As you know, a single MySQL connection cannot be used by more than one thread. So, although my application is multi-threaded, it has been designed to talk to MySQL from one thread.
Now I've decided to move some of the functionality out of the main thread and into other threads. In fact, I'm going to create new threads for some of the tasks.
I know that it could be a little weird and wrong, and I also know that I shouldn't create lots of connections to the database, but what happens if I skip the connection pool and create a connection for every query?
Does anybody know what will happen to the server application if I do this?
I don't know exactly what you mean when you ask "what will happen to the server application if I do this", but I'm assuming you are worried about performance, so I'll try to answer that.
There's a two-part article about the effects of SSL encryption on MySQL performance over on "MySQL Performance Blog" (here's part one and part two.) I suggest you give it a read.
The author is evaluating the performance overhead of using (built-in or otherwise) SSL encryption on MySQL query throughput and latency. This might not sound relevant, but as part of his benchmark, he also evaluates the effects of SSL on a program that is continually connecting to and disconnecting from the server as fast as possible.
By comparing the first chart in part 1 of the article (i.e. "Sysbench Read-Only - Throughput") with the third chart (i.e. "Connection Throughput"), you'll see that repeatedly connecting to the server in a multithreaded environment results in roughly a 10x to 20x reduction in the number of queries performed per second (from 30K-50K down to less than 3K).
Even the third chart alone can tell you a lot. According to that graph, if you create a new connection per query, you will be able to perform fewer than 3,000 queries per second at the absolute maximum.
This might help you calculate an upper bound on the performance of your game server. If you connect to the database for each and every query, then even if you use a lot of threads, you will only be able to make about three thousand connections per second. If you need (even theoretically) more than 3K queries per second, you cannot use this method. You'll need a connection pool, or one connection per thread, or something similar (a rough sketch is at the end of this answer).
Two things to note:
The 3K-connections-per-second figure is for when you do not run any queries at all. If you are actually performing queries, your number will be much worse.
The website I've linked to is (I think) run by some Percona developers (Percona being the company behind a MySQL replacement/alternative). The benchmark numbers look sensible to me, but you might want to draw your own conclusions.
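If you do end up needing a pool, here is a very rough sketch of the idea with Connector/C++ (the 1.1-style JDBC-like API); the host, credentials and fixed pool size below are placeholders, and a real pool would wait on a condition variable instead of returning null when exhausted:

#include <mysql_driver.h>
#include <mysql_connection.h>
#include <memory>
#include <mutex>
#include <queue>

// Tiny fixed-size connection pool: connections are created once up front and
// then borrowed/returned by worker threads instead of opening one per query.
class ConnectionPool {
public:
    explicit ConnectionPool(std::size_t size) {
        // get_mysql_driver_instance() is not thread-safe; call it before spawning threads.
        sql::mysql::MySQL_Driver* driver = sql::mysql::get_mysql_driver_instance();
        for (std::size_t i = 0; i < size; ++i) {
            // Placeholder endpoint/credentials - adjust for your server.
            pool_.push(std::unique_ptr<sql::Connection>(
                driver->connect("tcp://127.0.0.1:3306", "user", "password")));
        }
    }

    // Borrow a connection; returns nullptr if the pool is exhausted.
    std::unique_ptr<sql::Connection> acquire() {
        std::lock_guard<std::mutex> lock(mutex_);
        if (pool_.empty())
            return nullptr;
        std::unique_ptr<sql::Connection> conn = std::move(pool_.front());
        pool_.pop();
        return conn;
    }

    // Return a previously borrowed connection to the pool.
    void release(std::unique_ptr<sql::Connection> conn) {
        std::lock_guard<std::mutex> lock(mutex_);
        pool_.push(std::move(conn));
    }

private:
    std::mutex mutex_;
    std::queue<std::unique_ptr<sql::Connection>> pool_;
};

A pool like this (or simply one connection per worker thread) keeps the connect/handshake cost out of the per-query path, which is exactly where the benchmark above shows the throughput collapsing.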
Sitecore.NET 6.6.0 (rev. 130404)
In our project we are using Sitecore.Search.IndexSearchContext to perform all of our queries. Specifically, we use IndexSearchContext.Searcher method to get access to the internal Lucene searcher and pass Lucene queries to it.
I have found out (via web articles and experimentation) that if we reuse the same IndexSearchContext instance to perform all of our queries, it's significantly faster than creating and destroying an IndexSearchContext for each query that gets executed.
I have also read that an IndexSearchContext does not see index updates made after it was created. Because of this, I'm disposing of the shared IndexSearchContext and creating a new one every 30 seconds, so that queries get the latest results with at most a 30-second delay. This approach requires me to carefully handle the thread safety of creating and disposing of the shared index searcher.
Is this a safe approach? Is it discouraged to reuse an application-wide index searcher in Sitecore?
Thanks.
I would suggest you hook into the "publish:end" and "publish:end:remote" events (the latter in a multi-server environment) and drop your IndexSearchContext when these events fire. Ultimately you're in a Sitecore environment, and your index should only become out of date when new content is published. That version of the truth is a bit simplified, admittedly, as I don't know the full extent of the application you're running.
To be honest, I haven't seen any performance issues from spawning many IndexSearchContext instances. Unless you make extreme use of them and need an extremely optimized environment, I would advise against sharing one. I have seen a lot of problems with locked indexes, and you might also run into HTML cache issues (if that cache is used).
All in all, it sounds a bit like premature optimization. However, I do not know your complete setup and I may be wrong.
I have tried the approach and can confirm it works. I'm recreating the index searcher every 10 seconds, and that has significantly improved the number of concurrent requests that can be handled. However, Sitecore's IndexSearchContext cannot be shared like this (it's intended to be created and destroyed within a single thread). What I did instead was instantiate a raw Lucene IndexSearcher and share that across the application.
I would like the BHO instances of my IE extension to be able to share common data. I just need them to share a couple of variables, so I am trying to find an easy solution for the problem.
The alternatives I can think of, from easier to more complex are:
1) Writing/reading data to/from the file system or to the registry, see MSDN article and Codeproject article. Question: is this information accessible from BHO instances running in different threads?
2) Developing a Windows Service or a background application that communicates with all BHO instances, see MSDN article. Problem: I have NO IDEA how to build this, or where to start. I am also worried about the user having to install too many things.
3) Providing IPC mechanisms so that the different BHO instances can communicate directly with each other, e.g. using the IGlobalInterfaceTable, see the ookii article. Problem: yes, you can store pointers in the IGlobalInterfaceTable and get cookies to access them later, but how can you share a cookie obtained in BHO instance 1 with BHO instance 2, so that the second instance can access the data inserted into the IGlobalInterfaceTable by the first one? Aren't we back to the same data-sharing problem?
Well, as you can see, after a whole week looking for a solution I simply don't know how to start dealing with this problem. Any help would be greatly appreciated.
Often, Memory Mapped Files are used for this purpose. It's a non-trivial amount of work, however, as you must ensure that they are ACL'd properly to allow cross-process access (each tab may be in a different process) and work across multiple integrity levels.
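As a rough illustration of that approach (not a drop-in implementation): the section name and the SDDL string below are assumptions, but the general pattern is to create a named file mapping with a permissive DACL and a low mandatory integrity label so that Protected Mode tab processes can open and write to it as well.

#include <windows.h>
#include <sddl.h>
#pragma comment(lib, "advapi32.lib")

// Create a named section that other BHO instances (in other tab processes)
// can open. The DACL grants read/write to Everyone and the SACL sets a low
// mandatory integrity label so low-integrity (Protected Mode) tabs can write.
HANDLE CreateSharedSection()
{
    SECURITY_ATTRIBUTES sa = { sizeof(sa) };
    ConvertStringSecurityDescriptorToSecurityDescriptorW(
        L"D:(A;;GRGW;;;WD)S:(ML;;NW;;;LW)",
        SDDL_REVISION_1, &sa.lpSecurityDescriptor, NULL);

    HANDLE hMap = CreateFileMappingW(
        INVALID_HANDLE_VALUE, &sa, PAGE_READWRITE,
        0, 4096,                     // 4 KB is plenty for a couple of variables
        L"Local\\MyBhoSharedData");  // hypothetical section name
    LocalFree(sa.lpSecurityDescriptor);
    return hMap;                     // NULL on failure
}

// Each instance then maps a view and reads/writes a small shared struct:
//   void* p = MapViewOfFile(hMap, FILE_MAP_ALL_ACCESS, 0, 0, 0);
// Access should be synchronized, e.g. with a named mutex created with the same
// security descriptor.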
1) Sort of, except that the locations a normal (Protected Mode) web site can write to are isolated from the locations a trusted web site can access.
2) Writing a service is probably the easiest way, given the abundant documentation on how to write a Windows service (you even get an ATL project wizard if you use Visual C++), and your broker code can survive a tab process crash or even a user logging off.
3) Indeed, you have the same sharing problem again. COM messages are blocked by UIPI unless you can change the message filter, but the messages used by COM are not documented. I would use something like a named pipe or memory-mapped file instead.
You need to host the communication broker code somewhere and create it only once. You could implement something like the way computers in a workgroup elect a master browser (rather chatty), or have a dedicated broker process do the communication work (e.g. in a Windows service).
I'm writing a project in C++/Qt and it is able to connect to any type of SQL database supported by the QtSQL (http://doc.qt.nokia.com/latest/qtsql.html). This includes local servers and external ones.
However, when the database in question is external, the speed of the queries starts to become a problem (slow UI, ...). The reason: Every object that is stored in the database is lazy-loaded and as such will issue a query every time an attribute is needed. On average about 20 of these objects are to be displayed on screen, each of them showing about 5 attributes. This means that for every screen that I show about 100 queries get executed. The queries execute quite fast on the database server itself, but the overhead of the actual query running over the network is considerable (measured in seconds for an entire screen).
I've been thinking about a few ways to solve the issue, the most important approaches seem to be (according to me):
Make fewer queries
Make queries faster
Tackling (1)
I could find some way to delay the actual fetching of the attributes (start a transaction), and then, when the programmer calls endTransaction(), fetch everything in one go (with a SQL UNION or a loop, ...). This would probably require quite a bit of modification to the way the lazy objects work, but if people think it is a decent solution I believe it could be worked out elegantly. If this speeds everything up enough, an elaborate caching scheme might not even be necessary, saving a lot of headaches.
I could try pre-loading attribute data by fetching it all in one query for all the objects that are requested, effectively making them non-lazy (a rough sketch of this appears after the code example below). Of course in that case I would have to worry about stale data. How would I detect stale data without sending at least one query to the external db? (Note: sending a query to check for stale data on every attribute access would give a best-case 0x performance increase and a worst-case 2x performance decrease when the data is actually found to be stale.)
Tackling (2)
Queries could, for example, be made faster by keeping a locally synchronized copy of the database running. However, on the client machines I don't really have the option of running, say, exactly the same database type as the one on the server. So the local copy would, for example, be an SQLite database. This would also mean that I couldn't use a db-vendor-specific solution. What are my options here? What has worked well for people in these kinds of situations?
Worries
My primary worries are:
Stale data: there are plenty of queries imaginable that change the db in such a way that it prohibits an action that would seem possible to a user with stale data.
Maintainability: How loosely can I couple in this new layer? It would obviously be preferable if it didn't have to know everything about my internal lazy object system and about every object and possible query
Final question
What would be a good way to minimize the cost of making a query? "Good" meaning some combination of: maintainable, easy to implement, not too application-specific. If it comes down to picking any two, then so be it. I'd like to hear about people's experiences and what they did to solve this.
As you can see, I've thought of some problems and ways of handling them, but I'm at a loss for what would constitute a sensible approach. Since it will probably involve quite a lot of work and intensive changes to many layers of the program (hopefully as few as possible), I thought I'd ask all the experts here before making a final decision on the matter. It is also possible I'm just overlooking a very simple solution, in which case a pointer to it would be much appreciated!
Assuming all relevant server-side tuning has been done (for example: MySQL cache, best possible indexes, ...)
*Note: I've checked questions from users with similar problems that didn't entirely answer mine: "Suggestion on a replication scheme for my use-case?" and "Best practice for a local database cache?", for example.
If any additional information is necessary to provide an answer, please let me know and I will duly update my question. Apologies for any spelling/grammar errors, English is not my native language.
Note about "lazy"
A small example of what my code looks like (simplified of course):
QList<MyObject*> myObjects = database->getObjects(20, 40); // fetch and construct objects 20 to 40 from the db
// ...some time later
// screen filling time!
foreach (MyObject* o, myObjects) {
    o->getInt("status", 0);                 // == db request
    o->getString("comment", "no comment!"); // == db request
    // about 3 more of these
}
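For comparison, here is a rough sketch of what the pre-loading idea from "Tackling (1)" might look like with QtSQL; the table and column names ("objects", "id", "status", "comment") are invented for illustration, and the returned cache would then back getInt()/getString() instead of per-attribute queries:

#include <QHash>
#include <QSqlDatabase>
#include <QSqlQuery>
#include <QString>
#include <QVariant>

// Fetch all displayed attributes for a range of objects in a single round trip,
// then serve attribute lookups from this in-memory cache.
QHash<int, QHash<QString, QVariant> > preloadAttributes(QSqlDatabase db, int from, int to)
{
    QHash<int, QHash<QString, QVariant> > cache;
    QSqlQuery q(db);
    q.prepare("SELECT id, status, comment FROM objects WHERE id BETWEEN :from AND :to");
    q.bindValue(":from", from);
    q.bindValue(":to", to);
    if (q.exec()) {
        while (q.next()) {
            const int id = q.value(0).toInt();
            cache[id]["status"]  = q.value(1);
            cache[id]["comment"] = q.value(2);
        }
    }
    return cache; // one network round trip instead of ~100 per screen
}

Stale data would still have to be handled separately, e.g. by invalidating the cache on a timer or on known write operations.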
At first glance it looks like you have two conflicting goals: query speed, but always using up-to-date data. Thus you should probably fall back on your actual requirements to decide here.
1) Your database is nearly static compared to how the application uses it. In this case use your option 1b and preload all the data. If there's a slim chance that the data may change underneath you, just give the user an option to refresh the cache (fully or for a particular subset of the data). This way the slow access is in the hands of the user.
2) The database changes fairly frequently. In this case "perhaps" an SQL database isn't right for your needs. You may need a higher-performance dynamic database that pushes updates rather than requiring a pull; that way your application would be notified when underlying data changed and could respond quickly. If that doesn't work, however, you want to construct your queries to minimize the number of DB library and I/O calls. For example, if you execute a sequence of select statements, your results should come back with all the appropriate data in the order you requested it; you just have to keep track of which select statement each result corresponds to. Alternatively, if you can use looser query criteria so that a single query returns more than one row, that ought to help performance as well.
Imagine a desktop application - it could best be described as record keeping, where the user inserts/views records - that relies on a DB back-end which contains large object hierarchies and properties. How should data retrieval be handled?
Should all the data be loaded at start-up and stored in corresponding classes/structures for later manipulation, or should data be retrieved only when needed, stored in mock-up classes/structures, and then reused later instead of being requested from the DB again?
As far as I can see, the former approach would require a bigger memory footprint and a possible wait at start-up (not so bad if a splash screen is displayed), while the latter could subject the user to delays during processing due to data retrieval and would require some expensive queries on the database, whose results and/or supporting data structures will most probably serve no purpose once used*.
Something tells me that the solution lies in an in-depth analysis which will lead to a mixture of the two approaches listed above, based on which data is used most frequently, but I am very interested in reading your thoughts, tips and real-life experiences on the topic.
For discussion's sake, I'm thinking about C++ and SQLite.
Thanks!
*assuming that operations performed on classes/objects are faster than having to run complicated queries on the DB.
EDIT
Some additional details:
No concurrent access to the data, meaning only one user works on the data, which is stored locally.
Data is written back in response to changes made by the user - i.e. with low frequency. This is not necessarily true for reading data from the DB, where I expect a few peaks of many reads which I'd like to be fast.
What I am most afraid of is the user getting a feeling of slowness when displaying a complex record (because it has to be read from the DB).
Use Lazy Load and Data Mapper (pg.165) patterns.
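To make that concrete, here's a minimal Lazy Load sketch in C++ against the SQLite C API; the "record" table, the "comment" column and the class shape are invented for illustration, and a Data Mapper would additionally move the SQL out of the domain class:

#include <sqlite3.h>
#include <string>

// A heavy field is only read from the DB the first time it is asked for,
// then cached in memory for subsequent accesses.
class Record {
public:
    Record(sqlite3* db, long id) : db_(db), id_(id), loaded_(false) {}

    const std::string& comment() {
        if (!loaded_) {  // not loaded yet -> hit the DB exactly once
            sqlite3_stmt* stmt = NULL;
            sqlite3_prepare_v2(db_, "SELECT comment FROM record WHERE id = ?1",
                               -1, &stmt, NULL);
            sqlite3_bind_int64(stmt, 1, id_);
            if (sqlite3_step(stmt) == SQLITE_ROW) {
                const unsigned char* text = sqlite3_column_text(stmt, 0);
                comment_ = text ? reinterpret_cast<const char*>(text) : "";
            }
            sqlite3_finalize(stmt);
            loaded_ = true;
        }
        return comment_;
    }

private:
    sqlite3*    db_;
    long        id_;
    bool        loaded_;
    std::string comment_;  // cached after first access
};

This keeps start-up instant while limiting the DB cost to the records the user actually opens.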
I think this question depends on too many variables for a concrete answer. What you should consider first is how much data you need to read from the database into your application. Further, how often are you sending that data back to the database and requesting new data? Also, will users be working on the data concurrently? If so, loading all the data initially is probably not a good idea.
After your edits, I would say it's probably better to leave the data in the database. If you are going to be accessing it with relatively low frequency, there is no reason to load it up or otherwise try to cache it in your application at launch. Of course, only you know your application best and should decide which bits may be loaded up front to improve performance.
You might consider using an intermediate server (WCF) that holds cached data from the database in memory; that way users don't have to go to the database every time. Also, since it is a single access point for all users, if somebody changes or adds a record you can update the cache as well. Static data can be reloaded every x hours (for example, every hour). It still might not be the best option, since data needs to be marshaled from server to client, but you can use the netTcp binding where possible, which is fast and keeps messages small.