I am using sqlite from c++ and I want to implement a progress bar that will inform user about the progress of a search.
Using sqlite3_progress_handler I can set ca callback to be called every N virtual machine instructions. This is ok for an infinite progress bar that is notifying user the app is still working.
What I need is a progress from 0 -> 100%. Can this be done ?
I realize that this is a bit late, and that this question already has an accepted answer, but I think a little more information would be useful.
As the OP noted in the question, the sqlite3_progress_handler can be configured to call the callback function every N VM instructions. This is not a time progress monitor or a query statement progress monitor, but a monitor for the VM instructions that the query planner has calculated for the query (which will do in a pinch).
By prefacing the query with 'EXPLAIN QUERY PLAN' (or just 'EXPLAIN' for short) and stepping the results to get a count, you will know how many VM instructions are in the query plan. There's your 100% figure.
BE SURE to read the caveats on the SQLite.org website about the EXPLAIN QUERY PLAN command, especially the part about not relying on the output format. But for this situation, we're not concerned with the information in the results, but only the number of instructions.
As of SQLite version 3.24.0 (2018-06-04), the output format for the EXPLAIN QUERY PLAN command diverged significantly from the EXPLAIN command. EXPLAIN QUERY PLAN is no longer suitable for this use case; you have to specify the EXPLAIN command itself.
To be clear, the EXPLAIN command is not published as supporting the sqlite3_progress_handler() API. The fact that it can be used in this case is purely coincidental. You should always test that this works on whichever version of SQLite you are using.
It is not possible for the database to predict how much time (or how many VM instructions) a query will need.
Related
We just got a new Server at work that will be used only for running SAS code, and I've been asked to run some tests and make sure that it's performing better than our other servers. I'm not an expert at this so I want to make sure I avoid making naive mistakes that don't properly measure the performance of the server. My header looks like this:
options fullstimer;
%LET BenchStartTime = %sysfunc(datetime(),22.);
Which I use as a check for the "real time" report in the log. I have a vague understanding of the difference between "user cpu time" and "system cpu time", but if anyone wants to offer up additional information on that, that would be helpful.
Anyway, the main point of this post is that I want to know if there are any standard benchmark tests that I should be using in order to see if this new server is better than the old ones. Currently I'm using something I found online which is just appending a bunch of copies of sashelp.class (but I think this might be a bad idea because pulling from the C: and loading it into a different drive might be the same across all servers, right? If the C: is the slowest drive, doesn't that become the bottleneck?), and I'm also using code that basically generates a bunch of random data of a fixed size and comparing runtimes. Is this the correct approach? Is there something else I should be doing? How many times should I be running these benchmark tests to make sure that it isn't a fluke?
Thanks for your help!
I would test by doing the things that you normally do. If you run larger merges, then you're basically talking I/O; so just make a very large dataset, write it out, read it in, etc., and perform the same test on the other machine. Perform each test a few times each in a fresh SAS session.
Further, it sounds like you need to make sure the new server can handle multiple concurrent sessions. You can simulate this in part by submitting many connections from one computer by using SAS/CONNECT; that allows you to start multiple concurrent sessions. Then do so, perhaps I would create a script that starts a local SAS session, signs on to the server and rsubmits a job of a normal difficulty that takes maybe 5 to 10 minutes, and then run that script 20 times concurrently (you can script this or just double click a .bat file 20 times). See how it handles it as compared to the other server.
On SAS 9.2 and later you can use the %SASHOME%/SASFoundation/9.x/sasiotest.exe utility. You will find some further Guidance in Support note 51659.
Edit:
A similiar utility can be downloaded for Unix Plattform, details in Support note 51660.
I've got a web page loading pretty slowly, so I installed the Django Debug Toolbar. I'm pretty new at this, so I'm trying to figure out what I can do with it.
I can see the database did 264 queries in 205 ms. Looks kind of high. I'm pretty sure I can cut down on that by adding some indexes and just writing better queries. But my question is: What is a "good" number that should be trying to hit here? What is generally accepted as "fast enough" and further optimization isn't really worth it. 50ms? 20ms?
Also on this same page it's showing 2500ms in user CPU. That sounds terrible to me, and I'm surprised it's so much higher than the database, which I assumed was the bottleneck. Is this maybe an indication that I am trying to do too much in python code instead of at the database layer? Would reducing the number of SQL queries help with CPU? (Waiting between queries?). Again is there some well known target response time I should be aiming for.
I'm looking for a snappy response from my clients. Right now when I click around I can feel a "pregnant pause" before the pages load.
By default accessing related model fields results in one extra query per model per row. Look into select_related() and prefetch_related(), this usually cuts down number of queries and speeds things up by a lot. I think debug toolbar shows you the actual queries, if not, need to enable sql logs before doing any query optimizations. Once you cut down number of queries to a minimum (no extra queries per pow), look for the slowest query and use EXPLAIN sql syntax to see if indexes are being used, this is another area where it can get slow especially on big data.
Usually database is the bottleneck, unless you are doing some major looping in your code. If you believe python code is slow, then need to profile it, otherwise it's just guessing.
Is there a Progress profiling tool that allows me to see the queries executing against an OpenEdge database?
We're doing a migration from an OpenEdge database into a SQL database. In order to map the data correctly we'd like to run certain application reports on the OpenEdge database and see what database queries are being executed to retrieve the data.
Is this possible with some kind of Progress profiling tool (a la SQL Server Profiling)? Preferably free...
Progress is record oriented, not set oriented like SQL, so your reports aren't a single query or a set of queries, it is more likely a lot of record lookups combined with what you'd consider query-like operations.
Depending on the version you're running, there is a way to send a signal to the client to see what it is currently doing, however doing so will almost certainly not give you enough information to discern what's going on "under the hood."
Long story short, your options are to get a Dataserver product so you can attach the Progress client to an SQL database - this will enable you to use an SQL database w/out losing the Progress functionality. The second option is to get a copy of the program's source code to find out how the reports are structured.
Tim is quite right -- without the source code, looking at the queries is unlikely to provide you with much insight.
None the less there are some tools and capabilities that will provide information about queries. Probably the most useful for your purpose would be to specify something similar to:
-logentrytypes QryInfo -logginglevel 3 -clientlog "mylog.log"
at session startup.
You can use session triggers to identify almost anything done by any program, without modifying or having access to the source of those programs. Setting this up may be more work than is worth it for your purpose. We have a testing system built around this idea. One big flaw: triggers cannot be fired for CAN-FIND.
I have Neo4j v2.1.6 (default configuration) and Neo4j.rb v4.1.0. All queries are slow around 50ms. I have only 5 nodes in db.
For example:
User.find_by(person_id: 826268332)
CYPHER 47ms MATCH (n:`User`) WHERE (n.person_id = {n_person_id}) RETURN n LIMIT {limit_1} | {:n_person_id=>826268332, "limit_1"=>1}
Where can be a problem?
I'm one of the core maintainers of Neo4j.rb, along with Brian Underwood, who replied above. This is not exactly a full answer since we need to know more about your system to answer that, but I'm posting this here because it's too much for one comment.
My money is on something wrong with your DB or your system. We had a similar issue reported -- slow queries when working locally, no cause able to be determined -- for a user running Windows. See Neo4j.rb version 3.0 slow performance RoR, over 1024ms for all queries. We weren't able to pin it down. Locally, running that exact same query, I see 13ms the first time I run it and ~3ms every time after that. Indexing won't make a difference in a DB that small.
Ways to limit the chance of a problem and generally improve performance:
Use Ruby MRI 2.2.0
Use Neo4j 2.1.6 or 2.2.0
Use Mac or Linux, not Windows
Require the oj and oj_mimic_json gems in your app
You will see longer responses for a query like that if your db and app server are in two different networks.
Regarding the comment that this simple query is much faster in MongoDB and PostgreSQL: yes, it's going to be. Both of those return simple queries faster than Neo4j.rb for no fewer than two reasons:
The Ruby gems for connecting to those DBs do not use a REST interface, they use custom binary protocols.
Both of those are optimized for returning single records quickly, Neo is optimized for returning large groups of records quickly.
Before releasing Neo4j.rb 4.0, I did a ton of benchmarks against Postgres and MongoDB and found the same results: they crush us when returning single objects. (PostgreSQL is amazing technology general.) As soon as you start looking for related objects, though, things balance out, and as you add complexity, the difference becomes even more significant. I don't have any numbers to share, unfortunately, but I'll make a blog post about it sometime soon if I have some time.
That is strange. In the neo4j gem I often see simple queries run in around 1-5 ms.
For debugging, what if you did this?
User.where(yeti_person_id: 826268332).first
Also, what does this give you?
puts User.where(yeti_person_id: 826268332).to_cypher
We would like to be able to create intermediate releases of our software that would time-bomb or expire after a certain fixed time or number of uses that would not easily be manipulated. We are using Visual C++ with mixed native and managed assemblies.
I imagine we may need to rely on a registry tag but this seems to be insecure.
Can anyone offer some advice on how to do this?
I was working on a "trial-ware" solution a while back and it used a combination of registry keys, information stored in a flat-file at a certain position surrounded with junk data, and then also had an option to reach out to a webservice that would verify it back with the software creators.
However, as FrustratedWithFormsDesigner stated, there is no 100% fool-proof way to do this. There is always a way that a hacker can get around whatever precautions you put in place.
If you are using a database for the application, then it might be better to store a install (datetime) and a numberofusers (int) and then make code that checks those fields when the program is starting / loading / initing. If they are past a certain number or time (this could also be in the db) then exit the program.
This is very hard if not impossible to do in a foolproof way. In any event, there's nothing to stop somebody removing and reinstalling the software (you do support that, right?).
If you cannot limit the function of these intermediate releases (a much better incentive for people to move to official bits), it might be more trouble than it's worth to implement such a scheme.
Set a variable to a specific date in the program then every time the program is run access the system date and check if that date is equal to or greater than the specified date. If true then start the expiry process and display a message or alert panel to the user.
Have the binary download a tiny bit of code on startup from one of your servers.
Keep track of the activation counter on the server, when the counter reaches the limit, return a piece of code that displays the 'sorry!' message.
You could deploy it as a ClickOnce application with a certificate that expires at a certain date. If I recall correctly, the app will err on startup after that date.
A couple caveats:
The only option for the user may be to uninstall the app, which is a jerk move.
You will end up maintaining a ton of different deployments.
It will be a shock to the user as it will just happen without warning.