From the latest source code (not certain if it's C or C++) of MySQL, how does it do an autoincrement? I mean, is it efficient in that it stores like a metadata resource on the table where it last left off, or does it have to do a table scan to find the greatest ID in use in the table? Also, do you see any negative aspects of using autoincrement when you look at how it's implemented versus, say, PostgreSQL?
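That depends on which storage engine the table uses. InnoDB keeps the largest value in memory rather than looking it up on disk for each insert, which is very efficient. (This describes MySQL 5.7 and earlier; as of MySQL 8.0 the counter is also persisted across server restarts.) I would guess most engines do something similar, but I cannot guarantee it.
InnoDB runs the equivalent of the query below once after the server starts, when the table is first used, and stores the result in memory: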
SELECT MAX(ai_col) FROM t FOR UPDATE;
PostgreSQL has no AUTO_INCREMENT keyword, so the comparison depends on how you implement the field yourself. (At least it lacked one the last time I used it; that may have changed.) Most people would create a SEQUENCE, which appears to be stored in an in-memory pseudo-table. I find InnoDB's approach simpler and better, and I would guess it is the more efficient of the two if they are not equal.
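To make the "scan once, then count in memory" idea concrete, here is a minimal sketch in Python. It is not MySQL's implementation: the AutoIncrementCounter class is hypothetical, SQLite stands in for the storage engine, and the locking implied by FOR UPDATE is left out.

```python
import sqlite3


class AutoIncrementCounter:
    """Hypothetical helper: caches the next id in memory after one MAX() scan."""

    def __init__(self, conn, table, column):
        self.conn = conn
        self.table = table
        self.column = column
        self.next_value = None  # not initialised until the first allocation

    def allocate(self):
        if self.next_value is None:
            # One-time initialisation, analogous to SELECT MAX(ai_col) FROM t.
            row = self.conn.execute(
                f"SELECT MAX({self.column}) FROM {self.table}"
            ).fetchone()
            self.next_value = (row[0] or 0) + 1
        value = self.next_value
        self.next_value += 1  # later allocations never touch the table
        return value


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE t (ai_col INTEGER PRIMARY KEY, payload TEXT)")
conn.executemany("INSERT INTO t VALUES (?, ?)", [(1, "a"), (2, "b"), (7, "c")])

counter = AutoIncrementCounter(conn, "t", "ai_col")
first_id = counter.allocate()   # triggers the single MAX() query
second_id = counter.allocate()  # served from memory, no query
```

Only the first call pays for the MAX() lookup; every later call is a plain in-memory increment, which is why the approach is cheap.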
I have read over and over again that SQL, at its heart, is an unordered model. That means executing the same SQL query multiple times can return the result set in a different order, unless an "order by" clause is included. Can someone explain why a SQL query can return its result set in a different order on different runs? It may not always be the case, but it's certainly possible.
Algorithmically speaking, does the query plan not play any role in determining the order of the result set when there is no "order by" clause? I mean, when there is a query plan for some query, how does the algorithm not always return data in the same order?
Note: I am not questioning the use of order by; I am asking why there is no guarantee. That is, I am trying to understand the challenges that make a guarantee impossible.
Some SQL Server examples where the exact same execution plan can return differently ordered results are:
An unordered index scan might be carried out in either allocation order or key order, dependent on the isolation level in effect.
The merry-go-round scanning feature allows scans to be shared between concurrent queries.
Parallel plans are often nondeterministic, and the order of results might depend on the degree of parallelism selected at runtime and the concurrent workload on the server.
If the plan has nested loops with unordered prefetch, this allows the inner side of the join to proceed using data from whichever I/Os happened to complete first.
Martin Smith has some great examples, but the absolute dead simple way to demonstrate when SQL Server will change the plan used (and therefore the order in which a query without ORDER BY returns its rows) is to add a covering index. Take this simple example:
CREATE TABLE dbo.floob
(
blat INT PRIMARY KEY,
x VARCHAR(32)
);
INSERT dbo.floob VALUES(1,'zzz'),(2,'aaa'),(3,'mmm');
This will order by the clustered PK:
SELECT x FROM dbo.floob;
Results:
x
----
zzz
aaa
mmm
Now, let's add an index that happens to cover the query above.
CREATE INDEX x ON dbo.floob(x);
The index causes a recompile of the above query when we run it again; now it orders by the new index, because that index provides a more efficient way for SQL Server to return the results to satisfy the query:
SELECT x FROM dbo.floob;
Results:
x
----
aaa
mmm
zzz
Take a look at the plans - neither has a sort operator, they are just - without any other ordering input - relying on the inherent order of the index, and they are scanning the whole index because they have to (and the cheapest way for SQL Server to scan the index is in order). (Of course even in these simple cases, some of the factors in Martin's answer could influence a different order; but this holds true in the absence of any of those factors.)
As others have stated, the ONLY WAY TO RELY ON ORDER is to SPECIFY AN ORDER BY. Please write that down somewhere. It doesn't matter how many scenarios exist where this belief can break; the fact that there is even one makes it futile to try to find some guidelines for when you can be lazy and not use an ORDER BY clause. Just use it, always, or be prepared for the data to not always come back in the same order.
Some related thoughts on this:
Bad habits to kick : relying on undocumented behavior
Why people think some SQL Server 2000 behaviors live on… 12 years later
Quote from Wikipedia:
"As SQL is a declarative programming language, SELECT queries specify a result set, but do not specify how to calculate it. The database translates the query into a "query plan" which may vary between executions, database versions and database software. This functionality is called the "query optimizer" as it is responsible for finding the best possible execution plan for the query, within applicable constraints."
It all depends on what the query optimizer picks as a plan - table scan, index scan, index seek, etc.
Other factors that might influence picking a plan are table/index statistics and parameter sniffing to name a few.
In short, the order is never guaranteed without an ORDER BY clause.
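The floob example above can be replayed in a small, hedged sketch using Python's built-in sqlite3 module. SQLite stands in for SQL Server here, so its planner choices will not necessarily match; the index name idx_x and the helper fetch_x are made up for illustration. The only order the sketch relies on is the one produced by the query with ORDER BY.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE floob (blat INTEGER PRIMARY KEY, x TEXT)")
conn.executemany(
    "INSERT INTO floob VALUES (?, ?)", [(1, "zzz"), (2, "aaa"), (3, "mmm")]
)


def fetch_x(sql):
    """Run a query and return the first column of every row as a list."""
    return [row[0] for row in conn.execute(sql)]


# No ORDER BY: the order is whatever the chosen plan happens to produce.
unordered_before = fetch_x("SELECT x FROM floob")

# Adding a covering index may change the plan, and with it the row order.
conn.execute("CREATE INDEX idx_x ON floob(x)")
unordered_after = fetch_x("SELECT x FROM floob")

# With ORDER BY the order is part of the query's contract.
ordered = fetch_x("SELECT x FROM floob ORDER BY x")
```

Both unordered queries return the same set of rows, but only the last query promises a particular sequence, whatever indexes exist.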
It's simple: if you need the data ordered then use an ORDER BY. It's not hard!
It may not cause you a problem today or next week or even next month but one day it will.
I've been on a project where we needed to rewrite dozens (or maybe hundreds) of queries after an upgrade to Oracle 10g caused GROUP BY to be evaluated in a different way than it had been on Oracle 9i, meaning that the queries weren't necessarily ordered by the grouped columns anymore. Not fun, and simple to avoid.
Remember that SQL is a declarative language, so you are telling the DBMS what you want and the DBMS is then working out how to get it. It will bring back the same results every time but may evaluate the query in a different way each time: there are no guarantees.
Just one simple example of where this might cause you problems: new rows appear at the end of the table when you select from it... until they don't, because you've deleted some rows and the DBMS decides to fill in the empty space.
There are an unknowable number of ways it can go wrong unless you use ORDER BY.
Why does water boil at 100 degrees C? Because that's the way it's defined.
Why are there no guarantees about result ordering without an ORDER BY? Because that's the way it's defined.
The DBMS will probably use the same query plan the next time and that query plan will probably return the data in the same order: but that is not a guarantee, not even close to a guarantee.
If you don't specify an ORDER BY, the order depends on the plan the query uses. For example, if the query does a table scan and uses no index, the result will typically come back in the "natural order" or the order of the PK. However, if the plan decides to use IndexA, which is based on columnA, the rows will come back in columnA order. Make sense?
I have been reading the Amazon DynamoDB documentation to compare Global Secondary Indexes (GSI) and Local Secondary Indexes (LSI). I am still unclear whether, in the use case below, it matters which one I use. I am familiar with the basics, such as an LSI having to use the same partition key as the table.
Here is the use case:
I already know the sort key for my index.
My partition key is the same in both cases
I want to project ALL the attributes from original table onto my index
I know already prior to creating the table what index I need for my use case.
In the above use case, there seems to be no difference between an LSI and a GSI apart from a minor latency gain with the LSI, because the LSI might end up in the same shard. I want to understand the pros and cons in my use case.
Here are some questions that I am trying to find the answer to and I have not encountered a blog that is explicit about these:
Use GSI only because the partition key is different?
Use GSI even if the partition key is same but I did not know during table creation that I would need such an index?
Are there any other major reasons why one is superior to the other (barring basic stuff like the count limit of 5 vs 20)?
There are two more key differences that are not mentioned. You can see a full comparison between the two index types in the official documentation.
If you use an LSI, you can have a maximum of 10 GB of data per partition key value (table plus all LSIs). For some use cases, this is a deal breaker. Before you use an LSI, make sure this isn’t the case for you.
LSIs allow you to perform strongly consistent queries. This is the only real benefit of using an LSI.
The AWS general guidelines for indexes say:
In general, you should use global secondary indexes rather than local secondary indexes. The exception is when you need strong consistency in your query results, which a local secondary index can provide but a global secondary index cannot (global secondary index queries only support eventual consistency).
You may also find this SO answer a helpful discussion about why you should prefer a GSI over an LSI.
I'm going to use the same value in lots of statements in the SQL expression. Is it possible to declare a variable at the beginning of the query, assign the value to it, and then refer to the value through that variable?
(I'm writing an execution plan in WSO2 DAS)
This is not supported as of now. However, supporting this has been under discussion, hence this might be implemented in a future release.
If you want to store a value and use it in a query, the currently available ways are:
Putting that value into an indexed event table and then doing a join with the event table to read that value whenever required.
An indexed in-memory event table internally uses a hash map, so you could use one to store your variables: the key of the hash map is the name of your variable and the value of the hash map is the value of your variable.
However, I feel that the above solution is too complicated for your requirement.
Using the Map Extension in Siddhi
I posted a question yesterday and solved it by using multi_map:
Having a composite key for hash map in c++
This works like a charm, but a problem appears when the data set is big enough.
My data set has around 10M elements, and insertion takes over 350 s with an ordered index and 80 s with a hashed (unordered) index.
This is quite a long time compared with the map(pair, double) data structure, which took only 25 s.
Does anyone have any idea how to improve the speed? Memory consumption is okay, but the speed really matters to me.
Adding indices to a multi_index_container comes at a price in insertion time: roughly speaking, if you have four indices, insertion is as slow as inserting into four different one-index maps (in fact it's faster, as your figures show, since 80 < 4*25).
In your particular case, you can get rid of the last index: just use the composite key as your first index, since it will support lang1-only queries as well as (lang1,lang2) queries.
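Why one ordered composite-key index can answer both kinds of query is easier to see in a small sketch. The following is Python rather than Boost.MultiIndex, and the records, find_pair and find_lang1 are hypothetical; a sorted list searched with bisect plays the role of the ordered index.

```python
import bisect

# Hypothetical records: (lang1, lang2, score)
records = [
    ("en", "fr", 0.9),
    ("de", "en", 0.7),
    ("en", "de", 0.8),
    ("fr", "en", 0.6),
]

# A single ordered index keyed on the composite (lang1, lang2).
index = sorted(records, key=lambda r: (r[0], r[1]))
keys = [(r[0], r[1]) for r in index]


def find_pair(lang1, lang2):
    """Exact lookup on the full composite key."""
    lo = bisect.bisect_left(keys, (lang1, lang2))
    hi = bisect.bisect_right(keys, (lang1, lang2))
    return index[lo:hi]


def find_lang1(lang1):
    """Prefix lookup: all entries whose first key component is lang1."""
    # (lang1,) sorts before every (lang1, x), so it marks the start of the run.
    lo = bisect.bisect_left(keys, (lang1,))
    # lang1 + "\0" is the smallest string greater than lang1, so this marks
    # the first key whose lang1 component is larger.
    hi = bisect.bisect_left(keys, (lang1 + "\0",))
    return index[lo:hi]
```

Because the keys are sorted by lang1 first, all entries sharing a lang1 value sit next to each other, so a lang1-only query is just a range scan over the same index and a separate lang1 index adds insertion cost without adding capability.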
Have you considered using an actual database, like SQLite? When you get to wanting multiple indexes over 10+ million elements, that's generally the kind of thing you're looking for.
If an SQL-based database isn't useable, then you could use a non-SQL-based database. It's not the particular database that matters; just that you're using a database of some form.
I need to build a generic method in ColdFusion to compare two query result sets. Any ideas?
If you are looking to simply decide whether two queries are exactly alike, then you can do this:
if(serializeJSON(query1) eq serializeJSON(query2)) ...
This will convert both queries to strings and compare the strings.
If you're looking for more nuance, I believe Sergii's approach (convert to struct, compare keys) is probably the right one. You could "guard" it by adding simple checks first: do the column lists match? Is the recordcount the same? If either of those checks fails, you know that the queries can't possibly be equivalent, so it's safe to return false and avoid the performance hit of a full compare.
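The "cheap checks first, full compare last" idea can be sketched as follows. This is Python rather than ColdFusion, and queries_equal is a hypothetical function; each result set is assumed to be available as a list of column names plus a list of rows, and the comparison is order-sensitive, like the serializeJSON comparison above.

```python
import json


def queries_equal(columns1, rows1, columns2, rows2):
    """Return True if two result sets have the same columns and the same rows."""
    # Guard 1: different column lists can never be equivalent.
    if list(columns1) != list(columns2):
        return False
    # Guard 2: different record counts can never be equivalent.
    if len(rows1) != len(rows2):
        return False
    # Full compare: serialise both row lists and compare the strings.
    return json.dumps(rows1) == json.dumps(rows2)


same = queries_equal(
    ["id", "name"], [[1, "a"], [2, "b"]],
    ["id", "name"], [[1, "a"], [2, "b"]],
)
different_count = queries_equal(
    ["id", "name"], [[1, "a"]],
    ["id", "name"], [[1, "a"], [2, "b"]],
)
```

The two guards cost almost nothing, so mismatched queries are rejected before any serialisation work is done.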
If I understand you correctly, you have two result sets with same structure but different datasets (like selecting with different clauses).
If this is correct, I believe the better (more efficient) way is to try to solve this task at the database level, maybe with temporary/cumulative tables and/or a stored procedure.
Doing it in CF will almost definitely need a ton of loops, which can be inappropriate for large datasets. That said, I did something like this for small datasets using intermediate storage: I converted one result set into a structure and looped over the second query, checking the structure keys.