ColdFusion 9 - Top n random query results - coldfusion

I've got a series of queries that I do to get 5 results at random, the problem is that it is taking a while to get through them, mostly because it involves a loop to assign a rand value that I can order by (which Railo can do in-query)
I was wondering if anyone has dealt with this and knows of a way of speeding it up.
I'm below 200ms, which isn't bad but I'm sure it can be sped up.

You probably don't need to use QoQ at all.
One option might be to write your original query as:
SELECT TOP 5 whatever,you,need
FROM table
ORDER BY rand()
Update the syntax depending on which database server you're using.
Another option, which could be done for both regular queries and QoQ, would be:
select only the primary keys
shuffle the array (i.e. createObject("java","java.util.Collections").shuffle(Array))
use the first five items in the array to select the fields you need.
No looping or updating, just two simple selects.
Of course if your primary key is just an auto-incrementing integer, you might get away with SELECT MAX(Id) then use RandRange to pick your five items.

For Microsoft SQL Server (v2005+) this query syntax will get 5 random records:
SELECT TOP 5 *
FROM table
ORDER BY NEWID()

I'm on Railo (ColdFusion 9) and neither TOP nor NEWID() works in a Query of Query (QoQ). If you happen to fall into this use case, and you must act upon a QoQ, then here's a solution:
<cfquery name="randomizedQueryObject" dbtype="query" maxrows="10">
SELECT *, RAND() as rand
FROM someQueryObject
ORDER BY rand
</cfquery>
This returns 10 random items from a larger result set and works in a QoQ. Short and simple.

Related

Fastest way to select several inserted rows

I have a table in a database which stores items. Each item has a unique ID, which the DB generates upon insertion (auto-increment).
A user may perform a specific task that will add X items to the database, however my program (C++ server application using MySQL connector) should return the IDs that the database generated right away. For example, if I add 6 items, the server must return 6 new unique IDs to the client.
What is the fastest/cleanest way to do such thing? So far I have been doing INSERT followed by SELECT for each new item OR INSERT followed by last_insert_id, however if there are 50 items to add it will take a few seconds at least which is not good at all for user experience.
sql_task.query("INSERT INTO `ItemDB` (`ItemName`, `Type`, `Time`) VALUES ('%s', '%d', '%d')", strName.c_str(), uiType, uiTime);
Getting the ID:
uint64_t item_id { sql_task.last_id() }; //This calls mysql_insert_id
I believe you need to rethink your design slightly. Let's use the analogy of a sales order. With a sales order (or invoice #) the user gets an invoice number (auto_incr) as well as multiple line item numbers (also auto_inc).
The sales order and all of the line items are selected for insert (from the GUI) and the inserts are performed. First, the sales order row is inserted and its id is saved in a variable for subsequent calls to insert the line items. But the line items are then just inserted without immediate return of their auto_inc id values. The application is merely returned the sales order number in the end. How your app uses that sales order number in subsequent calls is up to you. But it does not need to be immediate to retrieve all the X or 50 rows at once, as it has the sales order number iced and saved somewhere. Let's call that sales order number XYZ.
When you actually need the information, an example call could look like
select lineItemId
from lineItems
where salesOrderNumber=XYZ
order by lineItemId
You need to remember that in a multi-user system that there is no guarantee of receiving a contiguous block of numbers. Nor should it matter to you, as they are all attached appropriately with the correct sales order number.
Again, the above is just an analogy, used for illustration purposes.
That's a common but hard to solve problem. Unsure for mysql, but PostreSQL uses sequences to generate automatic ids. Inserting frameworks (object relationnal mappers) use that when they expect to insert many values: they query directly the sequence for a bunch of IDs and then insert new rows using those already known IDs. That way, no need for an additional query after each insert to get the ID.
The downside is that the relation ID - insertion time can be non monotonic when different writers intermix their inserts. It is not a problem for the database, but some (poorly written?) program could expect it is.
As you ID is autoincremental, you can do only two SELECT queries - before and after INSERT queries:
SELECT AUTO_INCREMENT FROM information_schema.tables WHERE table_name = 'dbTable' AND table_schema = DATABASE();
--
-- INSERT INTO dbTable... (one or many, does not matter);
--
SELECT LAST_INSERT_ID() AS lastID;
This will give you the siquence between first and last inserted IDs. Then you can easily calculate how many they are.

Is it bad practice to use cfquery inside cfloop?

I am having an array of structure. I need to insert all the rows from that array to a table.
So I have simply used cfquery inside cfloop to insert into the database.
Some people suggested me not to use cfquery inside cfloop as each time it will make a new connection to the database.
But in my case Is there any way I can do this without using cfloop inside cfquery?
Its not so much about maintaining connections as hitting the server with 'n' requests to insert or update data for every iteration in the cfloop. This will seem ok with a test of a few records, but then when you throw it into production and your client pushes your application to look around a couple of hundred rows then you're going to hit the database server a couple of hundred times as well.
As Scott suggests you should see about looping around to build a single query rather than the multiple hits to the database. Looping around inside the cfquery has the benefit that you can use cfqueryparam, but if you can trust the data ie. it has already been sanatised, you might find it easier to use something like cfsavecontent to build up your query and output the string inside the cfquery at the end.
I have used both the query inside loop and loop inside query method. While having the loop inside the query is theoretically faster, it is not always the case. You have to try each method and see what works best in your situation.
Here is the syntax for loop inside query, using oracle for the sake of picking a database.
insert into table
(field1, field2, etc)
select null, null, etc
from dual
where 1 = 2
<cfloop>
union
select <cfqueryparam value="#value1#">
, <cfqueryparam value="#value2#">
etc
from dual
</cfloop>
Depending on the database, convert your array of structures to XML, then pass that as a single parameter to a stored procedure.
In the stored procedure, do an INSERT INTO SELECT, where the SELECT statement selects data from the XML packet. You could insert hundreds or thousands of records with a single INSERT statement this way.
Here's an example.
There is a limit to how many <CFQUERY><cfloop>... iterations you can do when using <cfqueryparam>. This is also vendor specific. If you do not know how many records you will be generating, it is best to remove <cfqueryparam>, if it is safe to do so. Make sure your data is coming from trusted sources & is sanitised. This approach can save huge amounts of processing time, because it is only make one call to the database server, unlike an outer loop.

ColdFusion (Railo) QoQ - Nulls Last

My understanding is that nulls last is not possible with QoQ. How do I trick coldfusion into sorting null values last whether i'm sorting the row ascending or descending?
I've tried using case in the SELECT and ORDER part of query, but looks like CF is not liking it (running on railo)
There may be better options, but one simple trick is to add a column representing sort priority. Assign records with non-null values a higher priority than null values. Then simply sort by the priority value first, then any other columns you want. Since the null values have a lower priority number, they will always sort last.
<!--- 1 - non-null values 2 - null values --->
SELECT 1 AS SortOrder, SomeColumn
FROM theQuery
WHERE SomeColumn IS NOT NULL
UNION ALL
SELECT 2 AS SortOrder, SomeColumn
FROM theQuery
WHERE SomeColumn IS NULL
ORDER BY SortOrder, SomeColumn ASC
(It is worth noting you could probably do something similar in your database query using order by instead of union.)
QoQ on both ColdFusion and Railo have a very limited SQL vocab, and there's nothing that deals with how to collate nulls. So as #Leigh has suggested, add another column - without any nulls - which represent the sorting you want.
Or, better, if possible deal with all this in the DB. Obviously this is not always possible (as the record set you're querying might not have come from a DB in the first place ;-)
...there was one more way, but it relies on values being NULL and not empty ''.
I'm going from memory here, but this is essentially it, using || only works if the value is non-null, so using the null values descending sort first, I get the values at the end.
<cfquery>
SELECT *, '1' || #sortCol# as isNull
FROM table
ORDER BY isNull desc, #sortCol#
</cfquery>
Note I'm not actually advocating the use of this and I'm not sure if CF would handle it the same way

Doctrine2 random query

I want to fetch 2 rows from my database randomly with doctrine2, and I can't manage to do it. I figured out that there is no possibility to do it easily with RAND(), but then which is the best solution?
And from the table I want to select rows which are for example for sale, I mark it with 1 in is_sale, so because of this I couldn't do it with simple offset.
Thanks
When asking this question on Twitter just now, I got pointed to this post about selecting random records. This guy makes a very valid point on the performance of using RAND(). I guess it's better to generate the random id's in the application, then select those records using Doctrine.

Size of data obtained from SQL query via ODBC API

Does anybody know how I can get the number of the elements (rows*cols) returned after I do an SQL query? If that can't be done, then is there something that's going to be relatively representative of the size of data I get back?
I'm trying to make a status bar that indicates how much of the returned data I have processed, so I want to be somewhere relatively close. Any ideas?
Please note that SQLRowCount only returns returns the number of rows affected by an UPDATE, INSERT, or DELETE statement; not the number of rows returned from a SELECT statement (as far as I can tell). So I can't multiply that directly to the SQLColCount.
My last option is to have a status bar that goes back and forth, indicating that data is being processed.
That is frequently a problem when you wan to reserve dynamic memory to hold the entire result set.
One technique is to return the count as part of the result set.
WITH
data AS
(
SELECT interesting-data
FROM interesting-table
WHERE some-condition
)
SELECT COUNT(*), data.*
from data
If you don't know beforehand what columns you are selecting
or use a *, like the example above,
then number of columns can be selected out of the USER_TAB_COLS table
SELECT COUNT(*)
FROM USER_TAB_COLS
WHERE TABLE_NAME = 'interesting-table'
SQLRowCount can return the number of rows for SELECT queries if the driver supports it. Many drivers dont however, because it can be expensive for the server to compute this. If you want to guarantee you always have a count, you must use COUNT(*), thus forcing the server into doing the potentially time consuming calculation (or causing it to delay returning any results until the entire result is known).
My suggestion would be to attempt SQLRowCount, so that the server or driver can decide if the number of rows is easily computable. If it returns a value, then multiply by the result from SQLNumResultCols. Otherwise, if it returns -1, use the back and forth status bar. Sometimes this is better because you can appear more responsive to the user.