mysql++ (mysqlpp): how to get number of rows in result prior to iteration using fetch_row through UseQueryResult - c++

Is there an API call provided by mysql++ to get the number of rows returned by the result?
I have code structured as follows:
// ...
mysqlpp::Query q = conn.query(queryString);
if (mysqlpp::UseQueryResult res = q.use()) {
    // some code
    while (mysqlpp::Row row = res.fetch_row()) {
        // process row
    }
}
My previous question here would be solved easily if there were a function that returns the number of rows in the result. I could use it to allocate memory of that size and fill it in as I iterate row by row.

In case anyone runs into this:
I quote the user manual:
The most direct way to retrieve a result set is to use Query::store(). This returns a StoreQueryResult object,
which derives from std::vector, making it a random-access container of Rows. In turn,
each Row object is like a std::vector of String objects, one for each field in the result set. Therefore, you can
treat StoreQueryResult as a two-dimensional array: you can get the 5th field on the 2nd row by simply saying
result[1][4]. You can also access row elements by field name, like this: result[2]["price"].
AND
A less direct way of working with query results is to use Query::use(), which returns a UseQueryResult object.
This class acts like an STL input iterator rather than a std::vector: you walk through your result set processing
one row at a time, always going forward. You can’t seek around in the result set, and you can’t know how many
results are in the set until you find the end. In payment for that inconvenience, you get better memory efficiency,
because the entire result set doesn’t need to be stored in RAM. This is very useful when you need large result sets.
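In other words, if knowing the row count up front matters more than memory efficiency, Query::store() already provides it through StoreQueryResult. A minimal sketch of that approach (the table name is a placeholder; includes and error handling omitted):

// store() buffers the entire result set client-side, so the row count
// is known before iteration via num_rows().
mysqlpp::Query q = conn.query("SELECT * FROM myTable");
if (mysqlpp::StoreQueryResult res = q.store()) {
    std::cout << "rows: " << res.num_rows() << '\n';
    for (const mysqlpp::Row& row : res) {
        // fields are accessible as row[0] or row["price"]
    }
}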
A suggestion found here: http://lists.mysql.com/plusplus/9047
is to run a COUNT(*) query first, fetch that result, and then call Query::use(). To avoid an inconsistent count, one can wrap the two queries in one transaction (with InnoDB's default REPEATABLE READ isolation, both statements then see the same snapshot) as follows:
START TRANSACTION;
SELECT COUNT(*) FROM myTable;
SELECT * FROM myTable;
COMMIT;
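Putting the pieces together in mysql++, a sketch of the count-then-stream approach (myTable and the selected column are placeholders; error handling omitted):

// Get the row count first, so storage can be preallocated...
mysqlpp::Query countQ = conn.query("SELECT COUNT(*) FROM myTable");
mysqlpp::StoreQueryResult countRes = countQ.store();
unsigned long n = countRes[0][0];  // mysqlpp::String converts to numeric types

std::vector<std::string> values;
values.reserve(n);  // preallocate using the count

// ...then stream the rows with use(), keeping memory usage low.
mysqlpp::Query q = conn.query("SELECT myColumn FROM myTable");
if (mysqlpp::UseQueryResult res = q.use()) {
    while (mysqlpp::Row row = res.fetch_row())
        values.push_back(row[0].c_str());
}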

Related

Fastest way to select a lot of rows based on their ID in PostgreSQL?

I am using postgres with libpqxx, and I have a table that we will simplify down to
CREATE TABLE data_table (
    id   bytea PRIMARY KEY,
    size bigint
);
If I have a set of IDs in C++, e.g. std::unordered_set&lt;ObjectId&gt; Ids, what is the best way to get the id and size values out of data_table?
I have so far used a prepared statement:
constexpr const char* preparedStatement = "SELECT size FROM data_table WHERE id = $1";
Then, in a transaction, I have called that prepared statement for every entry in the set and retrieved the result for each one:
pqxx::work transaction(SomeExistingPqxxConnection);
std::unordered_map<ObjectId, uint32_t> result;
for (const auto& id : Ids)
{
auto transactionResult = transaction.exec_prepared(preparedStatement, ToPqxxBinaryString(id));
result.emplace(id, transactionResult[0][0].as<uint32_t>());
}
return result;
Because the set can contain tens of thousands of objects, and the table can contain millions, this can take quite some time to process, and I don't think it is a particularly efficient use of Postgres.
I am pretty much brand new to SQL, so I don't really know if what I am doing is the right way to go about this, or if there is a much more efficient way.
Edit: For what it's worth, the ObjectId class is basically a type wrapper over std::array&lt;uint8_t, 32&gt;, i.e. a 256-bit cryptographic hash.
The task as I understand it:
Get id (PK) and size (bigint) for "tens of thousands of objects" from a table with millions of rows and presumably several more columns ("simplified down").
The fastest way of retrieval is index-only scans. The cheapest way to get that in your particular case would be a "covering index" for your query by "including" the size column in the PK index like this (requires Postgres 11 or later):
CREATE TABLE data_table (
  id   bytea
, size bigint
, PRIMARY KEY (id) INCLUDE (size) -- !
);
About covering indexes:
Do covering indexes in PostgreSQL help JOIN columns?
Then retrieve all rows in a single query (or a few queries) for many IDs at once, like:
SELECT id, size
FROM   data_table
JOIN  (
   VALUES ('id1'), ('id2') -- many more
   ) t(id) USING (id);
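A sketch of issuing that one batched query through libpqxx, reusing the names from the question. It assumes ObjectId exposes its bytes via data()/size(), uses libpqxx's quote_raw() to produce an escaped bytea literal, and FromPqxxBinaryString() is a hypothetical inverse of the question's ToPqxxBinaryString():

// Build one VALUES list from all IDs and fetch every (id, size) pair
// in a single round trip instead of one query per ID.
pqxx::work txn(SomeExistingPqxxConnection);

std::string sql = "SELECT id, size FROM data_table JOIN (VALUES ";
bool first = true;
for (const auto& id : Ids) {
    sql += first ? "(" : ", (";
    first = false;
    sql += txn.quote_raw(id.data(), id.size());  // escaped, quoted bytea literal
    sql += "::bytea)";
}
sql += ") t(id) USING (id)";

std::unordered_map<ObjectId, uint64_t> result;
for (const auto& row : txn.exec(sql))
    result.emplace(FromPqxxBinaryString(row[0]), row[1].as<uint64_t>());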
Or one of the other methods laid out here:
Query table by indexes from integer array
Or create a temporary table and join to it.
But do not "insert all those IDs one by one into it". Use the much faster COPY (or the meta-command \copy in psql) to fill the temp table. See:
How to update selected rows with values from a CSV file in Postgres?
And you do not need an index on the temporary table, as it will be read in a sequential scan anyway. You only need the covering PK index I outlined.
You may want to ANALYZE the temporary table after filling it, to give Postgres some column statistics to work with. But as long as you get the index-only scans I am aiming for, you can skip that, too. The query plan won't get any better than that.
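For the temp-table route, a sketch using libpqxx 7's pqxx::stream_to, which wraps COPY. ToHexLiteral()/FromHex() are hypothetical helpers converting an ObjectId to and from Postgres's hex bytea text form ("\x" followed by hex digits):

// Bulk-load all IDs into a temp table with COPY, then join once.
pqxx::work txn(SomeExistingPqxxConnection);
txn.exec("CREATE TEMP TABLE tmp_ids (id bytea) ON COMMIT DROP");

auto stream = pqxx::stream_to::table(txn, {"tmp_ids"}, {"id"});
for (const auto& id : Ids)
    stream.write_values(ToHexLiteral(id));  // "\x..." text is parsed as bytea
stream.complete();

txn.exec("ANALYZE tmp_ids");  // optional: give the planner statistics

std::unordered_map<ObjectId, uint64_t> result;
for (const auto& row : txn.exec("SELECT id, size FROM data_table JOIN tmp_ids USING (id)"))
    result.emplace(FromHex(row[0].as<std::string>()), row[1].as<uint64_t>());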
The id is a primary key and so is indexed, so my first concern would be query setup time; a stored procedure is precompiled, for instance. A second tack is to put your set in a temp table, possibly also keyed on the id, so the two tables/indexes can be joined in one select. The indexes for this should be ordered (b-tree, not hash) so they can be merged.

Coldfusion Query broken column structure

From past experience, a ColdFusion query's column can be referenced like a 2D array. Sometimes, though, I get this issue:
create query from spreadsheet
put column names into an array
only get first element when trying to access the row
have to use a workaround
//imports is a query object after this function
var imports = convertSpreadsheetWOHeaders(spreadsheet, 1);
//this is used to give name values in the json struct, not to access the query columns
var jsonHeaders = ListToArray("number,constructiontype,description,site_address,parcel,permit_date,note_Applicant,note_Contractor,valuation,note_Bld_Fees,note_Other_Fees");
//this gives me ["col_1","col_2","col_3", ...etc]. used to access query columns
var columnHeaders = imports.getColumnNames();
writeDump(imports[columnHeaders[1]]);
writeDump(imports);
I am left with just the first element of column one, and of course I get:
Message: You have attempted to dereference a scalar variable of type class java.lang.String as a structure with members.
When trying to do this:
structInsert(jsonStruct,jsonHeaders[j],imports[columnHeaders[j]][i]);
However, this works:
writeDump(ListToArray(ArrayToList(imports[columnHeaders[1]],'|'),'|',true));
I first do a dump of imports["col_1"] and I get only the first element.
Then I do a dump of ListToArray(ArrayToList(imports["col_1"])) and it gives me the whole column.
Why can I not access the column correctly in the first place?
The Real Problem:
I was originally trying to access the jsonHeaders list as an array without doing ListToArray() on it first. After fixing that, my function worked.
BUT. This next part is helpful.
When accessing a query object, queryObject["columnName"] is treated as a shortcut for queryObject["columnName"][1], so ColdFusion will just give you the first element.
But when I wrote ListToArray(ArrayToList()), ColdFusion sees that ArrayToList() must take an array, so a special case kicks in and the column is returned as an array.
To get a column back as an array to work with you can do a couple things
ListToArray(ArrayToList(query["column"]));
ListToArray(valueList(query.column));
valueArray(query, "column");

Is there a way to get each row's value from a database into an array?

Say I have a query like the one below. What would be the best way to put each value into an array if I don't know how many results there will be? Normally I would do this with a loop, but I have no idea how many results there are. Would I need to run another query to count the results first?
<CFQUERY name="alllocations" DATASOURCE="#DS#">
SELECT locationID
FROM tblProjectLocations
WHERE projectID = '#ProjectName#'
</CFQUERY>
Depending on what you want to do with the array, you can just refer to the column directly for most array operations, eg:
i = arrayLen(alllocations["locationID"]);
Using that notation will work for most array operations.
Note that this doesn't "create an array"; it's simply that a query column (a coldfusion.sql.QueryColumn object) is close enough to a CFML array for CF to convert it to one when an array is needed. Hence the column can be passed to an array function.
What one cannot do is this:
myArray = q["locationID"];
This is because by default CF will treat q["locationID"] as a string if it can, and the string value is what's in the first row of the locationID column in the q query. It's only when an array is actually required that CF will convert it to an array instead. This is basically how loose typing works.
So if you just need to pass your query column to some function that expects an array, you can use the syntax above. If you want to actually put the column into a variable, then you will need to do something like this:
myArray = listToArray(valueList(q.locationID));
NB: make sure you use <cfqueryparam> on your filter values instead of hard-coding them into your SQL statement.
myquery.column.toArray() is also a good undocumented choice.
Since you're only retrieving 1 field value from the query, you could use ValueList() to convert the query results into a comma-delimited list of locationIds, then use listToArray() to change that list into an array.
If you were retrieving multiple field values from the query, then you'd want to loop through the query, copy all the field values from the given row into a struct, and then add that struct to an array using arrayAppend().
(If you're not familiar with these functions, you can look them up in the Adobe docs or on cfquickdocs.com).

Disable QSql(Relational)TableModel's prefetch/caching behaviour

For some (well, performance) reason, Qt's "model" classes only fetch 256 rows from the database, so if you want to append a row to the end of the recordset, you apparently must do something along the lines of:
while (model->canFetchMore()) {
model->fetchMore();
}
This does work, and when you do model->insertRow(model->rowCount()) afterwards, the row is indeed appended after the last row of the recordset.
There are various other problems related to this behaviour; for example, when you insert or remove rows from the model, the view that renders it gets redrawn with only 256 rows showing, and you must manually make sure that the missing rows are fetched again.
Is there a way to bypass this behaviour altogether? My model is very unlikely to display more than, say, 1000 rows, but getting it to retrieve those 1000 rows seems to be a royal pain. I understand that this is a great performance optimization if you have to deal with larger recordsets, but for me it is a burden rather than a boon.
The model needs to be writable so I can't simply use QSqlQueryModel instead of QSqlRelationalTableModel.
From the QSqlTableModel documentation:
bool QSqlTableModel::insertRecord ( int row, const QSqlRecord & record )
Inserts the record after row. If row is negative, the record will be appended to the end.
Calls insertRows() and setRecord() internally.
Returns true if the row could be inserted, otherwise false.
See also insertRows() and removeRows().
I've not tried yet, but I think that it's not necessary to fetch the complete dataset to insert a record at the end.
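A minimal sketch of that idea (untested, as noted above; the field name is a placeholder):

// insertRecord() with a negative row appends without pre-fetching
// the entire result set first.
QSqlRecord rec = model->record();               // empty record with the model's column layout
rec.setValue("name", QStringLiteral("new row"));
if (model->insertRecord(-1, rec))               // negative row appends at the end
    model->submitAll();                         // needed when the edit strategy is OnManualSubmit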

Size of data obtained from SQL query via ODBC API

Does anybody know how I can get the number of elements (rows * cols) returned after I run an SQL query? If that can't be done, is there something that would be relatively representative of the size of the data I get back?
I'm trying to make a status bar that indicates how much of the returned data I have processed, so I want to be somewhere relatively close. Any ideas?
Please note that SQLRowCount only returns the number of rows affected by an UPDATE, INSERT, or DELETE statement, not the number of rows returned by a SELECT statement (as far as I can tell). So I can't simply multiply it by the column count from SQLNumResultCols.
My last option is to have a status bar that goes back and forth, indicating that data is being processed.
That is frequently a problem when you want to reserve dynamic memory to hold the entire result set.
One technique is to return the count as part of the result set.
WITH data AS
(
    SELECT interesting-data
    FROM   interesting-table
    WHERE  some-condition
)
SELECT COUNT(*) OVER () AS row_count, data.*
FROM   data;
(COUNT(*) OVER () is a window function, so the total row count is attached to every row without collapsing the result into a single aggregate row.)
If you don't know beforehand what columns you are selecting, or you use a * as in the example above, then the number of columns can be selected from the USER_TAB_COLS view (Oracle):
SELECT COUNT(*)
FROM USER_TAB_COLS
WHERE TABLE_NAME = 'interesting-table'
SQLRowCount can return the number of rows for SELECT queries if the driver supports it. Many drivers don't, however, because it can be expensive for the server to compute. If you want to guarantee you always have a count, you must use COUNT(*), thus forcing the server into doing the potentially time-consuming calculation (or causing it to delay returning any results until the entire result set is known).
My suggestion would be to attempt SQLRowCount, so that the server or driver can decide whether the number of rows is easily computable. If it returns a value, multiply it by the result of SQLNumResultCols. Otherwise, if it returns -1, use the back-and-forth status bar. Sometimes this is better because you can appear more responsive to the user.
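That decision logic might look like this with the raw ODBC API (a sketch; handle setup, statement execution, and error handling omitted):

#include <sql.h>
#include <sqlext.h>

// After SQLExecDirect()/SQLExecute() on hstmt:
SQLLEN rows = 0;
SQLSMALLINT cols = 0;
SQLRETURN rc = SQLRowCount(hstmt, &rows);  // may be -1 for SELECTs
SQLNumResultCols(hstmt, &cols);

if (SQL_SUCCEEDED(rc) && rows >= 0) {
    SQLLEN total = rows * cols;  // total elements: drive a determinate progress bar
    // ... update progress out of `total` as rows are fetched
} else {
    // Driver can't (or won't) report the count: fall back to the
    // indeterminate, back-and-forth status bar.
}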