Does Alia support insert-batch? (Clojure)

Cassaforte has an insert-batch function for inserting multiple rows into a Cassandra CQL table in one go.
I've recently switched to Alia and I'm wondering whether it offers the same. I can't see anything obvious in the documentation, and (hayt/values ..) seems to support only a single row insertion at a time.

Alia supports CQL batch inserts through the Hayt DSL:
(require '[qbits.alia :as alia]
         '[qbits.hayt :as hayt])
(alia/execute
  session
  (hayt/batch
    (hayt/queries
      (hayt/insert ...)
      (hayt/insert ...)
      (hayt/insert ...))))
As per the CQL specification, only DML statements (INSERT, UPDATE and DELETE) are supported inside a batch:
http://www.datastax.com/documentation/cql/3.0/cql/cql_reference/batch_r.html
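To make it concrete what a batch sent to Cassandra looks like, here is a minimal Python sketch that renders a list of rows into a single CQL BEGIN BATCH ... APPLY BATCH statement, roughly the kind of statement hayt/batch builds from its queries. The helper names (batch_insert, cql_literal) and the table/column names are hypothetical, and values are inlined as literals purely for illustration; real code should bind values as parameters instead of interpolating them into the query string.

```python
def cql_literal(value):
    """Render a Python value as a CQL literal (None, bool, int, float, str only)."""
    if value is None:
        return "null"
    if isinstance(value, bool):  # check bool before int: bool is an int subclass
        return "true" if value else "false"
    if isinstance(value, (int, float)):
        return repr(value)
    if isinstance(value, str):
        # CQL escapes a single quote inside a string literal by doubling it.
        return "'" + value.replace("'", "''") + "'"
    raise TypeError(f"unsupported value type: {type(value).__name__}")


def batch_insert(table, rows):
    """Build one CQL BATCH statement containing an INSERT per row (rows are dicts)."""
    if not rows:
        raise ValueError("a batch needs at least one statement")
    statements = []
    for row in rows:
        columns = ", ".join(row)
        values = ", ".join(cql_literal(v) for v in row.values())
        statements.append(f"INSERT INTO {table} ({columns}) VALUES ({values});")
    return "BEGIN BATCH\n  " + "\n  ".join(statements) + "\nAPPLY BATCH;"
```

For example, batch_insert("users", [{"id": 1, "name": "ann"}, {"id": 2, "name": "o'brien"}]) yields one statement with two INSERTs, so the whole set of rows travels to Cassandra in a single request.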

Related

Best practice for using Force_Index on spanner

I have a client application that queries data in Spanner.
Let's say I have a table with 10 columns, and my client application can search on a combination of those columns. Let's say I've added 5 indexes to optimise searching.
The page https://cloud.google.com/spanner/docs/sql-best-practices#secondary-indexes says:
In this scenario, Spanner automatically uses the secondary index SingersByLastName when executing the query (as long as three days have passed since database creation; see A note about new databases). However, it's best to explicitly tell Spanner to use that index by specifying an index directive in the FROM clause:
And https://cloud.google.com/spanner/docs/secondary-indexes#index-directive suggests:
When you use SQL to query a Spanner table, Spanner automatically uses any indexes that are likely to make the query more efficient. As a result, you don't need to specify an index for SQL queries. However, for queries that are critical for your workload, Google advises you to use FORCE_INDEX directives in your SQL statements for more consistent performance.
Both links suggest that YOU (the developer) should be supplying FORCE_INDEX on your queries. This means I now need business logic in my client that says something like:
if (object.SearchTermOne)
    queryBuilder.IndexToUse = "Idx_SearchTermOne"
This feels like I'm essentially trying to do the optimiser's job by choosing the index to use. It also means that if I add an extra index, I need a code change to make use of it.
So what are the best practices when it comes to using FORCE_INDEX in Spanner queries?
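The client-side logic described in the question could be sketched in Python as a lookup table from searchable column to index name, adding Spanner's FORCE_INDEX table hint (FROM Table@{FORCE_INDEX=IndexName}) only when a matching index is known. The column names, index names and the "index the first filtered column" policy are all hypothetical choices for illustration. Keeping the mapping in one place (or in configuration) limits, but does not remove, the code change needed when an index is added.

```python
# Hypothetical mapping: searchable column -> secondary index on that column.
INDEX_FOR_COLUMN = {
    "SearchTermOne": "Idx_SearchTermOne",
    "SearchTermTwo": "Idx_SearchTermTwo",
}


def build_search_query(table, filters):
    """Build a parameterised Spanner SQL query for equality filters on columns.

    Deliberately simple policy: if the first filtered column has a known index,
    force that index; otherwise leave the choice to the query optimizer.
    Index names come only from the fixed mapping above, and filter values are
    passed as query parameters (@name), never spliced into the SQL text.
    """
    if not filters:
        raise ValueError("at least one filter is required")
    columns = list(filters)
    index = INDEX_FOR_COLUMN.get(columns[0])
    # Doubled braces produce literal { } around the FORCE_INDEX hint.
    source = f"{table}@{{FORCE_INDEX={index}}}" if index else table
    where = " AND ".join(f"{col} = @{col}" for col in columns)
    return f"SELECT * FROM {source} WHERE {where}", dict(filters)
```

For example, build_search_query("MyTable", {"SearchTermOne": "x"}) returns the SQL text SELECT * FROM MyTable@{FORCE_INDEX=Idx_SearchTermOne} WHERE SearchTermOne = @SearchTermOne together with the parameter map.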
The best practice at this time is to use FORCE_INDEX as described in the documentation.
This feels like I'm essentially trying to do the job of the optimiser by setting the index to use.
I feel the same.
https://cloud.google.com/spanner/docs/secondary-indexes#index-directive
Note: The query optimizer requires up to three days to collect the database statistics required to select a secondary index for a SQL query. During this time, Cloud Spanner will not automatically use any indexes.
As this note says, even after enough data has been added for the index to be effective, it may take up to three days for the optimizer to start using it.
Queries during that time will probably be full scans.
If you want to avoid this without using FORCE_INDEX, you need to run the ANALYZE DDL statement manually so that the optimizer has fresh statistics.
https://cloud.google.com/blog/products/databases/a-technical-overview-of-cloud-spanners-query-optimizer
But none of this changes the fact that we are essentially trying to do the optimizer's job...

What is the LATEST ON syntax in QuestDB?

I'm using QuestDB and SQL for the first time, and I stumbled upon the LATEST ON syntax used in QuestDB. Can someone explain its usage and when to use it?
Quoted from the docs:
For scenarios where multiple time series are stored in the same table, it is relatively difficult to identify the latest items of these time series with standard SQL syntax. QuestDB introduces LATEST ON clause for a SELECT statement to remove boilerplate clutter and splice the table with relative ease.
For more information, visit the official documentation.
LATEST ON is to find the latest record for each unique time series in a table. See this page for some examples: https://questdb.io/docs/reference/sql/latest-on/
It gives you the latest available record for each combination of the PARTITION BY values, according to the ON timestamp.
It may be easier to understand with an example. If you go to https://demo.questdb.io, you can execute this query:
select * from trades latest on timestamp
partition by symbol, side
It will then show you the latest existing row for each combination of symbol and side. To do this in standard SQL, you would probably have to use a window function, something like this:
select * from
  (select *,
     ROW_NUMBER() over (partition by symbol, side
                        order by timestamp DESC) AS RowNumber
   from trades
   where timestamp > '2022-10-01') t
where t.RowNumber = 1
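The result both queries aim for (one row per partition key, namely the one with the greatest timestamp) can be sketched in plain Python. This illustrates the semantics only, not how QuestDB computes it. Rows are assumed to be dicts, the function name latest_on is hypothetical, and when two rows tie on the timestamp this sketch keeps the first one seen, which is an arbitrary choice.

```python
def latest_on(rows, ts_key, partition_keys):
    """Return the latest row (by ts_key) for each distinct combination of partition_keys."""
    latest = {}
    for row in rows:
        key = tuple(row[k] for k in partition_keys)
        current = latest.get(key)
        # Keep this row if it is the first one for this key or strictly newer.
        if current is None or row[ts_key] > current[ts_key]:
            latest[key] = row
    return list(latest.values())
```

For example, latest_on(trades, "timestamp", ["symbol", "side"]) returns one row per (symbol, side) pair, mirroring latest on timestamp partition by symbol, side.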
LATEST ON retrieves the latest entry by timestamp for a given key or combination of keys, for scenarios where multiple time series are stored in the same table.
Check this link for some examples: https://questdb.io/docs/reference/sql/latest-on/

Why are there so many duplicate SQL functions in Redshift?

In Amazon Redshift, I found that there are many SQL functions with different names that do exactly the same thing. The Redshift documentation even describes them as synonym functions.
For example, the STRPOS function has two synonyms: the CHARINDEX function and the POSITION function. All three do exactly the same thing: return the position of a substring within a specified string.
What is the reason for having three functions for exact same task? Is there any performance difference among these?
Possibly to make it more compatible with other forms of SQL.
For example, CHARINDEX is a function from Microsoft SQL Server, whereas POSITION is a function from PostgreSQL (on which Amazon Redshift is based).
Given that Redshift is from Amazon, I would guess that the answer is "because customers asked for it!"
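The three synonyms differ only in spelling and argument order: STRPOS(string, substring), CHARINDEX(substring, string) and POSITION(substring IN string). All return the 1-based position of the first occurrence, or 0 when the substring is not found. A rough Python equivalent of that shared behaviour, for illustration only (the function name is hypothetical, and the SQL functions' handling of an empty substring is not modelled):

```python
def strpos(string, substring):
    """Mimic STRPOS/CHARINDEX/POSITION: 1-based index of the first match, 0 if absent.

    Assumes a non-empty substring.
    """
    # str.find returns a 0-based index, or -1 when there is no match,
    # so adding 1 gives the 1-based / 0-for-missing convention directly.
    return string.find(substring) + 1
```

Because the synonyms map onto the same operation, there is no reason to expect a performance difference from choosing one spelling over another.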

Does Superset support query/join multiple tables at a time?

The Superset documentation says that it is impossible to query/join multiple tables:
Can I query/join multiple tables at one time?
Not directly no. A Superset SQLAlchemy datasource can only be a single table or a view.
But from my experience I can do that without any problems. Is it outdated documentation, or something that I don't yet understand?
Yes, Superset's SQL Lab supports joining multiple tables in one query:
select * from charles.m_jdbc_3 m left join druid.druid_supervisors d on m.id=d.id
where m.id={{id_value}} and m.value={{value_key}}
It is not possible to do joins simply in a Superset datasource; see:
https://github.com/apache/incubator-superset/issues/875

Borland C++ Builder Oracle question

I have a Borland C++ Builder 6 application that queries an Oracle 10g database over a LAN. When the application makes a simple database select, e.g.
select table_name from element_tablenames where element_id = 10023842
the Oracle performance logs record the following:
select table_name
from element_tablenames
where element_id = 10023842
then, immediately afterwards (and not from the C++ source code, but perhaps from somewhere deeper):
select table_name, element_tablenames.ROWID
from element_tablenames
where element_id = 10023842
The select statement is only called once in the TADODbQuery object, yet two queries are performed: one to parse, and another that adds the ROWID for execution.
Over a WAN, with many, many queries, this is obviously a problem for the user.
Does anyone know why this might be happening, or can someone suggest a solution?
Agree with Robert.
The ROWID uniquely identifies a row in a table so that the returned record can be applied back to the database with any changes (or as a DELETE).
Is there a way to identify a particular column (or set of columns) as a primary key so that it can be used to identify a row without using a ROWID?
I don't know exactly where the ROWID is coming from; it could be either the TAdoQuery implementation or the Oracle driver. But I am sure I have found the reason.
From the Oracle docs:
If the database table does not contain a primary key, the ROWID must be selected explicitly when populating DataTable.
So I suspect your table does not have a primary key; either add one or select the ROWID explicitly.
Either way this will solve the duplicate query problem.
Since you are concerned about performance, some general advice:
Using TAdoQuery, you can set the CursorType to optimize for different behaviors. This article covers this from a TAdoQuery perspective. MSDN also has an article that covers it from a general ADO perspective. Finally, the specifications of the Oracle driver can be useful.
I would recommend setting the CursorType to one of the following, as they are the only ones supported by Oracle:
ctStatic - Bi-directional query produced.
ctOpenForwardOnly - Unidirectional query produced, fastest but can't call Prior
You can also experiment with CursorLocation to see how it affects your speed.