Creating an index on ARRAY<STRING(MAX)> in Google Cloud Spanner

I'm trying to create an index on the AlbumTokens column in my Google Cloud Spanner test database and I get a mysterious error referencing an index option that is not currently documented:
CREATE INDEX AlbumTokens
ON Albums (
AlbumTokens
)
>>> Index AlbumTokens references ARRAY AlbumTokens, but is not declared as DISTINCT_ARRAY_ELEMENT index.
Is it possible to do this? If so, how?
I'm using the sample schema with an ARRAY<STRING> column added on:
CREATE TABLE Singers (
SingerId INT64 NOT NULL,
FirstName STRING(1024),
LastName STRING(1024),
SingerInfo BYTES(MAX),
) PRIMARY KEY (SingerId)
CREATE TABLE Albums (
SingerId INT64 NOT NULL,
AlbumId INT64 NOT NULL,
AlbumTitle STRING(MAX),
AlbumTokens ARRAY<STRING(MAX)>,
) PRIMARY KEY (SingerId, AlbumId),
INTERLEAVE IN PARENT Singers ON DELETE CASCADE

You can't create an index using an Array as a key:
Disallowed types
These cannot be of type ARRAY:
A table's key columns.
An index's key columns.
You can include the array in the index via the STORING clause, which lets queries return the array without joining back to the primary table, but you can't scan on it.
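For example, a minimal sketch against the Albums schema above (the index name and key column are illustrative):
CREATE INDEX AlbumsByAlbumTitle
ON Albums (AlbumTitle)
STORING (AlbumTokens)
A query that filters on AlbumTitle can then return AlbumTokens straight from the index, but there is still no way to seek into the index by an individual array element.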

Related

Oracle APEX 21.2.0 display image Primary Key

I want to display an image in a report column. My table has a composite primary key: ID and DATE.
When I add these columns in the BLOB attributes as Primary Key Column 1 and Primary Key Column 2, the report cannot find the data because of the DATE column. Is it a problem with the date format, or something else?
I'd suggest you use only one column as the primary key column (a sequence or, if your database version supports it, an identity column).
The [ID, DATE] combination you currently have can then be made a unique key (set both columns NOT NULL to "mimic" what the primary key would do).
Why? Although your data model is probably just fine, certain Apex functionalities "suffer" from such things and prefer single-column primary keys.
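A minimal sketch of that approach (table and column names are illustrative; the identity column assumes Oracle 12c or later):
CREATE TABLE report_images (
    pk_id    NUMBER GENERATED ALWAYS AS IDENTITY PRIMARY KEY,  -- single-column PK for Apex
    id       NUMBER NOT NULL,
    img_date DATE   NOT NULL,
    image    BLOB,
    CONSTRAINT report_images_uk UNIQUE (id, img_date)          -- "mimics" the old composite PK
);
The report's BLOB attributes then reference pk_id as the single Primary Key Column.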

how to insert/update data in sql database using azure databricks notebook jdbc

I found lots of examples for appending to or overwriting a SQL table from an Azure Databricks notebook, but no way to directly update or insert data using a query or otherwise.
For example, I want to update all rows where the (identity column) ID = 1143, so the steps I need to take are:
val srMaster = "(SELECT ID, userid,statusid,bloburl,changedby FROM SRMaster WHERE ID = 1143) srMaster"
val srMasterTable = spark.read.jdbc(url=jdbcUrl, table=srMaster,
properties=connectionProperties)
srMasterTable.createOrReplaceTempView("srMasterTable")
val srMasterTableUpdated = spark.sql("SELECT userid,statusid,bloburl,140 AS changedby FROM srMasterTable")
import org.apache.spark.sql.SaveMode
srMasterTableUpdated.write.mode(SaveMode.Overwrite)
.jdbc(jdbcUrl, "[dbo].[SRMaster]", connectionProperties)
Is there any other suitable way to achieve the same?
Note: the above code is also not working; it fails with SQLServerException: Could not drop object 'dbo.SRMaster' because it is referenced by a FOREIGN KEY constraint. So it looks like it drops and recreates the table, which is not a solution at all.
You can use INSERT ... SELECT to insert rows taken from another table.
Example: insert values from another table where a column matches:
INSERT INTO srMaster (userid, statusid, bloburl, changedby)
SELECT userid, statusid, bloburl, 140
FROM srMasterTable
WHERE ID = 1143;
or
update values in existing rows where a column value matches:
UPDATE srMaster SET userid = 1, statusid = 2, bloburl = 'https://url', changedby = 'user' WHERE ID = 1143;
or just insert multiple rows of values:
INSERT INTO srMaster VALUES
(1, 10, 'https://url1','user1'),
(2, 11, 'https://url2','user2');
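If the goal is a single update-or-insert statement issued from Databricks over a plain JDBC connection (rather than DataFrame.write, which rewrites whole tables), SQL Server's MERGE can combine both; a sketch, assuming the SRMaster columns from the question:
MERGE dbo.SRMaster AS t
USING (VALUES (1143, 1, 2, 'https://url', 140))
    AS s (ID, userid, statusid, bloburl, changedby)
ON t.ID = s.ID
WHEN MATCHED THEN
    UPDATE SET userid = s.userid, statusid = s.statusid,
               bloburl = s.bloburl, changedby = s.changedby
WHEN NOT MATCHED THEN
    INSERT (userid, statusid, bloburl, changedby)
    VALUES (s.userid, s.statusid, s.bloburl, s.changedby);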
In SQL Server, you cannot drop a table if it is referenced by a FOREIGN KEY constraint. You have to either drop the child tables before removing the parent table, or remove foreign key constraints.
For a parent table, you can use the below query to get foreign key constraint names and the referencing table names:
SELECT name AS 'Foreign Key Constraint Name',
OBJECT_SCHEMA_NAME(parent_object_id) + '.' + OBJECT_NAME(parent_object_id) AS 'Child Table'
FROM sys.foreign_keys
WHERE OBJECT_SCHEMA_NAME(referenced_object_id) = 'dbo' AND
OBJECT_NAME(referenced_object_id) = 'PARENT_TABLE'
Then you can alter the child table and drop the constraint by its name using the below statement:
ALTER TABLE dbo.childtable DROP CONSTRAINT FK_NAME;
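For instance, if the query above returned FK_SRMaster_Ref on dbo.SRDetail (both names hypothetical), the sequence would be:
ALTER TABLE dbo.SRDetail DROP CONSTRAINT FK_SRMaster_Ref;  -- remove the reference first
DROP TABLE dbo.SRMaster;                                   -- now the parent can be dropped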

Is it a good idea to have an index on a boolean?

I have a table with a boolean field, IsNew, that indicates whether or not the corresponding entity is new. I want to periodically query for all entities in a particular state. What are the implications of having an index on a boolean (or an enum)? Will it create a hotspot? Are there any limitations on QPS?
A secondary index is implemented internally as a table that has a primary key based on the declared secondary index key, plus whatever indexed table keys weren't mentioned in the secondary index explicitly. So, say you have a table like this:
CREATE TABLE UserThings (
UserId INT64 NOT NULL,
ThingId INT64 NOT NULL,
...
IsNew BOOL NOT NULL,
...
) PRIMARY KEY(UserId, ThingId), ...
And you create an index like this:
CREATE INDEX UserThingsByIsNew ON UserThings(IsNew, ThingId)
That'll create an internal table that looks something like this:
CREATE TABLE UserThingsByIsNew_Index (
IsNew BOOL,
ThingId INT64 NOT NULL,
UserId INT64 NOT NULL,
) PRIMARY KEY(IsNew, ThingId, UserId), ...
So, when you update rows of UserThings to change the value of the IsNew column, Cloud Spanner will delete the old row in UserThingsByIsNew_Index and insert a new one. This will tend to create a lot of churn in the index if the IsNew value of rows changes at a high frequency. That might not be a problem at all, but you will only really know by testing your scenario under a real-world workload for a sustained time.
If you don't update the IsNew field of entities too frequently, then you probably won't have any hot-spotting problems. That's why I mentioned earlier that Cloud Spanner also appends the original table keys to the keys of the index: assuming that your original table rows are well distributed by the table's keys, the portions of the index for IsNew=true and IsNew=false, respectively, will have a similar distribution and shouldn't cause a hotspot.
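For example, the periodic query for new entities can then read straight from the index (a sketch using the hypothetical UserThings schema above):
SELECT ThingId, UserId
FROM UserThings@{FORCE_INDEX=UserThingsByIsNew}
WHERE IsNew = true
Because ThingId and UserId are both part of the index key, this query is answerable from the index alone, with no join back to the base table.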

Slow Selection Query even after indexing the table (sqlite and c++)

Create tables
I have a database composed of two tables:
ENTITE_CANDIDATE
VARIATIONS
Tables are created by using the following queries:
CREATE TABLE IF NOT EXISTS ENTITE_CANDIDATE (ID INTEGER PRIMARY KEY NOT NULL, ID_KBP TEXT NOT NULL, wiki_title TEXT, type TEXT NOT NULL);
CREATE TABLE IF NOT EXISTS VARIATIONS (ID INTEGER PRIMARY KEY NOT NULL, ID_ENTITE INTEGER, NAME TEXT, TYPE TEXT, LANGUAGE TEXT, FOREIGN KEY(ID_ENTITE) REFERENCES ENTITE_CANDIDATE(ID));
Table ENTITE_CANDIDATE is composed of 818,742 records
Table VARIATIONS is composed of 154,716,653 records
Index tables
I indexed the previous tables by using the following queries:
CREATE INDEX var_id ON VARIATIONS (ID, ID_ENTITE, NAME);
CREATE INDEX entity_id ON ENTITE_CANDIDATE (ID, wiki_title);
Retrieve information
I want to retrieve from table VARIATIONS the following records:
"SELECT ID, ID_ENTITE, NAME FROM VARIATIONS WHERE NAME=foo ;"
Every select query takes around 5.414931 seconds. I know the table contains a very large number of records, but can I make the retrieval faster? Am I indexing the tables correctly?
The documentation says:
the index might be used if the initial columns of the index … appear in WHERE clause terms.
This query uses only the NAME column to search, so the var_id index cannot be used. (That index is useful only for lookups that use ID, which is mostly useless because the ID column is already indexed as PRIMARY KEY.)
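A better fit is an index whose leading column is NAME. Since ID is the INTEGER PRIMARY KEY (the rowid), it is stored in the index for free, so adding ID_ENTITE makes the index cover the whole query (a sketch):
CREATE INDEX var_name ON VARIATIONS (NAME, ID_ENTITE);
SELECT ID, ID_ENTITE, NAME FROM VARIATIONS WHERE NAME='foo';
SQLite can then answer the query with a single index lookup instead of scanning the 154-million-row table.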

sql Column with multiple values (query implementation in a cpp file)

I am using this link.
I have connected my cpp file in Eclipse to my database with 3 tables (two simple tables, Person and Item, and a third one, PersonsItems, that connects them). In the third table I use one simple primary key and then two foreign keys, like this:
CREATE TABLE PersonsItems(PersonsItemsId int not null auto_increment primary key,
Person_Id int not null,
Item_id int not null,
constraint fk_Person_id foreign key (Person_Id) references Person(PersonId),
constraint fk_Item_id foreign key (Item_id) references Items(ItemId));
So then, with embedded SQL in C, I want a Person to have multiple Items.
My code:
mysql_query(connection, \
"INSERT INTO PersonsItems(PersonsItemsId, Person_Id, Item_id) VALUES (1,1,5), (1,1,8);");
printf("%ld PersonsItems Row(s) Updated!\n", (long) mysql_affected_rows(connection));
//SELECT newly inserted record.
mysql_query(connection, \
"SELECT Order_id FROM PersonsItems");
//Resource struct with rows of returned data.
resource = mysql_use_result(connection);
// Fetch multiple results
while((result = mysql_fetch_row(resource))) {
printf("%s %s\n",result[0], result[1]);
}
My result is
-1 PersonsItems Row(s) Updated!
5
but with VALUES (1,1,5), (1,1,8);
I would like that to be
-1 PersonsItems Row(s) Updated!
5 8
Can someone tell me why this is not happening?
Kind regards.
I suspect this is because your first insert is failing with the following error:
Duplicate entry '1' for key 'PRIMARY'
Because you are trying to insert 1 twice into PersonsItemsId, which is the primary key and so has to be unique (it is also auto_increment, so there is no need to specify a value at all).
This is why rows affected is -1, and why in this line:
printf("%s %s\n",result[0], result[1]);
you are only seeing 5: the statement failed after the values (1,1,5) had already been inserted, so there is only one row of data in the table.
I think to get the behaviour you are expecting you need to use the ON DUPLICATE KEY UPDATE syntax:
INSERT INTO PersonsItems (PersonsItemsId, Person_Id, Item_id)
VALUES (1,1,5), (1,1,8)
ON DUPLICATE KEY UPDATE Person_Id = VALUES(Person_Id), Item_id = VALUES(Item_id);
Or do not specify the value for PersonsItemsId and let auto_increment do its thing:
INSERT INTO PersonsItems (Person_Id, Item_id)
VALUES (1,5), (1,8);
I think you have a typo or mistake in your two queries.
You are inserting "PersonsItemsId, Person_Id, Item_id"
INSERT INTO PersonsItems(PersonsItemsId, Person_Id, Item_id) VALUES (1,1,5), (1,1,8)
and then your select statement selects "Order_id".
SELECT Order_id FROM PersonsItems
In order to achieve 5, 8 as you request, your second query needs to be:
SELECT Item_id FROM PersonsItems
Edit to add:
Your primary key is auto_increment, so you don't need to pass it to your insert statement (in fact, it will error as you pass 1 twice).
You only need to insert your other columns:
INSERT INTO PersonsItems(Person_Id, Item_id) VALUES (1,5), (1,8)
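Putting both fixes together, the flow the asker wants looks like this (a sketch against the PersonsItems schema above):
INSERT INTO PersonsItems (Person_Id, Item_id) VALUES (1,5), (1,8);
-- read back the items for person 1:
SELECT Item_id FROM PersonsItems WHERE Person_Id = 1;
-- returns 5 and 8, matching the expected output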