find unused and duplicate indexes in in-memory OLTP database - in-memory-database

I have a SQL Server 2016 in-memory OLTP database. I want to know how can I get the list of unused & duplicate hash index & non-clustered indexes there.

You can use the following query that maintains stats about your memory-optimized indexes: -
SELECT ix.index_id, ix.name, scans_started, rows_returned
FROM sys.dm_db_xtp_index_stats ixs JOIN sys.indexes ix ON ix.object_id=ixs.object_id AND ix.index_id=ixs.index_id
WHERE ix.object_id=object_id('**<table_name>**')
The rows_returned and scans_started are good indicators. More info here:
https://learn.microsoft.com/en-us/sql/relational-databases/system-dynamic-management-views/sys-dm-db-xtp-index-stats-transact-sql

Related

What is the best practice for loading data into BigQuery table?

Currently I'm loading data from Google Storage to stage_table_orders using WRITE_APPEND. Since this load both new and existed order there could be a case where same order has more than one version the field etl_timestamp tells which row is the most updated one.
then I WRITE_TRUNCATE my production_table_orders with query like:
select ...
from (
SELECT * , ROW_NUMBER() OVER
(PARTITION BY date_purchased, orderid order by etl_timestamp DESC) as rn
FROM `warehouse.stage_table_orders` )
where rn=1
Then the production_table_orders always contains the most updated version of each order.
This process is suppose to run every 3 minutes.
I'm wondering if this is the best practice.
I have around 20M rows. It seems not smart to WRITE_TRUNCATE 20M rows every 3 minutes.
Suggestion?
We are doing the same. To help improve performance though, try to partition the table by date_purchased and cluster by orderid.
Use a CTAS statement (to the table itself) as you cannot add partition after fact.
EDIT: use 2 tables and MERGE
Depending on your particular use case i.e. the number of fields that could be updated between old and new, you could use 2 tables, e.g. stage_table_orders for the imported records and final_table_orders as destination table and do
a MERGE like so:
MERGE final_table_orders F
USING stage_table_orders S
ON F.orderid = S.orderid AND
F.date_purchased = S.date_purchased
WHEN MATCHED THEN
UPDATE SET field_that_change = S.field_that_change
WHEN NOT MATCHED THEN
INSERT (field1, field2, ...) VALUES(S.field1, S.field2, ...)
Pro: efficient if few rows are "upserted", not millions (although not tested) + pruning partitions should work.
Con: you have to explicitly list the fields in the update and insert clauses. A one-time effort if schema is pretty much fixed.
There are may ways to de-duplicate and there is no one-size-fits-all. Search in SO for similar requests using ARRAY_AGG, or EXISTS with DELETE or UNION ALL,... Try them out and see which performs better for YOUR dataset.

Need to create Indices (Index) for Sqlite3 Database

I am using C++ - sqlite3 for creating a Database. The database is regularly updated and after point of time becomes very large in size (in GB's) which make it very slow when we execute query (via C++ syntax).
I read in sqlite site that for large Database tables we can create Index tables for optimization and speeding up database queries. I now have successfully written an index syntax just after creating my database.
sqlQuery << "CREATE INDEX IF NOT EXISTS 'IdxNode_Val' ON node_values (aliasDevice,aliasProperty,sourceTimestamp);";
int rc = sqlite3_exec(this->db, sqlQuery.str().c_str(), 0, 0, &zErrMsg);
Query:
The question is since my database gets updated on a daily basis (new entries added), will the index table also be updated automatically or should I have to write a syntax to update the index table just after my "insert into database" syntax?
Thanks & best Regards
rG
The database automatically updates indexes.

Does Cloud Spanner support a TRUNCATE TABLE command?

I want to clear all the values from a table. It has a few secondary indexes. I tried to do this via committing a transaction with Mutation.delete("MyTable", KeySet.all()) (see docs here). But I got an error:
error:INVALID_ARGUMENT: io.grpc.StatusRuntimeException: INVALID_ARGUMENT: The transaction contains too many mutations.
How can I efficiently clear my table contents?
Cloud Spanner does not support such a truncate command. If your table had no secondary indexes, then you could specify KeySet.all() in your Delete as specified above, but this can fail if your table has secondary indexes and is large.
The best way to do what you want is to issue an updateDdl RPC including the following statements:
1) For each secondary index on MyTable, include a corresponding DROP INDEX statement
2) DROP TABLE MyTable
3) If necessary, re-create your table and indexes via the CREATE TABLE and CREATE INDEX statements, respectively.
Note that you are allowed and encouraged to include all of these statements in a single updateDdl RPC. The advantage of this is that it gives you atomic ("all-or-nothing") semantics.

Cassandra: How to query the complete data set?

My table has 77k entries (number of entries keep increasing this a high rate), I need to make a select query in CQL 3. When I do select count(*) ... where (some_conditions) allow filtering I get:
count
-------
10000
(1 rows)
Default LIMIT of 10000 was used. Specify your own LIMIT clause to get more results.
Let's say the 23k rows satisfied this some_condition. The 10000 count above is of the first 10k rows of these 23k rows, right? But how do I get the actual count?
More importantly, How do I get access to all of these 23k rows, so that my python api can perform some in-memory operation on the data in some columns of the rows. Are there a some sort pagination principles in Cassandra CQL 3.
I know I can just increase the limit to a very large number but that's not efficient.
Working Hard is right, and LIMIT is probably what you want. But if you want to "page" through your results at a more detailed level, read through this DataStax document titled: Paging through unordered partitioner results.
This will involve using the token function on your partitioning key. If you want more detailed help than that, you'll have to post your schema.
While I cannot see your complete table schema, by virtue of the fact that you are using ALLOW FILTERING I can tell that you are doing something wrong. Cassandra was not designed to serve data based on multiple secondary indexes. That approach may work with a RDBMS, but over time that query will get really slow. You should really design a column family (table) to suit each query you intend to use frequently. ALLOW FILTERING is not a long-term solution, and should never be used in a production system.
you just have to specify limit with your query.
let's assume your database is containing under 1 lack records so if you will execute below query it will give you the actual count of the records in table.
select count(*) ... where (some_conditions) allow filtering limit 100000;
Another way is to write python code, the cqlsh indeed is python script.
use
statement = " select count(*) from SOME_TABLE"
future = session.execute_async(statement)
rows = future.result()
count = 0
for row in rows:
count = count + 1
the above is using cassandra python driver PAGE QUERY feature.

Partitioning a table in sybase-select query

My main concern:
I have an existing table with huge data.It is having a clustered index.
My c++ process has a list of many keys with which it checks whether the key exists in the table,
and if yes, it will then check the row in the table and the new row are similar. if there is a change the new row is updated in the table.
In general there will less changes. But its huge data in the table.
S it means there will be lot of select queries but not many update queries.
What I would I like to achieve:
I just read about partitioning a table in sybase here.
I just wanted to know will this be helpful for me, as I read in the article it mentions about the insert queries only. But how can I improve my select query performance.
Could anyone please suggest what should I look for in this case?
Yes it will improve your query (read) performance so long as your query is based on the partition keys defined. Indexes can also be partitioned and it stands to reason that a smaller index will mean faster read performance.
For example if you had a query like select * from contacts where lastName = 'Smith' and you have partitioned your table index based on first letter of lastName, then the server only has to search one partition "S" to retrieve its results.
Be warned that partitioning your data can be difficult if you have a lot of different query profiles. Queries that do not include the index partition key (e.g. lastName) such as select * from staff where created > [some_date] will then have to hit every index partition in order to retrieve it's result set.
No one can tell you what you should/shouldn't do as it is very application specific and you will have to perform your own analysis. Before meddling with partitions, my advice is to ensure you have the correct indexes in place, they are being hit by your queries (i.e. no table scans), and your server is appropriately resourced (i.e got enough fast disk and RAM), and you have tuned your server caches to suit your queries.