We have a database with 5 users.
All of them run queries against the database, and we need to find out what each user is doing, for example:
(
LOGIN_NAME,
QUERY_START_TIME,
QUERY_END_TIME,
total_elapsed_time,
QUERY
)
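The short answer is to query sys.dm_pdw_exec_requests. The login name is not in that view; join to sys.dm_pdw_exec_sessions on session_id to get login_name.
It is only the short answer because the view holds just the last 10,000 queries in a ring buffer.
For longer-term analytics, use Azure Monitor and log the DMVs to storage.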
I am trying to find the amount of data queried per statement from AWS Lambda on Redshift, but all I can find is amount of data queried per query ID. There are multiple lambdas which I am running but I can't seem to relate the lambdas to the query ID.
I looked through the documentation on the AWS Redshift system views, but there don't seem to be any tables that contain these values.
There are a few ways to do this. First, the Lambda can find its session ID with PG_BACKEND_PID() and log it, so that all statements from that session can be identified. Alternatively, you can add a unique comment to all the queries coming from the Lambda and search for it in svl_statementtext. Or you can do both. Once you have the query ID and session ID, look at the query statistics (SVL_QUERY_REPORT or other system tables).
Be aware that query IDs and session IDs repeat over time, so also check the date to make sure you are not looking at a query from some time ago.
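The comment-tagging approach could be sketched roughly as below. This is only an illustration: the helper names `tag_sql` and `lookup_sql`, the `lambda_invocation=` tag format, and the `sales` table are made up for the example, and the tag is assumed to be a plain hex string so it is safe to embed in the LIKE pattern. Inside a real Lambda handler, `context.aws_request_id` could serve as the tag instead of a random UUID.

```python
import uuid


def tag_sql(sql: str, invocation_id: str) -> str:
    """Prefix a statement with a comment identifying the Lambda invocation."""
    return f"/* lambda_invocation={invocation_id} */ {sql}"


def lookup_sql(invocation_id: str) -> str:
    """Build a query that finds the tagged statements in svl_statementtext.

    sequence = 0 selects the first 200-character chunk of each statement,
    which is where a leading comment ends up.
    """
    return (
        "SELECT pid, xid, starttime, endtime, text "
        "FROM svl_statementtext "
        f"WHERE sequence = 0 AND text LIKE '%lambda_invocation={invocation_id}%' "
        "ORDER BY starttime"
    )


# Hypothetical usage: one tag per Lambda invocation.
invocation_id = uuid.uuid4().hex
tagged = tag_sql("SELECT count(*) FROM sales", invocation_id)
print(tagged)
print(lookup_sql(invocation_id))
```

Note that the lookup query contains the tag itself, so it will also show up in its own results; filtering on starttime, as suggested above, helps to narrow things down.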
On our current project we have a web app with an analytics module. The users select some filters, and based on those filters a table or graph is shown. We want the module to be responsive, so that when users select filters they get the data in a matter of seconds.
The user filters query a large table with ~1,000,000,000 rows and 20 columns (for the next few years the row count should grow 2x/year). 18 out of the 20 columns are filterable, and most queries will be SELECT + WHERE.
We are not sure whether we should use a data warehouse or a classical DB.
Our research so far suggests choosing between ClickHouse, DynamoDB, Snowflake, BigQuery or Redshift. Has anyone had a similar use case, and which database solution would you recommend?
Since you are using the database for analytics, an OLAP database (such as Redshift) is recommended.
An OLAP database is designed to process large datasets quickly to answer questions about the data.
You can compare the pricing here:
https://medium.com/2359media/redshift-vs-bigquery-vs-snowflake-a-comparison-of-the-most-popular-data-warehouse-for-data-driven-cb1c10ac8555
We currently have a problem when refreshing the report data from the DB: it has too many records, and loading all of the data takes forever. How can I load only the data from the last year, to avoid the long load time? As far as I can see, the box for connecting to Cosmos DB allows me to enter an SQL query, but I don't know how to write one for this type of non-relational database.
Power BI has an incremental refresh feature. You should be able to refresh the current year only.
If that still doesn’t meet expectations I would look at a preview feature called Azure Synapse Link which automatically pulls all Cosmos DB updates out into analytical storage you can query much faster in Azure Synapse Analytics in order to refresh Power BI faster.
Depending on the volume of the data, you will hit a number of issues. The first is that you may exceed your RU limit, which slows down the extraction of the data from Cosmos DB. The second is transforming the data from JSON into a structured format.
I would try to write a query that specifies only the fields and items you need. That will reduce the time spent processing and retrieving the data.
For SQL queries it will be something like:
SELECT * FROM c WHERE c.partitionEntity = 'guid'
For more information on the syntax, see the Cosmos DB SQL API documentation to get started.
You can use the query window in Azure to run the SQL commands, or Azure Storage Explorer to test the query, then move it to Power BI.
It is highly recommended to extract the data to a place where it can be transformed into a structured format, such as a table or a CSV file.
For example, use Azure Databricks to extract the data, then turn the JSON into a table-formatted object.
You have the option of running Databricks notebook queries in Cosmos DB, or of using Azure Databricks in its own instance. Another option would be to use the change feed together with an Azure Function to shred the data and send it to Blob Storage, and then query it from there using Power BI, Databricks, Azure SQL Database, etc.
In the Source of your Query, you can make a select based on the CosmosDB _ts system property, like:
Query ="SELECT * FROM XYZ AS t WHERE t._ts > 1609455599"
In this case, 1609455599 is the Unix timestamp (in seconds) that corresponds to 31.12.2020, 23:59:59 in UTC+1 (22:59:59 UTC). So only data from 2021 onwards, in that time zone, will be selected.
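Rather than hard-coding the number, the cutoff can be computed. The sketch below assumes a UTC year boundary is wanted (the hard-coded value above reflects a local-time boundary, so it differs by an hour); the function names `year_start_ts` and `build_query` are made up for illustration, and `XYZ` is the placeholder container name from the query above.

```python
from datetime import datetime, timezone


def year_start_ts(year: int) -> int:
    """Unix timestamp in seconds for 1 January of `year`, 00:00:00 UTC."""
    return int(datetime(year, 1, 1, tzinfo=timezone.utc).timestamp())


def build_query(container: str, year: int) -> str:
    """Build a Cosmos DB SQL query that keeps items modified since `year` began.

    _ts holds the last-modified time of an item as seconds since the epoch.
    """
    return f"SELECT * FROM {container} AS t WHERE t._ts >= {year_start_ts(year)}"


print(build_query("XYZ", 2021))
# SELECT * FROM XYZ AS t WHERE t._ts >= 1609459200
```

Keep in mind that _ts records when an item was last modified, not a business date stored in the document, so this filter only matches "data from the year" if items are not updated later.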
Are there any system tables in Google BigQuery for checking all the currently running queries? I am looking for something similar to the V$SQL and V$SESSION views in Oracle.
You can query the INFORMATION_SCHEMA.JOBS_BY_* views to retrieve real-time metadata about BigQuery jobs. These views contain currently running jobs, as well as the last 180 days of history of completed jobs.
For example:
SELECT
job_id,
creation_time,
query
FROM `region-us`.INFORMATION_SCHEMA.JOBS_BY_USER
WHERE state != "DONE"
Note: Valid states include PENDING, RUNNING, and DONE.
We have sysdig running on our WSO2 API gateway machine, and we notice that the gateway fires a large number of SQL queries at the database for a minute, then waits a minute and repeats.
Every minute it goes wild, waits for a minute, and goes wild again with requests of the following format:
SELECT REG_PATH, REG_USER_ID, REG_LOGGED_TIME, REG_ACTION, REG_ACTION_DATA
FROM REG_LOG
WHERE REG_LOGGED_TIME>'2016-02-29 09:57:54'
AND REG_LOGGED_TIME<'2016-03-02 11:43:59.959' AND REG_TENANT_ID=-1234
There is no load on the server. What is causing this? What can we do to avoid this?
[Screenshot: sysdig output for the API gateway process]
This particular query comes from the registry indexing task that runs in the background. The REG_LOG table is queried periodically to retrieve the latest registry actions. The indexing task cannot be stopped. However, you can configure its frequency through the following parameter in registry.xml. See [1] for more information.
indexingFrequencyInSeconds
If this table fills up, you can clean out the data with a simple SQL query. However, when deleting records, be careful not to delete all the data. The latest record for each resource path should be left in the REG_LOG table, since reindexing requires at least one reference to each resource path.
Also, if required, you can take a dump of the data before clearing the REG_LOG table, in case you do not want to lose old records. Hope this answer provides the information you require.
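As a rough illustration of such a cleanup, the sketch below runs against an in-memory SQLite database with a simplified REG_LOG table; the column subset and the sample rows are assumptions made for the example, not the real WSO2 schema. It deletes every row that is older than the newest row for the same path and tenant. The DELETE syntax may need adapting for the production database (MySQL, for instance, does not allow a subquery on the table being deleted from), so treat it as a starting point and test it on a copy first.

```python
import sqlite3

# Simplified stand-in for the REG_LOG table (assumed subset of columns).
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE REG_LOG ("
    "REG_LOG_ID INTEGER PRIMARY KEY, REG_PATH TEXT, "
    "REG_LOGGED_TIME TEXT, REG_TENANT_ID INTEGER)"
)
conn.executemany(
    "INSERT INTO REG_LOG (REG_PATH, REG_LOGGED_TIME, REG_TENANT_ID) VALUES (?, ?, ?)",
    [
        ("/a", "2016-02-28 10:00:00", -1234),
        ("/a", "2016-03-01 10:00:00", -1234),
        ("/b", "2016-02-27 09:00:00", -1234),
    ],
)

# Delete rows that are older than the newest row for the same path and tenant,
# so the latest record of each resource path survives.
conn.execute(
    """
    DELETE FROM REG_LOG
    WHERE REG_LOGGED_TIME < (
        SELECT MAX(newer.REG_LOGGED_TIME)
        FROM REG_LOG AS newer
        WHERE newer.REG_PATH = REG_LOG.REG_PATH
          AND newer.REG_TENANT_ID = REG_LOG.REG_TENANT_ID
    )
    """
)

remaining = conn.execute(
    "SELECT REG_PATH, REG_LOGGED_TIME FROM REG_LOG ORDER BY REG_PATH"
).fetchall()
print(remaining)
# [('/a', '2016-03-01 10:00:00'), ('/b', '2016-02-27 09:00:00')]
```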
[1] - https://docs.wso2.com/display/Governance510/Configuration+for+Indexing