I have an inventory_history table that contains 2 million rows, on which I am performing an uncached lookup.
From the source table, I am retrieving the last 3 months of data, which is around 300 thousand rows.
My mapping contains a single, uncached lookup (on inventory_history). A lookup override is used to retrieve data from the inventory_history table; the condition columns are indexed and no unwanted columns are used.
But I see the lookup transformation's busy percentage is 100% while the others are below 100. The lookup override query executes quickly in the database, yet this mapping takes forever to run. How can I tune the performance?
I don't know where the problem lies... Any suggestions?
SELECT
    SUM(CASE WHEN UPPER(GM) = 'B' AND UNITS > 100
             THEN A.QTY / B.UNITS ELSE QTY END) AS QTY,
    A.TDATE    AS TDATE,
    A.TDATE_ID AS TDATE_ID,
    A.DIST_ID  AS DIST_ID,
    A.PROD_ID  AS PROD_ID
FROM
    HUSA_ODS.INVENTORY_HISTORY A,
    HUSA_ODS.PRODUCT B
WHERE
    A.PROD_ID = B.PROD_ID
    AND TCODE = '10'
    AND DISTID = ?DISTID_IN?
    AND A.PROD_ID = ?PROD_ID_IN?
    AND TDATE <= ?PERIOD_DATE_IN?
GROUP BY
    TDATE,
    TDATE_ID,
    DIST_ID,
    A.PROD_ID
ORDER BY
    TDATE DESC,
    DIST_ID,
    A.PROD_ID --
Here the output columns are QTY and TDATE.
An uncached lookup will hit the database once for each of your 300 thousand rows coming from the source. Use a cached lookup and see if you can filter out some of the data in your lookup query.
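For example, a cached lookup could be restricted to the same three-month window you read from the source, with DIST_ID, PROD_ID and TDATE becoming ordinary lookup conditions instead of the ?...? parameters. A sketch only, reusing the columns from your override and assuming an Oracle source (hence ADD_MONTHS/SYSDATE):
SELECT
    SUM(CASE WHEN UPPER(GM) = 'B' AND UNITS > 100
             THEN A.QTY / B.UNITS ELSE QTY END) AS QTY,
    A.TDATE    AS TDATE,
    A.TDATE_ID AS TDATE_ID,
    A.DIST_ID  AS DIST_ID,
    A.PROD_ID  AS PROD_ID
FROM
    HUSA_ODS.INVENTORY_HISTORY A,
    HUSA_ODS.PRODUCT B
WHERE
    A.PROD_ID = B.PROD_ID
    AND TCODE = '10'
    -- assumption: only the last 3 months are ever looked up, so cache only that window
    AND TDATE >= ADD_MONTHS(TRUNC(SYSDATE), -3)
GROUP BY
    TDATE,
    TDATE_ID,
    DIST_ID,
    A.PROD_ID
This way the override runs once to build the cache instead of once per source row, and the lookup comparisons happen in memory.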
I am trying to create on-the-fly currency conversion with many input currencies (InvoicesHeaders rows have different currencies, so each row has an amount and the currency code for that amount) and many output currencies (each affiliate wants to see figures in its own currency).
Therefore I end up with a many-to-many join between the invoice table and the currency table. To join them, I create in SQL a concatenated field with the day and the currency code.
Then (reusing a tutorial from the internet) I create a calculation doing a lookup from the invoice to the rate.
Amount adj :=
SUMX (
    Invoices,
    Invoices[TotalInvoiceAmount]
        / LOOKUPVALUE (
            ExchangeRatesPerDay[Rate],
            ExchangeRatesPerDay[ToCurrencyConcatenatedday112],
            Invoices[CurrencyCodeConcatenateInvoiceDate112]
        )
)
However, when I try to use this measure in Excel (filtering on one currency at a time, of course), I get an error message saying that many rows were passed where only one was expected.
From the error message, it looks like the lookup is returning multiple values, which is strange because in Excel I am filtering on one currency, so for each combination of day + currency code there should be only one row. I checked the SQL using this query:
with cte as (
SELECT [RateTypeName]
,[FromCurrency]
,[ToCurrency]
,[StartDate]
,[Rate]
,[EndDate]
,[ConversionFactor]
,[RateTypeDescription]
,[dday]
,[dday112]
,[ToCurrencyConcatenatedday112]
,[FromCurrencyConcatenatedday112]
, count(*) over (partition by [ToCurrencyConcatenatedday112],FromCurrency ) as co
FROM [stg].[ExchangeRatesPerDay]
)
select * from cte where co>1
And it doesn't return any record.
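One more thing worth checking (just a suggestion based on the columns in your query): the LOOKUPVALUE above filters only on ExchangeRatesPerDay[ToCurrencyConcatenatedday112], while the CTE checks uniqueness of that column together with FromCurrency. If the same day+ToCurrency key exists for several FromCurrency values, the lookup can still return multiple rows even though the CTE comes back empty. A check that mirrors exactly what LOOKUPVALUE sees:
with cte as (
    select [ToCurrencyConcatenatedday112]
         , count(*) over (partition by [ToCurrencyConcatenatedday112]) as co
    from [stg].[ExchangeRatesPerDay]
)
select * from cte where co > 1
If that returns rows, the lookup key itself is not unique and LOOKUPVALUE will fail regardless of the Excel filter.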
I would appreciate any ideas you may have.
Regards
Vincent
I don't understand why my answer has been deleted. Anyway, I am posting it back: I found this website https://www.kasperonbi.com/currency-conversion-in-dax-for-power-bi-and-ssas/ that provides an answer. I have been using this logic in production for over a month and it works great.
I have a customers table with IDs and some datetime columns, but those IDs have duplicates and I just want to analyse distinct ID values.
I tried using a group-by, but this makes the process very slow.
Due to data sensitivity, I can't share it.
Any suggestions would be helpful.
I'd suggest using ROW_NUMBER(). This lets you rank the rows by chosen columns, and you can then pick out the first result.
Given that you've shared no data, table, or column names, here's an example based on the AdventureWorks database. The technique will be the same: you partition by whatever makes the group of rows you want to deduplicate unique (ProductKey below) and order in a way that puts the version you want to keep first (TotalChildren, BirthDate, and CustomerKey in my example).
USE AdventureWorksDW2017;
WITH CustomersOrdered AS
(
SELECT S.ProductKey, C.CustomerKey, C.TotalChildren, C.BirthDate
, ROW_NUMBER() OVER (
PARTITION BY S.ProductKey
ORDER BY C.TotalChildren DESC, C.BirthDate DESC, C.CustomerKey ASC
) AS CustomerSequence
FROM dbo.FactInternetSales AS S
INNER JOIN dbo.DimCustomer AS C
ON S.CustomerKey = C.CustomerKey
)
SELECT ProductKey, CustomerKey
FROM CustomersOrdered
WHERE CustomerSequence = 1
ORDER BY ProductKey, CustomerKey;
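If the goal is to physically remove the duplicates rather than just filter them in a query, the same ROW_NUMBER() pattern can drive a delete. This is a sketch only, using a hypothetical dbo.Customers table with CustomerID and CreatedDate columns (substitute your own names):
WITH Ranked AS
(
    SELECT CustomerID, CreatedDate
         , ROW_NUMBER() OVER (
               PARTITION BY CustomerID      -- one group per duplicated ID
               ORDER BY CreatedDate DESC    -- the most recent row gets number 1
           ) AS RowNum
    FROM dbo.Customers
)
DELETE FROM Ranked
WHERE RowNum > 1;   -- keep only the first row of each group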
You can also just sort by the date column, then click on the ID column and delete duplicates...
I have a data warehouse where a lot of values are stored as coded values. Coded columns store a numeric value that relates to a row in the CODE_VALUE table. The row in the CODE_VALUE table contains descriptive information for the code. For example, the ADDRESS table has an ADDRESS_TYPE_CD column; the address type can be a home/office/postal address, etc. The output from selecting these columns would be a list of numbers such as 121234.0/2323234.0/2321344.0, so we need to query the CODE_VALUE table to get their descriptions.
We have created a function that hits the CODE_VALUE table and gets the description for these codes. But when I use the function to change codes to their descriptions, a query that otherwise takes a few seconds takes almost 15 minutes. So I was thinking of loading the table permanently into a cache. Any suggestions on how this can be dealt with?
A solution being used by another system is described below.
I have been using Cerner to query the database, which uses user access routines (UARs) to convert these code values, and they are very quick. Generally they are written in C++. The routine uses the global code cache to look up the display value for the code_value that is passed to it. That UAR never hits Oracle directly. The code cache does pull the values from the CODE_VALUE table and load them into memory, so the code cache system is hitting Oracle and doing memory swapping to cache the code values, but the UAR hits that cached data instead of querying the CODE_VALUE table.
EXAMPLE :
Person table
person_id(PK)
person_type_cd
birth_dt_cd
deceased_cd
race_cd
name
Visit table
visit_id(PK)
person_id(FK)
visit_type_cd
hospital_cd
visit_dt_tm
disch_dt_tm
reason_visit_cd
Address table
address_id(PK)
person_id(FK)
address_type_cd
street
suburb_cd
state_cd
country_cd
code_value table
code_value
code_set
description
DATA :
code_value table
code_value code_set description
visit_type :
121212 12 admitted
122233 12 emergency
121233 12 outpatient
address_type :
1234434 233 home
23234 233 office
343434 233 postal
ALTER function [dbo].[getDescByCv](@cv int)
returns varchar(80)
as begin
    -- Returns the code value display
    declare @ret varchar(80)

    select @ret = cv.DESCRIPTION
    from CODE_VALUE cv
    where cv.code_value = @cv
      and cv.active_ind = 1

    return isnull(@ret, 0)
end;
Final query :
SELECT
v.PERSON_ID as PersonID
, v.ENCNTR_ID as EncntrID
, [EMR_DWH].dbo.[getDescByCv](v.hospital_cd) as Hospital
, [EMR_DWH].dbo.[getDescByCv](v.visit_type_cd) as VisitType
from visit v
SELECT
  v.PERSON_ID as PersonID
, v.ENCNTR_ID as EncntrID
, [EMR_DWH].dbo.[getDescByCv](v.hospital_cd) as Hospital
, [EMR_DWH].dbo.[getDescByCv](v.visit_type_cd) as VisitType
, [EMR_DWH].dbo.[getDescByCv](p.person_type_cd) as PersonType
, [EMR_DWH].dbo.[getDescByCv](p.deceased_cd) as Deceased
, [EMR_DWH].dbo.[getDescByCv](a.address_type_cd) as AddressType
, [EMR_DWH].dbo.[getDescByCv](a.country_cd) as Country
from visit v
    ,person p
    ,address a
where v.visit_id = 102288.0
  and v.person_id = p.person_id
  and p.person_id = a.person_id
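Not the Cerner global-code-cache approach, but the usual set-based alternative is to join CODE_VALUE once per coded column instead of calling a scalar function per row; the engine can then scan or hash the small CODE_VALUE table once and effectively keep it in memory for the whole query. A sketch against the example tables above (active_ind taken from your function):
SELECT
    v.PERSON_ID          as PersonID
  , cv_hosp.description  as Hospital
  , cv_vtype.description as VisitType
  , cv_addr.description  as AddressType
from visit v
join person p
  on v.person_id = p.person_id
join address a
  on p.person_id = a.person_id
left join CODE_VALUE cv_hosp
  on cv_hosp.code_value = v.hospital_cd    and cv_hosp.active_ind = 1
left join CODE_VALUE cv_vtype
  on cv_vtype.code_value = v.visit_type_cd and cv_vtype.active_ind = 1
left join CODE_VALUE cv_addr
  on cv_addr.code_value = a.address_type_cd and cv_addr.active_ind = 1
where v.visit_id = 102288.0
If the function has to stay, rewriting it as an inline table-valued function and applying it with OUTER APPLY gives the optimizer a similar chance to avoid row-by-row execution.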
I am trying to get data insights into my calendar through visualizations in Power BI. I am able to get almost all of the data from my Outlook calendar using the in-house API in Power BI. I intend to find how many conflicting meetings I have per week, but I couldn't find any flag column for that. I'm trying to use time slicers to generate a what-if parameter to calculate a flag, but it doesn't work. Is there any way I can track conflicting meetings?
The data I have relative to meetings is as below -
You could add a Calculated Column to the dataset, with a formula like this:
Conflicting =
VAR StartDate = 'Calendar'[Start]
VAR EndDate = 'Calendar'[End]
VAR IDCurrent= 'Calendar'[Id]
RETURN
IF (
COUNTROWS(
FILTER (
ALL('Calendar');
'Calendar'[Start] < EndDate &&
'Calendar'[End] > StartDate &&
'Calendar'[Id] <> IDCurrent
)
) > 0; TRUE(); FALSE())
This formula checks if there are other rows within the same date range.
You can adjust the date comparisons based on your needs. I've got the logic from this post and removed the equal signs, to prevent contiguous items being marked as overlapping.
The Id column is the unique identifier (like a unique, primary key) automatically provided by Exchange Online. The filter on Id <> IDCurrent makes sure you don't mark the current row as overlapping, i.e. it searches all rows except the current one:
Result:
Edit: the formula above results in a true/false value. You can easily remove the IF statement to count the conflicting appointments, but remember that each conflict will then be counted twice (or more), once for each appointment involved.
This seemingly simple query (1 join) takes many hours to run even though the table contains fewer than 1.5 million rows...
I have Product items which have a one-to-many relationship with RetailerProduct items and I would like to find all Product items whose related RetailerProducts do not contain any instances of retailer_id=1.
There are about 1.5 million rows in Product, and about 1.1 million rows in RetailerProduct with retailer_id=1 (2.9 million in total in RetailerProduct)
Models:
class Product(models.Model):
    ...
    upc = models.CharField(max_length=96, unique=True)
    ...


class RetailerProduct(models.Model):
    ...
    product = models.ForeignKey('project.Product',
                                related_name='retailer_offerings',
                                on_delete=models.CASCADE,
                                null=True)
    ...

    class Meta:
        unique_together = (("retailer", "retailer_product_id", "retailer_sku"),)
Query:
Product.objects.exclude(
retailer_offerings__retailer_id=1).values_list('upc', flat=True)
Generated SQL:
SELECT "project_product"."upc" FROM "project_product"
WHERE NOT ("project_product"."id" IN
(SELECT U1."product_id" AS Col1 FROM "project_retailerproduct" U1
WHERE (U1."retailer_id" = 1 AND U1."product_id" IS NOT NULL))
)
Running that query takes hours.
An EXPLAIN in the psql shell renders:
QUERY PLAN
---------------------------------------------------------------------------------------------------
Seq Scan on project_product (cost=0.00..287784596160.17 rows=725892 width=13)
Filter: (NOT (SubPlan 1))
SubPlan 1
-> Materialize (cost=0.00..393961.19 rows=998211 width=4)
-> Seq Scan on project_retailerproduct u1 (cost=0.00..385070.14 rows=998211 width=4)
Filter: ((product_id IS NOT NULL) AND (retailer_id = 1))
(6 rows)
I wanted to post the EXPLAIN ANALYZE but it's still running.
Why is the cost so high for Seq Scan on project_product? Any optimization suggestions?
1.1 million rows in RetailerProduct with retailer_id=1 (2.9 million in total in RetailerProduct)
You are selecting 1.1 million rows out of 2.9 million. Even if you had an index on retailer_id it wouldn't be much use here; you are looking at almost half the table, so this will need a full table scan.
Then let us recall that WHERE NOT IN type queries are generally slow. In your case you are comparing the product_id column against 1.1 million rows. Having done that you are actually fetching the rows, which probably amounts to several hundred thousand rows. You might want to consider a LIMIT but even then the query probably wouldn't be a whole lot faster.
So this is not a query that can easily be optimized. You might want to use a completely different query. Here is an example raw query:
SELECT "project_product"."upc" FROM "project_product"
LEFT JOIN
    (SELECT product_id FROM "project_retailerproduct"
     WHERE retailer_id = 1) AS retailer
ON "project_product"."id" = retailer.product_id
WHERE retailer.product_id IS NULL
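Another option worth trying (my suggestion, not tested against this data) is NOT EXISTS, which PostgreSQL can usually plan as an anti-join rather than the materialised NOT IN subplan shown in the EXPLAIN:
SELECT p."upc"
FROM "project_product" AS p
WHERE NOT EXISTS (
    SELECT 1
    FROM "project_retailerproduct" AS rp
    WHERE rp."retailer_id" = 1      -- only the retailer to exclude
      AND rp."product_id" = p."id"  -- correlated to the outer product row
);
In the Django ORM this would correspond to annotating with Exists()/OuterRef() and filtering on the negated annotation, but even the raw form above should avoid the expensive per-row subplan.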