Synapse SQL on-demand firstrow skipping more than just the 1st row - azure-sqldw

I have noticed that when you set firstrow = 2, the result set has missing rows.
This is easy to reproduce:
The query below (against a public data source) returns 41165. Setting firstrow = 3 returns 41119, whereas I expected exactly one row fewer.
Interestingly, changing the query to select count(*) gives the expected behaviour (i.e. the row count decreases by 1 each time firstrow is incremented).
I noticed the issue while troubleshooting a sum function that returned less than I was expecting.
select COUNT(c1)
from openrowset(
bulk 'https://pandemicdatalake.blob.core.windows.net/public/curated/covid-19/ecdc_cases/latest/ecdc_cases.csv',
format = 'csv',
parser_version = '2.0',
firstrow = 2) as rows
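
The expectation described above (each increment of firstrow removes exactly one row from the count) can be sketched in Python. This is only an illustration of the expected semantics, not of how Synapse parses files; the sample CSV text and the helper name count_rows are made up for the example.

```python
import csv
import io

# Hypothetical CSV content: one header line followed by three data rows.
SAMPLE = "c1,c2\na,1\nb,2\nc,3\n"


def count_rows(text, firstrow):
    """Count the rows from the 1-based line number `firstrow` onwards."""
    reader = csv.reader(io.StringIO(text))
    return sum(
        1 for line_no, _ in enumerate(reader, start=1) if line_no >= firstrow
    )


# Each increment of firstrow should drop exactly one row from the count.
for firstrow in (1, 2, 3):
    print(firstrow, count_rows(SAMPLE, firstrow))
```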

Thank you for raising this; we are aware of the issue.
A fix will land soon.
In the meantime, you can use parser_version = '1.0'.
Try this query:
select COUNT(date_rep)
from openrowset(
bulk 'https://pandemicdatalake.blob.core.windows.net/public/curated/covid-19/ecdc_cases/latest/ecdc_cases.csv',
format = 'csv',
parser_version = '1.0',
firstrow = 3
) WITH (
[date_rep] datetime2,
[day] smallint,
[month] smallint,
[year] smallint,
[cases] smallint,
[deaths] smallint,
[countries_and_territories] VARCHAR (100)
) AS [r]

Power BI Calculated Column based on max date and a filter

I have a table that looks as follows:
I am trying to create a calculated column that is 1 when Attribute 1 is 'Actual' and the row is at the latest available date (e.g. 3 Nov in this example). The following DAX calculated column does not work. Would anyone know why?
LastDateFilter =
VAR MaxDate =
CALCULATE (
MAX ( 'Table'[Date]),'Table'[Attribute 1]="Actual")
RETURN
IF ('Table'[Date] = MaxDate, 1 ,0)
If I understood your question correctly, you want a result like the image you provided.
I have changed some names and dates but kept the structure.
This is my approach:
LastDateFilter =
VAR MaxDate =
CALCULATE (
MAX ( 'Tabla'[Date] ),
FILTER ( 'Tabla', 'Tabla'[At1] = "Actual" && NOT ISBLANK ( 'Tabla'[At2] ) )
)
RETURN
IF (('Tabla'[Date] == MaxDate &&'Tabla'[At1]=="Actual"), 1 ,0)
I added the filter for clarity, but it is not necessary.
It also works if you add more data:

DAX Measure for SQL NOT EXISTS

I would like to add an incremental refresh for one of our biggest transactional tables.
This transactional table has this structure:
Order  Value  Index1  Index2
100    5      1       0
101    5      2       0
102    6      3       0
103    2      4       0
103    3      5       4
104    4      6       0
Order: The order number
Value: The order value
Index1: Row Index total
Index2: Row Index, which should be replaced
As you can see for order 103, there are two rows (Index1 = 4 and 5). The booking with Index1 = 5 is the correction booking, which means the row with Index1 = 4 should be filtered out.
The SQL code I am using to filter all false entries is this one:
SELECT DBA1.Order,
DBA1.Value
FROM AZ.SC DBA1
WHERE NOT EXISTS (SELECT *
FROM AZ.SC DBA2
WHERE DBA2.INDEX2 = DBA1.INDEX1)
Since this SQL does not allow query folding, which is necessary for Power BI incremental refresh, I need an approach within a DAX measure that also filters out all false entries. But how?
Please let me know if you need further information.
If your idea is to import all of the rows and later filter the fact table, keeping only the updated ones, a possible solution is to add a calculated column that states whether the row is to be considered or not, and to use it in any measure over the fact table. This can be achieved, for instance, with the following DAX code:
IsValid =
VAR CurrentIndex = CALCULATETABLE( VALUES( SC[Index1] ) )
VAR RemovedIndexes = ALL( SC[Index2] )
RETURN
IF (
ISEMPTY( INTERSECT(CurrentIndex, RemovedIndexes) ),
1,
0
)
Otherwise, if the idea is to compute a calculated table with the older rows filtered out, a possible implementation is:
SC Filtered =
VAR AllIndexes = ALL( SC[Index1] )
VAR RemovedIndexes = ALL( SC[Index2] )
VAR ValidIndexes = EXCEPT( AllIndexes, RemovedIndexes )
RETURN
SUMMARIZECOLUMNS(
SC[Order],
SC[Value],
TREATAS( ValidIndexes, SC[Index1] )
)
But this might waste a lot of memory, since it almost duplicates the fact table.
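
The filtering rule shared by the SQL NOT EXISTS query and both DAX variants (drop every row whose Index1 appears in some row's Index2) can be sketched in Python using the sample rows from the question. The function name valid_rows is made up for the illustration; it mirrors the logic only and says nothing about how DAX evaluates it.

```python
# Sample rows from the question: (Order, Value, Index1, Index2).
rows = [
    (100, 5, 1, 0),
    (101, 5, 2, 0),
    (102, 6, 3, 0),
    (103, 2, 4, 0),
    (103, 3, 5, 4),
    (104, 4, 6, 0),
]


def valid_rows(rows):
    """Keep rows whose Index1 is not referenced by any row's Index2."""
    replaced = {index2 for _, _, _, index2 in rows}
    return [row for row in rows if row[2] not in replaced]


kept = valid_rows(rows)
# The row with Index1 = 4 is dropped because another row has Index2 = 4.
print([(order, value) for order, value, _, _ in kept])
```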

Exasol Update Table using subselect

I got this statement, which works in Oracle:
update table a set
a.attribute =
(select
round(sum(r.attribute1),4)
from table2 p, table3 r
where 1 = 1
and some joins
)
where 1 = 1
and a.attribute3 > 10
;
Now I would like to run the same statement in Exasol DB, but I get the error [Code: 0, SQL State: 0A000] Feature not supported: this kind of correlated subselect (Session: 1665921074538906818)
After some research, I found out that the query needs to be written in the following syntax:
UPDATE table a
set a.attribute = r.attribute2
FROM table a, table2 p, table3 r
where 1 = 1
and some joins
and a.attribute3 > 10;
The problem is that I can't take the sum of r.attribute2, so I get an unstable set of rows. Is there any way to do the first query in Exasol DB?
Thanks for the help, guys!
The following SQL UPDATE statement works if the JOIN between table1 and table2 is 1-to-1 (or if there is a 1-to-1 relation between the target table and the result set of the JOINs).
In that case the val column of the target table is updated; otherwise an error is returned.
UPDATE table1 AS a
SET a.val = table2.val
FROM table1, table2
WHERE table1.id = table2.id;
On the other hand, if the join returns multiple rows for a single table1 row, the unstable-set error is raised.
If you want to sum the column values of those multiple rows, the following approach may help.
First aggregate table2 by the join key (id) and use this sub-select as a temporary table, then use it in the UPDATE ... FROM statement:
UPDATE table1 AS a
SET a.val = table2.val
FROM table1
INNER JOIN (
select id, sum(val) val from table2 group by id
) table2
ON table1.id = table2.id;
I illustrated the solution using two tables.
In your case you will probably use table2 and table3 in the sub-select.
I hope this is the answer you were looking for.
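
The reason pre-aggregation avoids the unstable-set error can be sketched in Python: once table2 is summed per id, each target row matches at most one source row. The data and the function name update_with_sums are hypothetical and only mirror the logic of the UPDATE ... FROM statement above, not Exasol's behaviour in detail.

```python
from collections import defaultdict

# Hypothetical data: table1 maps id -> val; table2 has several rows per id.
table1 = {1: 0.0, 2: 0.0, 3: 0.0}
table2 = [(1, 1.5), (1, 2.5), (2, 4.0)]


def update_with_sums(target, source_rows):
    """Aggregate source rows per id, then update the matching target ids."""
    sums = defaultdict(float)
    for row_id, val in source_rows:
        sums[row_id] += val  # corresponds to GROUP BY id with SUM(val)
    updated = dict(target)
    for row_id, total in sums.items():
        if row_id in updated:  # corresponds to the INNER JOIN on id
            updated[row_id] = total
    return updated


result = update_with_sums(table1, table2)
print(result)
```

In this sketch id 3 has no match in table2, so its value is left unchanged.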

Filter existing table to another table without adding measures or column on existing table

I want to create a table based on an input table.
The input table is:
The new table filters the input table to show the last entry of every day.
I have tried working with measures, but sometimes I can't tell whether one is working correctly until I graph it in a pivot table. That is not so bad, but sometimes it just doesn't show me exactly what I need to see.
I have tried this measure:
History_Daily Efficiency =
VAR LastDailyEfficiency =
GENERATE(
VALUES ('Table_Full'[Cell]),
CALCULATETABLE (
TOPN (
1,
GROUPBY (
'Table_Full',
'Table_Full'[Date],
'Table_Full'[Time],
'Table_Full'[Efficiency]
),
'Table_Full'[Date], DESC,
'Table_Full'[Time], DESC,
'Table_Full'[Efficiency], ASC
)
)
)
RETURN
CALCULATE (
AVERAGE('Table_Full'[Efficiency]),
TREATAS( LastDailyEfficiency, 'Table_Full'[Cell], 'Table_Full'[Date], 'Table_Full'[Time], 'Table_Full'[Efficiency]),
'Table_Full'[Efficiency] < 80
)
But I got this:
I would like to see this as the output:
You can create a new table:
LastDayCount = GROUPBY(Table_Full;Table_Full[lob/Part Number];Table_Full[Date];"LastDate";MAXX(CURRENTGROUP(); Table_Full[DateTime]))
This will create a table with the last DateTime of each day.
Next we add a column giving the maximum part count at that last DateTime. I noticed that you have several entries with the same DateTime; the logic below takes the maximum part count at the end of the day when there is more than one entry.
Count =
CALCULATE(MAX(Table_Full[Part Count]);
FILTER(Table_Full;LastDayCount[Table_Full_lob/Part Number] = Table_Full[lob/Part Number]
&& LastDayCount[LastDate] = Table_Full[DateTime]))
End result:
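
The two-step logic in this answer (find the last timestamp per part and day, then take the maximum count at that timestamp) can be sketched in Python. The rows, column meanings, and the function name last_count_per_day are assumptions made for the illustration, not the asker's actual data.

```python
from datetime import datetime

# Hypothetical rows: (part_number, timestamp, part_count).
rows = [
    ("A", datetime(2020, 1, 1, 8, 0), 10),
    ("A", datetime(2020, 1, 1, 17, 0), 25),
    ("A", datetime(2020, 1, 1, 17, 0), 27),
    ("A", datetime(2020, 1, 2, 9, 0), 5),
    ("B", datetime(2020, 1, 1, 12, 0), 7),
]


def last_count_per_day(rows):
    """Map (part, date) to (last timestamp, max count at that timestamp)."""
    result = {}
    for part, ts, count in rows:
        key = (part, ts.date())
        current = result.get(key)
        if current is None or ts > current[0]:
            result[key] = (ts, count)  # a later entry replaces earlier ones
        elif ts == current[0]:
            result[key] = (ts, max(count, current[1]))  # tie: keep the max
    return result


for key, value in sorted(last_count_per_day(rows).items()):
    print(key, value)
```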

How to find the sum of values present in one table and missing in other table

I am trying to find the sum of the values corresponding to a key that is present in Table 1 but not in Table 2.
These tables have been created based on some filters and represent the values on 2 different dates.
The two different dates are chosen from 2 different date tables which have an inactive relationship, specially created for this purpose.
I want to create a measure that finds out the sum.
Below is the syntax I have used:
Difference =
VAR
Table1 = CALCULATETABLE(VALUES('TableA'[Id]), 'TableA'[Type] = "ABC", ALL('Date'), USERELATIONSHIP('Date'[As of Date], Previous_Date[Previous_Date]),USERELATIONSHIP('Date'[As of Date], 'TableA'[As of Date]))
VAR
Table2 = CALCULATETABLE(VALUES('TableA'[Id]), 'TableA'[Type] = "ABC", USERELATIONSHIP('Date'[As of Date], 'TableA'[As of Date]))
RETURN
IF(AND(VALUES('TableA'[Id]) IN Table1 , NOT(VALUES('TableA'[Id])) IN Table2),
CALCULATE(SUM('TableA'[Values])),0)
It is error-free. However, when I drop the measure on a KPI visual, I am getting the following message:
Please tell me what is wrong with the syntax. Also, please let me know if there is any better code that can be written.
Kindly help.
You are getting an error because x IN Table1 expects a single value for x, but VALUES can return a list rather than a single value.
I'd try something more like this after the RETURN:
SUMX (
'TableA',
IF ( 'TableA'[Id] IN Table1 && NOT ( 'TableA'[Id] IN Table2 ), 'TableA'[Values] )
)
This iterates through each row of TableA and checks if the Id value in each row is in the tables you calculated. If the condition is met, it adds the Value. Otherwise IF returns a blank (which is treated the same as 0 in a sum) since there is no third argument.
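
The row-by-row behaviour of SUMX with the IN checks can be sketched in Python. The sample rows, the two id sets, and the function name difference are hypothetical; the sets stand in for the Table1 and Table2 variables of the measure.

```python
# Hypothetical rows of TableA: (Id, Values).
table_a = [("k1", 10), ("k2", 20), ("k3", 30), ("k2", 5)]
table1_ids = {"k1", "k2"}  # stands in for the Table1 variable
table2_ids = {"k1", "k3"}  # stands in for the Table2 variable


def difference(rows, ids_in_first, ids_in_second):
    """Sum the values of rows whose id is in the first set but not the second."""
    total = 0
    for row_id, value in rows:
        # Each row is tested on its own, so the check sees a single id.
        if row_id in ids_in_first and row_id not in ids_in_second:
            total += value
    return total


print(difference(table_a, table1_ids, table2_ids))
```

Only the two "k2" rows qualify here, so the sketch adds 20 and 5.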