Does a referenced query result in fetching the data again?

In Power Query I have a query that fetches data from tblSales and filters rows on amount > 100.
I want to reuse this query in another query. I have two options: duplicate or reference.
With duplicate, the steps from the original query are copied into the new query.
With reference, the new query references the original query.
I right-clicked the original query and selected Reference, then added additional row filters. Does the new query that is generated reuse the data fetched by the original query?

Yes, it will run the query again.
If you want to load the data only once, you need to wrap your base query in one of the Buffer functions:
Table.Buffer()
Binary.Buffer()
List.Buffer()
See here for more details
In this scenario, the data is loaded twice:
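A minimal M sketch of the unbuffered case; the source, table, and column names here are all hypothetical:

// Base query: Sales_Base
let
    Source = Sql.Database("server", "db"),
    tblSales = Source{[Schema = "dbo", Item = "tblSales"]}[Data],
    Filtered = Table.SelectRows(tblSales, each [Amount] > 100)
in
    Filtered

// Referencing query: evaluating it re-runs Sales_Base against the source
let
    Source = Sales_Base,
    MoreFiltered = Table.SelectRows(Source, each [Region] = "West")
in
    MoreFiltered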
In this scenario, the data is loaded once (Table.Buffer() on the base table):
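The same hypothetical base query, with buffering added as a final step:

// Base query: Sales_Base, now ending in Table.Buffer()
let
    Source = Sql.Database("server", "db"),
    tblSales = Source{[Schema = "dbo", Item = "tblSales"]}[Data],
    Filtered = Table.SelectRows(tblSales, each [Amount] > 100),
    Buffered = Table.Buffer(Filtered)  // hold the filtered rows in memory
in
    Buffered

One caveat worth knowing: buffering stops query folding, so any later filters run in memory instead of being pushed back to the source.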


How to use a query in another query?

If I want to use a query in another query, which should I use:
related
reference
duplicate
and what's the difference between these functions?
I'm not sure where you see "related", but the other two show up when you right-click a query in the Power Query Editor.
Duplicate is like copying and pasting a query: it creates another query with all of the same steps and code. It's independent, and modifying one query doesn't affect the other.
Reference does not reproduce the query. The new query starts exactly where the referenced query ended and is therefore dependent on it; if you change the referenced query, the new query is affected too.
Duplicate creates a new copy with all the existing steps. The new copy will be isolated from the original query. You can make changes in the original or the new query, and they will NOT affect each other.
Reference, on the other hand, creates a new query with only a single step: getting data from the original query.
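To make that concrete, the entire generated script of a referencing query is that one step; a minimal M sketch, assuming the original query is named SalesOver100:

let
    Source = SalesOver100
in
    Source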

Power BI: Convert Duplicate Table into a Reference Table

Is there an automated way to convert a Duplicate Table (with all its steps) into a Reference Table, preserving all the steps, in the Query Editor?
Short answer: not really, but it's possibly trivial to do manually for one query.
Reference Table and Duplicate Table are GUI operations which, like other GUI operations, simply insert M code into the query. You can see the entire query in the Advanced Editor.
Reference Table just inserts the name of the other query; the effect is branching the data-processing pipeline. If you change the original query, it affects all downstream queries.
Duplicate Table copies all of the steps; the effect is creating a separate, independent query. The two can diverge at any point later, and there is no link back to where the steps came from, even if they are never changed.
So it seems that you want to convert duplicated steps into references. There is no automated way of doing it, but if you know two queries start with the same steps, try this:
Duplicate one of them into a base query and remove the final steps that are not common to both.
Mark the base query not to load to the report: click All Properties, then uncheck Enable load to report.
In the Advanced Editor of each of the other queries, replace the duplicated initial steps with a reference to the base query, i.e. a step like Source = BaseQuery (see the sketch below).
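A hypothetical before/after in M, assuming the shared part was a filter on a tblSales table:

// BaseQuery (load to report disabled): the steps both queries shared
let
    Source = Sql.Database("server", "db"),
    tblSales = Source{[Schema = "dbo", Item = "tblSales"]}[Data],
    Filtered = Table.SelectRows(tblSales, each [Amount] > 100)
in
    Filtered

// One of the original queries, its duplicated opening steps replaced by a reference
let
    Source = BaseQuery,
    Sorted = Table.Sort(Source, {{"Amount", Order.Descending}})
in
    Sorted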
Also, if you find yourself duplicating steps in the middle of a query, you can factor them out into a query used as a function (sketched below).
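A minimal sketch of such a function query; the column names and logic are made up for illustration. The query's entire script is the function itself:

// Function query named AddMargin: the shared mid-query steps
(t as table) as table =>
    Table.AddColumn(t, "Margin", each [Amount] - [Cost], type number)

Any other query can then apply the shared steps with a single step such as Source = AddMargin(PreviousStep).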

Detecting delta records for nightly capture?

I have an existing HANA warehouse which was built without create/update timestamps. I need to generate a number of nightly batch delta files to send to another platform. My problem is how to detect which records are new or changed so that I can capture those records within the replication process.
Is there a way to use HANA's built-in features to detect new/changed records?
SAP HANA does not provide a general change-data-capture interface for tables (up to the current version, HANA 2 SPS 02).
That means that, to detect records changed since a given point in time, some other approach has to be taken.
Depending on the information in the tables, different options can be used:
If a table explicitly contains a reference to the last change time, that column can be used.
If a table has guaranteed update characteristics (e.g. no in-place updates and monotonically increasing ID values), those can be used: read all records where the ID is larger than the last processed ID (see the sketch after this list).
If the table provides no intrinsic information about change time, one can maintain a copy of the table containing only the records processed so far. This copy can then be compared against the current table to compute the difference. SAP HANA's Smart Data Integration (SDI) flowgraphs support this approach.
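A minimal SQL sketch of the ID-watermark variant; the schema, table, and column names are hypothetical:

-- :last_id is the highest ID processed by the previous nightly run
SELECT *
FROM "MYSCHEMA"."SALES"
WHERE "ID" > :last_id
ORDER BY "ID";

After the batch is exported, the MAX("ID") of the extracted rows is stored as the watermark for the next run.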
In my experience, efforts to "save time and money" on this seemingly simple problem of a delta load usually turn out to be more complex, time-consuming, and expensive than using the corresponding features of ETL tools.
It is possible to create a log table, with columns organized according to your needs, and populate it from triggers on your source tables so that every change produces a log record with a timestamp. You can then query the log table to determine which records were inserted, updated, or deleted in the source tables.
For example, the following is one of my test trigger codes:
-- log every UPDATE on SALARY as operation 'U' with the date of the change
CREATE TRIGGER "A00077387"."SALARY_A_UPD"
AFTER UPDATE ON "A00077387"."SALARY"
REFERENCING OLD ROW MYOLDROW, NEW ROW MYNEWROW
FOR EACH ROW
BEGIN
    INSERT INTO SalaryLog (
        Employee,
        Salary,
        Operation,
        DateTime
    ) VALUES (
        :mynewrow.Employee,
        :mynewrow.Salary,
        'U',
        CURRENT_DATE
    );
END;
You can create AFTER INSERT and AFTER DELETE triggers as well, similar to the AFTER UPDATE one; a sketch of the DELETE case follows.
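A hypothetical sketch of the DELETE counterpart, which reads from the OLD row instead of the NEW row:

-- log every DELETE on SALARY as operation 'D'
CREATE TRIGGER "A00077387"."SALARY_A_DEL"
AFTER DELETE ON "A00077387"."SALARY"
REFERENCING OLD ROW MYOLDROW
FOR EACH ROW
BEGIN
    INSERT INTO SalaryLog (Employee, Salary, Operation, DateTime)
    VALUES (:myoldrow.Employee, :myoldrow.Salary, 'D', CURRENT_DATE);
END;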
You can organize your log table so that it can track more than one source table if you wish, by storing the table name, PK fields and values, operation type, timestamp, etc.
But it is better and easier to use a separate log table for each source table.

Kettle PDI: how to pass multiple parameters not used in Table Input

I'm converting data from one database to another with a slightly different structure. At some point in my flow, I need to read data from the first database, filtering on the id coming from previous steps.
[Image of the transformation flow]
In the step "ZtlBus note" the query is:
SELECT e.*, UNIX_TIMESTAMP(ve.dataInserimento)*1000 AS timestamp
FROM verbale_evento ve
JOIN evento e ON ve.eventi_id = e.id
WHERE ve.Verbale_id = ? AND e.titolo = 'Note verbale'
Because I have just one parameter, in the previous step I use a Select values step. Unfortunately, after the Table Input I need other fields coming from previous steps (the Audit step), as marked in the picture.
I'm wondering how I can pass these fields through the Table Input. Any advice is appreciated.
If you use the "Database Join" step instead of the Table Input step, you will be able to keep the previous fields of your transformation: Database Join runs its query once per incoming row, filling the ? parameters from the stream, and appends the returned columns to that row, so the fields from earlier steps (such as the Audit step) flow through.

Fetching data from large BigQuery table in python

What I have is a BigQuery table (>5 million rows).
I need to fetch this data in batches and process it inside AppEngine, in Python.
The only way I know to fetch from a table is to run a SELECT query on it and then iterate over the result using the tokens that fetch_data returns.
It looks like this:
query = u"""\
SELECT url FROM %s
""" % (query_table)

# start an asynchronous query job with a unique job ID
query_job = client.run_async_query(str(uuid.uuid4()), query)
query_job.begin()
wait_for_job(query_job, 1)

# page through the results using the token that fetch_data returns
query_results = query_job.results()
rows, total_rows, next_token = query_results.fetch_data(
    max_results=per_page, page_token=page_token)
This works on smaller tables, but on larger ones like mine it asks me to allow large requests and specify a target table. That makes no sense to me: just to fetch data from a table, I have to copy it to another table?
What you are running into is described in this documentation. In summary, apart from the limit on how much data can be fetched at a time, there is a point where your results become "large results": when they exceed 128 MB compressed, as described here. When your results are classified as large, you can only store the result of a query in a table in BigQuery.
Unfortunately, I'm not sure there's a nice way to do what you want without reducing how many rows you are retrieving at once. What you'll likely need to do is explore the exporting data documentation for BigQuery.
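A rough sketch of that export route with a current google-cloud-bigquery client (the bucket and table names are hypothetical; the older client in the question predates this API): the table is dumped to Cloud Storage and processed from there.

from google.cloud import bigquery

client = bigquery.Client()

# export the table to newline-delimited JSON shards in GCS
extract_job = client.extract_table(
    "my_project.my_dataset.my_table",
    "gs://my-bucket/export-*.json",
    job_config=bigquery.ExtractJobConfig(
        destination_format=bigquery.DestinationFormat.NEWLINE_DELIMITED_JSON
    ),
)
extract_job.result()  # wait for the export to finish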
You should use the tabledata.list API for fetching data from a table.
Using the startIndex or pageToken parameters together with maxResults, you can control the size of the page you fetch.
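A minimal sketch of that approach with a current google-cloud-bigquery client, whose list_rows call wraps tabledata.list; the table name, page size, and process() handler are hypothetical:

from google.cloud import bigquery

client = bigquery.Client()
table_id = "my_project.my_dataset.my_table"

page_token = None
while True:
    # tabledata.list reads stored rows directly; no query job runs,
    # so the "large results" limit never applies
    batch = client.list_rows(table_id, max_results=10000, page_token=page_token)
    for row in batch:
        process(row)  # hypothetical per-row handler
    page_token = batch.next_page_token
    if page_token is None:
        break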
I think this is exactly what you need (link): as far as I understood, you can't fetch a large query result directly, but you can get the entire table's data into your app no matter how big it is. That's why you need to put the large result in a table first and then fetch that table's data into your app and do whatever you want with it.
Good luck :)