I've created a minimal reproducible example below for testing, using the free paginated API endpoint https://api.instantwebtools.net/v1/passenger?page=0&size=10 so anyone can run it. This endpoint works well because it closely resembles my real use case.
I'm iterating over the endpoint, replacing page=0 with page=i in the URL on each pass to retrieve multiple pages of data. Each API call returns a page of 10 records with no primary key or index of any kind.
What I'm looking for: every time I run this query, I want to stop the request iteration once we start seeing duplicate results that are already in the previously saved table. My use case makes thousands of API calls, and the latest information is always retrieved first, so I only need to add the newest records from the API to the top of my table. Once we start receiving data that's already in the table, we can stop requesting, since the remaining calls are unnecessary.
Specifically, I need a way to determine if all 10 records returned from the current query are already in the previously stored records, and stop the iteration if they are. Since there is no primary key or index of any kind, I think a good way to do this would be to compare the row values.
A pseudo-workflow would look something like:
Run the code once to retrieve and store some records.
Run the code again.
Upon retrieval of the current response, compare the records in the response with the records previously saved in the table:
If the response records are NOT in the table yet, add them to the top of the existing dataset and continue to the next iteration
If the response records ARE in the table, stop the iteration.
An idea I have to avoid a full double for loop, which matters at my scale, is to:
Check how many records are in the current response (this example API always returns 10; my real use case generally returns between 48 and 52, and it changes randomly, so make this dynamic).
Iterate through the currently saved table with a step size equal to the length of the response, and check whether the current response's "chunk" of records appears in the table. So if we start at j=0 in the table and the response has 9 records in it, compare rows j through j+(9-1) of the table with the current response rows; if they match, we can end the iteration.
This would be especially helpful for me because sometimes records are retroactively added to some previous datetimes, so to just stop iteration when only one duplicate row is found would not be enough. Being able to compare a chunk of roughly 50 rows for equality in the table, I can THEN assume that I can stop making any more requests as that could mark the end of any preceding changes.
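To make the chunk idea concrete, here is a rough M sketch. FnChunkExists is a made-up name, and it assumes the saved table and the response have the same columns in the same order; it steps through the saved table in chunks of the response's size, as described above. (M compares records and lists by value, so two lists of row records can be tested for equality directly.)

```
// Hypothetical helper: true if all rows of the current response already
// appear as a contiguous chunk in the previously saved table
FnChunkExists = (saved as table, response as table) as logical =>
let
    n = Table.RowCount(response),
    respRows = Table.ToRecords(response),
    // candidate starting offsets, stepping by the chunk size
    starts = List.Numbers(0, Number.RoundDown(Table.RowCount(saved) / n), n),
    // record/list equality in M is by value, so chunks compare directly
    hits = List.Select(starts, (j) => Table.ToRecords(Table.Range(saved, j, n)) = respRows)
in
    not List.IsEmpty(hits)
```

Inside List.Generate's condition you could then test something like `not FnChunkExists(PreviouslySaved, CurrentResponse)` (both names hypothetical) to stop requesting once a duplicate chunk is seen.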
Here's the code to get you started. It performs the iterative requesting process successfully, but I still need to work in the "checking for duplicate row chunks" logic:
let
// Define some variables
_current_page = 0,
_base_url = "https://api.instantwebtools.net/v1/passenger?page=",
_end_url = "&size=10",
_start_url = _base_url & Number.ToText(_current_page) & _end_url,
_iterations = 5,
// Define a function to retrieve each page's response from the API
FnGetOnePage = (input_url) as record =>
let
// Delay re: API limits (note: M evaluates lazily, so an unreferenced
// binding like this one may never actually run)
sleep = Function.InvokeAfter( () => 2 + 2, #duration(0,0,0,1)),
// Get the JSON response of the current page
Source = Json.Document(Web.Contents(input_url)),
// Wrap the full response record
res = [Data=Source]
in
res,
// Iterate through the _iterations, gathering each page's data
GeneratedList = List.Generate(
()=>[i=0, res = FnGetOnePage(_start_url)],
each [i]<_iterations and [res][Data]<>null,
each [i=[i]+1, res = FnGetOnePage(_base_url & Number.ToText(i) & _end_url)],
each [res][Data]),
// Unpack the data
#"Converted to Table" = Table.FromRecords(GeneratedList),
#"Expanded data" = Table.ExpandListColumn(#"Converted to Table", "data"),
#"Expanded data1" = Table.ExpandRecordColumn(#"Expanded data", "data", {"_id", "name", "trips", "airline", "__v"}, {"data._id", "data.name", "data.trips", "data.airline", "data.__v"}),
#"Expanded data.airline" = Table.ExpandListColumn(#"Expanded data1", "data.airline"),
#"Expanded data.airline1" = Table.ExpandRecordColumn(#"Expanded data.airline", "data.airline", {"id", "name", "country", "logo", "slogan", "head_quaters", "website", "established"}, {"data.airline.id", "data.airline.name", "data.airline.country", "data.airline.logo", "data.airline.slogan", "data.airline.head_quaters", "data.airline.website", "data.airline.established"}),
#"Changed Type" = Table.TransformColumnTypes(#"Expanded data.airline1",{{"totalPassengers", Int64.Type}, {"totalPages", Int64.Type}, {"data._id", type text}, {"data.name", type text}, {"data.trips", Int64.Type}, {"data.airline.id", Int64.Type}, {"data.airline.name", type text}, {"data.airline.country", type text}, {"data.airline.logo", type text}, {"data.airline.slogan", type text}, {"data.airline.head_quaters", type text}, {"data.airline.website", type text}, {"data.airline.established", Int64.Type}, {"data.__v", Int64.Type}})
in
#"Changed Type"
I have data from an external source that is downloaded in CSV format. This data shows the interactions of several users and doesn't have an ID column. The problem I'm having is that I'm not able to use an index, because multiple entries represent interactions and processes. An interaction is the group of processes a specific user does, and a process is each action taken within a specific interaction. Any user can repeat the same interaction at any time of day. The data looks like this:
User1 has 2 processes but there were 3 interactions. How can I assign an ID to each interaction, taking into consideration that there might be multiple processes for a single user on the same day? I tried grouping them in Power Query, but it groups the overall processes and I'm not able to distinguish the number of interactions. Is it better to do it in DAX?
Edit:
I notice that it is hard to understand what I need but I think this would be a better way to see it:
Process2 lists the steps done in an interaction. As in the column in yellow, I need to add an ID taking into consideration where an interaction starts and where it ends.
I'm not exactly sure I follow what you describe. It looks to me like user1 has 4 interactions (Processes AA, AB, BA, and BB), but you say 3.
Still, I decided to take a shot at providing an answer anyway. I started with a CSV file set up like you show.
Then I brought the CSV into Power Query and, just to add a future point of reference so that you could follow the Id assignments better, I added an index column that I called startingIndex.
Then I added a custom column combining the processes that I understand actually define an interaction.
Then I grouped everything by users and Interactions into a column named allData.
Then I added a custom column to copy the column that was created from the earlier grouping, to sort the tables within it, and to add an index to each table within it. This essentially indexed each user's interaction group. (Because all of your interactions occur on the same date(s), the sorting doesn't help much. But I did it to show where you could do it if you included datetime info instead of just a date.)
Then I added a custom column to copy the column that was created earlier to add the interactions index, and to add an Id item within each table within it. I constructed each Id by combining the user, interactions, and interactionIndex for each.
Then I selected the latest column I had created (complexId) and removed all other columns.
Last, I expanded all tables without including the Interactions and Index columns. (The Index column was the index used for the interactions within the groups and no longer needed.) I included the startingIndex column just so you could see where items originally were at the start, in comparison to their final Id.
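A condensed M sketch of the grouping-and-indexing steps described above (the sample rows, the column names, and the simplification of numbering groups globally rather than per user are my assumptions):

```
let
    // made-up sample rows standing in for the CSV
    Source = Table.FromRecords({
        [User = "User1", Process = "A", Process2 = "A1"],
        [User = "User1", Process = "A", Process2 = "A2"],
        [User = "User1", Process = "B", Process2 = "B1"]
    }),
    // group by the columns that define an interaction, keeping the rows
    Grouped = Table.Group(Source, {"User", "Process"}, {{"allData", each _, type table}}),
    // number the interaction groups
    Indexed = Table.AddIndexColumn(Grouped, "interactionIndex", 1, 1),
    // build an Id inside each nested table from user, process, and index
    WithIds = Table.AddColumn(Indexed, "withId", each
        Table.AddColumn([allData], "Id",
            (r) => [User] & "-" & [Process] & "-" & Text.From([interactionIndex]))),
    // expand back to a flat table
    Expanded = Table.ExpandTableColumn(
        Table.SelectColumns(WithIds, {"withId"}), "withId",
        {"User", "Process", "Process2", "Id"})
in
    Expanded
```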
Given your new example, to create the Interaction ID you show, you only need the first two columns of the table. If it is not part of the original data, you can easily generate the third column (Process2).
It appears you want to increment the interaction ID whenever the Process changes.
Please read the comments in the M code and explore the Applied Steps to better understand the algorithm:
M Code
let
//be sure to change table name in next row to your real table name
Source = Excel.CurrentWorkbook(){[Name="Table1"]}[Content],
#"Changed Type" = Table.TransformColumnTypes(Source,{
{"User", type text},
{"Process", type text},
{"Process2", type text}
}),
//add an index column
idx = Table.AddIndexColumn(#"Changed Type", "Index", 0, 1, Int64.Type),
//Custom column returns
// 0 if the current Index is 0 (first row),
// the Index if the user or process changed versus the previous row,
// else null
#"Added Custom" = Table.AddColumn(idx, "Custom",
each if [Index]=0
then 0
else
if [Process] <> idx[Process]{[Index]-1} or [User] <> idx[User]{[Index]-1} then [Index]
else null),
#"Removed Columns" = Table.RemoveColumns(#"Added Custom",{"Index"}),
//Fill down the custom column
// now have same number for each interactive group
#"Filled Down" = Table.FillDown(#"Removed Columns",{"Custom"}),
//Group by the "filled down" custom column with no aggregation
#"Grouped Rows" = Table.Group(#"Filled Down", {"Custom"}, {
{"all", each _, type table [User=nullable text, Process=nullable text, Process2=nullable text, Custom=number]}
}),
//add a one-based Index column to the grouped table
#"Added Index" = Table.AddIndexColumn(#"Grouped Rows", "Interaction ID", 1, 1, Int64.Type),
#"Removed Columns1" = Table.RemoveColumns(#"Added Index",{"Custom"}),
//Re-expand the table
#"Expanded all" = Table.ExpandTableColumn(#"Removed Columns1", "all",
{"User", "Process", "Process2"}, {"User", "Process", "Process2"})
in
#"Expanded all"
I have a table that's generated when I pull data from an accounting software - the example columns are months/years in the format as follows (It pulls all the way to current day, and the last month will be partial month data):
Nov_2020
Dec_2020
Jan_2021
Feb_1_10_2021 (Current month, column to remove)
... So on and so forth.
My goal is to use the Power Query editor to remove the last column (the partial month). I tried messing around with the text length to no avail (the idea being to remove any column whose name is longer than 8 characters, so the full months would remain but the last month would not). I can't just remove based on a text filter, because if someone were to pull the data a year from now, it would have to account for 2021/2022.
Is this possible to do in PQ? Sorry, I'm new to it so if I need to elaborate more I can.. Thanks!
You can do this with Table.SelectColumns where you use List.Select on the Table.ColumnNames.
= Table.SelectColumns(
PrevStep,
List.Select(Table.ColumnNames(PrevStep), each Text.Length(_) <= 8)
)
Although both Alexis Olson's and Justyna MK's answers are valid, there is another approach. Since you're getting data for each month in a separate column, what you will surely want to do is unpivot your data, that is, transform those columns into rows. It's the only sensible way to get good material for analysis, so I would suggest unpivoting the columns first and then simply filtering out the rows containing the last month.
To make it dynamic, I would use the Unpivot Other Columns option: you select the columns to keep, and it transforms the remaining columns into rows, creating two new columns, one containing the former column names and the other containing the values.
To illustrate what I mean by unpivoting, when you have data like this:
You're automatically transforming that into this:
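A rough M sketch of that unpivot-then-filter idea (PrevStep and the "Account" key column are placeholders for your own previous step and identifier columns):

```
// Hypothetical: unpivot the month columns, then drop the rows that
// came from the last (partial) month column
LastCol = List.Last(Table.ColumnNames(PrevStep)),
Unpivoted = Table.UnpivotOtherColumns(PrevStep, {"Account"}, "Month", "Value"),
Filtered = Table.SelectRows(Unpivoted, each [Month] <> LastCol)
```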
You can try to do it through Power Query's Advanced Editor. Assign the name of the last column to LastColumn variable and then use it in the last step (Removed Columns).
let
Source = Excel.Workbook(File.Contents("<Excel file path>"), null, true),
tblPQ_Table = Source{[Item="tblPQ",Kind="Table"]}[Data],
#"Changed Type" = Table.TransformColumnTypes(tblPQ_Table,{{"Nov_2020", Int64.Type}, {"Dec_2020", Int64.Type}, {"Jan_2021", Int64.Type}, {"Feb_1_10_2021", Int64.Type}}),
LastColumn = List.Last(Table.ColumnNames(#"Changed Type")),
#"Removed Columns" = Table.RemoveColumns(#"Changed Type",{LastColumn})
in
#"Removed Columns"
I have a HTML file that is generated daily. Over the past few years we have added a couple of columns to the HTML table in the file. What I want to do is generate some reports that trend over time based on that HTML, so I want to define a single query for a report, but get a null/default value when the column isn't present in the source.
I have a list of report dates that are available and then I can add copies of the report data to a master report. The data source however fails to load if the column isn't present in the older reports. Essentially I read a date from one HTML file as an input, then modify the fetch URL per row for the source to get the historical data.
Is it possible to generate this report without retrospectively changing the old data and adding the column that is missing? I couldn't see how to do this easily.
I've never done anything like that, but maybe something like this will help.
It checks whether a column exists and, if it does not, creates a new column with a default value.
let
Source = Table.FromRows(Json.Document(Binary.Decompress(Binary.FromText("i45WMlTSUUpUitWJVjICspLALGMgK1kpNhYA", BinaryEncoding.Base64), Compression.Deflate)), let _t = ((type text) meta [Serialized.Text = true]) in type table [AA = _t, BB = _t]),
#"Changed Type" = Table.TransformColumnTypes(Source,{{"AA", Int64.Type}, {"BB", type text}}),
#"Custom1" = if Table.HasColumns(#"Changed Type",{"MissingColumn1"}) = false
then Table.AddColumn(#"Changed Type","MissingColumn1", each "<n/d>")
else #"Changed Type"
,#"Custom2" = if Table.HasColumns(#"Custom1",{"MissingColumn2"}) = false
then Table.AddColumn(#"Custom1","MissingColumn2", each "<n/d>")
else #"Custom1"
,#"Custom3" = if Table.HasColumns(#"Custom2",{"MissingColumn3"}) = false
then Table.AddColumn(#"Custom2","MissingColumn3", each "<n/d>")
else #"Custom2"
in
#"Custom3"
Whether this is a suitable solution depends on how the data are fetched from the source.
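If the list of possibly missing columns grows, the repeated if/then steps above could also be folded into one reusable helper. This is only a sketch; FnEnsureColumns is a made-up name:

```
// Hypothetical helper: add any listed columns that are missing,
// filling them with a default value
FnEnsureColumns = (t as table, cols as list, default as any) as table =>
    List.Accumulate(cols, t, (acc, c) =>
        if Table.HasColumns(acc, c)
        then acc
        else Table.AddColumn(acc, c, each default)),
// usage, replacing the three Custom steps:
Result = FnEnsureColumns(#"Changed Type",
    {"MissingColumn1", "MissingColumn2", "MissingColumn3"}, "<n/d>")
```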
I'm trying to find each country's latitude and longitude to visualize countries on the world map in Power BI.
Please suggest a procedure to find lat/long in Power BI, or any APIs available to the Power BI tool.
First, we have to use an API service to get latitude and longitude.
Create a bingmapsportal account
Here I'm using the Microsoft Bing Maps API services. Go to the BingMapsPortal site and sign up for an account if you don't already have one.
After signing up, log in; it will redirect to the dashboard.
Generate a key
Once we reach the dashboard page, we have to generate a key to use the RESTful API services.
Once the key is ready, refer to the documentation to find the API that returns latitude and longitude for a given country.
We use the URL below to get lat and long in XML format:
http://dev.virtualearth.net/REST/v1/Locations/india?o=xml&key=AjvYaTSLr8dsu4eqeDt0OigOZ_xuTkdVMUQCDMc0gcDPm
Use the virtualearth API service to get the latitude and longitude of the location.
Once the data is available, we have to convert it into tabular form.
Create an invoked custom function
If we need lat/long for multiple countries, we have to write a custom function such as the one given below and save it.
= (location) =>
let
Source = Xml.Tables(Web.Contents("http://dev.virtualearth.net/REST/v1/Locations/"&location&"?o=xml&key=AjvYaTSLr8dsu4eqeDt0OigOZ_xuTkdVMUQCDMc0gcDPmj2m57iWiwasSDZSCoNG")),
#"Changed Type" = Table.TransformColumnTypes(Source,{{"Copyright", type text}, {"BrandLogoUri", type text}, {"StatusCode", Int64.Type}, {"StatusDescription", type text}, {"AuthenticationResultCode", type text}, {"TraceId", type text}}),
ResourceSets = #"Changed Type"{0}[ResourceSets],
ResourceSet = ResourceSets{0}[ResourceSet],
#"Changed Type1" = Table.TransformColumnTypes(ResourceSet,{{"EstimatedTotal", Int64.Type}}),
Resources = #"Changed Type1"{0}[Resources],
#"Expanded Location" = Table.ExpandTableColumn(Resources, "Location", {"Name", "Point", "BoundingBox", "EntityType", "Address", "Confidence", "MatchCode", "GeocodePoint"}, {"Location.Name", "Location.Point", "Location.BoundingBox", "Location.EntityType", "Location.Address", "Location.Confidence", "Location.MatchCode", "Location.GeocodePoint"}),
#"Location Point" = #"Expanded Location"{0}[Location.Point],
#"Changed Type2" = Table.TransformColumnTypes(#"Location Point",{{"Latitude", type number}, {"Longitude", type number}})
in
#"Changed Type2"
Use lat/long to visualize maps
Use that custom function to get lat/long for multiple countries by creating a new custom column in the table.
Later we have to convert the embedded table data into regular columns.
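Invoking the function per row and flattening the result might look like this (Countries, its "Country" column, and GetLatLong as the saved function name are assumptions):

```
// Hypothetical: call the custom function for each row of a country table
AddedLatLong = Table.AddColumn(Countries, "LatLong", each GetLatLong([Country])),
// flatten the nested table returned by the function into real columns
Expanded = Table.ExpandTableColumn(AddedLatLong, "LatLong", {"Latitude", "Longitude"})
```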
To show the country and count legend without requiring a mouse-over, we created a custom legend column using the query below.
Syntax:
State Count COLUMN = 'Table'[State]&" - "&CALCULATE(SUM('Table'[Count]), ALLEXCEPT('Table', 'Table'[State]))
Once the data is ready in the table, drag and drop the appropriate fields onto Location, Legend, and Values.
I have two different files, shown in the table below: one is BugTracker and the other is BugTracker (2).
Now I want to compare the two statuses.
If the statuses are different, then count them.
If all you're really asking for is a True or False comparison as to whether the 'Assigned User Name' and 'Status' of one table's record equals the 'Assigned User Name' and 'Status' of the other table's matching record, then using DAX's if should work.
Assuming you've already matched and merged your "BugTracker" and "BugTracker (2)" table's records in order to get the table you have shown above, and the merged table's name is "BugTrackerMerged", you could just add a column with this DAX command:
Column = if(BugTrackerMerged[Status]=BugTrackerMerged[Status2],TRUE(),FALSE())
Note that I named the second status column 'Status2', instead of 'Status'. Both status columns cannot have the same name.
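Since the original question asks for a count of differing statuses, a simple DAX measure on top of the merged table could provide it (the measure name is my own):

```
Different Status Count =
COUNTROWS(
    FILTER(
        BugTrackerMerged,
        BugTrackerMerged[Status] <> BugTrackerMerged[Status2]
    )
)
```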
If you haven't already merged the table's records, you'll need to do that first. I find it easiest to do that with Power Query (Power BI's Edit Queries feature).
(I apologize up front if the following is too detailed. Not knowing your level of Power Query expertise, I figured I'd simplify discussion via step-by-step tutorial. It's more straightforward than it "looks".)
In order to merge the two tables ("BugTracker" and "BugTracker (2)"), you'll need a common keyfield for matching and merging. For this situation, I assume your first record in "BugTracker" should match and merge with the first record of "BugTracker (2)", your second record in "BugTracker" should match and merge with the second record of "BugTracker (2)", and so on. Therefore, just add an index to each table.
For BugTracker, in Power Query select the "BugTracker" query:
Then click the "Add Column" tab, and then "Index Column". (That will add the index to the "BugTracker" table.)
Do the same for "BugTracker (2)".
With common indexes for both "BugTracker" and "BugTracker (2)" you can match and merge the two tables. Click the "Home" tab, then the drop-down arrow beside "Merge Queries", then "Merge Queries as New".
In the window that pops up, make the selections necessary so it looks like this and click "OK":
This creates a new query, likely called "Merge". At this point, I renamed that query to "BugTrackerMerged".
If you select that new query (now named "BugTrackerMerged") and click on "Source", under "Applied Steps"...
You'll see this code in the formula bar:
= Table.NestedJoin(BugTracker,{"Index"},#"BugTracker (2)",{"Index"},"NewColumn",JoinKind.FullOuter)
In that code, change "NewColumn" to "BugTracker (2)" to rename the column that is generated. (You could rename it as a separate step if you prefer, but I thought this approach was "cleaner".)
Then click the button, to the right of the "BugTracker (2)" column's title...
...to expand the tables in the column. You'll see a pop-up window like this:
Leaving the settings like shown here will expand (bring in) all the columns from the secondary table of the earlier merge. (That secondary table was "BugTracker (2)".) Using the original column name as prefix will help you keep straight which "Status" and "Assigned User Name" info comes from which table.
At this point, you have the merged info. You could go one step further and do the True/False comparison here as well, if you like. To do that, just add a new custom column: click the "Add Column" tab, and then the "Custom Column" button:
Then, in the pop-up window, add this code:
if [Status]&[Assigned User Name]=[#"BugTracker (2).Status"]&[#"BugTracker (2).Assigned User Name"] then "True" else "False"
Like this:
You'll get a table like this:
Your data has a lot of "Trues" up front. You can easily see that there are also "Falses" though, by using the column's filter button.
Here's my Power Query (M) code for my three queries:
BugTracker:
let
Source = Excel.Workbook(File.Contents("C:\Users\MARC_000\Desktop\sample\Rowdata Programming 15 July 2017 (2).xlsx"), null, true),
BugTracker_Sheet = Source{[Item="BugTracker",Kind="Sheet"]}[Data],
#"Changed Type" = Table.TransformColumnTypes(BugTracker_Sheet,{{"Column1", type text}, {"Column2", type text}}),
#"Promoted Headers" = Table.PromoteHeaders(#"Changed Type", [PromoteAllScalars=true]),
#"Added Index" = Table.AddIndexColumn(#"Promoted Headers", "Index", 0, 1)
in
#"Added Index"
BugTracker (2):
let
Source = Excel.Workbook(File.Contents("C:\Users\MARC_000\Desktop\sample\Rowdata Programming 18 July 2017.xlsx"), null, true),
BugTracker_Sheet = Source{[Item="BugTracker",Kind="Sheet"]}[Data],
#"Changed Type" = Table.TransformColumnTypes(BugTracker_Sheet,{{"Column1", type text}, {"Column2", type text}}),
#"Promoted Headers" = Table.PromoteHeaders(#"Changed Type", [PromoteAllScalars=true]),
#"Added Index" = Table.AddIndexColumn(#"Promoted Headers", "Index", 0, 1)
in
#"Added Index"
BugTrackerMerged:
let
Source = Table.NestedJoin(BugTracker,{"Index"},#"BugTracker (2)",{"Index"},"BugTracker (2)",JoinKind.FullOuter),
#"Expanded BugTracker (2)" = Table.ExpandTableColumn(Source, "BugTracker (2)", {"Status", "Assigned User Name", "Index"}, {"BugTracker (2).Status", "BugTracker (2).Assigned User Name", "BugTracker (2).Index"}),
#"Added Custom" = Table.AddColumn(#"Expanded BugTracker (2)", "Custom", each if [Status]&[Assigned User Name]=[#"BugTracker (2).Status"]&[#"BugTracker (2).Assigned User Name"] then "True" else "False")
in
#"Added Custom"