Combine or Simplify Web Service Queries in Power BI

Combine or Simplify Web Service Queries in Power BI - powerbi

I was wondering if there was a flexible way to setup multiple web services (API) queries in Power BI. The web service I am using is only capable of getting me one day of data for one location per query and I need daily data for 10 locations. Which means on a standard 31 day month I would need to setup 310 queries. The data I am interested in are the Final LMPs and the website I am pulling from is https://webservices.iso-ne.com/docs/v1.1/. An example of a working query in PowerBI that is grabbing Final LMP data for just 02/01/2020 for location 4152 is:
https://webservices.iso-ne.com/api/v1.1/hourlylmp/rt/final/day/20200201/location/4152.xml

For such case I would setup the following items:
A data source (e.g. manually via 'Enter Data') with 10 individual locations
A data source with the required date range (e.g. something like that)
A data source that combines the the first two (cross join e.g. an example)
Add a column (to source 3) that calls a 'custom function' that for each combined record (in step 3) makes an API request to retrieve the data. The function should take two parameters (location and the day) as the input
Thanks,
Mau

You'll need to create 10 queries, for each of the locations, you can then use a variable to drive the date of extraction.
In the advanced editor after the 'let' add
var_date = Date.ToText(Date.From(DateTime.FixedLocalNow()), "yyyyMMdd")
Which today will return 20200221. When the query runs it will evaluate it and run it for the right day.
https://webservices.iso-ne.com/api/v1.1/hourlylmp/rt/final/day/20200201/location/4152.xml
Change to in your source to something like:
"https://webservices.iso-ne.com/api/v1.1/hourlylmp/rt/final/day/" & var_date & "/location/4152.xml"
So it should be like this:
If you need to offset the date try DateTime.FixedLocalNow() -1 to get the previous day.

Related

Groupby existing attribute present in json string line in apache beam java

I am reading json files from GCS and I have to load data into different BigQuery tables. These file may have multiple records for same customer with different timestamp. I have to pick latest among them for each customer. I am planning to achieve as below
Read files
Group by customer id
Apply DoFn to compare timestamp of records in each group and have only latest one from them
Flat it, convert to table row insert into BQ.
But I am unable to proceed with step 1. I see GroupByKey.create() but unable to make it use customer id as key.
I am implementing using JAVA. Any suggestions would be of great help. Thank you.

Before you GroupByKey you need to have your dataset in key-value pairs. It would be good if you had shown some of your code, but without knowing much, you'd do the following:
PCollection<JsonObject> objects = p.apply(FileIO.read(....)).apply(FormatData...)
// Once we have the data in JsonObjects, we key by customer ID:
PCollection<KV<String, Iterable<JsonObject>>> groupedData =
objects.apply(MapElements.via(elm -> KV.of(elm.getString("customerId"), elm)))
.apply(GroupByKey.create())
Once that's done, you can check timestamps and discard all bot the most recent as you were thinking.
Note that you will need to set coders, etc - if you get stuck with that we can iterate.
As a hint / tip, you can consider this example of a Json Coder.

Create new metric in Google Data Studio to Count Only Branded KW Impression

I'm trying to get a count of my branded/unbranded impressions in Google Data Studio and running into a snag when I try to create a new field:
Case
when REGEXP_MATCH(Query,'will enter branded KWs here')
then count(Impressions)
End
This, of course, doesn't work, but wondering if there is any workaround to get what I'm trying to. Any ideas?

It can be achieved by Reaggregation (Blending a Data Source with itself); The Sample Google Search Console Data Source has disabled field editing (thus unable to test the proposed solution), however, adding a link to a Similar Solution for a Google Analytics Post, which includes a GIF of the process.
0) Disaggregation
The metrics in Google Analytics, Ads, etc are currently pre aggregated (the reason it's coloured blue), thus it needs to be disaggregated in order to perform the calculation required. There are two ways: One is using a Data Extract and the other is a Data Blend (used in this example: #1 below)
1) Blended Data Source (Reaggragation)
Create a Blended Data Source by using the following fields:
- Join Key #1: Date
- Join Key #2: Query
- Metric: Impressions
2) Time Series Chart
- Dimension: Date
- Metric: Impressions_Branded
3) Impressions_Branded (Calculated Field)
CASE
WHEN REGEXP_MATCH(Query, 'will enter branded KWs here') THEN Impressions
END

Kibana: can I store "Time" as a variable and run a consecutive search?

I want to automate a few search in one, here are the steps:
Search in Kibana for this ID:"b2c729b5-6440-4829-8562-abd81991e2a0" which will return me a bunch of logs. Of these logs I need to take the first and the last timestamp:
I now would like to store these two data FROM: September 3rd 2019, 21:28:22.155, TO: September 3rd 2019, 21:28:23.524 in 2 variables
Run a second search in Kibana for the word "fail" in between these two variable of time
How to automate the whole process without need of copy/paste and running a second query?
EDIT:
SHORT STORY LONG: I work in a company that produce a software for autonomous vehicles.
SCENARIO: A booking is rejected and we need to understand why.
WHERE IS THE PROBLE: I need to monitor just a few seconds of logs on 3 different machines. Each log is completely separated, there is no relation between the logs so I cannot write a query in discover, I need to run 3 separated queries.
EXAMPLE:
A booking was rejected, so I open Chrome and I search on "elk-prod.myhost.com" for the BookingID:"b2c729b5-6440-4829-8562-abd81991e2a0" and I have a dozen of logs returned during a range of 2 seconds (FROM: September 3rd 2019, 21:28:22.155, TO: September 3rd 2019, 21:28:23.524).
Now I need to know what was happening on the car so I open a new Chrome tab and I search on "elk-prod.myhost.com" for the CarID: "Tesla-45-OU" on the time range FROM: September 3rd 2019, 21:28:22.155, TO: September 3rd 2019, 21:28:23.524
Now I need to know why the server which calculate the matching rejected the booking so I open a new Chrome tab and I search for the word CalculationMatrix always on the time range FROM: September 3rd 2019, 21:28:22.155, TO: September 3rd 2019, 21:28:23.524
CONCLUSION: I want to stop to keep opening Chrome tabs by hand and automate the whole thing. I have no idea around what time the book was made so I first need to search for the BookingID "b2c729b5-6440-4829-8562-abd81991e2a0", then store the timestamp of first and last log and run a second and third query based on those timestamps.
There is no relation between the 3 logs I search so there is no way to filter from the Discover, I need to automate 3 different query.

Here is how I would do it. First of all, from what I understand, you have three different indexes:
one for "bookings"
one for "cars"
one for "matchings"
First, in Discover, I would create three Saved Searches, one per index pattern. Then in Visualize, I would create a Vertical bar chart on the bookings saved search (Bucket X-Axis by date_histogram on the timestamp field, leave the rest as is). You'll get a nice histogram of all your booking events bucketed by time.
Finally, I would create a dashboard and add the vertical bar chart + those three saved searches inside it.
When done, the way I would search according to the process you've described above is as follows:
Search for the booking ID b2c729b5-6440-4829-8562-abd81991e2a0 in the top filter bar. In the bar chart histogram (bookings), you will see all documents related to the selected booking. On that chart, you can select the exact period from when the very first booking document happened to the very last. This will adapt the main time picker at the top and the start/end time will be "remembered" by Kibana
Remove the booking ID from the top filter (since we now know the time range and Kibana stores it). Search for Tesla-45-OU in the top filter bar. The bar histogram + the booking saved search + the matchings saved search will be empty, but you'll have data inside the second list, the one for cars. Find whatever you need to find in there and go to the next step.
Remove the car ID from the top filter and search for ComputationMatrix. Now the third saved search is going to show you whatever documents you need to see within that time range.
I'm lacking realistic data to try this out, but I definitely think this is possible as I've laid out above, probably with some adaptations.

Kibana does work like this (any order is ok):
Select time filter: https://www.elastic.co/guide/en/kibana/current/set-time-filter.html
Add additional criteria for search like for example field s is b2c729b5-6440-4829-8562-abd81991e2a0.
Add aditional criteria for search like for example field x is Fail.
Additionaly you can view surrounding documents https://www.elastic.co/guide/en/kibana/current/document-context.html#document-context
This is how Kibana works.
You can prepare some filters beforehands, save them and then use them if you want to automate the process of discovering somehow.
You can do that in Discover tab in Kibana using New/Save/Open options.
Edit:
I do not think you can achieve what you need in Kibana. As I mentioned earlier one option is to change the data that is comming to Elasticsearch so you can search for it via discover in Kibana. Another option could be builiding for example Java application, that is using Elasticsearch - then you can write algorithm that returns the data that you want. But i think it's a big overhead and I recommend checking the data first.
Edit: To clarify - you can create external Java let's say SpringBoot application that uses Elasticsearch - all the data that you need is inside it.
But in this option you will not use Kibana at all.
You can export the result to csv or what you want in the code.
SpringBoot application can ask ElasticSearch for whatever it needs, then it would be easy to store these time variables inside of Java code.
EDIT: After OP edited question to change it dramatically:
#FrancescoMantovani Well the edited version is very different from where you first posted here How to automate the whole process without need of copy/paste and running a second query? and search for word fail in a single shot. In accepted answer you are still using a three filters one at a time so it is not one search, but three.
What's more if you would use one index, and send data from multiple hosts via filebeat you don't even to have to create this dashboard to do that. Then you can you can select the exact period from when the very first document happened to the very last regarding filter and then remove it and add another filter that you need - it's simple as that. Before you were writing about one query,
How to automate the whole process without need of copy/paste and
running a second query?
not three. And you don't need to open new tab in Chrome each time you want to change filter just organize the data by for example using filebeat as mentioned before.
There is no relation between the 3 logs
From what you wrote the realation exist and it is time.
If the data is in for example three diferent indicies (cause documents don't have much similiar data) you can do it like that:
You change them easily in dicover see:
You can go to discover select index 1 search, select time range that you need, when you change index the time range is still the one you selected, you only need to change filter - you will get what you need.

Pentaho PDI get SQL SUM() with conditions

I'm using Pentaho PDI 7.1. I'm trying to convert data from Mysql to Mysql changing the structure of data.
I'm reading the source table (customers) and for each row I've to run another query to calculate the balance.
I was trying to use Database value lookup to accomplish it but maybe is not the best way.
I've to run a query like this to get the balance:
SELECT
SUM(
CASE WHEN direzione='ENTRATA' THEN -importo ELSE +importo END
)
FROM Movimento WHERE contoFidelizzato_id = ?
I should set the parameter taking it from the previous step. Some advice?

The Database lookup value may be a good idea, especially if you are used to database reasoning, but it may result in many queries which may not be the most efficient.
A more PDI-ish style would be to make the query like:
SELECT contoFidelizzato_id
, SUM(CASE WHEN direzione='ENTRATA' THEN -importo ELSE +importo END)
FROM Movimento
GROUP BY contoFidelizzato_id
and use it as the info source of a Lookup Stream Step, like this:
An even more PDI-ish style would be to divert the source table (customer) in two flows : one in which you keep the source rows, and one that you group by contoFidelizzato_id. Of course, you need a formula, or a Javascript, or to put a formula in the SQL of the Table input to change the sign when needed.
Test to know which strategy is better in your case. You'll soon discover that the PDI is very good at handling large data.

Fetching data from large BigQuery table in python

What I have is a BigQuery table(>5mil rows).
I need to fetch this data in batches and process it inside AppEngine, python.
The only way to fetch from a table that I know is to run SELECT query on this table and then iterate the result using tokens fetch_data returns.
It looks like this:
query = u"""\
SELECT url FROM %s
""" % (query_table)
query_job = client.run_async_query(str(uuid.uuid4()), query)
query_job.begin()
wait_for_job(query_job, 1)
query_results = query_job.results()
rows, total_rows, next_token = query_results.fetch_data(max_results=per_page, page_token=page_token)
This works on smaller tables, but on larger ones like mine it asks to allow large requests and specify target table. But this makes no sense to me. For to simply fetch data from a table I have to copy it to another table?

What you are running into is described in this documentation. In summary, apart from the limit on how much data can be fetched at a time, there is a point where your results become "large results." This is when your results are more than 128MB compressed as described here. When your results are classified as large, you can only store the result of a query in a table in Big Query.
Unfortunately I'm not sure there's a nice way to do what you want without reducing how many rows you are retrieving at once. What you'll likely need to do is explore the exporting data documentation for big query.

You should use tabledata.list API for fetching data from table.
Using parameters (startIndex or pageToken) and maxResults you can control size of page you fetch.

I think this is exactly what you need link, as far as I understood you can't get a large result of a query but you can get the entire table data to your app no mater how big it is, thats why you need to put the large result in a table and then get this table data to your app and do whatever you want with it
good luck :)

We Keep Coding

c++ django amazon-web-services regex python-2.7 google-cloud-platform list unit-testing opengl ember.js