AWS IOT ANALYTICS - amazon-web-services

I am trying to fetch data from AWS IoT Analytics using the Java SDK. I have created channels and a pipeline, and the data is in the datasets.
Does anyone know how the AWS IoT Analytics data fetch mechanism works?

AWS IoT Analytics distinguishes between raw data stored in channels, processed data stored in datastores, and queried data stored in datasets.
When you create the dataset, you write a SQL query that runs against your datastore and produces the result set stored in your dataset. The query can be run ad hoc with CreateDatasetContent [1] or on a schedule, for example every x hours. After the dataset content has been created successfully, you can get the query result via the GetDatasetContent API [2].
Please note that the CreateDatasetContent API is asynchronous, meaning you need to wait until the query has finished. By default, GetDatasetContent returns the latest successful result, which may be empty directly after creating the dataset because the query has not finished yet. To get the current state of your query, pass the optional versionId=$LATEST parameter to the GetDatasetContent call. The response then tells you whether the query is still running or has failed to execute.
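The polling described above could look roughly like the sketch below. The question uses the Java SDK; this is written in Python for brevity, and the Java SDK exposes the same GetDatasetContent operation. It assumes `client` behaves like `boto3.client("iotanalytics")`; the function name, the polling interval, and the attempt limit are illustrative choices, not part of the API.

```python
import time


def wait_for_dataset_content(client, dataset_name, poll_seconds=5.0, max_attempts=60):
    """Poll GetDatasetContent with versionId=$LATEST until the query finishes.

    `client` is assumed to behave like boto3.client("iotanalytics").
    Returns the list of result entries on success.
    """
    for _ in range(max_attempts):
        response = client.get_dataset_content(
            datasetName=dataset_name,
            versionId="$LATEST",  # ask for the newest version, even if unfinished
        )
        status = response["status"]
        state = status["state"]  # CREATING, SUCCEEDED or FAILED
        if state == "SUCCEEDED":
            # Each entry carries a dataURI from which the result can be downloaded.
            return response.get("entries", [])
        if state == "FAILED":
            raise RuntimeError(f"Dataset content creation failed: {status.get('reason')}")
        time.sleep(poll_seconds)
    raise TimeoutError(f"Dataset {dataset_name!r} was not ready after {max_attempts} attempts")
```

With boto3 installed, a call might look like `wait_for_dataset_content(boto3.client("iotanalytics"), "my_dataset")`, where `my_dataset` is a placeholder name.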
Hope this helps
[1] https://docs.aws.amazon.com/iotanalytics/latest/APIReference/API_CreateDatasetContent.html
[2] https://docs.aws.amazon.com/iotanalytics/latest/APIReference/API_GetDatasetContent.html

Related

Problem adding duplicate object in Google Storage using PutGCSObject processor in Nifi

I am using NiFi to send data from a Pub/Sub queue to Cloud Storage. I'm using the ConsumeGCPubSub processor to fetch data from the queue and the PutGCSObject processor to write it to Cloud Storage. However, the PutGCSObject processor is writing duplicate data to Cloud Storage.
I also see that the duplicated objects have the same MD5 hash in their Cloud Storage records. What could be causing this, and how can I fix it?
I double-checked the following:
The Pub/Sub messages are not duplicated.
When I send 30 pieces of data, exactly 30 pieces arrive in NiFi.
I checked whether the objects in Cloud Storage contained different data, but they did not.
When I examine the flow, the number of items coming from the queue and the number leaving the PutGCSObject processor as success are the same, but I see that the data is written over and over again. When I looked into NiFi Data Provenance, I found multiple entries with the same FlowFile UUID.
You should auto-terminate the success relationship of the processor.

Is there any way to read multiple data alerts in Power BI, using Flow or some other way?

Is there a way to read data alerts in Power BI using Python code or something else? I want to gather multiple data alerts for a specified account and then integrate them into an adaptive card.
Flow doesn't seem to be able to do this for me. With Flow, I would need to create multiple flow apps that each read one alert at a time and then somehow write the data somewhere I can read it later. This creates an availability problem for me, since I don't want to create a new flow app every time I have a new Power BI alert.
Thanks for any suggestions.
You can read multiple data alerts in one Logic App or Power Automate flow if you use "When a HTTP request is received" as the trigger of the flow. You can pass the required data for multiple data alerts in the body of the request.
For example, set the "When a HTTP request is received" trigger to the POST method, and then define the request body JSON schema for the data you want to send.
You can then use the data supplied in the request body in your Python code to gather multiple data alerts.
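As a rough illustration of posting several alerts in one request body, here is a minimal Python sketch using only the standard library. The flow URL, the `alerts` wrapper key, and the field names inside each alert are hypothetical; they would have to match whatever JSON schema you define on the trigger.

```python
import json
import urllib.request


def build_alert_request(flow_url, alerts):
    """Build one POST request whose JSON body carries several alerts.

    The {"alerts": [...]} shape is an assumption; it must match the
    request body JSON schema configured on the HTTP trigger.
    """
    body = json.dumps({"alerts": alerts}).encode("utf-8")
    return urllib.request.Request(
        flow_url,
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def send_alerts(flow_url, alerts, timeout=10):
    """Send the alerts to the flow and return the HTTP status code."""
    request = build_alert_request(flow_url, alerts)
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return response.status
```

For example, `send_alerts(url, [{"alertTitle": "Sales dropped", "value": 42}, {"alertTitle": "Stock low", "value": 3}])` would deliver two alerts in a single trigger invocation, where `url` is the callback URL generated for the trigger.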

Is there a way to tell when AWS Amplify Datastore is initialized or ready to be queried?

I have an application that needs to update the UI with the results of an Amplify DataStore query. I am making the query as soon as the component mounts/renders, but the results of the query are empty even though I know there is available data. If I add a timeout of 1 second or more before making the query, then the query returns the expected data. My hunch is that the query returns an empty set of data before the response from the delta sync table, which shows there is data to be fetched, has come back.
Is there any type of event provided by DataStore that would allow me to wait until the data store is initialized or has data to query before making the query?
I understand that I could use the .observe functionality of DataStore for a similar effect, but this is currently not an option.
First, if you do not use the DataStore.start method, sync from the backend only starts when the first query is submitted. Queries are run against the local store, so the data won't be there yet.
Second, DataStore publishes events on the Amplify Hub so that you can monitor changes, such as a set of data being synced, DataStore being ready, or DataStore being ready with all data synced locally.
See the documentation on DataStore.start and the documentation for DataStore events for more information.

Optimize data load from Azure Cosmos DB to Power BI

Currently we have a problem refreshing the report data from the database: it has so many records that loading all the data takes forever. How can I load only the data from the last year, so that the refresh does not take so long? As far as I can see, the dialog for connecting to Cosmos DB lets me enter a SQL query, but I don't know how to write one for this type of non-relational database.
Power BI has an incremental refresh feature. You should be able to refresh the current year only.
If that still doesn’t meet expectations I would look at a preview feature called Azure Synapse Link which automatically pulls all Cosmos DB updates out into analytical storage you can query much faster in Azure Synapse Analytics in order to refresh Power BI faster.
Depending on the volume of the data, you will hit a number of issues. First, you may exceed your RU limit, which slows down the extraction of the data from Cosmos DB. Second, the data has to be transformed from JSON into a structured format.
I would try to write a query that specifies only the fields and items you need. That reduces the time spent fetching and processing the data.
For SQL queries it will be something like
SELECT * FROM c WHERE c.partitionEntity = 'guid'
For more information on the Cosmos DB SQL API syntax, see the Cosmos DB SQL query documentation to get started.
You can use the query window in Azure to run the SQL commands, or Azure Storage Explorer to test the query, and then move it to Power BI.
It is highly recommended to extract the data into a place where it can be transformed into a structured format such as a table or CSV file.
For example, use Azure Databricks to extract the data, then turn the JSON into a table-formatted object.
You have the option of running Databricks notebook queries against Cosmos DB, or of using Azure Databricks in its own instance. Another option is to use the change feed together with an Azure Function to shred the data and send it to Blob Storage, and then query it from there using Power BI, Databricks, Azure SQL Database, etc.
In the Source of your Query, you can make a select based on the CosmosDB _ts system property, like:
Query ="SELECT * FROM XYZ AS t WHERE t._ts > 1609455599"
In this case, 1609455599 is the Unix timestamp (in seconds) that corresponds to 31.12.2020, 23:59:59 at UTC+1 (22:59:59 UTC). So, only data from 2021 onward will be selected.
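Rather than hard-coding the cutoff, the `_ts` value can be computed. The following Python sketch derives a "one year ago" timestamp in UTC and builds a query string that also projects only selected fields, as suggested earlier. The helper names, the 365-day window, and the `amount` field are illustrative assumptions; `XYZ` and the alias `t` are taken from the query above.

```python
from datetime import datetime, timedelta, timezone


def ts_cutoff(now=None, days=365):
    """Return the Unix timestamp (seconds) `days` before `now`.

    `now` should be a timezone-aware datetime; it defaults to the current UTC time.
    """
    if now is None:
        now = datetime.now(timezone.utc)
    return int((now - timedelta(days=days)).timestamp())


def build_query(cutoff, fields=("id",), container="XYZ", alias="t"):
    """Build a query that selects only `fields` for items newer than `cutoff`."""
    projection = ", ".join(f"{alias}.{field}" for field in fields)
    return f"SELECT {projection} FROM {container} AS {alias} WHERE {alias}._ts > {cutoff}"


if __name__ == "__main__":
    # "amount" is a hypothetical field name used only for illustration.
    print(build_query(ts_cutoff(), fields=("id", "amount")))
```

Because `_ts` is stored in seconds since the Unix epoch, computing the cutoff from an explicitly UTC datetime avoids the one-hour offset that a local-time calculation can introduce.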

API Gateway generating 11 sql queries per second on REG_LOG

We have sysdig running on our WSO2 API gateway machine, and we notice that it fires a large number of SQL queries at the database for a minute, then waits a minute and repeats.
The queries have the following format:
SELECT REG_PATH, REG_USER_ID, REG_LOGGED_TIME, REG_ACTION, REG_ACTION_DATA
FROM REG_LOG
WHERE REG_LOGGED_TIME>'2016-02-29 09:57:54'
AND REG_LOGGED_TIME<'2016-03-02 11:43:59.959' AND REG_TENANT_ID=-1234
There is no load on the server. What is causing this? What can we do to avoid this?
[Screenshot: sysdig output for the API gateway process]
This particular query is the result of the registry indexing task that runs in the background. The REG_LOG table is queried periodically to retrieve the latest registry actions. The indexing task cannot be stopped. However, you can configure the frequency of the indexing task through the indexingFrequencyInSeconds parameter in registry.xml. See [1] for more information.
If this table is filled up, one can clean the data using a simple SQL query. However, when deleting the records, one must be careful not to delete all the data. The latest records of each resource path should be left in the REG_LOG table since reindexing of data requires at least one reference of each resource path.
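One possible shape for such a cleanup is sketched below in Python against an in-memory SQLite table, keeping only the newest record per resource path and tenant. The toy table contains just the columns seen in the query above, and the sample paths are made up; the real REG_LOG schema and your database's SQL dialect may differ (MySQL, for instance, rejects a subquery on the table being deleted from), so treat this as an illustration rather than a script to run against production.

```python
import sqlite3

# Delete every row that is older than the newest row for the same path and tenant.
PRUNE_SQL = """
DELETE FROM REG_LOG
WHERE REG_LOGGED_TIME < (
    SELECT MAX(r2.REG_LOGGED_TIME)
    FROM REG_LOG AS r2
    WHERE r2.REG_PATH = REG_LOG.REG_PATH
      AND r2.REG_TENANT_ID = REG_LOG.REG_TENANT_ID
)
"""


def prune_reg_log(conn):
    """Remove all but the latest record per resource path; return rows deleted."""
    with conn:  # commits on success, rolls back on error
        return conn.execute(PRUNE_SQL).rowcount


def demo():
    """Run the cleanup on a toy table and return (deleted, remaining rows)."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE REG_LOG (REG_PATH TEXT, REG_LOGGED_TIME TEXT, REG_TENANT_ID INTEGER)"
    )
    conn.executemany(
        "INSERT INTO REG_LOG VALUES (?, ?, ?)",
        [
            ("/a", "2016-02-29 09:57:54", -1234),
            ("/a", "2016-03-01 10:00:00", -1234),
            ("/a", "2016-03-02 11:43:59", -1234),
            ("/b", "2016-03-01 08:00:00", -1234),
        ],
    )
    deleted = prune_reg_log(conn)
    remaining = conn.execute(
        "SELECT REG_PATH, REG_LOGGED_TIME FROM REG_LOG ORDER BY REG_PATH"
    ).fetchall()
    conn.close()
    return deleted, remaining
```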
Also, if required, you can take a dump of the data before clearing the REG_LOG table, in case you do not want to lose old records. Hope this answer provides the information you require.
[1] - https://docs.wso2.com/display/Governance510/Configuration+for+Indexing