Remove old statistics from MySql in WSO2 BAM - wso2

I am running the WSO2 API Manager which is then posting it's stats to BAM. However the usage stats per subscriber per API call is taking ages and eating up large amount of CPU. I am guessing since unlike most calls it is not date based the data is to large and I was wondering as I use MySql as the DB for BAM if there is a way to wipe data from it. I have found a few ways to clear the Casandra on BAM but nothing about also clearing the statistics from MySql
Thanks

If you don't need your old data you can simply drop the column families related to apim_stat in cassandra.
And then you can delete data from apim_stat tables
API_DESTINATION_SUMMARY
API_FAULT_SUMMARY
API_REQUEST_SUMMARY
API_RESPONSE_SUMMARY
API_Resource_USAGE_SUMMARY
API_THROTTLED_OUT_SUMMARY
API_VERSION_USAGE_SUMMARY

Related

Power BI Embedded Approach for 100s of SQL Targets

I'm trying to find the best approach to delivering a BI solution to 400+ customers which each have their own database.
I've got PowerBI Embedded working using service principal licensing and I have the PowerBI service connected to my data through the On Premise Data Gateway.
I've build my first report pointing to 1 of the customer databases. Which works lovely.
What I want to do next, when embedding the report, is to tell PowerBI, for this session, to get the database from a different database.
I'm struggling to find somewhere where this is explained, or to understand if this is even possible.
I'm trying to avoid creating 400+ WorkSpaces or 400+ Data Sets.
If someone could point me in the right direction, it would be appreciated.
You can configure the report to use parameters and these parameters can be used to configure the source for your dataset:
https://www.phdata.io/blog/how-to-parameterize-data-sources-power-bi/
These parameters can be set by the app hosting the embedded report:
https://learn.microsoft.com/en-us/rest/api/power-bi/datasets/update-parameters-in-group
Because the app is setting the parameter, each user will only see their own data. Since this will be a live connection, you would need to think about how the underlying server can support the workload.
An alternative solution would be to consolidate the customer databases into a single database (just the relevant tables) and use row level security to restrict access for each customer. The advantage to this design is that you take the burden off of the underlying SQL instance and push it into a PBI dataset that is made to handle huge datasets with sub-second response times.
More on that here: https://learn.microsoft.com/en-us/power-bi/enterprise/service-admin-rls

ODBC Equivalent of DBMS_ALERT in Oracle

Is there anything (system procedure,function or other) in SQL Server that will provide the functionality of DBMS_ALERT package of ORACLE (and DBMS_PIPE respectively)?
I work in a plant and I'm using an extension-product of SQL-Server called InSQL Server by Wonderware which is specialized in gothering data from plant controllers and HumanMachineInterface(SCADA) software.
This system can record events happening in the plant (like a high-temperature alarm, for example). It stores sensor values in extension tables of SQL Sever, and other less dense information in normal SQL Server tables.
I want to be able to alert some applications running on operator PCs that an event has been recorded in the database.
An after insert trigger in the events table seems to be a good place to put something equivalent to DBMS_ALERT (if it exists), to wake up other applications that are waiting for the specific alert and have the operators type in some data.
In other words - I want to be able to notify other processes (that have connection to SQL Server) that something has happened in the database.
All Wonderware (InSQL but now called Aveva) Historian data is stored in the history blocks EXCEPT for the actual tag storage configuration and dedicated event data. The time series data for analog, discrete and strings is NOT in SQL tables at all - unless someone is doing custom configuration to. create tables of their own.
Where are you wanting these notifications to come up? Even though the historical data is NOT stored in SQL tables, Wonderware has extensive documentation on how to use SQL queries to appropriately retrieve data (check for whatever condition you are looking for)
You can easily build a stored procedure and configure it for a maintenance plan.
But are you just trying to alarm (provide notification) on the scada itself?
Or are you truly utilizing historical data (looking for a data trend - average, etc.)?
Or trying to send the notification to non-scada interfaces?
Depending on your specific answer, the scada itself should probably be able to do it.
But there is software that already does this type of thing Win-911, SeQent, Scadatec are a couple in the OT space. But also things like Hip Link or even DeskAlert which can connect to any SQL via it's own API.
So where does the info need to go (email, text, phone, desktop app...) and what is the real source of the data>

Optimize data load from Azure Cosmos DB to Power BI

Currently we have a problem with loading data when updating the report data with respect to the DB, since it has too many records and it takes forever to load all the data. The issue is how can I load only the data from the last year to avoid taking so long to load everything. As I see, trying to connect to the COSMO DB in the box allows me to place an SQL query, but I don't know how to do it in this type of non-relational database.
Example
Power BI has an incremental refresh feature. You should be able to refresh the current year only.
If that still doesn’t meet expectations I would look at a preview feature called Azure Synapse Link which automatically pulls all Cosmos DB updates out into analytical storage you can query much faster in Azure Synapse Analytics in order to refresh Power BI faster.
Depending on the volume of the data you will hit a number of issues. First is you may exceed your RU limit, slowing down the extraction of the data from CosmosDB. The second issue will be the transforming of the data from JSON format to a structured format.
I would try to write a query to specify the fields and items that you need. That will reduce the time of processing and getting the data.
For SQL queries it will be some thing like
SELECT * FROM c WHERE c.partitionEntity = 'guid'
For more information on the CosmosDB SQL API syntax please see here to get you started.
You can use the query window in Azure to run the SQL commands, or Azure Storage Explorer to test the query, then move it to Power BI.
What is highly recommended is to extract the data into a place where is can be transformed into a strcutured format like a table or csv file.
For example use Azure Databricks to extract, then turn the JSON format into a table formatted object.
You do have the option of using running Databricks notebook queries in CosmosDB, or Azure DataBricks in its own instance. One other option would to use change feed to send the data and an Azure Function to send and shred the data to Blob Storage and query it from there, using Power BI, DataBricks, Azure SQL Database etc.
In the Source of your Query, you can make a select based on the CosmosDB _ts system property, like:
Query ="SELECT * FROM XYZ AS t WHERE t._ts > 1609455599"
In this case, 1609455599 is the timestamp which corresponds to 31.12.2020, 23:59:59. So, only data from 2021 will be selected.

Logstash and looking up additional data from a relational table?

I have mobile app log data being posted daily (eventually it will be a data stream). I am looking at different solutions for processing this log data and providing analytics. I am considering using logstash/elasticsearch/kibana combination, but we have additional data on our users stored in a redshift database. So in addition to the mobile data, I would like to pull in additional data from redshift about the user at the time of interaction with mobile app.
However, I've read in some places that doing an actual database query through logstash isn't feasible, but you can use a dictionary file to do a lookup of each user.
I have two questions regarding this approach
Is there a limit to have large this lookup file can be? Mine would be < 500K records so I'd imagine it would be fine?
Can the process of making the the lookup file from redshift tables be fully automated (ideally though aws services) - i.e. each night the lookup table is refreshed and posted to logstash, and then used for breakouts in Kibana
The way we're currently doing it is processing a daily jason file with a lambda function, posting it to s3 and then reading it into a redshift table. This data is then processed into sessions and joined up with other tables to generate the final dataset to be used for visualization. This is currently done in Tableau but we are exploring other options (such as quicksight, or possibly the ELK stack)
Just trying to figure out what solution is going to be scalable to clickstream data and will be the most useful down the line.
Thanks!
logstash 7 has a jdbc_streaming filter plugin for dynamically adding stuff to your events, as well as the jdbc_static filter for static stuff.
As you found, you can also use the translate filter. The man page says they've tested "very large" datasets up to 100,000 entries, so your dataset may require some testing. The good part about this filter is that it will reload the data when it detects a change, so you can publish the data on your own schedule (e.g. cron) without restarting logstash. Be on the lookout for events that don't get the translated value, which might be a sign that your publishing frequency should be updated.

Cronjob: Web Service query

I have a cronjob that runs every hours and parse 150,000+ records. Each record is summarized individually in a MySQL tables. I use two web services to retrieve the user information.
User demographic (ip, country, city etc.)
Phone information (if landline or cell phone and if cell phone what is the carrier)
Every time I get 1 record I check if I have information and if not I call these web services. After tracing my code I found out both of these calls takes 2 to 4 seconds and it makes my cronjob very slow and I can't compile statistics on time.
Is there a way to make these web service faster?
Thanks
simple:
get the data locally and use mellissa data:
for ip: http://w10.melissadata.com/dqt/websmart/ip-locator.htm
for phone: http://www.melissadata.com/fonedata.html
you can also cache them using memcache or APC which will make it faster since he does not have to request the data from the api or database.
A couple of ideas... if the same users are returning, caching the data in another table would be very helpful... you would only look it up once and have it for returning users. Upon re-reading the question it looks like you are doing that.
Another option would be to spawn new threads when you need to do the look-ups. This could be a new thread for each request, or if this is not feasible you could have n service threads ready to do the look-ups and update the results.