I'm able to configure WSO2 BAM data source WSO2_CARBON_DB to work with Oracle DB, but I'm not able to do the same with other data sources.
Is it possible to disable Cassandra and make WSO2 BAM work only with Oracle DB, including all stored data (configuration, input data, analyzed data, and so on)?
For the stats store, WSO2 BAM uses Cassandra, which gives higher read/write performance than an RDBMS. You cannot configure an RDBMS in place of Cassandra.
For other DB-related operations (e.g. the registry and user store) you can use any RDBMS (MSSQL, MySQL, etc.).
I have a single Django web application deployed on Azure with a transactional SQL DB, i.e. PostgreSQL.
Within the Django application, this historical data needs to be accessed every day from Azure Data Lake Storage (ADLS), e.g. to show patterns over a period of years, months, etc.
However, ADLS will only return single or multiple files, and my application needs an intermediary such as Azure Synapse to convert this unstructured data into a structured DB in order to run queries on the historical data and show it within the web application.
Question A) Would Azure Synapse fulfil this 'unstructured to structured conversion' requirement, or is there another Azure alternative?
Question B) Since Django is inherently tied to an ORM (Object-Relational Mapping), would there be any compatibility issues between the web app's PostgreSQL and Azure Synapse (e.g. ArrayField, JSONField, etc.)?
This entire exercise is being undertaken in order to store older historical data in a large repository and also access/query data from that ADLS repository whenever required.
Please advise which Azure alternatives may work in this case.
You need to break down your problem. For each piece you have multiple choices, which differ in cost, implementation complexity, and the amount of control/flexibility you get.
Question A) Would Azure Synapse fulfil this 'unstructured to structured conversion' requirement, or is there another Azure alternative?
Synapse Serverless SQL Pool lets you query JSON files in the data lake without a physical DB; it is compute only, with no storage.
This is meant for infrequent access to large datasets, because every query goes out and parses the data in the data lake.
If you want, you can also COPY INTO some_table all the data from the files and then run queries more efficiently on some_table (which is stored in a DB, with indices, partitions, ...) using a dedicated Synapse SQL Pool; a sketch of this follows the example below.
E.g. the following JSON
{
    "_id": "ahokw88",
    "type": "Book",
    "title": "The AWK Programming Language",
    "year": "1988",
    "publisher": "Addison-Wesley",
    "authors": [
        "Alfred V. Aho",
        "Brian W. Kernighan",
        "Peter J. Weinberger"
    ],
    "source": "DBLP"
}
can be queried with the following SQL:
SELECT
    JSON_VALUE(jsonContent, '$.title') AS title,
    JSON_VALUE(jsonContent, '$.publisher') AS publisher,
    jsonContent
FROM OPENROWSET
(
    BULK 'json/books/*.json',
    DATA_SOURCE = 'SqlOnDemandDemo',
    FORMAT = 'CSV',
    FIELDTERMINATOR = '0x0b',
    FIELDQUOTE = '0x0b',
    ROWTERMINATOR = '0x0b'
)
WITH ( jsonContent varchar(8000) ) AS [r]
WHERE JSON_VALUE(jsonContent, '$.title') = 'Probabilistic and Statistical Methods in Cryptology, An Introduction by Selected Topics'
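As mentioned above, for frequent access it can be cheaper to load the files into a dedicated SQL Pool once and query the physical table from then on. A minimal sketch with the COPY statement, assuming a dedicated pool; the table name dbo.staged_books and the storage URL are placeholders, and the same 0x0b trick reads each JSON document as a single CSV field:
CREATE TABLE dbo.staged_books (jsonContent varchar(8000));

COPY INTO dbo.staged_books
FROM 'https://<your_account>.dfs.core.windows.net/json/books/'
WITH (
    FILE_TYPE = 'CSV',           -- each JSON document lands in one varchar field
    FIELDTERMINATOR = '0x0b',
    FIELDQUOTE = '0x0b'
);

-- Subsequent queries hit the loaded table instead of re-parsing the data lake:
SELECT JSON_VALUE(jsonContent, '$.title') AS title FROM dbo.staged_books;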
Question B) Since Django is inherently tied to an ORM (Object-Relational Mapping), would there be any compatibility issues between the web app's PostgreSQL and Azure Synapse (e.g. ArrayField, JSONField, etc.)?
Synapse offers good old JDBC drivers, so as long as your ORM layer can use a JDBC source you should be good to go. Remember that the underlying data source (Synapse) is meant for MPP, not transactional processing: inserting 1000 rows in a for loop using INSERT INTO ... would take 1000 seconds, while querying 10 million rows using a single SELECT ... statement would probably take less than 100. So know what you are doing with it.
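To make the difference concrete, a hedged illustration; the table and column names are made up:
-- Anti-pattern on an MPP engine: one round trip per row from application code,
-- repeated 1000 times in a loop.
INSERT INTO page_events (user_id, event) VALUES (42, 'click');

-- Plays to Synapse's strengths: one set-based statement over many rows.
INSERT INTO page_events_summary (user_id, event_count)
SELECT user_id, COUNT(*)
FROM page_events
GROUP BY user_id;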
Does Synapse have to be configured with both the app DB and ADLS in a pipeline system through Azure Data Factory? And is this achievable for a PostgreSQL DB? I could not find Azure docs that talk specifically about PostgreSQL DB <---> ADLS connections. – Simran
You're mixing things here. You can NOT use Synapse to give a single view of data across two data sources: 1) PostgreSQL, 2) ADLS.
The only source for Serverless is ADLS.
You can do this using Data Factory, which lets you create two data sources (ADLS and PostgreSQL), read from them, merge them to produce a new data set, and write the output to some data sink such as PostgreSQL. Your Django code would then be able to read this from PostgreSQL as usual.
Understand the cost and performance implications of each piece before you make a decision:
Serverless SQL Pool
Dedicated SQL pool
Data Factory
I have configured API Manager 2.0.0 & API Manager Analytics Pack to use MySQL databases.
For each server, there exists a WSO2AM_STATS_DB. I have given these differing names on my MySQL server. I have also pointed my datasources in master-datasources.xml (for APIM) & stats-datasources.xml (for Analytics) to the relevant databases.
I couldn't find any relevant schema (dbscripts) for these databases in their respective packs.
On running, the Analytics database is populated, but the APIM database isn't, and an exception is thrown. The Analytics database not only gets the schema but also the invocation details of my API.
I am unable to get the stats on my dashboard though.
Previously, I (unwittingly) configured the h2-repository stats database to be the same for both servers (due to the folder structure) and was able to get all the statistics on my dashboard in the publisher.
Other configurations I have tried:
On the MySQL Server, pointed it to the same database (the Analytics one with the schema) but with no results on my dashboard (after waiting for a while).
Both WSO2AM_STATS_DB datasources, in both servers, should point to the same database. There are no database scripts for this; the tables are created automatically.
By default, in both servers, the stats DB path comes like this (note the ../ part):
<url>jdbc:h2:../tmpStatDB/WSO2AM_STATS_DB;DB_CLOSE_ON_EXIT=FALSE;LOCK_TIMEOUT=60000;AUTO_SERVER=TRUE</url>
So if you extract both servers to the same directory as mentioned in this doc, both datasources will be pointing to the same database (inside tmpStatDB) like this.
/parent_dir
|__wso2am-2.0.0/
|__wso2am-analytics-2.0.0/
|__tmpStatDB/
So what happens here is: wso2am-analytics writes stats data to the shared database, then APIM reads it and shows the data on its dashboards.
I'm doing a WSO2 API Manager + Analytics 2.0 POC now. When I change the datasource from H2 to Oracle in wso2am-2.0.1-SNAPSHOT, there are 2 datasource config files:
master-datasources.xml & metrics-datasources.xml. According to Installing and configuring the databases, there should be WSO2AM_DB, WSO2UM_DB, and WSO2REG_DB datasource configurations, but I only find WSO2_CARBON_DB & WSO2AM_DB. So my questions are:
Is WSO2_CARBON_DB = WSO2UM_DB + WSO2REG_DB?
For WSO2_METRICS_DB, according to Enabling Metrics and Storage Types, if we enable JDBC storage, can we store all components' metrics information in one shared DB, or does it need one DB per component (local)?
What's WSO2_MB_STORE_DB used for? From the scripts, it's for the Message Store and Andes Context Store. Can we keep using H2 in a production cluster environment?
When I configure wso2am-analytics-2.0.0-SNAPSHOT, I have the questions below:
Can we share the WSO2_CARBON_DB setting for both the APIMGRT-related components and Analytics, or is it better not to share?
For WSO2AM_STATS_DB, is Analytics responsible for aggregating and writing to it, and APIMGRT responsible for reading? Which APIMGRT components need to read it?
For the Analytics-related store, it supports RDBMS, Cassandra, and HBase, but it does not support MongoDB, right?
For GEO_LOCATION_DATA, what's this used for? Can we just use H2 in a production environment?
APIM:
1) In the default pack, yes. But in a production environment, it is recommended to separate them as WSO2_CARBON_DB, WSO2UM_DB, and WSO2REG_DB. (Please note you need WSO2_CARBON_DB too, to store local data; this can be an H2 database.)
2) You can have a shared DB
3) WSO2_MB_STORE_DB is required only if you use Advanced Throttling. Tables for this are created by APIM itself. So you don't need to run any scripts on it.
APIM Analytics:
1) You can share WSO2UM_DB and WSO2REG_DB. But don't share (local) WSO2_CARBON_DB.
2) Store and Publisher
3) See WSO2 DAS with MongoDB
4) GEO_LOCATION_DATA is used for Geolocation Based Statistics. H2 is not recommended.
I am running the WSO2 API Manager, which posts its stats to BAM. However, the usage stats per subscriber per API call are taking ages and eating up a large amount of CPU. I am guessing that, since unlike most calls it is not date-based, the data is too large. As I use MySQL as the DB for BAM, I was wondering if there is a way to wipe data from it. I have found a few ways to clear the Cassandra store on BAM, but nothing about also clearing the statistics from MySQL.
Thanks
If you don't need your old data, you can simply drop the column families related to apim_stat in Cassandra.
Then you can delete the data from the apim_stat tables listed below (a cleanup sketch follows the list):
API_DESTINATION_SUMMARY
API_FAULT_SUMMARY
API_REQUEST_SUMMARY
API_RESPONSE_SUMMARY
API_Resource_USAGE_SUMMARY
API_THROTTLED_OUT_SUMMARY
API_VERSION_USAGE_SUMMARY
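A minimal cleanup sketch in MySQL, assuming the default table names above; back up the database before running anything like this:
DELETE FROM API_DESTINATION_SUMMARY;
DELETE FROM API_FAULT_SUMMARY;
DELETE FROM API_REQUEST_SUMMARY;
DELETE FROM API_RESPONSE_SUMMARY;
DELETE FROM API_Resource_USAGE_SUMMARY;
DELETE FROM API_THROTTLED_OUT_SUMMARY;
DELETE FROM API_VERSION_USAGE_SUMMARY;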
We are working on ETL. How can we read data from a PostgreSQL database using streams in Data Analytics Server, perform some operations using the streams, and insert the manipulated data into another PostgreSQL database at a scheduled time? Please share the procedure to follow.
Actually, you don't need to publish data from your PostgreSQL server. Using WSO2 Data Analytics Server (DAS), you can pull data from your database, do the analysis, and finally push the results back to the PostgreSQL server. DAS has a special connector called "CarbonJDBC", and using that connector you can do this easily.
The current version of the "CarbonJDBC" connector supports the following database management systems:
MySQL
H2
MS SQL
DB2
PostgreSQL
Oracle
You can use the following queries to pull data from your PostgreSQL database and populate a Spark table. Once the Spark table is populated with data, you can start your data analysis tasks.
create temporary table <temp_table> using CarbonJDBC options (dataSource "<datasource name>", tableName "<table name>");
select * from <temp_table>;
insert into / overwrite table <temp_table> <some select statement>;
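For example, a minimal end-to-end sketch; the datasource names (POSTGRES_SRC, POSTGRES_DEST), the table names, and the columns are assumptions for illustration:
-- Pull the source table from the first PostgreSQL datasource into Spark.
create temporary table src using CarbonJDBC options (dataSource "POSTGRES_SRC", tableName "orders");
-- Map a destination table in the second PostgreSQL datasource.
create temporary table dest using CarbonJDBC options (dataSource "POSTGRES_DEST", tableName "daily_totals", schema "order_date DATE, total DOUBLE");
-- Aggregate in Spark and write the results back out.
insert into table dest select order_date, sum(amount) as total from src group by order_date;
The "at a scheduled time" part of your question can then be handled by saving this as a Spark script in DAS and attaching a schedule to it.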
For more information regarding the "CarbonJDBC" connector, please refer to the following blog post [1].
[1]. https://pythagoreanscript.wordpress.com/2015/08/11/using-the-carbon-spark-jdbc-connector-for-wso2-das-part-1/