Create a QuickSight dataset from a PostgreSQL materialized view

The Amazon QuickSight documentation states:
You can retrieve data from tables and materialized views in PostgreSQL instances, and from tables in all other database instances.
However, when creating a dataset from my PostgreSQL 9.5 database, none of my materialized views appear in the list to select from.
Is the documentation incorrect? Is there somewhere else I should be selecting from?

I haven't used views as a source myself, but I can usually see tables from only one schema. Maybe your views are in a different schema?
If that's not the case, use Query instead of Table as the source for your dataset and simply select * from myview.
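For instance, a custom-SQL dataset query along these lines (the schema and view names are placeholders; schema-qualifying the view helps when it lives outside the schema QuickSight lists by default):

```sql
-- "reporting" and "myview" are placeholder names
SELECT * FROM reporting.myview;
```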

Related

Create PowerBI Datamart from Azure Analysis Service

I am trying to create a Power BI datamart from Azure Analysis Services. There is a data model available in Azure Analysis Services and I can connect using the URL and database name. The data model has ~100 tables in it, with relationships already set up. So my question is: if I want to create a Power BI datamart from the Azure Analysis Services data model, do I need to go through the datamart's Get Data option, connect to Azure Analysis Services, and select the table and fields 100 times to get all the tables of the data model into my datamart? Is there any import function available where I can import all the tables in one go?
Why do you want to copy data from AAS into a database?
The reason you find it difficult is that it's an odd thing to do. The query designer for AAS/SSAS generates MDX queries, which are intended to run aggregate queries that return a handful of rows and are wholly unsuitable for extracting whole tables. If you try, the queries will just run forever and fail.
It is possible to extract data from AAS/SSAS tabular models, but you must use DAX, not MDX, so you need to use the Power Query ("Transform Data") window and its advanced editor.
Each query to load a table should look like this, eg to load the 'Customer' table:
let
    Dax = "evaluate Customer",
    Source = AnalysisServices.Database("asazure://southcentralus.asazure.windows.net/myserver", "mydatabase", [Query=Dax])
in
    Source
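If you only need a subset of rows, the DAX query can also filter server-side before the rows are transferred; a sketch along the same lines (the Customer[Country] column is a hypothetical example):

```m
let
    // CALCULATETABLE filters on the server, so only matching rows come back
    Dax = "evaluate CALCULATETABLE(Customer, Customer[Country] = ""US"")",
    Source = AnalysisServices.Database("asazure://southcentralus.asazure.windows.net/myserver", "mydatabase", [Query=Dax])
in
    Source
```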

Do views of tables in BigQuery benefit from partitioning/clustering optimization?

We have a few tables in BigQuery that are updated nightly, with a deduplication process slowly doing garbage collection.
To ensure that our UI always shows the latest data, we have a view set up for each table that simply does a SELECT ... WHERE on the newest timestamp/record_id combination.
We're about to set up partitioning and clustering to optimize query scope/speed, and I couldn't find a clear answer in Google's documentation on whether queries against the view of that table will still be partitioned, or whether they will end up scanning all the data.
Alternatively, when we create the view, can we include the partition and cluster columns in the query that builds the view?
If you're talking about a logical view, then yes: if the base table it references is clustered/partitioned, queries will use those features when the relevant columns are referenced from the WHERE clause. The logical view doesn't have its own managed storage; it's effectively a SQL subquery that gets run whenever the view is referenced.
If you're talking about a materialized view, then partitioning/clustering from the base table isn't inherited, but can be defined on the materialized view. See the DDL syntax for more details: https://cloud.google.com/bigquery/docs/reference/standard-sql/data-definition-language#create_materialized_view_statement
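A sketch of such a materialized-view definition with its own partitioning and clustering (table and column names here are hypothetical; BigQuery requires the materialized view's partitioning column to be derived from the base table's partitioning column):

```sql
-- Keep only the newest row per record_id; partition and cluster the MV itself
CREATE MATERIALIZED VIEW mydataset.latest_records
PARTITION BY DATE(event_timestamp)
CLUSTER BY record_id
AS
SELECT record_id, MAX(event_timestamp) AS event_timestamp
FROM mydataset.base_table
GROUP BY record_id;
```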

Are Amazon Athena views actually hive views, or are they a separate bolt-on?

Amazon Athena is based on Presto. Amazon Athena supports views.
Presto does not support Hive views because it doesn't want to deal with Hive Query Language: since a view is actually a stored Hive query, Presto would have to understand Hive's entire query language rather than just its schema. Presto does support views via its Hive connector, but these are "Presto views", which are Presto-specific and cannot be queried from Hive.
Does Athena support Hive views under the covers? Or are Athena views an entirely separate layer/bolt-on that just saves named Presto/Athena queries?
To the best of my knowledge they are Presto views. I've dug into how views are saved in the Glue catalog, and talked to the Athena team about why it's done the way it is. I'm no expert on what makes something a Presto view vs. a Hive view, but Athena is not doing anything on top of Presto when it comes to views.
When you create a view in Athena, it creates a table in Glue of type VIRTUAL_VIEW, whose TableInput.ViewOriginalText has a very specific structure (see below). The table's Parameters also need to contain presto_view: true.
The structure of TableInput.ViewOriginalText looks like this: /* Presto View: <BASE64 DATA> */, where the payload is a Base64-encoded JSON structure that describes the view. The value of TableInput.ViewOriginalText is produced by Presto (see https://github.com/prestosql/presto/blob/27a1b0e304be841055b461e2c00490dae4e30a4e/presto-hive/src/main/java/io/prestosql/plugin/hive/HiveUtil.java#L597-L600).
If the question is whether or not views created in Athena can be used by other tools that connect to the Glue catalog I think the answer is no. The way they are encoded is Presto-specific.
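To illustrate the encoding, here is a small sketch that unwraps and decodes a ViewOriginalText value of that shape (the toy payload below is made up; a real one carries the view's originalSql, catalog, schema, and column metadata):

```python
import base64
import json
import re


def decode_presto_view(view_original_text: str) -> dict:
    """Decode the Presto view definition stored in Glue's
    TableInput.ViewOriginalText (format: /* Presto View: <base64> */)."""
    match = re.fullmatch(r"/\* Presto View: (.+) \*/", view_original_text.strip())
    if match is None:
        raise ValueError("not a Presto view definition")
    return json.loads(base64.b64decode(match.group(1)))


# Round-trip with a toy payload:
payload = base64.b64encode(json.dumps({"originalSql": "SELECT 1"}).encode()).decode()
decoded = decode_presto_view(f"/* Presto View: {payload} */")
```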

WSO2 DAS spark script

I'm trying to deploy a new data publisher CAR file. I looked at the APIM_LAST_ACCESS_TIME_SCRIPT.xml Spark script (used by API Manager) and didn't understand the difference between the two temporary tables created: API_LAST_ACCESS_TIME_SUMMARY_FINAL and APILastAccessSummaryData.
The two Spark temporary tables represent different JDBC tables (possibly in different datasources), where one of them acts as the source for Spark and the other acts as the destination.
To illustrate this better, have a look at the simplified script in question:
create temporary table APILastAccessSummaryData using CarbonJDBC options (dataSource "WSO2AM_STATS_DB", tableName "API_LAST_ACCESS_TIME_SUMMARY", ... );
CREATE TEMPORARY TABLE API_LAST_ACCESS_TIME_SUMMARY_FINAL USING CarbonAnalytics OPTIONS (tableName "API_LAST_ACCESS_TIME_SUMMARY", ... );
INSERT INTO TABLE APILastAccessSummaryData select ... from API_LAST_ACCESS_TIME_SUMMARY_FINAL;
As you can see, we're first creating a temporary table in Spark with the name APILastAccessSummaryData, which represents an actual relational DB table with the name API_LAST_ACCESS_TIME_SUMMARY in the WSO2AM_STATS_DB datasource. Note the using CarbonJDBC keyword, which can be used to directly map JDBC tables within Spark. Such tables (and their rows) are not encoded, and can be read by the user.
Second, we're creating another Spark temporary table with the name API_LAST_ACCESS_TIME_SUMMARY_FINAL. Here however, we're using the CarbonAnalytics analytics provider, which will mean that this table will not be a vanilla JDBC table, but an encoded table similar to the one from your previous question.
Now, from the third statement, you can see that we're reading (SELECT) a number of fields from the second table API_LAST_ACCESS_TIME_SUMMARY_FINAL and inserting them (INSERT INTO) into the first, which is APILastAccessSummaryData. This represents the Spark summarisation process.
For more details on the differences between the CarbonAnalytics and CarbonJDBC analytics providers or on how Spark handles such tables in general, have a look at the documentation page for Spark Query Language.

How can I create a model with ActiveRecord capabilities but without an actual table behind?

I think this is a recurring question on the Internet, but unfortunately I'm still unable to find a satisfactory answer.
I'm using Ruby on Rails 4 and I would like to create a model that interfaces with a SQL query, not with an actual table in the database. For example, suppose I have two tables in my database: Questions and Answers. I want to build a report containing statistics over both tables. For that purpose I have a complex SQL statement that takes data from these tables to build up the statistics. However, the SELECT used in the SQL statement does not take values directly from either the Answers or the Questions table, but from nested SELECTs.
So far I've been able to create the StatItem model without any migration, but when I try StatItem.find_by_sql("...nested selects...") the system complains about the nonexistent table stat_items in the database.
How can I create a model whose instances' data is retrieved from a complex query and not from a table? If that's not possible, I could create a temporary table to store the data. In that case, how can I tell the migration file not to create the table (it would be created by the query)?
How about creating a materialized view from your complex query and following this tutorial:
ActiveRecord + PostgreSQL Materialized Views
Michael Kohl and his proposal of materialized views gave me an idea, which I initially discarded because I wrongly thought that a single database connection could be shared by two processes; but after reading about how Rails processes requests, I think my solution is fine.
STEP 1 - Create the model without migration
rails g model StatItem --migration=false
STEP 2 - Create a temporary table called stat_items
# First, drop any table left over from an earlier request (database
# connections are kept open by the server process(es)).
ActiveRecord::Base.connection.execute('DROP TABLE IF EXISTS stat_items')
# Second, create the temporary table with the desired columns (note: a
# dummy integer column called 'id' should exist in the table).
ActiveRecord::Base.connection.execute('CREATE TEMP TABLE stat_items (id integer, ...)')
STEP 3 - Execute an SQL statement that inserts rows in stat_items
STEP 4 - Access the table using the model, as usual
For example:
StatItem.find_by_...
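A sketch of what the backing model file might look like under this approach (marking it read-only is my own suggestion, not part of the original steps, so that nothing accidentally tries to persist rows into the temporary table):

```ruby
# app/models/stat_item.rb -- hypothetical sketch
class StatItem < ActiveRecord::Base
  self.table_name = 'stat_items'

  # Rows are produced by the reporting query; refuse writes through the model.
  def readonly?
    true
  end
end
```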
Any comments/improvements are highly appreciated.