aws emr with glue: how to specify database name? - amazon-web-services

I'm trying to run a Hive job using Glue metadata. From the AWS docs: "Under AWS Glue Data Catalog settings, select Use for Hive table metadata."
I created a cluster that apparently connects to the default database from Glue (I can tell by running show tables; from Hive, which lists a table from the default database).
Does anyone know how to provide an option to connect to another Glue database? The only thing I could find in the docs is the hive.metastore.glue.catalogid setting, which lets you point to a catalog from another account, but I cannot find anything about selecting a specific database.
Or perhaps all the databases are loaded. If so, do you know how to access them from within Hive?

OK, it turns out all the databases are loaded in Hive. You can access them with a qualified name, as in select * from my_database_name.my_table_name, or by setting the database once with use my_database_name.
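A minimal HiveQL sketch of both approaches, run from the Hive CLI or Beeline on the cluster. The names my_database_name and my_table_name are the placeholders from the answer above, and the LIMIT is only there to keep the output short:

```sql
-- List every database Hive can see through the Glue Data Catalog.
SHOW DATABASES;

-- Option 1: qualify the table with its database name.
SELECT * FROM my_database_name.my_table_name LIMIT 10;

-- Option 2: switch the session's current database, then use bare table names.
USE my_database_name;
SHOW TABLES;
SELECT * FROM my_table_name LIMIT 10;
```

The qualified form is handy for one-off queries or joins across databases; USE is more convenient when a whole script targets a single database.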

Related

need DDL for all the tables under schemas in Redshift in one go

I need to create another Redshift cluster in a different region from my existing Redshift cluster, so I need the DDL for all the tables in all schemas of my existing cluster. Please assist.
Please note: I tried running the script mentioned in this link to create views, but it's not working - https://github.com/awslabs/amazon-redshift-utils/blob/master/src/AdminViews/v_generate_tbl_ddl.sql
I'd migrate a snapshot of the database to the other region and restore a new cluster from this snapshot. AWS has a page on doing exactly this.
https://aws.amazon.com/blogs/big-data/migrate-your-amazon-redshift-cluster-to-another-aws-region/

In an AWS Redshift cluster I have created a database; now I want to see the database manually instead of querying it

I have created a database in a Redshift cluster. Now I want to see the database and its tables manually instead of querying it.
Where can I see the database?
create database example1;
With Redshift, there is no way to look at the data in any way except by issuing queries and commands against it. This is fairly common for most DBMS products.
AWS recommends the free tool SQL Workbench/J:
https://docs.aws.amazon.com/redshift/latest/mgmt/connecting-using-workbench.html
In addition, you can issue commands against Redshift from the query editor in the AWS Management Console:
https://docs.aws.amazon.com/redshift/latest/mgmt/query-editor.html
My personal favorite (as a professional developer) is to use the Jetbrains DataGrip product.

Create Database on Amazon Redshift with Query Editor

I've created a Redshift cluster using the AWS Management Console. One nice thing AWS set up is a query editor that lets you write queries directly against your cluster without having to install a SQL client on your computer.
However, I was trying to create a new database on the instance but it doesn't seem to be possible using AWS query editor. Am I right or did I miss something?
I did indeed miss something. You simply need to go into the query editor and write:
CREATE DATABASE db_name OWNER=db_owner;

Where can I see tables for RDS instances in AWS console?

I created an RDS instance in the AWS console, then created a table and loaded a SQL script. Am I able to see the table and data for this RDS instance in the AWS console?
No, you cannot see the RDS data (tables, rows, etc.) in the AWS Management Console.
To see the data, you'll need the appropriate client depending on the RDS engine type. Some examples:
MySQL: MySQL Workbench
SQL Server: SQL Server Management Studio
PostgreSQL: pgAdmin
Oracle: Oracle SQL Developer
It is possible: you can use AWS Glue - https://aws.amazon.com/blogs/big-data/how-to-access-and-analyze-on-premises-data-stores-using-aws-glue/
You can actually achieve this using the RDS query editor.
Type this command:
select * from information_schema.tables;
You will have to visually search for your tables here. Look through the "table_name" column until you can identify them. Every time I've used this command, the database tables I created were either listed first or very last. It's not a perfect way to do it, but it will usually suffice, and you don't need any extra services or software to achieve it.
You can use the QueryEditor to list the tables you've created using this SQL:
select * from information_schema.tables where TABLE_SCHEMA = 'name of your database goes here';
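As a variation on the two queries above, the following sketch hides the engine's own system schemas so that only user-created tables remain. It assumes a MySQL- or PostgreSQL-compatible engine; the list of excluded schema names is an assumption and may need adjusting for your engine and version:

```sql
-- List user-created tables only, skipping common system schemas.
SELECT table_schema, table_name
FROM information_schema.tables
WHERE table_type = 'BASE TABLE'
  AND table_schema NOT IN (
    'information_schema',   -- present on both MySQL and PostgreSQL
    'mysql',                -- MySQL system schema
    'performance_schema',   -- MySQL system schema
    'sys',                  -- MySQL system schema
    'pg_catalog'            -- PostgreSQL system schema
  )
ORDER BY table_schema, table_name;
```

Sorting by schema and table name avoids having to scan the result visually for your own tables.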

Presto on Amazon S3

I'm trying to use Presto on an Amazon S3 bucket, but haven't found much related information on the Internet.
I've installed Presto on a micro instance, but I can't figure out how to connect it to S3. There is a bucket and there are files in it. I have a running Hive metastore server and I have configured it in Presto's hive.properties. But when I try to run the LOCATION command in Hive, it's not working.
It throws an error saying it cannot find the file scheme type s3.
I also do not know why we need to run Hadoop, but without Hadoop, Hive doesn't run. Is there an explanation for this?
This and this are the documentation I followed during setup.
Presto uses the Hive metastore to map database tables to their underlying files. These files can exist on S3 and can be stored in a number of formats: CSV, ORC, Parquet, SequenceFile, etc.
The Hive metastore is usually populated through HQL (Hive Query Language) by issuing DDL statements such as CREATE EXTERNAL TABLE ... with a LOCATION ... clause that references the underlying files holding the data.
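A minimal sketch of such a DDL statement, to be run in Hive. The table name, columns, bucket, and prefix are all hypothetical, and the data is assumed to be comma-separated text. The URI scheme depends on the setup: s3:// is what EMR uses, while a stock Apache Hadoop installation typically needs s3a:// together with the hadoop-aws libraries, which is one likely source of a "cannot find the file scheme" error:

```sql
-- Hypothetical table over CSV files already sitting in S3.
-- EXTERNAL means dropping the table leaves the files in place.
CREATE EXTERNAL TABLE page_views (
  view_time STRING,
  user_id   BIGINT,
  page_url  STRING
)
ROW FORMAT DELIMITED FIELDS TERMINATED BY ','
STORED AS TEXTFILE
LOCATION 's3://my-example-bucket/page_views/';  -- use s3a:// on non-EMR Hadoop
```

Once the table is registered in the metastore, Presto can query it through its Hive catalog without any further DDL on the Presto side.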
In order to get Presto to connect to a Hive metastore you will need to edit the hive.properties file (EMR puts this in /etc/presto/conf.dist/catalog/) and set the hive.metastore.uri parameter to the thrift service of an appropriate Hive metastore service.
The Amazon EMR cluster instances will automatically configure this for you if you select Hive and Presto, so it's a good place to start.
If you want to test this on a standalone EC2 instance, I'd suggest that you first focus on getting a functional Hive service working with the Hadoop infrastructure. You should be able to define tables that reside locally on the HDFS file system. Presto complements Hive but does require a functioning Hive setup. Presto's native DDL statements are not as feature-complete as Hive's, so you'll do most table creation from Hive directly.
Alternatively, you can define Presto connectors for a MySQL or PostgreSQL database, but that is just a JDBC pass-through, so I don't think you'll gain much.