Is it possible to use Harbor Helm with RDS?
The original installation of Harbor, without using Helm Charts and Kubernetes, involves a harbor.yml that requires 4 databases to be set up: Harbor Core, Clair, Notary Server, and Notary Signer.
I have been told that using Harbor Helm requires these databases to be set up and managed. Therefore, when using Harbor Helm, that installs Harbor in a Kubernetes Cluster, do we still need these 4 databases to be set up and configured? If so, should RDS be used?
Yes, you do. We are using Postgres via RDS, which is deployed via Terraform. I then updated the Harbor Helm chart via Kustomize to inject an initContainer.
The initContainer then executes the following script, which is passed the 4 database names: registry, clair, notary_signer, notary_server.
#!/bin/bash
echo "Creating Databases: $#"
for var in "$#"
do
select="SELECT 1 FROM pg_database WHERE datname = '$var'"
create="CREATE DATABASE $var;"
echo "psql -h <%=database.external.host%> -U postgres -tc \"$select\""
psql -h <%=database.external.host%> -U postgres -tc "select 1 from pg_database where datname = '$var';" | grep -q 1 || psql -h <%=database.external.host%> -U postgres -tc "$create"
done
It sort of stinks that Postgres does not have CREATE DATABASE IF NOT EXISTS like CockroachDB does.
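For reference, the initContainer simply invokes that script with the four database names as arguments, along these lines (create-databases.sh is just a hypothetical name for the script above, and the postgres password is assumed to be supplied via PGPASSWORD or a mounted .pgpass file):
# create-databases.sh stands in for the script shown above
PGPASSWORD=<postgres password> ./create-databases.sh registry clair notary_signer notary_server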
I am using apache airflow (v 1.10.2) on Google Cloud Composer, and I would like to view the schema of the airflow database. Where can I find this information?
There are a couple of ways I can think of, based on our current design:
External metadata DB: if you can connect to the DB, then you can get the schema directly from it.
From the Airflow UI you can go to Data Profiling and run queries against the metadata tables (which tables exist depends on your database type - MySQL, Postgres, etc.), find the information there, and create a schema diagram.
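For example, if the metadata DB is MySQL (as it is in the Cloud Composer case discussed below), a query like the following under Data Profiling > Ad Hoc Query lists the Airflow tables and their columns; the schema name is a placeholder for your own metadata database name:
SELECT table_name, column_name, column_type
FROM information_schema.columns
WHERE table_schema = '<your_airflow_db_name>'
ORDER BY table_name, ordinal_position;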
I hope this helps.
According to the Composer architecture design, Cloud SQL is the main place where all the Airflow metadata is stored. However, in order to grant client applications running on the GKE cluster access to that database, Composer uses the Cloud SQL Proxy service. In particular, in a Composer environment you will find an airflow-sqlproxy* Pod that brokers connections to the Airflow Cloud SQL instance.
That said, I believe it should be no problem to establish a connection to the above-mentioned Airflow database from any of the GKE cluster workloads (Pods).
For instance, you can connect from an Airflow worker through the airflow-sqlproxy-service.default Cloud SQL proxy service and then explore the database with the mysql command-line utility:
kubectl exec -it $(kubectl get po -l run=airflow-worker -o jsonpath='{.items[0].metadata.name}' \
  -n $(kubectl get ns | grep composer | awk '{print $1}')) \
  -n $(kubectl get ns | grep composer | awk '{print $1}') \
  -c airflow-worker -- mysql -u root -h airflow-sqlproxy-service.default
Type 'help;' or '\h' for help. Type '\c' to clear the current input statement.
mysql> show databases;
+----------------------------------------+
| Database                               |
+----------------------------------------+
| information_schema                     |
| composer-1-8-3-airflow-1-10-3-*        |
| mysql                                  |
| performance_schema                     |
| sys                                    |
+----------------------------------------+
5 rows in set (0.00 sec)
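From there you can switch to the Composer-created database listed above and inspect the standard Airflow metadata tables, for example (the database name below is a placeholder for the composer-* entry shown above, and dag_run is one of the stock Airflow tables):
mysql> use <your_composer_airflow_db>;
mysql> show tables;
mysql> describe dag_run;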
I am trying to restore my PostgreSQL database to AWS RDS. I think I am almost there. I can get a dump and recreate the DB locally, but I am missing the last step of restoring it to AWS RDS.
Here is what I am doing:
I get my dump
$ pg_dump -h my_public_dns -U myusername -f dump.sql myawsdb
I create a local db in my shell called test:
create database test;
I put the dump into my test db
$ psql -U myusername -d test -f dump.sql
so far so good.
I get an error: psql:dump.sql:2705: ERROR: role "rdsadmin" does not exist, but I think I can ignore it, because my db is there with all the content. (I checked with \list and \connect test).
Now I want to restore this dump/test to my AWS RDS.
Following this https://gist.github.com/syafiqfaiz/5273cd41df6f08fdedeb96e12af70e3b
I now should do:
pg_restore -h <host> -U <username> -c -d <database name> <filename to be restored>
But what is my filename and what is my database name?
I tried:
pg_restore -h mydns -U myusername -c -d myawsdbname test
pg_restore -h mydns -U myusername -c -d myawsdbname dump.sql
and a couple of more options that I don't recall.
Most of the time it tells me something like: pg_restore: [archiver] could not open input file "test.dump": No such file or directory
Or, for the second: input file appears to be a text format dump. Please use psql.
Can someone point me in the right direction? Help is very much appreciated!
EDIT: So I created a .dump file using $ pg_dump -Fc mydb > db.dump
Using this file I think it works. Now I get the error [archiver (db)] could not execute query: ERROR: role "myuser" does not exist
Command was: ALTER TABLE public.users_user_user_permissions_id_seq OWNER TO micromegas;
Can I ignore that?
EDIT2: I got rid of the error by adding the flags --no-owner --role=mypguser --no-privileges
Ok, since this is apparently useful to some, I will post - to the best of what I remember - an answer to this. I will answer more broadly and not too AWS-specifically, because a) I don't use this instance anymore and b) I don't remember perfectly how I did it.
But I gained experience with PostgreSQL, and since AWS RDS is also just a Postgres instance, the steps should work quite similarly.
Here are my recommended steps when restoring a PostgreSQL DB instance:
Pull the backup in .dump format rather than .sql format. Why? The file size will be smaller and it is easier to restore. Do this with the following command:
pg_dump -h <your_public_dns_ending_with.rds.amazonaws.com> -U <username_for_your_db> -Fc <name_of_your_db> > name_for_your_backup.dump
Now you can restore the backup easily to any PostgreSQL instance. In general I'd recommend setting up a fresh DB instance with a new username and a new database name. Let's say you have a DB called testname with superuser testuser. Then you can just do:
pg_restore --no-owner --no-privileges --role=testuser -d testname <your_backup_file.dump>
And that should restore your instance.
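In case the fresh testname database and testuser role from the example above don't exist yet on the target instance, a quick sketch of creating them with the standard client wrappers, and of sanity-checking the result afterwards, could look like this (names are placeholders matching the example above):
createuser -s testuser
createdb -O testuser testname
psql -U testuser -d testname -c '\dt'   # after pg_restore: list the restored tables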
When restoring to AWS or to any remote PostgreSQL instance you will have to specify the host with the -h flag. So this might be something like:
pg_restore -h <your_public_dns_ending_with.rds.amazonaws.com> -p 5432 --no-owner --no-privileges --role=testuser -d testname <your_backup_file.dump>
If you have a DB instance running on a remote Linux server, the host will be your remote IP address (-h <ip_of_server>) and the rest stays the same.
I hope this helps. Any questions please comment and I'll try my best to help more.
I have Elasticsearch 5.5 running on a server with some data indexed in it. I want to migrate this ES data to an AWS Elasticsearch cluster. How can I perform this migration? I learned that one way is by creating a snapshot of the ES cluster, but I am not able to find any proper documentation for this.
The best way to migrate is by using snapshots. You will need to snapshot your data to Amazon S3 and then perform a restore from there. Documentation for snapshots to S3 can be found here. Alternatively, you can also re-index your data, though this is a longer process and there are limitations depending on the version of AWS ES.
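As a rough sketch of the snapshot side (assuming the repository-s3 plugin is installed on the source cluster, credentials come from the instance role or keystore, and the bucket/repository/snapshot names are placeholders; registering the same bucket on the AWS ES side additionally requires a signed request with an IAM role, as described in the documentation linked above):
curl -XPUT "http://localhost:9200/_snapshot/my_s3_repo" -H 'Content-Type: application/json' -d '{
  "type": "s3",
  "settings": { "bucket": "my-es-snapshot-bucket", "region": "us-east-1" }
}'
curl -XPUT "http://localhost:9200/_snapshot/my_s3_repo/snapshot_1?wait_for_completion=true"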
I also recommend looking at Elastic Cloud, the official hosted offering on AWS that includes the additional X-Pack monitoring, management, and security features. The migration guide for moving to Elastic Cloud also goes over snapshots and re-indexing.
I created a shell script for this -
Github - https://github.com/vivekyad4v/aws-elasticsearch-domain-migration/blob/master/migrate.sh
#!/bin/bash
#### Make sure you have Docker engine installed on the host ####
###### TODO - Support parameters ######
export AWS_ACCESS_KEY_ID=xxxxxxxxxx
export AWS_SECRET_ACCESS_KEY=xxxxxxxxx
export AWS_DEFAULT_REGION=ap-south-1
export AWS_DEFAULT_OUTPUT=json
export S3_BUCKET_NAME=my-es-migration-bucket
export DATE=$(date +%d-%b-%H_%M)
old_instance="https://vpc-my-es-ykp2tlrxonk23dblqkseidmllu.ap-southeast-1.es.amazonaws.com"
new_instance="https://vpc-my-es-mg5td7bqwp4zuiddwgx2n474sm.ap-south-1.es.amazonaws.com"
delete=".kibana"
es_indexes=$(curl -s "${old_instance}/_cat/indices" | awk '{ print $3 }')
es_indexes=${es_indexes//$delete/}
es_indexes=$(echo $es_indexes|tr -d '\n')
echo "index to be copied are - $es_indexes"
for index in $es_indexes; do
# Export ES data to S3 (using s3urls)
docker run --rm -ti taskrabbit/elasticsearch-dump \
--s3AccessKeyId "${AWS_ACCESS_KEY_ID}" \
--s3SecretAccessKey "${AWS_SECRET_ACCESS_KEY}" \
--input="${old_instance}/${index}" \
--output "s3://${S3_BUCKET_NAME}/${index}-${DATE}.json"
# Import data from S3 into ES (using s3urls)
docker run --rm -ti taskrabbit/elasticsearch-dump \
--s3AccessKeyId "${AWS_ACCESS_KEY_ID}" \
--s3SecretAccessKey "${AWS_SECRET_ACCESS_KEY}" \
--input "s3://${S3_BUCKET_NAME}/${index}-${DATE}.json" \
--output="${new_instance}/${index}"
new_indexes=$(curl -s "${new_instance}/_cat/indices" | awk '{ print $3 }')
echo $new_indexes
curl -s "${new_instance}/_cat/indices"
done
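To double-check that nothing was lost, a simple follow-up (just a sketch, reusing the variables from the script above) is to compare per-index document counts between the two clusters:
for index in $es_indexes; do
  echo "$index: old=$(curl -s "${old_instance}/_cat/count/${index}" | awk '{print $3}') new=$(curl -s "${new_instance}/_cat/count/${index}" | awk '{print $3}')"
done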
What's the proper way to proxy a Cloud SQL database into Bitbucket Pipelines?
I have a Google Cloud SQL Postgres Instance (And also tried a MySQL DB).
Opening all ports to connections allows Bitbucket Pipelines to properly deploy my Django-based Google App Engine project, based on this example pipeline - https://github.com/GoogleCloudPlatform/continuous-deployment-bitbucket/blob/master/bitbucket-pipelines.yml
However, when I try to limit access to the Cloud SQL instances and use cloud_sql_proxy instead, I can deploy properly locally, but Bitbucket always fails to find the SQL server.
My bitbucket-pipelines.yml looks something like this:
- export CLOUDSDK_CORE_DISABLE_PROMPTS=1
# Google Cloud SDK is pinned for build reliability. Bump if the SDK complains about deprecation.
- SDK_VERSION=127.0.0
- SDK_FILENAME=google-cloud-sdk-${SDK_VERSION}-linux-x86_64.tar.gz
- curl -O -J https://dl.google.com/dl/cloudsdk/channels/rapid/downloads/${SDK_FILENAME}
- tar -zxvf ${SDK_FILENAME} --directory ${HOME}
- export PATH=${PATH}:${HOME}/google-cloud-sdk/bin
# Install Google App Engine SDK
- GAE_PYTHONPATH=${HOME}/google_appengine
- export PYTHONPATH=${PYTHONPATH}:${GAE_PYTHONPATH}
- python scripts/fetch_gae_sdk.py $(dirname "${GAE_PYTHONPATH}")
- echo "${PYTHONPATH}" && ls ${GAE_PYTHONPATH}
# Install app & dev dependencies, test, deploy, test deployment
- echo "key = '${GOOGLE_API_KEY}'" > api_key.py
- echo ${GOOGLE_CLIENT_SECRET} > client-secret.json
- gcloud auth activate-service-account --key-file client-secret.json
- wget https://dl.google.com/cloudsql/cloud_sql_proxy.linux.amd64 -O cloud_sql_proxy
- chmod +x cloud_sql_proxy
- ./cloud_sql_proxy -instances=google-cloud-project-name:us-west1:google-cloud-sql-database-name=tcp:5432 &
- gcloud app deploy --no-promote --project google-cloud-project-name --quiet
At this point, I would expect to be able to access the SQL database, but it doesn't seem to be available, and my deployment fails to find the locally proxied database.
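For what it's worth, to debug whether the proxy is actually up before the deploy step runs, I could add something like the following right after starting cloud_sql_proxy (pg_isready comes from the postgresql-client package; this is only a sanity check on the proxy, not a fix):
- sleep 5
- pg_isready -h 127.0.0.1 -p 5432 || echo "Cloud SQL proxy is not accepting connections on 5432 yet"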
I have a Django Postgres DB (v9.3.10) running on Digital Ocean and am trying to migrate it over to Amazon RDS (Postgres v9.4.5). The RDS is a db.m3.xlarge instance with 300GB. I've dumped the Digital Ocean DB with:
sudo -u postgres pg_dump -Fc -o -f /home/<user>/db.sql <dbname>
And now I'm trying to migrate it over with:
pg_restore -h <RDS endpoint> --clean -Fc -v -d <dbname> -U <RDS master user> /home/<user>/db.sql
The only error I see is:
pg_restore: [archiver (db)] Error from TOC entry 2516; 0 0 COMMENT EXTENSION plpgsql
pg_restore: [archiver (db)] could not execute query: ERROR: must be owner of extension plpgsql
Command was: COMMENT ON EXTENSION plpgsql IS 'PL/pgSQL procedural language';
Apart from that everything seems to be going fine and then it just grinds to a halt. The dumped file is ~550MB and there are a few tables with multiple indices, otherwise pretty standard.
The Read and Write IOPS on the AWS interface are near 0, as is the CPU, memory, and storage. I'm very new to AWS and know that the parameter groups might need tweaking to do this better. Can anyone advise on this or a better way to migrate a Django db over to RDS?
Edit:
Looking at the DB users, the DO DB looks like:
Role Name       Attr                                             Member Of
<user>          Superuser                                        {}
postgres        Superuser, Create role, Create DB, Replication   {}
And the RDS one looks like:
Role Name       Attr                        Member Of
<user>          Create role, Create DB      {rds_superuser}
rds_superuser   Cannot login                {}
rdsadmin        ...                         ...
So it doesn't look like it's a permissions issue to me as <user> has superuser permissions in each case.
Solution for anyone looking:
I finally got this working using:
cat <db.sql> | sed -e '/^COMMENT ON EXTENSION plpgsql IS/d' > edited.dump
psql -h <RDS endpoint> -U <user> -e <dbname> < edited.dump
It's not ideal for a reliable backup/restore mechanism, but given that it is only a comment, I guess I can do without it. My only other observation is that running psql/pg_restore against a remote host is slow. Hopefully the new database migration service will add something.
Considering your dumped DB file is ~550MB, I think using the Amazon guide for doing this is the way to go. I hope it helps.
Importing Data into PostgreSQL on Amazon RDS
I think it did not halt; it was just recreating indexes, foreign keys, etc. Use pg_restore -v to see what is going on during the restore. Check the logs, or redirect the output to a file so you can check for any errors after the import, since verbose mode produces a lot of output.
Also, I'd recommend using the directory format (pg_dump -v -Fd), as it allows for a parallel restore (pg_restore -v -j4).
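A minimal sketch of that approach (host names, users, and the /tmp/dbdump path are placeholders):
pg_dump -v -Fd -f /tmp/dbdump -h <source host> -U <user> <dbname>
pg_restore -v -j4 -h <RDS endpoint> -U <RDS master user> -d <dbname> /tmp/dbdump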
You can ignore this ERROR: must be owner of extension plpgsql. It is only trying to set a comment on the extension, which is installed by default anyway. The error is caused by a peculiarity of the RDS flavor of PostgreSQL: you cannot connect as a real superuser for the restore, and the master user does not own the built-in plpgsql extension, so the COMMENT statement is rejected.
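If you would rather not see that error at all when restoring a custom-format dump, one option (just a sketch, not required) is to drop the offending TOC entry using pg_restore's list feature:
pg_restore -l db.dump | grep -v 'COMMENT.*EXTENSION plpgsql' > restore.list
pg_restore -h <RDS endpoint> -U <RDS master user> -d <dbname> -L restore.list db.dump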