CKAN: automatically delete datastore tables when a resource is removed - datastore

I have a CKAN instance configured with the filestore, datastore and datapusher plugins enabled.
When I create a new resource, the datapusher plugin correctly adds a new table to the datastore database and populates it with the data.
If I update the resource, a new datapusher task is executed and everything updates correctly. On another CKAN instance with a linked resource, I have to run the task manually, but everything works fine there too.
The problem comes when I delete the resource: the datastore tables are still available, and even the link to the file is still active.
Is there some way to configure CKAN to automatically remove every trace of the resource? That is, remove the files from the filestore, the tables from the datastore, the API access, the links, etc.

I partially confirmed this behaviour with http://demo.ckan.org, which is currently ckan_version: "2.4.1"
Create a resource
Query resource via data pusher
Delete resource
Query resource via datastore_search API -> still works, the data can be queried.
Attempt to access resource file -> 404 - not found.
Will file this as a bug.
Perhaps datastore_delete can be used to remove the table: http://docs.ckan.org/en/latest/maintaining/datastore.html#ckanext.datastore.logic.action.datastore_delete
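As a minimal sketch of that approach (assuming a CKAN 2.x action API and an API key with sufficient rights; the URL, key and resource ID below are placeholders), the orphaned table could be dropped with a call like this:

import requests

CKAN_URL = "https://my-ckan-instance.example.org"      # placeholder instance URL
API_KEY = "xxxxxxxx-xxxx-xxxx-xxxx-xxxxxxxxxxxx"        # placeholder sysadmin/editor API key
RESOURCE_ID = "01234567-89ab-cdef-0123-456789abcdef"    # UUID of the deleted resource

# Drop the datastore table that the datapusher created for this resource.
# force=True is needed if the resource is marked read-only in the datastore.
resp = requests.post(
    CKAN_URL + "/api/3/action/datastore_delete",
    json={"resource_id": RESOURCE_ID, "force": True},
    headers={"Authorization": API_KEY},
)
resp.raise_for_status()
print(resp.json()["success"])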

This is possible through the CLI:
sudo -u postgres psql datastore_default
(This assumes the datastore was installed from package using these Datastore Extension settings, that the database name is datastore_default, and that postgres is the superuser.)
Then, optionally, to find all resource UUIDs:
\dt to list all tables
Then:
DROP TABLE "{RESOURCE ID}";
(Replace {RESOURCE ID} with the resource UUID.)

Related

AWS AppSync searchItems type return data while table is empty

I deleted all the items in the DataTemplate table, but when I query them again with the searchDataTemplates endpoint (from the app or in AppSync) it returns the old data. When I use listDataTemplates it returns nothing, which is correct. I needed to repopulate the data in the table.
(Screenshots: data template table, search endpoint, list endpoint.)
When I updated items individually it worked just fine, but when I deleted all the items from the console (around 700 items) the search endpoint stopped working. Just the search endpoint.
UPDATE:
I repopulated the data hoping it would reset, but now listDataTemplates shows the new data while the search still shows the old data. Is there some cache that needs to be reset?
SECOND UPDATE:
I removed the table and the AppSync functions are gone; however, when I recreated the table (with no data), testing the function still returns the old data. I'm guessing the OpenSearch index hasn't been updated?
If you are using AppSync with the Amplify CLI, the @searchable directive will automatically create the following:
An OpenSearch Domain
A Lambda Function that is attached to the DynamoDB Streams and pushes the changes (create/update/delete) over to your OpenSearch Domain.
The problem you're facing is most likely that this Lambda Function failed to push the changes from DynamoDB Streams to OpenSearch. A quick suggestion is to check the created Lambda Function first.
Reference: #searchable
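As a rough way to check that Lambda Function outside the console (purely a sketch; the function name below is a placeholder, since Amplify generates it for you), something like this lists the DynamoDB stream mapping and the recent error count:

import boto3
from datetime import datetime, timedelta, timezone

FUNCTION_NAME = "amplify-myapp-DdbToEsFn-example"  # placeholder name of the streaming function

lambda_client = boto3.client("lambda")
cloudwatch = boto3.client("cloudwatch")

# Is the DynamoDB stream wired to the function, and is the mapping enabled and healthy?
for mapping in lambda_client.list_event_source_mappings(FunctionName=FUNCTION_NAME)["EventSourceMappings"]:
    print(mapping["EventSourceArn"], mapping["State"], mapping.get("LastProcessingResult"))

# How many invocation errors happened in the last 24 hours?
stats = cloudwatch.get_metric_statistics(
    Namespace="AWS/Lambda",
    MetricName="Errors",
    Dimensions=[{"Name": "FunctionName", "Value": FUNCTION_NAME}],
    StartTime=datetime.now(timezone.utc) - timedelta(days=1),
    EndTime=datetime.now(timezone.utc),
    Period=3600,
    Statistics=["Sum"],
)
print(sum(dp["Sum"] for dp in stats["Datapoints"]))

If errors show up, the function's CloudWatch logs usually reveal why the deletes never reached the OpenSearch index.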
This issue can only happen if caching is enabled in your application.
I am not sure what infrastructure you are using, so I will go ahead with some educated guesses. Please feel free to correct me if I have misread the setup.
From your description, you have AppSync as the API layer and DynamoDB as the primary database.
If these are the only two resources you have, please check the AppSync cache configuration:
Open the AppSync console.
From the left panel select APIs -> your API -> Caching.
Validate that the caching behavior is set to None.
If you have AWS OpenSearch enabled for search queries (I could be wrong, but picking up from the previous comment), then also validate the cluster configuration:
Open the AWS OpenSearch Service console.
From the left panel select Domains and click on the OpenSearch domain that you are using.
Scroll to the bottom right, look for Advanced cluster settings and ensure the attribute Fielddata cache allocation is set to 0.
If Fielddata cache allocation is not 0, update the cluster configuration and modify the advanced cluster setting to set the Fielddata cache allocation field to 0.
Wait for a few minutes (I would suggest 5 minutes) and then retry your use-case.
I hope this would help resolve your issue.
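For completeness, the same AppSync cache check can be done from code; this is only a sketch and the API id is a placeholder:

import boto3

appsync = boto3.client("appsync")
API_ID = "abcdefghijklmnopqrstuvwxyz"  # placeholder: your AppSync API id

# get_api_cache raises NotFoundException when no cache is configured for the API;
# otherwise it reports the caching behavior and status.
try:
    cache = appsync.get_api_cache(apiId=API_ID)["apiCache"]
    print(cache["apiCachingBehavior"], cache["status"])
    # If stale results are suspected, the cache can also be flushed:
    # appsync.flush_api_cache(apiId=API_ID)
except appsync.exceptions.NotFoundException:
    print("No API cache configured")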

Using S3 as target for AWS DMS: Uploaded File name doesn't change

We are using DMS to get data from SQL Server and load it into an S3 bucket, after which the data is loaded into Snowflake using Snowpipe for the full load.
Now, in order for Snowpipe to know there is new data in the S3 bucket, the file name needs to be different from the last one. I have tried all the available task setting options (DROP_AND_CREATE, DO_NOTHING, TRUNCATE) to get a different file name, but it is still not working: it always writes the file as LOAD00000001.csv.
The documentation says the file names will be incremental (e.g. LOAD00000001.csv, LOAD00000002.csv and so on), but that is not happening, which is why Snowpipe is not able to register the changes.
https://docs.aws.amazon.com/dms/latest/userguide/CHAP_Target.S3.html
Can someone please help?
For DMS the incremental counter starts over from 1 each time the task is run. It does not have a "don't overwrite existing objects" feature.
Your best bet may be to handle the load yourself by looking for updated object timestamps in your folder or setting up S3 event notifications.
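As a rough illustration of the timestamp approach (the bucket name, prefix and last-run time below are placeholders), a small script could pick up only the objects written since the previous run:

import boto3
from datetime import datetime, timezone

s3 = boto3.client("s3")

BUCKET = "my-dms-target-bucket"                       # placeholder bucket
PREFIX = "dbo/MyTable/"                               # placeholder: DMS writes one folder per table
last_run = datetime(2023, 1, 1, tzinfo=timezone.utc)  # persisted from the previous run

paginator = s3.get_paginator("list_objects_v2")
new_files = []
for page in paginator.paginate(Bucket=BUCKET, Prefix=PREFIX):
    for obj in page.get("Contents", []):
        # LOAD00000001.csv may be overwritten in place, but its LastModified still changes
        if obj["LastModified"] > last_run:
            new_files.append(obj["Key"])

print(new_files)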

Unable to upload to S3 with Grails S3 Demo Application

I am trying to run a demo project for uploading to S3 with Grails 3.
The project in question is this, more specifically the S3 upload is only for the 'Hotel' example at the end.
When I run the project and go to upload the image, I get an "updated" message but nothing actually happens - there's no inserted URL in the dbconsole table.
I think the issue lies with how I am running the project, I am using the command:
grails -Daws.accessKeyId=XXXXX -Daws.secretKey=XXXXX run-app
(where I am supplementing the X's for my keys obviously).
This method of running the project appears to be slightly different to the method shown in the example. I run my project from the command line and I do not use GGTS, just Sublime.
I have tried inserting my AWS keys into the application.yml but I receive an internal server error then.
Can anyone help me out here?
Check your bucket policy in S3. You need to grant permissions to the API user to allow uploads.
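As a quick sanity check outside of Grails (a sketch only; the bucket name is a placeholder), you can verify that the same keys are actually allowed to upload to the bucket:

import boto3
from botocore.exceptions import ClientError

s3 = boto3.client(
    "s3",
    aws_access_key_id="XXXXX",       # the same value passed via -Daws.accessKeyId
    aws_secret_access_key="XXXXX",   # the same value passed via -Daws.secretKey
)

try:
    s3.put_object(Bucket="my-hotel-images-bucket", Key="permission-test.txt", Body=b"test")
    print("Upload allowed")
except ClientError as e:
    # AccessDenied here means the bucket policy / IAM user lacks s3:PutObject
    print("Upload failed:", e.response["Error"]["Code"])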

Connect IntelliJ to Amazon Redshift

I'm using the latest version of IntelliJ and I've just created a cluster in Amazon Redshift. How do I connect IntelliJ to Redshift so that I can query it from my favorite IDE?
1. Download a JDBC driver: http://docs.aws.amazon.com/redshift/latest/mgmt/configure-jdbc-connection.html#download-jdbc-driver
2. In IntelliJ: View | Tool Windows | Database
3. Click on "Data Source Properties"
4. Click Add (+) and select "Database Driver"
5. Uncheck "JDBC drivers", add the JDBC driver, select a class from the dropdown and select the PostgreSQL dialect
6. Add a new connection, and use this data source for your connection (+ | Data Source | Redshift)
7. Set the URL templates:
jdbc:redshift://[{host::localhost}[:{port::5439}]][/{database::postgres}?][\?<&,user={user:param},password={password:param},{:identifier}={:param}>]
jdbc:redshift://\[{host:ipv6:\:\:1}\][:{port::5439}][/{database::postgres}?][\?<&,user={user:param},password={password:param},{:identifier}={:param}>]
jdbc:redshift:{database::postgres}[\?<&,user={user:param},password={password:param},{:identifier}={:param}>]
You can connect IntelliJ to Redshift by using the JDBC driver supplied by Amazon. In the Redshift console, go to "Connect Client" to get the driver.
Then, in the IntelliJ Data Source window, add the JAR as a Driver file, and use the following settings:
Class: com.amazon.redshift.jdbc41.Driver
URL template: jdbc:redshift://{host}:{port}/{database}
Common Pitfalls:
If the driver file is not readable or marked as in quarantine by OS X, you won't be able to select the driver class.
For a more detailed guide, see this blog post: Connecting IntelliJ to Redshift
Note: There is no native Redshift support in IntelliJ yet. IntelliJ Issue DBE-1459
Update for 2019: I've just created a PostgreSQL connection and then filled in the usual Redshift settings (don't forget port 5439); there was no need to download Amazon's JDBC driver.
The only small issue is that the syntax check doesn't know Redshift-specific syntax such as AS and some functions, but queries execute correctly.
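To illustrate the same point programmatically (purely a sketch; the host and credentials are placeholders), a plain PostgreSQL client such as psycopg2 can talk to Redshift on port 5439:

import psycopg2

# Placeholder endpoint and credentials; Redshift listens on port 5439 by default
conn = psycopg2.connect(
    host="my-cluster.abc123xyz0.us-east-1.redshift.amazonaws.com",
    port=5439,
    dbname="dev",
    user="awsuser",
    password="********",
)
with conn, conn.cursor() as cur:
    cur.execute("SELECT current_database(), version()")
    print(cur.fetchone())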
Update for 2020: PyCharm (and possibly all other JetBrains IDEs) now supports connecting to Redshift through IAM AWS credentials without manual driver installation.
Here are the detailed setup instructions:
Grant a redshift:GetClusterCredentials permission to your AWS user. Either create and attach a new policy (docs) or use an existing one such as AmazonRedshiftFullAccess (not recommended: too permissive).
Create an AWS access key (access key id + secret access key pair) for your user (docs).
Create a text configuration file ~/.aws/credentials (no extension) with the following content (docs):
[default] # arbitrary profile name, will be used later
region = <your region>
aws_access_key_id = <your access key id> # created on the previous step
aws_secret_access_key = <your secret access key>
Create a new PyCharm database connection of type Amazon Redshift and set it up (docs):
Choose connection type = IAM cluster/region (right under the «General» tab of the connection settings window).
Authentication = AWS Profile
User = {your AWS login}
Profile = default or the one you have used in credentials file.
The credentials can possibly be provided through AccessKeyID/SecretAccessKey connection settings on the «Advanced» tab but it did not work for me (due to NullPointerException if Profile field is empty).
Database = {your database}, choose an existing one so you don't face non-descriptive errors from the driver.
Region = {your region}
Cluster = {cluster name}, get it from Redshift AWS console.
Set up the connection:
Check necessary databases in the «Schemas» tab.
«Advanced» tab: AutoCreate = true (literal lowercase true as the setting value). This will automatically create a new database user with your AWS login.
Test connection.
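The same mechanism can be exercised directly with boto3 as a quick test (the region, cluster name, database and user below are placeholders); this is what the redshift:GetClusterCredentials permission and the AutoCreate setting map to:

import boto3

redshift = boto3.client("redshift", region_name="us-east-1")  # placeholder region

# Exchanges the IAM credentials from the [default] profile for temporary database credentials
creds = redshift.get_cluster_credentials(
    DbUser="my_aws_login",            # placeholder; with AutoCreate=True Redshift creates this DB user
    DbName="dev",                     # placeholder database
    ClusterIdentifier="my-cluster",   # placeholder cluster name from the Redshift console
    AutoCreate=True,
    DurationSeconds=900,
)
print(creds["DbUser"], creds["Expiration"])
# creds["DbPassword"] can then be used with any PostgreSQL/Redshift driver on port 5439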

Cloud Foundry can't auto-create tables using MySQL

Solved
Found the problem: I had a schema defined in my Java class.
I have a cloud foundry app which uses a mysql data service.
It works great but I want to add another database table.
When I re-deploy to Cloud Foundry with the new entity class, it does not create the table and the log has the following error:
2012-08-12 20:42:23,699 [main] ERROR org.hibernate.tool.hbm2ddl.SchemaUpdate - CREATE command denied to user 'ulPKtgaPXgdtl'#'172.30.49.146' for table 'acl_class'
Schema is created dynamically via the service. All you need to do is bind the application to the service and use the cloud namespace. As you mentioned above, removing the schema name from your configuration file will resolve the issue.