Export single column from DynamoDB to csv (or the like)

My DynamoDB table is quite large and I don't particularly want to dump the whole thing. There is one column that I want to test on, so I would like a dump of all of its values that I could have locally to code and test with. However, I am not finding anything that lets me do this.
I found RazorSQL, and it semi-worked: it let me pull down just one column of information from the table, but it clearly didn't pull down all the data.
I also found a Data Pipeline template on AWS, but from what I can tell it will dump the entire table. I am relatively new to AWS, so it's possible I'm not understanding something about pipelines properly.
I'm okay with writing to S3, because I can pull all the data down from there, but anything that gets it to my local machine is fine by me.
Thanks for the help!
UPDATE: This tutorial looks promising, but I want to achieve the same effect non-interactively.
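For a non-interactive route, a scan with a ProjectionExpression pulls down just the one attribute. A minimal boto3 sketch, where the table name, attribute name, and output file are all assumptions:

    import csv
    import boto3

    TABLE_NAME = "my-table"   # hypothetical table name
    ATTRIBUTE = "my_column"   # hypothetical attribute to export

    client = boto3.client("dynamodb")
    paginator = client.get_paginator("scan")

    with open("column_dump.csv", "w", newline="") as f:
        writer = csv.writer(f)
        writer.writerow([ATTRIBUTE])
        # The paginator follows LastEvaluatedKey for us, so the whole table
        # is covered even though each scan page is capped at 1 MB.
        # (If the attribute name is a DynamoDB reserved word, you would also
        # need ExpressionAttributeNames.)
        for page in paginator.paginate(TableName=TABLE_NAME,
                                       ProjectionExpression=ATTRIBUTE):
            for item in page["Items"]:
                # String attributes arrive in DynamoDB's typed form, e.g. {"S": "value"}.
                writer.writerow([item.get(ATTRIBUTE, {}).get("S", "")])

A projected scan still consumes read capacity for the full items, but only the selected attribute comes over the wire, so the local dump stays small.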

Related

DBT - write dbt test --store-failures to a specific table in my data warehouse

Good afternoon,
I want to write the dbt test values to a specific table in my data warehouse.
I have tried adding schema settings in all of the possible .yml files, but I can't find the correct place to specify which database the tests should be recorded in.
At the moment it always fails with an error about not having permission to perform a glue:CreateDatabase action, which is in fact not what I want to do; I want to write to a table specified by me.
To conclude, what I am asking here is: how can I specify where the dbt test results are written, instead of letting dbt create and store the values in the default schemas?
If somebody could help me on this I would really appreciate it!
Well, I managed to fix it. I was close to throwing my computer away because I had run out of ideas, but basically you can specify the target database in dbt_project.yml. To do that, just add the following to dbt_project.yml:
tests:
  +store_failures: true
  +schema: "schema_you_want"
I did not find any information in the dbt documentation or in the community forums, so it was just an iterative process of testing possible approaches.
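For what it's worth: with that block in place, dbt test --store-failures materializes each failing test's rows as a table in that schema. If memory serves, dbt appends the custom schema to your target schema by default, and without a +schema override it falls back to a dbt_test__audit suffix.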

What to do with Athena Results Files?

Newer to AWS and working with Athena for the first time. Would appreciate any help/clarification.
I set the query results location to s3://aws-athena-query-results-{ACCOUNTID}-{Region}, and I can see that whenever I run a query, whether from the console or externally, the two result files are created as expected.
However, my question is: what am I supposed to do with these files long term? What are some recommendations for rotating them? From what I understand, these are the query results (the other is a metadata file) that contain the results of the user's query and are passed back to them. What are the recommendations for managing the files in the query results bucket? I don't want to just let them accumulate and come back to a million files, if that makes sense.
I did search through the docs and couldn't find info on the above topic, maybe I missed it? Would appreciate any help!
Thanks!
From the documentation,
You can delete metadata files (*.csv.metadata) without causing errors,
but important information about the query is lost
The query result files can be safely deleted if you don't need to refer back to a query that ran on a particular date in the past and the result it returned. If you have deleted the result files from the S3 bucket and then try to download the result from Athena's "History" tab, it will just give you an error message that the result file is not available.
In summary, it's up to your use case: can you afford to re-run the same query in the future if required, or do you want to extract the result from past run history?
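If you just want the bucket to clean itself up, the usual answer is an S3 lifecycle rule that expires old objects. A minimal boto3 sketch, where the bucket name and the 30-day window are assumptions to adjust:

    import boto3

    # Assumed bucket name -- substitute your own results bucket.
    BUCKET = "aws-athena-query-results-ACCOUNTID-REGION"

    s3 = boto3.client("s3")
    # Expire result files (and their .metadata companions) after 30 days
    # so the bucket doesn't accumulate files forever.
    s3.put_bucket_lifecycle_configuration(
        Bucket=BUCKET,
        LifecycleConfiguration={
            "Rules": [{
                "ID": "expire-athena-results",
                "Status": "Enabled",
                "Filter": {"Prefix": ""},  # apply to the whole bucket
                "Expiration": {"Days": 30},
            }]
        },
    )

After that, Athena's History will show the "result file is not available" error for anything older than the window, which is usually an acceptable trade-off.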

Is there a way to create an alias to a Redshift table?

I have a requirement to change several table names to conform to a convention (it's just in Dev). However, several consumers are already using those tables directly (then again, it's just Dev and it will not be kept that way). Is there a way to change the name and keep the old one as an alias for a transition period? I have browsed the Redshift documentation but haven't found anything like that.
Thank you!
Using CREATE VIEW is the closest thing to an alias.
It also gives you the ability to present a subset of columns and even differently-named columns, which can be handy when migrating to a new schema.
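A minimal sketch of that approach, run through psycopg2 since Redshift speaks the Postgres wire protocol (the connection details and table names are assumptions):

    import psycopg2

    # Assumed connection details -- fill in your own cluster endpoint.
    conn = psycopg2.connect(
        host="my-cluster.abc123.us-east-1.redshift.amazonaws.com",
        port=5439, dbname="dev", user="awsuser", password="...",
    )
    with conn, conn.cursor() as cur:
        # Rename the table to the new convention...
        cur.execute("ALTER TABLE old_name RENAME TO new_name;")
        # ...then re-create the old name as a thin view so existing
        # consumers keep working during the transition.
        cur.execute("CREATE VIEW old_name AS SELECT * FROM new_name;")

One caveat: a plain view binds to the underlying table, so you'd have to drop the view before dropping the table later; Redshift's late-binding views (CREATE VIEW ... WITH NO SCHEMA BINDING) avoid that if it matters.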

Updating a field in all records in elasticsearch

I'm new to Elasticsearch, so this is probably something quite trivial, but I haven't figured out anything better than fetching everything, processing it with a script, and updating the documents one by one.
I want to make something like a simple SQL update:
UPDATE RECORD SET SOMEFIELD = SOMEXPRESSION
My intent is to replace the actual bogus data with some data that makes more sense (so the expression is basically randomly choosing from a pool of valid values).
There are a couple of open issues about making it possible to update documents by query.
The technical challenge is that Lucene (the text search engine library that Elasticsearch uses under the hood) segments are read-only. You can never modify an existing document: what you need to do is delete the old version of the document (which, by the way, will only be marked as deleted until a segment merge happens) and index the new one. That's what the existing update API does. An update by query might therefore take a long time and lead to issues, which is why it hasn't been released yet. A mechanism for interrupting running queries would be nice to have for this case too.
But there's the update by query plugin that exposes exactly that feature. Just beware of the potential risks before using it.
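For reference, update by query later became a core Elasticsearch API (the _update_by_query endpoint). A minimal sketch calling it over plain HTTP from Python, where the cluster address, index name, and field are all assumptions:

    import requests

    resp = requests.post(
        "http://localhost:9200/my_index/_update_by_query",
        json={
            "query": {"match_all": {}},  # every document, like an unfiltered SQL UPDATE
            "script": {
                # Painless script that overwrites the field on each document.
                # Note: params are fixed for the whole request, so per-document
                # randomization would have to happen inside the script itself.
                "source": "ctx._source.somefield = params.value",
                "params": {"value": "replacement"},
            },
        },
    )
    print(resp.json())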

data is generated in database even if we perform preview data option in spoon

Can someone suggest the best way to overcome this situation? I am using Kettle 4.1.0 community version. When I preview the data in Spoon for a transformation's Table Output step, clicking the preview data option writes the data directly to the database, even though I never ran the transformation. How can I avoid this?
regards
kiran kumar.g
That's just how it works; perhaps "preview" is a misleading name for it.
There are a couple of ways around it. Preview the step before the Table Output and disable the hop so no data reaches it; or, if the Table Output step is collecting several inputs, put in a "dummy" step to do the collecting and preview that instead.
Alternatively, change your connection to a local database (via properties, JNDI, or even a different connection on the step); then you won't care if the data gets written.