I need to rename several tables to conform to a naming convention (this is only in Dev). However, several consumers already use those tables directly (then again, it's only Dev, and it will not stay that way). Is there a way to change a table's name and keep the old one as an alias for a transition period? I have browsed the Redshift documentation but haven't found anything like that.
Thank you!
Using CREATE VIEW is the closest thing to an alias.
It also gives you the ability to present a subset of columns and even differently-named columns, which can be handy when migrating to a new schema.
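For example, here is a minimal sketch of the rename-plus-view approach, assuming psycopg2 and hypothetical table names (orders renamed to fact_orders); connection details are placeholders for a Dev cluster:

import psycopg2

conn = psycopg2.connect(host="my-cluster.example.com", port=5439,
                        dbname="dev", user="dev_user", password="secret")
with conn, conn.cursor() as cur:
    # Rename the table to follow the new convention.
    cur.execute("ALTER TABLE orders RENAME TO fact_orders;")
    # Recreate the old name as a view so existing consumers keep working
    # during the transition period.
    cur.execute("CREATE VIEW orders AS SELECT * FROM fact_orders;")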
Are there any tried-and-true methods of managing your own sequential integer field without using SQL Server's built-in Identity Specification? I'm thinking this has to have been done many times over and my Google skills are just failing me tonight.
My first thought is to use a separate table to manage the IDs and use a trigger on the target table to manage setting the ID. Concurrency issues are obviously important, but insert performance is not critical in this case.
And here are some gotchas I know I need to look out for:
1. Need to make sure the same ID isn't doled out more than once when multiple processes run simultaneously.
2. Need to make sure any solution to 1) doesn't cause deadlocks.
3. Need to make sure the trigger works properly when multiple records are inserted in a single statement, not only one record at a time.
4. Need to make sure the trigger only sets the ID when it is not already specified.
The reason for the last bullet point (and the whole reason I want to do this without an Identity Specification field in the first place) is that I want to seed multiple environments at different starting points and be able to copy data between them so that the ID for a given record remains the same across environments (and I have to use integers; I cannot use GUIDs).
(Also, yes, I could toggle IDENTITY_INSERT on/off to copy data and still use a regular Identity Specification field, but then it reseeds after every insert. I could then use DBCC CHECKIDENT to reseed it back to where it was, but I feel the risk with this solution is too great. It only takes one mistake, and by the time we realized it, repairing the data would be a real pain... probably enough pain that it would have made more sense just to do what I'm doing now in the first place.)
SQL Server 2012 introduced the concept of a SEQUENCE database object - something like an "identity" column, but separate from a table.
You can create and use a sequence from your code, use its values in various places, and more.
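For example, here is a minimal sketch (using pyodbc; the sequence, table, and seed values are hypothetical) that seeds each environment differently and lets a DEFAULT constraint fill in the ID only when one isn't supplied, which covers the "only set the ID when it is not already specified" requirement:

import pyodbc

conn = pyodbc.connect("DRIVER={ODBC Driver 17 for SQL Server};"
                      "SERVER=dev-sql;DATABASE=Dev;Trusted_Connection=yes")
cur = conn.cursor()

# One-time setup: a sequence seeded per environment, plus a default so
# inserts that omit the ID pick up the next sequence value.
cur.execute("CREATE SEQUENCE dbo.RecordId START WITH 10000 INCREMENT BY 1;")
cur.execute("ALTER TABLE dbo.Records ADD CONSTRAINT DF_Records_Id "
            "DEFAULT (NEXT VALUE FOR dbo.RecordId) FOR Id;")
conn.commit()

# Copying a row between environments: supplying the ID explicitly simply
# bypasses the default, so the value survives the copy unchanged.
cur.execute("INSERT INTO dbo.Records (Id, Name) VALUES (?, ?)", 42, "copied row")
conn.commit()

Because the sequence hands out values atomically on the server, this should also address the duplicate-ID and deadlock concerns without any trigger.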
See these links for more information:
Sequence numbers (MS Docs)
CREATE SEQUENCE statement (MS Docs)
SQL Server SEQUENCE basics (Red Gate - Joe Celko)
My DynamoDB table is quite large and I don't particularly want to dump the whole thing. There is one column that I want to test on, so I would like a dump of all of its values that I could have locally to code/test with. However I am not finding anything that lets me do this.
I found RazorSQL and it semi-worked: it let me pull down just one column of information from the table, but it clearly didn't pull down all the data.
I also found a Data Pipeline Template on AWS but from what I can tell this will dump the entire table. I am relatively new to AWS so it's possible I'm not understanding something about pipelines properly.
I'm okay with writing to S3 because I can pull the data down from there, but anything that gets it to my local machine is fine by me.
Thanks for the help!
UPDATE: This tutorial looks promising, but I want to achieve this effect in a non-interactive way.
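For concreteness, here is a rough sketch of the kind of non-interactive approach I'm imagining, assuming boto3 and hypothetical table/attribute names (I haven't verified this is the best way): scan just one attribute and write its values to a local file.

import boto3

dynamodb = boto3.resource("dynamodb", region_name="us-east-1")
table = dynamodb.Table("my-large-table")

scan_kwargs = {"ProjectionExpression": "my_column"}
with open("my_column_dump.txt", "w") as out:
    while True:
        response = table.scan(**scan_kwargs)
        for item in response["Items"]:
            # Items lacking the projected attribute come back as empty dicts.
            if "my_column" in item:
                out.write(str(item["my_column"]) + "\n")
        # Keep paginating until the scan is exhausted.
        if "LastEvaluatedKey" not in response:
            break
        scan_kwargs["ExclusiveStartKey"] = response["LastEvaluatedKey"]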
I was wondering what you recommend for running a user upload system with S3. I plan on using MongoDB to store metadata such as the uploader, size, etc. How should I go about storing the actual files in S3?
Here are some of my ideas, what do you think is the best? All of these examples would involve saving the metadata to MongoDB.
1. Should I just store all the files in a bucket?
2. Maybe organize them into dates (e.g. 6/8/2014/mypicture.png)?
3. Should I save them all in one bucket, but with an added string (such as d1JdaZ9-mypicture.png) to avoid duplicates?
4. Or should I generate a long string for a folder and store the file in that folder, to retain the original file name? e.g. sh8sb36zkj391k4dhqk4n5e4ndsqule6/mypicture.png
This depends primarily on how you intend to use the pictures and which objects/classes/modules/etc. in your code will actually deal with retrieving them.
If you find yourself wanting to do things like "all user uploads on a particular day", a simple naming convention with folders for the year, month, and day, along with a top-level folder for the user's unique ID, will solve the problem.
If you want to ensure uniqueness and avoid collisions in your bucket, you could generate a unique string too.
However, since you've got MongoDB, which (I'm assuming) will actually handle these queries for user uploads by date, etc., the choice of bucket structure is more aesthetic than functional.
If all you're storing in MongoDB is the key/URL, it doesn't really matter what the actual structure of your bucket is. Nevertheless, it makes sense to split this up in some coherent way: maybe group all of a user's uploads together and give each file a unique name (either generate one outright or prepend a unique prefix to the original file name).
That being said, do you think there might be a point when you might look at changing how your images are stored? You might move to a CDN. A third party might come up with an even cheaper/better product which you might want to try. In a case like that, simply storing the keys/URLs in your MongoDB is not a good idea since you'll have to update every entry.
To make this relatively future-proof, I suggest you give your uploads a definite structure. I usually opt for:
bucket_name/user_id/yyyy/mm/dd/unique_name.jpg
Your database then only needs to store the file name and the upload timestamp.
You can introduce a middle layer in your logic (a new class, perhaps, or just a helper function/method) that generates the URL for a file based on this information. That way, if you change your storage method later, you only need to make a small change in this middle layer (after migrating your files, of course) and not worry about MongoDB.
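A minimal sketch of such a middle layer in Python (the bucket name and URL shape are assumptions):

from datetime import datetime

BUCKET = "my-uploads-bucket"  # hypothetical

def make_key(user_id, filename, uploaded_at=None):
    # Builds the user_id/yyyy/mm/dd/unique_name.jpg part of the structure.
    ts = uploaded_at or datetime.utcnow()
    return f"{user_id}/{ts:%Y/%m/%d}/{filename}"

def make_url(user_id, filename, uploaded_at):
    # If storage moves to a CDN or another provider later, only this function
    # changes; MongoDB keeps storing just the file name and timestamp.
    return f"https://{BUCKET}.s3.amazonaws.com/" + make_key(user_id, filename, uploaded_at)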
I'm new to Elasticsearch, so this is probably quite trivial, but I haven't figured out anything better than fetching everything, processing it with a script, and updating the records one by one.
I want to make something like a simple SQL update:
UPDATE RECORD SET SOMEFIELD = SOMEXPRESSION
My intent is to replace the actual bogus data with some data that makes more sense (so the expression is basically randomly choosing from a pool of valid values).
There are a couple of open issues about making it possible to update documents by query.
The technical challenge is that Lucene (the text search engine library that Elasticsearch uses under the hood) segments are read-only: you can never modify an existing document. What you need to do is delete the old version of the document (which, by the way, is only marked as deleted until a segment merge happens) and index the new one. That's what the existing update API does. Therefore, an update by query might take a long time and lead to issues, which is why it hasn't been released yet. A mechanism for interrupting running queries would be a nice-to-have for this case, too.
But there's the update by query plugin that exposes exactly that feature. Just beware of the potential risks before using it.
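In the meantime, the per-document workaround can at least be scripted. Here is a minimal sketch using the official elasticsearch-py client (the index name, field name, and pool of valid values are hypothetical):

import random
from elasticsearch import Elasticsearch, helpers

es = Elasticsearch(["http://localhost:9200"])
VALID_VALUES = ["alpha", "beta", "gamma"]  # pool of replacement values

# Stream every matching document, then issue a partial update per hit; under
# the hood each update deletes the old version and indexes a new one.
for hit in helpers.scan(es, index="records", query={"query": {"match_all": {}}}):
    es.update(index="records", id=hit["_id"],
              body={"doc": {"somefield": random.choice(VALID_VALUES)}})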
The obvious way is to load up JDBC support from Clojure Contrib and write some function to translate a map/struct to a table. One drawback of this is that it isn't very flexible; changes to your structure will require DDL changes. This implies either writing DDL generation (tough) or hand-coding migrations (boring).
What alternatives exist? Answers must be ACID, ruling out serializing to a file, etc.
FleetDB is a database implemented in Clojure. It has a very natural syntax for working with maps/structs, e.g. to insert:
(client ["insert" "accounts" {"id" 1, "owner" "Eve", "credits" 100}])
Then select:
(client ["select" "accounts" {"where" ["=" "id" 1]}])
http://fleetdb.org/
One option for persisting maps in Clojure that still uses a relational database is to store the map data in an opaque blob. If you need the ability to search for records, you can store indexes in separate tables. For example, you can read how FriendFeed stores schemaless data on top of MySQL: http://bret.appspot.com/entry/how-friendfeed-uses-mysql
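For illustration, a minimal sketch of that blob-plus-index-table idea (in Python with sqlite3 for brevity; the same shape translates to Clojure over JDBC):

import json
import sqlite3

conn = sqlite3.connect("entities.db")
conn.executescript("""
    CREATE TABLE IF NOT EXISTS entities (id INTEGER PRIMARY KEY, body BLOB);
    CREATE TABLE IF NOT EXISTS index_owner (owner TEXT, entity_id INTEGER);
""")

record = {"id": 1, "owner": "Eve", "credits": 100}
# The whole map goes into one opaque blob...
conn.execute("INSERT INTO entities (id, body) VALUES (?, ?)",
             (record["id"], json.dumps(record)))
# ...and each searchable field gets a row in its own index table.
conn.execute("INSERT INTO index_owner (owner, entity_id) VALUES (?, ?)",
             (record["owner"], record["id"]))
conn.commit()

# Look up by indexed field, then deserialize the blob back into a map.
row = conn.execute(
    "SELECT e.body FROM entities e "
    "JOIN index_owner i ON i.entity_id = e.id WHERE i.owner = ?",
    ("Eve",)).fetchone()
print(json.loads(row[0]))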
Another option is to use the Entity-Attribute-Value model (EAV) for storing data in a database. You can read more about EAV on Wikipedia (I'd post a link but I'm a new user and can only post one link).
Yet another option is to use BerkeleyDB for Java - it's a native Java solution providing ACID and record level locking. (Same problem with posting a link).
Using CouchDB's Java-client lib and clojure.contrib.json.read/write works reasonably well for me. CouchDB's consistency guarantees may not be strong enough for your purposes, though.
Clj-record is an implementation of the active record pattern in Clojure that may be of interest to you.
You could try one of the Java-based graph databases, such as Neo4J. It might be easy to code up a hashmap interface to make it reasonably transparent.
MongoDB and its framework congomongo (lein: [congomongo "0.1.3-SNAPSHOT"]) work for me. Schemaless databases are incredibly nice for this, and congomongo is quite easy to get along with. MongoDB adds an _id field to every document to keep it identified, and the mapping between Clojure maps and Mongo documents is quite transparent.
https://github.com/somnium/congomongo
EDIT: I would not use MongoDB today. I would suggest you use Transit: the JSON encoding if the backend (Postgres etc.) supports it, or the msgpack encoding if you want a more compact binary format.