What's a good way to store ActiveRecord queries (not the results of the query)? - ruby-on-rails-4

I have certain conditions in my application that I need to query for regularly (like all users that have signed up in the last 24 hours), and I want to store that query for later use. What's a good way to store the query itself, not the results of the query?
A few different ways I've thought of:
A hash that maps keys to symbols naming methods defined on the User model, where each method's implementation defines the query I want.
A hash that maps keys to raw SQL strings.
A global hash of singleton methods, one per stored query.
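The first option can be sketched in plain Ruby (names here are illustrative, and a plain lambda stands in for a real ActiveRecord relation): the hash stores the query itself as a callable, so it is re-evaluated against current data each time. In Rails 4 the idiomatic equivalent is a named scope, e.g. `scope :recent, -> { where('created_at > ?', 24.hours.ago) }`, which is likewise a stored, lazily evaluated query.

```ruby
require 'time'

# Stand-in for a User record; in Rails this would be an ActiveRecord model.
User = Struct.new(:name, :created_at)

# Each entry stores the query itself (a lambda), not its results,
# so calling it re-runs the query against whatever data is passed in.
QUERIES = {
  signed_up_last_24h: ->(users) { users.select { |u| u.created_at > Time.now - 24 * 3600 } }
}

users = [
  User.new('old', Time.now - 48 * 3600),
  User.new('new', Time.now - 3600)
]

recent = QUERIES[:signed_up_last_24h].call(users)
puts recent.map(&:name).inspect  # prints ["new"]
```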

Related

AppSync $util.autoId() and DynamoDB Partition and Sort Keys Design Questions

The limits on DynamoDB partition and sort keys are such that if I want to create a table with lots of users (e.g. the entire world population), then I can't just use a unique partition key to represent the personId; I need to use both the partition key and the sort key to represent a personId.
$util.autoId() in AppSync returns a 128-bit String. If I want to use this as the primary key in the DynamoDB table, then I need to split it into two Strings, one being the partition key and the other being the sort key.
What is the best way to perform this split? Or if this is not the best way to approach the design, how should I design it instead?
Also, do the limits on partition and sort keys apply to secondary indexes as well?
Regarding $util.autoId(), since it's generated randomly, if I call it many times, is there a chance that it will generate two id's that are exactly the same?
I think I'm misunderstanding something from your question's premise, because to my brain, using AppSync's $util.autoId() gives you back a 128-bit UUID. The point of UUIDs is that they're unique, so you can absolutely have one UUID per person in the world. And the UUID string will easily fit within the maximum length limit of DynamoDB's partition key.
You also asked:
if I call it many times, is there a chance that it will generate two
id's that are exactly the same?
It's extremely unlikely. A random (version-4) UUID carries 122 random bits, so by the birthday bound you would need to generate on the order of 2.7 × 10^18 IDs before there is even a 50% chance of a single collision.

Do all MapReduce implementations take a keys parameter as input to the reduce function?

Previously I asked what the use case for passing a list of keys to CouchDB's reduce function was; the answer (https://stackoverflow.com/a/46713674/3114742) mentions two potential use-cases:
From a design perspective you may want to work with keys emitted by a map function
You may be calculating something based on key input (such as how often a particular key appears)
Do all implementations of MapReduce take an array of keys as input to the reduce functions? CouchDB specifically keeps track of the original document that produces a key. i.e. the input to a CouchDB reduce function:
function(keys, values, rereduce) {...}
The keys arg looks like this: [[key1,id1], [key2,id2], [key3,id3]].
i.e. Couch keeps track of the entity that emitted the key as a result of the Map function, even in the reduce function. Do other MapReduce implementations keep track of this information? Or is this specific to CouchDB...
Not all MapReduce implementations have the same reduce signature as CouchDB.
For example, in MongoDB's mapReduce, the reduce function receives a single key and a list of values, unlike CouchDB. All the values emitted by the map function for a given key are grouped together and passed to reduce as that one key plus its list of values.
Example:
emit(1,10)
emit(1,20)
will be grouped to
reduce(1,[10,20])
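The grouping step above can be sketched in plain JavaScript (an illustrative simulation, not MongoDB's actual engine): emitted (key, value) pairs are bucketed by key, and reduce is then called once per key with the full list of values.

```javascript
// Collect emitted pairs, group them by key, then reduce each group --
// the shape MongoDB's mapReduce presents to the reduce function.
const emitted = [];
const emit = (key, value) => emitted.push([key, value]);

// Equivalent of the map phase emitting for each document:
emit(1, 10);
emit(1, 20);
emit(2, 5);

// Group values by key.
const groups = new Map();
for (const [key, value] of emitted) {
  if (!groups.has(key)) groups.set(key, []);
  groups.get(key).push(value);
}

// MongoDB-style reduce: one key, one list of values.
const reduce = (key, values) => values.reduce((a, b) => a + b, 0);

const results = [...groups].map(([key, values]) => [key, reduce(key, values)]);
console.log(results);  // [[1, 30], [2, 5]]
```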

Can we use variables in Siddhi SQL statements?

I'm going to use the same value in lots of statements in the SQL expression. So is it possible to declare and assign the value to a variable at the beginning of the query and then refer to the value by it?
(I'm writing an execution plan in WSO2 DAS)
This is not supported as of now. However, supporting this has been under discussion, so it might be implemented in a future release.
If you want to store a value and use it in a query, the currently available ways are:
Putting that value into an indexed event table and then doing a join with the event table to read the value whenever it is required. An indexed in-memory event table internally uses a hash map, so you could use one to store your variables, such that the key of the hash map is the name of your variable and the value of the hash map is the value of your variable.
However, I feel that the above solution is too complicated for your requirement.
Alternatively, using the map extension in Siddhi.

What is the best way to safely assign a unique ID number to each entity a user creates?

My users can create entries, each of which I want to automatically assign an ID number to. The entries are stored in a DynamoDB table and are created via lambda functions and api gateway.
The ID number must be unique, and the process of assigning it must be robust and guarantee uniqueness.
My thinking now is to use a "global" variable that starts at 1; every time an entry is created, it assigns the global variable's value as that entry's ID, then increments the global variable.
Would this approach work, and if so what would the best approach be to implement it? Can you think of a better approach?
Your solution will not scale.
A global variable would need to be incremented in a concurrency-safe manner (to avoid race conditions), and it would also need to be persisted so that increments survive application restarts.
To avoid this exact problem, a common pattern is to use a UUID as your key. DynamoDB's Java SDK supports this pattern by providing the @DynamoDBAutoGeneratedKey annotation, which ensures each entry gets a random identifier generated for it.
You should be able to find a library to generate UUIDs if your preferred language is not Java.
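If you prefer to generate the key yourself rather than rely on the mapper annotation, the standard library's `java.util.UUID` gives the same guarantee; a minimal sketch (the class and method names are illustrative):

```java
import java.util.UUID;

public class Ids {
    // Random (version 4) UUID: 122 random bits, so collisions are
    // negligible in practice. Store the string as the item's key.
    public static String newId() {
        return UUID.randomUUID().toString();
    }

    public static void main(String[] args) {
        String a = newId();
        String b = newId();
        System.out.println(a);
        // Two freshly generated IDs are (virtually) never equal.
        System.out.println(a.equals(b));  // false
    }
}
```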

What is the best way to store a relation in main memory?

I am working on an application which is a mini DBMS design for evaluating SPJ queries. The program is being implemented in C++.
When I have to process a query for joins and group-by, I need to maintain a set of records in the main memory. Thus, I have to maintain temporary tables in main memory for executing the queries entered by the user.
My question is, what is the best way to achieve this in C++? What data structure do I need to make use of in order to achieve this?
In my application, I am storing data in binary files and using the Catalog (which contains the schema for all the existing tables), I need to retrieve data and process them.
I have only 2 datatypes in my application: int (4 Bytes) and char (1 Byte)
I can use std::vector. In fact, I tried to use a vector of vectors: the inner vector is used for storing attributes. But the problem is that many relations can exist in the database, each of which may have any number of attributes, and each attribute can be either an int or a char. So I am unable to identify the best way to achieve this.
Edit
I cannot use a struct for the tables because I do not know how many columns exist in the newly added tables, since all tables are created at runtime as per the user query. So, a table schema cannot be stored in a struct.
A Relation is a Set of Tuples (and in SQL, a Table is a Bag of Rows). Both in Relational Theory and in SQL, all tuples (/rows) in a relation (/table) "comply to the heading".
So it makes sense for an object that stores a relation (/table) to consist of two components: an object of type "Heading" and a Set (/Bag) object containing the actual tuples (/rows).
The "Heading" object is itself a Mapping of attribute (/column) names to "declared data types". I don't know C++, but in Java it might be something like Map<AttributeName,TypeName> or Map<AttributeName,Type> or even Map<String,String> (provided you can use those Strings to go get the actual 'Type' objects from wherever they reside).
The set of tuples (/rows) consists of members that are all a Mapping of attribute (/column) names to attribute Values, which are either int or String, in your case. The biggest problem here is that this suggests you need something like Map<AttributeName,Object>, but you might get into trouble over your ints not being objects.
As a generic container for any table rows, I'd most likely use std::vector (as pointed out by larsmans). As for the table columns, I'd most likely define those with structs representing the table schema. For example:
#include <vector>

struct DataRow
{
    int col1;   // 4-byte int attribute
    char col2;  // 1-byte char attribute
};

typedef std::vector<DataRow> DataTable;

int main()
{
    DataTable t;
    DataRow dr;
    dr.col1 = 1;
    dr.col2 = 'a';
    t.push_back(dr);
    return 0;
}