Can we use variables in Siddhi SQL statements? - wso2

I'm going to use the same value in many statements in the SQL expression. Is it possible to declare a variable, assign the value to it at the beginning of the query, and then refer to the value by name?
(I'm writing an execution plan in WSO2 DAS)

This is not supported at the moment. However, support has been under discussion, so it may be implemented in a future release.
If you want to store a value and use it in a query, the currently available ways are:
Putting that value into an indexed event table and then doing a join with the event table to read that value whenever required.
An indexed in-memory event table internally uses a hash map, so you could use one to store your variables: the key of the hash map would be the name of your variable, and the value of the hash map would be the value of your variable.
However, I feel that the above solution is too complicated for your requirement.
Using the Map Extension in Siddhi

Related

DynamoDB Local Secondary Index vs Global Secondary Index

I have been reading the Amazon DynamoDB documentation to compare the Global Secondary Index (GSI) and the Local Secondary Index (LSI). I am still unclear whether, in the use case below, it matters which one I use. I am familiar with the basics, e.g. that an LSI must use the same partition key as the table.
Here is the use case:
I already know the sort key for my index.
My partition key is the same in both cases
I want to project ALL the attributes from the original table onto my index
I know already prior to creating the table what index I need for my use case.
In the above use case, there is absolutely no difference apart from a minor latency gain with an LSI, since it lives in the same partition as the table. I want to understand the pros and cons in my use case.
Here are some questions that I am trying to find the answer to and I have not encountered a blog that is explicit about these:
Should I use a GSI only because the partition key is different?
Should I use a GSI even if the partition key is the same, but I did not know at table creation time that I would need such an index?
Are there any other major reasons where one is superior to the other (barring basic stuff like the per-table limits of 5 LSIs vs 20 GSIs)?
There are two more key differences that are not mentioned. You can see a full comparison between the two index types in the official documentation.
If you use an LSI, you can have a maximum of 10 GB of data per partition key value (the table plus all of its LSIs). For some use cases, this is a deal breaker; before you use an LSI, make sure that isn't the case for you.
LSIs allow you to perform strongly consistent queries. This is their only real benefit.
The AWS general guidelines for indexes say:
In general, you should use global secondary indexes rather than local secondary indexes. The exception is when you need strong consistency in your query results, which a local secondary index can provide but a global secondary index cannot (global secondary index queries only support eventual consistency).
You may also find this SO answer a helpful discussion of why you should prefer a GSI over an LSI.

What is the best way to safely assign a unique ID number to each entity a user creates?

My users can create entries, each of which I want to automatically assign an ID number to. The entries are stored in a DynamoDB table and are created via lambda functions and api gateway.
The ID number must be unique, and the process of assigning it must be robust and guarantee uniqueness.
My current thinking is to use a "global" variable that starts at 1; every time an entry is created, it is assigned the global variable's value as its ID, and the global variable is then incremented.
Would this approach work, and if so what would the best approach be to implement it? Can you think of a better approach?
Your solution will not scale.
A global variable would require you to increment its value in a concurrency-safe manner (to avoid race conditions), and it would also need to be persisted so that increments survive application restarts.
To avoid this exact problem, a common pattern is to use a UUID as your key. DynamoDB's Java SDK supports this pattern with the @DynamoDBAutoGeneratedKey annotation, which ensures each entry gets a randomly generated identifier.
If your preferred language is not Java, you should be able to find a library to generate UUIDs.

MySQL, C++ - Programmatically, How does MySQL Autoincrement Work?

Looking at the latest MySQL source code (I'm not certain whether it's C or C++), how does it implement auto-increment? I mean, is it efficient, in that it stores something like a metadata resource on the table recording where it last left off, or does it have to do a table scan to find the greatest ID in use? Also, do you see any negative aspects of using auto-increment, given how it's implemented, versus, say, PostgreSQL?
That will depend on which engine the table is using. InnoDB stores the largest value in memory, not on disk, which is very efficient. I would guess most engines do something similar, but I cannot guarantee it.
InnoDB's auto-increment runs the query below once when the table is first opened, and keeps the counter in memory after that:
SELECT MAX(ai_col) FROM t FOR UPDATE;
Comparing that to PostgreSQL is tricky, because it lacks AUTO_INCREMENT entirely (at least it did the last time I used it; that may have changed), so the comparison depends on how you implement the field yourself. Most people would create a SEQUENCE, which appears to be stored in an in-memory pseudo-table. I'd take InnoDB's approach to be the simpler way, and I'd guess it is more efficient if the two aren't equal.

Fastest C++ Container: Unique Values

I am writing an email application that interfaces with a MySQL database. Two tables source my data: one contains unsubscriptions, the other is a standard user table. At the moment, I build a vector of pointers to email objects and initially store all of the unsubscribed emails in it. I then have a standard SQL loop in which I check whether each email is absent from the unsubscribe vector before adding it to the global send-email vector. My question is: is there a more efficient way of doing this? I have to search the unsub vector for every single email in my system, up to 50K different ones. Is there a better structure for searching? And a better structure for maintaining a unique collection of values? Perhaps one that would simply discard a value it already contains?
If your C++ Standard Library implementation supports it, consider using std::unordered_set (or the older, non-standard hash_set extension).
You can also use std::set, though its overhead might be higher (it depends on the cost of generating a hash for the object versus the cost of comparing two of the objects several times).
If you do use a node-based container like set or unordered_set, you also get the advantage that removing elements is relatively cheap compared to removal from a vector.
Tasks like this (set manipulations) are better left to what is MEANT to execute them - the database!
E.g. something along the lines of:
SELECT email FROM all_emails_table e WHERE NOT EXISTS (
    SELECT 1 FROM unsubscribed u WHERE e.email = u.email
)
If you want an ALGORITHM, you can do this fast by retrieving both the list of emails AND the list of unsubscriptions as ORDERED lists. Then you walk through the email list (which is ordered) while gliding along the unsubscribe list: at each step, advance whichever list has the smaller current element, emitting an email when it is smaller than the current unsubscribe entry and skipping it when the two match. This algorithm is O(M+N) instead of O(M*N) like your current one.
Or, you can build a hash map that maps each unsubscribed e-mail address to a dummy value. Then you do find() calls on that map, which for a correct hash implementation are O(1) per lookup.
Unfortunately, at the time of writing there was no standard hash map in C++ - please see this SO question for existing implementations (among the options are SGI STL's hash_map, Boost, and TR1's std::tr1::unordered_map).
One of the comments on that post indicates it was on track for standardization: "With this in mind, the C++ Standard Library Technical Report introduced the unordered associative containers, which are implemented using hash tables, and they have now been added to the Working Draft of the C++ Standard." (They did indeed ship in C++11 as std::unordered_map.)
Store your email addresses in a std::set, or use std::set_difference().
The best way to do this is within MySQL, I think. You can add another column to your users table schema: a BIT column for "is unsubscribed". Better yet: add a DATETIME column for "date unsubscribed" with a default value of NULL.
If using a BIT column, your query becomes something like:
SELECT * FROM `users` WHERE `unsubscribed` <> 0b1;
If using a DATETIME column, your query becomes something like:
SELECT * FROM `users` WHERE `date_unsubscribed` IS NULL;

Multiple keys Hash Table (unordered_map)

I need to use multiple keys (of int type) to store and retrieve a single value from a hash table; that is, multiple keys together index a single item. I need fast insertion and lookup for the hash table. By the way, I am not allowed to use the Boost library in the implementation.
How could I do that?
If you mean that two ints form a single key, then use unordered_map<std::pair<int,int>, value_type>; note that you must supply your own hash functor, since std::hash has no specialization for std::pair. If you want to index the same set of data by multiple independent keys, then look at Boost.MultiIndex.
If the key to your container is comprised of the combination of multiple ints, you could use boost::tuple as your key, to encapsulate the ints without more work on your part. This holds provided your count of key int subcomponents is fixed.
The easiest way is probably to keep a map of pointers/indexes to the elements in a list.
A few more details are needed here, though: do you need to support deletion? How are the elements set up? Can you use boost::shared_ptr? (Rather helpful if you need to support deletion.)
I'm assuming that the value object in this case is large, or that there is some other reason you can't simply duplicate values in a regular map.
If it's always going to be the same combination used for retrieval, then it's better to form a single compound key from the multiple keys.
You can do this by either:
Storing the key as a concatenated string of the ints, like
(int1,int2,int3) => data
Using a wider integer type such as uint64_t, into which you can pack the individual values to form a key