terraform "element" and "concat" used together - amazon-web-services

We have a module that uses Terraform to build a security proxy hosting an Elasticsearch site. In its code there is this:
elastic_search_endpoint = "${element(concat(module.es_cluster.elasticsearch_endpoint, list("")),0)}"
which, as I understand it, finds the es_cluster module and gets the elasticsearch endpoint that it outputs. This makes the endpoint available to the proxy so it can run elasticsearch.
But I don't actually understand what this piece of code is doing and why the 'element' and 'concat' functions are there. Why can't it just be like this?
elastic_search_endpoint = "${module.es_cluster.elasticsearch_endpoint}"

Let's break this up and see what each part does.
It's not shown in the example, but I'm going to assume that module.es_cluster.elasticsearch_endpoint is an output value that is a list of either zero or one ElasticSearch endpoints, presumably because that module allows disabling the generation of an ElasticSearch endpoint.
If so, that means that module.es_cluster.elasticsearch_endpoint would either be [] (empty list) or ["es.example.com"].
Let's consider the case where it's a one-element list first: concat(module.es_cluster.elasticsearch_endpoint, list("")) in that case will produce the list ["es.example.com", ""]. Then element(..., 0) will take the first element, giving "es.example.com" as the final result.
In the empty-list case, concat(module.es_cluster.elasticsearch_endpoint, list("")) produces the list [""]. Then element(..., 0) will take the first element, giving "" as the final result.
Given all of this, it seems like the intent of this expression is to either return the one ElasticSearch endpoint, if available, or to return an empty string as a placeholder if not.
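To make the two cases concrete, here is a small Python model of the expression. It is only a sketch: element and first_or_empty are hypothetical helpers that mimic the Terraform functions (including element's wrap-around indexing), assuming the module output is a list of zero or one strings.

```python
from typing import List


def element(values: List[str], index: int) -> str:
    """Mimic Terraform's element(): the index wraps around modulo the list length."""
    return values[index % len(values)]


def first_or_empty(endpoints: List[str]) -> str:
    """Python stand-in for element(concat(endpoints, list("")), 0)."""
    padded = endpoints + [""]  # concat(...) guarantees at least one item
    return element(padded, 0)


print(repr(first_or_empty(["es.example.com"])))  # 'es.example.com'
print(repr(first_or_empty([])))                  # ''
```

The padding element is what makes index 0 always valid; without it, the empty-list case would have nothing to return.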
I expect this is written this specific way because it was targeting an earlier version of the Terraform language which had fewer features. A different way to write this expression in current Terraform (v0.14 is current as of my writing this) would be:
elastic_search_endpoint = (
length(module.es_cluster.elasticsearch_endpoint) > 0 ? module.es_cluster.elasticsearch_endpoint[0] : ""
)
It's awkward that this includes the full output reference twice though. That might be justification for using the concat approach even in modern Terraform, although arguably the intent wouldn't be so clear to a future reader:
elastic_search_endpoint = (
concat(module.es_cluster.elasticsearch_endpoint, [""])[0]
)
Modern Terraform also includes the possibility of null values, so if I were writing a module like yours today I'd probably prefer to return null rather than an empty string, to make it clearer that it represents the absence of a value:
elastic_search_endpoint = (
length(module.es_cluster.elasticsearch_endpoint) > 0 ? module.es_cluster.elasticsearch_endpoint[0] : null
)
or, with the concat form:
elastic_search_endpoint = (
concat(module.es_cluster.elasticsearch_endpoint, [null])[0]
)

First things first: who wrote that code? Why isn't it documented? Ask the author!
Just from that code, there's not much to go on. Since concat expects lists, I'd say module.es_cluster.elasticsearch_endpoint is a list(string). Also, depending on some variables, it might be empty. Concatenating a list containing an empty string ensures that there's always something at position 0.
So the whole ${element(concat(module.es_cluster.elasticsearch_endpoint, list("")),0)} could be translated to length(module.es_cluster.elasticsearch_endpoint) > 0 ? module.es_cluster.elasticsearch_endpoint[0] : "" (which IMHO is much more readable).
Why can't it just be like this?
elastic_search_endpoint = "${module.es_cluster.elasticsearch_endpoint}"
Probably because elastic_search_endpoint is a string and, as mentioned before, module.es_cluster.elasticsearch_endpoint is a list(string). You should provide a default value in case the list is empty.

Related

Adding a node and edge to a graph using Gremlin behaving strange

I'm new to using Gremlin (up until now I was accessing Neptune using openCypher, but gave up due to how slow it was) and I'm getting really confused over some stuff here.
Basically what I'm trying to do is -
Let us say we have some graph A-->B-->C. There are multiple such graphs in the database, so I'm looking for the specific A,B,C nodes that have the property 'idx' equals '1'. I want to add a node D{'idx' = '1'} and an edge so I will end up having
A-->B-->C-->D
It is safe to assume A,B,C already exist and are connected together.
Also, we wish to add D only if it doesn't already exist.
So what I currently have is this:
g.V().
hasLabel('A').has('idx', '1').
out().hasLabel('B').has('idx', '1').
out().hasLabel('C').has('idx', '1').as('c').
V().hasLabel('D').has('idx', '1').fold().
coalesce(
unfold(),
addV('D').property('idx','1')).as('d').
addE('TEST_EDGE').from('c').to('d')
Now the problem is that this doesn't work, and I don't understand Gremlin well enough to see why. Neptune returns "An unexpected error has occurred in Neptune" with the code "InternalFailureException".
Another thing to mention is that if the node D already exists, I don't get an error at all, and the node is in fact connected to the graph as it should be.
Furthermore, I've seen in a different post that using ".as('c')" shouldn't work since there is a 'fold' step afterwards which makes it unusable (for a reason I still don't understand, probably because I'm not sure how .as, .store, and .aggregate work).
That post suggests using ".aggregate('c')" instead, but doing so changes the error to "addE(TEST_EDGE) could not find a Vertex for from() - encountered: BulkSet". This, added to the fact that the code I wrote actually works and connects node D to the graph if it already exists, makes me even more confused.
So I'm lost
Any help or clarification or explanation or simplification would be much appreciated
Thank you! :)
A few comments before getting to the query:
If the intent is to have multiple subgraphs of (A->B->C), then you may not want to use this labeling scheme. Labels are meant to be of lower variation - think of labels as groups of vertices of the same "type".
A lookup of a vertex by an ID is the fastest way to find a vertex in a TinkerPop-based graph database. Just be aware of that as you build your access patterns. If a label and an idx value together identify a unique vertex, you may want to give that vertex a composite ID such as 'x-y' and look it up by ID, instead of doing something like `hasLabel('x').has('idx','y')`.
On the query...
The first part of the query looks good. I think you have a good understanding of the imperative nature of Gremlin just up until you get to the second V() in the query. That V() is going to tell Neptune to start evaluating against all vertices in the graph again. But we want to continue evaluating beyond the 'C' vertex.
Unless you need to return an output in either case of existence or non-existence, you could get away with just doing the following without a coalesce() step:
g.V().
hasLabel('A').has('idx', '1').
out().hasLabel('B').has('idx', '1').
out().hasLabel('C').has('idx', '1').
where(not(out().hasLabel('D').has('idx','1'))).
addE('TEST_EDGE').to(
addV('D').property('idx','1'))
The where() step lets us check for the absence of a downstream edge and vertex without losing our place in the traversal. The traversal continues only if the pattern inside not() is not found; in that case it resumes from where we left off (the 'C' vertex). So we can feed that 'C' vertex directly into an addE() step that creates our new edge and the new 'D' vertex.
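The control flow of that traversal can be modeled in plain Python to show why it is idempotent: a second run adds nothing. This is a hypothetical in-memory stand-in (an adjacency dict keyed by (label, idx) pairs), not gremlinpython or a Neptune API.

```python
from typing import Dict, List, Tuple

Vertex = Tuple[str, str]  # hypothetical (label, idx) key for a vertex


def add_d_if_missing(edges: Dict[Vertex, List[Vertex]], idx: str) -> bool:
    """Follow A->B->C for idx and attach a D to C unless C already points to one.

    Returns True if a new D vertex and edge were added.
    """
    a, b, c, d = ("A", idx), ("B", idx), ("C", idx), ("D", idx)
    # out().hasLabel('B').has('idx', idx), then the same for C
    if b not in edges.get(a, []) or c not in edges.get(b, []):
        return False  # no matching path: the traversal yields nothing
    # where(not(out().hasLabel('D').has('idx', idx)))
    if d in edges.get(c, []):
        return False
    # addE('TEST_EDGE').to(addV('D').property('idx', idx))
    edges.setdefault(c, []).append(d)
    edges.setdefault(d, [])
    return True


graph = {("A", "1"): [("B", "1")], ("B", "1"): [("C", "1")], ("C", "1"): []}
print(add_d_if_missing(graph, "1"))  # True: D is created and linked to C
print(add_d_if_missing(graph, "1"))  # False: the where(not(...)) check stops it
```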

AWS DynamoDB Scan Filter Expression Returning Empty

I'm stumped trying to figure out why my scan won't return anything but [ ]. Here are my scan params:
var params = {
TableName: tableName,
FilterExpression: "#wager = :wager",
ExpressionAttributeNames: {
"#wager": "wager"
},
ExpressionAttributeValues: {
":wager": wager
}
};
My DynamoDB table works perfectly when I run a filter expression in the DynamoDB dashboard, like "wager [NUMBER] = 0.001".
jellycsc and Seth Geoghegan already mentioned the two most likely explanations in comments:
First, make sure you do not call a single Scan operation, but rather loop to get all the pages of the scan result. The specific way to do this depends on which programming language you are using. When your filter leaves only a small subset of the results (e.g., only items where wager is exactly 0.001), reading all the pages is critical, because the first page may be empty: DynamoDB may have read 1 MB of items (the most a single Scan request reads), found that none of them matched wager = 0.001, and returned an empty first page.
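A minimal pagination loop might look like the following Python sketch. scan_all is a hypothetical helper; it assumes a scan callable that accepts the usual Scan parameters and returns a dict with Items and, when more data remains, LastEvaluatedKey, which is the shape boto3's Table.scan returns. fake_scan is a made-up stand-in that reproduces the empty-first-page situation.

```python
from typing import Any, Callable, Dict, List


def scan_all(scan: Callable[..., Dict[str, Any]], **params: Any) -> List[Dict[str, Any]]:
    """Call scan repeatedly until no LastEvaluatedKey is returned."""
    items: List[Dict[str, Any]] = []
    while True:
        page = scan(**params)
        items.extend(page.get("Items", []))
        last_key = page.get("LastEvaluatedKey")
        if last_key is None:
            return items
        params["ExclusiveStartKey"] = last_key  # resume where the last page stopped


def fake_scan(**params: Any) -> Dict[str, Any]:
    # Hypothetical two-page table: nothing matched the filter on page 1.
    if "ExclusiveStartKey" not in params:
        return {"Items": [], "LastEvaluatedKey": {"id": "page-1-end"}}
    return {"Items": [{"id": "42", "wager": "0.001"}]}


print(scan_all(fake_scan, FilterExpression="#wager = :wager"))
```

With boto3 you would pass table.scan (where table comes from boto3.resource("dynamodb").Table(name)) as the scan argument, together with the same FilterExpression, ExpressionAttributeNames, and ExpressionAttributeValues.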
Second, the wager might have the wrong type. Obviously, if you store numbers but search for a string, nothing would match, so check you didn't do that. But a more subtle problem can be how you store numbers. DynamoDB holds floating-point numbers in an unusual manner, using decimal rather than binary digits. This means that DynamoDB can hold the number "0.001" precisely, without any rounding errors. The same cannot be said for most programming languages. On my machine, if I set a "double" variable to 0.001, the result is 0.0010000000000000000208. If I pass this to DynamoDB, the equality check will not match! This means you should make sure that the wager variable is not a double. In Python, for example, wager should be set to Decimal("0.001"); note how it is constructed from the string "0.001", not from the floating-point 0.001, which already has rounding errors.
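Python's decimal module makes that rounding visible. This sketch only illustrates the number representation; it does not call DynamoDB.

```python
from decimal import Decimal

from_float = Decimal(0.001)     # exact value of the binary double nearest 0.001
from_string = Decimal("0.001")  # exactly one thousandth

print(from_float)                 # a long expansion beginning 0.00100000000000000002...
print(from_string)                # 0.001
print(from_float == from_string)  # False
```

boto3 rejects Python float values for DynamoDB numbers and expects Decimal, which is another reason to build the value from a string.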
Thanks for the ideas, everyone. It did indeed turn out to be a type issue: all I had to do was convert wager to a number with
wager = Number(wager);
ahead of setting the scan params (the same params I have in the question worked).

Terraform Interpolation in outputs

I create an AWS RDS instance with different KMS CMKs depending on whether the environment is Production or Non-Production. So I have two resources that use a conditional count:
count = "${var.bluegreen == "nonprod" ? 1 : 0}"
This spins up an RDS instance with a different KMS key and a different address. I need to capture that endpoint (which I currently do with terraform show after the build finishes), so why doesn't this work in Terraform?
output "rds_endpoint" {
value = "${var.bluegreen == "nonprod" ? aws_db_instance.rds_nonprod.address : aws_db_instance.rds_prod.address}"
}
It is an error to access attributes of a resource that has count = 0, and unfortunately Terraform currently checks both "sides" of a conditional during its check step, so expressions like this can fail. Along with this, there is a current behavior that errors in outputs are not explicitly shown since outputs can get populated when the state isn't yet complete (e.g. as a result of using -target). These annoyances all sum up to a lot of confusion in this case.
Instead of using a conditional expression in this case, it works better to use "splat expressions", which evaluate to an empty list in the case where count = 0. This would look something like the following:
output "rds_endpoint" {
value = "${element(concat(aws_db_instance.rds_nonprod.*.address, aws_db_instance.rds_prod.*.address), 0)}"
}
This takes the first element of a list created by concatenating together all of the nonprod addresses and all of the prod addresses. Due to how you've configured count on these resource blocks, the resulting list will only ever have one element and so it will just take that element.
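The same reasoning can be modeled in Python to see why indexing the concatenated list is safe. This is a sketch under assumptions: the prod resource is assumed to use the inverse count (0 when bluegreen is "nonprod", 1 otherwise), splat_addresses is a hypothetical helper, and the addresses are made-up placeholders.

```python
from typing import List


def splat_addresses(count: int, address: str) -> List[str]:
    """Model of aws_db_instance.x.*.address: one entry per instance created by count."""
    return [address] * count


def rds_endpoint(bluegreen: str) -> str:
    # Placeholder addresses; only the count logic mirrors the Terraform config.
    nonprod = splat_addresses(1 if bluegreen == "nonprod" else 0, "nonprod.example.internal")
    prod = splat_addresses(0 if bluegreen == "nonprod" else 1, "prod.example.internal")
    # element(concat(nonprod, prod), 0): exactly one of the two lists is non-empty
    return (nonprod + prod)[0]


print(rds_endpoint("nonprod"))  # nonprod.example.internal
print(rds_endpoint("prod"))     # prod.example.internal
```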
In general, to debug issues with outputs it can be helpful to evaluate the expressions in terraform console, or somewhere else in a config, to bypass the limitation that errors are silently ignored on outputs.

OCaml dictionary update

I am new to OCaml and am trying to learn how to update dictionary and deal with if/else conditionals.
I wrote the following code to check whether the dictionary has some key. If not, add a default value for that key. Finally print it out.
module MyUsers = Map.Make(String)
let myGraph = MyUsers.empty;;
let myGraph = MyUsers.add "test" "preset" myGraph in
try
let mapped = MyUsers.find "test" myGraph
with
Not_found -> let myGraph = MyUsers.add "test" "default" myGraph in
Printf.printf "value for the key is now %s\n" (MyUsers.find "test" myGraph)
The error message I have now is syntax error for line 6: with
What is wrong here? Also, when should I use in, ;, or ;;?
I have done some Google searches and understand that in seems to define some scope before the next ;;, but it is still very vague to me. Could you please explain it more clearly?
Your immediate problem is that, except at the top level, let must be followed by in. The expression looks like let variable = expression1 in expression2. The idea is that the given variable is bound to the value of expression1 in the body of expression2. You have a let with no in.
It's hard to answer your more general question. It's easier to work with some specific code and a specific question.
However, a semicolon ; is used to separate two expressions that you want evaluated in sequence. The first should have type unit (meaning that it doesn't have a useful value).
In my opinion, the double semicolon ;; should be used only in the toplevel, to tell the interpreter that you're done typing and that it should evaluate what you've given it so far. Some people use ;; in actual OCaml code, but I do not.
Your code indicates strongly that you're thinking imperatively about OCaml maps. OCaml maps are immutable; that is, you can't change the value of a map. You can only produce a new map with different contents than the old one.
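To illustrate the "produce a new map" idea outside OCaml, here is a Python analogy. add_default is a hypothetical helper that returns a new dict instead of mutating its argument; its docstring shows the corresponding OCaml expression using MyUsers.mem and MyUsers.add from the Map.Make module in the question.

```python
from typing import Dict, Mapping


def add_default(users: Mapping[str, str], key: str, default: str) -> Dict[str, str]:
    """Return a new map with key bound to default if it is absent; never mutate users.

    OCaml equivalent:
      if MyUsers.mem key users then users else MyUsers.add key default users
    """
    if key in users:
        return dict(users)
    return {**users, key: default}


empty: Dict[str, str] = {}
with_preset = {**empty, "test": "preset"}
print(add_default(with_preset, "test", "default")["test"])  # preset
print(add_default(empty, "test", "default")["test"])        # default
print(empty)                                                # {} (unchanged)
```

Each call yields a new value that you bind to a name, which is exactly what let ... in does in OCaml.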

Is it possible to detect and handle string collisions among grouped values when grouping in Hadoop Pig?

Assuming I have lines of data like the following that show user names and their favorite fruits:
Alice\tApple
Bob\tApple
Charlie\tGuava
Alice\tOrange
I'd like to create a pig query that shows the favorite fruit of each user. If a user appears multiple times, then I'd like to show "Multiple". For example, the result with the data above should be:
Alice\tMultiple
Bob\tApple
Charlie\tGuava
In SQL, this could be done something like this (although it wouldn't necessarily perform very well):
select user, case when count(fruit) > 1 then 'Multiple' else max(fruit) end
from FruitPreferences
group by user
But I can't figure out the equivalent PigLatin. Any ideas?
Write an "Aggregate Function" Pig UDF (scroll down to "Aggregate Functions"). This is a user-defined function that takes a bag and outputs a scalar. So basically, your UDF would take in the bag, determine if there is more than one item in it, and transform it accordingly with an if statement.
I can think of a way of doing this without a UDF, but it is definitely awkward. After your GROUP, use SPLIT to split your data set into two: one in which the count is 1 and one in which the count is more than one:
SPLIT grouped INTO one IF COUNT(fruit) == 1, more IF COUNT(fruit) > 1;
Then, separately use FOREACH ... GENERATE on each to transform it:
one = FOREACH one GENERATE name, MAX(fruit); -- hack using MAX to get the item
more = FOREACH more GENERATE name, 'Multiple';
Finally, union them back:
out = UNION one, more;
I haven't really found a better way of handling the same data set in two different ways based on some conditional, like you want. I typically do some sort of split/recombine like I did here. I believe Pig will be smart and make a plan that doesn't use more than one M/R job.
Disclaimer: I can't actually test this code at the moment, so it may have some mistakes.
Update:
Looking harder, I was reminded of the bincond operator (?:), and I think that will work here.
b = FOREACH a GENERATE name, (COUNT(fruit) == 1 ? MAX(fruit) : 'Multiple');
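The per-group logic behind that conditional (and behind the aggregate UDF suggested earlier) can be sketched in plain Python to check the expected output. This is not Pig code; favorite and favorites are hypothetical helpers, and like the SQL version it counts rows, so a user listing the same fruit twice would also get "Multiple".

```python
from collections import defaultdict
from typing import Dict, Iterable, List


def favorite(fruits: List[str]) -> str:
    """Per-bag logic: a single item is returned as-is, more than one becomes 'Multiple'."""
    return fruits[0] if len(fruits) == 1 else "Multiple"


def favorites(lines: Iterable[str]) -> Dict[str, str]:
    grouped: Dict[str, List[str]] = defaultdict(list)  # like GROUP ... BY user
    for line in lines:
        user, fruit = line.rstrip("\n").split("\t")
        grouped[user].append(fruit)
    # like FOREACH ... GENERATE user, (COUNT == 1 ? fruit : 'Multiple')
    return {user: favorite(fruits) for user, fruits in grouped.items()}


data = ["Alice\tApple", "Bob\tApple", "Charlie\tGuava", "Alice\tOrange"]
print(favorites(data))  # {'Alice': 'Multiple', 'Bob': 'Apple', 'Charlie': 'Guava'}
```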