I get it that in a pure sense, JSON doesn't account for tuples, but I don't think it's unreasonable to treat tuples as lists in terms of the JSON encoding. Has anyone else faced and resolved this? I'd like to stay out of the business of pre-processing my error data to replace tuples with lists.
Perhaps I need to specify a different serialization approach?
EDIT: here is a practical example, using some toy code:
the_data = {:ok, %{...}}
Sentry.capture_message(message, the_data)
All it does is attempt to send a message to Sentry with tuples in the data.
If you're unfamiliar with Sentry, the sentry-elixir library provides two functions (among many others, of course) that are used to explicitly send either exceptions or messages to Sentry. The functions are:
Sentry.capture_exception/2
Sentry.capture_message/2
In addition, errors are sent to Sentry when they bubble up to the "top". These can't be intercepted, so I have to specify (and implement) a before_send_event "handler" in the Sentry configuration.
This is what my configuration looks like for the environment I'm working in:
config :sentry,
  dsn: "https://my_neato_sentry_key#sentry.io/33333343",
  environment_name: :staging,
  enable_source_code_context: true,
  root_source_code_path: File.cwd!(),
  tags: %{
    env: "staging"
  },
  before_send_event: {SomeApplication.Utils.SentryLogger, :before_send},
  included_environments: [:staging],
  filter: SomeApplication.SentryEventFilter
My before_send function basically attempts to sanity-check the data and replace all tuples with lists. I haven't implemented this entirely yet, though; instead of replacing all tuples I am temporarily using Kernel.inspect/2 to convert the event to a string. This isn't ideal, of course, because then I can't manipulate the data in the Sentry views:
def before_send(sentry_event) do
  IO.puts "------- BEFORE SEND TWO ---------------------------"
  sentry_event
  |> inspect(limit: :infinity)
end
This results in the following output:
{:invalid, {:ok, the_data}}
And the capture_message fails.
By default, the sentry library uses Jason to encode its JSON payloads and, again by default, Jason doesn't know how to encode tuples. You can change that by implementing the Jason.Encoder protocol for Tuple:
defimpl Jason.Encoder, for: Tuple do
  def encode(tuple, opts) do
    Jason.Encode.list(Tuple.to_list(tuple), opts)
  end
end
Be warned - this will have a global effect on how tuples are converted to JSON in your application.
We have a module that builds a security proxy that hosts an Elasticsearch site using Terraform. In its code there is this:
elastic_search_endpoint = "${element(concat(module.es_cluster.elasticsearch_endpoint, list("")),0)}"
which, as I understand it, goes and finds the es_cluster module and gets the Elasticsearch endpoint that module outputs. This then makes the endpoint available to the proxy so it can reach Elasticsearch.
But I don't actually understand what this piece of code is doing and why the 'element' and 'concat' functions are there. Why can't it just be like this?
elastic_search_endpoint = "${module.es_cluster.elasticsearch_endpoint}"
Let's break this up and see what each part does.
It's not shown in the example, but I'm going to assume that module.es_cluster.elasticsearch_endpoint is an output value that is a list of either zero or one ElasticSearch endpoints, presumably because that module allows disabling the generation of an ElasticSearch endpoint.
If so, that means that module.es_cluster.elasticsearch_endpoint would either be [] (empty list) or ["es.example.com"].
Let's consider the case where it's a one-element list first: concat(module.es_cluster.elasticsearch_endpoint, list("")) in that case will produce the list ["es.example.com", ""]. Then element(..., 0) will take the first element, giving "es.example.com" as the final result.
In the empty-list case, concat(module.es_cluster.elasticsearch_endpoint, list("")) produces the list [""]. Then element(..., 0) will take the first element, giving "" as the final result.
Given all of this, it seems like the intent of this expression is to either return the one ElasticSearch endpoint, if available, or to return an empty string as a placeholder if not.
I expect this is written this specific way because it was targeting an earlier version of the Terraform language which had fewer features. A different way to write this expression in current Terraform (v0.14 is current as of my writing this) would be:
elastic_search_endpoint = (
  length(module.es_cluster.elasticsearch_endpoint) > 0 ? module.es_cluster.elasticsearch_endpoint[0] : ""
)
It's awkward that this includes the full output reference twice though. That might be justification for using the concat approach even in modern Terraform, although arguably the intent wouldn't be so clear to a future reader:
elastic_search_endpoint = (
  concat(module.es_cluster.elasticsearch_endpoint, [""])[0]
)
Modern Terraform also includes the possibility of null values, so if I were writing a module like yours today I'd probably prefer to return a null rather than an empty string, in order to be clearer that it's representing the absence of a value:
elastic_search_endpoint = (
  length(module.es_cluster.elasticsearch_endpoint) > 0 ? module.es_cluster.elasticsearch_endpoint[0] : null
)
or, equivalently:
elastic_search_endpoint = (
  concat(module.es_cluster.elasticsearch_endpoint, [null])[0]
)
First things first: who wrote that code? Why is it not documented? Ask the person who wrote it!
Just from that code... there's not much to go on. I'd say that since concat expects lists, module.es_cluster.elasticsearch_endpoint is a list(string). Also, depending on some variables, it might be empty. Concatenating a list containing an empty string ensures that there's something at position 0.
So the whole ${element(concat(module.es_cluster.elasticsearch_endpoint, list("")),0)} could be translated to length(module.es_cluster.elasticsearch_endpoint) > 0 ? module.es_cluster.elasticsearch_endpoint[0] : "" (which IMHO is much more readable).
Why can't it just be like this?
elastic_search_endpoint = "${module.es_cluster.elasticsearch_endpoint}"
Probably because elastic_search_endpoint is a string and, as mentioned before, module.es_cluster.elasticsearch_endpoint is a list(string). You should provide a default value in case the list is empty.
I'm trying to figure out the possible ways (and which of them is best) to extract values for predefined keys from unstructured text.
Input:
The doctor prescribed me a drug called favipiravir.
His name is Yury.
Ilya has already told me about that.
The weather is cold today.
I am taking a medicine called nazivin.
Key list: ['drug', 'name', 'weather']
Output:
['drug=favipiravir', 'drug=nazivin', 'name=Yury', 'weather=cold']
So, as you can see, in the 3rd sentence there is no explicit key 'name' and therefore no value is extracted (I think this is the difference from NER). At the same time, 'drug' and 'medicine' are synonyms, so we should treat 'medicine' as the 'drug' key and extract its value as well.
And the next question: what if the key set is mutable?
Should I use a regexp-based approach as a baseline, since the keys are predefined, or is there a way to implement this with supervised learning / neural networks? (But in that case, how do I deal with mutable keys?)
You can use a parser to tag words. Your problem is similar to Named Entity Recognition (NER). A lot of libraries, like NLTK in Python, have POS taggers and pre-trained NER models available. You can try those. They are generally trained to identify names, locations, etc. Depending on the type of words you need to extract, you may need to train the tagger yourself, so you'll need some labeled data as well. Check out this link:
https://nlp.stanford.edu/software/CRF-NER.html
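If a rule-based baseline is enough (the regexp approach mentioned in the question), a minimal Python sketch might look like the following. The synonym map and the "called/is" patterns here are purely illustrative assumptions, not a general solution; a trained tagger as suggested above would generalize much better:
import re

# Hypothetical synonym map: each canonical key and the surface forms that map to it.
KEY_SYNONYMS = {
    "drug": ["drug", "medicine"],
    "name": ["name"],
    "weather": ["weather"],
}

def extract(sentences, key_synonyms):
    """Naive baseline: look for '<key> ... called/is <value>' and emit 'key=value'."""
    results = []
    for sentence in sentences:
        for canonical, surfaces in key_synonyms.items():
            for surface in surfaces:
                # e.g. "a drug called favipiravir", "His name is Yury", "The weather is cold"
                match = re.search(rf"\b{surface}\b.*?\b(?:called|is)\s+(\w+)",
                                  sentence, flags=re.IGNORECASE)
                if match:
                    results.append(f"{canonical}={match.group(1)}")
                    break  # stop at the first matching surface form for this key
    return results

sentences = [
    "The doctor prescribed me a drug called favipiravir.",
    "His name is Yury.",
    "Ilya has already told me about that.",
    "The weather is cold today.",
    "I am taking a medicine called nazivin.",
]
print(extract(sentences, KEY_SYNONYMS))
# ['drug=favipiravir', 'name=Yury', 'weather=cold', 'drug=nazivin']
# (same pairs as the expected output in the question, just ordered by sentence rather than by key)
A mutable key set is easy to handle here (just edit the dictionary), but the hand-written patterns are exactly the part a learned model would replace.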
I want to store arbitrary key-value pairs. For example,
{:foo "bar" ; string
:n 12 ; long
:p 1.2 ; float
}
In datomic, I'd like to store it as something like:
[{:kv/key "foo"
:kv/value "bar"}
{:kv/key "n"
:kv/value 12}
{:kv/key "p"
:kv/value 1.2}]
The problem is that :kv/value can only have one type in Datomic. A solution is to split :kv/value into :kv/value-string, :kv/value-long, :kv/value-float, etc. It comes with its own issues, like making sure only one value attribute is used at a time. Suggestions?
If you could give more details on your specific use-case it might be easier to figure out the best answer. At this point it is a bit of a mystery why you may want to have an attribute that can sometimes be a string, sometimes an int, etc.
From what you've said so far, your only real answer is to have different attributes like value-string etc. This is like a SQL DB, where you have only one type per table column and would need different columns to store a string, an integer, etc.
As your problem shows, any tool (such as a DB) is designed with certain assumptions. In this case the DB assumes that each "column" (attribute in Datomic) is always of the same type. The DB also assumes that you will (usually) want to have data in all columns/attrs for each record/entity.
In your problem you are contradicting both of these assumptions. While you can still use the DB to store information, you will have to write custom functions to ensure only one attribute (value-string, value-int, etc.) is in use at a time. You probably want custom insertion functions like "insert-str-val", "insert-int-val", etc., as well as custom read functions like "read-str-val". It might also be a good idea to have a validation function that could accept any record/entity and verify that exactly one "type" is in use at any given time.
You can emulate a key-value store with heterogeneous values by making :kv/key a :db.unique/identity attribute, and by making :kv/value either bytes-typed or string-typed and encoding the values in the format you like (e.g. fressian / nippy for :db.types/bytes, edn / json for :db.types/string). I advise that you set :db/index to false for :kv/value in this case.
Notes:
You will have limited query power, as the values will not be indexed and will need to be de-serialized for each query.
If you want to run transaction functions which read or write the values (e.g for data migrations), you should make your encoding / decoding library available to the Transactor as well.
If the values are large (say, over 20kb), don't store them in Datomic; use a complementary storage service like AWS S3 and store a URL.
I wrote this list comprehension to export pandas Data Frames to CSV files (each data frame is written to a different file):
[v.to_csv(str(k)+'.csv') for k,v in df_dict.items()]
The pandas Data Frames are the values of a dictionary whose keys will become part of the CSV file names. So in the code above, v are the Data Frames and k are the strings to which the Data Frames are mapped.
A colleague said that using list comprehensions is not a good idea for writing to output files. Why would that be? Moreover, he said that using a for loop for this would be more reliable. If true, why is that so?
A colleague said that using list comprehensions is not a good idea for writing to output files. Why would that be?
List comprehensions are usually more performant and readable than for loops when you are building a list (i.e., when the alternative would be appending to a list inside a for loop).
In other cases, like yours, a for loop is preferred when you want the "side effect" of an iteration.
Moreover, he said that using a for loop for this would be more reliable. If true, why is that so?
A for loop is more readable and relevant for this use case, IMHO, and should therefore be preferred:
for k, v in df_dict.items():
    v.to_csv(str(k) + '.csv')
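For illustration, here is a minimal sketch with hypothetical toy DataFrames showing the difference: DataFrame.to_csv returns None when given a path, so the comprehension in the question builds and immediately discards a list of None values, while the loop makes the side effect explicit:
import pandas as pd

# Hypothetical toy data standing in for the question's df_dict.
df_dict = {"a": pd.DataFrame({"x": [1, 2]}), "b": pd.DataFrame({"x": [3, 4]})}

# to_csv returns None when given a path, so this builds [None, None]
# and throws it away; the writes happen only as a side effect.
results = [v.to_csv(str(k) + '.csv') for k, v in df_dict.items()]
print(results)  # [None, None]

# The loop performs the same writes without pretending to build a value.
for k, v in df_dict.items():
    v.to_csv(str(k) + '.csv')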
d = random.randint(1, 30)
data = [d, strftime("%Y%m%d %H%M%S", gmtime())]  # random number, system time
client.publish("gas", str(data))
This is part of my Python (version 2) code.
I'm trying to send a list using MQTT.
However, if I use bytearray instead of str on the third line, it says "ValueError: string must be of size 1".
So I used str to turn the list into a string.
Can I send just a list that is NOT a string?
MQTT message payloads are just byte arrays; there is no inherent format to them. Strings tend to work as long as both ends of the transaction are using the same character encoding.
If you want to send structured data (such as the list) then you need to decide on a way to encode that structure so the code receiving the message will know how to reconstruct it.
The current usual solution to this problem is to encode the structure as JSON, but XML or something like Protocol Buffers are also good candidates.
The following question has some examples of converting Python lists to JSON objects:
Serializing list to JSON
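As a minimal sketch of that JSON round trip (the publish line is commented out because it relies on the already-connected client and "gas" topic from the question):
import json
import random
from time import gmtime, strftime

d = random.randint(1, 30)
data = [d, strftime("%Y%m%d %H%M%S", gmtime())]  # random number, system time

payload = json.dumps(data)        # a plain JSON string, e.g. '[17, "20240101 120000"]'
# client.publish("gas", payload)  # `client` is the connected MQTT client from the question

# Subscriber side: the received payload bytes decode back into a Python list.
restored = json.loads(payload)
assert restored == data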