I have a Ring/Compojure-based web API and I need to be able to optionally turn caching (or any flag, for that matter) on and off, depending on a startup flag or on a parameter passed in with the request.
I tried having the flag set as a dynamic var:
(def ^:dynamic *cache* true)
(defmacro cache [source record options & body]
  `(let [cachekey# (gen-cachekey ~source ~record ~options)]
     (if-let [cacheval# (if (and (:ttl ~source) ~*cache*) (mc/generic-get cachekey#) nil)]
       cacheval#
       (let [ret# (do ~@body)]
         (if (and (:ttl ~source) ~*cache*) (mc/generic-set cachekey# ret# :ttl (:ttl ~source)))
         ret#))))
...but that only allows me to update the flag within a binding block, which isn't ideal since I'd have to wrap every data-fetching function, and it didn't allow me to optionally set the flag at startup.
I then tried keeping the flag in an atom, which allowed me to set it at startup and to easily update it when a certain param was passed with the request, but the update would change the flag for all threads, not just for the specific request.
What's the most idiomatic way to do something like this in Clojure?
Firstly, unquoting *cache* in your macro definition means that its compile-time value will be baked into the compiled output, and rebinding it at runtime will have no effect. If you want the value to be looked up at runtime, you should not unquote *cache*.
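For instance, leaving *cache* unquoted, the macro from the question becomes (a sketch; gen-cachekey and the mc functions are the question's own):
(defmacro cache [source record options & body]
  `(let [cachekey# (gen-cachekey ~source ~record ~options)]
     ;; *cache* is not unquoted, so it is looked up when the expansion
     ;; runs, honouring any enclosing (binding [*cache* ...] ...)
     (if-let [cacheval# (when (and (:ttl ~source) *cache*)
                          (mc/generic-get cachekey#))]
       cacheval#
       (let [ret# (do ~@body)]
         (when (and (:ttl ~source) *cache*)
           (mc/generic-set cachekey# ret# :ttl (:ttl ~source)))
         ret#))))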
As for the actual question: if you want the various data-fetching functions to react to a cache setting, you'll need to communicate it to them somehow anyway. Additionally, there are two separate concerns here: (1) computing the relevant flag values, and (2) making them available to the handler so that it can communicate them to the functions which care.
Computing flag values and making them available to the main handler
For decisions on a per-request basis, examining some incoming parameters and settings, you might want to use a piece of middleware which will determine the correct values of the various flags and assoc them onto the request map. This way handlers living downstream from this piece of middleware will be able to look them up in the request map without knowing how they were computed.
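A minimal sketch of such a middleware (the :cache? key and the no-cache parameter are made-up names, and it assumes Ring's params middleware has already run):
(defn wrap-cache-flag
  "Assocs a :cache? flag onto the request map; downstream handlers
  just read (:cache? request) without knowing how it was computed."
  [handler default]
  (fn [request]
    (let [flag (if (contains? (:params request) "no-cache")
                 false
                 default)]
      (handler (assoc request :cache? flag)))))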
You can of course install multiple pieces of middleware, each responsible for computing a different set of flags.
If you do use middleware, you'll likely want it to handle the default values. In this case, the note about setting defaults at startup in the section on dynamic Vars below may not be relevant.
Finally, if the application-level (global, thread-independent) defaults might change at runtime (as a result of a "turn off all caching" request, perhaps), you can store these in Atoms.
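A sketch of that variant, reusing the made-up names from above:
(defonce cache-enabled? (atom true))   ; application-level default

(defn disable-all-caching! []          ; e.g. hit from an admin endpoint
  (reset! cache-enabled? false))

(defn wrap-cache-default
  "Like wrap-cache-flag above, but derefs the Atom on every request,
  so a runtime change to the global default takes effect immediately."
  [handler]
  (fn [request]
    (handler (assoc request :cache?
                    (and @cache-enabled?
                         (not (contains? (:params request) "no-cache")))))))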
Communicating flag values to functions which care
First approach: dynamic Vars
Once you do that, you'll have to communicate the flags to the functions which actually perform operations where the flags are relevant; here dynamic Vars and explicit arguments are the most natural options.
Using a dynamic Var means that you don't have to pass the flag explicitly to every call of such a function; instead, you can bind it once per request, say. Installing a default value at startup is quite possible too; for example, you could use alter-var-root for that. (Or you could simply define the initial value of the Var in terms of information obtained from the environment.)
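For example (the CACHE_ENABLED environment variable is a made-up name):
;; At startup: install an application-wide default from the environment.
(alter-var-root #'*cache*
                (constantly (not= "false" (System/getenv "CACHE_ENABLED"))))

;; Once per request: rebind for the dynamic extent of the handler call.
(defn wrap-cache-binding [handler]
  (fn [request]
    (binding [*cache* (:cache? request *cache*)]
      (handler request))))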
NB. if you launch new threads within the scope of a binding block, they will not see the bindings installed by this binding block automatically -- you'll have to arrange for them to be transmitted. The bound-fn macro is useful for creating functions which handle this automatically; see (doc bound-fn) for details.
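A quick illustration:
(binding [*cache* false]
  ;; A raw Thread would see the root value of *cache*, not false;
  ;; bound-fn captures this thread's bindings and re-installs them.
  (.start (Thread. (bound-fn [] (println *cache*)))))   ; prints false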
The idea of using a single map with all flags described below is relevant here too, if perhaps not equally necessary for reasonable convenience; in essence, you'd be using a single dynamic Var instead of many.
Second approach: explicit arguments and flag maps
The other natural option is simply to pass in any relevant flags to the functions which need them. If you pass all the flags in a map, you can just assemble all options relevant to a request in a single map and pass it in to all the flag-aware functions without caring which flags any given function needs (as each function will simply examine the map for the flags it cares about, disregarding the others).
With this approach, you'll likely want to split the data fetching functionality into a function to get the value from the cache, a function to get the value from the data store and a flag-aware function which calls one of the other two depending on the flag value. This way you can, for example, test them separately. (Although if the individual functions really are completely trivial, I'd say it's ok to create only the flag-taking version at first; just remember to factor out any pieces which become more complex in the course of development.)
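A sketch of that split, with made-up names around the question's gen-cachekey and mc functions:
(defn fetch-from-cache [cachekey]
  (mc/generic-get cachekey))

(defn fetch-from-store [source record]
  ;; the real data-store lookup would go here
  )

(defn fetch
  "Flag-aware wrapper: consults the cache only when the flags map says
  so, ignoring any flags it does not care about."
  [source record {:keys [cache?] :as flags}]
  (let [cachekey (gen-cachekey source record flags)]
    (or (when cache? (fetch-from-cache cachekey))
        (let [ret (fetch-from-store source record)]
          (when cache? (mc/generic-set cachekey ret :ttl (:ttl source)))
          ret))))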
Related
Is it possible to apply LLVM transformation pass to a specific basic block, instead of the whole IR?
I know how to apply a pass to the whole IR:
$ opt -S -instcombine test.ll -o out.ll
But there might be several basic blocks inside test.ll and I want to apply -instcombine to just one of them.
Generally, no. Some LLVM passes are written to work on whole modules, others on whole functions. Some are also safe to use on single basic blocks (more by chance than by design), but LLVM's pass interface deals only with the unit a pass was designed for (functions in the case of function passes, modules in the case of module passes). That is, function passes are given a function by the pass manager, and nothing else.
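One workaround, if you can arrange for the interesting basic block to live in its own function: pull that function out with llvm-extract and run the pass on the extracted module only (my_func is a placeholder name):
$ llvm-extract -func=my_func test.ll -S -o my_func.ll
$ opt -S -instcombine my_func.ll -o my_func_opt.ll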
My Problem
I'm currently writing a REST API which is supposed to take JSON requests and work with an internal library we use. The main usage will be either to run the server with a web interface or to work with the API from another language, since Clojure isn't common elsewhere.
In order to achieve this, the JSON request contains data and a function name, which is run via resolve, since I'm supposed to make it so that we don't have to change the API each time a function is added or removed.
Now the actual question is: how can I make sure the function I run, combined with its arguments, doesn't destroy the whole thing?
So, what did I try already?
Now, I've actually told only half the truth until now: I don't use resolve, I use ns-resolve. My first intuition was to create a separate file which loads in all the namespaces from the library; there's nothing malicious you could do with those. The problem is, I want only those functions, and I'm not aware of any way to remove clojure.core functions. I could use a blacklist for those, but whitelisting would be a whole lot easier, not to mention I could never find all the core functions I actually should be blacklisting.
The other thing is the input.
Again, I've got a basic idea, which is to sanitize the input by replacing all sorts of brackets, just to make sure the input isn't other Clojure code which would bypass the namespace restriction from above. But would this actually be enough? I don't have much experience in breaking things.
Another concern I've heard is that some functions could run the input passed as an argument long before intended. The server works with Ring and its JSON extension.
JSON should only give strings, numbers, booleans and nil as atomic data types. I conclude each potentially malicious input should be a string at my end. Besides resolve, is there any function which could have the side effect of running such input?
Since they are strings: is there even a concern to be had with the data at all?
I would strongly advise using a whitelisting approach for functions, and not evaluating anything else.
You could maybe add a metadata flag to the exposed functions and check it where you resolve them.
Everything else should just be data; don't evaluate it.
Probably you want to look into the following:
How to determine public functions from a given namespace. This will give you a list of the valid function names that your API can accept as part of the input. Here's a sample:
user=> (ns-publics (symbol "clojure.string"))
{ends-with? #'clojure.string/ends-with?, capitalize #'clojure.string/capitalize, reverse #'clojure.string/reverse, join #'clojure.string/join, replace-first #'clojure.string/replace-first, starts-with? #'clojure.string/starts-with?, escape #'clojure.string/escape, last-index-of #'clojure.string/last-index-of, re-quote-replacement #'clojure.string/re-quote-replacement, includes? #'clojure.string/includes?, replace #'clojure.string/replace, split-lines #'clojure.string/split-lines, lower-case #'clojure.string/lower-case, trim-newline #'clojure.string/trim-newline, upper-case #'clojure.string/upper-case, split #'clojure.string/split, trimr #'clojure.string/trimr, index-of #'clojure.string/index-of, trim #'clojure.string/trim, triml #'clojure.string/triml, blank? #'clojure.string/blank?}
You probably want to use the keys from the map above (in the namespace that applies to your use case) to validate the input, because you can "escape" the ns-resolve namespace if you fully qualify the function name:
user=> ((ns-resolve (symbol "clojure.string") (symbol "reverse")) "hello")
"olleh"
user=> ((ns-resolve (symbol "clojure.string") (symbol "clojure.core/reverse")) "hello")
(\o \l \l \e \h) ;; Called Clojure's own reverse, probably you don't want to allow this
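Putting the whitelist and the metadata-flag ideas together, a sketch (the my.company.api namespace and the ::exposed keyword are made up for illustration):
(def allowed-fns
  ;; whitelist: only public vars from the API namespace that carry an
  ;; explicit ::exposed metadata flag
  (into {}
        (filter (fn [[_ v]] (::exposed (meta v)))
                (ns-publics 'my.company.api))))

(defn call-api [fname & args]
  ;; look the symbol up in the whitelist map instead of resolving raw
  ;; input, so "clojure.core/..." style names cannot escape
  (if-let [v (get allowed-fns (symbol fname))]
    (apply v args)
    (throw (ex-info "Unknown or unexposed function" {:name fname}))))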
Now, with that being said, I'm going to offer you some free advice:
I'm supposed to make it so that we don't have to change the API each time a function is added/removed
If you have watched some of Rich Hickey's talks you'll know that API change is a sensitive topic. In general you should think carefully before adding new functions or deleting existing ones, because it sounds like your team is willing to cut corners on keeping clients of the API on the same page.
Unless your clients can discover dynamically which functions are available (maybe you'll expose some API for that?), it sounds like you will be open to receiving requests you cannot fulfill because the functions have changed or been removed.
I'd like to be able to write some Lua code like this:
y=x+1
and be able to get the names of all variables (x and y in this case) so that I can read from/write to them in the calling C++ program. The problem is that x is uninitialized, so this chunk will not execute and therefore neither variable will appear in the globals table. My current work-around is to have the user explicitly declare that they want to initialize x externally (as well as how to initialize it), then I pre-pend the Lua script with an appropriate declaration for x, so that the final script looks like this:
x= /*some value calculated outside of the Lua script*/
y=x+1
Although this works, I'd really like to have a way to automatically list all uninitialized variables in the Lua code and present them to the user, instead of the user having to remember to explicitly declare them. A function that parses the Lua code without executing it would probably be what I want. I've tried the function luaL_loadstring, but x and y don't show up in the globals table.
Since this is a bit vague, I'll give an actual use case. My C++ code basically performs optimizations on functions, such as finding a root or a maximum. I want the user to be able to define custom functions (in the form of Lua scripts), which in general will have one or more inputs and one or more outputs. The user will define which parameters the optimizer should operate on. For example, the user may want to find the minimum of y=x^2. The way I'd like it to work is that the user writes a Lua script consisting of nothing more than y=x^2, and then tells the optimizer to vary x in order to minimize y. On each iteration of the optimizer, the current guess for x would be automatically pasted into the user script, which is then executed, and then the value of y is pulled from the Lua state to be fed back to the optimizer. This is how I have it working now, however it's a bit clumsy from a UX perspective because the user has to manually declare that x is a Lua variable. This gets tedious when there are many variables that require manual declaration. It would be much better if I could automatically scan the script and show the user a list of their undeclared variables so they could then use drag-and-drop and other GUI sugar to do the manual declaration.
Lua isn't meant to work like that. Lua/C interop is intended to be collaborative; it's not designed so that C can do whatever it wants.
Using your example, if you have a Lua script that is supposed to take a value from C and return that value + 1, then you spell that in Lua like this:
local x = ... --Get the first parameter to the chunk.
return x + 1 --Adds 1 to the value and returns it.
You compile this string into a Lua chunk and call it like a Lua function. You pass it the value you want to manipulate and get the return value from the Lua stack.
The idea is not that C code can just reach into a Lua script and shove data into it arbitrarily. The above chunk takes parameters from the user and provides return values to the user. That's typically how C interfaces with Lua.
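On the C++ side, that looks roughly like this (a sketch for the Lua 5.x C API, error handling omitted):
#include <lua.hpp>
#include <cstdio>

int main() {
    lua_State *L = luaL_newstate();
    luaL_openlibs(L);

    // Compile the chunk; it becomes a function on the stack.
    luaL_loadstring(L, "local x = ... return x + 1");

    lua_pushnumber(L, 41);                 // the chunk's `...` argument
    lua_pcall(L, 1, 1, 0);                 // 1 argument, 1 result

    printf("%g\n", lua_tonumber(L, -1));   // prints 42
    lua_close(L);
    return 0;
}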
Yes, you can write values to globals and have the Lua script read them, and write its "results" to globals that the external code reads. But this is not the most effective way to interact with scripts.
I'd really like to have a way to automatically list all uninitialized variables
There's no such thing in Lua as an "uninitialized variable". Not in the way that you mean.
Yes, there are globals. But whether that global has a value or not is not something the Lua script can control. A global is global after all; you can set a global variable from outside of the script (for example, see lua_setglobal). If you do, then a script that reads from it will read the value you set. But it doesn't know anything about that.
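For completeness, the globals-based workflow from the question looks like this on the C++ side (sketch, error handling omitted):
lua_pushnumber(L, 5.0);
lua_setglobal(L, "x");            // set x from C++ before the run
luaL_dostring(L, "y = x + 1");    // execute the user's script
lua_getglobal(L, "y");            // read the result back
double y = lua_tonumber(L, -1);   // y == 6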
What you want is a static code analyzer/Lua linter. Take a look at Luacheck:
Luacheck is a static analyzer and a linter for Lua. Luacheck detects various issues such as usage of undefined global variables, unused variables and values, accessing uninitialized variables, unreachable code and more. Most aspects of checking are configurable: there are options for defining custom project-related globals, for selecting the set of standard globals (version of Lua standard library), for filtering warnings by type and name of related variables, etc. The options can be used on the command line, put into a config or directly into checked files as Lua comments.
There is also Lualint, and similar Lua linters exist for Atom, VSCode, or your favorite IDE.
AWS docs state that this property is "A list of Java properties that are set when the job flow step runs. You can use these properties to pass key-value pairs to your main function in the JAR file."
But there is no explanation (at least, I failed to find any) how exactly they are passed, and how to properly access said collection of key-value pairs on a main function side.
A quick check proved that they aren't passed via environment variables or command-line arguments. Could it be some other way?
Okay, it seems that this map goes into Java system properties and is accessible on the main-function side via a System.getProperties() call, but there are some non-obvious implications.
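For example, on the main-function side (the my.key property name is made up):
public class Main {
    public static void main(String[] args) {
        // EMR step properties end up as JVM system properties.
        String value = System.getProperty("my.key", "default");
        System.out.println("my.key = " + value);

        // Or dump everything that was passed:
        System.getProperties().forEach((k, v) ->
            System.out.println(k + " = " + v));
    }
}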
The first thing to keep in mind is that internally they are set via the environment variable HADOOP_CLIENT_OPTS as -Dkey=value switches. But EMR does not bother to properly escape keys or values according to shell rules.
Also, it does not report any syntax errors if there are properties with non-printable characters, just omits setting them altogether. And it plays even worse with special shell characters like * ? ( ) \ and such — it'll fail the task execution without a proper explanation, and the log records will vaguely point only to obscure syntax errors in some eval() call deeply inside of EMR internal shell script wrappers.
Please be aware of that behaviour.
Properties must be shell-escaped, and in some cases even doubly shell-escaped.
I am writing a plugin for Stata in C++, and it seems to me that accessing the data depends on the order of variables passed to the plugin, as SF_vdata() only takes integer arguments to index the variables.
The best solution I have at the moment is to first run ds, store the macro containing all variable names, and then call my plugin. My plugin can then search the macro for the variable it is interested in, and get the index based on its position in the list.
This works, but I would like my plugin not to depend on certain Stata commands being run first. I know this is silly, as the plugin requires the dataset to be formatted in a specific way, but something feels wrong about first having to call ds and store a macro before calling my plugin.
Is there any way to access the order of variable names from inside the plugin if ds is not called first?
I agree with Nick. Unfortunately your macro solution is the only answer, and it is what I use. You can only access the data directly using the SF_data functions, as a "matrix", and that's all you get by default; there are no headers like in a table. I use macros to save all the dataset information and pass the whole dataset, reading the variable I'm interested in, just like you. I even wrote translators to retain the format settings, but have not yet used the value labels.
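For reference, a sketch of that macro-based lookup (it assumes a local macro allvars was filled before the call, e.g. ds followed by local allvars `r(varlist)', and that the plugin was invoked with the full varlist in the same order; variable and macro names are hypothetical):
#include "stplugin.h"
#include <sstream>
#include <string>

// Return the 1-based index of `target` among the names stored in the
// local macro `allvars`, or 0 when the name is not found.
static int var_index(const std::string &target) {
    char buf[4096];
    if (SF_macro_use((char *)"_allvars", buf, sizeof(buf))) return 0;
    std::istringstream names(buf);
    std::string name;
    for (int i = 1; names >> name; ++i)
        if (name == target) return i;
    return 0;
}

STDLL stata_call(int argc, char *argv[]) {
    int idx = var_index("myvar");          // hypothetical variable name
    if (idx == 0) return 111;              // variable not found
    ST_double z;
    for (ST_int j = SF_in1(); j <= SF_in2(); j++) {
        if (SF_vdata(idx, j, &z)) return 198;
        // ... use z ...
    }
    return 0;
}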