JSONata bit-manipulation

Is it possible to somehow do bit manipulation on integers in JSONata?
In JSONata, & is the string concatenation operator, and I didn't find anything about bit manipulation (shifting, bitwise logical operations) in the docs.
I would like to achieve C-style manipulation, e.g. a = 0x10 & 0xEF, where a should be 0x00 after evaluation.

As far as I'm aware, there are no "out of the box" bit manipulation functions available in JSONata, but you can extend JSONata by providing your own. For example:
Excerpt from a GitHub discussion about random values/UUIDs:
(
/* left-pad the binary representation of $n to $len digits */
$binPad:=function($n, $len){$pad($formatBase($n,2),-$len,'0')};
/* combine two 8-digit binary strings bit by bit using $fn and return the resulting integer */
$bitwise:=function($lb, $rb, $fn){$split($lb,'')~>$map(function($c, $i){$fn($c='1',$substring($rb,$i,1)='1')?$power(2,(7-$i)):0})~>$sum()};
/* bitwise AND and OR of two 8-bit integers */
$and:=function($l,$r){$bitwise($binPad($l,8),$binPad($r,8),function($1,$2){$1 and $2})};
$or:=function($l,$r){$bitwise($binPad($l,8),$binPad($r,8),function($1,$2){$1 or $2})};
...
)
Full disclosure: I did not write these functions; I had seen them used in other scenarios and am merely suggesting this as an option to the OP.

Related

Why alternative keywords are not famous in place of in-built ascii operators? [duplicate]


How would you idiomatically extend arithmetic functions for other datatypes in Clojure?

So I want to use java.awt.Color for something, and I'd like to be able to write code like this:
(use 'java.awt.Color)
(= Color/BLUE (- Color/WHITE Color/RED Color/GREEN))
Looking at the core implementation of -, it talks specifically about clojure.lang.Numbers, which to me implies that there is nothing I can do to 'hook' into the core implementation and extend it.
Looking around on the Internet, there seems to be two different things people do:
Write their own - function with defn, which only knows about the data type they're interested in. To use it you'd probably end up prefixing a namespace, so something like:
(= Color/BLUE (scdf.color/- Color/WHITE Color/RED Color/GREEN))
Or alternatively use-ing the namespace and calling clojure.core/- when you want number math.
Code a special case into your - implementation that passes through to clojure.core/- when your implementation is passed a Number.
Unfortunately, I don't like either of these. The first is probably the cleanest, as the second presumes that the only things you care about doing maths on are the new datatype and numbers.
I'm new to Clojure, but shouldn't we be able to use Protocols or Multimethods here, so that when people create / use custom types they can 'extend' these functions to work seamlessly? Is there a reason that +, - etc. don't support this? (Or do they? They don't seem to from my reading of the code, but maybe I'm reading it wrong.)
If I want to write my own extensions to common existing functions such as + for other datatypes, how should I do it so it plays nicely with existing functions and potentially other datatypes?
It wasn't exactly designed for this, but core.matrix might be of interest to you here, for a few reasons:
The source code provides examples of how to use protocols to define operations that work with various different types. For example, (+ [1 2] [3 4]) => [4 6]. It's worth studying how this is done: basically the operators are regular functions that call a protocol, and each data type provides an implementation of the protocol via extend-protocol.
You might be interested in making java.awt.Color work as a core.matrix implementation (i.e. as a 4D RGBA vector). I did something similar with BufferedImage here: https://github.com/clojure-numerics/image-matrix. If you implement the basic core.matrix protocols, then you will get the whole core.matrix API to work with Color objects, which will save you a lot of work implementing different operations.
The probable reason for not basing the arithmetic operations in core on protocols (and making them work only on numbers) is performance. A protocol implementation requires an additional lookup to choose the correct implementation of the desired function. Although from a design point of view it may feel nice to have protocol-based implementations and extend them whenever required, when you have a tight loop that performs these operations many times (a very common use case with arithmetic operations) you will start feeling the performance cost of that additional per-operation lookup at runtime.
If you have a separate implementation for your own data types (e.g. color/-) in its own namespace, it will be more performant due to a direct call to that function, and it also makes things more explicit and customizable for specific cases.
Another issue with these functions is their variadic nature (i.e. they can take any number of arguments). This is a serious obstacle to providing a protocol implementation, as protocol dispatch works only on the type of the first parameter.
You can have a look at algo.generic.arithmetic in algo.generic. It uses multimethods.

How to store a long hex value message in C++

I'm taking a crypto class and one of the assignments asks us to XOR a bunch of hex ciphertext and try to find the encrypted message.
I know that you can put '0x' in front of an int or long literal to hold a hex value in a variable, but what if my message is this long:
271946f9bbb2aeadec111841a81abc300ecaa01bd8069d5cc91005e9fe4aad6e04d513e96d99de2569bc5e50eeeca709b50a8a987f4264edb6896fb537d0a716132ddc938fb0f836480e06ed0fcd6e9759f40462f9cf57f4564186a2c1778f1543efa270bda5e933421cbe88a4a52222190f471e9bd15f652b653b7071aec59a2705081ffe72651d08f822c9ed6d76e48b63ab15d0208573a7eef027
I would get an overflow. Is there a way to put the whole message into one variable? I could split the message into subparts, but I'd prefer it to be in one variable instead of many (if that is possible). I tried to use a string to hold the message, but how can I use the operator '^' for XOR?
Or is there a simpler technique that I do not know of?
Thanks
For something like this, you'd typically use a string or a vector<char> to hold the data. You can't use the entire string/vector as an operand to ^, but you can apply it one byte at a time.
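A minimal sketch of that byte-at-a-time approach, assuming the ciphertext is kept as a hex string; the helper names and the short hex fragments in main are made up for illustration:
#include <algorithm>
#include <cstddef>
#include <cstdint>
#include <iostream>
#include <string>
#include <vector>

// Decode a hex string ("2719...") into raw bytes.
std::vector<std::uint8_t> from_hex(const std::string& hex) {
    std::vector<std::uint8_t> bytes;
    for (std::size_t i = 0; i + 1 < hex.size(); i += 2)
        bytes.push_back(static_cast<std::uint8_t>(
            std::stoul(hex.substr(i, 2), nullptr, 16)));
    return bytes;
}

// XOR two byte buffers element-wise (up to the shorter length).
std::vector<std::uint8_t> xor_bytes(const std::vector<std::uint8_t>& a,
                                    const std::vector<std::uint8_t>& b) {
    std::vector<std::uint8_t> out;
    const std::size_t n = std::min(a.size(), b.size());
    for (std::size_t i = 0; i < n; ++i)
        out.push_back(static_cast<std::uint8_t>(a[i] ^ b[i]));  // '^' works fine on single bytes
    return out;
}

int main() {
    auto c1 = from_hex("271946f9bbb2");   // illustrative fragment, not a real pairing
    auto c2 = from_hex("0ecaa01bd806");
    for (std::uint8_t byte : xor_bytes(c1, c2))
        std::cout << std::hex << static_cast<int>(byte) << ' ';
    std::cout << '\n';
}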
If you want to simplify the rest of the code, you could create a class that overloaded operator^ to do a byte-wise XOR, so your code would look something like result = key ^ message;.
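A hedged sketch of that wrapper idea, with a made-up Bytes type whose operator^ XORs two equal-length buffers so call sites read like the built-in operator:
#include <cstddef>
#include <cstdint>
#include <stdexcept>
#include <vector>

struct Bytes {
    std::vector<std::uint8_t> data;
};

// Byte-wise XOR of two buffers of the same length.
Bytes operator^(const Bytes& lhs, const Bytes& rhs) {
    if (lhs.data.size() != rhs.data.size())
        throw std::invalid_argument("Bytes: length mismatch");
    Bytes out;
    out.data.reserve(lhs.data.size());
    for (std::size_t i = 0; i < lhs.data.size(); ++i)
        out.data.push_back(static_cast<std::uint8_t>(lhs.data[i] ^ rhs.data[i]));
    return out;
}

int main() {
    Bytes key{{0x27, 0x19, 0x46}};
    Bytes message{{0x0e, 0xca, 0xa0}};
    Bytes result = key ^ message;   // reads like: result = key ^ message;
    return result.data.empty() ? 1 : 0;
}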
You could use an array of, well, any size of integer, and apply your operators to it an element at a time (which will probably be a bit more efficient than an array of characters). @JerryCoffin's idea of wrapping it inside a class with an overloaded operator is a good one, regardless of the actual representation you use.
Put it in a separate text file, read the file into a buffer, and convert the ASCII chars to hex values.
Jerry & Scott have sound suggestions. Another option is to use an existing library: for example, the GNU GMP arbitrary-precision maths library at http://gmplib.org, which supports XOR (see http://gmplib.org/manual/Integer-Logic-and-Bit-Fiddling.html#Integer-Logic-and-Bit-Fiddling) and a "scanf" style function to read in hex (see http://gmplib.org/manual/Formatted-Input-Strings.html#Formatted-Input-Strings), and explicitly aims to provide excellent support for cryptography.

Inserters and Extractors reading/writing binary data vs text

I've been trying to read up on iostreams and understand them better. Occasionally I find it stressed that inserters (<<) and extractors (>>) are meant to be used for textual serialization. I've seen this in a few places, but this article is a good example:
http://spec.winprog.org/streams/
Outside of the <iostream> universe, there are cases where the << and >> are used in a stream-like way yet do not obey any textual convention. For instance, they write binary encoded data when used by Qt's QDataStream:
http://doc.qt.nokia.com/latest/qdatastream.html#details
At the language level, the << and >> operators belong to your project to overload (hence what QDataStream does is clearly acceptable). My question would be whether it is considered a bad practice for those using <iostream> to use the << and >> operators to implement binary encodings and decodings. Is there (for instance) any expectation that if written to a file on disk that the file should be viewable and editable with a text editor?
Should one always be using other method names and base them on read() and write()? Or should textual encodings be considered merely a default behavior that classes integrating with the standard library iostream can elect to ignore?
UPDATE: A key terminology issue here seems to be the distinction between I/O that is "formatted" vs "unformatted" (as opposed to the terms "textual" vs "binary"). I found this question:
writing binary data (std::string) to an std::ofstream?
It has a comment from @Tomalak Geret'kal saying "I'd not want to use << for binary data anyway, as my brain reads it as "formatted output" which is not what you're doing. Again, it's perfectly valid, but I just would not confuse my brain like that."
The accepted answer to the question says it's fine as long as you use ios::binary. That seems to bolster the "there's nothing wrong with it" side of the debate...but I still don't see any authoritative source on the issue.
Actually the operators << and >> are bit shift operators; using them for I/O is strictly speaking already a misuse. However that misuse is about as old as operator overloading itself, and I/O today is the most common usage of them, therefore they are widely regarded as I/O insertion/extraction operators. I'm pretty sure if there weren't the precedent of iostreams, nobody would use those operators for I/O (especially with C++11 which has variadic templates, solving the main problem which using those operators solved for iostreams, in a much cleaner way). On the other hand, from the language point of view, overloaded operator<< and operator>> can mean whatever you want them to mean.
So the question boils down to what would be an acceptable use of those operators. For this, I think one has to distinguish two cases: First, new overloads working on iostream classes, and second, new overloads working on other classes, possibly designed to work like iostreams.
Let's consider first new operators on iostream classes. Let me start with the observation that the iostream classes are all about formatting (and the reverse process, which could be called "deformatting"; "lexing" IMHO wouldn't be quite the right term here because the extractors don't determine the type, but only try to interpret the data according to the type given). The classes responsible for the actual I/O of raw data are the streambufs. However note that a proper binary file is not a file where you just dump internal raw data. Just like a text file (actually even more so), a binary file should have a well-specified encoding of the data it contains. Especially if the files are expected to be read on different systems. Therefore the concept of formatted output makes perfect sense also for binary files; just the formatting is different (e.g. writing a pre-determined number of bytes with the most significant one first for an integer value).
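To make that concrete, here is a small sketch (not from the answer; the helper and the file name are made up) of "formatted" output to a binary file: a 32-bit value written as exactly four bytes, most significant byte first, independent of the host's representation:
#include <cstdint>
#include <fstream>
#include <ostream>

// Hypothetical helper, not a standard facility: encode a 32-bit value as four
// bytes, most significant byte first, then write them unformatted.
void put_u32_be(std::ostream& os, std::uint32_t v) {
    const char bytes[4] = {
        static_cast<char>((v >> 24) & 0xFF),
        static_cast<char>((v >> 16) & 0xFF),
        static_cast<char>((v >>  8) & 0xFF),
        static_cast<char>( v        & 0xFF),
    };
    os.write(bytes, sizeof bytes);
}

int main() {
    std::ofstream out("sample.bin", std::ios::binary);   // file name is made up
    put_u32_be(out, 0xDEADBEEF);   // always the bytes DE AD BE EF on disk, in that order
}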
The iostreams themselves are classes which are intended to work on text files, that is, on files whose content is interpreted as textual representation of data. A lot of built-in behaviour is optimized for that, and may cause problems if used on binary files. An obvious example is that by default spaces are skipped before any input is attempted. For a binary file, this would be clearly the wrong behaviour. Also the use of locales doesn't make sense for binary files (although one might argue that there could be a "binary locale", but I don't think locales as defined for iostreams provide a suitable interface for that). Therefore I'd say writing binary operator<< or operator>> for iostream classes would be wrong.
The other case is where you define a separate class for binary input/output (possibly reusing the streambuf layer for doing the actual I/O). Since we are now speaking about different classes, the argumentation above doesn't apply any more. So the question now is: Should operator<< and operator>> on I/O be regarded as "text insertion/extraction operators" or more generally as "formatted data insertion/extraction operators"? The standard classes only use them for text, but then, there are no standard classes for binary I/O insertion/extraction at all, so the standard usage cannot distinguish between the two.
I personally would say that binary insertion/extraction is close enough to textual insertion/extraction that this usage is justified. Note that you also could make meaningful binary I/O manipulators, e.g. bigendian, littleendian and intwidth(n) to determine the format in which integers are to be output.
Beyond that there's also the use of those operators for things which are not really I/O (and where you wouldn't even think of using the streambuf layer), like reading from or inserting into a container. In my opinion, that already constitutes misuse of the operators, because there the data isn't translated into or out of a different format. It is just stored in a container.
The abstraction of the iostreams in the standard is that of a textually formatted stream of data; there is no support for any non-text format. That is the abstraction of iostreams. There's nothing wrong about defining a different stream class whose abstraction is a binary format, but doing so in an iostream will likely break existing code, and not work.
The overloaded operators >> and << perform formatted IO. The rest of the IO functions (put, get, read, write, etc.) perform unformatted IO. Unformatted IO means that the IO library only accepts a buffer, a sequence of unsigned characters, for its input. This buffer might contain a textual message or binary content; it's the application's responsibility to interpret the buffer. However, formatted IO takes the locale into consideration. In the case of text files, depending on the environment where the application runs, some special character conversion may occur in input/output operations to adapt them to a system-specific text file format. In many environments, such as most UNIX-based systems, it makes no difference whether you open a file as a text file or a binary file. Note that you can overload the operators >> and << for your own types. That means you are capable of applying formatted IO, without locale information, to your own types, though that's a bit tricky.
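As a rough illustration of that formatted/unformatted split (the file names are made up): operator<< writes a textual rendering of the value, while ostream::write() copies raw bytes unchanged:
#include <cstdint>
#include <fstream>

int main() {
    std::uint32_t value = 0x01020304;

    std::ofstream text("value.txt");
    text << value;                      // formatted: writes the characters "16909060"

    std::ofstream bin("value.bin", std::ios::binary);
    bin.write(reinterpret_cast<const char*>(&value),   // unformatted: 4 raw bytes,
              sizeof value);                           // in the host's byte order
}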

Is anybody using the named boolean operators?

Or are we all sticking to our taught "&&, ||, !" way?
Any thoughts in why we should use one or the other?
I'm just wondering because several answers state that code should be as natural as possible, but I haven't seen a lot of code with "and, or, not", even though that would be more natural.
I like the idea of the not operator because it is more visible than the ! operator. For example:
if (!foo.bar()) { ... }
if (not foo.bar()) { ... }
I suggest that the second one is more visible and readable. I don't think the same argument necessarily applies to the and and or forms, though.
"What's in a name? That which we call &&, || or !
By any other name would smell as sweet."
In other words, natural depends on what you are used to.
Those were not supported in the old days, and even now you need to give a special switch to some compilers to enable these keywords. That's probably because old code bases may have had some functions || variables named "and", "or" or "not".
One problem with using them (for me anyway) is that in MSVC you have to include iso646.h or use the (mostly unusable) /Za switch.
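For what it's worth, a minimal sketch of that workaround (assuming MSVC without a conforming-mode switch; the in_range function is just an example): the <iso646.h> / <ciso646> header defines the alternative tokens as macros, so code like this compiles portably:
#include <iso646.h>

bool in_range(int x, int lo, int hi) {
    return x >= lo and x <= hi;   // equivalent to: x >= lo && x <= hi
}

int main() { return in_range(5, 1, 10) ? 0 : 1; }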
The main problem I have with them is the Catch-22 that they're not commonly used, so they require my brain to actively process the meaning, where the old-fashioned operators are more or less ingrained (kind of like the difference between reading a learned language vs. your native language).
Though I'm sure I'd overcome that issue if their use became more universal. If that happened, then I'd have the problem that some boolean operators have keywords while others don't, so if alternate keywords were used, you might see expressions like:
if ((x not_eq y) and (y == z) or (z <= something)) {...}
when it seems to me they should have alternate tokens for all the (at least comparison) operators:
if ((x not_eq y) and (y eq z) or (z lt_eq something)) {...}
This is because the reason the alternate keywords (and digraphs and trigraphs) were provided was not to make the expressions more readable - it was because historically there have been (and maybe still are) keyboards and/or codepages in some localities that do not have certain punctuation characters. For example, the invariant part of the ISO 646 codepage (surprise) is missing the '|', '^' and '~' characters among others.
Although I've been programming C++ for quite some time, I did not know that the keywords "and", "or" and "not" were allowed, and I've never seen them used.
I searched through my C++ book, and I found a small section mentioning alternative representations for the normal operators "&&", "||" and "!", where it explains those are available for people with non-standard keyboards that do not have the "&!|" symbols.
A bit like trigraphs in C.
Basically, I would be confused by their use, and I think I would not be the only one.
Using a non-standard representation should really require a good reason.
And if used, it should be used consistently in the code, and described in the coding standard.
The digraph and trigraph operators were actually designed more for systems that didn't carry the standard ASCII character set - such as IBM mainframes (which use EBCDIC). In the olden days of mechanical printers, there was this thing called a "48-character print chain" which, as its name implied, only carried 48 characters. A-Z (uppercase), 0-9 and a handful of symbols. Since one of the missing symbols was an underscore (which rendered as a space), this could make working with languages like C and PL/1 a real fun activity (is this 2 words or one word with an underscore???).
Conventional C/C++ is coded with the symbols and not the digraphs. Although I have been known to #define "NOT", since it makes the meaning of a boolean expression more obvious, and it's visually harder to miss than a skinny little "!".
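A tiny, purely illustrative sketch of that #define habit (the macro use and the function are made up, not from the answer):
#include <vector>

#define NOT !

bool needs_refill(const std::vector<int>& queue) {
    return NOT queue.empty();   // harder to overlook than a lone '!'
}

int main() { std::vector<int> q; return needs_refill(q) ? 0 : 1; }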
I wish I could use || and && in normal speech. People try very hard to misunderstand when I say "and" or "or"...
I personally like operators to look like operators. It's all maths, and unless you start using "add" and "subtract" operators too it starts to look a little inconsistent.
I think some languages suit the word-style and some suit the symbols if only because it's what people are used to and it works. If it ain't broke, don't fix it.
There is also the question of precedence, which seems to be one of the reasons for introducing the new operators, but who can be bothered to learn more rules than they need to?
In cases where I program with names directly mapped to the real world, I tend to use 'and' and 'or', for example:
if(isMale or isBoy and age < 40){}
It's nice to use 'em in Eclipse+gcc, as they are highlighted. But then, the code doesn't compile with some compilers :-(
Using these operators is harmful. Notice that and and or are logical operators whereas the similar-looking xor is a bitwise operator. Thus, arguments to and and or are normalized to 0 and 1, whereas those to xor aren't.
Imagine something like
char *p, *q; // Set somehow
if(p and q) { ... } // Both non-NULL
if(p or q) { ... } // At least one non-NULL
if(p xor q) { ... } // Exactly one non-NULL
Bzzzt, you have a bug. In the last case you're testing whether at least one of the bits in the pointers is different, which probably isn't what you thought you were doing because then you would have written p != q.
This example is not hypothetical. I was working together with a student one time and he was fond of these literate operators. His code failed every now and then for reasons that he couldn't explain. When he asked me, I could zero in on the problem because I knew that C++ doesn't have a logical xor operator, and that line struck me as very odd.
BTW the way to write a logical xor in C++ is
!a != !b
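A hedged illustration of the pitfall, using integers rather than pointers (the values are made up): bitwise xor of two "truthy" values is not a logical "exactly one of them", whereas !a != !b is:
#include <iostream>

int main() {
    unsigned a = 2, b = 4;       // both logically true

    if (a xor b)                 // 2 ^ 4 == 6, which is truthy -> branch taken anyway
        std::cout << "bitwise xor: taken (misleading)\n";

    if (!a != !b)                // false != false -> false: correct logical xor
        std::cout << "logical xor: exactly one is true\n";
    else
        std::cout << "logical xor: not exactly one is true\n";
}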
I like the idea, but don't use them. I'm so used to the old way that it provides no advantage to me doing it either way. Same holds true for the rest of our group, however, I do have concerns that we might want to switch to help avoid future programmers from stumbling over the old symbols.
So to summarize: they're not used a lot because of the following combination:
old code where it was not used
habit (more standard)
taste (more math-like)
Thanks for your thoughts