XTEA fragment: which is the better layout? (C++)

I am doing a University assignment in which part of the assignment is to implement the XTEA block cipher in C++. We are allowed to use Internet sources, so I went to the XTEA Wikipedia page and found the code there.
As I was copying it,^ I was thinking about code style/layout. I initially had:
data_first32 = ( (( data_last32 << 4 ) ^ ( data_last32 >> 5 )) + data_last32 ) ^
               ( sum + key[ sum & 3 ] );
(Note: those are 2 lines, not 1 line wrapped.)
However, I then thought it might be easier to read as:
data_first32 =
    (
        (
            ( data_last32 << 4 )
            ^
            ( data_last32 >> 5 )
        )
        +
        data_last32
    )
    ^
    ( sum + key[ sum & 3 ] );
The latter one seems to me to be (slightly) easier to read, yet it also just seems 'wrong'.
Which is the better style for readability?
Why does the latter one seem 'wrong'?
^ Not using copy & paste, as I wanted to go through it manually and I wanted better variable names.


Conditional itab grouping akin to the HAVING clause in SQL

This is a follow-up question to this one, but instead of aggregation I want to process groups based on some condition, and I cannot figure out the proper syntax for it. I need to exclude whole groups that contain documents with status "deleted", i.e. a group is dropped if at least one of its members has this status.
So far I have tried GROUP ... WITHOUT MEMBERS, LOOP ... FOR GROUPS and REDUCE, and this is the solution I ended up with:
DATA(lt_valid_doc) = VALUE tt_struct(
  FOR ls_valid IN VALUE tt_struct(
    FOR GROUPS <group_key> OF <wa> IN lt_ilot
    GROUP BY ( guid = <wa>-guid guid2 = <wa>-guid2 ) ASCENDING
    LET not_deleted = REDUCE #( INIT valid TYPE t_ref_s_struct
                                FOR <m> IN GROUP <group_key>
                                NEXT valid = COND #(
                                  WHEN valid IS NOT BOUND OR <m>-stat = 'I1040'
                                  THEN REF #( <m> ) ELSE valid ) )
    IN ( not_deleted->* ) )
  WHERE ( status NE 'I1040' )
  ( ls_valid ) ).
 
However, this solution seems redundant to me (the I1040 filter appears twice). Is there any syntax that allows doing this in one statement (REDUCE, GROUP or whatever) without constructing a nested table on-the-fly and filtering it like I do now?
If I use a WHERE condition on any of the above statements (GROUP ... WITHOUT MEMBERS, LOOP ... FOR GROUPS and REDUCE), it only filters the base lines used for grouping, not the groups themselves. I need something similar to HAVING in SQL.
UPDATE: OK, here is a real-life compilable example based on the BSEG table. The task is to find only unreversed docs, i.e. to exclude all docs that have a reversed (XNEGP = true) line.
TYPES: BEGIN OF t_s_bseg,
         bukrs TYPE bseg-bukrs,
         belnr TYPE bseg-belnr,
         gjahr TYPE bseg-gjahr,
         buzei TYPE bseg-buzei,
         xnegp TYPE bseg-xnegp,
       END OF t_s_bseg,
       tt_bseg TYPE SORTED TABLE OF t_s_bseg WITH EMPTY KEY.
TYPES: t_ref_s_bseg TYPE REF TO t_s_bseg.
DATA(lt_valid_fi_doc) = VALUE tt_bseg(
  FOR ls_valid IN VALUE tt_bseg(
    FOR GROUPS <group_key> OF <wa> IN lt_bseg
    GROUP BY ( bukrs = <wa>-bukrs belnr = <wa>-belnr gjahr = <wa>-gjahr ) ASCENDING
    LET not_reversed = REDUCE #( INIT valid TYPE t_ref_s_bseg
                                 FOR <m> IN GROUP <group_key>
                                 NEXT valid = COND #(
                                   WHEN valid IS NOT BOUND OR <m>-xnegp = abap_true
                                   THEN REF #( <m> ) ELSE valid ) )
    IN ( not_reversed->* ) )
  WHERE ( xnegp NE abap_true )
  ( ls_valid ) ).
Input lines
bukrs  belnr       gjahr  buzei  xnegp
1000   0100000001  2019   1
1000   0100000001  2019   2
1000   0100000003  2019   1
1000   0100000003  2019   2
1000   0100000004  2019   1
1000   0100000004  2019   2      X
Doc 0100000004 has a reversed line, so the result should be:
bukrs  belnr       gjahr  buzei  xnegp
1000   0100000001  2019
1000   0100000003  2019
Here's a solution which doesn't repeat the selection, but one question remains: is it really "better"?
The solution is based on generating an empty line when the group contains at least one line with status 'I1040', instead of keeping the unwanted line. I'm not sure, but perhaps a similar solution could instead keep the reference to the line (not_deleted), plus an auxiliary variable saying whether the reference is to be kept or not. I found it more intuitive to use table indexes (INDEX INTO), but that might not work if tt_struct is a hashed table type.
I provide the code with an ABAP Unit test so that you can quickly try it yourself.
CLASS ltc_main DEFINITION FOR TESTING
      DURATION SHORT RISK LEVEL HARMLESS.
  PRIVATE SECTION.
    METHODS test FOR TESTING.
    METHODS cut.
    TYPES: BEGIN OF ty_struct,
             guid TYPE string,
             stat TYPE string,
           END OF ty_struct,
           tt_struct TYPE STANDARD TABLE OF ty_struct WITH EMPTY KEY,
           t_ref_s_struct TYPE REF TO ty_struct.
    DATA: lt_ilot      TYPE tt_struct,
          lt_valid_doc TYPE tt_struct.
ENDCLASS.

CLASS ltc_main IMPLEMENTATION.
  METHOD cut.
    lt_valid_doc = VALUE #(
      FOR ls_valid IN VALUE tt_struct(
        FOR GROUPS <group_key> OF <wa> IN lt_ilot
        GROUP BY ( guid = <wa>-guid ) ASCENDING
        LET x1 = REDUCE #(
                   INIT x2 = 0
                   FOR <m> IN GROUP <group_key> INDEX INTO x3
                   NEXT x2 = COND #(
                     WHEN <m>-stat = 'I1040' THEN -1
                     ELSE COND #( WHEN x2 <> 0 THEN x2 ELSE x3 ) ) )
        IN ( COND #( WHEN x1 <> -1 THEN lt_ilot[ x1 ] ) ) )
      WHERE ( table_line IS NOT INITIAL )
      ( ls_valid ) ).
  ENDMETHOD.

  METHOD test.
    lt_ilot = VALUE #(
      ( guid = 'A' stat = 'I1000' )
      ( guid = 'A' stat = 'I1040' )
      ( guid = 'B' stat = 'I1020' )
      ( guid = 'C' stat = 'I1040' )
      ( guid = 'D' stat = 'I1040' )
      ( guid = 'D' stat = 'I1000' ) ).
    cut( ).
    cl_abap_unit_assert=>assert_equals(
      act = lt_valid_doc
      exp = VALUE tt_struct( ( guid = 'B' stat = 'I1020' ) ) ).
  ENDMETHOD.
ENDCLASS.
I really hope that was some kind of personal test case, and that you do not use this code in a productive environment. If I had to understand what you want to achieve by looking only at the code, I would hate you ;).
The key to solving this problem easily is sorting the table so that the delete condition always ends up in the first row of each group you want to process:
Solution 1
with output of a unique list:
DATA: lt_bseg TYPE STANDARD TABLE OF t_s_bseg.
SORT lt_bseg BY belnr xnegp DESCENDING.
DELETE ADJACENT DUPLICATES FROM lt_bseg COMPARING belnr.
DELETE lt_bseg WHERE xnegp = abap_true.
Solution 2
with output of a non-unique list:
DATA: lt_bseg       TYPE STANDARD TABLE OF t_s_bseg,
      lf_prev_belnr TYPE belnr,
      lf_delete     TYPE char1.

SORT lt_bseg BY belnr xnegp DESCENDING.
LOOP AT lt_bseg ASSIGNING FIELD-SYMBOL(<ls_bseg>).
  IF <ls_bseg>-belnr <> lf_prev_belnr.
    lf_delete = <ls_bseg>-xnegp.
    lf_prev_belnr = <ls_bseg>-belnr.
  ENDIF.
  IF lf_delete = abap_true.
    DELETE lt_bseg.
  ENDIF.
ENDLOOP.
The following solution might not be the prettiest one, but it's simple. It's easier to remove a whole group when one member meets a condition than to add a whole group when all of its members fail the condition. Just an idea.
TYPES: BEGIN OF ty_struct,
         guid TYPE string,
         stat TYPE string,
       END OF ty_struct,
       tt_struct TYPE STANDARD TABLE OF ty_struct WITH EMPTY KEY.

DATA(lt_ilot) = VALUE tt_struct(
  ( guid = 'A' stat = 'I1000' )
  ( guid = 'A' stat = 'I1040' )
  ( guid = 'B' stat = 'I1020' )
  ( guid = 'C' stat = 'I1040' )
  ( guid = 'D' stat = 'I1040' )
  ( guid = 'D' stat = 'I1000' ) ).

LOOP AT lt_ilot INTO DATA(ls_ilot) WHERE stat = 'I1040'.
  DELETE lt_ilot WHERE guid = ls_ilot-guid.
ENDLOOP.

Regex to pull out numbers and operators

I am trying to write a regex that parses out seven capture groups: four numbers and the three operators between them.
Individual lines in the file look like this:
[ 9] -21 - ( 12) - ( -5) + ( -26) = ______
The number in brackets is the line number, which will be ignored. I want the four integer values (including the '-' when an integer is negative), which in this case are -21, 12, -5 and -26. I also want the operators, which are -, - and +.
I will then take those captured values and actually compute the answer:
-21 - 12 - -5 + -26 = -54
I have this:
[\s+0-9](-?[0-9]+)
In Pythex it grabs the [ 9], but then it also grabs every integer as a separate match (four additional matches). I don't know why it does that.
If I add a ? to the end, [\s+0-9](-?[0-9]+)?, thinking it will only grab the first integer, it doesn't: I get seventeen matches.
What I am trying to say, via the regex, is: grab the line number and its brackets (that part works), then grab the first integer including its sign, then the operator, then the next integer including its sign, then the next operator, and so on.
It appears that I have failed to explain myself clearly.
The file has hundreds of lines. Here is a five line sample:
[ 1] 19 - ( 1) - ( 4) + ( 28) = ______
[ 2] -18 + ( 8) - ( 16) - ( 2) = ______
[ 3] -8 + ( 17) - ( 15) + ( -29) = ______
[ 4] -31 - ( -12) - ( -5) + ( -26) = ______
[ 5] -15 - ( 12) - ( 14) - ( 31) = ______
The operators are only '-' or '+', but any combination of them may appear in a line. The integers will all be from -99 to 99, but that shouldn't matter if the regex works. The goal (as I see it) is to extract seven groups: four integers and three operators, then add the numbers exactly as they appear. The number in brackets is just the line number and plays no role in the computation.
Good luck with regex; if you just need the result:
import re

s = "[ 9] -21 - ( 12) - ( -5) + ( -26) = ______"
s = s[s.find("]") + 1:s.find("=")]           # cut away line nr and = ...
if not re.sub("[+-0123456789() ]*", "", s):  # weak attempt to prevent python code injection
    print(eval(s))
else:
    print("wonky chars inside, only numbers, +, - , space and () allowed.")
Output:
-54
Make sure to read up on the risks of eval(), and have a look at:
https://opensourcehacker.com/2014/10/29/safe-evaluation-of-math-expressions-in-pure-python/
https://softwareengineering.stackexchange.com/questions/311507/why-are-eval-like-features-considered-evil-in-contrast-to-other-possibly-harmfu/311510
https://www.kevinlondon.com/2015/07/26/dangerous-python-functions.html
Example for hundreds of lines:
import re
import random

def calcIt(line):
    s = line[line.find("]") + 1:line.find("=")]
    if not re.sub("[+-0123456789() ]*", "", s):
        return eval(s)
    else:
        print(line + " has wonky chars inside, only numbers, +, - , space and () allowed.")
        return None

# generate test lines in the same format and evaluate them
random.seed(42)
pattern = "[ {}] -{} - ( {}) - ( -{}) + ( -{}) = "
for n in range(1000):
    nums = [n]
    nums.extend([random.randint(0, 100), random.randint(-100, 100),
                 random.randint(-100, 100), random.randint(-100, 100)])
    c = pattern.format(*nums)
    print(c, calcIt(c))
Ahh... I had a cup of coffee and sat down in front of Pythex again.
I figured out the correct regex:
[\s+0-9]\s+(-?[0-9]+)\s+([-|+])\s+\(\s+(-?[0-9]+)\)\s+([-|+])\s+\(\s+(-?[0-9]+)\)\s+([-|+])\s+\(\s+(-?[0-9]+)\)
Yields:
-21
-
12
-
-5
+
-26
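For what it's worth, here is a small sketch that applies a lightly tidied variant of that regex (a [-+] class instead of [-|+], since '|' inside a character class is a literal pipe, and an explicit \[\s*[0-9]+\] for the bracketed line number) and folds the three operators over the four integers:
import re

pattern = re.compile(
    r"\[\s*[0-9]+\]\s+(-?[0-9]+)"        # bracketed line number, first integer
    r"\s+([-+])\s+\(\s*(-?[0-9]+)\)"     # operator, second integer
    r"\s+([-+])\s+\(\s*(-?[0-9]+)\)"     # operator, third integer
    r"\s+([-+])\s+\(\s*(-?[0-9]+)\)")    # operator, fourth integer

m = pattern.match("[ 9] -21 - ( 12) - ( -5) + ( -26) = ______")
a, op1, b, op2, c, op3, d = m.groups()

# evaluate left to right, exactly as the numbers appear
total = int(a)
for op, operand in ((op1, b), (op2, c), (op3, d)):
    total = total + int(operand) if op == '+' else total - int(operand)
print(total)   # -54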

Does Python's Pool() for parallelization prevent writing to global variables?

In Python 2.7 I am trying to distribute the computation of a two-dimensional array across all of the cores.
For that I have two arrays bound to variables at global scope, one to read from and one to write to.
import itertools as it
import multiprocessing as mp
import numpy as np

temp_env = 20
c = 0.25
a = 0.02

arr = np.ones((100, 100))
x = arr.shape[0]
y = arr.shape[1]
new_arr = np.zeros((x, y))

def calc_inside(idx):
    new_arr[idx[0], idx[1]] = (arr[idx[0], idx[1]]
                               + c * (arr[idx[0] + 1, idx[1]]
                                      + arr[idx[0] - 1, idx[1]]
                                      + arr[idx[0], idx[1] + 1]
                                      + arr[idx[0], idx[1] - 1]
                                      - arr[idx[0], idx[1]] * 4)
                               - 2 * a * (arr[idx[0], idx[1]] - temp_env))

inputs = it.product(range(1, x - 1),
                    range(1, y - 1))

p = mp.Pool()
p.map(calc_inside, inputs)
# for i in inputs:
#     calc_inside(i)

# plot arrays as surface plot to check values
Assume there is some additional initialization of the array arr with values other than the exemplary 1s, so that the computation (an iterative calculation of temperature) actually makes sense.
When I use the commented-out for-loop instead of the Pool.map() method, everything works fine and the array actually contains values. When using Pool(), the variable new_arr just stays in its initialized state (meaning it contains only the zeros it was originally initialized with).
Q1 : Does that mean that Pool() prevents writing to global variables?
Q2 : Is there any other way to tackle this problem with parallelization?
A1 :
Your code does not actually declare any variable using the global <variable> syntax. More to the point, each worker in a multiprocessing.Pool is a separate process with its own copy of the parent's globals, so a worker's writes to new_arr never propagate back to the parent. Do not try to share writable globals this way, least of all when going into distributed processing.
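Leaving aside the cost question addressed below in A2: the standard workaround is to have each worker return its result and let the parent process write it into new_arr. A minimal sketch (my illustration, not the original poster's code):
import itertools as it
import multiprocessing as mp
import numpy as np

temp_env = 20
c = 0.25
a = 0.02

arr = np.ones((100, 100))
new_arr = np.zeros_like(arr)

def calc_inside(idx):
    i, j = idx
    # compute and RETURN the value; do not write to a global
    val = (arr[i, j]
           + c * (arr[i + 1, j] + arr[i - 1, j]
                  + arr[i, j + 1] + arr[i, j - 1]
                  - arr[i, j] * 4)
           - 2 * a * (arr[i, j] - temp_env))
    return idx, val

if __name__ == '__main__':
    inputs = list(it.product(range(1, arr.shape[0] - 1),
                             range(1, arr.shape[1] - 1)))
    pool = mp.Pool()
    # the parent is the only process that writes into new_arr
    for (i, j), val in pool.map(calc_inside, inputs):
        new_arr[i, j] = val
    pool.close()
    pool.join()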
A2 :
Yes, going parallel is possible, but you had better get a thorough sense of the costs of doing so before spending (well, wasting) effort that the costs will never justify.
Why start with costs?
Would you pay your bank clerk $2.00 to receive just a $1.00 banknote in exchange?
Guess no one would ever do that.
The same goes for trying to go parallel.
The syntax is "free" and "promising";
the cost of actually executing a simple and nicely-looking syntax-constructor is not. Expect rather shocking surprises instead of receiving any dinner for free.
What are the actual costs? Benchmark. Benchmark. Benchmark!
The useful work
Your code actually does just a few memory accesses and a few floating-point operations "inside" the block and then exits. These FLOPs take from a few tens of [ns] up to, at most, a few [us] at recent CPU frequencies of ~2.6 to 3.4 [GHz]. Do benchmark it:
from zmq import Stopwatch

aClk = Stopwatch()

temp_env = 20
c = 0.25
a = 0.02

aClk.start()
_ = ( 1.0
      + c * ( 1.0
              + 1.0
              + 1.0
              + 1.0
              - 1.0 * 4
              )
      - 2 * a
          * ( 1.0
              - temp_env
              )
      )
T = aClk.stop()
So a pure [SERIAL]-process execution, on a quad-core CPU, will take no worse than about T+T+T+T (the four blocks executed one after another):
+-+-+-+--------------------------------- on CpuCore[0]
: : : :
<start>| : : : :
|T| : : :
. |T| : :
. |T| :
. |T|
. |<stop>
|<----->|
= 4.T in a pure [SERIAL]-process schedule on CpuCore[0]
What will happen if the same amount of useful work is forced to happen inside some form of [CONCURRENT]-process execution (using multiprocessing.Pool's methods), potentially using more CPU-cores for the same purpose?
The actual compute phase will not last less than T again, right? Why ever would it? It never will.
A.......................B...............C.D Cpu[0]
<start>| : : : :
|<setup a subprocess[A]>| : :
. | :
. |<setup a subprocess[B]>|
. | +............+........................... Cpu[!0]
. | : : |
. |T| : |
. |<ret val(s) | | +............+... Cpu[!0]
. | [A]->main| | : :
. |T| :
. |<ret val(s) |
. | [B]->main|
.
.
. .. |<stop>
|<--------------------------------------------------------- .. ->|
i.e. >> 4.T ( while simplified, yet the message is clear )
The overheads, i.e. the costs you will always pay per call (and that is indeed many times):
Subprocess setup + termination costs (benchmark them to learn the scale of these costs; a rough sketch follows below), plus memory access costs (latency, with zero cache re-use to help, since the worker exits right away).
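A rough sketch of such a benchmark (standard library only; my illustration): it times the setup + termination of one no-op worker process, overhead paid before any useful work happens.
import multiprocessing as mp
import time

def noop():
    pass

if __name__ == '__main__':
    t0 = time.time()
    p = mp.Process(target=noop)   # pay the setup cost
    p.start()
    p.join()                      # pay the termination cost
    print("spawn + join of one no-op subprocess: %.1f [ms]"
          % ((time.time() - t0) * 1e3))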
Epilogue
Hope the visual message is clear enough: ALWAYS account for the accrued costs before deciding about any achievable benefits of re-engineering code into a distributed process-flow, whether a multi-core or even many-core [CONCURRENT] fabric is available.

filter dplyr's tbl_df using variable names

I am having trouble filtering dplyr's tbl_df (and, likewise, a regular data.frame). I have a big tbl_df (500 x 30K) and need to filter it.
So what I would like to do is:
filter(my.tbl_df, row1>0, row10<0)
which would be similar to
df[df$row1>0 & df$row10<0,]
Works great. But I need to build the filter conditions dynamically at run time, so I need to access the DF/tbl_df columns via one or more variables.
I tried something like:
var = c("row1", "row10")
op  = c(">", "<")
val = c(0, 0)
filter(my.tbl_df, eval(parse(text = paste(var, op, val, sep = ""))))
Which gives me an error: not compatible with LGLSXP.
This seems to be deeply rooted in the C++ code.
I would be thankful for any hint. Also, pointing out the proper "string to environment variable" conversion would be helpful, since I am pretty sure I am doing it wrong.
With best,
Mario
This is related to this issue. In the meantime, one way would be to construct the whole expression, i.e.:
> my.tbl_df <- data.frame( row1 = -5:5, row10 = 5:-5 )
> call <- parse( text = sprintf( "filter(my.tbl_df, %s)", paste( var, op, val, collapse = "&" ) ) )
> call
expression(filter(my.tbl_df, row1 > 0&row10 < 0))
> eval( call )
  row1 row10
1    1    -1
2    2    -2
3    3    -3
4    4    -4
5    5    -5

validate that an integer is some enum class item (C++11)

I have some enum class:
enum class Foo { A=1, B=18, Z=42 };
I want to check whether some integer can be converted into a Foo.
What would be the ideal way to do this? It is a runtime check (the integer is not known at compile time).
Obviously I can do this the hard way (write a function bool CheckEnum(Foo); with a big-ass switch returning true for all cases except the default one), but I was hoping for a more elegant mechanism that avoids so much writing. MPL or Boost.Preprocessor would be a perfectly acceptable solution, but they are libraries of which I sadly know very little.
A solution to this problem is to ditch the enums and replace them with arrays created using X-macros.
There is no "ideal" way to do it. All ways are going to involve some manual work.
You need to create a data structure that contains all of the acceptable values, then search that data structure for the runtime value you are given. A std::set or std::unordered_set would be adequate for this purpose.
Your main difficulty will be in maintaining that list, as it will need to be updated every time you change your enum class.
OK, I'm a little bit fed up with this issue (some of my enums are nearly 100 items),
so I decided to tackle it with code generation, which might not be everyone's cup of tea, but I've realised it is really not such a big deal.
Basically I went for Python Cog, which allows me to embed Python snippets inside comments in my .h and .cpp files and auto-generate code. I use it basically like a really smart, imperative macro system.
I added the following to Test.h:
/*[[[cog
#----------- definitions
import cog

def createCategoryConstants(enumVar, bitShift):
    categoryIndex = 0
    for cat in enumVar:
        cog.outl(' const unsigned int %s_op_mask = (%d << %d); ' % (cat[0], categoryIndex, bitShift))
        categoryIndex += 1
    cog.outl('\n\n')

def createMultiCategoryEnum(enumVar, enumTypename):
    cog.outl(' enum class %s { ' % enumTypename)
    categoryIndex = 0
    for i in enumVar:
        itemIndex = 0
        catName = 'NotExpected'
        remainingCategories = len(enumVar) - categoryIndex - 1
        for j in i:
            if itemIndex == 0:
                catName = j        # first element is the category name
                itemIndex = 1
                continue
            enumItemIndex = 0
            for enumItem in j:     # second element is the list of enumerators
                remainingEnums = len(j) - enumItemIndex - 1
                currentLine = '     %s = %s_op_mask | %d ' % (enumItem, catName, enumItemIndex)
                if remainingCategories != 0 or remainingEnums != 0:
                    currentLine += ' , '
                cog.outl(currentLine)
                enumItemIndex += 1
            itemIndex += 1
        cog.outl('')               # empty line to separate categories
        categoryIndex += 1
    cog.outl(' };\n\n')

def createIndexFromEnumFunction(enumVar, enumTypename, functionName):
    cog.outl('uint32_t %s(%s a) { \n switch (a)\n {' % (functionName, enumTypename))
    absoluteIndex = 0
    for cat in enumVar:
        elemInCat = 0
        for i in cat:
            if elemInCat != 0:
                for enumItem in i:
                    # enum class enumerators must be qualified in case labels
                    cog.outl(' case %s::%s:' % (enumTypename, enumItem))
                    cog.outl('     return %d; \n' % absoluteIndex)
                    absoluteIndex += 1
            elemInCat += 1
    cog.outl(' } \n } \n\n ')

def createMultiEnum(enumVar, enumTypename):
    createCategoryConstants(enumVar, 4)
    createMultiCategoryEnum(enumVar, enumTypename)
    createIndexFromEnumFunction(enumVar, enumTypename, 'FromOpToIndex')

#------------- generation
multiEnum = [ ['CatA', ['A1', 'A2', 'A3_foo']], ['CatSuper8', ['Z1_bla', 'Z10', 'Z11']] ]
createMultiEnum(multiEnum, 'multiFooEnum')
]]]*/
//[[[end]]]
Then I added the cog invocation to my Makefile's pre-build step:
.build-pre:
# Add your pre 'build' code here...
	python /usr/local/bin/cog.py -I../../../tools/cog/ -r *.h
And the results show up just below:
]]]*/
const unsigned int CatA_op_mask = (0 << 4);
const unsigned int CatSuper8_op_mask = (1 << 4);

enum class multiFooEnum {
    A1     = CatA_op_mask | 0 ,
    A2     = CatA_op_mask | 1 ,
    A3_foo = CatA_op_mask | 2 ,

    Z1_bla = CatSuper8_op_mask | 0 ,
    Z10    = CatSuper8_op_mask | 1 ,
    Z11    = CatSuper8_op_mask | 2
};

uint32_t FromOpToIndex(multiFooEnum a) {
    switch (a)
    {
    case multiFooEnum::A1:
        return 0;
    case multiFooEnum::A2:
        return 1;
    case multiFooEnum::A3_foo:
        return 2;
    case multiFooEnum::Z1_bla:
        return 3;
    case multiFooEnum::Z10:
        return 4;
    case multiFooEnum::Z11:
        return 5;
    }
}
//[[[end]]]
So now my enum validation is about making sure the code generation (run as a pre-build step) is done correctly.