How can one express bitwise logical operations in mainframe COBOL?
I have:
01 WRITE-CONTROL-CHAR.
03 WCC-NOP PIC X VALUE X'01'.
03 WCC-RESET PIC X VALUE X'02'.
03 WCC-PRINTER1 PIC X VALUE X'04'.
03 WCC-PRINTER2 PIC X VALUE X'08'.
03 WCC-START-PRINTER PIC X VALUE X'10'.
03 WCC-SOUND-ALARM PIC X VALUE X'20'.
03 WCC-KEYBD-RESTORE PIC X VALUE X'40'.
03 WCC-RESET-MDT PIC X VALUE X'80'.
In Micro Focus COBOL, I could do something like:
WCC-NOP B-AND WCC-RESET
but it seems there's no such operator on the mainframe (or at least not in Enterprise COBOL for z/OS).
Is there some (hopefully straightforward!) way to simulate/replicate bitwise logical operations in mainframe COBOL?
Your best bet appears to be the 'CEESITST' callable service, which is available to Enterprise COBOL on z/OS through Language Environment. I found an example using it, along with the other bit-manipulation services:
http://esj.com/articles/2000/09/01/callable-service-manipulating-binary-data-at-the-bit-level.aspx
If you AND together values that don't share any bits, you will always get zero, so I'm going to assume you meant OR. That makes sense, since you typically OR independent bits together to build up a multi-bit value.
With that in mind, when the bit masks are independent of each other (no term shares a set bit with any other term), there is no difference between:
termA OR termB
and:
termA + termB
Your terms are all independent here, being x'01', x'02' and so forth (no x'03' or x'ff'), so adding them should work fine:
COMPUTE TARGET = WCC-NOP + WCC-RESET
Now that's good for setting bits starting with nothing, but not so useful for clearing them. However, you can use a similar trick for clearing:
COMPUTE TARGET = 255 - WCC-NOP - WCC-RESET
Setting or clearing them from an arbitrary starting point (regardless of their current state) is a little trickier and can't be done easily with addition and subtraction.
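To make the identity concrete, here is a small sketch in C++ (not COBOL; purely an illustration of the arithmetic the COMPUTE statements above rely on): for masks with no bits in common, OR and addition agree, and subtracting bits that are known to be set from an all-ones byte clears them.

#include <cstdio>

int main() {
    const unsigned char WCC_NOP   = 0x01;
    const unsigned char WCC_RESET = 0x02;

    // Disjoint masks: bitwise OR and plain addition produce the same value.
    unsigned char via_or  = WCC_NOP | WCC_RESET;          // 0x03
    unsigned char via_add = WCC_NOP + WCC_RESET;          // 0x03

    // Clearing bits known to be set: subtract them from the all-ones byte.
    unsigned char cleared = 0xFF - WCC_NOP - WCC_RESET;   // 0xFC

    std::printf("%02X %02X %02X\n",
                (unsigned)via_or, (unsigned)via_add, (unsigned)cleared);
    return 0;
}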
I have a service whose code base has been running for years.
Now, building with VS 2019 16.3.7/8/9, I get a 0xC000001D exception (The thread tried to execute an invalid instruction.) on the first floating-point arithmetic in the boot phase of the service on some machines.
Installing the same code base compiled with 16.2.4/5 works.
I have a full memory crash dump from one machine. The crash happens in a call to _dtol3.
In the assembly code I can see this situation:
0149477B 83 3D B4 AC 55 01 06 cmp dword ptr [__isa_available (0155ACB4h)],6
01494782 7C 15 jl _dtol3_default (01494799h)
01494784 C5 FA 7E C0 vmovq xmm0,xmm0
01494788 62 F1 FD 08 7A C0 vcvttpd2qq xmm0,xmm0 <---- CRASH
__isa_available has the value 6; on my machine the value is 5. One machine where we can see the crash is a XEON Silver 4110 running our software virtualized. The same exe runs on a XEON E5-2620, and it also runs on all the client machines in my company (a mix of old and new Intel machines) without any problem.
Looking at the source around the crash site, I can see it is just a simple difference and division of two double values, with the result compared against a value greater than or equal to 1.0:
COleDateTime nowClient = COleDateTime::GetCurrentTime(),
nowDB = GetCurrentTime();
COleDateTimeSpan diff = nowDB-nowClient;
if (diff.GetTotalMinutes()>=1) // <----- CRASH
...
Is there any way to influence the code generation in VS to avoid the calls to this code (any shim, compatibility setting)?
Is there any known change in VS 2019 since 16.2.4 that affects floating-point math and might have an influence on my problem?
This is a bug in the 16.3.x update of Visual Studio. Here is a link to the bug report.
Read it carefully: it actually happens on machines that support AVX-512, not older CPUs as the initial post suggests. The report also contains a couple of workarounds to avoid the issue until Microsoft has a fix.
You can enable or disable the generation of 'advanced' instructions for each project in its "Properties" sheet:
Right click the project in the Solution Explorer and select "Properties" from the pop-up menu.
Select/Open the "C/C++" tab in the left-hand pane and then select the "Code Generation" item.
In the list of properties then presented in the right-hand pane, select "Not Set" for the "Enabled Enhanced Instruction Set" property (or whichever option will give you code that is compatible with your target PCs).
The compiler's use of VEX-encoded vmovq implies that you have (at least) "Advanced Vector Extensions" enabled; vcvttpd2qq, however, is an AVX-512 instruction (AVX-512DQ, with AVX-512VL for the xmm form shown here), which is consistent with the bug report's observation that the crash occurs on AVX-512-capable machines.
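Not a fix, but if you want to see what the (virtualized) CPU actually advertises, here is a small diagnostic sketch using the documented MSVC intrinsics __cpuid, __cpuidex and _xgetbv (the interpretation of the feature bits as the ones the xmm form of vcvttpd2qq needs is my assumption, based on the AVX-512DQ/VL categorization above):

#include <intrin.h>
#include <cstdio>

int main() {
    int regs[4];

    // CPUID.1:ECX bit 27 = OSXSAVE; _xgetbv may only be called if this is set.
    __cpuid(regs, 1);
    bool osxsave = (regs[2] & (1 << 27)) != 0;
    bool zmm_state_ok = false;
    if (osxsave) {
        unsigned long long xcr0 = _xgetbv(0);
        zmm_state_ok = (xcr0 & 0xE6) == 0xE6;  // XMM, YMM and ZMM/opmask state enabled by the OS
    }

    // CPUID.(EAX=7,ECX=0):EBX feature bits.
    __cpuidex(regs, 7, 0);
    bool avx512f  = (regs[1] & (1 << 16)) != 0;   // AVX-512F
    bool avx512dq = (regs[1] & (1 << 17)) != 0;   // AVX-512DQ (vcvttpd2qq)
    bool avx512vl = (regs[1] & (1u << 31)) != 0;  // AVX-512VL (xmm/ymm forms)

    std::printf("AVX512F=%d AVX512DQ=%d AVX512VL=%d OS ZMM state=%d\n",
                (int)avx512f, (int)avx512dq, (int)avx512vl, (int)zmm_state_ok);
    return 0;
}

If the feature bits come back set but the instruction still faults, the guest-visible CPUID and the actual execution environment disagree, which points away from your own code.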
I'm very curious about an observation I made during development of my app.
Long story short, I was writing an app which reads data from smart electric meters. They use a request frame like:
7E [hex-address] [crc1] [cmd] [crc2] 7E
The CRC algorithm is CRC-16/X-25, and each CRC is calculated over the whole preceding part of the frame excluding the 7E flags, so crc2 covers crc1. I noticed that, for any given cmd, crc2 is always the same, even with different hex-addresses.
I used the crccalc tool to make sure. Steps to reproduce:
put in any hex array (of any size)
calc CRC-16/X-25
swap the CRC bytes
put this swapped CRC at the end of the array
calc the CRC again
The resulting CRC is the same for any byte array. Why is that? Is there a name for this phenomenon?
calc CRC again
If you append a CRC to data and calculate the CRC again over the whole thing, the result is a constant value. If the CRC isn't post-complemented (xorout = 0), that constant is zero, but in this case it is post-complemented (xorout = 0xffff), so the result is a non-zero constant, in this case 0x0f47 (assuming no errors occurred).
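Here is a small C++ sketch to reproduce the constant (a straightforward bit-at-a-time CRC-16/X-25; the payload bytes are made up). Whatever payload you start from, recomputing the CRC over payload plus appended CRC prints 0x0F47, which is the standard X-25 residue 0xF0B8 with the final XOR applied.

#include <cstdint>
#include <cstdio>
#include <vector>

// CRC-16/X-25: poly 0x1021 (0x8408 reflected), init 0xFFFF,
// reflected input/output, final XOR 0xFFFF.
static uint16_t crc16_x25(const std::vector<uint8_t>& data) {
    uint16_t crc = 0xFFFF;
    for (uint8_t b : data) {
        crc ^= b;
        for (int i = 0; i < 8; ++i)
            crc = (crc & 1) ? (uint16_t)((crc >> 1) ^ 0x8408) : (uint16_t)(crc >> 1);
    }
    return (uint16_t)(crc ^ 0xFFFF);
}

int main() {
    std::vector<uint8_t> frame = {0x10, 0x20, 0x30, 0x40};   // any payload
    uint16_t crc = crc16_x25(frame);

    // Append the CRC low byte first (the "swap" step: a reflected CRC goes on
    // the wire least significant byte first).
    frame.push_back((uint8_t)(crc & 0xFF));
    frame.push_back((uint8_t)(crc >> 8));

    // Recomputing over payload + CRC always yields the same constant.
    std::printf("CRC over payload+CRC = 0x%04X\n", (unsigned)crc16_x25(frame));
    return 0;
}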
In the design specification that I'm verifying the DUT against, there is a requirement that the word clock and bit clock are generated while the active_clk signal is high. I have little experience using SVA, so I was hoping that someone with a bit more experience could point me in the right direction, or better yet, provide a solution.
Have an always-on clock which you can use to check for rising/falling edges of the other two clocks within some fixed/calculated duration. Something like the code below:
bit aon_clk;
always #1 aon_clk = ~aon_clk;
property clk_chk;
  @(posedge aon_clk)
  // Within, say, 25 always-on clocks you should expect a rise and then a fall of bit_clk
  active_clk |=> ##[0:25] ($rose(bit_clk) && active_clk) ##[0:25] ($fell(bit_clk) && active_clk);
endproperty

// A similar property can be written for the word clock.
assert property (clk_chk) else $display($time, " Clks not generated");
I want to implement an algorithm on the GPU using CUDA. At the same time, I wrote a CPU version in C++ to verify the results of the GPU version. However, I ran into trouble when using log() on the CPU and the GPU. A very simple piece of the algorithm (used on both CPU and GPU) is shown below:
float U;
float R = U * log(U);
However, when I compare the results on the CPU side, I find that many of them (459883 out of 1843161) have small differences (the max diff is 0.5). Some results are shown below:
U -- R (CPU side) -- R (GPU side) -- R using Python (U * math.log(U))
86312.0 -- 980998.375000 -- 980998.3125 -- 980998.3627440572
67405.0 -- 749440.750000 -- 749440.812500 -- 749440.7721980268
49652.0 -- 536876.875000 -- 536876.812500 -- 536876.8452369706
32261.0 -- 334921.250000 -- 334921.281250 -- 334921.2605240216
24232.0 -- 244632.437500 -- 244632.453125 -- 244632.4440747978
Can anybody give me some suggestions? Which one should I trust?
Which one should I trust?
You should trust the double-precision result computed by Python, which you could also have computed with CUDA or C++ in double precision to obtain very similar (although likely still not identical) values.
To rephrase the first comment made by aland, if you care about an error of 0.0625 in 980998, you shouldn't be using single-precision in the first place. Both the CPU and the GPU result are “wrong” for that level of accuracy. On your examples, the CPU result happens to be more accurate, but you can see that both single-precision results are quite distant from the more accurate double-precision Python result. This is simply a consequence of using a format that allows 24 significant binary digits (about 7 decimal digits), not just for the input and the end result, but also for intermediate computations.
If the input is provided as float and you want the most accurate float result for R, compute U * log(U) using double and round to float only in the end. Then the results will almost always be identical between CPU and GPU.
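As a small illustration (plain C++, no CUDA required; the input value is taken from the question's table), the sketch below contrasts the all-single-precision computation with the compute-in-double-then-round version suggested above. What the first line prints depends on your compiler and math library, but the second line rounds to 980998.375, matching the CPU column of the table.

#include <cmath>
#include <cstdio>

int main() {
    float u = 86312.0f;   // value from the question's table

    // Everything in single precision (float overload of std::log, float multiply).
    float r_single = u * std::log(u);

    // Compute in double precision, round to float only at the very end.
    float r_double = (float)((double)u * std::log((double)u));

    std::printf("single-precision path : %.4f\n", r_single);
    std::printf("double-then-round path: %.4f\n", r_double);   // 980998.3750
    return 0;
}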
Out of curiosity, I compared the position of the last bit set in the significand (in other words, the number of trailing zeros in the significand).
I did it with Squeak Smalltalk because I'm more comfortable with it, but I'm pretty sure you can find equivalent libraries in Python:
CPU:
#(980998.375000 749440.750000 536876.875000 334921.250000 244632.437500)
collect: [:e | e asTrueFraction numerator highBit].
-> #(23 22 23 21 22)
GPU:
#(980998.3125 749440.812500 536876.812500 334921.281250 244632.453125)
collect: [:e | e asTrueFraction numerator highBit].
-> #(24 24 24 24 24)
Interestingly, that's not as random as we might expect, especially on the GPU side, but it's not enough of a clue at this stage...
Then I used the ArbitraryPrecisionFloat package to perform (emulate) the operations in extended precision and round to the nearest single-precision float at the end; the correctly rounded answer matches the CPU's quite exactly:
#( 86312 67405 49652 32261 24232 ) collect: [:e |
| u r |
u := e asArbitraryPrecisionFloatNumBits: 80.
r := u * u ln.
(r asArbitraryPrecisionFloatNumBits: 24) asTrueFraction printShowingMaxDecimalPlaces: 100]
-> #('980998.375' '749440.75' '536876.875' '334921.25' '244632.4375')
It works as well with 64 bits.
But if I emulate the operations in single precision, then I can say the GPU matches the emulated results quite well too (except for the second item):
#( 86312 67405 49652 32261 24232 ) collect: [:e |
| u r |
u := e asArbitraryPrecisionFloatNumBits: 24.
r := u * u ln.
r asTrueFraction printShowingMaxDecimalPlaces: 100]
-> #('980998.3125' '749440.75' '536876.8125' '334921.28125' '244632.453125')
So I'd say the CPU probably used double (or extended) precision to evaluate the log and perform the multiplication.
The GPU, on the other side, performed all the operations in single precision. The log function of the ArbitraryPrecisionFloat package is correctly rounded (within half an ulp), but that is not something IEEE 754 requires of a log implementation, which can explain the observed mismatch on the second item.
You may try to write the code so as to force single precision (for example by using logf instead of log if it's C99, or by storing intermediate results in floats: float ln = log(u); float r = u * ln;) and possibly use the appropriate compilation flags to forbid extended precision (I can't remember which; I don't use C every day). But even then you have very little guarantee of a 100% match on the log function; the standards are too lax.
I'm writing a Game Boy emulator, and am struggling with making sure opcodes are emulated correctly. Certain operations set flag registers, and it can be hard to track whether each flag is set correctly, and where.
I want to write some sort of testing framework, but thought it would be worth asking here for some help. Currently I see a few options:
Unit test each and every opcode with several test cases. The issue is that there are 256 8-bit opcodes and 50+ (I can't remember the exact number) 16-bit opcodes, so this would take a long time to do properly.
Write some sort of logging framework that logs a trace of the CPU state at each operation and compares it to the trace from other established emulators. This would be pretty quick to do and would give a fairly rapid overview of what exactly went wrong. The log file would look a bit like this:
...
PC = 212 Just executed opcode 7c - Register: AF: 5 30 BC: 0 13 HL: 5 ce DE: 1 cd SP: ffad
PC = 213 Just executed opcode 12 - Register: AF: 5 30 BC: 0 13 HL: 5 ce DE: 1 cd SP: ffad
...
The cons are that I need to modify the source of another emulator to output the same format, and there's no guarantee an opcode is correct, since this assumes the other emulator is.
What else should I consider?
Here is my code if it helps: https://github.com/dbousamra/scalagb
You could use already-established test ROMs. I would recommend Blargg's test ROMs. You can get them from here: http://gbdev.gg8.se/files/roms/blargg-gb-tests/.
To me the best idea is the one you already mentioned:
take an existing emulator that is well known and for which you have the source code; let's call it the master emulator
take some ROMs that you can use as tests
run these ROMs in the emulator that is known to work well
modify the master emulator so that it produces a log while it is running, for each opcode that it executes
do the same in your own emulator
compare the output (a minimal comparison sketch follows at the end of this answer)
I think this approach has more advantages:
you will have a log file from a good emulator
the outcome of the test can be evaluated much faster
you can use more than one emulator
you can go deeper later, for example by putting memory contents into the log and looking at the differences between the two implementations
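As a minimal sketch of the 'compare the output' step (C++ here rather than Scala, and the file names plus the assumption that both emulators write one identically formatted line per executed instruction, as in the log excerpt above, are mine), a first-divergence finder could look like this:

#include <fstream>
#include <iostream>
#include <string>

// Compare two per-instruction trace logs line by line and report the first divergence.
int main(int argc, char** argv) {
    if (argc != 3) {
        std::cerr << "usage: tracediff <master.log> <mine.log>\n";
        return 2;
    }
    std::ifstream master(argv[1]), mine(argv[2]);
    std::string a, b;
    long line = 0;
    while (true) {
        bool gotA = static_cast<bool>(std::getline(master, a));
        bool gotB = static_cast<bool>(std::getline(mine, b));
        if (!gotA || !gotB) {
            if (gotA != gotB)
                std::cout << "logs have different lengths; shorter one ends after line " << line << "\n";
            break;
        }
        ++line;
        if (a != b) {
            std::cout << "first divergence at line " << line << ":\n"
                      << "  master: " << a << "\n"
                      << "  mine  : " << b << "\n";
            return 1;
        }
    }
    std::cout << "no divergence found in " << line << " matching lines\n";
    return 0;
}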