What will this C++ function do when run? - c++

Can anybody tell me what the following function send_signal will do in real terms? I'm assuming it will send 1 million bytes of data to COM2, but that doesn't appear to match its real-world behavior.
Update. The hardware on COM2 is a robotic arm and the function in question triggers the arm. However, the arm action only lasts for circa 10 seconds - but I'm guessing that 1 million bytes of data will take longer than 10 seconds to send and thus would trigger multiple actions (but it doesn't).
#define TXDATA 0x2F8 // COM2
#define SIGNAL 0x00  // This value gives +12V for 1 millisecond
void send_signal(void)
{
    long count;
    for (count = 0; count < 1000000; count++)
        _outp(TXDATA, (char)(SIGNAL + SIGNAL));
}

On an appropriate embedded system, it will indeed place one million bytes in the COM2 transmitter holding register. Since it does so far faster than the port can actually transmit them, most of them will be lost. To actually send all one million bytes, it is necessary to poll the line status register and wait for bit 5 (transmitter holding register empty) before writing each byte.
On a computer running a modern OS, it will fault because userspace is not permitted direct access to I/O ports, it has to go through a driver.

It calls the _outp function with 0x2F8 as the first argument and 0x00 as the second argument one million times.
Without context one can only speculate. My guess would be for it to set a high state on COM2 for a certain amount of time.

This code sends 1,000,000 bytes of 0x00 to port 0x2F8.
Note that the _outp function is obsolete.

Related

I2C communication between RP2040 and adxl357 accelerometer ( C/C++ SDK )

I need to communicate via I2C to the adxl357 accelerometer and a few questions have arisen.
Looking at the RP2040 SDK documentation I see that there is a special method to send data to a certain address, such as i2c_write_blocking(). Its arguments include a 7-bit address and the data to be sent. My question is, since the accelerometer needs a Read/Write bit, is it still possible to use this function? Or should I go to the alternative i2c_write_raw_blocking()?
Also, I don't understand the notation of the Read/Write bit; it is reported as R/#W. Would that mean that 1 is read while 0 is write?
Thanks in advance for the help.
I2C addresses have 7 bits: these are sent in the high 7 bits of an 8-bit byte, and remaining bit (the least significant bit) is set to 1 for read, 0 for write.
The reason the documentation says it wants a 7-bit address is that it is telling you the write function will left-shift the address by one and append a 0, and the read function will left-shift the address by one and append a 1.
If it didn't tell you this you might pre-shift the address yourself, which would be wrong.

C++ network sockets, SCTP and packet size

I'm currently developing a server using connection-oriented SCTP to serve a small number of clients. After finishing the first prototype with a naive implementation, I'm now profiling the application to optimize. As it turns out, one of the two main consumers of CPU time is the networking part.
There are two questions about the efficiency of the application-level protocol I have implemented:
1) Packet size
Currently, I use a maximum packet size of 64 bytes. You can find many posts discussing packet sizes that are too big, but can they be too small? As SCTP allows me to read one packet at a time - similarly to UDP - while guaranteeing in-order delivery - similarly to TCP - this simplified the implementation significantly. However, if I understand correctly, this will cost one syscall each time I send a packet. Does the number of syscalls have a significant impact on performance? Would I be able to shave off a lot of CPU cycles by batching messages into bigger packets, i.e. 1024 - 8192 bytes?
2) Reading and writing the buffers
I'm currently using memcpy to move data into and out of the application-level network buffers. I found many conflicting posts about what is more efficient, memcpy or normal assignment. I'm wondering if one approach will be significantly faster than the other in this scenario:
Option 1
void Network::ReceivePacket(char* packet)
{
    uint8_t param1;
    uint16_t param2;
    uint32_t param3;
    memcpy(&param1, packet, 1);
    memcpy(&param2, packet + 1, 2);
    memcpy(&param3, packet + 3, 4);
    // Handle the packet here
}
void Network::SendPacket(uint8_t param1, uint16_t param2, uint32_t param3)
{
    char packet[7];
    memcpy(packet, &param1, 1);
    memcpy(packet + 1, &param2, 2);
    memcpy(packet + 3, &param3, 4);
    // Send the packet here
}
Option 2
void Network::ReceivePacket(char* packet)
{
    uint8_t param1;
    uint16_t param2;
    uint32_t param3;
    param1 = *(uint8_t*)packet;
    param2 = *(uint16_t*)(packet + 1);
    param3 = *(uint32_t*)(packet + 3);
    // Handle the packet here
}
void Network::SendPacket(uint8_t param1, uint16_t param2, uint32_t param3)
{
    char packet[7];
    *(uint8_t*)packet = param1;
    *(uint16_t*)(packet + 1) = param2;
    *(uint32_t*)(packet + 3) = param3;
    // Send the packet here
}
The first one seems a lot cleaner to me, but I've found many posts indicating that maybe the second one is quite a bit faster.
Any kind of feedback is of course welcome.
As far as I know, compilers optimize small fixed-size memcpy calls into plain loads and stores, so you should probably use memcpy.
About your first question:
Summary: make your packets as large as you reasonably can, to reduce the number of syscalls and the associated CPU overhead.
A syscall (system call) is a request serviced by the OS kernel; each one involves a transition into kernel mode, which is a moderate amount of work.
To be honest I'm not familiar with SCTP specifically; I haven't done socket programming since I last built a TCP server. I remember the MTU of the relevant physical layer was 1500 bytes, and I chose a packet size of 1450-1460 to get as close to that cap as possible.
So what I'm saying is: if I were you, I would keep the OS as uninvolved as possible so CPU performance doesn't suffer.
If you're looking to minimize the number of system calls, and you do find yourself sending and receiving multiple messages at a time, you might want to look at the (Linux-only) sendmmsg() and recvmmsg() calls.
To use these, you would likely need to enqueue messages internally, which might add latency that would not exist otherwise.
I wouldn't go over 1024 bytes for a buffer size personally. I've run into runtime problems when using packets over 1500 bytes, but 1024 is a power of two (2^10), which makes it convenient to work with.
It is possible, but I wouldn't advise it. I would make a separate thread for receiving packets, using recvmsg() so you can use multiple streams. I've found this to work wonderfully.
The main point of SCTP is multiple streams, so I would take full advantage of that. You just have to ensure that the data is put back in the correct order once everything is received (which takes some work to do).

Fetching variable length opcodes and CPU timing

I'm currently trying to write a NES emulator in C++ as a summer programming project to get ready for fall term next school year (I haven't coded in a while). I've already written a Chip8 emulator, so I thought the next step would be to try and write a NES emulator.
Anyways, I'm getting stuck. I'm using this website for my opcode table and I'm running into a road block. On the Chip8, all opcodes were two bytes long, so they were easy to fetch. However, the NES seems to have either 2 or 3 byte opcodes depending on what addressing mode the CPU is in. I can't think of any easy way to figure out how many bytes I need to read for each opcode (my only idea was to create really long if statements that check the first byte of the opcode to see how many more bytes to read).
I'm also having trouble with figuring how to count cycles. How do I create a clock within a programming language so that everything is in sync?
On an unrelated side note, since the NES is little-endian, do I need to read programCounter + 1 and then read programCounter to get the correct opcode?
However, the NES seems to have either 2 or 3 byte opcodes depending on what addressing mode the CPU is in. I can't think of any easy way to figure out how many bytes I need to read for each opcode.
The opcode is still only one byte. The extra bytes specify the operands for those instructions that have explicit operands.
To do the decoding, you can create a switch-block with 256 cases (actually it won't be 256 cases, because some opcodes are illegal). It could look something like this:
opcode = ReadByte(PC++);
switch (opcode) {
...
case 0x4C: // JMP abs
address = ReadByte(PC++);
address |= (uint16_t)ReadByte(PC) << 8;
PC = address;
cycles += 3;
break;
...
}
The compiler will typically create a jump table for the cases, so you'll end up with fairly efficient (albeit slightly bloated) code.
Another alternative is to create an array with one entry per opcode. This could simply be an array of function pointers, with one function per opcode - or the table could contain a pointer to one function for fetching the operands, one for performing the actual operation, plus information about the number of cycles that the instruction requires. This way you can share a lot of code. An example:
const Instruction INSTRUCTIONS[] =
{
...
// 0x4C: JMP abs
{&jmp, &abs_operand, 3},
...
};
I'm also having trouble with figuring how to count cycles. How do I create a clock within a programming language so that everything is in sync?
Counting CPU cycles is just a matter of incrementing a counter, like I showed in my code examples above.
To sync video with the CPU, the easiest way would be to run the CPU for the number of cycles corresponding to the active display period of a single scanline, then draw one scanline, then run the CPU for the number of cycles corresponding to the horizontal blanking period, and start over again.
When you start involving audio, how you sync things can depend a bit on the audio API you're using. For example, some APIs might send you a callback to which you respond by filling a buffer with samples and returning the number of samples generated. In this case you could calculate the number of CPU cycles that have been emulated since the previous callback and determine how many samples to generate based on that.
On an unrelated side note, since the NES is little-endian, do I need to read programCounter + 1 and then read programCounter to get the correct opcode?
Since the opcode is a single byte and instructions on the 6502 aren't packed into a word like on some other CPU architectures, endianness doesn't really matter. It does become relevant for 16-bit operands, but on the other hand PCs and most mobile phones are also based on little-endian CPUs.
I wrote an emulator for 6502 some 25+ years back.
It's a pretty simple processor, so either a table of function pointers or a switch, with 256 entries for the bytes [the switch can be a bit shorter, since there aren't valid opcodes in all 256 entries, only about 200 of the opcodes are actually used].
Now, if you want to write a simulator that exactly reproduces the timing of the instructions, then you'll have more fun. You basically have to simulate much more of how each component works, and "ripple" through the units with a clock. This is quite a lot of work, so if at all possible I would ignore the timing and just let the system's speed depend on the emulator's speed.

Best practice to implement fixed byte serial protocol in C++?

I have a device connected via serial interface to a BeagleBone computer. It communicates in a simple binary format like
|MessageID (1 byte) | Data (n bytes) | Checksum (2 bytes) |
The message length is fixed for each command, meaning that it is known how many bytes to read after the first byte of a command has been received. After some initial setup communication it sends packets of data every 20 ms.
My approach would be to use either termios or something like a serial lib, and then run a read loop like this:
while (keepRunning)
{
    char buffer[256];
    serial.read(buffer, 1);
    switch (buffer[0])
    {
    case COMMAND1:
        serial.read(&buffer[1], sizeof(MessageHello) + 2); // Read data + checksum
        if (calculateChecksum(buffer, sizeof(MessageHello) + 3))
        {
            extractDatafromCommand(buffer);
        }
        else
        {
            doSomeErrorHandling(buffer[0]);
        }
        break;
    case COMMAND2:
        serial.read(&buffer[1], sizeof(MessageFoo) + 2);
        [...]
    }
}
extractDatafromCommand would then create some structs like:
struct MessageHello
{
    char name[20];
    int version;
};
Put everything in its own read thread and signal the availability of a new packet to other parts of the program using a semaphore (or a simple flag).
Is this a viable solution, or are there improvements to be made (I assume so)?
Maybe make an abstract class Message and derive the other messages from it?
It really depends. The two major ways would be threaded (like you mentioned) and evented.
Threaded code is tricky because you can easily introduce race conditions. Code that you tested a million times could occasionally stumble and do the wrong thing after working for days or weeks or years. It's hard to 'prove' that things will always behave correctly. Seemingly trivial things like "i++" suddenly become leaky abstractions. (See why is i++ not thread safe on a single core machine? )
The other alternative is evented programming. Basically, you have a main loop that does a select() on all your file handles. Anything that is ready gets looked at, and you try to read/write as many bytes as you can without blocking. (pass O_NONBLOCK). There are two tricky parts: 1) you must never do long calculations without having a way to yield back to the main loop, and 2) you must never do a blocking operation (where the kernel stops your process waiting for a read or write).
In practice, most programs don't have long computations and it's easier to audit a small amount of your code for blocking calls than for races. (Although doing DNS without blocking is trickier than it should be.)
The upside of evented code is that there's no need for locking (no other threads to worry about) and it wastes less memory (in the general case where you're creating lots of threads.)
Most likely, you want to use a serial lib. termios processing is just overhead and a chance for stray bytes to do bad things.

Python and C++ Sockets converting packet data

First of all, to clarify my goal: there exist two programs written in C in our laboratory. I am working on a bidirectional proxy server for them (which will also manipulate the data), and I want to write that proxy server in Python. It is important to know that I know close to nothing about these two programs; I only know the definition file of the packets.
Now: assuming a packet definition in one of the C++ programs reads like this:
unsigned char Packet[0x32]; // Packet[Length]
int z=0;
Packet[0]=0x00; // Spare
Packet[1]=0x32; // Length
Packet[2]=0x01; // Source
Packet[3]=0x02; // Destination
Packet[4]=0x01; // ID
Packet[5]=0x00; // Spare
for(z=0;z<=24;z+=8)
{
Packet[9-z/8]=((int)(720000+armcontrolpacket->dof0_rot*1000)/(int)pow((double)2,(double)z));
Packet[13-z/8]=((int)(720000+armcontrolpacket->dof0_speed*1000)/(int)pow((double)2,(double)z));
Packet[17-z/8]=((int)(720000+armcontrolpacket->dof1_rot*1000)/(int)pow((double)2,(double)z));
Packet[21-z/8]=((int)(720000+armcontrolpacket->dof1_speed*1000)/(int)pow((double)2,(double)z));
Packet[25-z/8]=((int)(720000+armcontrolpacket->dof2_rot*1000)/(int)pow((double)2,(double)z));
Packet[29-z/8]=((int)(720000+armcontrolpacket->dof2_speed*1000)/(int)pow((double)2,(double)z));
Packet[33-z/8]=((int)(720000+armcontrolpacket->dof3_rot*1000)/(int)pow((double)2,(double)z));
Packet[37-z/8]=((int)(720000+armcontrolpacket->dof3_speed*1000)/(int)pow((double)2,(double)z));
Packet[41-z/8]=((int)(720000+armcontrolpacket->dof4_rot*1000)/(int)pow((double)2,(double)z));
Packet[45-z/8]=((int)(720000+armcontrolpacket->dof4_speed*1000)/(int)pow((double)2,(double)z));
Packet[49-z/8]=((int)armcontrolpacket->timestamp/(int)pow(2.0,(double)z));
}
if(SendPacket(sock,(char*)&Packet,sizeof(Packet)))
return 1;
return 0;
What would be the easiest way to receive that data, convert it into a readable python format, manipulate them and send them forward to the receiver?
You can receive the packet's 50 bytes with a .recv call on a properly connected socket (it might actually take more than one call in the unlikely event the TCP packet gets fragmented, so check the incoming length until you have exactly 50 bytes in hand;-).
After that, understanding that C code is puzzling. The assignments of ints (presumably 4-bytes each) to Packet[9], Packet[13], etc, give the impression that the intention is to set 4 bytes at a time within Packet, but that's not what happens: each assignment sets exactly one byte in the packet, from the lowest byte of the int that's the RHS of the assignment. But those bytes are the bytes of (int)(720000+armcontrolpacket->dof0_rot*1000) and so on...
So must those last 44 bytes of the packet be interpreted as 11 4-byte integers (signed? unsigned?) or 44 independent values? I'll guess the former, and do...:
import struct
f = '>x4bx11i'
values = struct.unpack(f, packet)
the format f indicates: big-endian; one ignored "spare" byte; four signed single-byte values; another ignored "spare" byte; then 11 four-byte signed integers. The tuple values ends up with 15 values: the four single bytes (50, 1, 2, 1 in your example), then the 11 signed integers. You can use the same format string to pack a modified version of the tuple back into a 50-byte packet to resend.
Since you explicitly place the length in the packet, it may be that different packets have different lengths (though that's incompatible with the fixed-length declaration in your C sample), in which case you need to be a bit more careful in receiving and unpacking; however, such details depend on information you don't give, so I'll stop trying to guess;-).
Take a look at the struct module, specifically the pack and unpack functions. They work with format strings that allow you to specify what types you want to write or read and what endianness and alignment you want to use.