possible to use htonl on a string - c++

I want to send a string over a socket, but have to worry about endianness. The only way I know how to fix this is using htonl, but to my knowledge that only works on integers, not strings. How can I send a string over a socket?

I don't believe you have to do anything for strings (I'm assuming char* UTF-8) since characters are only one byte in length, so you do not have to worry about endianness. You only need to worry about byte ordering if your data is more than 1 byte in length (e.g. short, long, etc.).
The Wikipedia article on endianness explains this topic in much greater detail.
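
For example, a minimal sketch of sending a C string as-is (POSIX sockets are assumed here, with sockfd an already-connected TCP socket; Windows would use the Winsock equivalents):

#include <string.h>      // strlen
#include <sys/socket.h>  // send

void send_cstring(int sockfd, const char *msg)
{
    size_t len = strlen(msg);
    // Each char is a single byte, so the bytes go onto the wire in the
    // same order on every machine; no htonl/htons call is involved.
    send(sockfd, msg, len, 0);   // real code should check the return value and loop on partial sends
}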

Related

What is the equivalent of a Python ASCII-encoded byte string in C/C++ when dealing with sockets?

I have a mediocre background in C and C++ socket programming.
As part of a project I recently had to look into socket programming in Python 3.
I was taking a look at the following example:
https://support.mecademic.com/support/solutions/articles/64000253388-python-example-simple-tcp-ip-socket-client
Regarding the command client.send(bytes(cmd+'\0','ascii'))
I understand that when working with low-level data connections like network sockets, Python 3 transfers data as byte strings: data type bytes.
I started wondering how this command would be implemented by a C/C++ client, but I got confused by the encoding/decoding part of the buffers.
Let's take for example the following clients, which are based on the POCO library:
https://searchcode.com/codesearch/view/25852263/
In all the examples the buffers are char arrays.
What I'd like to ask is: if someone wanted to replace the Python client with a C++ one like those above, would the sending and receiving part remain the same?
e.g.
int n = ss.sendBytes("hello", 5);
char buffer[256];
n = ss.receiveBytes(buffer, sizeof(buffer));
Meaning, are C char arrays ASCII-encoded by default?
Or, before sending, should the buffer first be converted, for example from an ASCII string to a byte array, as in the following example:
https://www.includehelp.com/c/convert-ascii-string-to-byte-array-in-c.aspx
Accordingly, does the receiving side need any decoding as well?
Please tell me if I am understanding your question correctly. You are trying to send and receive bytes. If your API takes a char[] array, then you should not convert the individual characters into bytes.
For example, you would pass "C8ICS33" not "43 38 49 43 53 33 33".
Sometimes in C++, an API will take a std::uint8_t[] or std::byte[] array instead of a char array. std::uint8_t is just an unsigned integer the size of a byte. std::byte is a bit more involved, but it is rarely used in practice.
So TL;DR: don't convert your char[] to "bytes"
Yes, in C and C++ char arrays typically hold ASCII (or UTF-8, which as far as C-style strings are concerned can be thought of as a backward-compatible superset of ASCII) by default, so there isn't any need to encode or decode those strings.
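
For instance, a rough equivalent of the Python client.send(bytes(cmd+'\0','ascii')) using the POCO StreamSocket API from the linked examples might look like the sketch below; the host, port and command string are made up for illustration:

#include <string>
#include <Poco/Net/StreamSocket.h>
#include <Poco/Net/SocketAddress.h>

int main()
{
    Poco::Net::SocketAddress addr("192.168.0.100", 10000);   // hypothetical server
    Poco::Net::StreamSocket ss(addr);

    std::string cmd = "SomeCommand";   // placeholder command
    cmd += '\0';                       // same trailing null as the Python example
    ss.sendBytes(cmd.data(), static_cast<int>(cmd.size()));

    char buffer[256];
    int n = ss.receiveBytes(buffer, sizeof(buffer));
    // buffer[0..n) now holds the raw ASCII reply; no extra decoding step is needed.
    return 0;
}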

What is the data type of content sent over socket?

When using Berkeley socket api, what is the data type of content that is sent over the read/send or write/recv calls? For example -
char *msg = "Our Message!";
int len, bytes_sent;
len = strlen(msg);
bytes_sent = send(sockfd, msg, len, 0);
in this code, we are using the char type, but are we limited to just char, since send/write/sendto usually take a void * parameter? I've also seen arguments that if we send some int, it might be stored as little endian or big endian, causing problems between source and destination if their endianness doesn't match. Then why doesn't the char type suffer from this problem too?
Also, different languages like C and C++ may use different sizes for their character types, so why isn't that a problem? If the socket doesn't care about the type and just sees the content as a buffer, why don't we see random corruption of data when TCP servers/clients written in different languages communicate with each other?
In short, what values(type) can I send safely through sockets?
You cannot safely send anything through a raw socket and expect the receiver to make sense of it. For example, the sending process might be on a machine where the character encoding is EBCDIC, and the receiving process might be on a machine where the character encoding is ASCII. It's up to the processes to either negotiate a protocol to sort this out, or to simply say in their specifications "We are using ASCII (or whatever)".
Once you have the character encodings worked out, my advice is to transmit the data as text. This avoids all endianness problems, and is easier to debug and log.
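As a sketch of that text-based approach (assuming sockfd is a connected POSIX socket), an integer can be formatted as decimal text and sent with a delimiter:

#include <stdio.h>       // snprintf
#include <sys/socket.h>  // send

void send_int_as_text(int sockfd, int value)
{
    char buf[32];
    // Decimal text has no byte-order issues and is easy to log and debug.
    int len = snprintf(buf, sizeof(buf), "%d\n", value);
    send(sockfd, buf, len, 0);   // real code should check for errors and partial sends
}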
The simplest answer is that the data is an uninterpreted stream of octets, that is to say 8-bit bytes. Any interpretation of it is done by the sender and receiver, and they had better agree. You certainly need to take both the size and endianness of integers into account, and compiler alignment and padding rules too. This is why, for example, you should not use C structs as network protocols.
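
By way of illustration, here is a sketch of serializing fields explicitly instead of sending a struct; the field names and message layout are invented for the example:

#include <stdint.h>
#include <string.h>
#include <arpa/inet.h>   // htonl, htons (use <winsock2.h> on Windows)

// Pack an id and a code into a byte buffer with fixed sizes and network
// byte order, so no compiler padding or alignment rules leak onto the wire.
size_t pack_message(uint8_t *out, uint32_t id, uint16_t code)
{
    uint32_t id_n   = htonl(id);
    uint16_t code_n = htons(code);
    memcpy(out,     &id_n,   4);
    memcpy(out + 4, &code_n, 2);
    return 6;   // exact wire size of the message
}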

Send array of integers (winsock)

I have a client/server program I'm writing, and I need to send information from client to server using the function send(SOCKET s, const char *buf, int len, int flags);
but apparently this function is made to send a string (an array of characters), while what I'm sending is an encrypted message whose values are too large for the char type.
Is there another function that allows me to do so? I looked at the reference on the Microsoft website but I didn't understand the other functions.
If there's another function I hope you can show me how to use it or give me hints, or if there's another way to do it then so be it.
Notes: I'm working with C++ under Windows 8, using Borland 6
This might be tricky to explain.
Your issue isn't in the function you're using, but in the concept you're trying to apply.
First of all, if your data is intended to be transmitted through network, you must assume that the destination endpoint endianness may differ from the transmitting endpoint.
With that in mind, it's advisable to convert the eligible data types prone to endianness interpretation to network byte order before transmitting any data. Take a look at the htons(), htonl(), ntohs() and ntohl() functions.
As you must deal with known data sizes, instead of declaring your array as int[], you should declare it through a stdint.h type, such as int16_t, int32_t, uint16_t, etc.
So, let's assume you have the following:
uint32_t a[4] = { 1, 2, 3, 4 };
If you want to transmit this array in a portable way, you should first convert its contents to network byte order:
uint32_t a_converted[4];
for (size_t i = 0; i < sizeof(a) / sizeof(a[0]); i++)   // iterate over elements, not bytes
    a_converted[i] = htonl(a[i]);
Now, if you want to transmit this array, you can do it using:
send(s, (char *) a_converted, sizeof(a_converted), flags);
Just remember that the code for receiving this data, should convert it from network byte order to host byte order, using, in this case, the ntohl() for each element received.
Hope this gives you some clues for further research.
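
For what it's worth, the receiving side might look roughly like this (a sketch assuming s is a connected socket and exactly four uint32_t values are expected):

uint32_t a_received[4];
recv(s, (char *) a_received, sizeof(a_received), 0);   // real code must check the return value and loop until all bytes have arrived
for (size_t i = 0; i < sizeof(a_received) / sizeof(a_received[0]); i++)
    a_received[i] = ntohl(a_received[i]);   // network byte order back to host byte order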
Well doodleboodle, guess what: if you read the TCP RFC, you might understand that the TCP protocol only transfers OCTET STREAMS and, if you need to transfer anything more complex than one byte, you need a protocol on top of TCP that defines your Application Protocol Unit message type.
send(SOCKET s, const char *buf, int len, int flags); is basically the way to do it.
It uses binary data in bytes to send the data. So if you want to send a complex structure/object, you'll need to serialize it to a byte array first.
In your case with the integers it's quite simple: just convert the integer array to a byte array. (keep track of the length though).
Of course it's more appropriate to build an abstraction layer on top of your TCP layer so it's easier to send/receive different kinds of data.
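
As an illustration of that idea, a sketch of converting a uint32_t array to a byte array by hand (writing each value most-significant byte first, i.e. network byte order, so it works regardless of the host's endianness):

#include <stddef.h>
#include <stdint.h>

void pack_uint32_array(const uint32_t *in, size_t count, unsigned char *out)
{
    for (size_t i = 0; i < count; i++) {
        out[4*i + 0] = (unsigned char)(in[i] >> 24);   // most significant byte
        out[4*i + 1] = (unsigned char)(in[i] >> 16);
        out[4*i + 2] = (unsigned char)(in[i] >> 8);
        out[4*i + 3] = (unsigned char)(in[i]);         // least significant byte
    }
}
// The resulting byte array (count * 4 bytes) can then be passed to send().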

What's the standard-defined endianness of std::wstring?

I know that UTF-16 has two endiannesses: big endian and little endian.
Does the C++ standard define the endianness of std::wstring, or is it implementation-defined?
If it is standard-defined, which page of the C++ standard provide the rules on this issue?
If it is implementation-defined, how do I determine it, e.g. under VC++? Does the compiler guarantee that the endianness of std::wstring is strictly dependent on the processor?
I have to know this because I want to send the UTF-16 string to others. I must add the correct BOM at the beginning of the UTF-16 string to indicate its endianness.
In short: Given a std::wstring, how should I reliably determine its endianness?
Endianness is MACHINE dependent, not language dependent. Endianness is defined by the processor and how it arranges data in and out of memory. When dealing with wchar_t (which is wider than a single byte), the processor itself, upon a read or write, aligns the multiple bytes as it needs to in order to read or write them back to RAM again. Code simply sees a 16-bit (or larger) word as represented in an internal processor register.
For determining endianness on your own (if that is really what you want to do), you could try writing a KNOWN 32-bit (unsigned int) value out to RAM, then read it back using a char pointer and look at the ordering that comes back.
It would look something like this:
unsigned int aVal = 0x11223344;
char *myValReadBack = (char *)(&aVal);                  // points at the first byte as stored in memory
if (*myValReadBack == 0x11) printf("Big endian\r\n");   // most-significant byte stored first
else printf("Little endian\r\n");
I'm sure there are other ways, but something like the above should work; check my little versus big though :-)
Further, until Windows RT, VC++ really only compiled for Intel-type processors, which really only have one endianness.
It is implementation-defined. wstring is just a string of wchar_t, and that can be any byte ordering, or for that matter, any old size.
wchar_t is not required to be UTF-16 internally, and UTF-16 endianness does not affect how wchar_t values are stored; it's a matter of how you save and read them.
You have to use an explicit procedure of converting wstring to a UTF-16 bytestream before sending it anywhere. Internal endianness of wchar is architecture-dependent and it's better to use some opaque interfaces for converting than try to convert it manually.
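
One possible explicit conversion, sketched under the assumption that the wstring contains only characters from the Basic Multilingual Plane (so no surrogate pairs need to be generated), is to emit each code unit as two bytes in a fixed order, here little-endian:

#include <cstdint>
#include <string>
#include <vector>

std::vector<unsigned char> to_utf16le(const std::wstring &ws)
{
    std::vector<unsigned char> out;
    for (wchar_t wc : ws) {
        std::uint16_t unit = static_cast<std::uint16_t>(wc);   // assumes BMP-only content
        out.push_back(static_cast<unsigned char>(unit & 0xFF));          // low byte first
        out.push_back(static_cast<unsigned char>((unit >> 8) & 0xFF));   // then high byte
    }
    return out;   // UTF-16LE bytes, independent of the host's wchar_t size or endianness
}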
For the purposes of sending the correct BOM, you don't need to know the endianness. Just use the code \uFEFF. That will be big-endian or little-endian depending on the endianness of your implementation. You don't even need to know whether your implementation is UTF-16 or UTF-32. As long as it is some Unicode encoding, you'll end up with the appropriate BOM.
Unfortunately, neither wchars nor wide streams are guaranteed to be unicode.
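
As a tiny sketch of the BOM trick described above, prepending U+FEFF to the payload before it is written out:

#include <string>

std::wstring payload = L"Hello";
std::wstring with_bom = std::wstring(L"\uFEFF") + payload;   // stored in the implementation's own byte order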

C++ Byte order in socket programming

In C++ we send data over the network using sockets. I am aware that we need to use the htons()/ntohs() functions to deal with big-endian and little-endian byte order.
Suppose we have the following data to be sent:
int roll;
int id;
char name[100];
This can also be wrapped into struct.
My confusion here is: for roll and id, we can use the htons() function. But for the string name, what should we do and how? Do we need to use any such function? Will it work on every machine, whether Mac, Intel or anything else on the network?
I want to send all three fields in one packet.
You'd use htonl for int, not htons.
The name doesn't need to be reordered, since the bytes of the array correspond directly to bytes on the network.
The issue of byte-order only arises with words larger than a byte, since different architectures choose different ends at which to place the least-significant byte.
For char arrays this conversion is not necessary, since they do not have a network byte order but are transmitted sequentially. The reason that ntohs and htons exist is that multi-byte data types have less and more significant bytes, which are laid out differently on different architectures. This is not the case for strings.
To add to helpful comments here - if your structs get much more complex you could be better off considering a serialization library like Boost.Serialization or Google Protocol Buffers, which handle endianness for you under the covers.
When encoding the string, make sure you send a length (probably a short handled using htons) before the string itself, don't just send 100 chars every time.
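
Putting those pieces together, a sketch of packing the three fields into one buffer (the 2-byte length prefix and the exact layout are choices made for this example, not requirements):

#include <stdint.h>
#include <string.h>
#include <arpa/inet.h>   // htonl, htons (use <winsock2.h> on Windows)

// out must be at least 10 + strlen(name) bytes; returns the number of bytes written.
size_t pack_record(unsigned char *out, int32_t roll, int32_t id, const char *name)
{
    uint32_t roll_n = htonl((uint32_t) roll);
    uint32_t id_n   = htonl((uint32_t) id);
    uint16_t len    = (uint16_t) strlen(name);
    uint16_t len_n  = htons(len);

    memcpy(out,      &roll_n, 4);
    memcpy(out + 4,  &id_n,   4);
    memcpy(out + 8,  &len_n,  2);    // length prefix so the receiver knows how many name bytes follow
    memcpy(out + 10, name,    len);  // raw char bytes, no byte-order conversion needed
    return 10 + len;
}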