Bit manipulation on character string

Bit manipulation on character string - c++

Can we apply bit manipulation on a character string?
If so, is it always possible to retrieve back a character string from the manipulated string?
I was hoping to use the XOR operator on two strings by converting them to binary and then back to character string.
I took up some code from another StackOverflow question but it only solves half the problem
std::string TextToBinaryString(string words)
{
string binaryString = "";
for (char& _char : words)
{
binaryString +=std::bitset<8>(_char).to_string();
}
return binaryString;
}
I don't know how to convert this string of ones and zeroes back to a string of characters.
I did read std::stio in some google search results as a solution but was not able to understand them.
The manipulation that I wish to do is
std::string message("Hello World");
int n = message.size();
bin_string = TextToBinaryString(message)
std::string left,right;
bin_string.copy(left,n/2,0);
bin_string.copy(right,n,n/2);
std::string result = left^right;
I know I can hardcode this by picking up every entry and applying the operation but it is the conversion of the binary string back to characters that are making me scratch my head.
*EDIT: *I am trying to implement a cipher framework called Feistel cipher (SORRY, should had made that clear before) there they use the property of XOR that when you XOR something with the same thing again it cancels out... For eg. (A^B)^B=A. I wanted to output the ciphered jibberish in the middle. Hence, the query.

Can we apply bit manipulation on a character string?
Yes.
A character is an integer type, so you can do anything to them you can do to any other integer. What happened when you tried?
If so, is it always possible to retrieve back a character string from the manipulated string?
No. It is sometimes possible to recover the original string, but some manipulations are not reversible.
XOR, the particular operation you asked about, is self-reversing, so it works in that case but not in general.
A cheesy example (depends on ASCII character set, don't do this in real code for converting case, etc. etc.)
#include <iostream>
#include <string>
int main() {
std::string s("a");
std::cout << "original: " << s << '\n';
s[0] ^= 0x20;
std::cout << "modified: " << s << '\n';
s[0] ^= 0x20;
std::cout << "restored: " << s << '\n';
}
shows (on an ASCII-compatible) system
original: a
modified: A
restored: a
Note that I'm not converting "a" into "1100001" first, and then using XOR (somehow) zero bit 5 giving "1000001" and then converting that back into "A". Why would I?
This part of your question suggests you don't understand the difference between values and representations: the character is always stored in binary. You can also always treat it as if it is stored in octal, or in decimal, or in hexadecimal - the choice of base only affects how we write (or print) the value, and not what the value is in itself.
Writing a Feistel cipher where the plaintext and key are the same length is trivial:
std::string feistel(std::string const &text, std::string const &key)
{
std::string result;
std::transform(text.begin(), text.end(), key.begin(),
std::back_inserter(result),
[](char a, char b) { return a^b; }
);
return result;
}
This doesn't work at all if the key is shorter, though - looping round the key appropriately is left as an exercise for the reader.
Oh, and printing the encoded string is unlikely to work nicely (unless your key is helpfully just a sequence of space characters, as above).

You probably want something like this:
#include<string>
#include<cassert>
using namespace std;
std::string someBitmanipulation(string words)
{
std::string manipulatedstring;
for (char& thechar : words)
{
thechar ^= 0x5A; // xor with 0x5A
}
return manipulatedstring;
}
int main()
{
std::string original{ "ABC" };
// xor each char of original with 0x5a at put result into manipulated
auto manipulated = someBitmanipulation(original);
// check if manipulating the manipulated string is the same as the original string
assert(original == someBitmanipulation(manipulated));
}
You don't need std::bitset at all.
Now change thechar ^= 0x5A; to say thechar |= 0x5A; and see what happens.

Related

bulletproof use of from_chars()

I have some literal strings which I want to convert to integer and even double. The base is 16, 10, 8, and 2.
At this time, I wonder about the behavior of std::from_chars() - I try to convert and the error code inside from_chars_result return holds success - even if it isn't as shown here:
#include <iostream>
#include <string_view>
#include <charconv>
using namespace std::literals::string_view_literals;
int main()
{
auto const buf = "01234567890ABCDEFG.FFp1024"sv;
double d;
auto const out = std::from_chars(buf.begin(), buf.end(), d, std::chars_format::hex);
if(out.ec != std::errc{} || out.ptr != buf.end())
{
std::cerr << buf << '\n'
<< std::string(std::distance(buf.begin(), out.ptr), ' ') << "^- here\n";
auto const ec = std::make_error_code(out.ec);
std::cerr << "err: " << ec.message() << '\n';
return 1;
}
std::cout << d << '\n';
}
gives:
01234567890ABCDEFG.FFp1024
^- here
err: Success
For convenience also at coliru.
In my use case, I'll check the character set before but, I'm not sure about the checks to make it bulletproof. Is this behavior expected (maybe my English isn't sufficient, or I didn't read carefully enough)? I've never seen such checks on iterators on blogs etc.
The other question is related to different base like 2 and 8. Base of 10 and 16 seems to be supported - what would be the way for the other two bases?
Addendum/Edit:
Bulletproof here means that I can have nasty things in the string. The obvious thing for me is that 'G' is not a hex character. But I would have expected an appropriate error code in some way! The comparison out.ptr != buf.end() I've never seen in blogs (or I didn't read the right ones :)
If I enter a crazy long hex float, at least a numerical result out of range comes up.
By bulletproof I also mean that I can find such impossible strings by length, for example, so that I can save myself the call to from_chars() - for float/doubles and integers (here I would 'strlen' compare digits10 from std::numeric_limits).

The from_chars utility is designed to convert the first number it finds in the string and to return a pointer to the point where it stopped. This allows you to parse strings like "42 centimeters" by first converting the number and then parsing the rest of the string yourself for what comes after it.
The comparison out.ptr != buf.end() I've never seen in blogs (or I didn't read the right ones :)
If you know that the entire string should be a number, then checking that the pointer in the result points to the end of the string is the normal way to ensure that from_chars read the entire string.

Proper way to convert HEX to ASCII read from a file C++

In my code bellow CODE 1 reading HEX from a file and storing in in string array won't convert it to ASCII when printed out.
#include <iostream>
#include <sstream>
#include <fstream>
int main(int argc, char** argv)
{
// CODE 1
std::ifstream input("C:\\test.txt"); // The test.txt contains \x48\x83\xEC\x28\x48\x83
std::stringstream sstr;
input >> sstr.rdbuf();
std::string test = sstr.str();
std::cout << "\nString from file: " << test;
//char* lol = new char[test.size()];
//memcpy(lol, test.data(), test.size());
////////////////////////////////////////////////////////
// CODE 2
std::string test_2 = "\x48\x83\xEC\x28\x48\x83";
std::cout << "\n\nHardcoded string: " << test_2 << "\n";
// Prints as ASCII "H(H" , which I want my CODE 1 to do.
}
In my CODE 2 sample, same HEX is used and it prints it as ASCII. Why is it not the same for CODE 1?

Okay, it looks like there is some confusion. First, I have to ask if you're SURE you know what is in your file.
That is, does it contain, oh, it looks like about 20 characters:
\
x
4
8
et cetera?
Or does it contain a hex 48 (one byte), a hex 83 (one byte), for a total of 5-ish characters?
I bet it's the first. I bet your file is about 20 characters long and literally contains the string that's getting printed.
And if so, then the code is doing what you expect. It's reading a line of text and writing it back out. If you want it to actually interpret it like the compiler does, then you're going to have to do the steps yourself.
Now, if it actually contains the hex characters (but I bet it doesn't), then that's a little different problem, and we'll have to look at that. But I think you just have a string of characters that includes \x in it. And reading / writing that isn't going to automatically do some magic for you.

When you read from file, the backslash characters are not escaped. Your test string from file is literally an array of chars: {'\\', 'x', '4', '8', ... }
Whereas your hardcoded literal string, "\x48\x83\xEC\x28\x48\x83"; is fully hex escaped by the compiler.
If you really want to store your data as a text file as a series of "backslash x NN" sequences, you'll need to convert after you read from file. Here's a hacked up loop that would do it for you.
std::string test = sstr.str();
char temp[3] = {};
size_t t = 0;
std::string corrected;
for (char c : test)
{
if (isxdigit(c))
{
temp[t] = c;
t++;
if (t == 2)
{
t = 0;
unsigned char uc = (unsigned char)strtoul(tmp, nullptr, 16);
corrected += (char)uc;
}
}
}

You can split the returned string in \x then make casting from string to int,
finally casting to char.
this resource will be helpful
strtok And convert

How do I make an alphabetized list of all distinct words in a file with the number of times each word was used?

I am writing a program using Microsoft Visual C++. In the program I must read in a text file and print out an alphabetized list of all distinct words in that file with the number of times each word was used.
I have looked up different ways to alphabetize a string but they do not work with the way I have my string initialized.
// What is inside my text file
Any experienced programmer engaged in writing programs for use by others knows
that, once his program is working correctly, good output is a must. Few people
really care how much time and trouble a programmer has spent in designing and
debugging a program. Most people see only the results. Often, by the time a
programmer has finished tackling a difficult problem, any output may look
great. The programmer knows what it means and how to interpret it. However,
the same cannot be said for others, or even for the programmer himself six
months hence.
string lines;
getline(input, lines); // Stores what is in file into the string
I expect an alphabetized list of words with the number of times each word was used. So far, I do not know how to begin this process.

It's rather simple, std::map automatically sorts based on key in the key/value pair you get. The key/value pair represents word/count which is what you need. You need to do some filtering for special characters and such.
EDIT: std::stringstream is a nice way of splitting std::string using whitespace delimiter as it's the default delimiter. Therefore, using stream >> word you will get whitespace-separated words. However, this might not be enough due to punctuation. For example: Often, has comma which we need to filter out. Therefore, I used std::replaceif which replaces puncts and digits with whitespaces.
Now a new problem arises. In your example, you have: "must.Few" which will be returned as one word. After replacing . with we have "must Few". So I'm using another stringstream on the filtered "word" to make sure I have only words in the final result.
In the second loop you will notice if(word == "") continue;, this can happen if the string is not trimmed. If you look at the code you will find out that we aren't trimming after replacing puncts and digits. That is, "Often," will be "Often " with trailing whitespace. The trailing whitespace causes the second loop to extract an empty word. This is why I added the condition to ignore it. You can trim the filtered result and then you wouldn't need this check.
Finally, I have added ignorecase boolean to check if you wish to ignore the case of the word or not. If you wish to do so, the program will simply convert the word to lowercase and then add it to the map. Otherwise, it will add the word the same way it found it. By default, ignorecase = true, if you wish to consider case, just call the function differently: count_words(input, false);.
Edit 2: In case you're wondering, the statement counts[word] will automatically create key/value pair in the std::map IF there isn't any key matching word. So when we call ++: if the word isn't in the map, it will create the pair, and increment value by 1 so you will have newly added word. If it exists already in the map, this will increment the existing value by 1 and hence it acts as a counter.
The program:
#include <iostream>
#include <map>
#include <sstream>
#include <cstring>
#include <cctype>
#include <string>
#include <iomanip>
#include <algorithm>
std::string to_lower(const std::string& str) {
std::string ret;
for (char c : str)
ret.push_back(tolower(c));
return ret;
}
std::map<std::string, size_t> count_words(const std::string& str, bool ignorecase = true) {
std::map<std::string, size_t> counts;
std::stringstream stream(str);
while (stream.good()) {
// wordW may have multiple words connected by special chars/digits
std::string wordW;
stream >> wordW;
// filter special chars and digits
std::replace_if(wordW.begin(), wordW.end(),
[](const char& c) { return std::ispunct(c) || std::isdigit(c); }, ' ');
// now wordW may have multiple words seperated by whitespaces, extract them
std::stringstream word_stream(wordW);
while (word_stream.good()) {
std::string word;
word_stream >> word;
// ignore empty words
if (word == "") continue;
// add to count.
ignorecase ? counts[to_lower(word)]++ : counts[word]++;
}
}
return counts;
}
void print_counts(const std::map<std::string, size_t>& counts) {
for (auto pair : counts)
std::cout << std::setw(15) << pair.first << " : " << pair.second << std::endl;
}
int main() {
std::string input = "Any experienced programmer engaged in writing programs for use by others knows \
that, once his program is working correctly, good output is a must.Few people \
really care how much time and trouble a programmer has spent in designing and \
debugging a program.Most people see only the results.Often, by the time a \
programmer has finished tackling a difficult problem, any output may look \
great.The programmer knows what it means and how to interpret it.However, \
the same cannot be said for others, or even for the programmer himself six \
months hence.";
auto counts = count_words(input);
print_counts(counts);
return 0;
}
I have tested this with Visual Studio 2017 and here is the part of the output:
a : 5
and : 3
any : 2
be : 1
by : 2
cannot : 1
care : 1
correctly : 1
debugging : 1
designing : 1

As others have already noted, an std::map handles the counting you care about quite easily.
Iostreams already have a tokenize to break an input stream up into words. In this case, we want to to only "think" of letters as characters that can make up words though. A stream uses a locale to make that sort of decision, so to change how it's done, we need to define a locale that classifies characters as we see fit.
struct alpha_only: std::ctype<char> {
alpha_only(): std::ctype<char>(get_table()) {}
static std::ctype_base::mask const* get_table() {
// everything is white space
static std::vector<std::ctype_base::mask>
rc(std::ctype<char>::table_size,std::ctype_base::space);
// except lower- and upper-case letters, which are classified accordingly:
std::fill(&rc['a'], &rc['z'], std::ctype_base::lower);
std::fill(&rc['A'], &rc['Z'], std::ctype_base::upper);
return &rc[0];
}
};
With that in place, we tell the stream to use our ctype facet, then simply read words from the file and count them in the map:
std::cin.imbue(std::locale(std::locale(), new alpha_only));
std::map<std::string, std::size_t> counts;
std::string word;
while (std::cin >> word)
++counts[to_lower(word)];
...and when we're done with that, we can print out the results:
for (auto w : counts)
std::cout << w.first << ": " << w.second << "\n";

Id probably start by inserting all of those words into an array of strings, then start with the first index of the array and compare that with all of the other indexes if you find matches, add 1 to a counter and after you went through the array you could display the word you were searching for and how many matches there were and then go onto the next element and compare that with all of the other elements in the array and display etc. Or maybe if you wanna make a parallel array of integers that holds the number of matches you could do all the comparisons at one time and the displays at one time.

EDIT:
Everyone's answer seems more elegant because of the map's inherent sorting. My answer functions more as a parser, that later sorts the tokens. Therefore my answer is only useful to the extent of a tokenizer or lexer, whereas Everyone's answer is only good for sorted data.
You first probably want to read in the text file. You want to use a streambuf iterator to read in the file(found here).
You will now have a string called content, which is the content of you file. Next you will want to iterate, or loop, over the contents of this string. To do that you'll want to use an iterator. There should be a string outside of the loop that stores the current word. You will iterate over the content string, and each time you hit a letter character, you will add that character to your current word string. Then, once you hit a space character, you will take that current word string, and push it back into the wordString vector. (Note: that means that this will ignore non-letter characters, and that only spaces denote word separation.)
Now that we have a vector of all of our words in strings, we can use std::sort, to sort the vector in alphabetical order.(Note: capitalized words take precedence over lowercase words, and therefore will be sorted first.) Then we will iterate over our vector of stringWords and convert them into Word objects (this is a little heavy-weight), that will store their appearances and the word string. We will push these Word objects into a Word vector, but if we discover a repeat word string, instead of adding it into the Word vector, we'll grab the previous entry and increment its appearance count.
Finally, once this is all done, we can iterate over our Word object vector and output the word followed by its appearances.
Full Code:
#include <vector>
#include <fstream>
#include <iostream>
#include <streambuf>
#include <algorithm>
#include <string>
class Word //define word object
{
public:
Word(){appearances = 1;}
~Word(){}
int appearances;
std::string mWord;
};
bool isLetter(const char x)
{
return((x >= 'a' && x <= 'z') || (x >= 'A' && x <= 'Z'));
}
int main()
{
std::string srcFile = "myTextFile.txt"; //what file are we reading
std::ifstream ifs(srcFile);
std::string content( (std::istreambuf_iterator<char>(ifs) ),
( std::istreambuf_iterator<char>() )); //read in the file
std::vector<std::string> wordStringV; //create a vector of word strings
std::string current = ""; //define our current word
for(auto it = content.begin(); it != content.end(); ++it) //iterate over our input
{
const char currentChar = *it; //make life easier
if(currentChar == ' ')
{
wordStringV.push_back(current);
current = "";
continue;
}
else if(isLetter(currentChar))
{
current += *it;
}
}
std::sort(wordStringV.begin(), wordStringV.end(), std::less<std::string>());
std::vector<Word> wordVector;
for(auto it = wordStringV.begin(); it != wordStringV.end(); ++it) //iterate over wordString vector
{
std::vector<Word>::iterator wordIt;
//see if the current word string has appeared before...
for(wordIt = wordVector.begin(); wordIt != wordVector.end(); ++wordIt)
{
if((*wordIt).mWord == *it)
break;
}
if(wordIt == wordVector.end()) //...if not create a new Word obj
{
Word theWord;
theWord.mWord = *it;
wordVector.push_back(theWord);
}
else //...otherwise increment the appearances.
{
++((*wordIt).appearances);
}
}
//print the words out
for(auto it = wordVector.begin(); it != wordVector.end(); ++it)
{
Word theWord = *it;
std::cout << theWord.mWord << " " << theWord.appearances << "\n";
}
return 0;
}
Side Notes
Compiled with g++ version 4.2.1 with target x86_64-apple-darwin, using the compiler flag -std=c++11.
If you don't like iterators you can instead do
for(int i = 0; i < v.size(); ++i)
{
char currentChar = vector[i];
}
It's important to note that if you are capitalization agnostic simply use std::tolower on the current += *it; statement (ie: current += std::tolower(*it);).
Also, you seem like a beginner and this answer might have been too heavyweight, but you're asking for a basic parser and that is no easy task. I recommend starting by parsing simpler strings like math equations. Maybe make a calculator app.

How could I copy data that contain '\0' character

I'm trying to copy data that conatin '\0'. I'm using C++ .
When the result of the research was negative, I decide to write my own fonction to copy data from one char* to another char*. But it doesn't return the wanted result !
My attempt is the following :
#include <iostream>
char* my_strcpy( char* arr_out, char* arr_in, int bloc )
{
char* pc= arr_out;
for(size_t i=0;i<bloc;++i)
{
*arr_out++ = *arr_in++ ;
}
*arr_out = '\0';
return pc;
}
int main()
{
char * out= new char[20];
my_strcpy(out,"12345aa\0aaaaa AA",20);
std::cout<<"output data: "<< out << std::endl;
std::cout<< "the length of my output data: " << strlen(out)<<std::endl;
system("pause");
return 0;
}
the result is here:
I don't understand what is wrong with my code.
Thank you for help in advance.

Your my_strcpy is working fine, when you write a char* to cout or calc it's length with strlen they stop at \0 as per C string behaviour. By the way, you can use memcpy to copy a block of char regardless of \0.

If you know the length of the 'string' then use memcpy. Strcpy will halt its copy when it meets a string terminator, the \0. Memcpy will not, it will copy the \0 and anything that follows.

(Note: For any readers who are unaware that \0 is a single-character byte with value zero in string literals in C and C++, not to be confused with the \\0 expression that results in a two-byte sequence of an actual backslash followed by an actual zero in the string... I will direct you to Dr. Rebmu's explanation of how to split a string in C for further misinformation.)
C++ strings can maintain their length independent of any embedded \0. They copy their contents based on this length. The only thing is that the default constructor, when initialized with a C-string and no length, will be guided by the null terminator as to what you wanted the length to be.
To override this, you can pass in a length explicitly. Make sure the length is accurate, though. You have 17 bytes of data, and 18 if you want the null terminator in the string literal to make it into your string as part of the data.
#include <iostream>
using namespace std;
int main() {
string str ("12345aa\0aaaaa AA", 18);
string str2 = str;
cout << str;
cout << str2;
return 0;
}
(Try not to hardcode such lengths if you can avoid it. Note that you didn't count it right, and when I corrected another answer here they got it wrong as well. It's error prone.)
On my terminal that outputs:
12345aaaaaaa AA
12345aaaaaaa AA
But note that what you're doing here is actually streaming a 0 byte to the stdout. I'm not sure how formalized the behavior of different terminal standards are for dealing with that. Things outside of the printable range can be used for all kinds of purposes depending on the kind of terminal you're running... positioning the cursor on the screen, changing the color, etc. I wouldn't write out strings with embedded zeros like that unless I knew what the semantics were going to be on the stream receiving them.
Consider that if what you're dealing with are bytes, not to confuse the issue and to use a std::vector<char> instead. Many libraries offer alternatives, such as Qt's QByteArray

Your function is fine (except that you should pass to it 17 instead of 20). If you need to output null characters, one way is to convert the data to std::string:
std::string outStr(out, out + 17);
std::cout<< "output data: "<< outStr << std::endl;
std::cout<< "the length of my output data: " << outStr.length() <<std::endl;

I don't understand what is wrong with my code.
my_strcpy(out,"12345aa\0aaaaa AA",20);
Your string contains character '\' which is interpreted as escape sequence. To prevent this you have to duplicate backslash:
my_strcpy(out,"12345aa\\0aaaaa AA",20);
Test
output data: 12345aa\0aaaaa AA
the length of my output data: 18

Your string is already terminated midway.
my_strcpy(out,"12345aa\0aaaaa AA",20);
Why do you intend to have \0 in between like that? Have some other delimiter if yo so desire
Otherwise, since std::cout and strlen interpret a \0 as a string terminator, you get surprises.
What I mean is that follow the convention i.e. '\0' as string terminator

Reverse string with non-ASCII characters

I want to change the order in the string with special characters like this:
ZAŻÓŁĆ GĘŚLĄ JAŹŃ
to
ŃŹAJ ĄŁŚĘG ĆŁÓŻAZ
I try to use std::reverse
std::string text("ZAŻÓŁĆ GĘŚLĄ JAŹŃ!");
std::cout << text << std::endl;
std::reverse(text.rbegin(), text.rend());
std::cout << text << std::endl;
but the output show me that:
ZAŻÓŁĆ GĘŚLĄ JAŹŃ!
!\203Ź\305AJ \204\304L\232Ř\304G \206āœû\305AZ <- reversed string
So i try do this "manually" :
std::string text1("ZAŻÓŁĆ GĘŚLĄ JAŹŃ!");
std::cout << text1 << std::endl;
int count = (int) floorf(text1.size() /2.f);
std::cout << count << " " << text1.size() << std::endl;
unsigned int maxIndex = text1.size() - 1;
for (int i = 0; i < count ; i++)
{
char tmp = text1[i];
text1[i] = text1[maxIndex];
text1[maxIndex] = tmp;
maxIndex--;
}
std::cout << text1 << std::endl;
But in this case I have a problem in text1.size() because every special character are counted twice:
ZAŻÓŁĆ GĘŚLĄ JAŹŃ!
13 27 <- second number is text1.size()
!\203Ź\305AJ \204\304L\232Ř\304G \206āœû\305AZ
How is the proper way to reverse a string with special characters?

Your code really does correctly reverse bytes in your string, there's nothing wrong here. The problem, however, is that your compiler stores your literal string "ZAŻÓŁĆ GĘŚLĄ JAŹŃ!" in UTF-8 encoding.
And UTF-8 stores all characters except those that match ASCII as variable-length sequences of bytes. This means that one char (one byte) is no longer one character, so reversing char's isn't now the same as reversing characters.
To achieve your goal you have at least two options:
Use some utf-8 library that will let you iterate characters instead of bytes. One example is http://utfcpp.sourceforge.net/
Somehow (and that depends a lot on the compiler and OS you are using) switch to utf-32 encoding that has constant character length and have good old constant-character-size strings without all this crazy variable-character-size troubles.
UPD: A nice link for you: http://www.joelonsoftware.com/articles/Unicode.html

You might code a reverseUt8 function by yourself:
std::string getMultiByteReversed(char ch1, char ch2)
{
if (ch == '\xc3') // most utf8 characters
return std::string(ch1)+ std::string(ch2);
} else {
return std::string(ch1);
}
}
std::string reverseMultiByteString(const std::string &s)
{
std::string result;
for (std::string::reverse_iterator it = s.rbegin(); it != s.rend(); ++it) {
std::string reversed;
if ( (it+1) != rbegin() && (reversed = getMultiByteReversed(*it, *it+1) ) {
result += reversed;
++it;
} else {
result += *it;
}
}
return result;
}
You can look up the utf8 codes at: http://www.utf8-chartable.de/

There are a couple of issues here. The answer is complex and can depend on exactly what you're trying to do.
First is that (as other answers have stated) if your string is UTF-8 encoded, one Unicode code point may consist of multiple bytes. If you just reverse the bytes, you'll break the UTF-8 encoding. The simplest (though not necessarily the best) fix for this is to convert the string to UTF-32 and reverse the 32-bit code points rather than bytes.
The next problem is that a single grapheme might consist of multiple Unicode code points. For example, a "é" might be encoded as the two code points U+0065 followed by U+0301. If you reverse the order of these, that will break it as the combining character U+301 will now be associate with a different base character. So "Pokémon" reversed this way would become "noḿekoP" with the accent over the "m" instead of the "e".
Now you might think that you can get around this problem by normalizing the string into a composed form first. That has its own problems, however, because not every grapheme can be represented by a single code point. For example, the Canadian flag emoji (🇨🇦) is represented by the code point U+1F1E8 followed by the code point U+1F1E6. There is no single code point for it. If you reverse its code points, you get the flag for Ascension Island (🇦🇨) instead.
Then you have languages where characters change form based on context, and I don't yet know much about dealing with those.
It may be closer to what you want to reverse grapheme clusters. See UAX29: Unicode text segmentation.

have you tried swapping characters one by one.
For example, if the string length is odd, swap the first character with the last, second with the second last, till the middle character is left. If the string lengt is even, swap 1st with last, 2nd with 2nd last, till both the middle characters are swapped. In that way, the string will be reversed.

We Keep Coding

c++ django amazon-web-services regex python-2.7 google-cloud-platform list unit-testing opengl ember.js