Replace non printable character with octal representation in C++/CLI - regex

I need to replace every non-printable character with its octal representation using C++/CLI. The examples I found use C# and rely on lambdas or LINQ.
// There may be a variety of non printable characters, not just the example ones.
String^ input = "\vThis has internal vertical quote and tab \t\v";
Regex.Replace(input, #"\p{Cc}", ??? );
// desired output string = "\013This has internal vertical quote and tab \011\013"
Is this possible with C++/CLI?

Not sure if you can do it inline. I've used this type of logic.
// tested
String^ input = "\042This has \011 internal vertical quote and tab \042";
String^ pat = "\\p{C}";
String^ result = input;
array<Byte>^ bytes;
Regex^ search = gcnew Regex(pat);
for (Match^ match = search->Match(input); match->Success; match = match->NextMatch()) {
bytes = Encoding::ASCII->GetBytes(match->Value);
int x = bytes[0];
result = result->Replace(match->Value
, "\\" + Convert::ToString(x,8)->PadLeft(3, '0'));
}
Console::WriteLine("{0} -> {1}", input, result);
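For comparison, here is a minimal sketch of the same transformation in standard C++ rather than C++/CLI. It builds the output in a single pass instead of calling Replace once per match. The function name escape_control_octal is made up for this illustration, and the sketch only handles the ASCII control range (0x00-0x1F and 0x7F), which is narrower than what \p{C} matches in .NET.

```cpp
#include <cstdio>
#include <string>

// Replace every ASCII control character (0x00-0x1F and 0x7F) with a
// backslash followed by its three-digit octal code.
// Hypothetical helper; bytes >= 0x80 are copied through unchanged.
std::string escape_control_octal(const std::string& input)
{
    std::string result;
    result.reserve(input.size());
    for (unsigned char c : input) {
        if (c < 0x20 || c == 0x7F) {
            char buf[5];  // backslash + 3 octal digits + terminating NUL
            std::snprintf(buf, sizeof buf, "\\%03o", static_cast<unsigned>(c));
            result += buf;
        } else {
            result += static_cast<char>(c);
        }
    }
    return result;
}
```

Calling escape_control_octal on the string from the question turns the vertical tab into \013 and the tab into \011.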

Related

Parse (replace) in C++ std::string

I have a "custom" string that has the following format. Example:
std::string MyString = "RndOrder%5d - RndCustomer%8s - RndHex%8x";
I would like to replace/parse the string:
the %5d (%NUM_d) would be replaced with a random 5-digit decimal
the %8s (%NUM_s) would be replaced with a random 8-chars
the %8x (%NUM_x) would be replaced with a random 8-digit hexadecimal
Is there any function that helps me parse those "special marks"? Not sure if I would have to parse the string char by char and check for every possible combination.
If the format can vary (not always the same three markers %5d, %8s and %8x) and you want to stay flexible, you will have to write your own implementation.
Assuming the count after % can be any number (not only 5 or 8), you could use std::regex_search or std::regex_match to find the markers you are looking for. For example, your expression could look like %\d+[dsx]
Then you parse each match to find the count and the type, and substitute a random value produced by the matching generator.
To extract both parts, extend the expression above to %(\d+)([dsx]) and read the capturing groups.
A sample parse implementation for your case could look like this:
std::string text = "RndOrder%5d - RndCustomer%8s - RndHex%8x";
auto reg = std::regex("%(\\d+)([sdx])");
std::smatch match;
while (std::regex_search(text, match, reg))
{
const auto& full = match.str(); // in 1st iter contains "%5d"
const auto& count = match.str(1); // in 1st iter contains "5"
const auto& type = match.str(2); // in 1st iter contains "d"
// further processing: type conversion, number generation, string replacement
text = match.suffix().str();
}
For an example implementation that uses search and group capturing, you can also check out another question: Retrieving a regex search in C++
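The "further processing" step left open above could be sketched as follows. This is only one possible approach: the helper names random_chars and expand_placeholders are made up, %s is assumed to mean lowercase letters, and a %d value may start with a zero because each digit is drawn independently.

```cpp
#include <cstddef>
#include <random>
#include <regex>
#include <string>

// Hypothetical helper: `count` characters drawn uniformly from `alphabet`.
std::string random_chars(std::size_t count, const std::string& alphabet,
                         std::mt19937& rng)
{
    std::uniform_int_distribution<std::size_t> pick(0, alphabet.size() - 1);
    std::string out;
    for (std::size_t i = 0; i < count; ++i)
        out += alphabet[pick(rng)];
    return out;
}

// Replaces each %<N>d, %<N>s or %<N>x marker with N random characters.
std::string expand_placeholders(const std::string& format, std::mt19937& rng)
{
    static const std::regex marker("%(\\d+)([dsx])");
    std::string result;
    std::size_t last = 0;  // end of the previous match inside `format`
    auto end = std::sregex_iterator();
    for (auto it = std::sregex_iterator(format.begin(), format.end(), marker);
         it != end; ++it) {
        const std::smatch& m = *it;
        const std::size_t pos = static_cast<std::size_t>(m.position());
        result.append(format, last, pos - last);  // literal text before the marker
        const std::size_t count = std::stoul(m.str(1));
        const char type = m.str(2)[0];
        const char* alphabet = type == 'd' ? "0123456789"
                             : type == 'x' ? "0123456789abcdef"
                                           : "abcdefghijklmnopqrstuvwxyz";
        result += random_chars(count, alphabet, rng);
        last = pos + static_cast<std::size_t>(m.length());
    }
    result.append(format, last, std::string::npos);  // trailing literal text
    return result;
}
```

Passing the generator in from the caller keeps the function deterministic under a fixed seed, which makes it easier to test.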
Ok, assuming that you're actually asking about string parsing here (and not random number/data generation)... have a look at this:
int iRandom1 = 12345; // 5-digit decimal
int iRandom3 = 0x12345678; // 8-digit hexadecimal
char cRandom2[9] = "RandomXY"; // 8 characters; the literal already adds the terminating NUL
std::string sFormat = "RndOrder%5d - RndCustomer%8s - RndHex%8x";
char cResultBuffer[500]; // Make sure this buffer is big enough!
std::sprintf( cResultBuffer, sFormat.c_str(), iRandom1, cRandom2, iRandom3 );
std::string MyString = cResultBuffer; // MyString = "RndOrder12345 - RndCustomerRandomXY - RndHex12345678";
It's a candidate for std::snprintf (C++11): call it once with a null buffer to learn the required size, allocate a buffer of that size, and then format the string into the buffer:
#include <iostream>
#include <cstdio>
#include <string>
template<class...Args>
std::string replace(const char* format, Args const&... args)
{
// determine number of characters in output
auto len = std::snprintf(nullptr, 0, format, args...);
// allocate buffer space
auto result = std::string(std::size_t(len), ' ');
// write string into buffer. Note the +1 is allowing for the implicit trailing
// zero in a std::string
std::snprintf(&result[0], len + 1, format, args...);
return result;
}
int main() {
auto s = replace("RndOrder%5d - RndCustomer%8s - RndHex%8x", 5, "foo", 257);
std::cout << s << std::endl;
}
expected output:
RndOrder    5 - RndCustomer     foo - RndHex     101

C++ split string with \0 into list

I want to list the logical drives with:
const size_t BUFSIZE = 100;
char buffer[ BUFSIZE ];
memset(buffer,0,BUFSIZE);
//get available drives
DWORD drives = GetLogicalDriveStringsA(BUFSIZE,static_cast<LPSTR>(buffer));
The buffer then contains: 'C',':','\\','\0'
Now I want to have a List filled with "C:\","D:\" and so on. Therefore I tried something like this:
std::string tmp(buffer,BUFSIZE);//to split this string then
QStringList drivesList = QString::fromStdString(tmp).split("\0");
But it didn't work. Is it even possible to split with the delimiter \0? Or is there a way to split by length?
The problem with QString::fromStdString(tmp) is that it will create a string only from the first zero-terminated "entry" in your buffer, because that's how standard strings work. Splitting is certainly possible, but you have to do it manually.
You can do it by finding the first zero and extracting the substring before it, then repeating the same step in a loop until you find two consecutive zeroes.
Pseudoish-code:
current_position = buffer;
while (*current_position != '\0')
{
end_position = current_position + strlen(current_position);
// The text between current_position and end_position is the sub-string
// Extract it and add to list
current_position = end_position + 1;
}
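The pseudo-code above could be turned into compilable C++ roughly like this. It is a sketch that assumes the buffer really is a sequence of zero-terminated strings followed by one extra '\0', which is the layout GetLogicalDriveStringsA produces. The function name split_multi_string is made up for the example.

```cpp
#include <cstddef>
#include <cstring>
#include <string>
#include <vector>

// Splits a buffer of consecutive zero-terminated strings, ended by an
// additional '\0', into a vector of std::string.
// Hypothetical helper; the caller must guarantee the double-zero terminator.
std::vector<std::string> split_multi_string(const char* buffer)
{
    std::vector<std::string> parts;
    const char* current = buffer;
    while (*current != '\0') {                  // an empty entry marks the end
        const std::size_t length = std::strlen(current);
        parts.emplace_back(current, length);    // copy one entry
        current += length + 1;                  // step over its terminator
    }
    return parts;
}
```

Each element can then be converted with QString::fromStdString and appended to the QStringList.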

How to parse simple JSON in strings in C++?

I have a string containing a small JSON document that holds only strings. I have used stringstream and boost::property_tree::read_json to read it. I have seen that this is not very fast; moreover, the Boost JSON parser is not thread safe (because of the streams). So I have tried to do it another way:
std::vector< std::string > fields;
std::vector< std::string > values;
int separator = -1;
int prevSeparator = 0;
int fieldBegin = 0;
int fieldEnd = 0;
int valueBegin = 0;
int valueEnd = 0;
int64 t0 = cv::getTickCount();
do
{
prevSeparator = separator + 1;
separator = jsonStream.substr(prevSeparator, jsonStream.size() - prevSeparator - 1).find_first_of(',') + prevSeparator;
std::string element = jsonStream.substr(prevSeparator, separator - prevSeparator);
int fvSeparator = element.find_first_of(':');
std::string field = element.substr(0, fvSeparator);
std::string value = element.substr(fvSeparator + 1, element.size() - fvSeparator - 1);
fieldBegin = field.find_first_of('\"') + 1;
fieldEnd = field.find_last_of('\"');
fields.push_back(field.substr(fieldBegin, fieldEnd - fieldBegin));
valueBegin = value.find_first_of('\"') + 1;
valueEnd = value.find_last_of('\"');
values.push_back(value.substr(valueBegin, valueEnd - valueBegin));
} while (prevSeparator - separator <= 0);
Do you think it is good enough or what shall I improve?
If I understand your description of the input correctly, you have a JSON array containing strings. That means it starts with [", followed by a sequence of strings separated by ",", and ends with "].
Here is a high level algorithm for you:
1. Split the input by ", watching for escaped quotes.
2. Remove the strings [ and ] from the ends (there can be whitespace in there, too).
3. Remove the strings , that appear in between the desired strings (there can be whitespace in there, too).
4. Unescape based on the JSON escaping rules, in case there are any escapes.
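A rough sketch of these steps follows. It uses a single character-by-character scan rather than a literal split on the quote character, because that makes escaped quotes easier to handle. It assumes the input is a JSON array of strings as described above. The function name parse_string_array is made up, and \uXXXX escapes are deliberately copied through undecoded to keep the example short.

```cpp
#include <cstddef>
#include <stdexcept>
#include <string>
#include <vector>

// Parses input of the form ["a", "b", ...] into its string elements.
// Hypothetical helper; \uXXXX escapes are left undecoded (assumption).
std::vector<std::string> parse_string_array(const std::string& json)
{
    std::vector<std::string> items;
    std::size_t i = 0;
    auto skip_ws = [&] {
        while (i < json.size() && (json[i] == ' ' || json[i] == '\t' ||
                                   json[i] == '\n' || json[i] == '\r'))
            ++i;
    };
    auto expect = [&](char c) {  // skip whitespace, then consume exactly `c`
        skip_ws();
        if (i >= json.size() || json[i] != c)
            throw std::runtime_error(std::string("expected '") + c + "'");
        ++i;
    };

    expect('[');
    skip_ws();
    if (i < json.size() && json[i] == ']') { ++i; return items; }  // empty array

    for (;;) {
        expect('"');                                   // opening quote
        std::string value;
        while (i < json.size() && json[i] != '"') {
            const char c = json[i++];
            if (c != '\\') { value += c; continue; }
            if (i >= json.size()) throw std::runtime_error("dangling escape");
            const char e = json[i++];
            switch (e) {
                case 'n': value += '\n'; break;
                case 't': value += '\t'; break;
                case 'r': value += '\r'; break;
                case 'b': value += '\b'; break;
                case 'f': value += '\f'; break;
                case 'u': value += "\\u"; break;       // left undecoded
                default:  value += e;    break;        // \" \\ \/
            }
        }
        expect('"');                                   // closing quote
        items.push_back(value);
        skip_ws();
        if (i < json.size() && json[i] == ',') { ++i; continue; }
        expect(']');
        return items;
    }
}
```

Because the function keeps no shared state and uses no streams, separate calls can run on different threads as long as each one works on its own input string.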

How do I detect whitespace or numbers when using UTF8CPP?

This is my code:
std::vector<std::string> InverseIndex::getWords(std::string line)
{
std::vector<std::string> words;
char* str = (char*)line.c_str();
char* end = str + strlen(str) + 1;
unsigned char symbol[5] = {0,0,0,0,0};
while( str < end ){
utf8::uint32_t code = utf8::next(str, end);
if(code == 0) continue;
utf8::append(code, symbol);
// TODO detect white spaces or numbers.
std::string word = (const char*)symbol;
words.push_back(word);
}
return words;
}
Input : "你 好 啊 哈哈 1234"
Output :
你
??
好
??
啊
??
哈
哈
??
1??
2??
3??
4??
Expected output :
你
好
啊
哈
哈
Is there any way to skip the whitespace and numbers? Thanks.
UTF8-CPP is nothing more than a tool for encoding strings to and decoding them from UTF-8. Classification of Unicode code points is well outside the scope of that tool. You'll need to use a serious localization tool like Boost.Locale or ICU for that.
UTF-8 is "ASCII compatible" in the following sense:
If one of the bytes of the encoded string is equal to an ASCII value, such as a space, a newline, or one of the digits 0-9, then it is not part of an encoded sequence longer than one byte. It is that very character.
This means that you can call isdigit() on a byte in a UTF-8 string as if it were an ASCII string, and it is guaranteed to work correctly.
For more information, see the section on search at http://utf8everywhere.org.
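The byte-level check described above might look like this in practice. The sketch does not use UTF8-CPP at all: it reads the sequence length from the lead byte and assumes the input is valid UTF-8. It only filters ASCII whitespace and ASCII digits, so characters such as the ideographic space U+3000 would still need a library like ICU. The function name split_code_points is made up.

```cpp
#include <cctype>
#include <cstddef>
#include <string>
#include <vector>

// Splits a UTF-8 string into one std::string per code point, dropping
// ASCII whitespace and ASCII digits.
// Hypothetical helper; assumes `line` is valid UTF-8.
std::vector<std::string> split_code_points(const std::string& line)
{
    std::vector<std::string> symbols;
    std::size_t i = 0;
    while (i < line.size()) {
        const unsigned char lead = static_cast<unsigned char>(line[i]);
        if (lead < 0x80) {  // single byte: a plain ASCII character
            if (!std::isspace(lead) && !std::isdigit(lead))
                symbols.push_back(std::string(1, line[i]));
            ++i;
            continue;
        }
        // Sequence length from the lead byte: 110xxxxx=2, 1110xxxx=3, 11110xxx=4.
        const std::size_t length = (lead >= 0xF0) ? 4 : (lead >= 0xE0) ? 3 : 2;
        symbols.push_back(line.substr(i, length));
        i += length;
    }
    return symbols;
}
```

Copying each code point with substr also avoids reusing a fixed scratch buffer between iterations, which is where stale bytes from a longer previous character can leak into the output.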

How to send an SMS in hebrew with clickatell

How can I send an SMS in hebrew through Clickatell?
It arrives on the device as gibberish.
I couldn't find any working example, so I wrote my own:
Try this:
UnicodeEncoding unicode = new UnicodeEncoding(true, false);
return string.Concat(unicode.GetBytes(val).Select(c => string.Format("{0:x2}", c)));
Is it in Unicode? If I remember correctly, they require Unicode text to be escaped into its hexadecimal representation. This should be in their docs.
However, I found out when I did this that this is not the only issue, many phones do not support displaying unicode characters properly.
Also, sending unicode may incur a higher cost since it may be split up.
Encode your message as unicode, see this FAQ page for details.
Ran into the same issue... you need to encode to unicode and then convert to hex. The strange thing is that you need to take the last value and append it to the front in order to get it to work. I found this out by comparing the results of my code against the output of their online tool.
private string ToUnicode(string val)
{
Encoding utf8 = Encoding.UTF8;
Encoding unicode = Encoding.Unicode;
byte[] utf8Bytes = utf8.GetBytes(val);
byte[] unicodeBytes = Encoding.Convert(utf8, unicode, utf8Bytes);
var result = ByteArrayToString(unicodeBytes);
result = result.Substring(result.Length - 2, 2) + result.Substring(0, result.Length - 2);
return result;
}
public static string ByteArrayToString(byte[] ba)
{
StringBuilder hex = new StringBuilder(ba.Length * 2);
foreach (byte b in ba)
hex.AppendFormat("{0:x2}", b);
return hex.ToString();
}
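To make the byte order explicit, here is a small C++ sketch of the conversion that new UnicodeEncoding(true, false) followed by {0:x2} formatting performs in the C# snippets: each UTF-16 code unit is written high byte first as four lowercase hex digits. The function name utf16be_hex is made up, and the sketch says nothing about which other parameters the gateway expects. Note that moving the last byte of a little-endian hex string to the front yields the same result only when every character shares the same high byte, which happens to hold for plain Hebrew letters (all U+05xx) but not once a space or a Latin character is mixed in.

```cpp
#include <cstdio>
#include <string>

// Encodes UTF-16 code units as big-endian bytes, two lowercase hex digits
// per byte (four per code unit).
// Hypothetical helper for illustration only.
std::string utf16be_hex(const std::u16string& text)
{
    std::string hex;
    hex.reserve(text.size() * 4);
    for (char16_t unit : text) {
        char buf[5];  // 4 hex digits + terminating NUL
        std::snprintf(buf, sizeof buf, "%04x", static_cast<unsigned>(unit));
        hex += buf;
    }
    return hex;
}
```

For example, the four Hebrew letters U+05E9 U+05DC U+05D5 U+05DD become 05e905dc05d505dd.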
I used the following logic for Arabic. It needs more testing. The language is VB.NET.
If txtArabic.Text.Trim.Length > 0 Then
Dim unicodeString As String = txtArabic.Text
Dim unicode As Encoding = Encoding.Unicode
' Convert the string into a byte array.
Dim unicodeBytes As Byte() = unicode.GetBytes(unicodeString)
Dim sb As String = ToUnicode(txtArabic.Text)
End If
Here is the conversion part
Private Function ToUnicode(ByVal strVal As String)
Dim unicode As Encoding = New UnicodeEncoding(True, False)
' Encoding.Unicode
Dim utf8 As Encoding = Encoding.UTF8
Dim utf8Bytes() As Byte = unicode.GetBytes(strVal)
Dim unicodeBytes() As Byte = Encoding.Convert(utf8, unicode, utf8Bytes)
Dim result As String = ByteArrayToString(unicodeBytes)
Return result
End Function
Private Function ByteArrayToString(ByVal ba() As Byte)
Dim hex As StringBuilder = New StringBuilder(ba.Length)
For i As Integer = 0 To ba.Length - 1
If (((i - 2) Mod 4.0) = 0) Then
Else
hex.AppendFormat("{0:x00}", ba(i))
' hex.Append(ba(i))
End If
' hex.Append("-")
Next
Return hex.ToString
End Function