Str.global_replace in OCaml putting carets where they shouldn't be

I am working to convert multiline strings into a list of tokens that might be easier for me to work with.
In accordance with the specific needs of my project, I'm padding any caret symbol that appears in my input with spaces, so that "^" gets turned into " ^ ". I'm using something like the following function to do so:
let bad_function string = Str.global_replace (Str.regexp "^") " ^ " (string)
I then use something like the function below to turn this multiline string into a list of tokens (ignoring whitespace).
let string_to_tokens string = (Str.split (Str.regexp "[ \n\r\x0c\t]+") (string));;
For some reason, bad_function adds carets in places where they shouldn't be. Take the following expression:
(bad_function " This is some
multiline input
with newline characters
and tabs. When I convert this string
into a list of tokens I get ^s showing up where
they shouldn't. ")
The first line of the string turns into:
^ This is some \n ^
When I feed the output from bad_function into string_to_tokens I get the following list:
string_to_tokens (bad_function " This is some
multiline input
with newline characters
and tabs. When I convert this string
into a list of tokens I get ^s showing up where
they shouldn't. ")
["^"; "This"; "is"; "some"; "^"; "multiline"; "input"; "^"; "with";
"newline"; "characters"; "^"; "and"; "tabs."; "When"; "I"; "convert";
"this"; "string"; "^"; "into"; "a"; "list"; "of"; "tokens"; "I"; "get";
"^s"; "showing"; "up"; "where"; "^"; "they"; "shouldn't."]
Why is this happening, and how can I fix it so that these functions behave the way I want them to?

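As explained in the documentation of the Str module:
^ Matches at beginning of line: either at the beginning of the
matched string, or just after a '\n' character.
An unescaped "^" therefore matches the empty string at the start of every line, and global_replace inserts " ^ " at each of those positions.
So you have to quote the '^' character using the escape character "\" to match a literal caret.
However, note that (also from the documentation):
any backslash character in the regular expression must be doubled to
make it past the OCaml string parser.
This means you have to write a doubled backslash ("\\") in the OCaml source to do what you want without getting a warning.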
This should do the job:
let bad_function string = Str.global_replace (Str.regexp "\\^") " ^ " (string);;

Related

Using one cout command to print multiple strings with each string placed on a different (text editor) line

Take a look at the following example:
cout << "option 1:
\n option 2:
\n option 3";
I know it's not the best way to output a string, but why does this cause an error saying that a " character is missing? There is a single string that must go to stdout; it just contains a lot of whitespace characters.
What about this:
string x="
string_test";
One may interpret that string as: "\nxxxxxxxxxxxxstring_test" where x is a whitespace character.
Is it a convention?
That's called a multiline string literal.
You need to escape the embedded newline; otherwise, it will not compile:
std::cout << "Hello world \
and stackoverflow";
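Note: The backslash must come immediately before the end of the line, because it has to escape the newline in the source. The backslash-newline pair is removed during translation, so the resulting string contains no newline at that point.
You can also take advantage of the fact that adjacent string literals are concatenated by the compiler: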
std::cout << "Hello World"
"Stack overflow";
C++11 also introduced raw string literals, which are somewhat like here-documents.
Syntax:
prefix(optional) R"delimiter( raw_characters )delimiter"
It allows any character sequence, except that it must not contain the
closing sequence )delimiter". It is used to avoid escaping of any
character. Anything between the delimiters becomes part of the string.
const char* s1 = R"foo(
Hello
World
)foo";
Example taken from cppreference.
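As a rough side-by-side sketch of the three approaches above (the function names spliced, concatenated and raw are made up for illustration), the following shows which of them actually put a newline into the resulting string:

```cpp
#include <string>

// Line splice: the backslash-newline pair is deleted before the literal is
// read, so NO newline ends up in the string.
std::string spliced() {
    return "Hello world \
and stackoverflow";
}

// Adjacent literals are concatenated by the compiler; any newline wanted in
// the output has to be written explicitly as \n.
std::string concatenated() {
    return "option 1:\n"
           "option 2:\n"
           "option 3";
}

// Raw string literal (C++11): the line breaks typed in the source become
// part of the string, and no escaping is needed.
std::string raw() {
    return R"(option 1:
option 2:
option 3)";
}
```

Here spliced() yields a single line, while concatenated() and raw() yield the same three-line text.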

VB.NET - Regex.Replace error with [ character

I want to remove some characters from a textbox. It works, but when I try to replace the "[" character it gives an error. Why?
Return Regex.Replace(html, "[", "").Replace(",", " ").Replace("]", "").Replace(Chr(34), " ")
When I delete the "[", "").Replace( part, it works fine:
Return Regex.Replace(html, ",", " ").Replace("]", "").Replace(Chr(34), " ")
The problem is that the [ character has a special meaning in regex, so it must be escaped in order to be matched literally. To escape it, add a \ before the character.
The corrected call would therefore be:
Return Regex.Replace(html, "\[", "").Replace(",", " ").Replace("]", "").Replace(Chr(34), " ")
Because [ is a reserved character in regex patterns. When a search pattern is meant to be matched literally, escape it with Regex.Escape(), which finds all reserved characters and escapes them with a backslash.
Dim searchPattern = Regex.Escape("[")
Return Regex.Replace(html, searchPattern, ""). 'etc...
But why do you need to use regex anyway? Here's a better way of doing it, I think, using StringBuilder:
Dim sb = New StringBuilder(html) _
.Replace("[", "") _
.Replace(",", " ") _
.Replace("]", "") _
.Replace(Chr(34), " ")
Return sb.ToString()

C++ Qt QString replace double backslash with one

I have a QString with following content:
"MXTP24\\x00\\x00\\xF4\\xF9\\x80\r\n"
I want it to become:
"MXTP24\x00\x00\xF4\xF9\x80\r\n"
I need to replace "\\x" with "\x" so that I can start parsing the values. But the following code, which I think should do the job, does nothing: I get the same string before and after.
qDebug() << "BEFORE: " << data;
data = data.replace("\\\\x", "\\x", Qt::CaseSensitivity::CaseInsensitive);
qDebug() << "AFTER: " << data;
Here, no change!
Then I tried like this:
data = data.replace("\\x", "\x", Qt::CaseSensitivity::CaseInsensitive);
Then the compiler complains that \x is used with no following hex digits!
Any ideas?
First let's look at what this piece of code does:
data.replace("\\\\x", "\\x", ....
The first string becomes \\x in the compiled code: two backslashes followed by x, three characters in total. This overload of QString::replace takes plain strings, not a regular expression, so it searches for exactly those three characters.
The second string becomes the two-character replacement \x. The call would therefore turn every \\x into \x, if the data really contained two consecutive backslashes.
However, this is not your problem; your problem is how qDebug() prints strings. It prints them escaped. A \" in the output means just a plain double quote, one character in the actual string, because the double quote is escaped. Each \\ is likewise a single backslash character, because a literal backslash is also escaped (it is the escape character and has special meaning for the next character). Your string already contains single backslashes, so the search finds nothing to replace.
So it seems you don't need to do any search and replace at all; just remove it.
Try printing the QString in one of these ways to get it shown literally:
std::cout << data.toStdString() << std::endl;
qDebug() << data.toLatin1().constData();
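To see how many characters each of the two literals from the question really contains, here is a small sketch that uses std::string instead of QString (an assumption, made only to keep the example free of Qt). The helper replace_all and the two constants are hypothetical names, and the replacement is plain text with no regular expressions involved:

```cpp
#include <string>

// Hypothetical helper: replace every occurrence of `from` in `s` with `to`,
// treating both as plain text.
std::string replace_all(std::string s, const std::string& from, const std::string& to) {
    if (from.empty()) return s;  // nothing sensible to search for
    std::string::size_type pos = 0;
    while ((pos = s.find(from, pos)) != std::string::npos) {
        s.replace(pos, from.size(), to);
        pos += to.size();  // continue after the inserted text
    }
    return s;
}

// "\\\\x" in source is 3 characters: backslash, backslash, x.
const std::string two_backslashes_x = "\\\\x";
// "\\x" in source is 2 characters: backslash, x.
const std::string one_backslash_x = "\\x";
```

Applied to text that holds two real backslashes before each x, replace_all collapses them to one. Applied to text that already holds single backslashes, it finds no match and returns the input unchanged.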

Tokenize a string based on quotes

I am trying to read data from a text file and split the read line based on quotes. For example
"Hi how" "are you" "thanks"
Expected output
Hi how
are you
thanks
My code:
getline(infile, line);
ch = strdup(line.c_str());
ch1 = strtok(ch, " ");
while (ch1 != NULL)
{
a3[i] = ch1;
ch1 = strtok(NULL, " ");
i++;
}
I don't know what to specify as the delimiter string. I am using strtok() to split, but it failed. Can anyone help me?
You should provide "\"" as the delimiter string to strtok.
For example,
ch1 = strtok (ch,"\"");
Your problem is probably related to how escape sequences are written: a double quote inside a string literal has to be written as \".
Given your input: "Hi how" "are you" "thanks", if you use strtok with "\"" as the delimiter, it'll treat the spaces between the quoted strings as if they were also strings, so if (for example) you printed out the result strings, one per line, surrounded by square brackets, you'd get:
[Hi how]
[ ]
[are you]
[ ]
[thanks]
I.e., the blank character between each quoted string is, itself, being treated as a string. If the delimiter you supplied to strtok was " \"" (i.e., included both a quote and a space) that wouldn't happen, but then it would also break on the spaces inside the quoted strings.
Assuming you can depend on every item you care about being quoted, you want to skip anything until you get to a quote, ignore the quote, then read data into your input string until you get to another quote, then repeat the whole process.
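A minimal sketch of that skip-and-collect loop, using std::string::find instead of strtok (a substitution made for this example; the function name split_quoted is hypothetical), and assuming that quotes are never escaped inside an item:

```cpp
#include <string>
#include <vector>

// Hypothetical helper: return the text found between each pair of double
// quotes in `line`; everything outside the quotes is ignored.
std::vector<std::string> split_quoted(const std::string& line) {
    std::vector<std::string> tokens;
    std::string::size_type pos = 0;
    while (true) {
        // Skip anything up to the next opening quote.
        std::string::size_type open = line.find('"', pos);
        if (open == std::string::npos) break;
        // Read up to the matching closing quote.
        std::string::size_type close = line.find('"', open + 1);
        if (close == std::string::npos) break;  // unterminated quote: stop
        tokens.push_back(line.substr(open + 1, close - open - 1));
        pos = close + 1;  // resume after the closing quote
    }
    return tokens;
}
```

With the input line from the question, this yields the three items Hi how, are you and thanks, without the stray space tokens that strtok produces.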

How do I insert format str and don't remove the matched regular expression in input string in boost::regex_replace() in C++?

I want to put spaces between punctuation marks and the other words in a sentence. But boost::regex_replace() replaces each punctuation mark with a space, and I want to keep the punctuation in the sentence!
For example, in this code the output should be "Hello . hi , ":
regex e1("[.,]");
std::basic_string<char> str = "Hello.hi,";
std::basic_string<char> fmt = " ";
cout<<regex_replace(str, e1, fmt)<<endl;
Can you help me?
You need to use a replacement variable in your fmt string. If I understand the documentation correctly, then in the absence of a flags field, you'll want to use a Boost-Extended format string.
In that sub-language, you use $& to mean whatever was matched, so you should try defining fmt as:
std::basic_string<char> fmt = " $& ";
That should change each punctuation mark into that same character, surrounded by spaces.
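The same idea can be sketched with std::regex from the standard library, used here instead of boost::regex so that the example has no external dependency. In its default format syntax, $& also stands for the matched text. The function name pad_punctuation is made up for illustration:

```cpp
#include <regex>
#include <string>

// Hypothetical helper: surround every '.' or ',' with spaces.
std::string pad_punctuation(const std::string& text) {
    static const std::regex punctuation("[.,]");
    // "$&" stands for the text that was matched, so the mark itself is kept
    // and only the surrounding spaces are added.
    return std::regex_replace(text, punctuation, " $& ");
}
```

Calling pad_punctuation("Hello.hi,") gives "Hello . hi , ", which is the output asked for in the question.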