Right now I have a String equal to "...\n...\n...". In my code I want to write this as a list (like ['a','b','c']), but how would this work with the \n? I checked in ghci whether string == ['.','.','.','.','.','.','.','.','.'] and it said no. Does anyone know how I would write the \n's in a Char list? Thank you.
A String is a list of Characters, so "foo" and ['f', 'o', 'o'] are exactly the same.
A newline character is written with the escape sequence '\n', so your string "...\n...\n..." is equivalent to:
['.', '.', '.', '\n', '.', '.', '.', '\n', '.', '.', '.']
Here '\n' is a single character, not two: it maps to an ASCII character with codepoint 0a as hexadecimal value (10 as decimal value). The compiler thus sees \n and replaces that with a single character.
You can thus for example filter with filter ('\n' /=) some_string to filter out new line characters from a String.
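The point that '\n' is one character, not two, is not specific to Haskell. As a rough cross-check, here is a minimal sketch in Python (chosen only for illustration; the variable names are made up for this example) that spells the question's string out character by character and then drops the newlines, mirroring filter ('\n' /=) some_string:

```python
# The question's string: three rows of dots separated by newlines.
s = "...\n...\n..."

# The same string spelled out one character at a time.
chars = ['.', '.', '.', '\n', '.', '.', '.', '\n', '.', '.', '.']

print(list(s) == chars)  # the two spellings agree
print(len(s))            # nine dots plus two newline characters
print(ord('\n'))         # code point of the newline character

# Counterpart of `filter ('\n' /=) some_string`: keep everything except newlines.
without_newlines = "".join(c for c in s if c != '\n')
print(without_newlines)
```

The ghci comparison in the question failed because the nine-dot list leaves out the two newline characters.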
Related
What are the rules for the escape character \ in string literals? Is there a list of all the characters that are escaped?
In particular, when I use \ in a string literal in gedit, and follow it by any three numbers, it colors them differently.
I was trying to create a std::string constructed from a literal with the character 0 followed by the null character (\0), followed by the character 0. However, the syntax highlighting alerted me that maybe this would create something like the character 0 followed by the null character (\00, aka \0), which is to say, only two characters.
For the solution to just this one problem, is this the best way to do it:
std::string ("0\0" "0", 3) // String concatenation
And is there some reference for what the escape character does in string literals in general? What is '\a', for instance?
Control characters:
(Hex codes assume an ASCII-compatible character encoding.)
\a = \x07 = alert (bell)
\b = \x08 = backspace
\t = \x09 = horizontal tab
\n = \x0A = newline (or line feed)
\v = \x0B = vertical tab
\f = \x0C = form feed
\r = \x0D = carriage return
\e = \x1B = escape (non-standard GCC extension)
Punctuation characters:
\" = quotation mark (backslash not required for '"')
\' = apostrophe (backslash not required for "'")
\? = question mark (used to avoid trigraphs)
\\ = backslash
Numeric character references:
\ + up to 3 octal digits
\x + any number of hex digits
\u + 4 hex digits (Unicode BMP, new in C++11)
\U + 8 hex digits (Unicode astral planes, new in C++11)
\0 = \00 = \000 = octal escape for null character
If you do want an actual digit character after a \0, then yes, I recommend string concatenation. Note that the whitespace between the parts of the literal is optional, so you can write "\0""0".
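The greedy octal rule is easy to check by counting characters. Python's string literals happen to follow the same rule (a backslash plus up to three octal digits) and also concatenate adjacent literals, so the sketch below uses Python rather than C++; the variable names are illustrative only.

```python
# "\00" is read as ONE octal escape (the null character),
# so this literal holds only two characters: '0' and NUL.
greedy = "0\00"

# Splitting the literal ends the octal escape after "\0",
# so the digit that follows is an ordinary '0' character.
split = "0\0" "0"

# Writing every character as a three-digit octal escape is also unambiguous.
all_octal = "\060\000\060"

print(len(greedy), len(split), len(all_octal))
print(split == all_octal)
```

In C++ the extra step is getting the embedded null past the const char* constructor, which is why the question passes an explicit length.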
\0 will be interpreted as an octal escape sequence if it is followed by other digits, so \00 will be interpreted as a single character. (\0 is technically an octal escape sequence as well, at least in C).
The way you're doing it:
std::string ("0\0" "0", 3) // String concatenation
works because this constructor overload takes a pointer to a char array together with an explicit length; if you pass "0\0" "0" to the single-argument const char* constructor instead, it is treated as a C string and only the characters before the first null character are copied.
Here is a list of escape sequences.
\a is the bell/alert character, which on some systems triggers a sound. \nnn represents the character whose code is nnn in octal (one to three octal digits). \0 is simply the shortest octal escape for the null character; if more octal digits follow it, they are read as part of the same escape.
To answer your original question, you could escape your '0' characters as well, as:
std::string ("\060\000\060", 3);
(since an ASCII '0' is 60 in octal)
The MSDN documentation has a pretty detailed article on this, as does cppreference.
I left something like this as a comment, but I feel it probably needs more visibility as none of the answers mention this method:
The method I now prefer for initializing a std::string with non-printing characters in general (and embedded null characters in particular) is to use the C++11 feature of initializer lists.
std::string const str({'\0', '6', '\a', 'H', '\t'});
I am not required to perform error-prone manual counting of the number of characters that I am using, so that if later on I want to insert a '\013' in the middle somewhere, I can and all of my code will still work. It also completely sidesteps any issues of using the wrong escape sequence by accident.
The only downside is all of those extra ' and , characters.
With the magic of user-defined literals, we have yet another solution to this. C++14 added a std::string literal operator.
using namespace std::string_literals;
auto const x = "\0" "0"s;
Constructs a string of length 2, with a '\0' character (null) followed by a '0' character (the digit zero). I am not sure if it is more or less clear than the initializer_list<char> constructor approach, but it at least gets rid of the ' and , characters.
ascii is a package on linux you could download.
for example
sudo apt-get install ascii
ascii
Usage: ascii [-dxohv] [-t] [char-alias...]
-t = one-line output -d = Decimal table -o = octal table -x = hex table
-h = This help screen -v = version information
Prints all aliases of an ASCII character. Args may be chars, C \-escapes,
English names, ^-escapes, ASCII mnemonics, or numerics in decimal/octal/hex.
This tool can help you with C/C++ escape codes like \x0A.
I need a regular expression that will do the following transformation:
Input: ab\xy
Output: aby
Input: ab\\xy
Output: ab\xy
Consider all of those backslashes as LITERAL backslashes. That is, the first input is the sequence of characters ['a', 'b', '\', 'x', 'y'], and the second is ['a', 'b', '\', '\', 'x', 'y'].
The rule is "in a string of characters, if a backslash is encountered, delete it and the following character ... unless the following character is a backslash, in which case delete only one of the two backslashes."
This is escape sequence hell and I can't seem to find my way out.
You may use
(?s)\\(\\)|\\.
and replace with $1 to restore the \ when a double backslash is found.
Details:
(?s) - a dotall modifier so that . can match any character, including line break characters
\\(\\) - matches a backslash and then matches and captures another backslash into Group 1
| - or
\\. - matches any escape sequence (a backslash + any char).
See the regex demo and a PHP demo:
$re = '/\\\\(\\\\)|\\\\./s';
$str = 'ab\\xy ab\\\\xy ab\\\\\\xy';
echo $result = preg_replace($re, '$1', $str);
// => aby ab\xy ab\y
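For comparison, the same pattern can be sketched in Python. This is only an illustration under the assumption that Python's re module is an acceptable stand-in for the PCRE engine used above; strip_escapes is a hypothetical helper name. A replacement function is used instead of $1 so that the "no captured backslash" case is handled explicitly.

```python
import re

# Either a backslash followed by a captured backslash,
# or a backslash followed by any single character (including a newline).
ESCAPE = re.compile(r"(?s)\\(\\)|\\.")

def strip_escapes(text: str) -> str:
    # Keep the captured backslash when there is one; otherwise drop the match.
    return ESCAPE.sub(lambda m: m.group(1) or "", text)

# Raw strings keep the backslashes literal, as in the question.
for sample in [r"ab\xy", r"ab\\xy", r"ab\\\xy"]:
    print(sample, "->", strip_escapes(sample))
```

Because the alternation consumes two characters at a time from left to right, a run of three backslashes becomes one kept backslash followed by one deleted escape.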
How can I write a regex that matches the letters 'R' and 'L' plus digits, where the first character is always a letter?
E.g.
I want the regex to accept strings like "R12L" and "L1", which start with either 'R' or 'L' only.
I believe you want to match words that:
start with any letter
contain digits, 'R' and 'L'
Here it is: \b[a-zA-Z][0-9RL]*\b
In case the first letter must be either 'R' or 'L', then this will be better:
\b[RL][0-9RL]*\b
Explanation:
\b is a word boundary, a zero length match
[RL] is a character class, it matches either R or L
[0-9] is a range within the character class, it matches anything between 0 and 9.
You can play with this demo.
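If the goal is to validate a whole string rather than find words inside longer text, the stricter pattern can be tried out with a short Python sketch. The helper name is_valid and the sample inputs are assumptions made for this example.

```python
import re

# First character must be R or L, followed by any mix of digits, R and L.
PATTERN = re.compile(r"[RL][0-9RL]*")

def is_valid(code: str) -> bool:
    # fullmatch anchors both ends, so \b is not needed for whole-string checks.
    return PATTERN.fullmatch(code) is not None

for candidate in ["R12L", "L1", "A1", "12R", "R1x"]:
    print(candidate, is_valid(candidate))
```

For searching inside longer text, the \b-delimited form from the answer is the one to use.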
Using owa_pattern.change in oracle 9i.
Is it possible to remove a number and the trailing special character (please note: only the trailing one) in a string?
By special character I mean a character that is neither a letter nor a digit.
e.g _ , # , # ,$ etc ...
For example.
String = TEST_STRING_10
desired output would be TEST_STRING (notice only the trailing special character _ was removed).
I have already figured out how to remove the number but am stuck on the special character part.
I have this code so far.
OWA_PATTERN.CHANGE (string, '\d', '', 'g');
Appreciate any inputs.
Thanks!
Try the following.
OWA_PATTERN.CHANGE (string, '[^a-zA-Z]+$', '');
Regular expression:
[^a-zA-Z]+ - any character except 'a' to 'z' and 'A' to 'Z' (1 or more times, matching as many as possible)
$ - before an optional \n, and the end of the string
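The effect of this pattern can be checked outside Oracle. The following is a minimal Python sketch, not OWA_PATTERN itself; strip_trailing_non_letters is a hypothetical helper name.

```python
import re

def strip_trailing_non_letters(value: str) -> str:
    # Remove the longest run of non-letter characters at the end of the string.
    return re.sub(r"[^a-zA-Z]+$", "", value)

print(strip_trailing_non_letters("TEST_STRING_10"))
print(strip_trailing_non_letters("A_1_B_2"))
```

Note that the pattern strips every trailing non-letter, so digits and separators in the middle of the string are left alone while a suffix such as _10 disappears in one step.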
This will do it:
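DECLARE
  input_string VARCHAR2(255) := 'TEST_STRING_10';
  result VARCHAR2(255);
BEGIN
  -- REGEXP_REPLACE requires Oracle 10g or later; it is not available in 9i.
  -- Keep everything before the final underscore-plus-digits suffix.
  result := REGEXP_REPLACE(input_string, '([[:alnum:]_].*)_[[:digit:]]+', '\1', 1, 0, 'c');
END;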
I'm cleaning text strings in R. I want to remove all the punctuation except apostrophes and hyphens. This means I can't use the [:punct:] character class (unless there's a way of saying [:punct:] but not '-).
! " # $ % & ( ) * + , . / : ; < = > ? @ [ \ ] ^ _ { | } ~. and backtick must come out.
For most of the above, escaping is not an issue. But for square brackets, I'm really having issues. Here's what I've tried:
gsub('[abc]', 'L', 'abcdef') #expected behaviour, shown as sanity check
# [1] "LLLdef"
gsub('[[]]', 'B', 'it[]') #only 1 substitution, ie [] treated as a single character
# [1] "itB"
gsub('[\[\]]', 'B', 'it[]') #single escape, errors as expected
Error: '\[' is an unrecognized escape in character string starting "'[\["
gsub('[\\[\\]]', 'B', 'it[]') #double escape, single substitution
# [1] "itB"
gsub('[\\]\\[]', 'B', 'it[]') #double escape, reversed order, NO substitution
# [1] "it[]"
I'd prefer not to use fixed=TRUE with gsub since that will prevent me from using a character class. So, how do I include square brackets in a regex character class?
ETA additional trials:
gsub('[[\\]]', 'B', 'it[]') #double escape on closing ] only, single substitution
# [1] "itB"
gsub('[[\]]', 'B', 'it[]') #single escape on closing ] only, expected error
Error: '\]' is an unrecognized escape in character string starting "'[[\]"
ETA: the single substitution was caused by not setting perl=T in my gsub calls. ie:
gsub('[[\\]]', 'B', 'it[]', perl=T)
You can use [:punct:] if you combine it with a negative lookahead:
(?!['-])[[:punct:]]
This way a [:punct:] character is only matched if it is not in ['-]. The negative lookahead assertion (?!['-]) enforces this condition: it fails when the next character is a ' or a -, and then the complete expression fails at that position.
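The lookahead technique carries over to other engines. Here is a rough Python sketch of the same idea. Python's re module has no [[:punct:]] class, so the class is built from string.punctuation (ASCII punctuation only), which is an assumption of this example and not part of the R answer; strip_punctuation is a hypothetical helper name.

```python
import re
import string

# Stand-in for [[:punct:]]: a character class built from ASCII punctuation.
PUNCT_CLASS = "[" + re.escape(string.punctuation) + "]"

# Same shape as (?!['-])[[:punct:]]: refuse to match when the
# next character is an apostrophe or a hyphen.
PATTERN = re.compile(r"(?!['-])" + PUNCT_CLASS)

def strip_punctuation(text: str) -> str:
    return PATTERN.sub("", text)

print(strip_punctuation("it's a well-known [fact], isn't it?!"))
```

The class still contains ' and -, and it is the lookahead alone that protects them, which is the design the answer describes.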
Inside a character class you only need to escape the closing square bracket:
Try using '[[\\]]' or '[[\]]' (I am not sure about escaping the backslash as I don't know R.)
See this example.
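How brackets behave inside a character class depends on the regex engine, so it is worth testing in whichever one is at hand. As a point of comparison only (this is Python's re module, not R's gsub, and R's default and perl=TRUE engines may differ from it), two spellings that match both brackets are sketched below; the constant names are made up for this example.

```python
import re

# Spelling 1: escape both brackets inside the class.
ESCAPED = re.compile(r"[\[\]]")

# Spelling 2: a ']' placed first in the class is taken literally,
# so the class contains ']' and '[' without any backslashes.
LEADING = re.compile(r"[][]")

print(ESCAPED.sub("B", "it[]"))
print(LEADING.sub("B", "it[]"))
```

Both spellings replace each bracket separately, which is the behaviour the question was after.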