If I already have a character list, how can I remove a certain character, or replace all occurrences of that character with another character?
# remove any 'l'
'hello world' 'l'-
puts
# replace all 'l' with 'X'
'hello world' 'l'/ 'X'*
puts
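For readers who do not know GolfScript: as far as I can tell, 'l'- works as an array difference and 'l'/ 'X'* as a split followed by a join. A rough Python analogue of those two idioms (not GolfScript itself; the variable names are made up for illustration):

```python
# Python analogue of the two GolfScript idioms above (illustrative only).
text = "hello world"

# 'l'-  : drop every 'l' from the character list
removed = "".join(ch for ch in text if ch != "l")

# 'l'/ 'X'*  : split on 'l', then join the pieces with 'X'
replaced = "X".join(text.split("l"))

print(removed)   # heo word
print(replaced)  # heXXo worXd
```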
Please also have a look at the page for GolfScript built-ins.
I want to clean up some text: remove anything but \w and \s, but keep a single ' inside a word (e.g. keep it in words like don't).
I could do
perl -plE "s/[^\w\s']//g" <<< "'a:b/c d????ef' don't"
which keeps the ', but it also keeps it at the beginning or end of a string, e.g. it prints
'abc def' don't
I'm unable to implement the "keep this" rule (?<=\w)'(?=\w), i.e. remove the ' unless it is between two word characters.
The wanted result:
abc def don't
How to do this?
You could do this:
s/[^\w\s']|(?<!\w)'|'(?!\w)//g
Delete everything that is either
a character that is not (a word character or a space or '), or
a ' that is not preceded by a word character, or
a ' that is not followed by a word character
The first clause will match (and remove) all characters that we obviously don't want to keep.
The second and third clause will remove all ' characters unless they're surrounded by word characters on both sides.
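If you want to check the three clauses outside Perl, here is a minimal Python sketch using the same pattern with re.sub. The function name strip_unwanted is made up for illustration, and Python's \w is Unicode-aware by default, which may differ slightly from your Perl setup.

```python
import re

# Same three alternatives as the Perl substitution above:
#   1. anything that is not a word character, whitespace or '
#   2. a ' not preceded by a word character
#   3. a ' not followed by a word character
UNWANTED = re.compile(r"[^\w\s']|(?<!\w)'|'(?!\w)")

def strip_unwanted(text: str) -> str:
    """Delete every match of UNWANTED (hypothetical helper)."""
    return UNWANTED.sub("", text)

print(strip_unwanted("'a:b/c d????ef' don't"))  # abc def don't
```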
You can also use a global match instead of a replacement; this way you only have to describe what you want to keep, and the pattern becomes simpler:
perl -ne"print /[\w\s]|\b'\b/g" <<< "'a:b/c d????ef' don't"
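The same "describe what to keep" idea can be sketched in Python with re.findall, joining the kept pieces back together. The name keep_wanted is hypothetical. Here \b'\b only matches a ' that has a word character directly on both sides.

```python
import re

# Keep word characters, whitespace, and any ' sitting between two word characters.
WANTED = re.compile(r"[\w\s]|\b'\b")

def keep_wanted(text: str) -> str:
    """Join every kept character; everything else is dropped (hypothetical helper)."""
    return "".join(WANTED.findall(text))

print(keep_wanted("'a:b/c d????ef' don't"))  # abc def don't
```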
I want to check whether there is a special character in a line of a text file, using a regex in a shell script.
Assume there is a sentence "assccÑasas".
How can I check whether there is an 'Ñ' in the line, so that it is reported as an error?
I also want to check for symbols such as '/', '^', '&', etc.
My code:
VALID='^[a-zA-Z_0-9&.<>/|\-]+$' # my regex
checkError(){
    if [[ $line =~ $VALID ]]; then
        echo "test"
    else
        echo "not okay"
        exit 1
    fi
}
while read -r line
do
    checkError
done < "$1"
For example, if the line is "StÀck Overflow;", it should output an error. If the line is "stack overflow;", it should still output an error,
but if it is only "stack overflow" (no symbol or special character), it should output "test".
So far it can check for symbols ('/', '\', etc.), but special characters are still a problem.
Any help is really appreciated,
thank you in advance.
You could either
define which characters are valid and only allow these, or
define which characters are invalid and check whether there is one.
For the second approach, you could use this regex: [^a-zA-Z_0-9\s].
The ^ inside the square brackets negates the character class, so it matches any string that contains a character that is not a letter (A-Z, a-z), a digit, an underscore or whitespace.
Since you want to detect a single character, you don't need a quantifier.
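To illustrate the negated class, here is a small Python sketch (Python rather than bash, so that \s is available; has_invalid_char is a made-up name). One caveat worth checking for the shell version: inside a POSIX bracket expression, as used by bash's [[ =~ ]], \s is not a whitespace shorthand, so [[:space:]] is the usual spelling there.

```python
import re

# One character that is NOT an ASCII letter, digit, underscore or whitespace.
INVALID = re.compile(r"[^a-zA-Z_0-9\s]")

def has_invalid_char(line: str) -> bool:
    """True if the line contains at least one disallowed character (hypothetical helper)."""
    return INVALID.search(line) is not None

for sample in ["assccÑasas", "StÀck Overflow;", "stack overflow;", "stack overflow"]:
    print(sample, "->", "not okay" if has_invalid_char(sample) else "test")
```

With these inputs only "stack overflow" should pass, because Ñ, À and ; all fall outside the class.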
I did the following to my string $text
$text =~ tr/a-zåàâäæçéèêëîïôöœßùûüÿA-ZÅÀÂÄÆÇÉÈÊËÎÏÔÖŒÙÛÜŸ'()\-,.?!:;/\n/cs;
What this did was split the string into lines. This is what I wanted,
but I don't get why it does this.
I thought that this line would take all the characters a-zåàâäæçéèêëîïôöœßùûüÿA-ZÅÀÂÄÆÇÉÈÊËÎÏÔÖŒÙÛÜŸ'()-,.?!:; and replace each of them with \n.
I don't understand what the cs at the end does either. Here you can get an explanation of cs, but I don't understand what it means:
"c - is used to specify that the SEARCHLIST character set is complemented"
"s - is used to specify that the sequences of characters that were transliterated to the same character are squashed down to a single instance of the character"
Example:
$text= "a ar? å ..";
gives
a
ar?
å
..
c - is used to specify that the SEARCHLIST character set is complemented
In this usage, "complemented" is similar to "negated" or "reversed": instead of replacing the characters listed in your expression, every character not found in your expression is replaced. In your example string, this means that all of the spaces are replaced with a newline, because every other character is included in the set.
If you want to turn all spaces into newlines, listing out all the things which are not spaces is cumbersome and you're likely to forget some. You can instead work directly on the spaces with a regex.
s{\s+}{\n}g;
s{...}{...} is a "search and replace" using regular expressions rather than just characters. \s is regex speak for "whitespace" which includes spaces, tabs and newlines. + says to match 1 or more of them, so multiple spaces in a row will be turned into one newline. The g modifier says to do it "globally" or across every character in the string, otherwise it would stop at the first match.
foo bar baz
Becomes
foo
bar
baz
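The same substitution can be tried in Python; this is only a sketch of the idea, not a replacement for the Perl line, and spaces_to_newlines is a made-up name.

```python
import re

def spaces_to_newlines(text: str) -> str:
    """Replace each run of whitespace with a single newline (hypothetical helper)."""
    return re.sub(r"\s+", "\n", text)

print(spaces_to_newlines("foo bar baz"))
```

Because of the +, several spaces or tabs in a row still produce only one newline.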
"c - is used to specify that the SEARCHLIST character set is complemented"
This means that it will replace anything not in the search list with \n. In your example, the only character not in the search list is a space. Therefore each space gets replaced with a newline. As Schwern pointed out, this is not a good way to do this.
"s - is used to specify that the sequences of characters that were transliterated to the same character are squashed down to a single instance of the character"
This means that if three characters in a row are translated (resulting in three \n in a row), the three \n will be "squashed" into a single \n. If you added some spaces to your example input, you could see this in action:
# Multiple spaces separating words
my $str = "a   ar?   å";
Without squashing:
$str =~ tr/a-zåàâäæçéèêëîïôöœßùûüÿA-ZÅÀÂÄÆÇÉÈÊËÎÏÔÖŒÙÛÜŸ'()\-,.?!:;/\n/c;
Outputs:
a


ar?


å
With squashing:
$str =~ tr/a-zåàâäæçéèêëîïôöœßùûüÿA-ZÅÀÂÄÆÇÉÈÊËÎÏÔÖŒÙÛÜŸ'()\-,.?!:;/\n/cs;
Outputs:
a
ar?
å
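To see the effect of c and s side by side, here is a rough Python emulation. It is an approximation of tr/.../\n/c and tr/.../\n/cs built on a negated character class, not Perl's actual implementation, and tr_complement is a made-up name.

```python
import re

# The characters from the SEARCHLIST, written as the inside of a regex character class.
KEEP = r"a-zåàâäæçéèêëîïôöœßùûüÿA-ZÅÀÂÄÆÇÉÈÊËÎÏÔÖŒÙÛÜŸ'()\-,.?!:;"

def tr_complement(text: str, squash: bool) -> str:
    """Replace every character NOT in KEEP with a newline.

    squash=False approximates the c flag alone (one newline per replaced character);
    squash=True approximates cs (a run of replaced characters becomes one newline).
    """
    pattern = f"[^{KEEP}]" + ("+" if squash else "")
    return re.sub(pattern, "\n", text)

print(repr(tr_complement("a   ar?   å", squash=False)))
print(repr(tr_complement("a   ar?   å", squash=True)))
```

With three spaces between the words, the first call should yield three newlines per gap and the second only one.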
I have the following text in Notepad++
A
B
C
D
I would like to "parameterize" this text and turn it into this using a regex or some other native Notepad++ command(s) or plugin:
'A', 'B', 'C', 'D'
Note that I want the end text to be on one line with no trailing comma, if possible. This question gets me close, but I am left with a trailing comma and the text is not compacted to one line. Is there any way to accomplish this in Notepad++ without using a macro?
Try this in Regex Search Mode.
Search for (\w)\r\n
Replace with ('\1', )
But you will have to manually remove the trailing comma and space from the end of the line.
You can do it in two steps:
Search for e.g. (\w+) and replace with '$1'
The \w+ will find the letters (and digits and the underscore), at least one.
Search for (\s+) and replace with a comma followed by a space (, )
\s+ will find whitespace characters, which here means the newline characters at the end of each row. If your text contains whitespace that you want to keep, use [\r\n]+ instead.
This way, if there is no newline after the last letter, there will be no trailing comma.
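The two steps can be dry-run outside Notepad++ with a short Python sketch. This only mirrors the two regex replacements; it says nothing about Notepad++'s own replace syntax, and parameterize is a made-up name.

```python
import re

def parameterize(text: str) -> str:
    """Quote each word, then turn each whitespace run into ', ' (hypothetical helper)."""
    quoted = re.sub(r"(\w+)", r"'\1'", text)   # step 1: A -> 'A'
    return re.sub(r"\s+", ", ", quoted)        # step 2: line breaks -> ", "

print(parameterize("A\r\nB\r\nC\r\nD"))  # 'A', 'B', 'C', 'D'
```

If the input ends with a newline, step 2 adds a trailing ", ", which matches the caveat above.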
I am trying to parse the last chunk of the following line:
77 0 wl1271/wpa_supplicant_lib/driver_ti.h
Unfortunately, the spaces are not always the same length. I assume that printf or something similar was used to output the data so it lined up in columns. This means that sometimes I have spaces, and sometimes I have tab characters.
I have successfully gotten the first two numbers through the use of regex in Perl. The way I thought I would get the last bit would be to search for the last occurrence of any whitespace character and then grab the rest of the string starting there. I tried using rindex, but that only accepts a character for the searchable parameter and not a regex (I thought that \s would do the trick).
Can anyone solve the issue I'm having here, either by walking me through how to get the last whitespace character or by helping me with a solution to grab that string some other way?
Why not split?
use strict;
use warnings;
my $string = '77 0 wl1271/wpa_supplicant_lib/driver_ti.h';
my ( $num1, $num2, $lastPart ) = split ' ', $string;
print "$num1\n$num2\n$lastPart";
Output:
77
0
wl1271/wpa_supplicant_lib/driver_ti.h
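For comparison, Python's str.split() with no arguments behaves much like Perl's split ' ': it splits on runs of spaces and tabs and ignores leading whitespace. A small sketch, with a sample line using mixed spaces and tabs that I made up to mimic the column layout:

```python
# Sample line with mixed spaces and tabs (spacing is an assumption for illustration).
line = "77   0\t wl1271/wpa_supplicant_lib/driver_ti.h"

# split() with no arguments splits on any run of whitespace.
num1, num2, last_part = line.split()

print(num1)       # 77
print(num2)       # 0
print(last_part)  # wl1271/wpa_supplicant_lib/driver_ti.h
```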
Why not just match the regex \S+$ - namely, the last set of non-whitespace characters in the string?
\S = non-whitespace character
$ = end of line
Edit: You really should use split though, as suggested by Kenosis.
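If you do want the regex route, the \S+$ idea looks like this in Python. It is a sketch; last_field is a made-up name, and it returns None when the line ends in whitespace or is empty.

```python
import re
from typing import Optional

def last_field(line: str) -> Optional[str]:
    """Return the final run of non-whitespace characters, or None if there is none."""
    match = re.search(r"\S+$", line)
    return match.group() if match else None

print(last_field("77   0\t wl1271/wpa_supplicant_lib/driver_ti.h"))
```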