Bash- How to convert non-alphanumerical character to "_"


I am trying to store user input in a variable and clean that variable so that it keeps only alphanumeric characters plus a few others (I mean [a-zA-Z0-9-_]).
I tried this, but it isn't exhaustive:
SERVICE_NAME=$(echo $SERVICE_NAME | tr A-Z a-z | tr ' ' _ | tr \' _ | tr \" _)
Can you help with this?

Bash's string substitution is a fine thing: ${var//pat/rep}
val='Foo$%!*#BAR###baZ'
echo ${val//[^a-zA-Z0-9_-]/_}
Foo_____BAR___baZ
A small explanation: The slash introduces a search/replace, a little like in sed (where it just delimits patterns). A single slash performs only one replacement:
val='Foo$%!*#BAR###baZ'
echo ${val/[^a-zA-Z0-9_-]/_}
Foo_%!*#BAR###baZ
Two slashes (//) mean "replace all". It is uncommon syntax, but there is some logic to it: multiple slashes for multiple replacements.
Note that the $ sits outside the braces and the expansion needs a variable name inside them, so you cannot modify a literal constant this way (which would be nice for testing); assign it to a variable first. Positional parameters work as well, for example ${1//pat/rep}.
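As a minimal sketch of how the replace-all form could answer the original question, the function below lowercases its argument and replaces every character outside [a-z0-9_-] with an underscore. The name sanitize_name is hypothetical, and the ${1,,} lowercase expansion assumes Bash 4 or newer.

```shell
#!/usr/bin/env bash
# sanitize_name is a hypothetical helper, not part of the original answer.
sanitize_name() {
  local lowered=${1,,}                         # lowercase the argument (Bash 4+)
  printf '%s\n' "${lowered//[^a-z0-9_-]/_}"    # replace every other character with _
}

sanitize_name 'Foo$%!*#BAR###baZ'
```

Because the function works on $1 directly, it also shows that a positional parameter can be expanded the same way as a named variable.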

$ echo 'asd!#QCW##D' | tr A-Z a-z | sed -e 's/[^a-zA-Z0-9\-]/_/g'
asd__qcw__d
I would use sed for this: put the ^ (not) operator at the start of your set of valid characters and replace everything else with an underscore. The above shows the syntax with the output.
And, as a bonus, if you want to replace a run of invalid characters with a single underscore, just add + to your regular expression (and use the -r switch to make sed use extended regular expressions):
$ echo 'asd!#QCW##D' | tr A-Z a-z | sed -r 's/[^a-zA-Z0-9\-]+/_/g'
asd_qcw_d
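The same idea can be wrapped in a small function. This is only a sketch: collapse_invalid is a hypothetical name, and it uses -E instead of -r on the assumption that the script should also run with a BSD sed, which accepts -E for extended regular expressions.

```shell
# collapse_invalid is a hypothetical helper name used only for illustration.
collapse_invalid() {
  # Lowercase first, then turn each run of disallowed characters into one underscore.
  printf '%s\n' "$1" | tr 'A-Z' 'a-z' | sed -E 's/[^a-z0-9_-]+/_/g'
}

collapse_invalid 'asd!#QCW##D'
```

The trailing - inside the brackets is literal because it is the last character of the set, so no backslash is needed.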

I believe it can all be done in a single sed command like this (GNU sed, since \L is a GNU extension):
echo 'Foo$%!*#BAR###baZ' | sed -e 's/[A-Z]/\L&/g' -e 's/[^a-z0-9\-]/_/g'
OUTPUT
foo_____bar___baz

perl way:
perl -ple 's/[^\w\-]/_/g'
pure bash way
a='foo-BAR_123,.:goo'
echo ${a//[^[:alnum:]-]/_}
produces:
foo-BAR_123___goo

Related

Get substring using either perl or sed

I can't seem to get a substring correctly.
declare BRANCH_NAME="bugfix/US3280841-something-duh";
# Trim it down to "US3280841"
TRIMMED=$(echo $BRANCH_NAME | sed -e 's/\(^.*\)\/[a-z0-9]\|[A-Z0-9]\+/\1/g')
That still returns bugfix/US3280841-something-duh.
If I try to use perl instead:
declare BRANCH_NAME="bugfix/US3280841-something-duh";
# Trim it down to "US3280841"
TRIMMED=$(echo $BRANCH_NAME | perl -nle 'm/^.*\/([a-z0-9]|[A-Z0-9])+/; print $1');
That outputs nothing.
What am I doing wrong?

Using bash parameter expansion only:
$: # don't use caps; see below.
$: declare branch="bugfix/US3280841-something-duh"
$: tmp="${branch##*/}"
$: echo "$tmp"
US3280841-something-duh
$: trimmed="${tmp%%-*}"
$: echo "$trimmed"
US3280841
Which means:
$: tmp="${branch##*/}"
$: trimmed="${tmp%%-*}"
does the job in two steps without spawning extra processes.
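A sketch of the two expansions packaged as a reusable function. The name ticket_id is hypothetical, and the function assumes the ID is the text between the last slash and the first dash after it.

```shell
# ticket_id is a hypothetical helper; it assumes the ID is the text between
# the last slash and the first following dash.
ticket_id() {
  tmp=${1##*/}                    # drop everything up to and including the last /
  printf '%s\n' "${tmp%%-*}"      # drop everything from the first - onward
}

ticket_id 'bugfix/US3280841-something-duh'
ticket_id 'feature/team-a/DE42-cleanup'
```

Since ## removes the longest matching prefix, a dash or slash earlier in the path does not confuse the second step.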
In sed,
$: sed -E 's#^.*/([^/-]+)-.*$#\1#' <<< "$branch"
This says "after any or no characters followed by a slash, remember one or more that are not slashes or dashes, followed by a not-remembered dash and then any or no characters, then replace the whole input with the remembered part."
Your original pattern was
's/\(^.*\)\/[a-z0-9]\|[A-Z0-9]\+/\1/g'
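This says "remember any number of anything followed by a slash, then a lowercase letter or a digit, then a literal pipe character (alternation only works with -E), then a capital letter or digit, then a literal plus sign, and then replace it all with what you remembered." That is how a strict POSIX basic regular expression reads it; GNU sed treats \| and \+ as alternation and repetition even without -E, so the result differs between implementations.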
GNU's manual is your friend. I look stuff up all the time to make sure I'm doing it right. Sometimes it still takes me a few tries, lol.
An aside - try not to use all-capital variable names. That is a convention that indicates the variable is special to the shell or the environment, like RANDOM or IFS.

You may use this sed:
sed -E 's~^.*/|-.*$~~g' <<< "$BRANCH_NAME"
US3280841
Or this awk:
awk -F '[/-]' '{print $2}' <<< "$BRANCH_NAME"
US3280841

sed 's:[^/]*/\([^-]*\)-.*:\1:'<<<"bugfix/US3280841-something-duh"

The Perl version just has the + in the wrong place. It should be inside the capture brackets:
TRIMMED=$(echo $BRANCH_NAME | perl -nle 'm/^.*\/([a-z0-9A-Z]+)/; print $1');

Just use a ^ before A-Z0-9
TRIMMED=$(echo $BRANCH_NAME | sed -e 's/\(^.*\)\/[a-z0-9]\|[^A-Z0-9]\+/\1/g')
in your sed case.
Alternatively and briefly, you can use
TRIMMED=$(echo $BRANCH_NAME | sed "s/[a-z\/\-]//g" )
too.

Type this in a shell terminal:
$ BRANCH_NAME="bugfix/US3280841-something-duh"
$ echo $BRANCH_NAME| perl -pe 's/.*\/(\w\w[0-9]+).+/\1/'
Use the s (substitute) command instead of m (match).
GNU sed accepts this expression too, so 'sed -E' can be used in place of 'perl -pe' with the same expression.

Another variant using Perl Regular Expression Character Classes (see perldoc perlrecharclass).
echo $BRANCH_NAME | perl -nE 'say m/^.*\/([[:alnum:]]+)/;'

How to process a regular expression after being evaluated (sed)

I need to replace each character of a regular expression, once evaluated, with each character plus the # symbol.
For example:
If the regular expression is: POS[AB]
and the input text is: POSA_____POSB
I want to get this result: P#O#S#A#_____P#O#S#B#
Please, using sed or awk.
I have tried this:
$ echo "POSA_____POSB" | sed "s/POS[AB]/&#/g"
POSA#_____POSB#
$ echo "POSA_____POSB" | sed "s/./&#/g"
P#O#S#A#_#_#_#_#_#P#O#S#B#
But what I need is:
P#O#S#A#_____P#O#S#B#
Thank you in advance.
Best regards,
Octavio

Perl to the rescue!
perl -pe 's/(POS[AB])/$1 =~ s:(.):$1#:gr/ge'
The /e interprets the replacement as code, and it contains another substitution which replaces each character with itself plus #.
In ancient Perls before 5.14 (i.e. without the /r modifier), you need the slightly more complex
perl -pe 's/(POS[AB])/$x = $1; $x =~ s:(.):$1#:g; $x/ge'

echo "POSA_____POSB" | sed "s/[^_]/&#/g"
or
echo "POSA_____POSB" | sed "s/[POSAB]/&#/g"

Try this regex:
echo "POSA_____POSB" | sed "s/[A-Z]/&#/g"
Output:
P#O#S#A#_____P#O#S#B#

You can replace a regex match in awk with sub (first matching substring, like sed "s///") or gsub (all matching substrings, like sed "s///g"). For simple patterns like these, the regexes are the same in sed and awk. In your case you want:
Solution 1
EDIT: edited to match the comments
The following awk limits the substitution to a given substring (e.g. 'POSA_____POSB'):
echo "OOPS POSA_____POSB" | awk '{str="POSA_____POSB"}; {gsub(/[POSAB]/,"&#",str)}; {gsub(/POSA_____POSB/, str); print $0} '
If your input consists only of the matched string, try this:
echo "POSA_____POSB" | awk '{gsub(/[POSAB]/,"&#");}1'
Explanation:
The separate '{}' for each action and the explicit print are for clarity's sake.
gsub accepts 3 arguments, gsub(pattern, substitution [, target]), where target must be a variable (gsub changes it in place and stores the result there).
We use a variable named 'str' and initialize it with the value (your string) before doing any substitutions.
The second gsub puts the modified str back into $0 (the whole record/line).
Expressions are greedy by default: they match the longest string possible.
[] introduces a set of characters to be matched: every occurrence of any character in the set is matched. The expression above tells awk to match each occurrence of any of "POSAB".
Your first regexp does not work as expected because you told sed to match POS followed by one of [AB] (the whole string at once).
In the other expression you told it to match any single character (including "_") when you used '.' (dot).
If you want to generalize this solution you may use [[:alnum:]_], which matches any of [a-zA-Z0-9_] (GNU awk and GNU sed also accept \w for this), or [a-z], [A-Z], [0-9] to match lowercase letters, uppercase letters and digits respectively.
Solution 2
Note that you can negate character sets with [^], so [^_] would also work in this particular case.
Explanation:
Negation means: match anything but the characters between '[]'. The '^' character must come first, right after the opening '['.
Sidenotes:
A bracket expression such as [POSAB] already matches exactly one character at a time, so no quantifier is needed; [POSAB]{1} is equivalent, while [POSAB]? would also match the empty string.
Also note that some implementations of sed need the -r (or -E) switch to use extended (more complicated) regexps.

With the given example you can use
echo "POSA_____POSB" | sed -r 's/POS([AB])/P#O#S#\1#/g'
This will fail for more complicated expressions.
When your input contains no \v or \r characters, you can use them as temporary markers:
echo "POSA_____POSB" |
sed -r 's/POS([AB])/\v&\r/g; :loop;s/\v([^\r])/\1#\v/;t loop; s/[\v\r]//g'
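Since the question asked for sed or awk, here is a sketch of a more general awk approach that avoids marker characters: it walks through the line with match(), appends # to every character of each match, and copies the text in between unchanged. The function name hash_matches and the variable names are hypothetical; the regular expression is passed as the first argument and the text is read from standard input.

```shell
# hash_matches is a hypothetical helper: $1 is an extended regular expression,
# the text to process is read from standard input.
hash_matches() {
  awk -v re="$1" '{
    rest = $0
    out = ""
    # Stop on an empty match so the loop always makes progress.
    while (match(rest, re) && RLENGTH > 0) {
      hit = substr(rest, RSTART, RLENGTH)
      gsub(/./, "&#", hit)                         # append # to every character of the match
      out = out substr(rest, 1, RSTART - 1) hit    # keep the text before the match as-is
      rest = substr(rest, RSTART + RLENGTH)
    }
    print out rest
  }'
}

printf '%s\n' 'POSA_____POSB' | hash_matches 'POS[AB]'
```

Note that awk -v interprets backslash escapes in the assigned value, so a regex containing backslashes would need them doubled.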

Linux shell extracting substring between matching patterns

Let's say I have a string poskek|gfgfd|XLSE|a1768|d234|uijjk and I want to extract just the LSE part.
I only know that there will be |X directly before LSE, and | directly after the part I am interested in (LSE).

The other answer using sed should work, but I always find sed to be a bit awkward for regex selection, as it's really intended for replacement (hence why either side of the pattern needs to be flanked with .* and the part you actually want needs to be in parentheses). Here's a solution using grep:
grep -Po '\|X\K[^|]+'
-P signals grep to use Perl's regex engine which is more advanced
-o only prints the matching part of the line
\|X match a literal vertical bar and a capital X
\K forget what has currently been matched (do not include it in the final output)
[^|]+ one or more characters other than vertical bars

As a pure bash solution, please try:
str='poskek|gfgfd|XLSE|a1768|d234|uijjk'
ext=${str#*|X}
ext=${ext%%|*}
echo "$ext"
If regex is available, following also works:
if [[ $str =~ .*\|X([^|]+) ]]; then
echo "${BASH_REMATCH[1]}"
fi
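One caveat with the two parameter expansions is that they still print something when the input has no |X marker at all. A minimal sketch of a guarded version follows; between_markers is a hypothetical name, and returning a non-zero status on a missing marker is a design choice made here, not something the original answer specifies.

```shell
# between_markers is a hypothetical helper for the |X ... | layout described
# in the question; it returns 1 when the input has no |X marker.
between_markers() {
  case $1 in
    *"|X"*) ;;            # marker present, carry on
    *) return 1 ;;        # no marker: report failure instead of printing the wrong field
  esac
  rest=${1#*|X}                 # strip through the first |X
  printf '%s\n' "${rest%%|*}"   # strip from the next | onward
}

between_markers 'poskek|gfgfd|XLSE|a1768|d234|uijjk'
```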

echo 'poskek|gfgfd|XLSE|a1768|d234|uijjk' | sed -n 's/.*|X\([^|]\+\).*/\1/p'
That ought to do the trick.
Explained:
sed -n will not print anything unless specified
s/ - search and replace
.*|X - match everything up to and including |X
\([^|]\+\) - capture multiple (at least one) character that isn't a |
.* - match the rest of the text (just to "eat it up")
/\1/p - Replace all matched text with the first capture, and print

For this particular case, you could do the rather unconventional:
awk '$1=="X"{$1="";print}' FS= OFS= RS=\|

try this
echo 'poskek|gfgfd|XLSE|a1768|d234|uijjk' |
awk -F "|" '{for(i=1;i<=NF;++i) printf "%s", (substr($i,1,1)=="X"?substr($i,2):"")}'
where
-F is the field separator => '|'
NF is number of fields

replace more than one special character with sed

I'm a newbie with regex, so sed is giving me a headache.
I need help to replace all special characters in the given company names with "-".
So this is the given string:
FML Finanzierungs- und Mobilien Leasing GmbH & Co. KG
I want the result:
FML-Finanzierungs-und-Mobilien-Leasing-GmbH-&-Co-KG
I tried the following:
nr=$(echo "$name" | sed -e 's/ /-/g')
This replaces all spaces with -, but what is the right expression to replace the others? My own searches on Google have not been very successful.

That depends on what you consider to be a special character -- I say this because you appear to consider & a regular character but not ., which seems a bit odd. Anyway, I imagine something of the form
nr=$(echo "$name" | sed 's/[^[:alnum:]&]\+/-/g')
would serve you best. Here [^[:alnum:]&] matches any character that is not alphanumeric or &, and [^[:alnum:]&]\+ matches a sequence of one or more such characters, so the sed call replaces all such sequences in $name with a hyphen. If there are other characters that you consider regular, add them to the set. Note that the handling of umlauts and suchlike depends on your locale.
Also note that echo may cause trouble if $name begins with a hyphen (it could be parsed as options for echo), so if you can tether yourself to bash,
nr=$(sed 's/[^[:alnum:]&]\+/-/g' <<< "$name")
might be more robust.
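A sketch that combines both points in a function that does not depend on here-strings: printf replaces echo, and -E replaces the GNU-specific \+. The name slugify is hypothetical, and stripping leading and trailing hyphens is an extra assumption about what the result should look like.

```shell
# slugify is a hypothetical helper name; & is kept because the question keeps it.
slugify() {
  # printf avoids the echo option-parsing problem for names starting with -.
  printf '%s\n' "$1" |
    sed -E 's/[^[:alnum:]&]+/-/g; s/^-+//; s/-+$//'
}

slugify 'FML Finanzierungs- und Mobilien Leasing GmbH & Co. KG'
```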

Apparently you want to remove - and . and then replace spaces with -.
This would do it, by saying sed -e 'one thing' -e 'another thing':
$ echo "$name" | sed -e 's/[-.]//g' -e 's/ /-/g'
FML-Finanzierungs-und-Mobilien-Leasing-GmbH-&-Co-KG
Note that we enclose within square brackets all the characters that we want to treat equally: [-.] means either - or . (inside brackets the dot is literal and needs no escaping; outside them it would match any character).

Does this help you?
awk -vOFS=- '{gsub(/[.-]/,"");$1=$1}1' <<< "$name"
FML-Finanzierungs-und-Mobilien-Leasing-GmbH-&-Co-KG
gsub(/[.-]/,"") removes . and -
-vOFS=- sets the output field separator to -
$1=$1 rebuilds the line so that it uses the new field separator
1 prints the line.
To get it into a variable:
nr=$(awk -vOFS=- '{gsub(/[.-]/,"");$1=$1}1' <<< "$name")

Try this way also:
echo "$name" | sed 's/ \|- \|\. /-/g'
Output:
FML-Finanzierungs-und-Mobilien-Leasing-GmbH-&-Co-KG

Replace slash in Bash

Let's suppose I have this variable:
DATE="04/Jun/2014:15:54:26"
Therein I need to replace / with \/ in order to get the string:
"04\/Jun\/2014:15:54:26"
I tried tr as follows:
echo "04/Jun/2014:15:54:26" | tr '/' '\\/'
But this results in: "04\Jun\2014:15:54:26".
It does not satisfy me. Can anyone help?

No need to use an echo + a pipe + sed.
A simple parameter expansion is enough and faster:
echo ${DATE//\//\\/}
#> 04\/Jun\/2014:15:54:26

Use sed for substitutions:
sed 's#/#\\/#g' < filename.txt > newfilename.txt
You usually use "/" as the delimiter instead of "#", but any character works as long as it is used consistently.
I am writing this on a Windows PC so I hope it is right; you may have to escape the slashes with another slash.
sed explained: -e introduces the editing expression, and -i edits the file in place (give it a suffix, as in -i.bak, to keep a backup automatically).
sed -e s/STRING_TO_REPLACE/STRING_TO_REPLACE_IT/g index.html
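To make the roles of the options concrete, here is a small sketch that edits a scratch file in place and keeps a backup. The file name dates.txt and the .bak suffix are arbitrary choices for the example; attaching the suffix directly to -i is the form that, as far as I know, both GNU and BSD sed accept.

```shell
# Work in a scratch directory so no real file is touched.
workdir=$(mktemp -d)
printf '%s\n' '04/Jun/2014:15:54:26' > "$workdir/dates.txt"

# -i.bak edits dates.txt in place and keeps the original as dates.txt.bak.
sed -i.bak -e 's#/#\\/#g' "$workdir/dates.txt"

edited=$(cat "$workdir/dates.txt")
backup=$(cat "$workdir/dates.txt.bak")
printf 'edited: %s\nbackup: %s\n' "$edited" "$backup"
rm -rf "$workdir"
```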

here you go:
kent$ echo "04/Jun/2014:15:54:26"|sed 's#/#\\/#g'
04\/Jun\/2014:15:54:26
Your tr line was not correct; you may have misunderstood what tr does: tr 'abc' 'xyz' changes a->x, b->y, c->z, it does not change the whole string abc->xyz.

You can also escape the slashes, with a slightly less readable solution than with hashes:
echo "04/Jun/2014:15:54:26" | sed 's/\//\\\//g'

This has not been said in other answers so I thought I'd add some clarifications:
tr uses two sets of characters for replacement, and the characters from the first set are replaced with those from the second set in a one-to-one correspondence. The manpage states that
SET2 is extended to length of SET1 by repeating its last character as necessary. Excess characters of SET2 are ignored.
Example:
echo abca | tr ab de # produces decd
echo abca | tr a de # produces dbcd, 'e' is ignored
echo abca | tr ab d # produces ddcd, 'd' is interpreted as a replacement for 'b' too
When using sed for substitutions, you can use a character other than '/' as the delimiter, which will make your expression clearer (I like to use ':', @n34_panda proposed '#' in their answer). Don't forget to use the g flag to replace all occurrences: sed 's:/:\\/:g' with quotes or sed s:/:\\\\/:g without (backslashes have to be escaped twice).
Finally your shortest solution will probably be @Luc-Olivier's answer, involving substitution, in the following form (don't forget to escape forward slashes too when part of the expected pattern):
echo ${variable/expected/replacement} # will replace one occurrence
echo ${variable//expected/replacement} # will replace all occurrences
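Applied to the date string from the question, the two forms could look like the sketch below. The variable names are made up for the example; the pattern \/ is an escaped forward slash and the replacement \\/ produces a backslash followed by a slash.

```shell
#!/usr/bin/env bash
date_str='04/Jun/2014:15:54:26'

first_only=${date_str/\//\\/}    # single slash after the name: first match only
all_slashes=${date_str//\//\\/}  # double slash after the name: every match

printf '%s\n' "$first_only" "$all_slashes"
```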
