Sed replace character after every number

I want to replace a character after every integer number with sed.
Example:
444 d
should go:
444,d
EDIT: The answer of stribizhev helped me out to find a solution with sed (GNU sed) 4.2.2
sed -r 's/([0-9]+)./\1,/g'
replaces an arbitrary character after a number with a comma. The only problem is that a number at the end of the line also gets a comma, because the dot then consumes the number's last digit (444 becomes 44,).

You can use capturing groups to do that:
sed 's/\([0-9]\+\)./\1,/g'
or (since with GNU sed you can avoid all the escaped parentheses by using extended regular expressions)
sed -r 's/([0-9]+)./\1,/g'
Here is a demo showing what the regex does.
The [0-9]+ pattern matches an integer number (without decimals) even inside longer strings, including within longer words.
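The trailing-comma problem mentioned in the question's edit can probably be avoided by requiring the character after the number to be a non-digit. This is only a sketch: the [^0-9] variant and the function name replace_after_number are assumptions about the intended behaviour, not part of the answer above.

```shell
# Replace the single non-digit character that follows each integer with a comma.
# Because [^0-9] has to consume a real character, a number at the end of the
# line is left untouched (assumption: that is the behaviour the asker wants).
replace_after_number() {
  sed -E 's/([0-9]+)[^0-9]/\1,/g'
}

mid_line=$(printf '%s\n' '444 d' | replace_after_number)
end_of_line=$(printf '%s\n' 'abc 444' | replace_after_number)
printf '%s\n' "$mid_line" "$end_of_line"
```

The first line printed is 444,d and the second stays abc 444.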

Related

Replace only single instance of a character using sed

I need to replace only single instance of backslash.
Input: \\apple\\\orange\banana\\\\grape\\\\\
Output: \\apple\\\orangebanana\\\\grape\\\\\
Tried using sed 's/\\//g' which is replacing all backslashes
Note: The previous character to single backslash can be anything including alphanumeric or special characters. And it's a multiline text file.
Appreciate your help on this.

If you want to consider perl, then lookbehind and lookahead are exactly what you need here:
perl -pe 's~(?<!\\)\\(?!\\)~~g' file
\\apple\\\orangebanana\\\\grape\\\\\
Details
(?<!\\): Negative lookbehind to make sure that previous char is not \
\\: Match a \
(?!\\): Negative lookahead to make sure that next char is not \
If you want to use sed only then I suggest:
sed -E -e ':a' -e 's~(^|[^\\])\\([^\\]|$)~\1\2~g; ta' file
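The reason for the :a ... ta loop in the sed version may not be obvious, so here is a small sketch. The input a\b\c is a made-up example (not from the question) in which two single backslashes share the neighbouring character b.

```shell
# A single pass of s///g cannot remove two single backslashes that share a
# neighbouring character, because that character is consumed by the first match.
# The ':a' label plus 'ta' repeats the substitution until nothing changes.
input='a\b\c'   # hypothetical test input

one_pass=$(printf '%s\n' "$input" | sed -E 's~(^|[^\\])\\([^\\]|$)~\1\2~g')
looped=$(printf '%s\n' "$input" | sed -E -e ':a' -e 's~(^|[^\\])\\([^\\]|$)~\1\2~g; ta')

printf '%s\n' "$one_pass" "$looped"
```

Without the loop the second backslash survives (ab\c); with the loop both are removed (abc).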

When you want to replace at most one single backslash, you can use
sed -r 's/(.*[^\]|^)\\([^\].*|$)/\1\2/g'
The command is ugly due to the possibility of a line starting or ending with a backslash (need to include the possibility ^ and $).
When you want to get rid of all single backslashes in '\al\l \\sin\gle\slas\hes \\\on \\\\a \\\\\l\i\n\e\' , you can remove one backslash from every sequence of backslashes and afterwards put one back wherever at least one backslash is left:
sed -r 's/\\([\]*)/\1/g;s/([\]+)/\\\1/g'
or, as suggested by @potong,
sed -E 's/\\(\\*)/\1/g;s/(\\+)/\\&/g'
I like the solution, as it mimics someone who removes one backslash from every sequence of backslashes and then tries to undo that last operation. The "bug" in the undo is that the resulting output is missing the single backslashes.
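To see the two passes separately, the combined command can be split like this. It is only an illustration using the question's sample input; the variable names shortened and restored are made up here.

```shell
input='\\apple\\\orange\banana\\\\grape\\\\\'

# Pass 1: drop one backslash from every run (single backslashes vanish).
shortened=$(printf '%s\n' "$input" | sed -E 's/\\(\\*)/\1/g')

# Pass 2: give one backslash back to every run that survived.
restored=$(printf '%s\n' "$shortened" | sed -E 's/(\\+)/\\&/g')

printf '%s\n' "$shortened" "$restored"
```

After pass 1 the single backslash between orange and banana is gone and every other run is one shorter; pass 2 restores the other runs to their original length.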

With your shown samples, please try following sed code. Written and tested with GNU sed.
sed -E 's/^(\\\\[^\]*\\\\\\)([^\]*)\\(.*)/\1\2\3/' Input_file
Explanation: The -E option enables ERE (extended regular expressions). The command relies on sed's back references, which save each matched group so that it can be reused in the replacement. The 1st capturing group holds \\apple\\\, the 2nd holds orange, and the 3rd holds the rest of the line. Note that the \ between orange and banana is matched outside every capturing group, so it is dropped from the output, which is what the OP's required output shows.

This might work for you (GNU sed):
sed 's/\>\\\<//g' file
Delete a single \ between word boundaries.

How to use grep/sed/awk, to remove a pattern from beginning of a text file

I have a text file with the following pattern written to it:
TIME[32.468ms] -(3)-............."TEXT I WANT TO KEEP"
I would like to discard the first part of each line containing
TIME[32.468ms] -(3)-.............
To test the regular expression I've tried the following:
cat myfile.txt | egrep "^TIME\[.*\]\s\s\-\(3\)\-\.+"
This identifies correctly the lines I want. Now, to delete the pattern I've tried:
cat myfile.txt | sed s/"^TIME\[.*\]\s\s\-\(3\)\-\.+"//
but it just seems to be doing the cat, since it shows the content of the complete file and no substitution happens.
What am I doing wrong?
OS: CentOS 7

With your shown samples, please try following grep command. Written and tested with GNU grep.
grep -oP '^TIME\[\d+\.\d+ms\]\s+-\(\d+\)-\.+\K.*' Input_file
Explanation: Adding detailed explanation for above code.
^TIME\[ ##Matching string TIME from starting of value here.
\d+\.\d+ms\] ##Matching digits(1 or more occurrences) followed by dot digits(1 or more occurrences) followed by ms ] here.
\s+-\(\d+\)-\.+ ##Matching spaces (1 or more occurrences) followed by - digits(1 or more occurrences) - and 1 or more dots.
\K ##Using \K option of GNU grep to make sure previous match is found in line but don't consider it in printing, print next matched regex part only.
.* ##to match till end of the value.
2nd solution: Adding awk program here.
awk 'match($0,/^TIME\[[0-9]+\.[0-9]+ms\][[:space:]]+-\([0-9]+\)-\.+/){print substr($0,RSTART+RLENGTH)}' Input_file
Explanation: using match function of awk, to match regex ^TIME\[[0-9]+\.[0-9]+ms\][[:space:]]+-\([0-9]+\)-\.+ which will catch text which we actually want to remove from lines. Then printing rest of the text apart from matched one which is actually required by OP.

This awk using its sub() function:
awk 'sub(/^TIME[[][^]]*].*\.+/,"")' file
"TEXT I WANT TO KEEP"
If there is replacement, sub() returns true.

$ cut -d'"' -f2 file
TEXT I WANT TO KEEP

You may use:
s='TIME[32.468ms] -(3)-............."TEXT I WANT TO KEEP"'
sed -E 's/^TIME\[[^]]*].*\.+//' <<< "$s"
"TEXT I WANT TO KEEP"

The \s regex extension may not be supported by your sed.
In BRE syntax (which is what sed speaks out of the box) you do not backslash round parentheses - doing that turns them into regex metacharacters which do not match themselves, somewhat unintuitively. Also, + is just a regular character in BRE, not a repetition operator (though you can turn it into one by similarly backslashing it: \+).
You can try adding an -E option to switch from BRE syntax to the perhaps more familiar ERE syntax, but that still won't enable Perl regex extensions, which are not part of ERE syntax, either.
sed 's/^TIME\[[^][]*\][[:space:]][[:space:]]-(3)-\.*//' myfile.txt
should work on any reasonably POSIX sed. (Notice also how the minus character does not need to be backslash-escaped, though doing so is harmless per se. Furthermore, I tightened up the regex for the square brackets, to prevent the "match anything" regex you had .* from "escaping" past the closing square bracket. In some more detail, [^][] is a negated character class which matches any character which isn't (a newline or) ] or [; they have to be specified exactly in this order to avoid ambiguity in the character class definition. Finally, notice also how the entire sed script should normally be quoted in single quotes, unless you have specific reasons to use different quoting.)
If you have sed -E or sed -r you can use + instead of * but then this complicates the overall regex, so I won't suggest that here.
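To make the BRE/ERE difference concrete, here is a side-by-side sketch of the same substitution in both syntaxes. The sample line with two spaces after ] is an assumption based on the question's own \s\s, and [[:space:]]* is used in the BRE form so that it tolerates any amount of spacing.

```shell
# Sample line; the two spaces after "]" are assumed from the question's regex.
line='TIME[32.468ms]  -(3)-............."TEXT I WANT TO KEEP"'

# BRE (default): plain ( ) match themselves, and * is the only repetition used.
bre=$(printf '%s\n' "$line" | sed 's/^TIME\[[^][]*\][[:space:]]*-(3)-\.*//')

# ERE (-E): ( ) must be backslashed to be literal, and + means "one or more".
ere=$(printf '%s\n' "$line" | sed -E 's/^TIME\[[^][]*\][[:space:]]+-\(3\)-\.+//')

printf '%s\n' "$bre" "$ere"
```

Both commands should leave just "TEXT I WANT TO KEEP" (with the quotes).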

A simpler one for sed:
sed 's/^[^"]*//' myfile.txt

If the text you want to keep is always surrounded by quotes like this, and those are the only quotes in the lines starting with "TIME...", then:
sed -n '/^TIME/p' file | awk -F'"' '{print $2}'
should get the line starting with "TIME..." and print the text within the quotes.

Thanks all, for your help.
By the end, I've found a way to make it work:
echo 'TIME[32.468ms] -(3)-.............TEXT I WANT TO KEEP' | grep TIME | sed -r 's/^TIME\[[0-9]+\.[0-9]+ms\]\s\s-\(3\)-\.+//'
More generally,
grep TIME myfile.txt | sed -r 's/^TIME\[[0-9]+\.[0-9]+ms\]\s\s-\(3\)-\.+//'
Cheers,
Pedro

How to replace with one sed command first n letter to uppercase

I would like to convert the first n letters of a word to uppercase with a single sed command.
Example 'madrid' to 'MADrid'. (n=3)
I know how to change first letter to uppercase with this command:
sed -e "s/\b\(.\)/\U\1/g"
but I don't know how to adapt this command to my problem.
I tried to change
sed -e "s/\b\(.\)/\U\1/g"
to
sed -e "s/\b\(.\)/\U\3/g"
but this didn't work. I also googled and searched this site, but I couldn't find an answer to my exact problem.
Thank you.

I infer from your use of \U that you're using GNU sed:
n=3
echo 'madrid' | sed -r 's/\<(.{'"$n"'})/\U\1/g' # -> 'MADrid'
I've omitted the unnecessary -e option
I have added -r to enable support for extended regular expressions, which have more familiar syntax and also offer more features.
I'm using a single-quoted sed script with a shell-variable value spliced in so as to avoid confusion between what the shell expands up front and what is interpreted by sed itself.
\< is used instead of \b, because unlike the latter it only matches at the start of a word. (Thanks, Casimir et Hippolyte.)
The above replaces any 3 characters at the start of a word, however.
To limit it to at most $n letters:
sed -r 's/\<([[:alpha:]]{1,'"$n"'})/\U\1/g'
As for what you've tried:
The \3 in your attempt sed -e "s/\b\(.\)/\U\3/g" refers to the 3rd capture group (parenthesized subexpression, (...)) in the regex (which doesn't exist), it does not refer to 3 repetitions.
Instead, you have to make sure that your one and only capture group (which you can reference as \1 in the substitution) itself captures as many characters as desired - which is what the {<n>} quantifier is for; the related {<m>,<n>} construct matches a range of repetitions.
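Wrapped in a small function, the letters-only variant can be tried on several words at once. This is a sketch that assumes GNU sed (both \U and \< are GNU extensions); the function name upper_first_n is made up for the example.

```shell
# Requires GNU sed: \U and \< are GNU extensions.
upper_first_n() {
  # $1 = maximum number of leading letters per word to convert to uppercase
  sed -E 's/\<([[:alpha:]]{1,'"$1"'})/\U\1/g'
}

single=$(printf '%s\n' 'madrid' | upper_first_n 3)
several=$(printf '%s\n' 'madrid to oslo' | upper_first_n 3)

printf '%s\n' "$single" "$several"
```

Because the quantifier is {1,n}, a word shorter than n letters (here "to") is converted completely.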

This might work for you (GNU sed):
sed -r 's/[a-z]/&\n/'"$n"';s/^([^\n]*)\n/\U\1/' file
where $n is the number of letters to convert. Putting the question of word boundaries aside, this converts the first n letters in a-z, consecutive or not, to upper case, i.e. A-Z.
N.B. this is two sed commands not one!

Extract numbers from a string using sed and regular expressions

Another question for the sed experts.
I have a string representing a pathname that will have two numbers in it. An example is:
./pentaray_run2/Trace_220560.dat
I need to extract the second of these numbers - ie 220560
I have (with some help from the forums) been able to extract all the numbers together (ie 2220560) with:
sed "s/[^0-9]//g"
or extract only the first number with:
sed -r 's|^([^.]+).*$|\1|; s|^[^0-9]*([0-9]+).*$|\1|'
But what I'm after is the second number!! Any help much appreciated.
PS the number I'm after is always the second number in the string.

is this ok?
sed -r 's/.*_([0-9]*)\..*/\1/g'
with your example:
kent$ echo "./pentaray_run2/Trace_220560.dat"|sed -r 's/.*_([0-9]*)\..*/\1/g'
220560

You can extract the last numbers with this:
sed -e 's/.*[^0-9]\([0-9]\+\)[^0-9]*$/\1/'
It is easier to think this backwards:
1. From the end of the string, match zero or more non-digit characters
2. Match (and capture) one or more digit characters
3. Match at least one non-digit character
4. Match all the characters to the start of the string
Part 3 of the match is where the "magic" happens, but it also limits your matches to have at least a non-digit before the number (ie. you can't match a string with only one number that is at the start of the string, although there is a simple workaround of inserting a non-digit to the start of the string).
The magic is to counter-act the left-to-right greediness of the .* (part 4). Without part 3, part 4 would consume all it can, which includes the numbers, but with it, matching makes sure that it stops in order to allow at least a non-digit followed by a digit to be consumed by parts 1 and 2, allowing the number to be captured.
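The workaround for a number at the very start of the string can be sketched as follows, using the ERE form of the same regex. The function name last_number and the 42abc input are made up for the example.

```shell
last_number() {
  # Prepend a space so that a number at the very start of the line still has a
  # non-digit in front of it, then keep only the last run of digits.
  # Caveat: a line without any digits comes back with the extra leading space.
  sed -E 's/^/ /; s/.*[^0-9]([0-9]+)[^0-9]*$/\1/'
}

from_path=$(printf '%s\n' './pentaray_run2/Trace_220560.dat' | last_number)
leading=$(printf '%s\n' '42abc' | last_number)   # hypothetical input

printf '%s\n' "$from_path" "$leading"
```

This prints 220560 for the path from the question and 42 for the made-up input.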

If grep is welcome :
$ echo './pentaray_run2/Trace_220560.dat' | grep -oP '\d+\D+\K\d+'
220560
And more portable with Perl with the same regex :
echo './pentaray_run2/Trace_220560.dat' | perl -lne 'print $& if /\d+\D+\K\d+/'
220560
I think the approach is cleaner & more robust than using sed

This might work for you (GNU sed):
sed -r 's/([^0-9]*([0-9]*)){2}.*/\2/' file
The command above extracts the second number, whereas
sed -r 's/([^0-9]*([0-9]*)){1}.*/\2/' file
extracts the first.
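The repetition count can also be passed in as a parameter. This is a sketch only: the function name nth_number is made up, and it relies on \2 holding the digits captured in the last repetition of the outer group, as in the commands above.

```shell
nth_number() {
  # $1 = which number to extract (1 = first, 2 = second, ...)
  sed -E 's/([^0-9]*([0-9]*)){'"$1"'}.*/\2/'
}

path='./pentaray_run2/Trace_220560.dat'
first=$(printf '%s\n' "$path" | nth_number 1)
second=$(printf '%s\n' "$path" | nth_number 2)

printf '%s\n' "$first" "$second"
```

For the path from the question, the first number is 2 and the second is 220560.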

Insert space after period using sed

I've got a bunch of files that have sentences ending like this: \#.Next sentence. I'd like to insert a space after the period.
Not all occurrences of \#. lack a space, however, so my regex checks whether the next character after the period is a capital letter.
Because I'm checking one character after the period, I can't just replace \#. with \#. plus a space, and because I don't know which character follows the period, I'm stuck.
My command currently:
sed -i .bak -E 's/\\#\.[A-Z]/<SOMETHING IN HERE>/g' *.tex
How can I grab the last letter of the matching string to use in the replacement regex?
EDIT: For the record, I'm using a BSD version of sed (I'm using OS X) - from my previous question regarding sed, apparently BSD sed (or at least, the Apple version) doesn't always play nice with GNU sed regular expressions.

The right command should be this:
sed -i.bak -E "s/\\\#\.(\S)/\\\#. \1/g" *.tex
With it, you match any \#. followed by a non-whitespace character (\S) and insert a space (by replacing the whole match with '\#. ' plus the non-whitespace character just found). Note that \S is a GNU extension and may not be supported by BSD sed.

Use this sed command:
sed -i.bak -E 's/(\\#\.)([A-Z])/\1 \2/g' *.tex
OR better:
sed -i.bak -E 's/(\\#\.)([^ \t])/\1 \2/g' *.tex
which will insert space if \#. is not followed by any white-space character (not just capital letter).
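A variant using the POSIX class [[:space:]] instead of \t should behave the same on both GNU and BSD sed. It is shown here on a made-up line rather than in-place on *.tex files, and the function name add_space is hypothetical.

```shell
add_space() {
  # Group 1 keeps the \#. marker, group 2 keeps the character that follows it;
  # the replacement puts a space between the two.
  sed -E 's/(\\#\.)([^[:space:]])/\1 \2/g'
}

fixed=$(printf '%s\n' 'One\#.Two sentences\#. Three' | add_space)   # hypothetical input
printf '%s\n' "$fixed"
```

Only the first \#. gets a space added; the second one already has one and is left alone.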

This might work for you:
sed -i .bak -E 's/\\#\. ?/\\#. /g' *.tex
Explanation:
If there's a space there replace it with a space, otherwise insert a space.

I think the following would be correct:
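s/\\#\.([^[:space:]])/\\#. \1/g
Only replace the expression if it is not followed by a space. The following character is captured and put back with \1, so it is not lost (this form needs sed -E).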
