sed: cannot seem to match pattern to line

If I have a sed script like this:
35185222p
And run it as:
sed -n -f seddy.sed infile.xml
It correctly prints out the dodgy line of XML I want to fix.
If I change my script to:
35185222s#^.*$#<load address='11b38c56' size='08' />#p
It doesn't make the match (i.e. no output is produced). What have I got wrong?
OK: I think I get this now. Unfortunately, the corruption in this line of the original file means some characters won't match a . (dot), so how do I fix that?
Further update: this is what the line looks like when I cut and paste it:
<load address='11c1�����ze='08' />

Try the sed c command to change the contents of the line:
35185222c\<load address='11b38c56' size='08' />
I frankly don't know why the regex ^.*$ would not match on that line. My guess is that it has something to do with your locale and character encodings, but it seems like it has to be a bug either way.
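A minimal sketch of the c command on a three-line stand-in file (the sample lines and the line number 2 are made up for illustration; the real script would use 35185222). As far as I know, the one-line form above relies on a GNU sed extension, so this sketch uses the more portable form with the replacement text on its own line:

```shell
# Replace line 2 wholesale; no regex has to match the damaged bytes.
changed=$(printf 'first\nBROKEN LINE\nthird\n' |
    sed "2c\\
<load address='11b38c56' size='08' />")
printf '%s\n' "$changed"
```

Because c addresses the line by number and never inspects its contents, it is unaffected by whatever bytes the corrupted line contains.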

The real issue appears to be a clash of locales.
Running
LANG=C sed -f seddy.sed input.xml
fixes the problem. Of course, I could have used the c command instead.
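To illustrate the locale point, here is a small sketch that builds a line containing two bytes (0xA3 0xBA, chosen arbitrarily) that are not valid UTF-8, then runs the original substitution in the C locale. It uses LC_ALL rather than LANG because LC_ALL overrides the other locale variables; the input line is a made-up approximation of the corrupted one.

```shell
# In the C locale every byte counts as one character, so . matches the
# stray bytes and ^.*$ covers the whole line.
fixed=$(printf "<load address='11c1\243\272ze='08' />\n" |
    LC_ALL=C sed -n "s#^.*\$#<load address='11b38c56' size='08' />#p")
printf '%s\n' "$fixed"
```

In a UTF-8 locale the same command may print nothing, since GNU sed's . does not match bytes that fail to decode as a character.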

35185222s#[^].*[$]#<load address='11b38c56' size='08' />#p
^ and $ should be escaped, or at least placed between [ ]; otherwise their special meaning (^ = beginning, $ = end) is used, and there is nothing before a beginning nor after an end.
Be careful also with the ': it depends on the string delimiter used around your sed script, and it may need to be escaped too.

Related

White spaces in sed search string

I want to substitute a String from a file which is:
# - "server1"
My first attempt was something like this:
sed -i 's/#\ -\ "\server1"\.*/ChangedWord/g' file
But I get an error if I try it like this.
So there has to be another way to handle whitespace. I guess I have to use \s or [[:space:]], but somehow I am not able to make it work.

I think you are complicating the expression too much. This should be enough:
sed 's/^#[[:space:]]*-[[:space:]]*"server1".*/ChangedWord/' file
It looks for lines starting with #, followed by 0 to n whitespace characters, then -, then 0 to n whitespace characters, then "server1", and then anything. When a line matches, it replaces the whole line with ChangedWord.
Note I am using [[:space:]] to match the spaces, since it is a more compatible way (thanks Tom Fenech in comments).
Note also there is no need to use g in the sed expression, because the pattern can occur just once per line.
Test
$ cat a
hello
# - "server1"
hello# - "server1"
$ sed 's/^#[[:space:]]*-[[:space:]]*"server1".*/ChangedWord/' a
hello
ChangedWord
hello# - "server1"

The actual fault was the missing escaping from the double quotes:
ssh -i file root@IP sed 's/^#[[:space:]]*-[[:space:]]*\"server1\".*/ChangedWord/' file
That did it for me. Thanks for all your support

rghome is right, you don't need those backslashes in front of spaces as the expression is wrapped in quotes. In fact, they're causing the error: sed is telling you that \<Space> is not a valid option. Just remove them and it should work as expected:
sed -i 's/# - "server1"/ChangedWord/' file

sed replace exact match

I want to change some names in a file using sed. This is what the file looks like:
#! /bin/bash
SAMPLE="sample_name"
FULLSAMPLE="full_sample_name"
...
Now I only want to change sample_name & not full_sample_name using sed
I tried this
sed s/\<sample_name\>/sample_01/g ...
I thought \<> could be used to find an exact match, but when I use this, nothing is changed.
Adding single quotes ('') helped to change only sample_name. However, there is another problem now: my situation is a bit more complicated than explained above, since my sed command is embedded in a loop:
while read SAMPLE
do
name=$SAMPLE
sed -e 's/\<sample_name\>/$SAMPLE/g' /path/coverage.sh > path/new_coverage.sh
done < $1
So sample_name should be changed with the value attached to $SAMPLE. However when running the command sample_name is changed to $SAMPLE and not to the value attached to $SAMPLE.

I believe \< and \> work with gnu sed, you just need to quote the sed command:
sed -i.bak 's/\<sample_name\>/sample_01/g' file

In GNU sed, the following command works:
sed 's/\<sample_name\>/sample_01/' file
The only difference here is that I've enclosed the command in single quotes. Even when it is not necessary to quote a sed command, I see very little disadvantage to doing so (and it helps avoid these kinds of problems).
Another way of achieving what you want more portably is by adding the quotes to the pattern and replacement:
sed 's/"sample_name"/"sample_01"/' script.sh
Alternatively, the syntax you have proposed also works in GNU awk:
awk '{sub(/\<sample_name\>/, "sample_01")}1' file
If you want to use a variable in the replacement string, you will have to use double quotes instead of single, for example:
sed "s/\<sample_name\>/$var/" file
Variables are not expanded within single quotes, which is why you are getting the name of your variable rather than its contents.
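Putting the double-quote advice together with the loop from the question, a sketch might look like this. The temporary directory, samples.txt, and the coverage_NAME.sh output names are hypothetical; it matches the surrounding quotes (the portable variant above) instead of using \< and \>, and writes one output file per sample so that later iterations do not overwrite earlier ones.

```shell
#!/bin/sh
# Hypothetical file names; adjust to your layout.
workdir=$(mktemp -d)
cat > "$workdir/coverage.sh" <<'EOF'
SAMPLE="sample_name"
FULLSAMPLE="full_sample_name"
EOF
printf '%s\n' sample_01 sample_02 > "$workdir/samples.txt"

while IFS= read -r name; do
    # Double quotes let the shell expand $name before sed sees the script.
    # Matching the surrounding quotes keeps full_sample_name untouched.
    sed "s/\"sample_name\"/\"$name\"/" "$workdir/coverage.sh" \
        > "$workdir/coverage_$name.sh"
done < "$workdir/samples.txt"

result=$(cat "$workdir/coverage_sample_02.sh")
printf '%s\n' "$result"
rm -rf "$workdir"
```

This assumes the sample names contain no /, & or backslash; any of those would need escaping before being placed in the replacement.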

@user1987607
You can do this the following way:
sed s/"sample_name"/sample_01/g
where having "sample_name" in quotes " " matches the exact string value.
/g is for global replacement.
If "sample_name" occurs like this ifsample_name and you want to replace that as well
then you should use the following:
sed s/"sample_name "/"sample_01 "/g
So that it replaces only the desired word. For example the above syntax will replace word "the" from a text file and not from words like thereby.
If you are interested in replacing only the first occurrence, then this would work fine:
sed s/"sample_name"/sample_01/
Hope it helps

regexp find and replace: bash variables inside sed

I would like to remove this sequence when present at the beginning of the line:
ATCGGAAGAGCACACGTCTGAACTCCAGTCACTGACCAATCTCGTATGCCGTCTTCTGCTTG followed by at least 3 A characters.
Both the sequence and the run of A characters should be removed, and the rest of the file should be preserved.
My input files look like this:
@M00946:3:000000000-A2WF2:1:1101:18115:1962 1:N:0:2
GATCGGAAGAGCACACGTCTGAACTCCAGTCACTGACCAATCTCGTATGCCGTCTTCTGCTTGAAAAAAAAAACATTTTCTTTCTTACTTCGTTCACTTTCCACTTCTTTCTCCCTATCTTCCCCCTTCTGTCTGCCCCAGCTGTCTATCCCACTTATTGTCTCCCCCCACTGCCCCACACTCCTACCTTCTTCATCTTCACCTAACACCTCCCGCTCCCTCCTTATCGTCTCTTATCCTTTCCTTGTTCC
+
????????DDDDDDDDGGGGGGHHIIIIHHHIIIIFHIIIH/CGFHHIIIIHEDHHIIIIHI=5EEGFEHHEC+5,,4#,#,,....--..+77,,.6..6.....7.4..7.76=..-5.>.4-)134-.5....-3*))0***1*********10*0**01*1*)''..0***.)0'))*****00*11******01***0****0*)**0)'''...*0)0*11********1****1*0********
@M00946:3:000000000-A2WF2:1:1101:19888:2900 1:N:0:2
GATCGGAAGAGCACACGTCTGAACTCCAGTCACTGACCAATCTCGTATGCCGTCTTCTGCTTGAAAAAAAAAAAACACAAATACCGTTCCAATATCTTTTTGTTTCATGTCTAATAAC
+
<<??????BB?BBBBBCAFFFCFHF;>EFCDFGFFHFBGHCA=FHA>EFGEE7CF>F?FFHB=?EEGF>>DH5<)++,++,4,,4+=:,,,,5,,,,,,,,),33?,3,3,3,,,,33
I was trying to use a script, replace.sh, which looks like this:
file=$1;
adapter_sequence=$2;
sed -r "s/${adapter_sequence}A{3}//" $file
from the command line:
./replace.sh file.fastq GATCGGAAGAGCACACGTCTGAACTCCAGTCACTGACCAATCTCGTATGCCGTCTTCTGCTTG
It did not work. Any help in any script language will be appreciated.

I believe you have $1 and $2 reversed. Have it like this:
adapter_sequence=$2
sed "s/$adapter_sequence//" $1
"In the ideal case I would like to remove all adapter sequences starting at the beginning of line followed by at least three A letters"
Try this sed:
sed -r "s/^${adapter_sequence}A{3,}//" file
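A quick way to check the anchored pattern is to feed it a few made-up reads. This sketch uses -E, which both GNU and BSD sed accept for extended regular expressions, in place of -r; the three test reads are invented for illustration.

```shell
adapter=GATCGGAAGAGCACACGTCTGAACTCCAGTCACTGACCAATCTCGTATGCCGTCTTCTGCTTG

# Three test reads: adapter + 5 A's, adapter + only 2 A's,
# and adapter + 5 A's that does not start at the beginning of the line.
trimmed=$(printf '%s\n' \
    "${adapter}AAAAACATTTTC" \
    "${adapter}AACATTTTC" \
    "TT${adapter}AAAAACATTTTC" |
    sed -E "s/^${adapter}A{3,}//")
printf '%s\n' "$trimmed"
```

Only the first read is trimmed (to CATTTTC); the other two are left unchanged. Note that this edits sequence lines only, so the matching FASTQ quality line keeps its original length.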

Regular Expression to parse Common Name from Distinguished Name

I am attempting to parse (with sed) just First Last from the following DN(s) returned by the DSCL command in OSX terminal bash environment...
CN=First Last,OU=PCS,OU=guests,DC=domain,DC=edu
I have tried multiple regexs from this site and others with questions very close to what I wanted... mainly this question... I have tried following the advice to the best of my ability (I don't necessarily consider myself a newbie...but definitely a newbie to regex..)
DSCL returns a list of DNs, and I would like to only have First Last printed to a text file. I have attempted using sed, but I can't seem to get the correct function. I am open to other commands to parse the output. Every line begins with CN= and then there is a comma between Last and OU=.
Thank you very much for your help!

I think all of the regular expression answers provided so far are buggy, insofar as they do not properly handle quoted ',' characters in the common name. For example, consider a distinguishedName like:
CN=Doe\, John,CN=Users,DC=example,DC=local
Better to use a real library able to parse the components of a distinguishedName. If you're looking for something quick on the command line, try piping your DN to a command like this:
echo "CN=Doe\, John,CN=Users,DC=activedir,DC=local" | python3 -c 'import ldap; import sys; print(ldap.dn.explode_dn(sys.stdin.read().strip(), notypes=1)[0])'
(depends on having the python-ldap library installed). You could cook up something similar with PHP's built-in ldap_explode_dn() function.

Two cut commands is probably the simplest (although not necessarily the best):
DSCL | cut -d, -f1 | cut -d= -f2
First, split the output from DSCL on commas and print the first field ("CN=First Last"); then split that on equal signs and print the second field.

Using sed:
sed 's/^CN=\([^,]*\).*/\1/' input_file
^ matches start of line
CN= literal string match
\([^,]*\) everything until a comma
.* rest
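The sed pattern above can be tried on the DN from the question. The second command is a hedged variant, not part of the original answer, that also lets a backslash escape the following character, which covers the Doe\, John case raised in another answer; it does not attempt to handle values wrapped in double quotes.

```shell
dn_simple='CN=First Last,OU=PCS,OU=guests,DC=domain,DC=edu'
dn_escaped='CN=Doe\, John,CN=Users,DC=example,DC=local'

# The pattern from the answer: capture everything up to the first comma.
cn_simple=$(printf '%s\n' "$dn_simple" | sed 's/^CN=\([^,]*\).*/\1/')

# Variant (ERE via -E): a backslash followed by any character counts as
# part of the name, so an escaped comma does not end the capture.
cn_escaped=$(printf '%s\n' "$dn_escaped" |
    sed -E 's/^CN=(([^,\\]|\\.)*).*/\1/')

printf '%s\n' "$cn_simple" "$cn_escaped"
```

This prints First Last and Doe\, John; the backslash is kept in the second result, so strip it separately if you want the display form.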

http://www.gnu.org/software/gawk/manual/gawk.html#Field-Separators
awk -v RS=',' -v FS='=' '$1=="CN"{print $2}' foo.txt

I like awk too, so I print the substring from the fourth char:
DSCL | awk -F, '{print substr($1,4)}' > filterednames.txt

This regex will parse a distinguished name, giving name and val capture groups for each match.
When DN strings contain commas, they are meant to be quoted. This regex correctly handles both quoted and unquoted strings, and also handles escaped quotes in quoted strings:
(?:^|,\s?)(?:(?<name>[A-Z]+)=(?<val>"(?:[^"]|"")+"|[^,]+))+
Here it is nicely formatted:
(?:^|,\s?)
(?:
(?<name>[A-Z]+)=
(?<val>"(?:[^"]|"")+"|[^,]+)
)+
Here's a link so you can see it in action:
https://regex101.com/r/zfZX3f/2
If you want a regex to get only the CN, then this adapted version will do it:
(?:^|,\s?)(?:CN=(?<val>"(?:[^"]|"")+"|[^,]+))

How to remove nonnumeric junk from a file

Here's an output from less:
487451
487450<A3><BA>1<A3><BA>1
487449<A3><BA>1<A3><BA>1
487448<A3><BA>1<A3><BA>1
487447<A3><BA>1<A3><BA>1
487446<A3><BA>1<A3><BA>1
487445<A3><BA>1<A3><BA>1
484300<A3><BA>1<A3><BA>1
484299<A3><BA>1<A3><BA>1
484297<A3><BA>1<A3><BA>1
484296<A3><BA>1<A3><BA>1
484295<A3><BA>1<A3><BA>1
484294<A3><BA>1<A3><BA>1
484293<A3><BA>1<A3><BA>1
483496
483495
483494
483493
483492
483491
I see a bunch of nonprintable characters here. How do I remove them using sed/tr?
My try was 's/\([0-9][0-9]*\)/\1/g', but it doesn't work.
EDIT: Okay, let's go further down the source. The numbers are extracted from this file:
487451"><img src="Manage/pic/20100901/Adidas running-429.JPG" alt="Adidas running-429" height="120" border="0" class="BK01" onload='javascript:if(this.width>160){this.width=160}' /></a></td>
487450"><img src="Manage/pic/20100901/Adidas fs 1<A3><BA>1-060.JPG" alt="Adidas fs 1<A3><BA>1-060" height="120" border="0" class="BK01" onload='javascript:if(this.width>160){this.width=160}' /></a></td>
The first line is perfectly normal and what most of the lines are. The second is "corrupted". I'd just like to extract the number at the beginning (using 's/\([0-9][0-9]*\).*/\1/g'), but somehow the nonprintables get into the regex, which should stop at ".
EDIT II: Here's a clarification: There are no brackets in the text file. These are character codes of nonprintable characters. The brackets are there because I copied the file from less. Mac's Terminal, on the other hand, uses ?? to represent such characters. I bet xterm on my Ubuntu would print that white oval with a question mark.

Classic job for either sed's or Unix's tr command.
sed 's/[^0-9]//g' $file
(Anything that is not a digit - or newline - is deleted.)
tr -cd '0-9\012' < $file > $file.1
Delete (-d) the complement (-c) of the digits and newline...
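One thing worth checking before relying on this: on lines like the corrupted ones in the question, the digits after the junk bytes survive too. A small sketch, using two made-up lines modelled on the less output (with 0xA3 0xBA as the junk bytes), shows the effect:

```shell
# Delete everything except digits and newlines, byte by byte.
digits_only=$(printf '487450\243\2721\243\2721\n483496\n' |
    LC_ALL=C tr -cd '0-9\012')
printf '%s\n' "$digits_only"
```

The first line comes out as 48745011 rather than 487450, because the two trailing 1 characters are digits and are kept. If only the leading number is wanted, a pattern anchored at the start of the line is the safer choice.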

You missed the bit where you match the rest of the line.
sed 's/\([0-9][0-9]*\)[^0-9]*/\1/g'
                      ^^^^^^^

Try this sed command:
sed 's/^\([0-9][0-9]*\).*$/\1/' file.txt
OUTPUT (running above command on the input file you provided)
487451
487450
487449
487448
487447
487446
487445
484300
484299
484297
484296
484295
484294
484293
483496
483495
483494
483493
483492
483491
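If the .* in that command fails to swallow the nonprintable bytes (which can happen in a UTF-8 locale when the bytes are not valid characters), running sed in the C locale is one way around it. A sketch on two made-up lines, again assuming the junk bytes are 0xA3 0xBA:

```shell
# Keep only the leading run of digits on each line. LC_ALL=C makes sed
# treat every byte as a character, so .* also covers the junk bytes.
leading=$(printf '487450\243\2721\243\2721\n483496\n' |
    LC_ALL=C sed 's/^\([0-9][0-9]*\).*$/\1/')
printf '%s\n' "$leading"
```

This prints 487450 and 483496, dropping the junk and the trailing digits after it.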

If you know the crap will always be inside brackets, why not delete that crap?
sed 's/<[^>]*>//g'
EDIT: Thanks, Mike that makes sense. In that case, how about:
sed -E 's/([0-9]+).*/\1/'

If the data always is like the sample, deleting from the less-than to the end of the line would work fine.
sed -i "s/<.*$//" file
