Sed wildcards- replace in the middle of certain characters - regex

I have lines like these:
"RU_28" "CDM_279" "CDI_45"
"RU_567" "CDM_528" "CDI_10000"
I want to obtain the result below:
"RU_28" "CDM_Unusued" "CDI_45"
"RU_567" "CDM_Unusued" "CDI_10000"
I want to do this for every line in the file.
I'm using this command:
sed 's/\"CDM_\w*\"/\"Unusued\"/g' File1.txt > File2.txt
It doesn't seem to work.
Thanks in advance!!!

You can use:
sed -i.bak 's/"\(CDM_\)[^"]*"/"\1Unused"/' file1.txt

You're not actually leaving "CDM_" on the right side. Your substitution says "Replace CDM_ and any number of word characters, quotes included, with Unusued." The common way to do what you actually want is to put parentheses around the part of the pattern you want to keep, and then use a backreference to say where it goes in the replacement. In this case, you just need a single backreference, written \1:
sed 's/"\(CDM_\)\w*"/"\1Unusued"/g' File1.txt > File2.txt
Note the backslashes before the parentheses on the left side; they can be omitted when sed is run with extended regexps (-r in GNU sed, I think), but as written they're necessary so that sed treats the parentheses as grouping rather than as literal characters.
Edit: I've updated the command in response to the accurate comment by Birei, who noted that the escapes before the double quotes were extraneous. (Note that the ones before the parentheses are still necessary.)
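To see the backreference at work without touching any files, here is a minimal sketch that feeds the two sample lines from the question through the substitution. The variable names input and result are just for illustration, and [^"]* is used instead of \w* so it does not depend on GNU extensions.

```shell
# Sample data copied from the question.
input='"RU_28" "CDM_279" "CDI_45"
"RU_567" "CDM_528" "CDI_10000"'

# \(CDM_\) captures the prefix, [^"]* consumes everything up to the closing
# quote, and \1 puts the captured prefix back in front of the new text.
result=$(printf '%s\n' "$input" | sed 's/"\(CDM_\)[^"]*"/"\1Unusued"/')
printf '%s\n' "$result"
```

Only the CDM_ field changes; the RU_ and CDI_ fields pass through untouched because the pattern is anchored on the literal "CDM_ prefix.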

Related

Using ampersand in sed

I have a csv file full of lines like the following:
Aity Chel Jenni,Hendaland 229,2591 TE Amsterdam
I want to create a sed pattern, for use in an automated batch script, that changes lines in this format into the following format:
Aity Chel Jenni,Hendaland 229,2591 TE, Amsterdam
With a bit of research, I found out that I had to write a regex and then use an ampersand (&) in the replacement to stand for whatever the regex matched.
I have tried the following:
sed 's/([1-9] [A-Z]{2}/&,/' file1 >file2
And have been trying variants of that trying to get the regexes down, but it doesn't seem to change anything.
Am I making a mistake in the usage of the ampersand or is my regex wrong?
Reading through the internet I can't seem to wrap my head around this function, can someone give me any examples/explain to me how to properly do this?
You are saying
sed 's/([1-9] [A-Z]{2}/&,/' file1 >file2
       ^
But you don't have to capture with () to use &. Instead, just say:
sed 's/[1-9] [A-Z]\{2\}/&,/' file
Note you need to escape the elements in the { } quantifier, unless you use -r:
sed -r 's/[1-9] [A-Z]{2}/&,/' file
Try the following:
sed -r 's:[0-9] [A-Z]{2}\b:&,:' file > out
As for your own pattern, it is missing the closing parenthesis. Also, iirc, in sed's default (basic) syntax you need to escape ( and ) for them to group; unescaped, they match literal parentheses.
The -r option enables extended regular expressions in sed, where the {2} quantifier works without backslashes.
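A small sketch comparing the two spellings on the sample line from the question may help. It assumes a sed that accepts -E for extended regexps (GNU and BSD sed both do; -r is the older GNU spelling), and the variable names are only illustrative.

```shell
line='Aity Chel Jenni,Hendaland 229,2591 TE Amsterdam'

# Basic regex: the braces of the quantifier need backslashes.
bre=$(printf '%s\n' "$line" | sed 's/[0-9] [A-Z]\{2\}/&,/')

# Extended regex: the same quantifier without backslashes.
ere=$(printf '%s\n' "$line" | sed -E 's/[0-9] [A-Z]{2}/&,/')

printf '%s\n%s\n' "$bre" "$ere"
```

In both cases & expands to the whole match ("1 TE"), so the replacement &, simply appends a comma to it. No capture group is involved.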

sed replace exact match

I want to change some names in a file using sed. This is what the file looks like:
#! /bin/bash
SAMPLE="sample_name"
FULLSAMPLE="full_sample_name"
...
Now I only want to change sample_name, and not full_sample_name, using sed.
I tried this:
sed s/\<sample_name\>/sample_01/g ...
I thought \< and \> could be used to find an exact match, but when I use this, nothing is changed.
Adding single quotes ('') helped to change only the sample_name. However, there is another problem now: my situation is a bit more complicated than explained above, since my sed command is embedded in a loop:
while read SAMPLE
do
name=$SAMPLE
sed -e 's/\<sample_name\>/$SAMPLE/g' /path/coverage.sh > path/new_coverage.sh
done < $1
So sample_name should be changed with the value attached to $SAMPLE. However when running the command sample_name is changed to $SAMPLE and not to the value attached to $SAMPLE.
I believe \< and \> work with GNU sed; you just need to quote the sed command:
sed -i.bak 's/\<sample_name\>/sample_01/g' file
In GNU sed, the following command works:
sed 's/\<sample_name\>/sample_01/' file
The only difference here is that I've enclosed the command in single quotes. Even when it is not necessary to quote a sed command, I see very little disadvantage to doing so (and it helps avoid these kinds of problems).
Another way of achieving what you want more portably is by adding the quotes to the pattern and replacement:
sed 's/"sample_name"/"sample_01"/' script.sh
Alternatively, the syntax you have proposed also works in GNU awk:
awk '{sub(/\<sample_name\>/, "sample_01")}1' file
If you want to use a variable in the replacement string, you will have to use double quotes instead of single, for example:
sed "s/\<sample_name\>/$var/" file
Variables are not expanded within single quotes, which is why you are getting the name of your variable rather than its contents.
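The difference between the two quoting styles can be checked with a quick sketch like the one below. It assumes GNU sed (for \< and \>) and uses a two-line stand-in for the script from the question; the names script, single and double are made up for the example.

```shell
# Stand-in for the file from the question.
script='SAMPLE="sample_name"
FULLSAMPLE="full_sample_name"'
var=sample_01

# Single quotes: the shell leaves $var alone, so sed inserts the literal text $var.
single=$(printf '%s\n' "$script" | sed 's/\<sample_name\>/$var/')

# Double quotes: the shell expands $var before sed runs (GNU sed assumed for \< \>).
double=$(printf '%s\n' "$script" | sed "s/\<sample_name\>/$var/")

printf '%s\n---\n%s\n' "$single" "$double"
```

One caveat with the double-quoted form: if the variable's value can contain / or &, sed will interpret those characters, so the value needs escaping or a different delimiter.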
@user1987607
You can do it the following way:
sed 's/"sample_name"/"sample_01"/g'
Including the literal double quotes in the pattern makes it match only the exact quoted string, so "full_sample_name" is left alone. The whole expression is wrapped in single quotes so that the shell passes the double quotes through to sed instead of stripping them.
/g is for global replacement.
If sample_name can also occur unquoted inside a longer word, such as ifsample_name, and you do not want to replace that occurrence, include the surrounding spaces in the pattern:
sed 's/ sample_name / sample_01 /g'
That way only the standalone word is replaced. For example, a pattern built this way replaces the word "the" in a text file but not the "the" inside words like "thereby".
If you are interested in replacing only the first occurrence on each line, then this would work fine:
sed 's/"sample_name"/"sample_01"/'
Hope it helps

How to remove the milliseconds from timestamps with sed?

My input file is as follows:
12/13/2011,07:14:13.724,12/13/2011 07:14:13.724,231.56.3.245,LasVegas,US
I wish to get the following:
12/13/2011,07:14:13,12/13/2011 07:14:13,231.56.3.245,LasVegas,US
I tried this, but with no success:
sed "s/[0-9]{2}\:[0-9]{2}\:[0-9]{2}\(\.[0-9]{1,3}\)/\1/g" input_file.csv > output.csv
sed 's/\(:[0-9][0-9]\)\.[0-9]\{3\}/\1/g' input_file.csv > output.csv
You were almost there. In classic sed, you have to use backslashes in front of parentheses and braces to make them into metacharacters. Some versions of sed have an option (such as -E or -r) that inverts this, so that the braces and parentheses are metacharacters by default, but that's not reliable across platforms.
Also (strong recommendation): use single quotes around the sed command. Otherwise, the shell gets a crack at interpreting those backslashes (and any $ signs, etc) before sed sees it. Usually, this confuses the coder (and especially the maintaining coder). In fact, use single quotes around arguments to programs whenever you can. Don't get paranoid about it - if you need to interpolate a variable, do so. But single-quoting is generally easier to code, and ultimately easier to understand.
I chose to work on just one time unit; you were working on three. Ultimately, given systematically formed input data, there is no difference in the result - but there is a (small) difference in the readability of the script.
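As a quick check of the one-time-unit approach, here is a sketch run against the sample line from the question. The variable names are illustrative, and \{1,3\} is used so that one to three millisecond digits are accepted, which is an assumption about the data rather than something the question states.

```shell
line='12/13/2011,07:14:13.724,12/13/2011 07:14:13.724,231.56.3.245,LasVegas,US'

# Keep ":SS" (group 1) and drop the following ".mmm".
# The IP address is safe because none of its dots follow a ":NN".
result=$(printf '%s\n' "$line" | sed 's/\(:[0-9][0-9]\)\.[0-9]\{1,3\}/\1/g')
printf '%s\n' "$result"
```

Anchoring on the colon is what keeps the pattern from eating parts of 231.56.3.245.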
Try:
sed 's,\(:[0-9]\{2\}\)\.[0-9]\{3\},\1,g'
Your version of sed may accept \d in place of [0-9], but GNU sed does not (there \d introduces a decimal character code), so [0-9] is the safer choice.
You were close, but some characters need a backslash in sed to act as operators (in my version, at least): {, }, ( and ), but not :. So you need to escape them with a backslash.
Also, \1 stands for the text matched between the parentheses, so the parentheses should enclose the first part (up to the seconds), not the milliseconds.
A modification of your version could be:
sed "s/\([0-9]\{2\}:[0-9]\{2\}:[0-9]\{2\}\)\.[0-9]\{1,3\}/\1/g" input_file.csv > output.csv
This might work for you:
sed 's/\....//;s/\....//' input_file.csv >output_file.csv
Since the sed solution has already been posted, here is an alternate awk solution:
[jaypal:~/Temp] cat inputfile
12/13/2011,07:14:13.724,12/13/2011 07:14:13.724,231.56.3.245,LasVegas,US
[jaypal:~/Temp] awk -F"," -v ORS="," '
{for(i=1;i<NF;i+=1)
if (i==2||i==3) {sub(/\..*/,"",$i);print $i}
else print $i;printf "%s\n", $NF}' inputfile
12/13/2011,07:14:13,12/13/2011 07:14:13,231.56.3.245,LasVegas,US
Explanation:
Set the Field Separator to , and the Output Record Separator to ,.
Using a for loop, we loop over each field except the last.
Using an if statement, we apply the substitution when the loop reaches the second and third fields.
If the field is not the 2nd or 3rd, we just print it unchanged.
Lastly, since the for loop stops before NF, we print $NF (the last field) separately, followed by a newline. This avoids printing a , after the last field.
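For comparison, a shorter variation on the same idea (not the code from the answer above) is to edit fields 2 and 3 in place and let awk rebuild the record with OFS. This is only a sketch under the assumption that the timestamps always sit in the second and third comma-separated fields.

```shell
line='12/13/2011,07:14:13.724,12/13/2011 07:14:13.724,231.56.3.245,LasVegas,US'

# FS splits on commas and OFS joins with commas once a field has been modified.
# sub() strips a trailing ".digits" from fields 2 and 3 only.
result=$(printf '%s\n' "$line" | awk 'BEGIN { FS = OFS = "," }
{ sub(/\.[0-9]+$/, "", $2); sub(/\.[0-9]+$/, "", $3); print }')
printf '%s\n' "$result"
```

Because the IP address lives in field 4, it is never passed to sub() and so cannot be truncated by accident.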

using sed to copy lines and delete characters from the duplicates

I have a file that looks like this:
#"Afghanistan.png",
#"Albania.png",
#"Algeria.png",
#"American_Samoa.png",
I want it to look like this
#"Afghanistan.png",
#"Afghanistan",
#"Albania.png",
#"Albania",
#"Algeria.png",
#"Algeria",
#"American_Samoa.png",
#"American_Samoa",
I thought I could use sed to do this but I can't figure out how to store something in a buffer and then modify it.
Am I even using the right tool?
Thanks
You don't have to get tricky with regular expressions and replacement strings: use sed's p command to print the line intact, then modify the line and let it print implicitly:
sed 'p; s/\.png//'
Glenn Jackman's response is OK, but it also doubles the rows which do not match the expression.
This one, instead, doubles only the rows which match the expression:
sed -n 'p; s/\.png//p'
Here, -n stands for "print nothing unless explicitly printed", and the p flag in s/\.png//p prints the line again if a substitution was made, and does nothing otherwise.
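The difference between the two commands only shows up when the file contains a line without .png, so here is a small sketch with one such line added. The "plain line" entry and the variable names are made up for the illustration.

```shell
input='#"Afghanistan.png",
#"Albania.png",
plain line'

# p prints every line, then the implicit print outputs it again after s///,
# so non-matching lines are doubled too.
all=$(printf '%s\n' "$input" | sed 'p; s/\.png//')

# With -n, the second copy appears only when the substitution succeeded.
only=$(printf '%s\n' "$input" | sed -n 'p; s/\.png//p')

printf '%s\n---\n%s\n' "$all" "$only"
```

In the first output "plain line" appears twice; in the second it appears once.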
That is pretty easy to do with sed, and you do not even need to use the hold space (the sed auxiliary buffer). Given the input file below:
$ cat input
#"Afghanistan.png",
#"Albania.png",
#"Algeria.png",
#"American_Samoa.png",
you should use this command:
sed 's/#"\([^.]*\)\.png",/&\
#"\1",/' input
The result:
$ sed 's/#"\([^.]*\)\.png",/&\
#"\1",/' input
#"Afghanistan.png",
#"Afghanistan",
#"Albania.png",
#"Albania",
#"Algeria.png",
#"Algeria",
#"American_Samoa.png",
#"American_Samoa",
This command is just a substitution (s///). It matches anything starting with #" followed by non-period chars ([^.]*) and then by .png",. The non-period chars before .png", are also captured using the group brackets \( and \), so we can reuse what was matched by this group. So, this is the to-be-replaced regular expression:
#"\([^.]*\)\.png",
Next comes the replacement part of the command. The & just inserts everything that was matched by #"\([^.]*\)\.png", into the changed content. If it were the only element of the replacement part, nothing would change in the output. However, following the & there is a newline character, represented by the backslash \ followed by an actual newline, and on the new line we add the #" string followed by the content of the first group (\1) and then the string ",.
This is just a brief explanation of the command. Hope this helps. Also, note that you can use \n in the replacement to represent a newline in some versions of sed (such as GNU sed). It makes for a more concise and readable command:
sed 's/#"\([^.]*\)\.png",/&\n#"\1",/' input
I prefer this over Carles Sala's and Glenn Jackman's:
sed '/\.png/p;s/\.png//'
Could just say it's personal preference.
Or one can combine both versions and apply the duplication only to lines matching the required pattern:
sed -e '/^#".*\.png",/{p;s/\.png//;}' input

select part of filename using regex

I got a file that looks like
dcdd62defb908e37ad037820f7 /sp/dir/su1/89/asga.gz
7d59319afca23b02f572a4034b /sp/dir/su2/89/sfdh.gz
ee1d443b8a0cc27749f4b31e56 /sp/dir/su3/89/24.gz
33c02e311fd0a894f7f0f8aae4 /sp/dir/su4/89/dfad.gz
43f6cdce067f6794ec378c4e2a /sp/dir/su5/89/adf.gz
2f6c584116c567b0f26dfc8703 /sp/dir/su6/895/895.gz
a864b7e327dac1bb6de59dedce /sp/dir/su7/895/895.gz
How do I use sed to substitute all the su* directories with a single value, with something like
sed "s/REGEXP/newfolder/g" myfile
Thanks in advance
I think you want
sed 's/su./newfolder/g'
If you actually want to keep the number in su1...su7 as a part of newfolder (for example newfolder1...newfolder7), you can do:
sed 's/su\(.\)/newfolder\1/g'
It also depends on how strict you want your patterns to be. The above will match su followed by any character and do the replacement. On the other hand, a command like s#/su\([0-9]\)/#/newfolder\1/#g will only match /su followed by a digit, followed by /. So you may need to adjust your pattern accordingly.
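To make the strictness point concrete, here is a sketch that runs both patterns over a path where su also appears inside another directory name. The path /sp/su3dir/su2/89/sfdh.gz is a hypothetical example, not one of the lines in the question.

```shell
path='/sp/su3dir/su2/89/sfdh.gz'

# Loose: "su" plus any one character, anywhere in the line.
loose=$(printf '%s\n' "$path" | sed 's/su./newfolder/g')

# Strict: a whole path component of the form /suN/, keeping the digit via \1.
strict=$(printf '%s\n' "$path" | sed 's#/su\([0-9]\)/#/newfolder\1/#g')

printf '%s\n%s\n' "$loose" "$strict"
```

The loose pattern rewrites su3dir as well, while the strict one only touches the su2 component.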
$ sed -e 's|/su[^/]*|/newfolder|' /tmp/files
dcdd62defb908e37ad037820f7 /sp/dir/newfolder/89/asga.gz
7d59319afca23b02f572a4034b /sp/dir/newfolder/89/sfdh.gz
...
If you want to get rid of the checksums as well:
$ sed -r -e 's|/su[^/]*|/newfolder|' -e 's/^[^ ]+ +//' /tmp/files
/sp/dir/newfolder/89/asga.gz
/sp/dir/newfolder/89/sfdh.gz
...
su[0-9] will match su followed by a single digit.
sed requires a dirty amount of metacharacter escaping, and some of this may be slightly off. Note that + only works as a quantifier in extended mode, hence the -r:
sed -i -r -e 's/\/su[^\/]+\//\/newFolder\//g' myfile
I vote for Wayne Conrad's answer as the most likely to be what the OP wants, but I'd suggest using an alternate character for the sed expression separator, thus:
sed 's|/su[^/]*|/newfolder|' /tmp/files
That makes it a bit cleaner.
Note also that the trailing 'g' is probably not wanted.
Use awk. Since the path has a delimiter, /, you can split on it; column 4 is then the one you want to change. So if you have paths like /sp/su3dir/su2/89/sfdh.gz, su3dir will not be affected.
awk -F"/" '{$4="newfolder";}1' OFS="/" file