Using sed to replace one line (that might change) with another

I want to run a script that changes a line in the HTML code, indicating when the page was last updated. So for instance, I have the line
<d>This page was last updated on 29.04.2013 at 00:34 UTC</d>
and I am updating it now, so I want to replace that line with
<d>This page was last updated on 15.05.2013 at 15:50 UTC</d>
This is the only line in my source code that has the <d> tag, so hopefully that helps. I already have some code that generates the new string with the current date and time, but I can't figure out a way to replace the old one (which changes, so I don't know exactly what it is).
I've tried putting in a comment <!--date--> in the previous line, deleting the whole line that has <d> (with grep), and then putting in a new line after the comment that is the new string, but that fails. For example, if I want to just insert the string text after the comment, and use
sed -i 's/<!--date-->/<!--date-->text/' file.html
I get "invalid command code j". I think it might be because there are special characters like <, !, and > in the strings, but if I want to put in the date string above, I will have even more, like : and /. Thanks for any ideas on how to fix this.

This will change the text only on lines that contain <d>:
sed -i.bak "/<d>/s/on .* at [^<]*/on newdate at newtime/" file.html
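I've tested this with the BSD sed that ships with Mac OS X 10.8.3. BSD sed's -i always takes a backup-suffix argument, which is why -i.bak is used here. With a bare -i, sed takes the script as the suffix and then tries to parse the file name as commands, which is the likely cause of the "invalid command code" error in the question.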
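Another way around the differences between GNU and BSD sed -i is to not use -i at all: write the result to a temporary file and move it back. This is a minimal sketch under stated assumptions: the function name update_stamp and the file name page.html are hypothetical, the stamp is on the only line containing <d>, and the new stamp contains no /, & or backslash (true for the date format shown).

```shell
#!/bin/sh
# Sketch: refresh the "last updated" stamp without relying on sed -i.
# Assumes the stamp sits on the only line containing <d>, and that the
# new stamp text contains no '/', '&' or '\'.
update_stamp() {
    file=$1    # HTML file to edit
    stamp=$2   # e.g. "15.05.2013 at 15:50 UTC"
    # Same substitution as above, written to a temp file, then moved back.
    sed "/<d>/s/on .* at [^<]*/on $stamp/" "$file" > "$file.tmp" &&
        mv "$file.tmp" "$file"
}

# Usage (hypothetical file name), stamping the current UTC time:
# update_stamp page.html "$(date -u '+%d.%m.%Y at %H:%M UTC')"
```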

You don't need your <!--date--> hack. You can use regular expressions and a delimiter other than "/" in your sed command, so the slash in </d> needs no escaping:
sed -i.bak 's#<d>This page was last updated on.*</d>#<d>This page was last updated on 12.05.2013 at 00:38 UTC</d>#' whatever.html
Or, if you have your update in a variable called $replacement:
sed -i.bak "s#<d>This page was last updated on.*</d>#$replacement#" whatever.html
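If $replacement can itself contain #, & or a backslash, sed will misread it (in a replacement, & means "the whole match" and # would end the command here). A minimal sketch, assuming the replacement is a single line, escapes those three characters before building the command; escape_repl and stamp_filter are hypothetical names, and the filter reads stdin and writes stdout.

```shell
#!/bin/sh
# Escape the characters that are special in a sed replacement when '#'
# is the delimiter: backslash, '&' and '#'. Assumes a single-line value.
escape_repl() {
    printf '%s\n' "$1" | sed 's/[\\&#]/\\&/g'
}

# Replace the whole stamp element read from stdin with $1, taken literally.
stamp_filter() {
    repl=$(escape_repl "$1")
    sed "s#<d>This page was last updated on.*</d>#$repl#"
}

# Usage (hypothetical file names):
# stamp_filter "$replacement" < whatever.html > whatever.new
```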

When typing the command at an interactive shell prompt, try escaping characters the shell itself treats specially, such as ! (bash history expansion), like this:
! ===> \!

Related

Insert newline before/after match for TSV

I'm going grey trying to figure out how to accomplish some regex matching to insert new lines. Example input/output below...
Example TSV Data:
Name Monitoring Tags
i-RBwPyvq8wPbUhn495 enabled "some:tags:with:colons=some:value:with:colons-and-dashes/and/slashes/yay606-values-001 some:other:tag:with-colons-and-hypens=MACHINE NAME Name=NAMETAG backup=true"
i-sMEwh2MXj3q47yWWP enabled "description=RANDOM BUSINESS INT01 backup=true Name=SOMENAME"
Desired Output:
Name Monitoring Tags
i-RBwPyvq8wPbUhn495 enabled "some:tags:with:colons=some:value:with:colons-and-dashes/and/slashes/yay606-values-001
some:other:tag:with-colons-and-hyphens=MACHINE NAME
Name=NAMETAG
backup=true"
i-sMEwh2MXj3q47yWWP enabled "description=RANDOM BUSINESS INT01
backup=true
Name=SOMENAME"
I can guarantee that the key=value pairs within those quotes are separated by hard (literal) tabs. It may not look that way in the rendered Stack Overflow code block, but the tabs did carry over into the code block editor. The data under the Tags column is quoted so that, even though the pairs are tab-separated, they stay within the Tags column. For whatever reason, I can't get the desired result.
In my measly attempts, I've basically been capturing everything between the quotes as one unit, as if it weren't tab-separated, because of my use of wildcards: [TAB].*=.*[TAB] obviously doesn't work, because the greedy .* swallows everything between the first and last tab on each line. I've tried storing the pieces in capture groups without any success.
I'm looking for a unix toolset solution (sed, awk, perl and the like). Any/All help is appreciated!
This will work using any awk in any shell on any UNIX box:
$ awk 'match($0,/".*"/){str=substr($0,RSTART,RLENGTH); gsub(/\t/,"\n",str); $0=substr($0,1,RSTART-1) str substr($0,RSTART+RLENGTH)} 1' file
Name Monitoring Tags
i-RBwPyvq8wPbUhn495 enabled "some:tags:with:colons=some:value:with:colons-and-dashes/and/slashes/yay606-values-001
some:other:tag:with-colons-and-hypens=MACHINE NAME
Name=NAMETAG
backup=true"
i-sMEwh2MXj3q47yWWP enabled "description=RANDOM BUSINESS INT01
backup=true
Name=SOMENAME"
It extracts the quoted part of the current record (from the first " to the last "), replaces every tab in it with a newline, and then reassembles the record before it is printed.
You can try this with GNU sed (tested with version 4.4):
sed -E ':A;s/(".*)\t(.*")/\1\n\2/;tA' TSV_Data_File
With OSX sed, you can try this one.
I think the \t is ok.
sed -E '
:A
s/(".*)\t(.*")/\1\
\2/
tA
' TSV_Data_File
Brief explanation:
Capture the text inside the quotes.
Replace the last \t in it with \n (the greedy .* makes it the last one).
If a substitution occurred, jump back to A; otherwise continue.
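With awk, splitting records on " so that every even-numbered record is the text between a pair of quotes (ORS is also printed after the final record, so the output ends with one extra "):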
awk -v RS='"' 'NR%2==0{gsub("\t","\n")}1' ORS='"' TSV_Data_File
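The same split-on-quotes idea can also be done with FS instead of RS: with " as the field separator, every even-numbered field is text between a pair of quotes, and each line remains its own record, so ORS keeps its usual newline. A sketch, assuming quotes are balanced on each line and never escaped; tabs_in_quotes_to_newlines is a hypothetical name.

```shell
#!/bin/sh
# Turn tabs inside double-quoted fields into newlines.
# With FS = '"', fields 2, 4, 6, ... are the quoted parts; only those are
# edited, so tabs between columns (outside the quotes) are left alone.
tabs_in_quotes_to_newlines() {
    awk 'BEGIN { FS = OFS = "\"" }
         { for (i = 2; i <= NF; i += 2) gsub(/\t/, "\n", $i) }
         1' "$@"
}

# Usage: tabs_in_quotes_to_newlines TSV_Data_File
```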
This is basically ctac_'s sed loop converted to Perl:
perl -pe'1 while s/(".*)\t(.*")/$1\n$2/s' file.tsv
The \t could be replaced by \t\s* if you want just one newline for each tab plus any whitespace that follows it.
This might work for you (GNU sed):
sed 's/\S\+=\S\+/\n&/2g' file
This inserts a newline before the second and every later whitespace-free string that contains an =.

sed command to delete text until match is found for each line of a csv

I have a csv file and I am trying to delete all characters from the beginning of the line till it finds the first occurrence of "2015". I want to do this for each line in the csv file.
My csv file structure is as follows:
Field1 , Field2 , Field3 , Field4
sometext1 , 2015-07-15 , sometext2, sometext3
sometext1 , 2015-07-14 , sometext2, sometext3
sometext1 , 2015-07-13 , sometext2, sometext3
I cannot use the cut command or sed for the first occurrence of a comma because the text in the Field1 sometimes has commas in them too, which is making it complicated for parsing. I figured if I search for the first occurrence of the text 2015 for each line and replace all the preceding characters with nothing, then that should work.
FYI, I only want to do this for the FIRST occurrence of 2015. There is another text field with 2015 in it in another column, and I don't want any text before that to be affected.
For example, if my original line is:
sometext1,#015,2015-07-10,sometext2,2015,sometext3
I want it to return:
2015-07-10,sometext2,2015,sometext3
Does anyone know the sed command to do this?
Any help will be appreciated!
Thanks
Here is a way to do it with sed assuming "#####" never occurs in a line:
sed -e 's/2015/#####&/'|sed -e 's/.*#####//'
For example:
> echo sometext1,#015,2015-07-10,sometext2,2015,sometext3\
|sed -e 's/2015/#####&/'|sed -e 's/.*#####//'
2015-07-10,sometext2,2015,sometext3
The first sed command prefixes "#####" to the first occurrence of 2015, and the second sed command removes everything from the beginning of the line to the end of the "#####" prefix.
The basic reason for using this two stage method is that sed's regular expression matcher has only greedy wildcards that always pick the longest match and does not support lazy matching which picks the shortest match.
If "#####" may occur in a line a more unlikely string could be substituted for it such as "7z#dNjm_wG8a3!esu#Rhv=".
To do this with sed without Perl-style non-greedy operators, you need to mark the first instance with something you know won't be in the line, as Tris describes. However, that solution requires knowing what won't be in the file. Fortunately, you can guarantee that a newline won't be in the line, because a newline is what terminated the line. Thus, with GNU sed (which turns \n in the replacement into a newline), you can do something like:
sed 's/2015/\n&/;s/.*\n//' input.txt > output.txt
NOTE: this won't modify the header row which you would have to treat specially.
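The \n in a sed replacement is a GNU extension (BSD sed does not turn it into a newline). A portable sketch of the same idea uses awk's index(), which returns the position of the first occurrence, so no marker is needed. It passes the header row through untouched and, as an assumption about what you'd want, prints lines without 2015 unchanged; strip_before_first_2015 is a hypothetical name.

```shell
#!/bin/sh
# Drop everything before the first "2015" on each data line.
# Line 1 (the header) is printed as-is; lines without "2015" are unchanged.
strip_before_first_2015() {
    awk 'NR == 1 { print; next }
         { i = index($0, "2015"); print (i ? substr($0, i) : $0) }' "$@"
}

# Usage: strip_before_first_2015 input.txt > output.txt
```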

Find text enclosed by patterns using sed

I have a config file like this:
[whatever]
Do I need this? no!
[directive]
This lines I want
Very much text here
So interesting
[otherdirective]
I dont care about this one anymore
Now I want to match the lines in between [directive] and [otherdirective] without matching [directive] or [otherdirective].
Also if [otherdirective] is not found all lines till the end of file should be returned. The [...] might contain any number or letter.
Attempt
I tried this using sed like this:
sed -r '/\[directive\]/,/\[[[:alnum:]]+\]/!d'
The only problem with this attempt is that the first line of the output is [directive] and the last line is [otherdirective].
I know how to pipe this again to truncate the first and last line but is there a sed solution to this?
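You can use the range, as you were trying, and inside it use an empty regex, negated. An empty regex // reuses the last regular expression that was applied; on the first and last lines of the range that is the expression that matched them, so //! skips both edge lines: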
sed -n '/\[directive\]/,/\[otherdirective\]/ { //! p }' infile
It yields:
This lines I want
Very much text here
So interesting
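The range above names the closing section explicitly. Since the question says the next header may be any [...] and that the output should run to the end of the file if no header follows, here is an awk sketch that treats every line starting with [ as a section header. It assumes headers start in column 1 with no trailing whitespace; section_body is a hypothetical name.

```shell
#!/bin/sh
# Print the body of section [$1], excluding its header and the next header.
# Any line beginning with '[' is a header; if no header follows,
# printing continues to end of file.
section_body() {
    name=$1
    shift
    awk -v sec="[$name]" '
        /^\[/ { inside = ($0 == sec); next }  # each header switches the flag on or off
        inside                                # print while inside the wanted section
    ' "$@"
}

# Usage: section_body directive infile
```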
Here is a nice way with awk to get section of data.
awk -v RS= '/\[directive\]/' file
[directive]
This lines I want
Very much text here
So interesting
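Setting RS to the empty string (RS=) puts awk in paragraph mode: the file is divided into records separated by blank lines.
So when searching for [directive], it prints that whole record.
Normally a record is one line, but because RS (the record separator) is changed, a record is a whole block. This assumes the sections are separated by blank lines, and the [directive] line itself is printed too.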
Okay, damn, after more tries I found the solution, or at least one solution:
sed -rn '/\[buildout\]/,/\[[[:alnum:]]+\]/{
/\[[[:alnum:]]+\]/d
p }'
Is this what you want?
\[directive\](.*?)\[
(This is a PCRE-style regex; sed has no lazy .*? quantifier.)

Delete all lines without # textmate regex

I have a huge file that I need to filter out all lines (comma delimited file) that do not contain an email address (determining that by # character).
Right now what I have is this to find all lines containing the # sign:
.*,.*,.*#.*,.*$
Basically, you have 4 values and the 3rd value has the email address.
The "Replace with:" value would be empty.
You have about 10 different ways to do this in TextMate and even more from the command line. Here are some of the easier ways...
From TextMate:
Command-control-t, start typing some part of the command "Copy Non-Matching Lines into New Document", use # (nothing else) for the pattern.
Same as above, except the command you're looking for is "Distill Document / Selection"
Find and select a # symbol. Then do the same as above, but search for the command "Strip Lines Matching Selection/Clipboard". You may not have it, as I may have developed this one myself.
From the command line:
Type one of the following commands, replacing FILE with the filename, including the filepath if it's not in your current working directory. The filtered content can be found in FILE-new.
Using egrep: egrep -v '#' FILE > FILE-new
Using sed: cat FILE | sed -e "/#/D" > FILE-new
For both of the above, use diff to see what you accomplished: diff FILE{,-new}
That should probably do, I'm guessing...
try replace ^[^#]*$ with nothing. Alternatively, grep the file with your regex and redirect the result into a new file.
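Since the question says the address is always in the third of the four comma-separated values, awk can test just that field instead of the whole line. A sketch, assuming no value contains a comma of its own and keeping the # marker used throughout this question; keep_with_marker_in_field3 and the file names are hypothetical.

```shell
#!/bin/sh
# Keep only the lines whose third comma-separated field contains '#'.
# Assumes no field contains an embedded comma.
keep_with_marker_in_field3() {
    awk -F, '$3 ~ /#/' "$@"
}

# Usage: keep_with_marker_in_field3 input.csv > output.csv
```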

How to remove nonnumeric junk from a file

Here's some output from less:
487451
487450<A3><BA>1<A3><BA>1
487449<A3><BA>1<A3><BA>1
487448<A3><BA>1<A3><BA>1
487447<A3><BA>1<A3><BA>1
487446<A3><BA>1<A3><BA>1
487445<A3><BA>1<A3><BA>1
484300<A3><BA>1<A3><BA>1
484299<A3><BA>1<A3><BA>1
484297<A3><BA>1<A3><BA>1
484296<A3><BA>1<A3><BA>1
484295<A3><BA>1<A3><BA>1
484294<A3><BA>1<A3><BA>1
484293<A3><BA>1<A3><BA>1
483496
483495
483494
483493
483492
483491
I see a bunch of nonprintable characters here. How do I remove them using sed/tr?
My try was 's/\([0-9][0-9]*\)/\1/g', but it doesn't work.
EDIT: Okay, let's go further down the source. The numbers are extracted from this file:
487451"><img src="Manage/pic/20100901/Adidas running-429.JPG" alt="Adidas running-429" height="120" border="0" class="BK01" onload='javascript:if(this.width>160){this.width=160}' /></a></td>
487450"><img src="Manage/pic/20100901/Adidas fs 1<A3><BA>1-060.JPG" alt="Adidas fs 1<A3><BA>1-060" height="120" border="0" class="BK01" onload='javascript:if(this.width>160){this.width=160}' /></a></td>
The first line is perfectly normal and is what most of the lines look like. The second is "corrupted". I'd just like to extract the number at the beginning (using 's/\([0-9][0-9]*\).*/\1/g'), but somehow the nonprintables get into the regex, which should stop at ".
EDIT II: Here's a clarification: There are no brackets in the text file. These are character codes of nonprintable characters. The brackets are there because I copied the file from less. Mac's Terminal, on the other hand, uses ?? to represent such characters. I bet xterm on my Ubuntu would print that white oval with a question mark.
Classic job for either sed's or Unix's tr command.
sed 's/[^0-9]//g' "$file"
(Anything that is not a digit is deleted; the newline ending each line is not part of sed's pattern space, so the lines stay separate.)
tr -cd '0-9\012' < "$file" > "$file.1"
Delete (-d) the complement (-c) of the digits and newline...
You missed the bit where you match the rest of the line (the [^0-9]* after the group):
sed 's/\([0-9][0-9]*\)[^0-9]*/\1/g'
Try this sed command:
sed 's/^\([0-9][0-9]*\).*$/\1/' file.txt
OUTPUT (running above command on the input file you provided)
487451
487450
487449
487448
487447
487446
487445
484300
484299
484297
484296
484295
484294
484293
483496
483495
483494
483493
483492
483491
If you know the crap will always be inside brackets, why not delete that crap?
sed 's/<[^>]*>//g'
EDIT: Thanks, Mike, that makes sense. In that case, how about:
sed -E 's/([0-9]+).*/\1/g'
If the data is always like the sample, deleting from the less-than sign to the end of the line would work fine:
sed -i "s/<.*$//" file
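One likely reason the .* in the question's own command stopped early: the bytes A3 BA are not valid UTF-8, and in a UTF-8 locale GNU sed's . does not match invalid byte sequences, while BSD sed on macOS typically refuses such input with "illegal byte sequence". A sketch that forces the C locale, where every byte is one character, and keeps only the leading digits; leading_digits is a hypothetical name.

```shell
#!/bin/sh
# Keep only the leading run of digits on each line.
# LC_ALL=C makes sed treat the input as plain bytes, so '.' also matches
# bytes that are not valid UTF-8 (such as A3 BA).
leading_digits() {
    LC_ALL=C sed 's/^\([0-9]*\).*/\1/' "$@"
}

# Usage: leading_digits file.txt > numbers.txt
```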