sed while loop text format - regex

I need to convert some lines to another format using this while loop.
while IFS= read -r line;
do
var=$(echo "$line" | grep -oE "http://img[0-9].domain.xy/t/[0-9][0-9][0-9]/[0-9][0-9][0-9]/" | uniq);
echo "$line" | sed -e 's|http://img[0-9].domain.xy/t/[0-9][0-9][0-9]/[0-9][0-9][0-9]/||g' -e "s|.*|&"${var}"|g" >> newFile;
done < file;
It changes this format:
<iframe src="http://domain.xy/load.php?file=2259929" frameborder="0" scrolling="no"></iframe>|http://img9.domain.xy/t/929/320/1_2259929.jpg;http://img9.domain.xy/t/929/320/2_2259929.jpg;http://img9.domain.xy/t/929/320/3_2259929.jpg;http://img9.domain.xy/t/929/320/4_2259929.jpg;http://img9.domain.xy/t/929/320/5_2259929.jpg;http://img9.domain.xy/t/929/320/6_2259929.jpg;http://img9.domain.xy/t/929/320/7_2259929.jpg;http://img9.domain.xy/t/929/320/8_2259929.jpg;http://img9.domain.xy/t/929/320/9_2259929.jpg;http://img9.domain.xy/t/929/320/10_2259929.jpg|13m5s
and gives me this output:
<iframe src="http://domain.xy/load.php?file=2259929" frameborder="0" scrolling="no"></iframe>|1_2259929.jpg;2_2259929.jpg;3_2259929.jpg;4_2259929.jpg;5_2259929.jpg;6_2259929.jpg;7_2259929.jpg;8_2259929.jpg;9_2259929.jpg;10_2259929.jpg|13m5s|http://img9.domain.xy/t/929/320/
That all works correctly.
But there is also a time value that I want to change: 13m5s to 00:13:5 or, better, 13m5s to 00:13:05.
I tried adding another grep + sed command after the loop.
while IFS= read -r line;
do
var=$(echo "$line" | grep -oE "http://img[0-9].domain.xy/t/[0-9][0-9][0-9]/[0-9][0-9][0-9]/" | uniq);
echo "$line" | sed -e 's|http://img[0-9].domain.xy/t/[0-9][0-9][0-9]/[0-9][0-9][0-9]/||g' -e "s|.*|&"${var}"|g" >> newFile;
done < file;
grep -oE "[0-9]*m[0-9]*[0-9]s" newFile | sed -e 's|^|00:|' -e s'|m|:|' -e s'|s||'
This gives me only the converted numbers, not the full line:
00:13:5
00:3:18
00:1:50
and so on.
How can I get the full line and change only 13m5s to 00:13:5?
If I just use sed after the while loop without grep, it changes the wrong letters and puts 00: at the beginning of every line.
What is the best way to handle that? I think it would be best to integrate the command into the existing loop, but I have tried many different variations without success.
Thanks for helping.

I broke your code apart into a few additional pieces to make it easier to understand what is going on. Here's the result, which I believe is correct:
# Read each field in to separate variables
while IFS='|' read iframe urls time; do
# Get the first URL from the ';'-separated list
url="${urls%%;*}"
# Get the base URL by matching up to the last '/' (and add it back since the match is exclusive)
base_url="${url%/*}"'/'
# Remove the base URL from the list of full URLs so only the filenames are left
files="${urls//$base_url/}"
# Parse the minute and second out from the '##m#s' string
IFS='ms' read min sec <<<"$time"
# Print the new line - note the numeric formatting in the third column
printf '%s|%s|00:%02d:%02d|%s\n' "$iframe" "$files" "$min" "$sec" "$base_url"
done <file
The lines that answer your specific request about how to turn 13m5s into 00:13:05 are these two:
IFS='ms' read min sec <<<"$time"
printf '%s|%s|00:%02d:%02d|%s\n' "$iframe" "$files" "$min" "$sec" "$base_url"
The read line uses IFS to tell it to split on the characters m or s making it able to easily read the minute and second variables.
The printf line, with 00:%02d:%02d specifically, formats the $min and $sec variables as zero-padded two-digit numbers.
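As a rough sketch of the same conversion in isolation, here is a variant that uses parameter expansion instead of read with IFS, so it also runs in a plain POSIX shell. The function name to_hms is made up for this example, and it assumes the input always has the form NmNs.

```shell
# to_hms is a hypothetical helper name; input is assumed to look like '13m5s'
to_hms() {
  min=${1%%m*}    # text before the first 'm'
  rest=${1#*m}    # text after the first 'm'
  sec=${rest%s}   # drop the trailing 's'
  # Strip a single leading zero so printf does not read '08' or '09' as octal
  case $min in 0?*) min=${min#0} ;; esac
  case $sec in 0?*) sec=${sec#0} ;; esac
  printf '00:%02d:%02d\n' "$min" "$sec"
}

to_hms 13m5s
to_hms 1m50s
```

The leading-zero handling is only needed if the input can contain values such as 3m08s; the sample data in the question does not.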

grep only outputs the lines that match the expression. Use sed's built-in line matching to restrict the substitution to certain lines:
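sed '/[0-9]*m[0-9]*[0-9]s/{s|^|00:|;s|m|:|;s|s||;}'
Note that this still prepends 00: to the whole line and replaces the first m and s on it, wherever they occur. A single substitution with capture groups touches only the time value: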
sed 's/\([0-9]*\)m\([0-9]*[0-9]\)s/00:\1:\2/'
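If the zero-padded form 00:13:05 is wanted, one possible extension of the single-substitution idea is to pad one-digit minutes and seconds first. This is only a sketch: it assumes the time sits in its own |-delimited field, as in the sample line, and the function name fix_time is invented for the example.

```shell
# fix_time is a hypothetical name; it reads lines on stdin and rewrites
# a '|NmNs|' field into '|00:MM:SS|', leaving the rest of the line alone.
fix_time() {
  sed -E \
    -e 's/\|([0-9])m([0-9]+)s\|/|0\1m\2s|/' \
    -e 's/\|([0-9]+)m([0-9])s\|/|\1m0\2s|/' \
    -e 's/\|([0-9]+)m([0-9]+)s\|/|00:\1:\2|/'
}

printf '%s\n' '<iframe src="x"></iframe>|1_2259929.jpg|13m5s|http://img9.domain.xy/t/929/320/' | fix_time
```

Anchoring on the surrounding | characters keeps the substitution away from any other m or s in the line.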

Related

How can I get my Perl one-liner to show only the first regex match in the file?

I have a file with this format:
KEY1="VALUE1"
KEY2="VALUE2"
KEY1="VALUE2"
I need a perl command to only get first occurrence of KEY1, ie VALUE1.
I'm using this command:
perl -ne 'print "$1" if /KEY1="(.*?)"/' myfile
But the result is:
VALUE1VALUE2
EDIT
The solution must be with perl command, because the system there is no other regex tool.
Add and last to your one-liner like so (extra quotes removed):
perl -ne 'print $1 and last if /KEY1="(.*?)"/' myfile
This works because -n switch effectively wraps your code in a while loop. Thus, if the pattern matches, print is executed, which succeeds and thus causes last to be executed. This exits the while loop.
You can also use the more verbose last LINE, which specifies the (implicit) label of the while loop that iterates over the input lines. This last form is useful for more complex code than you have here, such as the code involving nested loops.
You can exit after printing first match:
perl -ne '/KEY1="([^"]*)"/ && print ($1 . "\n") && exit' file
VALUE1
You can also use sed:
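sed -nE '/^KEY1="/{s/^KEY1="(.*)"/\1/p;q;}' file
The address /^KEY1="/ restricts the block to matching lines; inside it, p prints the substituted line and q quits, so only the first match is printed. A bare p;q without the address would quit after the first input line whether or not it matched.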
For the record, thanks to @Andy Lester's comment I also found a simple way to solve the problem with grep and cut, without the need for a regex:
grep -a -m1 'KEY1' file | cut -d "\"" -f2
which returns
VALUE1

Sed : print all lines after match

I got my research result after using sed :
zcat file* | sed -e 's/.*text=\(.*\)status=[^/]*/\1/' | cut -f 1 - | grep "pattern"
But it only shows the part that I cut. How can I print all lines after a match?
I'm using zcat so I cannot use awk.
Thanks.
Edited :
This is my log file :
[01/09/2015 00:00:47] INFO=54646486432154646 from=steve idfrom=55516654455457 to=jone idto=5552045646464 guid=100021623456461451463 num=6 text=hi my number is 0 811 22 1/12 status=new survstatus=new
My aim is to find all users that spam my site with their telephone numbers (using grep "pattern") then print all the lines to get all the information about each spam. The problem is there may be matches in INFO or id, so I use sed to get the text first.
Printing all lines after a match in sed:
$ sed -ne '/pattern/,$ p'
# alternatively, if you don't want to print the match:
$ sed -e '1,/pattern/ d'
Filtering lines when pattern matches between "text=" and "status=" can be done with a simple grep, no need for sed and cut:
$ grep 'text=.*pattern.* status='
You can use awk
awk '/pattern/,EOF'
n.b. don't be fooled: EOF is just an uninitialized variable, and by default 0 (false). So that condition cannot be satisfied until the end of file.
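A small sketch comparing the three range forms on made-up input. The word pattern stands in for the real search expression, and the awk line uses a literal 0 as the end condition, which behaves the same as the uninitialized EOF variable.

```shell
# Sample input; 'pattern' stands in for whatever you are searching for
input=$(printf '%s\n' 'one' 'two pattern' 'three' 'four')

# From the first matching line to the end, match included
from_match=$(printf '%s\n' "$input" | sed -n '/pattern/,$p')
# Same range, but the matching line itself is deleted too
after_match=$(printf '%s\n' "$input" | sed '1,/pattern/d')
# awk range whose end condition (0) is never true, so it runs to end of input
awk_range=$(printf '%s\n' "$input" | awk '/pattern/,0')

printf '%s\n' "$from_match"
```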
Perhaps this could be combined with all the previous answers using awk as well.
Maybe this is what you actually want? Find lines matching "pattern" and extract the field after text= up through just before status=?
zcat file* | sed -e '/pattern/s/.*text=\(.*\)status=[^/]*/\1/'
You are not revealing what pattern actually is -- if it's a variable, you cannot use single quotes around it.
Notice that \(.*\)status=[^/]* would match up through survstatus=new in your example. That is probably not what you want? There doesn't seem to be a status= followed by a slash anywhere -- you really should explain in more detail what you are actually trying to accomplish.
Your question title says "all line after a match" so perhaps you want everything after text=? Then that's simply
sed 's/.*text=//'
i.e. replace up through text= with nothing, and keep the rest. (I trust you can figure out how to change the surrounding script into zcat file* | sed '/pattern/s/.*text=//' ... oops, maybe my trust failed.)
The seldom used branch command will do this for you. Until you match, use n for next then branch to beginning. After match, use n to skip the matching line, then a loop copying the remaining lines.
cat file | sed -n -e ':start; /pattern/b match;n; b start; :match n; :copy; p; n ; b copy'
In your pipeline
zcat file* | sed -e 's/.*text=\(.*\)status=[^/]*/\1/' | cut -f 1 - | grep "pattern"
change the last two segments (cut -f 1 - | grep "pattern") so that it reads:
zcat file* | sed -e 's/.*text=\(.*\)status=[^/]*/\1/' | awk '$1 ~ "pattern" {print $0}'

Bash - how to put each line within quotation

I want to put each line within quotation marks, such as:
abcdefg
hijklmn
opqrst
convert to:
"abcdefg"
"hijklmn"
"opqrst"
How to do this in Bash shell script?
Using awk
awk '{ print "\""$0"\""}' inputfile
Using pure bash
while IFS= read -r FOO; do
printf '"%s"\n' "$FOO"
done < inputfile
where inputfile would be a file containing the lines without quotes.
If your file has empty lines, awk is definitely the way to go:
awk 'NF { print "\""$0"\""}' inputfile
NF tells awk to only execute the print command when the Number of Fields is more than zero (line is not empty).
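A quick sketch of the NF guard on made-up input. The second command is an assumed variation, not from the answer above: it quotes non-empty lines but passes empty ones through instead of dropping them.

```shell
# Sample input with an empty line in the middle
input=$(printf '%s\n' 'abcdefg' '' 'hijklmn')

# Quote non-empty lines and drop empty ones
quoted=$(printf '%s\n' "$input" | awk 'NF { print "\"" $0 "\"" }')

# Variation: quote non-empty lines, print empty ones unchanged
kept=$(printf '%s\n' "$input" | awk 'NF { $0 = "\"" $0 "\"" } 1')

printf '%s\n' "$quoted"
```

Note that NF is also zero for lines containing only whitespace, so those are treated as empty too.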
I use the following command:
xargs -I{lin} echo \"{lin}\" < your_filename
xargs takes standard input (redirected from your file), passes one line at a time to the {lin} placeholder, and then executes the command that follows, in this case an echo with escaped double quotes.
You can use the -i option of xargs to omit the name of the placeholder, like this:
xargs -i echo \"{}\" < your_filename
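Note that xargs does not consult IFS here: with -I (or -i) it splits its input on newlines. GNU xargs still interprets quotes and backslashes in the input, though, so lines containing them may need -d '\n'.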
This sed should work for ignoring empty lines as well:
sed -i.bak 's/^..*$/"&"/' inFile
or
sed 's/^.\{1,\}$/"&"/' inFile
Use sed:
sed -e 's/^\|$/"/g' file
More effort needed if the file contains empty lines.
I think sed and awk are the best solutions, but if you want to use just the shell, here is a small script for you.
#!/bin/bash
chr="\""
file="file.txt"
cp "$file" "$file._backup"
while IFS= read -r line
do
echo "${chr}$line${chr}"
done < "$file" > newfile
mv newfile "$file"
paste -d\" /dev/null your-file /dev/null
(not the nicest looking, but probably the fastest)
Now, if the input may contain quotes, you may need to escape them with backslashes (and then escape backslashes as well) like:
sed 's/["\]/\\&/g; s/.*/"&"/' your-file
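For illustration, the same sed script wrapped in a function and run on two made-up lines, one containing quotes and one containing a backslash. The name quote_line is invented for this sketch.

```shell
# quote_line is a hypothetical name; it reads lines on stdin, escapes
# embedded double quotes and backslashes, then wraps each line in quotes.
quote_line() {
  sed 's/["\]/\\&/g; s/.*/"&"/'
}

printf '%s\n' 'say "hi"' 'C:\temp' | quote_line
```

The order matters: escaping runs first, so the quotes added by the second substitution are not escaped themselves.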
This answer worked for me in mac terminal.
$ awk '{ printf "\"%s\",\n", $0 }' your_file_name
It should be noted that the text in double quotes and commas was printed out in terminal, the file itself was unaffected.
I used sed with two expressions to replace start and end of line, since in my particular use case I wanted to place HTML tags around only lines that contained particular words.
So I searched for the lines containing words held in the bla variable within the text file inputfile and replaced the beginning with <P> and the end with </P> (well, actually I did some longer HTML tagging in the real thing, but this will serve fine as an example).
Similar to:
$ bla=foo
$ sed -e "/${bla}/s#^#<P>#" -e "/${bla}/s#\$#</P>#" inputfile
<P>foo</P>
bar
$

remove part of string from each line

I have a text file, where each line is a single string of the format
/home/usr1/284.txt
The whole file is like
/home/usr1/284.txt
/home/usr1/361.txt
What I want is to remove /home/usr1/ and keep the file name, e.g., 284.txt
How to do that using linux/unix command?
sed -e 's!/home/usr1/!!' filename.txt
or
awk -F/ '{print $NF}' filename.txt
should do the trick. Note the use of ! instead of the more usual / as pattern delimiters in the sed example - it means you don't have to escape literal / characters in your pattern.
Since the fields in the file are fixed, you can simply do:
cut -b 12-
To skip the first 11 bytes of the input.
You could also use Perl, like so:
perl -pe 's,.*/,,' file.txt
Try this:
while read line; do basename "$line"; done < filename
The reciprocal of basename is dirname, in case you need the other part eventually.
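If starting a basename process for every line is a concern, the same split can be sketched with shell parameter expansion alone. The function name strip_dir is made up for this example, and the paths are assumed to contain at least one slash and no trailing slash.

```shell
# strip_dir is a hypothetical name; it reads paths on stdin and prints
# only the part after the last '/', without calling external commands.
strip_dir() {
  while IFS= read -r line; do
    printf '%s\n' "${line##*/}"
  done
}

printf '%s\n' '/home/usr1/284.txt' '/home/usr1/361.txt' | strip_dir

# The directory part (what dirname would give) is the complementary expansion
path=/home/usr1/284.txt
dir=${path%/*}
echo "$dir"
```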
Got bash?
read -d '' -a lines < input.txt
echo "${lines[@]##*/}"
There's more than one way to do it (TIMTOWTDI). In addition to the existing answers, I can think of two ways of going about this:
Use slash "/" as a field delimiter: reverse, cut the first field, then reverse again:
< filename.txt rev | cut -d/ -f 1 | rev
These are filenames, hence you can use GNU basename in combination with xargs (or GNU parallel) to, as you say, "keep the file name":
< filename.txt xargs basename -a
or
< filename.txt parallel -X basename -a

Cut out number in files named after a pattern

I've got the following files:
create_file_1.sql
create_file_2.sql
create_file_3.sql
create_file_4.sql
I'm iterating those files in a loop.
Now I want to get the number inside those files. I want to store the 1, 2, 3, … inside a variable in the loop.
How can I achieve this? How can I cut out this number?
P. S.: I want to achieve this with an AIX command.
Using sed:
[jaypal:~/Temp] echo "create_file_1.sql" | sed 's/.*_\([0-9][0-9]*\)\.sql/\1/'
1
Using bash:
[jaypal:~/Temp] var="create_file_1.sql"
[jaypal:~/Temp] tmp=${var%.*} # Removes the extension
[jaypal:~/Temp] var=${tmp##*_} # Removes portion till the last underscore
[jaypal:~/Temp] echo $var
1
Using awk:
[jaypal:~/Temp] echo "create_file_1.sql" | awk -v FS="[_.]" '{print $(NF-1)}'
1
Well ... It depends on how flexible you want it to be. If you can assume that the number is "the part between the second underscore and the first period after the second underscore", you can simply use:
NUMBER=$(echo $FILENAME | cut -d_ -f3 | cut -d. -f1)
assuming that $FILENAME holds the current filename, of course.
This uses cut to first take the string after the second underscore, then cutting that by taking the string leading up to the first period.
This, admittedly, does not use regular expressions which maybe you want based on your tags, but I find the above a bit easier to read for a simple case like this.
for filename in create_file_1.sql create_file_2.sql create_file_3.sql create_file_4.sql
do
i=$(echo $filename | cut -d_ -f3 | cut -d. -f1)
# do something with $i
done
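The loop can also be sketched without any external commands by using parameter expansion, which should work in any POSIX shell (including ksh, assuming that is what is available on AIX). The function name file_number is invented for this example.

```shell
# file_number is a hypothetical name; it prints the text between the
# last underscore and the extension, e.g. create_file_12.sql gives 12.
file_number() {
  tmp=${1%.*}                 # remove the extension
  printf '%s\n' "${tmp##*_}"  # remove everything up to the last underscore
}

for filename in create_file_1.sql create_file_2.sql create_file_3.sql create_file_4.sql
do
  i=$(file_number "$filename")
  echo "$filename -> $i"
done
```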
If the only number in the file name is the one you want, this will also work (grep -o is an extension, so it may not be available on AIX):
for filename in create_file_1.sql create_file_2.sql create_file_3.sql create_file_4.sql ; do
number=$(echo "$filename" | grep -o '[0-9][0-9]*')
done