sed or awk to capture part of url - regex

I am not very experienced with regular expressions and sed/awk scripting.
I have urls that are similar to the following torrent url:
http://torcache.net/torrent/D7249CD9AF321C8578B3A7007ABBDD63B0475EEB.torrent?title=[kickass.to]against.the.ropes.by.carly.fall.epub.torrent
I would like a sed or awk script to extract the text after title=, i.e.
from the example above just get:
[kickass.to]against.the.ropes.by.carly.fall.epub.torrent

A simple approach with awk: use the = as the field separator:
awk -F"=" '{print $2}'
Thus:
echo "http://torcache.net/torrent/D7249CD9AF321C8578B3A7007ABBDD63B0475EEB.torrent?title=[kickass.to]against.the.ropes.by.carly.fall.epub.torrent" | awk -F"=" '{print $2}'
[kickass.to]against.the.ropes.by.carly.fall.epub.torrent
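If the title value itself could contain an = sign, you may prefer to take everything after the first = instead of just the second field; a small sketch using awk's index and substr built-ins:
awk '{print substr($0, index($0, "=") + 1)}'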

Just remove everything before the title=: sed 's/.*title=//'
$ echo "http://torcache.net/torrent/D7249CD9AF321C8578B3A7007ABBDD63B0475EEB.torrent?title=[kickass.to]against.the.ropes.by.carly.fall.epub.torrent" | sed 's/.*title=//'
[kickass.to]against.the.ropes.by.carly.fall.epub.torrent

Let's say:
s='http://torcache.net/torrent/D7249CD9AF321C8578B3A7007ABBDD63B0475EEB.torrent?title=[kickass.to]against.the.ropes.by.carly.fall.epub.torrent'
Pure BASH solution:
echo "${s/*title=}"
[kickass.to]against.the.ropes.by.carly.fall.epub.torrent
OR using grep -P:
echo "$s"|grep -oP 'title=\K.*'
[kickass.to]against.the.ropes.by.carly.fall.epub.torrent

Using sed (no need to mention title in the regexp for your example):
sed 's/.*=//'
Another solution exists with cut, a standard Unix tool:
cut -d= -f2
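If the extracted title could itself contain an = sign, cut can also print everything from the second field to the end of the line:
cut -d= -f2-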

Related

Using grep regex to select up to the first hyphen

echo "Linux/DEB/mainbinary-0.1.20190424165331-0-armdef.deb" | grep -oE "([^\/]+$)"
This prints just the filename, without the directory structure, but I cannot manage to print just mainbinary from that string. Suggestions?
And a sed alternative to PS.'s great grep -oP:
echo "Linux/DEB/mainbinary-0.1.20190424165331-0-armdef.deb" |sed -r 's#^.*/([^-]+).*#\1#'
mainbinary
echo "Linux/DEB/mainbinary-0.1.20190424165331-0-armdef.deb" |grep -oP '.*/\K[^-]+'
mainbinary
This scans up to the last /, ignores everything to its left, and then keeps matching until the first - (excluding it).
With any awk in any shell on any UNIX machine:
$ echo "Linux/DEB/mainbinary-0.1.20190424165331-0-armdef.deb" | awk -F'[/-]' '{print $3}'
mainbinary
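If the number of leading directories varies, a sketch that takes the last path component with $NF and splits it on - (assuming the binary name itself never contains a dash) is:
$ echo "Linux/DEB/mainbinary-0.1.20190424165331-0-armdef.deb" | awk -F'/' '{split($NF, a, "-"); print a[1]}'
mainbinary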

How to cut a string from a string

My script gets this string for example:
/dir1/dir2/dir3.../importance/lib1/lib2/lib3/file
Let's say I don't know how long the string is before /importance.
I want a new variable that will keep only the /importance/lib1/lib2/lib3/file from the full string.
I tried to use sed 's/.*importance//' but it's giving me the path without the importance....
Here is the command in my code:
find <main_path> -name file | sed 's/.*importance//'
I am not familiar with regexes, so I need your help please :)
Sorry, my friends, I got my question slightly wrong:
I don't need the output /importance/lib1/lib2/lib3/file but /importance/lib1/lib2/lib3, with no /file in the output.
Can you help me?
I would use awk:
$ echo "/dir1/dir2/dir3.../importance/lib1/lib2/lib3/file" | awk -F"/importance/" '{print FS$2}'
/importance/lib1/lib2/lib3/file
Which is the same as:
$ awk -F"/importance/" '{print FS$2}' <<< "/dir1/dir2/dir3.../importance/lib1/lib2/lib3/file"
/importance/lib1/lib2/lib3/file
That is, we set the field separator to /importance/, so that the first field is what comes before it and the 2nd one is what comes after. To print /importance/ itself, we use FS!
All together, and to save it into a variable, use:
var=$(find <main_path> -name file | awk -F"/importance/" '{print FS$2}')
Update
I don't need the output /importance/lib1/lib2/lib3/file but
/importance/lib1/lib2/lib3 with no /file in the output.
Then you can use something like dirname to get the path without the name itself:
$ dirname $(awk -F"/importance/" '{print FS$2}' <<< "/dir1/dir2/dir3.../importance/lib1/lib2/lib3/file")
/importance/lib1/lib2/lib3
Instead of substituting all until importance with nothing, replace with /importance:
~$ echo $var
/dir1/dir2/dir3.../importance/lib1/lib2/lib3/file
~$ sed 's:.*importance:/importance:' <<< $var
/importance/lib1/lib2/lib3/file
As noted by @lurker, if importance can appear as part of some directory name, you could add the slashes to be safe:
~$ sed 's:.*/importance/:/importance/:' <<< "/dir1/dirimportance/importancedir/..../importance/lib1/lib2/lib3/file"
/importance/lib1/lib2/lib3/file
With GNU sed:
echo '/dir1/dir2/dir3.../importance/lib1/lib2/lib3/file' | sed -E 's#.*(/importance.*)#\1#'
Output:
/importance/lib1/lib2/lib3/file
pure bash
kent$ a="/dir1/dir2/dir3.../importance/lib1/lib2/lib3/file"
kent$ echo ${a/*\/importance/\/importance}
/importance/lib1/lib2/lib3/file
external tool: grep
kent$ grep -o '/importance/.*' <<<$a
/importance/lib1/lib2/lib3/file
I tried to use sed 's/.*importance//' but it's giving me the path without the importance....
You were very close. All you had to do was substitute back in importance:
sed 's/.*importance/importance/'
However, I would use Bash's built-in parameter expansion. It's much more efficient, since no external process is spawned.
The parameter expansion ${foo##pattern} takes the shell variable ${foo} and removes the longest match of the glob pattern from the left side of its value:
file_name="/dir1/dir2/dir3.../importance/lib1/lib2/lib3/file"
file_name=${file_name##*importance}
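Note that ##*importance strips the importance text itself as well, leaving /lib1/lib2/lib3/file; if you want to keep /importance in the result, a small sketch is to prepend it back:
file_name="/importance${file_name##*importance}"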
Removing the /file at the end, as you asked:
echo '<path>' | sed -r 's#.*(/importance.*)/[^/]*#\1#'
Input: /dir1/dir2/dir3.../importance/lib1/lib2/lib3/file
Returns: /importance/lib1/lib2/lib3
See this "Match groups" tutorial.

How to use sed to identify a string in brackets?

I want to find the string that is placed within the brackets. How do I use sed to pull it out?
# cat /sys/block/sdb/queue/scheduler
noop anticipatory deadline [cfq]
I'm not getting the exact result:
# cat /sys/block/sdb/queue/scheduler | sed 's/\[*\]//'
noop anticipatory deadline [cfq
I'm expecting the output:
cfq
It can be easier with grep if the position of the text between the brackets changes:
$ grep -Po '(?<=\[)[^]]*' file
cfq
This uses a look-behind: whenever a [ is found, fetch all the characters up to the next ].
See another example:
$ cat a
noop anticipatory deadline [cfq]
hello this [is something] we want to [enclose] yeah
$ grep -Po '(?<=\[)[^]]*' a
cfq
is something
enclose
You can also use awk for this, in case it is always in the same position:
$ awk -F'[][]' '{print $2}' file
cfq
It sets the field separators to [ and ], and from that prints the second field.
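If some lines contain no brackets at all, $2 is simply empty for them; a sketch that only prints lines that actually split into more than one field:
awk -F'[][]' 'NF > 1 {print $2}' file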
And with sed:
$ sed 's/[^[]*\[\([^]]*\).*/\1/g' file
cfq
It is a bit messy, but basically it looks for the block of text between [ and ] and prints it back.
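A slightly tidier sed sketch suppresses the default output and only prints when the substitution succeeds (note it prints the last bracketed chunk on each matching line, unlike the grep -o version, which prints all of them):
$ sed -n 's/.*\[\([^]]*\)\].*/\1/p' file
cfq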
I found one possible solution:
cut -d "[" -f2 | cut -d "]" -f1
So the exact command is:
# cat /sys/block/sdb/queue/scheduler | cut -d "[" -f2 | cut -d "]" -f1
Another potential solution is awk:
s='noop anticipatory deadline [cfq]'
awk -F'[][]' '{print $2}' <<< "$s"
cfq
Another way with GNU grep:
grep -Po "\[\K[^]]*" file
with pure shell:
while read line; do [[ "$line" =~ \[([^]]*)\] ]] && echo "${BASH_REMATCH[1]}"; done < file
Another awk
echo 'noop anticipatory deadline [cfq]' | awk '{gsub(/.*\[|\].*/,x)}8'
cfq
perl -lne 'print $1 if(/\[([^\]]*)\]/)'

bash, extract string before a colon

If I have a file with rows like this
/some/random/file.csv:some string
/some/random/file2.csv:some string2
Is there some way to get a file that only has the first part before the colon, e.g.
/some/random/file.csv
/some/random/file2.csv
I would prefer to just use a bash one liner, but perl or python is also ok.
cut -d: -f1
or
awk -F: '{print $1}'
or
sed 's/:.*//'
Another pure BASH way:
> s='/some/random/file.csv:some string'
> echo "${s%%:*}"
/some/random/file.csv
Try this in pure bash:
FRED="/some/random/file.csv:some string"
a=${FRED%:*}
echo $a
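The difference between %% and % only shows up if the text after the path contains more colons: %% removes the longest suffix matching :* (cutting at the first colon), while a single % removes the shortest (cutting at the last colon). A quick sketch:
s='/some/random/file.csv:a:b'
echo "${s%%:*}"    # /some/random/file.csv
echo "${s%:*}"     # /some/random/file.csv:a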
This works for me; you can try it out:
INPUT='ubuntu:x:1000:1000:Ubuntu:/home/ubuntu:/bin/bash'
SUBSTRING=$(echo "$INPUT" | cut -d: -f1)
echo $SUBSTRING
This has been asked so many times that it is a bit strange to see a user with over 1000 points asking it.
But just to show another way to do it:
echo "/some/random/file.csv:some string" | awk '{sub(/:.*/,x)}1'
/some/random/file.csv
Another pure Bash solution:
while IFS=':' read -r a b ; do
echo "$a"
done < "$infile" > "$outfile"
You can try using basename with:
basename /some/random/file.csv:some :some

Regexp on nmap services file

I need to filter with sed only the ports from /usr/share/nmap/nmap-services
tcpmux 1/tcp 0.001995 # TCP Port Service Multiplexer [rfc-1078]
compressnet 2/tcp 0.000013 # Management Utility
compressnet 3/tcp 0.001242 # Compression Process
unknown 4/tcp 0.000477
unknown 6/tcp 0.000502
echo 7/tcp 0.004855
unknown 8/tcp 0.000013
discard 9/tcp 0.003764 # sink null
unknown 10/tcp 0.000063
systat 11/tcp 0.000075 # Active Users
I've tried something like (!?([0-9]+/tcp))
But it won't work. Why?
Thank you
Try doing this:
grep -oP '\d+(?=/(udp|tcp))' /usr/share/nmap/nmap-services
or with perl:
perl -lne 'print $& if m!\d+(?=/(udp|tcp))!' /usr/share/nmap/nmap-services
I use a positive lookahead (an advanced regex); see http://www.perlmonks.org/?node_id=518444
or with awk, without advanced regexes:
awk '{gsub("/.*", ""); print $2}' /usr/share/nmap/nmap-services
or
awk -F'[ /\t]' '{print $2}' /usr/share/nmap/nmap-services
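Since the question asks for sed specifically, a sketch with GNU or BSD sed (using -E for extended regexes) that keeps only the port number would be:
sed -nE 's#^[^[:space:]]+[[:space:]]+([0-9]+)/(tcp|udp).*#\1#p' /usr/share/nmap/nmap-services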
Here's an example using AWK:
cat /usr/share/nmap/nmap-services | awk '{print $2}' | awk -F\/ '{print $1}'
The simplest is this:
cut -s -d' ' -f2 test
You can also do it like this:
sed '/[^ ]* \([^ ]*\).*/ s::\1:; /^$/d' FILE
The cut variant prints empty lines for non-matching lines.
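As a quick check with GNU sed, on one of the sample lines pasted above (space-separated as shown), the sed variant keeps just the second field:
echo 'tcpmux 1/tcp 0.001995 # TCP Port Service Multiplexer [rfc-1078]' | sed '/[^ ]* \([^ ]*\).*/ s::\1:; /^$/d'
1/tcp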