Getting delimited substrings using sed and regexp

I'm trying to get substrings using sed with regexp. I want to get the first and second "fields" delimited by ":".
To get the first field I used the following command, but I don't know how to get the second field.
Command used to get the first field:
sed -r -n '1,2 s/([^:]+).*/\1/p' /etc/passwd
Input file (example):
root:x:0:0:root:/root:/bin/bash
daemon:x:1:1:daemon:/usr/sbin:/usr/sbin/nologin
Command's result:
root
daemon
Then I tried to get the first ("root") and second ("x") fields (examples based on the file's first line only), but I didn't succeed.
I tried:
sed -r -n '1,2 s/([^:]+).*([^:]+).*/\1 \2/p' /etc/passwd
Command's result:
root h
daemon n
Desired result:
root x
daemon x

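sed's .* is greedy. In
sed -r -n '1,2 s/([^:]+).*([^:]+).*/\1 \2/p' /etc/passwd
the .* after the first group matches as many characters as possible, so the second ([^:]+) is left with only the last character of the line (h from bash, n from nologin). Match the literal : instead:
sed -r -n '1,2 s/([^:]+):([^:]+).*/\1 \2/p' /etc/passwd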
Demo: http://ideone.com/wjL7Za.
By the way, a simpler way to do this is using cut (--output-delimiter is a GNU cut option):
cut -d ":" -f 1,2 --output-delimiter=' ' /etc/passwd
Demo: http://ideone.com/stJdSy.
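A minimal sketch contrasting the greedy pattern from the question with the fix, run on the two sample lines instead of /etc/passwd (so the 1,2 address is dropped). It uses sed -E, which GNU and BSD sed both accept as the spelling of -r; the function names are hypothetical.

```sh
# Hypothetical helper: the question's greedy pattern. The middle .* swallows
# everything but the last character, which is all that is left for \2.
greedy_fields() {
  sed -E -n 's/([^:]+).*([^:]+).*/\1 \2/p'
}

# Hypothetical helper: the fix. Matching the literal ":" pins \2 to field 2.
first_two_fields() {
  sed -E -n 's/([^:]+):([^:]+).*/\1 \2/p'
}

sample='root:x:0:0:root:/root:/bin/bash
daemon:x:1:1:daemon:/usr/sbin:/usr/sbin/nologin'

printf '%s\n' "$sample" | greedy_fields     # root h, daemon n
printf '%s\n' "$sample" | first_two_fields  # root x, daemon x
```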

Another expression that would return the desired result is:
([a-z]+):([a-z]+).*
sed -r -n '1,2 s/([^:]+):([^:]+).*/\1 \2/p'
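The ([a-z]+) form works on the two sample lines, but it only accepts lowercase letters. A quick sketch on a made-up passwd entry whose user name contains a digit (user1 is hypothetical) shows why ([^:]+) is the safer default: the letters-only pattern finds no letter:letter pair anywhere on the line, so with -n nothing is printed.

```sh
# Hypothetical helpers comparing the two character classes.
letters_only() { sed -E -n 's/([a-z]+):([a-z]+).*/\1 \2/p'; }
any_field()    { sed -E -n 's/([^:]+):([^:]+).*/\1 \2/p'; }

# Hypothetical passwd entry with a digit in the user name.
line='user1:x:1000:1000::/home/user1:/bin/bash'

printf '%s\n' "$line" | letters_only  # prints nothing
printf '%s\n' "$line" | any_field     # user1 x
```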

Related

Delete any special character using Sed

I have yet another list of subdomains. I want to remove any wildcard subdomain that includes one of these special characters:
()!&$#*+?
Mostly the junk is a random prefix, but it can also appear in the middle. Here's a sample of the data:
(www.imgur.com
***************diet.blogspot.com
*-1.gbc.criteo.com
------------------------------------------------------------i.imgur.com
This has been quite an inconvenience while scanning through the list. As always, I'm trying to use sed to fix it:
sed -i "/[!()#$&?+]/d" foo.txt ###Didn't work
sed -i "/[\!\(\)\#\$\&\?\+]/d" ###Escaping char didn't work
Running the commands above still leaves the list unchanged; the file stays in its original state. My idea is to pipe a series of sed commands to remove the lines one character at a time:
cat foo.txt | sed -e "/!/d" -e "/#/d" -e "/\*/d" -e "/\$/d" -e "/(/d" -e "/)/d" -e "/+/d" -e "/\'/d" -e "/&/d" >> foo2.txt
cat foo.txt | sed -e "/\!/d" | sed -e "/\#/d" | sed -e "/\*/d" | sed -e "/\$/d" | sed -e "/\+/d" | sed -e "/\'/d" | sed -e "/\&/d" >> foo2.txt
If escaping all the special characters doesn't work, my logic must be wrong. Adding /g didn't help either.
As a side note: I don't want lines containing - to be deleted, since valid subdomains can contain a - character:
line-apps.com
line-apps-beta.com
line-apps-rc.com
line-apps-dev.com
Any help would be cherished.
Using sed
$ sed '/[[:punct:]]/d' input_file
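This deletes every line containing a punctuation character. Be aware that [[:punct:]] also includes . and -, so on a list of domain names it removes every line, including valid ones such as line-apps.com. It would help if you provided more sample data.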
To do what you're trying to do in your answer (which adds [, ] and more to the set of characters in your question), use:
sed '/[][!?+,#$&*() ]/d'
or just:
grep -v '[][!?+,#$&*() ]'
Per POSIX, to include ] in a bracket expression it must be the first character (after a leading ^, if any); otherwise it ends the bracket expression.
Consider printing lines you want instead of deleting lines you do not want, though, e.g.:
grep -E '^[[:alnum:]_.-]+$' file
to print lines that only contain letters, numbers, underscores, dashes, and/or periods.
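A sketch running both suggestions on an abbreviated copy of the question's sample (the star and dash runs are shortened, and line-apps.com is taken from the list of valid names). Both the deny-list sed and the allow-list grep keep the same two lines here; the function names are hypothetical, and grep -E is used so that + means "one or more".

```sh
# Hypothetical helper, deny-list: delete any line containing one of
# ] [ ! ? + , # $ & * ( ) or a space. The ] must come first to be literal.
drop_special() { sed '/[][!?+,#$&*() ]/d'; }

# Hypothetical helper, allow-list: keep lines made only of letters,
# digits, underscores, periods and dashes.
keep_valid() { grep -E '^[[:alnum:]_.-]+$'; }

sample='(www.imgur.com
***diet.blogspot.com
*-1.gbc.criteo.com
---i.imgur.com
line-apps.com'

printf '%s\n' "$sample" | drop_special
printf '%s\n' "$sample" | keep_valid
```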

Deleting everything between two string matches in a file

I have this text in file.txt:
Osmun.Prez#mail.com:c7lB2m6b#3.a.a:tt_webid_v2=6990226111024612869; tt_webid=6990226111024612869; tt_csrf_token=VD5Nb_TQFH4RKhoJeSe2nzLB; R6kq3TV7=AHkh4PB6AQAA3LIS90nWf2ss0Q7ZTCQjUat4axctvhQY68DdUEz92RwpmVSX|1|0|e9d6917c2fe555827dcf5ee916ba9778079ab2a9; ttwid=1%7CAFodeNF0iZM2fyy-ZeiZ6HTpZoG_MSx6SmXHgGVQ-V4%7C1627538859%7C59ca1e4a56f9f537b55e655a6dabff88e44eb48502b164ed6b4199f5a5263cb0; passport_csrf_token_default=6f7653c3ce946a6ce5444723fb0c509b; passport_csrf_token=6f7653c3ce946a6ce5444723fb0c509b; sid_guard=0483b7d37f4e4bd20ab3046e29724798%7C1627538893%7C5184000%7CMon%2C+27-Sep-2021+06%3A08%3A13+GMT; uid_tt=27b52febe6222486b9f6b6a90ef4ffeace5ea25c09d29a1583be5a1ecf760996; uid_tt_ss=27b52febe6222486b9f6b6a90ef4ffeace5ea25c09d29a1583be5a1ecf760996; sid_tt=0483b7d37f4e4bd20ab3046e29724798; sessionid=0483b7d37f4e4bd20ab3046e29724798; sessionid_ss=0483b7d37f4e4bd20ab3046e29724798; store-idc=maliva; store-country-code=us; odin_tt=294845c8f7711db177f7c549a9f44edb1555031b27a2a485df809cd92c4e544ac0772bf462df5b7a100f6e488c45303cd62df3b6b950f0842520cd887850137b035d990f29cc8b752765e594560c977f; cmpl_token=AgQQAPNSF-RMpbE89z5HYF0_-2PcrxjXf4fZYP5_ZA
How can I delete everything from :tt_ to _ZA (the first and only occurrence) in file.txt, keeping only Osmun.Prez#mail.com:c7lB2m6b#3.a.a, using bash on Linux?
Thank you
Something like:
sed -i "s/:tt_.*//" file.txt
if you want to edit the file in place. If not, remove the -i switch.
The sed command means: in each line of file.txt, substitute (s) the pattern :tt_ and all the characters after it (.*) with an empty string (//).
Or the command:
sed -i "s/:tt_.*_ZA//" file.txt
which follows your request more literally but produces the same output here, since _ZA ends the line.
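A sketch of both substitutions on a shortened, hypothetical stand-in for the line in file.txt (the cookie string is cut down to a few characters). It reads standard input instead of using -i, so no file is modified. Because .* is greedy, the second form removes everything up to the last _ZA on the line.

```sh
# Hypothetical helpers wrapping the two sed forms.
strip_to_end() { sed 's/:tt_.*//'; }
strip_to_za()  { sed 's/:tt_.*_ZA//'; }

# Shortened, hypothetical version of the line in file.txt.
line='Osmun.Prez#mail.com:c7lB2m6b#3.a.a:tt_webid_v2=699; cmpl_token=AgQQ_ZA'

printf '%s\n' "$line" | strip_to_end  # Osmun.Prez#mail.com:c7lB2m6b#3.a.a
printf '%s\n' "$line" | strip_to_za   # Osmun.Prez#mail.com:c7lB2m6b#3.a.a
```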
Use Bash pattern substitution:
i=$(cat file.txt)
echo "${i/:tt*_ZA}"
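${i/pattern} is a Bash feature and is not available in a plain POSIX sh such as dash. A hedged alternative there is the suffix-removal expansion ${var%%pattern}, a different but standard technique: it drops the longest suffix matching :tt*, which gives the same result here because _ZA ends the line. A sketch on a shortened, hypothetical stand-in line:

```sh
# Shortened, hypothetical stand-in for the contents of file.txt.
i='Osmun.Prez#mail.com:c7lB2m6b#3.a.a:tt_webid_v2=699; cmpl_token=AgQQ_ZA'

# %% removes the longest suffix matching the glob ":tt*" (POSIX sh).
echo "${i%%:tt*}"  # Osmun.Prez#mail.com:c7lB2m6b#3.a.a
```

Unlike ${i/:tt*_ZA}, this also removes anything that follows _ZA, so it only matches the request when _ZA is the end of the line.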
Assuming the general requirement is to remove everything after the 2nd : ...
Sample data:
$ cat file.txt
Osmun.Prez#mail.com:c7lB2m6b#3.a.a:tt_webid_v ... to end of line
some.one#home.com:B52_m6b#9_az.more.stuff:delete from here ... to end of line
One sed idea:
$ sed -En 's/^([^:]*:[^:]*).*$/\1/p' file.txt
Osmun.Prez#mail.com:c7lB2m6b#3.a.a
some.one#home.com:B52_m6b#9_az.more.stuff
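A sketch of the sed idea on two hypothetical, shortened lines in the same shape as the sample (the "... to end of line" parts are replaced with short made-up text). ^([^:]*:[^:]*) captures everything before the second colon, and .*$ discards the rest; the function name is hypothetical.

```sh
# Hypothetical helper: keep the first two colon-separated fields.
first_two() { sed -En 's/^([^:]*:[^:]*).*$/\1/p'; }

sample='Osmun.Prez#mail.com:c7lB2m6b#3.a.a:tt_webid_v2=699
some.one#home.com:B52_m6b#9_az.more.stuff:delete from here'

printf '%s\n' "$sample" | first_two
```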
Using awk:
awk 'BEGIN{FS=OFS=":"}{print $1,$2}' file.txt
With : as the field separator, the first two fields are exactly the text before :tt.
This deletes all characters from ":tt" to the last "_ZA", inclusive, in file.txt:
$ sed 's/:tt.*_ZA//' file.txt
Osmun.Prez#mail.com:c7lB2m6b#3.a.a
Or, if it is always the first two values separated by a colon (as in your example):
cut -d ':' -f 1,2 file.txt
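A sketch checking that the awk and cut answers print the same two fields, on a shortened, hypothetical stand-in line; both read standard input here instead of file.txt, and the function names are hypothetical.

```sh
# Hypothetical helpers: print fields 1 and 2, re-joined with ":".
awk_two() { awk 'BEGIN { FS = OFS = ":" } { print $1, $2 }'; }
cut_two() { cut -d ':' -f 1,2; }

line='Osmun.Prez#mail.com:c7lB2m6b#3.a.a:tt_webid_v2=699; cmpl_token=AgQQ_ZA'

printf '%s\n' "$line" | awk_two  # Osmun.Prez#mail.com:c7lB2m6b#3.a.a
printf '%s\n' "$line" | cut_two  # Osmun.Prez#mail.com:c7lB2m6b#3.a.a
```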

How to combine three commands (regular expressions) into one line for sed (instead of fnr.exe)

I'm using fnr.exe (https://findandreplace.codeplex.com).
My homework:
Value: String
Details: other_str
Number: xxx
Value: String
Details: other_str
Number: xxx
Value: String
Details: other_str
Number: xxx
Value: String
Details: other_str
Number: xxx
After running fnr.exe I have:
String
xxx
String
xxx
String
xxx
String
xxx
It's a really good tool for finding and replacing in place across a lot of files, e.g. using regex.
Today I tried to compare the speed of this tool with sed in Cygwin.
Unfortunately, my command doesn't work in sed.
I don't know why. Can you help me?
For now I'm running fnr.exe with this command from the terminal (it works well):
"C:\Test\soft\fnr.exe" --cl --dir "C:\Test" --fileMask "*.*" --caseSensitive --useRegEx --find "^(Value: )|^.*(Details).*$\s|^(Number.*: )" --replace ""
Then I tried sed:
sed -i 's/^(Value: )|^.*(Details).*$\s|^(Number.*: )/\1/' *.*
and it doesn't work...
:) :)
OK, I found a solution for this problem.
Here is the code for all files in the folder:
sed -i -e '/Details/d' -e 's/Value: //g' -e 's/Number: //g' *.txt
Explanation: we want to remove the second line of each record and the two strings "Value: " and "Number: ", and save the result in place in all text files (or in one file, e.g. example.txt instead of *.txt).
Remove the second line (every line containing Details)
sed -i -e '/Details/d'
Remove string "Value"
-e 's/Value: //g'
Remove string "Number"
-e 's/Number: //g'
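A sketch of the same three expressions on the record layout from the question, reading standard input instead of editing *.txt in place (for the real run, keep -i and the file list as shown above). String, other_str and xxx are the question's placeholders; the function name is hypothetical.

```sh
# Hypothetical helper: the three -e expressions, without -i.
clean_records() {
  sed -e '/Details/d' -e 's/Value: //g' -e 's/Number: //g'
}

printf '%s\n' 'Value: String' 'Details: other_str' 'Number: xxx' \
              'Value: String' 'Details: other_str' 'Number: xxx' | clean_records
```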
The speed difference between FNR.EXE and SED is tremendous!
Example.txt (a few million records)
FNR.EXE - 3 min
SED - 36 seconds :) :)
I wonder whether awk or grep would be faster than sed.
Cheers!
Grep won't help here since you need to remove parts of lines.
You may use a POSIX ERE regex (enabled with -E option) with sed like
sed -i -E '/Details/d;s/(Value|Number): //g' *.txt
See the online sed demo.
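However, awk is usually recommended for this kind of task. Note that, unlike sed -i, it prints the result to standard output instead of editing the files in place (GNU awk 4.1 and later offers -i inplace for that):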
awk '!/Details/{gsub(/(Value|Number): /, ""); print}' *.txt
See the online awk demo
Details
!/Details/ - if a line does not contain Details
gsub(/(Value|Number): /, "") (note the input for gsub is the whole line as the input variable is omitted) - remove Value: and Number: with a space after them and
print - print the modified line ($0).
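A sketch checking that the single-expression sed and the awk program from this answer agree on the question's record layout. Both read standard input here, so neither edits files in place; the function names are hypothetical.

```sh
# Hypothetical helpers wrapping the two one-liners.
sed_version() { sed -E '/Details/d;s/(Value|Number): //g'; }
awk_version() { awk '!/Details/ { gsub(/(Value|Number): /, ""); print }'; }

# One record in the question's format.
records() {
  printf '%s\n' 'Value: String' 'Details: other_str' 'Number: xxx'
}

records | sed_version  # String, xxx
records | awk_version  # String, xxx
```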

Using sed for extracting substring from string

I just started using sed for regex work. I want to extract XXXXXX from *****/XXXXXX>, so I tried the following:
sed -n "/^/*/(\S*\).>$/p"
If I do so I get following error
sed: 1: "/^/*/(\S*\).>$/p": invalid command code *
I'm not sure what I'm missing here.
Try:
$ echo '*****/XXXXXX>' | sed 's|.*/||; s|>.*||'
XXXXXX
The substitute command s|.*/|| removes everything up to the last / in the string. The substitute command s|>.*|| removes everything from the first > in the string that remains to the end of the line.
Or:
$ echo '*****/XXXXXX>' | sed -E 's|.*/(.*)>|\1|'
XXXXXX
The substitute command s|.*/(.*)>|\1| captures whatever is between the last / and the last > and saves it in group 1. That is then replaced with group 1, \1.
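A sketch of both sed answers on the question's string, plus a hypothetical input with two > characters that shows the "first >" versus "last >" difference described above; the function names are hypothetical.

```sh
# Hypothetical helper: strip up to the last "/", then from the first ">" on.
extract_two_step() { sed 's|.*/||; s|>.*||'; }
# Hypothetical helper: keep what lies between the last "/" and the last ">".
extract_group() { sed -E 's|.*/(.*)>|\1|'; }

echo '*****/XXXXXX>' | extract_two_step  # XXXXXX
echo '*****/XXXXXX>' | extract_group     # XXXXXX

# Hypothetical input with two ">" characters.
echo 'x/a>b>' | extract_two_step  # a
echo 'x/a>b>' | extract_group     # a>b
```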
In my opinion awk is better suited to this task. With -F you can use multiple delimiters, such as "/" and ">"; the text between them is the second field:
echo "*****/XXXXXX>" | awk -F'/|>' '{print $2}'
Of course you could use sed, but it's harder to read. First I remove the first part (up to the "/"), then the second one (the ">"):
echo "*****/XXXXXX>" | sed -e 's/.*[/]//' -e 's/>//'
Both will bring the expected result: XXXXXX.
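A sketch of the awk and sed variants as functions. With -F'/|>' the line *****/XXXXXX> splits into *****, XXXXXX and an empty field, so the wanted text is $2; quoting the sed expressions keeps the shell from expanding * or stripping backslashes. The function names are hypothetical.

```sh
# Hypothetical helper: "/" or ">" as field separators; XXXXXX is field 2.
extract_awk() { awk -F'/|>' '{ print $2 }'; }
# Hypothetical helper: drop through the last "/", then the ">".
extract_sed() { sed -e 's/.*[/]//' -e 's/>//'; }

echo '*****/XXXXXX>' | extract_awk  # XXXXXX
echo '*****/XXXXXX>' | extract_sed  # XXXXXX
```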
With grep, if it supports PCRE (-P):
$ echo '*****/XXXXXX>' | grep -oP '/\K[^>]+'
XXXXXX
/\K matches the / and then \K drops it from the reported match, so it is not part of the output (similar in effect to a lookbehind)
[^>]+ characters other than >
echo '*****/XXXXXX>' |sed 's/^.*\/\|>$//g'
XXXXXX
This matches either everything from the start of the line up to the last /, or a > at the end of the line, and replaces both with nothing. (\| alternation in a basic regex is a GNU sed extension.)
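Because \| in a basic regex is GNU-specific, here is a sketch of the same alternation written as an extended regex with sed -E, which BSD/macOS sed also accepts; the function name is hypothetical.

```sh
# Hypothetical helper: remove either ^.*/ (up to the last slash) or a
# trailing ">", using ERE alternation instead of GNU's \|.
strip_ends() { sed -E 's/^.*\/|>$//g'; }

echo '*****/XXXXXX>' | strip_ends  # XXXXXX
```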

unterminated `s' command, can't find my mistake

sudo wbinfo --group-info GROUPNAME| sed -r -e 's/(?:DOMAIN\\(\w+),?)|(?:[^]+:)/$1/g'
This command results in an
sed: -e expression #1, char 36: unterminated `s' command
The output of
sudo wbinfo --group-info GROUPNAME
is like
GROUPNAME:x:0123456789:DOMAIN\user1,DOMAIN\user2,DOMAIN\user3,...,DOMAIN\userN
I tried escaping all instances of ( with \(, \ with \\ (also \\ with \\\\)
sudo wbinfo --group-info GROUPNAME| sed -r -e s/'(?:DOMAIN\\(\w+),?)|(?:[^]+:)'/$1/g
(changed quoted area)
sudo wbinfo --group-info GROUPNAME| sed -r -e s/'(?:DOMAIN\\(\w+),?)|(?:[^]+:)/\1/g'
(\1 instead of $1)
I still don't know how to get what I need:
user1 user2 user3 ... userN
TL;DR
Your attempt is too complicated, you can simply use this:
sed -r 's/[^\]+DOMAIN\\([[:alnum:]]+)/\1 /g'
About the syntax error:
You are using sed -r, which enables POSIX extended regular expressions. In extended POSIX regular expressions ? is a quantifier for optional repetition, and the Perl-style non-capturing group (?:...) does not exist. If you mean a literal ?, you need to escape it:
sed -r -e 's/(\?:DOMAIN\\(\w+),\?)|(\?:[^]+:)/$1/g'
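However, there is still a problem with the regex: you are using [^]. When ^ is the first character in a bracket expression it negates the set, but you never say which characters should not be matched. Worse, a ] placed right after [^ is taken literally, so the bracket expression never closes and swallows the rest of the s command, which is what triggers the "unterminated `s' command" error. You need to put something in it, e.g.: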
sed -r -e 's/(\?:DOMAIN\\(\w+),\?)|(\?:[^abc]+:)/$1/g'
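Two points from this answer are easy to check with a small sketch. First, a ] right after [^ is a literal member of the set, so [^]] means "any character except ]". Second, sed's replacement text refers to groups as \1; $1 is just the two characters $ and 1. The DOMAIN\user1 input is a shortened, hypothetical piece of the wbinfo output.

```sh
# [^]] matches every character other than "]".
echo 'a]b' | sed 's/[^]]/X/g'  # X]X

# In the replacement, $1 is literal text; \1 is the first captured group.
printf '%s\n' 'DOMAIN\user1' | sed -E 's/DOMAIN\\([[:alnum:]]+)/$1/'  # $1
printf '%s\n' 'DOMAIN\user1' | sed -E 's/DOMAIN\\([[:alnum:]]+)/\1/'  # user1
```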
awk to the rescue!
$ ... | awk -F'\\\\' -v RS=, '{print $2}'
will print one user per line; if you want them on a single line, append | xargs.
Here's another approach with sed:
sed -r -e 's/^.*://' -e 's/[^,]+\\//g' -e 's/,/ /g'
First remove all the stuff before the last colon in the line,
then remove all the domain parts (non-commas followed by a backslash),
then change commas to spaces.
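A sketch of this three-step sed on a shortened, hypothetical version of the wbinfo output with three users. It uses -E, which means the same as -r in GNU sed and is also accepted by BSD sed; the function name is hypothetical.

```sh
# Hypothetical helper: strip through the last ":", drop each "DOMAIN\"
# prefix, then turn the commas into spaces.
list_users() {
  sed -E -e 's/^.*://' -e 's/[^,]+\\//g' -e 's/,/ /g'
}

printf '%s\n' 'GROUPNAME:x:0123456789:DOMAIN\user1,DOMAIN\user2,DOMAIN\user3' | list_users
# user1 user2 user3
```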