Match a username after a keyword

Match a username after a keyword - regex

I am trying to match for a username in the format of DOMAIN\USERNAME after the first appearance of a keyword "Details:"
The following is a good sample for the text that I will be looking through.
Source: Product | Action: ADDUSER |
Administrator: domain\admin | Details: Alpha Snack Foods:
Added user domain\fuser to the group Viewers |
In this example I would want to return only "domain\fuser"
I have tried using (\bdomain\\.)\w+ but this returns both "domain\admin" and "niners\fuser"
I also tried using (?<=\buser\s)(\w+)but this only returned the second instance of domain.
So i feel like i am getting closeish here but I could use some help.
Thanks!

You can change your second attempt to allow the backslash character.
(?<=\buser\s)[\w\\]+
Or you can use \S which matches any non-whitespace characters.
(?<=\buser\s)\S+

Using perl :
$ perl -lne '/Added user (domain\\fuser) to the group/ and print $1' file
or
$ perl -lne '/Added user (\Qdomain\fuser\E) to the group/ and print $1' file
Output :
domain\fuser

Details:.+(domain\\[\w]+)
This finds "Details:" in the text, looks for all characters until domain\, then grabs the user name with domain.

Related

Use "sed" to Remove Capture Group 1 From All Lines In a File

I currently have a file with lines like the below:
ABCD123RTY,steve_tyler#gmail.com,10.20.30.142,2021-08-20T14:49:51.035Z
ABCD123QWE,thisguy#hotmail.com,10.20.30.245,2021-08-20T14:10:22.254Z
ABCD123DFG,calvin_hobbes2#netnet,10.20.30.l6,2021-08-20T15:30:34.480Z
My goal is to remove everything from the "#" to the next comma, such that it instead looks like the below:
ABCD123RTY,steve_tyler,10.20.30.142,2021-08-20T14:49:51.035Z
ABCD123QWE,thisguy,10.20.30.245,2021-08-20T14:10:22.254Z
ABCD123DFG,calvin_hobbes2,10.20.30.l6,2021-08-20T15:30:34.480Z
I'm not that experienced with utilizing sed and RegEx expressions. In playing around on a testing website, I came up with the below RegEx string, in which capture group 1 is perfectly matching to what I want to remove:
regex101.com Test
How would I go about putting this in a "sed" command against a given input file, and writing the results to a new output file. I had tried the below most recently:
sed 's/(#.+?),//' input.csv > input_Corrected.csv
Just as another note, I'm doing this in a bash script in which I have an API call generating the "input.csv" file, and then want to run this sed command to clean up the data format to match my needs.

You can use
sed 's/#[^,]*,/,/' input.csv > input_Corrected.csv
sed 's/#[^,]*//' input.csv > input_Corrected.csv
The #[^,]*, POSIX BRE pattern matches a # and then any zero or more chars other than , and then a , (in the first example, use it if there MUST be a comma after the match) and replaces with a comma (in the first example, keep the replacement empty if you use the second approach).
See the online demo:
s='ABCD123RTY,steve_tyler#gmail.com,10.20.30.142,2021-08-20T14:49:51.035Z
ABCD123QWE,thisguy#hotmail.com,10.20.30.245,2021-08-20T14:10:22.254Z
ABCD123DFG,calvin_hobbes2#netnet,10.20.30.l6,2021-08-20T15:30:34.480Z'
sed 's/#[^,]*,/,/' <<< "$s"
Output:
ABCD123RTY,steve_tyler,10.20.30.142,2021-08-20T14:49:51.035Z
ABCD123QWE,thisguy,10.20.30.245,2021-08-20T14:10:22.254Z
ABCD123DFG,calvin_hobbes2,10.20.30.l6,2021-08-20T15:30:34.480Z

You can used the below regular expression in order to remove the content of the valid email address only.
sed "s/#([a-zA-Z0-9_\-\.]+)\.([a-zA-Z]{2,5})//g" input.csv > input_Corrected.csv
And as per your requirement you can use the below code. As it is going to replace all the email address on the file as you have on your file "calvin_hobbes2#netnet" which is not valid email address.
sed "s/#[^,]*//g" input.csv > input_Corrected.csv

How do I remove a particular pattern with a number sequence sed

I'm very new to sed bash command, so trying to learn.
I'm currently faced with a few thousand markdown files i need to clean up and I'm trying to create a command that deletes part of the following
# null 864: Headline
body text
I need anything that come before the headline deleted which is '# null 864: '
it's allways: '# null ' then some digits ': '
I'm using gnu-sed because I'm using mac
The best I've come up with sofar is
gsed -i '/#\snull\s([1-9]|[1-9][0-9]|[1-9][0-9][0-9]|[1-9][0-9][0-9][0-9]):\s/d' *.md
The above does not seem to work?
however if I do
gsed -i '/#\snull/d' *.md
it does what I want, however it does some unintended stuff in the body test.
How do I control so only the headline and the body text remains?

Considering that you want to print values before headline and don't want to print any other lines, then try following.
sed -E -n 's/^(#\s+null\s+[0-9]+:\s+)Headline/\1/p' Input_file
In case you want to print value before Headline and if match is not found want to print that complete line then try following:
sed -E 's/^(#\s+null\s+[0-9]+:\s+)Headline/\1/' Input_file
Explanation: Simple using -E option of sed to enable ERE(extended regular expression), then using s option of sed to perform substitution here. matching # followed by space(s) null followed by space(s) digits colon and space(s) and keeping it in 1st capturing group, while substitution, substituting it with 1st capturing group.
NOTE: Above commands will print values on terminal, in case you want to save them inplace then use -i option once you are satisfied with above code's output.

If I'm understanding correctly, you have files like this:
This should get deleted
This should too.
# null 864: Headline
body text
this should get kept
You want to keep the headline, and everything after, right? You can do this in awk:
awk '/# null [0-9]+:/,eof {print}' foo.md

You might use awk, and replace the # null 864: part with an empty string using sub.
See this page to either create a new file, or to overwrite the same file.
The }1 prints the whole line as 1 evaluates to true.
awk '{sub(/^# null [0-9]+:[[:blank:]]+/,"")}1' file
The pattern matches
^# null Match literally from the start of the string
[0-9]+:[[:blank:]]+ match 1+ digits, then : and 1+ spaces
Output
Headline
body text

On a mac ed should be installed by default so.
The content of script.ed
g/^# null [[:digit:]]\{1,\}: Headline$/s/^.\{1,\}: //
,p
Q
for file in *.md; do ed -s "$file" < ./script.ed; done
If the output is ok, remove the ,p and change the Q to w so it can edit the file in-place
g/^# null [[:digit:]]\{1,\}: Headline$/s/^.\{1,\}: //
w
Run the loop again.

I'd use a range in sed same as Andy Lester's awk solution.
Borrowing his infile,
$: cat tst.md
This should get deleted
This should too.
# null 864: Headline
body text
this should get kept
$: sed -Ein '/^# null [0-9]+:/,${p;d};d;' tst.md
$: cat tst.md
# null 864: Headline
body text
this should get kept

Using sed (or any other tool) to remove the quotes in a json file

I have a json file
{"doc_type":"user","requestId":"1000778","clientId":"42114"}
I want to change it to
{"doc_type":"user","requestId":1000778,"clientId":"42114"}
i.e. convert the requestId from String to Integer. I have tried some ways, but none seem to work :
sed -e 's/"requestId":"[0-9]"/"requestId":$1/g' test.json
sed -e 's/"requestId":"\([0-9]\)"/"requestId":444/g' test.json
Could someone help me out please?

Try
sed -e 's/\("requestId":\)"\([0-9]*\)"/\1\2/g' test.json
or
sed -e 's/"requestId":"\([0-9]*\)"/"requestId":\1/g' test.json
The main differences with your attempts are:
Your regular expressions were looking for [0-9] between double quotes, and that's a single digit. By using [0-9]* instead you are looking for any number of digits (zero or more digits).
If you want to copy a sequence of characters from your search in your replacing string, you need to define a group with a starting \( and a final \) in the regexp, and then use \1 in the replacing string to insert the string there. If there are multiple groups, you use \1 for the first group, \2 for the second group, and so on.
Also note that the final g after the last / is used to apply this substitution in all matches, in every processed line. Without that g, the substitution would only be applied to the first match in every processed line. Therefore, if you are only expecting one such replacement per line, you can drop that g.

Since you said "or any other tool", I'd recommend jq! While sed is great for line-based, JSON is not and sometimes newlines are added in just for pretty printing the output to make developers' lives easier. It's rules also get even more tricky when handling Unicode or double-quotes in string content. jq is specifically designed to understand the JSON format and can dissect it appropriately.
For your case, this should do the job:
jq '.requestId = (.requestId | tonumber)'
Note, this will throw an error if requestId is missing and not output the JSON object. If that's a concern, you might need something a little more sophisticated like this example:
jq 'if has("requestId") then .requestId = (.requestId | tonumber) else . end'
Also, jq does pretty-print and colorize it's output if sent to a terminal. To avoid that and just see a compact, one-line-per-object format, add -Mc to the command. jq will also work if provided multiple objects back-to-back without a newline in the input. Here's a full-demo to show this filter:
$ (echo '{"doc_type":"bare"}{}'
echo '{"doc_type":"user","requestId":"0092","clientId":"11"}'
echo '{"doc_type":"user","requestId":"1000778","clientId":"42114"}'
) | jq 'if has("requestId") then .requestId = (.requestId | tonumber) else . end' -Mc
Which produced this output:
{"doc_type":"bare"}
{}
{"doc_type":"user","requestId":92,"clientId":"11"}
{"doc_type":"user","requestId":1000778,"clientId":"42114"}

sed -e 's/"requestId":"\([0-9]\+\)"/"requestId":\1/g' test.json
You were close. The "new" regex terms I had to add: \1 means "whatever is contained in the first \( \) on the "search" side, and \+ means "1 or more of the previous thing".
Thus, we search for the string "requestId":" followed by a group of 1 or more digits, followed by ", and replace it with "requestId": followed by that group we found earlier.

Perhaps the jq (json query) tool would help you out?
$ cat test
{"doc_type":"user","requestId":"1000778","clientId":"42114"}
$ cat test |jq '.doc_type' --raw-output
user
$

How can I use sed to regex string and number in bash script

I want to separate string and number in a file to get a specific number in bash script, such as:
Branches executed:75.38% of 1190
I want to only get number
75.38
. I have try like the code below
$new_value=value | sed -r 's/.*_([0-9]*)\..*/\1/g'
but it was incorrect and it was failed.
How should it works? Thank you before for your help.

You can use the following regex to extract the first number in a line:
^[^0-9]*\([0-9.]*\).*$
Usage:
% echo 'Branches executed:75.38% of 1190' | sed 's/^[^0-9]*\([0-9.]*\).*$/\1/'
75.38

Give this a try:
value=$(sed "s/^Branches executed:\([0-9][.0-9]*[0-9]*\)%.*$/\1/" afile)
It is assumed that the line appears only once in afile.
The value is stored in the value variable.

There are several things here that we could improve. One is that you need to escape the parentheses in sed: \(...\)
Another one is that it would be good to have a full specification of the input strings as well as a good script that can help us to play with this.
Anyway, this is my first attempt:
Update: I added a little more bash around this regex so it'll be more easy to play with it:
value='Branches executed:75.38% of 1190'
new_value=`echo $value | sed -e 's/[^0-9]*\([0-9]*\.[0-9]*\).*/\1/g'`
echo $new_value
Update 2: as john pointed out, it will match only numbers that contain a decimal dot. We can fix it with an optional group: \(\.[0-9]\+\)?.
An explanation for the optional group:
\(...\) is a group.
\(...\)? Is a group that appears zero or one times (mind the question mark).
\.[0-9]\+ is the pattern for a dot and one or more digits.
Putting all together:
value='Branches executed:75.38% of 1190'
new_value=`echo $value | sed -e 's/[^0-9]*\([0-9]\+\(\.[0-9]\+\)\?\).*/\1/g'`
echo $new_value

Regular Expression to parse Common Name from Distinguished Name

I am attempting to parse (with sed) just First Last from the following DN(s) returned by the DSCL command in OSX terminal bash environment...
CN=First Last,OU=PCS,OU=guests,DC=domain,DC=edu
I have tried multiple regexs from this site and others with questions very close to what I wanted... mainly this question... I have tried following the advice to the best of my ability (I don't necessarily consider myself a newbie...but definitely a newbie to regex..)
DSCL returns a list of DNs, and I would like to only have First Last printed to a text file. I have attempted using sed, but I can't seem to get the correct function. I am open to other commands to parse the output. Every line begins with CN= and then there is a comma between Last and OU=.
Thank you very much for your help!

I think all of the regular expression answers provided so far are buggy, insofar as they do not properly handle quoted ',' characters in the common name. For example, consider a distinguishedName like:
CN=Doe\, John,CN=Users,DC=example,DC=local
Better to use a real library able to parse the components of a distinguishedName. If you're looking for something quick on the command line, try piping your DN to a command like this:
echo "CN=Doe\, John,CN=Users,DC=activedir,DC=local" | python -c 'import ldap; import sys; print ldap.dn.explode_dn(sys.stdin.read().strip(), notypes=1)[0]'
(depends on having the python-ldap library installed). You could cook up something similar with PHP's built-in ldap_explode_dn() function.

Two cut commands is probably the simplest (although not necessarily the best):
DSCL | cut -d, -f1 | cut -d= -f2
First, split the output from DSCL on commas and print the first field ("CN=First Last"); then split that on equal signs and print the second field.

Using sed:
sed 's/^CN=\([^,]*\).*/\1/' input_file
^ matches start of line
CN= literal string match
\([^,]*\) everything until a comma
.* rest

http://www.gnu.org/software/gawk/manual/gawk.html#Field-Separators
awk -v RS=',' -v FS='=' '$1=="CN"{print $2}' foo.txt

I like awk too, so I print the substring from the fourth char:
DSCL | awk '{FS=","}; {print substr($1,4)}' > filterednames.txt

This regex will parse a distinguished name, giving name and val a capture groups for each match.
When DN strings contain commas, they are meant to be quoted - this regex correctly handles both quoted and unquotes strings, and also handles escaped quotes in quoted strings:
(?:^|,\s?)(?:(?<name>[A-Z]+)=(?<val>"(?:[^"]|"")+"|[^,]+))+
Here is is nicely formatted:
(?:^|,\s?)
(?:
(?<name>[A-Z]+)=
(?<val>"(?:[^"]|"")+"|[^,]+)
)+
Here's a link so you can see it in action:
https://regex101.com/r/zfZX3f/2
If you want a regex to get only the CN, then this adapted version will do it:
(?:^|,\s?)(?:CN=(?<val>"(?:[^"]|"")+"|[^,]+))

We Keep Coding

c++ django amazon-web-services regex python-2.7 google-cloud-platform list unit-testing opengl ember.js

Match a username after a keyword - regex

You can change your second attempt to allow the backslash character. (?<=\buser\s)[\w\\]+ Or you can use \S which matches any non-whitespace characters. (?<=\buser\s)\S+

Using perl : $ perl -lne '/Added user (domain\\fuser) to the group/ and print $1' file or $ perl -lne '/Added user (\Qdomain\fuser\E) to the group/ and print $1' file Output : domain\fuser

Details:.+(domain\\[\w]+) This finds "Details:" in the text, looks for all characters until domain\, then grabs the user name with domain.

Related

Use "sed" to Remove Capture Group 1 From All Lines In a File

How do I remove a particular pattern with a number sequence sed

Using sed (or any other tool) to remove the quotes in a json file

How can I use sed to regex string and number in bash script

Regular Expression to parse Common Name from Distinguished Name

Categories

Resources