regex to exclude string and delete line - regex

I have the following lines in an XML file
<User id="10338" directoryId="1" sometext txt text test/>
<User id="10359" directoryId="100" some more text text text/>
<User id="103599" directoryId="100" some more text text text/>
<User id="10438" directoryId="1" sometext txt text test/>
I am trying to remove every line that starts with <User id=" but keep the ones that have directoryId="1".
My current sed command is
sed -i '' '/<User id="/d' file.xml
I have looked at "A regular expression to exclude a word/string" and a few other Stack Overflow posts but have not been able to get this to work. Can someone help me write the regex? Essentially, I need to delete every line that starts with <User id= except the ones where directoryId="1".

You can use
sed -i '' -e '/directoryId="1"/b' -e '/<User id="/d' file.xml
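In this sed command,
/directoryId="1"/b branches to the end of the script for lines containing directoryId="1", so they are printed unchanged, and
/<User id="/d deletes the remaining lines that contain <User id=".
The -i '' form is the BSD/macOS syntax for in-place editing without a backup; GNU sed takes -i with no separate argument.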
See an online demo.
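As a quick check, the two-expression approach above can be tried on the sample lines from the question without touching any file. This is a minimal sketch: the <Other id="1"/> line is a hypothetical extra, added only to show that lines not matching <User id=" pass through, and the script reads from a pipe instead of using -i.

```shell
# Sample input from the question, plus one hypothetical non-User line.
input='<User id="10338" directoryId="1" sometext txt text test/>
<User id="10359" directoryId="100" some more text text text/>
<User id="103599" directoryId="100" some more text text text/>
<User id="10438" directoryId="1" sometext txt text test/>
<Other id="1"/>'

# Keep lines with directoryId="1" (b skips the rest of the script),
# then delete whatever <User id=" lines are left.
result=$(printf '%s\n' "$input" | sed -e '/directoryId="1"/b' -e '/<User id="/d')
printf '%s\n' "$result"
```

The closing quote in directoryId="1" is what stops the first expression from also matching directoryId="100".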

Related

Replace tags surrounding string only if string contains match

I have a file with many lines containing strings surrounded by tags.
<tag:identifier>99454</tag:identifier>
<tag:identifier>97817(web)</tag:identifier>
<tag:identifier>http://www.google.com</tag:identifier>
<tag:title>Title String/</tag:title>
<tag:creator>Example</tag:creator>
<tag:creator>Field</tag:creator>
<tag:creator>Country</tag:creator>
I am trying to find a way to change the tags around each URL. They all start with <tag:identifier>http, so finding which lines contain URLs isn't an issue, I just don't know how I can replace the ending tag too. For example, to <tag:url>http://www.google.com</tag:url>
What tool can I use to do this?
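You can try this sed command:
sed '/http/s/identifier/url/g' "$file"
This matches any line containing http and replaces every occurrence of identifier on it with url.
You can also use this awk command:
awk -F'[<>]' -v OFS= '$3~/http/{$2="<tag:url>"; $4="</tag:url>"}1' "$file"
Here the field separator is < or >, and fields 2 and 4 are replaced with the new tags. OFS is set to the empty string because awk rebuilds the line with OFS between the fields once a field is assigned; with the default OFS the output would contain stray spaces.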
Output
<tag:identifier>99454</tag:identifier>
<tag:identifier>97817(web)</tag:identifier>
<tag:url>http://www.google.com</tag:url>
<tag:title>Title String/</tag:title>
<tag:creator>Example</tag:creator>
<tag:creator>Field</tag:creator>
<tag:creator>Country</tag:creator>
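If a URL may itself contain the word identifier, as in http://www.identifier.com, match the whole element and require its content to start with http, so that the numeric identifiers are left alone:
sed -E 's#<(tag:identifier)>(http.*)</\1>#<tag:url>\2</tag:url>#' file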
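The whole-element match can be sketched on a few of the sample lines. The http://www.identifier.com line is a hypothetical input taken from the caveat above, and [^<]* is an assumption of this sketch (used in place of .*) to keep the match inside one element.

```shell
# Sample lines from the question, plus one hypothetical URL containing "identifier".
input='<tag:identifier>99454</tag:identifier>
<tag:identifier>http://www.google.com</tag:identifier>
<tag:identifier>http://www.identifier.com</tag:identifier>
<tag:creator>Example</tag:creator>'

# Rewrite the opening and closing tag only when the content starts with http.
result=$(printf '%s\n' "$input" |
  sed -E 's#<tag:identifier>(http[^<]*)</tag:identifier>#<tag:url>\1</tag:url>#')
printf '%s\n' "$result"
```

Because the tags are matched literally and only the captured content is copied back, the word identifier inside the URL is never touched.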

Grepping for a pattern followed by another pattern and excluding what lies in between as output

I want to do something like
egrep -o '(mon|tues)[1-3]?[0-9].*(mon|tues)[1-3]?[0-9]'
And only get the parts matched by (mon|tues)[1-3]?[0-9], dropping whatever lies in between.
With this as input
mon19hellotues20
mon19world
hellomon19
tues8worldtues22
I want
mon19tues20
tues8tues22
As output
sed is a better tool for this, since it can print only selected parts of a match:
sed -nE 's/(mon|tues)([1-3]{0,1}[0-9]).*(mon|tues)([1-3]{0,1}[0-9])/\1\2\3\4/p' file
mon19tues20
tues8tues22
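One detail worth seeing in action: the substitution only rewrites the matched span, so any text before the first marker or after the second one is kept. A small sketch, where the last input line is a hypothetical addition to the question's data:

```shell
# First four lines are the question's input; the fifth is a hypothetical
# line with extra text around the two markers.
input='mon19hellotues20
mon19world
hellomon19
tues8worldtues22
xxmon19hellotues20yy'

# -n suppresses default output; the p flag prints only lines where the
# substitution succeeded. [1-3]? is equivalent to [1-3]{0,1}.
result=$(printf '%s\n' "$input" |
  sed -nE 's/(mon|tues)([1-3]?[0-9]).*(mon|tues)([1-3]?[0-9])/\1\2\3\4/p')
printf '%s\n' "$result"
```

The fifth line comes out as xxmon19tues20yy, with the surrounding xx and yy preserved.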

extract pattern using powershell script

My bad, I have updated the question: it is using PowerShell.
My file contains thousands of lines like the one below:
<dependency org="${abcd}" name="some-random-name" rev="100.100" conf="compile;runtime"/>
I would like to get only the output like:
name="some-random-name"
How can I achieve this? Please help.
This will probably solve your issue:
grep -oP 'name="[\w-]*"' file
Explanation:
grep is the tool that prints lines matching a pattern.
The -o option prints only the matching parts.
The -P option enables Perl-style regexes (a GNU grep feature), which allows the \w metacharacter.
[\w-]* matches any run of zero or more 'word' characters or dashes.
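Where -P is not available, a similar result can probably be had with an extended regex and a POSIX character class. This is a sketch under that assumption, using [[:alnum:]_-] as a stand-in for [\w-]; it is a shell variant, not a PowerShell one.

```shell
# Sample line from the question; single quotes keep ${abcd} literal.
line='<dependency org="${abcd}" name="some-random-name" rev="100.100" conf="compile;runtime"/>'

# -o prints only the matched part; -E selects extended regex syntax.
result=$(printf '%s\n' "$line" | grep -oE 'name="[[:alnum:]_-]*"')
printf '%s\n' "$result"
```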

sed regular expression query

To change the Tomcat password in the following line
<user username="user1" password="tomcat" roles="tomcat"/>
I'm trying
sed -i s/'username=\"user1\" password=\".*\"'/'username=\"user1\" password=\"NEWPASS\"'/g tomcat-users.xml
but the resulting line will be
<user username="user1" password="NEWPASS"/>
How do I change the regular expression to not cut off the last attribute?
I want it to look like
<user username="user1" password="NEWPASS" roles="tomcat"/>
Try the following substitution:
sed -i 's/password="[^"]*"/password="NEWPASS"/' tomcat-users.xml
If you want to do the substitution only in the line corresponding to user1, specify a regexp address:
sed -i '/username="user1"/ s/password="[^"]*"/password="NEWPASS"/' tomcat-users.xml
Explanation:
# for lines that match
# this regexp...
# ↓↓↓↓↓↓↓↓↓↓↓↓↓↓↓↓↓
'/username="user1"/ s/password="[^"]*"/password="NEWPASS"/'
#                   ↑↑↑↑↑↑↑↑↑↑↑↑↑↑↑↑↑↑↑↑↑↑↑↑↑↑↑↑↑↑↑↑↑↑↑↑↑↑
#                   ...execute this command.
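The address-plus-substitution form can be tried on a pipe before running it with -i. In this sketch the user2 line is a hypothetical second entry, added to show that only the addressed line changes.

```shell
# First line is from the question; the user2 line is hypothetical.
users='<user username="user1" password="tomcat" roles="tomcat"/>
<user username="user2" password="tomcat" roles="tomcat"/>'

# The address restricts the s command to lines containing username="user1";
# [^"]* stops at the closing quote, so the roles attribute survives.
result=$(printf '%s\n' "$users" |
  sed '/username="user1"/ s/password="[^"]*"/password="NEWPASS"/')
printf '%s\n' "$result"
```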
You could try the sed command below:
$ sed 's~\(<user username="user1" password="\)[^"]*~\1NEWPASS~g' file
<user username="user1" password="NEWPASS" roles="tomcat"/>
Presumably, you only want to do this for lines that contain username="user1". So, you could do
sed -i '/username="user1"/{s/password="[^"]*"/password="NEWPASS"/;}' tomcat-users.xml
The idea is to find any lines that contain username="user1" and run the script only on those. This is achieved by the /foo/{...} syntax. Then, you identify the password by looking for the longest stretch of non-" characters after password=" and replace the whole thing with the new version.
There is no need for g, which makes sed replace all occurrences in a line and is unnecessary here.
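Less is more. Replace the g with a 1:
sed -i s/tomcat/NEWPASS/1 tomcat-users.xml
Note that this replaces the first occurrence of tomcat on every line of the file, so it is only safe when the password is the first tomcat on the user line and no other line contains that string.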
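The short form is worth testing against a fuller file first, because it is not tied to the password attribute. A sketch, assuming the file also contains a role definition (the <role .../> line is hypothetical and not from the question):

```shell
# Hypothetical role line followed by the user line from the question.
input='<role rolename="tomcat"/>
<user username="user1" password="tomcat" roles="tomcat"/>'

# The 1 flag replaces the first "tomcat" on each line, whatever it belongs to.
result=$(printf '%s\n' "$input" | sed 's/tomcat/NEWPASS/1')
printf '%s\n' "$result"
```

Here the role name is rewritten along with the password, which is the case the address-based commands avoid.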

How to remove commas in the middle of a specific line in a file in linux/unix

Someone was trying to be helpful in their test description. However, they added commas to the description so that when the test description is outputted to the log file, the results have extra commas. This makes it difficult to parse the results since the number of commas vary in the results file.
I want to use sed and go into the test files to remove the commas from the description so we don't get bitten in the butt anymore, but I'm not sure what the regex should look like since I need to preserve everything else and remove just the commas. The line is from a jmeter jmx file.
Here are a few sample lines:
1 comma
HTTPSamplerProxy guiclass="HttpTestSampleGui" testclass="HTTPSamplerProxy" testname="avgRespTime inst = green, 12 hr" enabled="true">
2 commas
HTTPSamplerProxy guiclass="HttpTestSampleGui" testclass="HTTPSamplerProxy" testname="avgRespTime, inst = network, 2 days" enabled="true">
Can someone give me a hint on how to search for this line and remove only commas while keeping everything else intact? Thanks in advance for any help you can give me.
EDIT: There might be other lines in the jmx file that contain a comma too so I can't blindly say something like:
sed -i 's/,//g' file.jmx
You can use tr to remove all commas from a given string:
s='HTTPSamplerProxy guiclass="HttpTestSampleGui" testclass="HTTPSamplerProxy" testname="avgRespTime, inst = network, 2 days" enabled="true">'
tr -d ',' <<< "$s"
Or, to change the file in place with sed on every line that contains the text HTTPSamplerProxy:
sed -i.bak '/HTTPSamplerProxy/s/,//g' file
This awk command should do it:
awk '/HTTPSamplerProxy/ {gsub(/,/,"")} 1' file
It searches for lines containing HTTPSamplerProxy and replaces every , on them with nothing.
The trailing 1 then prints every line.
To write the result back to the original file, as sed -i does:
awk '/HTTPSamplerProxy/ {gsub(/,/,"")} 1' file > tmp && mv tmp file
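The commands above strip every comma on a matching line. If other attributes on the same line may legitimately contain commas, one possible refinement is a sed loop that removes commas only inside the testname value. This is a sketch of a different technique than the answers use; the shortened sample line and its comments attribute are hypothetical.

```shell
# Hypothetical line: commas in testname should go, the one in comments should stay.
line='<HTTPSamplerProxy testname="avgRespTime, inst = network, 2 days" comments="keep, this" enabled="true">'

# Each pass removes the first comma found before the closing quote of testname;
# "ta" jumps back to label a while a substitution was made.
result=$(printf '%s\n' "$line" |
  sed -E -e ':a' -e 's/(testname="[^",]*),/\1/' -e 'ta')
printf '%s\n' "$result"
```

The bracket expression [^",] cannot cross the closing quote, so the loop ends once the testname value has no commas left.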