I have an hdfs-site.xml file that contains the following:
<?xml version="1.0"?>
<?xml-stylesheet type="text/xsl" href="configuration.xsl"?>
<!-- Put site-specific property overrides in this file. -->
<configuration>
<property>
<name>dfs.replication</name>
<value>3</value>
</property>
<property>
<name>dfs.name.dir</name>
<value>/data/dfs/nn</value>
</property>
<property>
<name>dfs.data.dir</name>
<value>/data/dfs/dn,/mnt_test_volume/data/dfs/dn,/mnt_test_volume/data/dfs/dni,/mnt_test_v5olume/data/dfs/dn,/mnt_test_volume/d5ata/dfs/dgn</value>
</property>
<property>
<name>dfs.permissions</name>
<value>false</value>
</property>
</configuration>
I want to remove some of the entries inside the <value> element of the dfs.data.dir property. Which entry to remove is decided by a parameter passed to the shell script.
I am new to sed and have written the following sed command to find a particular entry and delete it. It works as expected the first time it is run, but when the same command is run again, the entire contents of the file are wiped out and the file becomes empty.
sed -ni '1h; 1!H; ${g; s#\(<name>dfs\.data\.dir<\/name>[^a-zA-Z0-9]*<value>.*\)'$data_dir_path'[^,<]\(.*<\/value>\)#\1\2# p}' hdfs-site.xml
In this command the $data_dir_path variable decides which entry is deleted.
For example, if the value of data_dir_path is /mnt_test_volume/data/dfs/dn, I expect the following output:
<name>dfs.data.dir</name> <value>/data/dfs/dn,,/mnt_test_volume/data/dfs/dni,/mnt_test_v5olume/data/dfs/dn,/mnt_test_volume/d5ata/dfs/dgn</value>
This works fine when the command is executed once, but if the same command is executed again, the entire file becomes empty.
Can anyone please tell me what I am doing wrong here?
You can use a much simpler sed command:
sed "/<name>dfs.data.dir<\/name>/ {n; s#$data_dir_path##}" hdfs-site.xml
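What does it do?
-i (not shown above) can be added to edit the file in place; without it, sed writes the result to standard output
/<name>dfs.data.dir<\/name>/ checks whether the line matches the pattern. If it does, the commands that follow are executed. Note that these commands are grouped in {} as {n; s#$data_dir_path##}
n; prints the current line and reads the next line from the file into the pattern space
s#$data_dir_path## substitutes the first occurrence of the value of $data_dir_path with nothing; # is used as the delimiter because the path itself contains slashes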
Test
$ sed "/<name>dfs.data.dir<\/name>/ {n; s#$data_dir_path##}" test
bash-3.2$ cat test
:
:
:
<name>dfs.data.dir</name>
<value>/data/dfs/dn,,i,/mnt_test_v5olume/data/dfs/dn,/mnt_test_volume/d5ata/dfs/dgn</value>
:
:
:
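One likely cause of the wiped file in the question: with -n, the p flag on s prints only when the substitution succeeds, so a run that matches nothing prints nothing, and -i then saves that empty output over the file. The plain substitution above also matches by prefix (removing /mnt_test_volume/data/dfs/dn clips .../dni as well) and leaves an empty ",," slot behind. Below is a minimal sketch that compares whole entries instead. The function name remove_data_dir is hypothetical, and it assumes the <name> line and its <value>...</value> line are consecutive, as in the file shown in the question.

```shell
# Hypothetical helper: print the file with one exact entry removed from dfs.data.dir.
# Assumes <name> and its <value>...</value> are on consecutive lines.
remove_data_dir() {
    # $1 = entry to remove (assumed to contain no backslashes), $2 = path to hdfs-site.xml
    awk -v target="$1" '
        /<name>dfs\.data\.dir<\/name>/ { in_prop = 1; print; next }
        in_prop {
            in_prop = 0
            open_at  = index($0, "<value>")
            close_at = index($0, "</value>")
            if (open_at && close_at > open_at) {
                start = open_at + length("<value>")
                n = split(substr($0, start, close_at - start), parts, ",")
                out = ""
                for (i = 1; i <= n; i++) {
                    # Drop the requested entry and any empty slot left by earlier edits.
                    if (parts[i] == target || parts[i] == "") continue
                    out = (out == "" ? parts[i] : out "," parts[i])
                }
                print substr($0, 1, start - 1) out substr($0, close_at)
                next
            }
        }
        { print }
    ' "$2"
}
```

Because every line is always printed, running it a second time leaves the file content unchanged rather than emptying it. To update the file, write to a temporary file first, for example: remove_data_dir "$data_dir_path" hdfs-site.xml > hdfs-site.xml.tmp && mv hdfs-site.xml.tmp hdfs-site.xml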
Following up on an answer by @dawg to my question on how to delete multiple sections in a file based on known patterns, I want to use a regular expression in awk to identify the start of the section(s) I want to delete.
The file I am working with is an xml file. It is in fact the file containing the recently used filenames list (RUFL) in Linux Mint (~/.local/share/recently-used.xbel).
This is how the RUFL is structured:
<?xml version="1.0" encoding="UTF-8"?>
<xbel version="1.0"
xmlns:bookmark="http://www.freedesktop.org/standards/desktop-bookmarks"
xmlns:mime="http://www.freedesktop.org/standards/shared-mime-info"
>
<bookmark href="file:///home/ocor61/Documents/Linux/Linux%20Mint%20Cinnamon%20Keyboard%20Shortcuts.pdf" added="2021-07-18T01:57:02Z" modified="2021-07-18T01:57:02Z" visited="1969-12-31T23:59:59Z">
<info>
<metadata owner="http://freedesktop.org">
<mime:mime-type type="application/pdf"/>
<bookmark:applications>
<bookmark:application name="Document Viewer" exec="'xreader %u'" modified="2021-07-18T01:57:02Z" count="1"/>
</bookmark:applications>
</metadata>
</info>
</bookmark>
<bookmark href="file:///home/ocor61/Documents/Linux/Linux%20Command%20Line%20Cheat%20Sheet.pdf" added="2021-07-18T01:57:09Z" modified="2021-07-18T01:57:09Z" visited="1969-12-31T23:59:59Z">
<info>
<metadata owner="http://freedesktop.org">
<mime:mime-type type="application/pdf"/>
<bookmark:applications>
<bookmark:application name="Document Viewer" exec="'xreader %u'" modified="2021-07-18T01:57:09Z" count="1"/>
</bookmark:applications>
</metadata>
</info>
</bookmark>
<bookmark href="file:///home/ocor61/Documents/work.bfproject" added="2021-07-20T10:52:59Z" modified="2021-07-22T08:41:57Z" visited="1969-12-31T23:59:59Z">
<info>
<metadata owner="http://freedesktop.org">
<mime:mime-type type="application/x-bluefish-project"/>
<bookmark:applications>
<bookmark:application name="bluefish" exec="'bluefish %u'" modified="2021-07-22T08:41:57Z" count="2"/>
</bookmark:applications>
</metadata>
</info>
</bookmark>
</xbel>
I am working on a script to remove filenames from the list. It works fine, but I am also maintaining an array of patterns that must not be used. For example, if the pattern [bookmark] were used to identify a section to remove, the entire file would become unusable. The same goes for parts of [bookmark], and also for href, added, info... You get my drift.
So, I want to work with a regexp to avoid the problems caused by entering patterns that cannot be used.
This is the awk code I am currently using (thanks to @dawg):
ENDLINE='</bookmark>'
awk -v f=1 -v st="$1" -v end="$ENDLINE" '
match($0, st) {f=0}
f
match($0, end){f=1}' ~/.local/share/recently-used.xbel
$1 would be the pattern a user enters at the command line, which is part of the file name that must be removed from the RUFL.
The following is the code I would like to use, including the regexp, which doesn't work:
STARTLINE='/(<bookmark href)(.*)($1)(.*)(>)/'
ENDLINE='</bookmark>'
awk -v f=1 -v st="$STARTLINE" -v end="$ENDLINE" '
match($0, st) {f=0}
f
match($0, end){f=1}' ~/.local/share/recently-used.xbel
I have tested the regular expression at https://regexr.com/, so I know it is correct. However, when I use it in my script, this is the error message I am getting:
./ruffle.sh: line 99: syntax error near unexpected token `$0,'
./ruffle.sh: line 99: ` match($0, st) {f=0}'
I have also tried to enter the regexp itself in the awk command line instead of the variable, but that has the same result.
I don't know how to proceed, so any help is appreciated.
The answer to my question lies in how regular expressions can differ when used in different environments. The website I used to check my regexp does so for languages like JS, but not for Bash or likely other shell implementations.
With shellcheck.net as well as by putting the command 'set -vx' in my script right before the awk command, I managed to work things out.
Another mistake I made was to attempt to catch the complete line in the regexp, while I need only the part in that line that can hold the pattern that is entered (which is the part between 'file:' and 'added' in the file ~/.local/share/recently-used.xbel).
The regexp that ultimately works for me now with the variable STARTLINE is:
STARTLINE='file:.*'$1'.*added='
I will have to look into using an xml parser, thanks for the suggestion! For now, however, my script works. Thanks @Sundeep and @EdMorton!
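As a sketch of the idea described above (search only the part of the line that holds the file name), the following awk function isolates the href value and looks for the user's text with index(), which does a literal substring comparison rather than a regex match. Input such as bookmark, info or added then cannot match the XML markup itself. The name remove_bookmarks is hypothetical, and the sketch assumes each <bookmark href="..."> opening tag sits on a single line, as in the listing in the question.

```shell
# Hypothetical helper: print the file without the bookmarks whose href contains $1.
remove_bookmarks() {
    # $1 = literal text to look for in the href (assumed to contain no backslashes)
    # $2 = path to the .xbel file
    awk -v needle="$1" '
        /<bookmark href="/ {
            # Isolate the href value so that other attributes are never searched.
            href = $0
            sub(/^.*<bookmark href="/, "", href)
            sub(/".*$/, "", href)
            if (index(href, needle) > 0) skip = 1
        }
        !skip { print }
        /<\/bookmark>/ { skip = 0 }
    ' "$2"
}
```

Note that file names in the href are percent-encoded (a space appears as %20), so the text passed in has to be written the same way to match.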
I am trying to extract a value in a shell script using xmllint. I was able to find and extract values by matching complete key strings.
The problem is that for some values I only know what the key starts with.
For example, let a part of the XML be:
<property>
<name>foo.bar.random_part_of_name</name>
<value> SOME_VALUE</value>
</property>
I want to extract this entire segment and write it to an output file.
So far, I have been able to match complete segments with
if (xmllint --xpath '//property[name/text()="foo.bar"]/value/text()' "$INPUT_FILE"); then
value=$(xmllint --xpath '//property[name/text()="foo.bar"]/value/text()' "$INPUT_FILE")
echo "<property><name>foo.bar</name><value>$value</value></property>">> $OUTPUT_FILE
fi
Thanks in advance
XPath 1.0 offers the starts-with(string, prefix) function to do what you want:
name="foo.bar"
value=$(xmllint --xpath "//property[starts-with(name,'$name')]/value/text()" test.xml)
if [ -n "$value" ]; then
echo "<property><name>$name</name><value>$value</value></property>"
fi
Result:
<property><name>foo.bar</name><value> SOME_VALUE</value></property>
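Note that the snippet above rebuilds the element from the prefix, so the printed <name> is foo.bar rather than the full key. If the whole original segment is wanted, one option is to select the <property> element itself and let xmllint serialize it. This is a minimal sketch; extract_properties is a hypothetical name, and the prefix is pasted into the XPath string, so it is assumed not to contain a single quote.

```shell
# Hypothetical helper: print every <property> element whose <name> starts with a prefix.
extract_properties() {
    # $1 = name prefix (must not contain a single quote), $2 = input XML file
    xmllint --xpath "//property[starts-with(name, '$1')]" "$2"
}
```

For example: extract_properties foo.bar "$INPUT_FILE" >> "$OUTPUT_FILE". When nothing matches, xmllint typically reports an empty result and returns a non-zero exit status, so the call can still be used as the condition of an if.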
I have large log files (around 50mb each), which contain java debug information plus all kinds of XML responses
Here's an example of something I'm trying to extract from the log
<envelope>
<response>
<ATTR name="uniqueid" value="XYZ_00000-00-00_12345_1"/>
<ATTR name="status" value="Activated"/>
<ATTR name="datecreated" value="2018/10/04 09:39:05"/>
</response>
</envelope>
I need only the XMLs which the uniqueid attribute contains "12345" and the status attribute is set to "Activated"
By using "sed" I'm able to extract all the envelopes, and currently I'm using regex to check if the above conditions exist inside of it (by running all of them in a loop).
sed -n '/<envelope>/,/<\/envelope>/p' logfile
What would be a proper solution to extract what I need from the file?
Thanks!
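Assuming your XML is formatted as shown, this should work:
$ awk '/<envelope>/ {line=$0; id=0; st=0; next}
line {line=line ORS $0}
/name="uniqueid"/ && /value="[^"]*12345/ {id=1}
/name="status"/ && /value="Activated"/ {st=1}
/<\/envelope>/ {if (id && st) print line; line=""}' file
At the opening tag, start accumulating lines. If the uniqueid line contains 12345, set one flag; if the status line is Activated, set the other. At the closing tag, print the accumulated record only if both flags are set, then clear the buffer so that log lines between envelopes are not collected.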
with gawk you can do this instead
$ awk -F'\n' -v RS='</envelope>\n' \
'$3~/uniqueid.*12345/ && $4~/status.*Activated/{print $0, RT}' file
there will be an extra newline though.
I am trying to turn a particular XML statement into a comment. I have tried Linux tools such as awk and sed and other regular-expression approaches, but I am completely stuck. Is there any way I can achieve this? Below is the scenario I am looking at.
For example:
I have n XML files. I want to replace every statement that contains the word "Distribution_Facilities_carrying_Item" with a commented-out version of that statement.
Suppose the statement is:
<Parameter name="RelationshipName1" direction="in" eval="constant" type="string">Distribution_Facilities_carrying_Item</Parameter>
Since this statement contains the word "Distribution_Facilities_carrying_Item", I want to turn it into a comment, so that it becomes:
<!--Parameter name="RelationshipName1" direction="in" eval="constant" type="string">Distribution_Facilities_carrying_Item</Parameter-->
All such statements in all the XML files should be replaced with commented-out XML statements. Below are the forms in which they might occur. How should I go about it? I assume regular expressions are the way to achieve this.
......................................
This statement can appear in any number of XML files.
File:a.xml
<Parameter name="RelationshipName1" direction="in" eval="constant" type="string">Distribution_Facilities_carrying_Item</Parameter>
<Parameter direction="in" eval="constant" type="string" name="RelationshipName3">Distribution_Facilities_carrying_Item</Parameter>
<Parameter name="RelationshipName" direction="in" eval="constant" type="string">Distribution_Facilities_carrying_Item</Parameter>
<Parameter direction="in" name="RelationshipName10" type="string" eval="constant">Distribution_Facilities_carrying_Item</Parameter>
<Parameter direction="in" name="RelationshipName11" type="string" eval="constant">Distribution_Facilities_carrying_Item</Parameter>
<Parameter direction="in" eval="constant" type="string" name="RelationshipName5">Distribution_Facilities_carrying_Item</Parameter>
Thanks in advance!!
Using sed:
sed '/Distribution_Facilities_carrying_Item/ s/<\(.*\)>/<!--\1-->/' inputfile
would comment all lines containing the string Distribution_Facilities_carrying_Item.
If you want to modify the file in-place, add the -i option:
sed -i '/Distribution_Facilities_carrying_Item/ s/<\(.*\)>/<!--\1-->/' inputfile
If this is to be performed for all .xml files in a directory, use find and -exec:
find /some/dir -maxdepth 1 -type f -name "*.xml" -exec sed -i '/Distribution_Facilities_carrying_Item/ s/<\(.*\)>/<!--\1-->/' {} \;
(Remove -maxdepth 1 from the find command if you want to do it recursively.)
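One thing to watch with the in-place form: running it a second time wraps the already commented lines again, which produces nested comment markers. A minimal sketch that skips lines already containing <!-- is shown below. The function name comment_out_lines is hypothetical, and the marker is assumed to contain no regex metacharacters or slashes.

```shell
# Hypothetical helper: comment out lines containing a marker, skipping lines already commented.
comment_out_lines() {
    # $1 = marker text (assumed free of regex metacharacters and slashes)
    # $2 = input file; the result is written to standard output
    sed -e "/$1/"'{ /<!--/!s/<\(.*\)>/<!--\1-->/; }' "$2"
}
```

For example, comment_out_lines Distribution_Facilities_carrying_Item a.xml prints the same result as the sed command above on a fresh file, and prints the input unchanged when its matching lines are already commented.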
Try the following sed command; it comments out the matching lines:
sed -i 's/\(<.*Distribution_Facilities_carrying_Item.*>\)/<!--\1-->/' filename.xml
Do not use regular expressions to parse XML. Use a proper parser. For example, using xsh:
my $search = "Distribution_Facilities_carrying_Item" ;
for my $file in { @ARGV } {
open $file ;
for my $p in //Parameter[text() = $search]
xinsert comment { $p->toString } replace $p ;
save :b ;
}
If you want to delete the text, too, you can change the inner loop to
for my $p in //Parameter[text() = $search] {
delete $p/text() ;
xinsert comment { $p->toString } replace $p ;
}
An awk version:
awk '/Distribution_Facilities_carrying_Item/ {sub(/^</,"<!--");sub(/>$/,"-->")}1' a.xml
I have a file that contains these values:
<property name="india" column="delhi" />
<property name="austrelia" column="sydney" />
<property name="uae" column="dubai" />
Now I want to extract the value inside the first pair of double quotes.
So the result should be:
india
austrelia
uae
I am using the shell and my regex is "(.*?)", but it matches both quoted values. I want only the first one.
Can someone suggest the correct regex for this?
try this:
sed -r 's/^[^"]+"([^"]*)".*/\1/' file
test with your data:
kent$ echo '<property name="india" column="delhi" />
<property name="austrelia" column="sydney" />
<property name="uae" column="dubai" />'|sed -r 's/^[^"]+"([^"]*)".*/\1/'
india
austrelia
uae
$ awk -F\" '{print $2}' file
By the way, the shell is probably not the ideal tool for parsing XML.
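Both commands above take whatever sits in the first pair of quotes, which works as long as name is always the first attribute. POSIX regular expressions as used by sed and awk have no lazy quantifier like .*?, which is why [^"]* is used to stop at the next quote. If the attribute order might vary, a sketch that keys on the name attribute itself could look like this (property_names is a hypothetical name, and one <property .../> element per line is assumed):

```shell
# Hypothetical helper: print the name attribute of every <property> element.
property_names() {
    # $1 = input file; assumes one <property .../> element per line
    sed -n 's/.*<property[^>]* name="\([^"]*\)".*/\1/p' "$1"
}
```

Lines without a <property> element carrying a name attribute are skipped rather than printed unchanged, because -n suppresses the default output and p prints only on a successful substitution.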