I have a file that contains these lines:
<property name="india" column="delhi" />
<property name="austrelia" column="sydney" />
<property name="uae" column="dubai" />
Now I want to extract the value inside the first pair of double quotes on each line.
So the result should be:
india
austrelia
uae
I am using the shell and my regex is "(.*?)", but it matches both quoted values on each line. I want only the first one.
Can someone suggest the correct regex for this?
Try this (the -r option assumes GNU sed):
sed -r 's/^[^"]+"([^"]*)".*/\1/' file
Test with your data:
kent$ echo '<property name="india" column="delhi" />
<property name="austrelia" column="sydney" />
<property name="uae" column="dubai" />'|sed -r 's/^[^"]+"([^"]*)".*/\1/'
india
austrelia
uae
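With awk, using the double quote as the field separator, the first quoted value is field 2:
$ awk -F\" '{print $2}' file
By the way, the shell is probably not the ideal tool for parsing XML.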
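For comparison, here is a minimal Python sketch of the same idea as the sed command above: match a double quote, capture everything up to the next double quote, and stop after the first match on each line. The helper name first_quoted is made up for illustration, and the sample lines are copied from the question.

```python
import re

# Sample lines copied from the question.
lines = [
    '<property name="india" column="delhi" />',
    '<property name="austrelia" column="sydney" />',
    '<property name="uae" column="dubai" />',
]

def first_quoted(line):
    """Return the text inside the first pair of double quotes, or None."""
    # re.search() stops at the first match, so later quoted values are ignored.
    match = re.search(r'"([^"]*)"', line)
    return match.group(1) if match else None

names = [first_quoted(line) for line in lines]
print("\n".join(names))
```

Because [^"]* cannot cross a double quote, the capture can never run on into the second attribute, which is what a greedy or globally applied "(.*?)" ends up doing.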
I have large log files (around 50 MB each) that contain Java debug information plus all kinds of XML responses.
Here's an example of something I'm trying to extract from the log:
<envelope>
<response>
<ATTR name="uniqueid" value="XYZ_00000-00-00_12345_1"/>
<ATTR name="status" value="Activated"/>
<ATTR name="datecreated" value="2018/10/04 09:39:05"/>
</response>
</envelope>
I need only the XML envelopes in which the uniqueid attribute contains "12345" and the status attribute is set to "Activated".
Using sed I'm able to extract all the envelopes, and currently I check each one in a loop with a regex to see whether the above conditions hold.
sed -n '/<envelope>/,/<\/envelope>/p' logfile
What would be a proper solution to extract what I need from the file?
Thanks!
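Assuming your XML is formatted as shown, this should work:
$ awk '/<envelope>/ {line=$0; id=0; st=0; next}
line {line=line ORS $0}
/name="uniqueid"/ && $3~/12345/ {id=1}
/name="status"/ && $3~/"Activated"/ {st=1}
/<\/envelope>/ {if (id && st) print line; line=""}' file
At the opening tag, start accumulating lines and reset both flags. Set one flag when the uniqueid line contains 12345 and the other when the status line is Activated. At the closing tag, print the accumulated record only if both flags are set.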
With gawk you can do this instead (it relies on uniqueid and status being the third and fourth lines of each record):
$ awk -F'\n' -v RS='</envelope>\n' \
'$3~/uniqueid.*12345/ && $4~/status.*Activated/{print $0, RT}' file
There will be an extra newline in the output, though.
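If the envelopes are well-formed XML, the two conditions can also be checked with a parser instead of field positions. The following Python sketch is only an illustration under that assumption: the function name matching_envelopes and the sample log text are made up, and it reads the whole log into memory, which should be acceptable at around 50 MB.

```python
import re
import xml.etree.ElementTree as ET

def matching_envelopes(log_text, id_part="12345", status="Activated"):
    """Yield envelope blocks whose uniqueid contains id_part and whose status matches."""
    # Non-greedy match so each <envelope>...</envelope> block is taken separately.
    for block in re.findall(r"<envelope>.*?</envelope>", log_text, flags=re.DOTALL):
        try:
            root = ET.fromstring(block)
        except ET.ParseError:
            continue  # skip blocks that are not well-formed XML
        # Map each ATTR name to its value, e.g. {"status": "Activated", ...}.
        attrs = {a.get("name"): a.get("value", "") for a in root.iter("ATTR")}
        if id_part in attrs.get("uniqueid", "") and attrs.get("status") == status:
            yield block

# Hypothetical log excerpt: debug lines mixed with two envelopes.
sample = """DEBUG some java output
<envelope>
<response>
<ATTR name="uniqueid" value="XYZ_00000-00-00_12345_1"/>
<ATTR name="status" value="Activated"/>
<ATTR name="datecreated" value="2018/10/04 09:39:05"/>
</response>
</envelope>
DEBUG more output
<envelope>
<response>
<ATTR name="uniqueid" value="XYZ_00000-00-00_99999_1"/>
<ATTR name="status" value="Activated"/>
</response>
</envelope>
"""

results = list(matching_envelopes(sample))
for block in results:
    print(block)
```

The regex is used only to cut the envelopes out of the surrounding log text; the attribute checks themselves do not depend on line order or spacing.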
I am trying to build a PowerShell script that takes an input file and a regex and replaces each match with the value of the corresponding environment variable.
For example,
If the input file contains the following:
<?xml version="1.0" encoding="utf-8"?>
<Application xmlns:xsd="http://www.w3.org/2001/XMLSchema" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" Name="fabric:/Services" xmlns="http://schemas.microsoft.com/2011/01/fabric">
<Parameters>
<Parameter Name="IntegrationManager_PartitionCount" Value="1" />
<Parameter Name="IntegrationManager_MinReplicaSetSize" Value="2" />
<Parameter Name="IntegrationManager_TargetReplicaSetSize" Value="#{INT_MGR_IC}" />
<Parameter Name="EventManager_InstanceCount" Value="#{EVT_MGR_IC}" />
<Parameter Name="Entities_InstanceCount" Value="#{ENT_IC}" />
<Parameter Name="Profile_InstanceCount" Value="#{PRF_IC}" />
<Parameter Name="Identity_InstanceCount" Value="#{IDNT_IC}" />
</Parameters>
</Application>
I would like to build a script that replaces #{INT_MGR_IC} with the value of the INT_MGR_IC environment variable, and so on.
If you know of such a script or can point me in the right direction, it would be a great help. Specifically, I would like to know how to:
Extract and loop over the keys in the file, such as #{INT_MGR_IC}, #{EVT_MGR_IC}, etc.
Replace each key with the associated environment variable, for example #{INT_MGR_IC} with the INT_MGR_IC environment variable.
Thanks a lot for looking into this :)
UPDATE 1
This is the RegEx I am planning to use: /#{(.+)}/g
Load the file with the Get-Content cmdlet and cast it to [xml], iterate over each Parameter, use Where-Object to keep only the parameters whose Value starts with #, and change the value. Finally, call the document's Save() method to write it back:
$contentPath = 'Your_Path_Here'
$content = [xml] (Get-Content $contentPath)
$content.DocumentElement.Parameters.Parameter | Where Value -Match '^#' | ForEach-Object {
$_.Value = "REPLACE HERE"
}
# Piping the XmlDocument object to Set-Content would not write the XML text, so use Save() (pass a full path).
$content.Save($contentPath)
To look up an environment variable, strip the #{ and } from the value first and pass the bare name to [Environment]::GetEnvironmentVariable(), for example [Environment]::GetEnvironmentVariable('INT_MGR_IC').
Thanks a lot to everyone who helped, especially @jisaak :)
Here is the final script I built that solved the problem in the question. Hopefully it's useful to someone!
$configPath = 'Cloud.xml'
$config = [xml] (Get-Content $configPath)
$config.DocumentElement.Parameters.Parameter | Where {$_.Value.StartsWith("#{")} | ForEach-Object {
$var = $_.Value.replace("#{","").replace("}","")
$val = (get-item env:$var).Value
Write-Host "Replacing $var with $val"
$_.Value = $val
}
$config.Save($configPath)
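The same substitution can be sketched as a plain text replacement driven by a regex, which is closer to what the question first asked for. The Python below is an illustration only: expand_placeholders and the env argument are made-up names, and the pattern #\{(\w+)\} is an assumption that keys consist of letters, digits and underscores. Unlike the greedy (.+) in the question's regex, \w+ cannot run past the closing brace.

```python
import os
import re

# Matches #{NAME} and captures NAME (letters, digits, underscore).
PLACEHOLDER = re.compile(r"#\{(\w+)\}")

def expand_placeholders(text, env=None):
    """Replace each #{NAME} with env[NAME]; leave unknown names untouched."""
    env = os.environ if env is None else env
    # The callback receives each match; fall back to the original text if unset.
    return PLACEHOLDER.sub(lambda m: env.get(m.group(1), m.group(0)), text)

sample = '<Parameter Name="EventManager_InstanceCount" Value="#{EVT_MGR_IC}" />'
# A dict stands in for the real environment so the example is reproducible.
result = expand_placeholders(sample, env={"EVT_MGR_IC": "3"})
print(result)
```

Leaving unknown keys in place is a design choice for the sketch; raising an error instead may be preferable in a deployment script.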
My build.xml has 134 targets, most of which are hidden (hidden="true"). Is there a way to list all targets from the commandline? Target-definitions are sometimes split over multiple-lines, and I usually use double-quote-characters for properties. I'm running this on Debian, and kudos for sorting targets and/or also displaying descriptions. :-)
Examples:
<target name="example1" hidden="false" />
<target name="example3" hidden="true" />
<target
description="Ipsum lorem"
hidden="true"
name='example3'
>
<phingcall target="example1" />
</target>
Phing cannot do this by itself, but it can be done from within a Phing target. There's probably a cleaner, better way than this, but it works, assuming all other properties are wrapped in double quotes (i.e. it only just copes with the third example above).
<target name="list_all" hidden="false">
<property name="command" value="
cat ${phing.file.foo}
| perl -pe 's|^\s*||g'
| perl -0pe 's|\n([^<])| \1|gs'
| grep '<target'
| perl -pe "s|name='([^']*)'|name=\"\1\"|g"
| perl -pe 's|^<target(\s?)||'
| perl -pe 's|(.*)([ ]?)depends="([^"]*)"([ ]?)(.*)|\1 \2|g'
| perl -pe 's|(.*)([ ]?)hidden="([^"]*)"([ ]?)(.*)|\1 \2|g'
| perl -pe 's|.*description="([^"]*).*name="([^"]*).*|name="\2" description="\1"|g'
| perl -pe 's|name="([^"]*)"|\1|g'
| perl -pe 's|description="([^"]*)"|[\1]|g'
| sort
| uniq
" override="true" />
<exec command="${command}" passthru="true" />
</target>
What do those lines do?
1) output the contents of build.xml. Here, I'm interested in my 'global' build.xml which is named 'foo';
2) remove all leading whitespace from each line;
3) remove line-breaks within each opening tag;
4) filter for lines starting "<target";
5) change single quote-marks on name-property to double;
6, 7, 8) remove leading "<target", and depends and hidden properties;
9) move 'description' after 'name' on each line;
10) remove 'name=' and its quote-marks;
11) replace 'description=' and its quote-marks with square-brackets; and
12, 13) sort & remove duplicates (if any)
Sample output:
example1
example2
example3 [Ipsum lorem]
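As a cross-check on the pipeline above, the same listing can be produced with an XML parser, which handles single quotes and multi-line target definitions without extra steps. This Python sketch is an assumption-laden illustration rather than a Phing feature: list_targets is a made-up name, and the sample project wrapper and the example2 target are invented so that the result lines up with the sample output.

```python
import xml.etree.ElementTree as ET

def list_targets(build_xml_text):
    """Return sorted, de-duplicated 'name [description]' lines for every <target>."""
    root = ET.fromstring(build_xml_text)
    lines = set()
    for target in root.iter("target"):
        name = target.get("name")
        if not name:
            continue  # ignore targets without a name attribute
        desc = target.get("description")
        lines.add(f"{name} [{desc}]" if desc else name)
    return sorted(lines)

# Hypothetical build file modelled on the examples in the question.
sample = """<project name="demo" default="example1">
  <target name="example1" hidden="false" />
  <target name="example2" hidden="true" />
  <target
    description="Ipsum lorem"
    hidden="true"
    name='example3'
  >
    <phingcall target="example1" />
  </target>
</project>"""

listing = list_targets(sample)
print("\n".join(listing))
```

Hidden targets are included because the hidden attribute is simply never consulted.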
You can't with phing itself. The code simply skips the display if the target is set to "hidden".
I have an hdfs-site.xml file that contains the following:
<?xml version="1.0"?>
<?xml-stylesheet type="text/xsl" href="configuration.xsl"?>
<!-- Put site-specific property overrides in this file. -->
<configuration>
<property>
<name>dfs.replication</name>
<value>3</value>
</property>
<property>
<name>dfs.name.dir</name>
<value>/data/dfs/nn</value>
</property>
<property>
<name>dfs.data.dir</name>
<value>/data/dfs/dn,/mnt_test_volume/data/dfs/dn,/mnt_test_volume/data/dfs/dni,/mnt_test_v5olume/data/dfs/dn,/mnt_test_volume/d5ata/dfs/dgn</value>
</property>
<property>
<name>dfs.permissions</name>
<value>false</value>
</property>
</configuration>
I want to remove some of the entries in the <value> element that follows <name>dfs.data.dir</name>. The entry to remove is given by a parameter to the shell script.
I am new to sed and have written the following sed command to find a particular entry and delete it. It works as expected the first time it is executed, but when the same command is executed again, the whole content of the file is wiped out and the file becomes empty.
sed -ni '1h; 1!H; ${g; s#\(<name>dfs\.data\.dir<\/name>[^a-zA-Z0-9]*<value>.*\)'$data_dir_path'[^,<]\(.*<\/value>\)#\1\2# p}' hdfs-site.xml
In this command, the $data_dir_path variable decides which entry is deleted.
For example, if the value of data_dir_path is /mnt_test_volume/data/dfs/dn, then I expect the following output:
<name>dfs.data.dir</name> <value>/data/dfs/dn,,/mnt_test_volume/data/dfs/dni,/mnt_test_v5olume/data/dfs/dn,/mnt_test_volume/d5ata/dfs/dgn</value>
This works fine when the command is executed once, but if the same command is executed again, the entire file becomes empty.
Can anyone please tell me what I am doing wrong here?
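Your command empties the file because -n suppresses all output except what the p flag of the s command prints, so once the substitution no longer matches, -i writes back nothing. You can use a much simpler sed command:
sed "/<name>dfs.data.dir<\/name>/ {n; s#$data_dir_path##}" hdfs-site.xml
What does it do?
/<name>dfs.data.dir<\/name>/ checks whether the line matches the pattern. If it does, the commands grouped in {}, here {n; s#$data_dir_path##}, are executed.
n; reads the next line of the file into the pattern space. This relies on the <value> line directly following the <name> line.
s#$data_dir_path## replaces the first occurrence of the value of $data_dir_path with nothing. # is used as the delimiter because the path contains slashes, and the script is in double quotes so that the shell expands the variable.
Add the -i option to edit the file in place; without it, sed only prints the result.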
Test
$ sed "/<name>dfs.data.dir<\/name>/ {n; s#$data_dir_path##}" test
bash-3.2$ cat test
:
:
:
<name>dfs.data.dir</name>
<value>/data/dfs/dn,,i,/mnt_test_v5olume/data/dfs/dn,/mnt_test_volume/d5ata/dfs/dgn</value>
:
:
:
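The test output shows two side effects of a plain substring substitution: an empty entry is left between two commas, and a longer entry that merely starts with the same path (…/dni) can be truncated. A parser-based sketch that removes only exact entries could look like the Python below. It is an illustration, not a drop-in replacement: remove_dir_entry is a made-up name, the sample is a shortened version of the file in the question, and ElementTree's default parser does not keep comments or the stylesheet processing instruction, so those would be lost if the result were written back.

```python
import xml.etree.ElementTree as ET

def remove_dir_entry(config_text, entry, prop_name="dfs.data.dir"):
    """Drop `entry` from the comma-separated <value> of the named property."""
    root = ET.fromstring(config_text)
    for prop in root.iter("property"):
        if prop.findtext("name") != prop_name:
            continue
        value = prop.find("value")
        if value is None:
            continue
        # Keep every non-empty entry that is not an exact match for `entry`.
        parts = [p for p in (value.text or "").split(",") if p and p != entry]
        value.text = ",".join(parts)
    return ET.tostring(root, encoding="unicode")

# Shortened sample based on the hdfs-site.xml in the question.
sample = """<configuration>
<property>
<name>dfs.replication</name>
<value>3</value>
</property>
<property>
<name>dfs.data.dir</name>
<value>/data/dfs/dn,/mnt_test_volume/data/dfs/dn,/mnt_test_volume/data/dfs/dni</value>
</property>
</configuration>"""

updated = remove_dir_entry(sample, "/mnt_test_volume/data/dfs/dn")
print(updated)
```

Running the function a second time on its own output changes nothing, because the entry is compared as a whole rather than as a prefix.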
I am trying to turn a particular XML statement into a comment. I have been trying with awk, sed, or some other regular-expression tool on Linux, but I am completely stuck. Is there any way to achieve this? Below is the scenario.
For example:
I have n XML files. I want to replace every statement that contains the word "Distribution_Facilities_carrying_Item" with a commented-out version of that statement.
Suppose the statement is:
<Parameter name="RelationshipName1" direction="in" eval="constant" type="string">Distribution_Facilities_carrying_Item</Parameter>
Since this statement contains the word "Distribution_Facilities_carrying_Item", I want it to be replaced with:
<!--Parameter name="RelationshipName1" direction="in" eval="constant" type="string">Distribution_Facilities_carrying_Item</Parameter-->
All such statements in all the XML files should be commented out in the same way. Below is the pattern in which they might occur. How should I go about it? I assume this needs someone adept at regular expressions, since that seems to be the only way to achieve it.
......................................
This statement can appear in any number of XML files.
File:a.xml
<Parameter name="RelationshipName1" direction="in" eval="constant" type="string">Distribution_Facilities_carrying_Item</Parameter>
<Parameter direction="in" eval="constant" type="string" name="RelationshipName3">Distribution_Facilities_carrying_Item</Parameter>
<Parameter name="RelationshipName" direction="in" eval="constant" type="string">Distribution_Facilities_carrying_Item</Parameter>
<Parameter direction="in" name="RelationshipName10" type="string" eval="constant">Distribution_Facilities_carrying_Item</Parameter>
<Parameter direction="in" name="RelationshipName11" type="string" eval="constant">Distribution_Facilities_carrying_Item</Parameter>
<Parameter direction="in" eval="constant" type="string" name="RelationshipName5">Distribution_Facilities_carrying_Item</Parameter>
Thanks in advance!!
Using sed:
sed '/Distribution_Facilities_carrying_Item/ s/<\(.*\)>/<!--\1-->/' inputfile
would comment all lines containing the string Distribution_Facilities_carrying_Item.
If you want to modify the file in-place, add the -i option:
sed -i '/Distribution_Facilities_carrying_Item/ s/<\(.*\)>/<!--\1-->/' inputfile
If this is to be performed for all .xml files in a directory, use find and -exec:
find /some/dir -maxdepth 1 -type f -name "*.xml" -exec sed -i '/Distribution_Facilities_carrying_Item/ s/<\(.*\)>/<!--\1-->/' {} \;
(Remove -maxdepth 1 from the find command if you want to do it recursively.)
Try the sed command below; it will comment out the matching lines:
sed -i 's/\(<.*Distribution_Facilities_carrying_Item.*>\)/<!--\1-->/' filename.xml
Do not use regular expressions to parse XML. Use a proper parser. For example, using xsh:
my $search = "Distribution_Facilities_carrying_Item" ;
for my $file in { @ARGV } {
open $file ;
for my $p in //Parameter[text() = $search]
xinsert comment { $p->toString } replace $p ;
save :b ;
}
If you want to delete the text, too, you can change the inner loop to
for my $p in //Parameter[text() = $search] {
delete $p/text() ;
xinsert comment { $p->toString } replace $p ;
}
An awk version:
awk '/Distribution_Facilities_carrying_Item/ {sub(/^</,"<!--");sub(/>$/,"-->")}1' a.xml
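For readers without xsh, the parser-based approach can be sketched with Python's standard library. This is an illustration under a few assumptions: comment_out_parameters is a made-up name, the Root wrapper in the sample is invented because the question shows only fragments, and the resulting comment keeps the angle brackets (<!--<Parameter ...>...</Parameter>-->), which differs slightly from the form requested in the question.

```python
import xml.etree.ElementTree as ET

SEARCH = "Distribution_Facilities_carrying_Item"

def comment_out_parameters(xml_text, search=SEARCH):
    """Replace each <Parameter> whose text equals `search` with an XML comment."""
    root = ET.fromstring(xml_text)
    # Snapshot the elements first, because children are replaced while looping.
    for parent in list(root.iter()):
        for index, child in enumerate(list(parent)):
            if child.tag == "Parameter" and child.text == search:
                # Detach the trailing whitespace so it stays outside the comment.
                tail, child.tail = child.tail, None
                comment = ET.Comment(ET.tostring(child, encoding="unicode"))
                comment.tail = tail
                parent[index] = comment
    return ET.tostring(root, encoding="unicode")

# Hypothetical document; the Root wrapper is not part of the question.
sample = """<Root>
<Parameter name="RelationshipName1" direction="in">Distribution_Facilities_carrying_Item</Parameter>
<Parameter name="Other" direction="in">Keep_Me</Parameter>
</Root>"""

result = comment_out_parameters(sample)
print(result)
```

Only elements whose text is exactly the search string are touched, so a Parameter that merely mentions the word inside longer text is left alone.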