Regex compare strings from multiple files - regex

I have multiple XML files that contain various strings. I also have a text file of strings, some of which are contained within the XML files.
XML:
text="$$sRegister $$s is stuck at One. (VDB-5014)" uid="5014"/>
String File:
is stuck at one
I would like to print the strings that are both in my string file and my XML file. This way I can set the correct message type in the XML file. Given the high volume of messages I've been attempting to automate this process. Thoughts?

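You can use grep -f, which reads one pattern per line from a file. Add -i because the sample differs in case ("One" in the XML, "one" in the string file), -F if the strings are literal text rather than regular expressions, and -o to print only the matched text instead of the whole XML line:
grep -oiF -f stringFile xmlFile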
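If the goal is to list the entries of the string file themselves (rather than the XML lines they hit), a loop over the string file is one option. This is a minimal sketch, not a definitive solution: the function name matching_strings and the sample file names are made up for illustration, and it assumes the strings are literal text and that case should be ignored.

```shell
# Print each line of a string file that occurs (case-insensitively, as
# literal text) in at least one of the given files.
# Usage: matching_strings STRING_FILE XML_FILE...
matching_strings() {
  strings_file=$1
  shift
  while IFS= read -r s; do
    [ -n "$s" ] || continue          # skip empty lines
    if grep -qiF -e "$s" -- "$@"; then
      printf '%s\n' "$s"
    fi
  done < "$strings_file"
}

# Hypothetical sample data modelled on the question
tmp=$(mktemp -d)
printf '%s\n' 'text="$$sRegister $$s is stuck at One. (VDB-5014)" uid="5014"/>' > "$tmp/a.xml"
printf '%s\n' 'is stuck at one' 'is never used' > "$tmp/strings.txt"

matching_strings "$tmp/strings.txt" "$tmp"/*.xml
rm -r "$tmp"
```

The loop runs grep once per string, which is slower than a single grep -f call but reports exactly which entries of the string file were found.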

Related

Replacing strings in xml file using batch script

I'm new to batch scripting. I want to replace a string in a particular file.
The script below gives me an error.
@echo off
$standalone = Get-Content 'C:\wildfly\standalone\configuration\standalone.xml'
$standalone -replace '<wsdl-host>${jboss.bind.address:127.0.0.1}</wsdl-host>','<wsdl-host>${jboss.bind.address:0.0.0.0}</wsdl-host>' |
Set-Content 'C:\wildfly\standalone\configuration\standalone.xml'
The proper way to edit XML is to process it as an XML document, not as a string, because an XML file is not guaranteed to keep any specific formatting. Edits should be context-aware, and a string replace is not. Consider these three equivalent XML fragments:
<wsdl-host>${jboss.bind.address:127.0.0.1}</wsdl-host>
<wsdl-host>${jboss.bind.address:127.0.0.1}</wsdl-host >
<wsdl-host >${jboss.bind.address:127.0.0.1}</wsdl-host >
Note that the whitespace inside the tags differs; XML allows whitespace before the closing > of a tag. What's more, in practice many implementations simply discard line breaks in element values, so the following two are likely to give the same result to a config parser:
<wsdl-host>${jboss.bind.address:127.0.0.1}</wsdl-host>
<wsdl-host>${jboss.bind.address:127.0.0.1}
</wsdl-host>
It really doesn't make much sense to process XML as string, does it?
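To make the point concrete, here is a small sketch that searches the three equivalent fragments above for the exact string used in the question. The variable names are illustrative only; grep -F stands in for any literal string match.

```shell
# Three equivalent fragments, one per line, copied from the answer above
fragments='<wsdl-host>${jboss.bind.address:127.0.0.1}</wsdl-host>
<wsdl-host>${jboss.bind.address:127.0.0.1}</wsdl-host >
<wsdl-host >${jboss.bind.address:127.0.0.1}</wsdl-host >'

# The literal string the original script tried to replace
needle='<wsdl-host>${jboss.bind.address:127.0.0.1}</wsdl-host>'

# Count how many of the fragments contain the literal string
count=$(printf '%s\n' "$fragments" | grep -cF -e "$needle")
echo "$count of 3 equivalent fragments matched"
```

Only the first fragment contains the literal string, so a string replace would silently skip the other two.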
Fortunately, PowerShell has built-in support for XML files. A simple approach looks like this:
# Mock XML config
[xml]$x = @'
<root>
<wsdl-host>${jboss.bind.address:127.0.0.1}</wsdl-host>
</root>
'@
# Let's change the wsdl-host element's contents
$x.root.'wsdl-host' = '${jboss.bind.address:0.0.0.0}'
# Save the modified document to console to see the change
$x.save([console]::out)
<?xml version="1.0" encoding="ibm850"?>
<root>
<wsdl-host>${jboss.bind.address:0.0.0.0}</wsdl-host>
</root>
If you can't use PowerShell and are stuck with batch scripts, you really need to use a third-party XML manipulation program.

Regex to batch rename files in for loop by only extracting first 5 characters

I have many .xlsx files that look like XXX-A_2016(Final).xlsx and I am trying to write a shell script (bash) that will batch convert each one to csv, but also rename the output file to just "XXX-A.csv", so I think I need a regular expression within my for loop that extracts the first 5 characters of the input string (filename). I have xlsx2csv and I am using the following loop:
for i in *.xlsx;
do
filename=$(basename "$i" .xlsx);
outext=".csv"
xlsx2csv $i $filename$outext
done
There is a line missing that would take care of the file renaming prior to converting to csv.
You can use:
for i in *.xlsx; do
xlsx2csv "$i" "${i%_*}".csv
done
"${i%_*}" removes the shortest suffix of $i that matches _*, that is, everything from the last underscore onward, giving us XXX-A as a result.

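As a quick check of the expansion without running xlsx2csv, the sketch below prints the output name for a sample input. The helper names csv_name and first_five are hypothetical; first_five is shown only because the question title asks for the first five characters, and it uses cut so that it also works in shells without substring expansion.

```shell
# Output name built by stripping everything from the last underscore onward
csv_name() {
  printf '%s.csv\n' "${1%_*}"
}

# Alternative: keep exactly the first five characters of the name
first_five() {
  printf '%s.csv\n' "$(printf '%s' "$1" | cut -c1-5)"
}

csv_name 'XXX-A_2016(Final).xlsx'
first_five 'XXX-A_2016(Final).xlsx'
```

The two approaches agree only when the prefix before the underscore is exactly five characters long; the underscore-based version adapts to longer or shorter prefixes.
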
Fetch particular pattern from file and read file

I have a text file that contains some patterns. (At first I thought of running an awk command
on the text file from C++.) The solution below works fine for us for a single file:
https://stackoverflow.com/questions/15151055/truncate-file-in-linux
But we receive multiple files that contain different patterns, so we would need to write an awk
command for each file, which is not a generic solution.
After that I found a lexer (Flex) for pattern matching in C++ (Linux environment), but I ran into some issues and could not write the lexer file. So I wonder whether there is an open source library on the Linux platform for matching patterns in a text file and converting the result into an XML file. (I am still searching on Google but have no concrete solution yet.)
In brief:
1) Search for a pattern in a text file (currently we run an awk command from C++; a more general solution would be welcome).
2) Read a tabular-format file into C++.
I hope I am able to convey my message.

Find, move, replace and parse strings simultaneously while building an .xml playlist file

I get many videos and I need to compile functioning .xml playlist files where they are all listed, including snapshot jpg's. Videos and snapshot images are named automatically. So I end up with lots of files like this:
hxxp://site.com/video/_5712.480p.flv
hxxp://site.com/video/_5712.480p.jpg
hxxp://site.com/video/_5713.480p.flv
hxxp://site.com/video/_5713.480p.jpg
So with these files I need to produce an .xml file looking something like this:
....
<track>
<title>5712.480p</title>
<creator>Whatever_5712.480p</creator>
<info>hxxp://site.com/video/_5712.480p.jpg</info>
<annotation>Playlist marked_480p</annotation>
<location>hxxp://site.com/video/_5712.480p.flv</location>
<image>hxxp://site.com/video/_5712.480p.jpg</image>
</track>
<track>
<title>5713.480p</title>
<creator>Whatever_5713.480p</creator>
<info>hxxp://site.com/video/_5713.480p.jpg</info>
<annotation>Playlist marked_480p</annotation>
<location>hxxp://site.com/video/_5713.480p.flv</location>
<image>hxxp://site.com/video/_5713.480p.jpg</image>
</track>
So I guess I might be looking at some advanced sed/awk procedure to copy, move and place the right strings inside the correct brackets, and to compile one whole file? I really appreciate all the help I can get on this one. Thx
With that input, you can do something like:
awk 'NR%2==1 && /\.jpg$/ {JPGFILE=$0}
NR%2==0 { print "whateverXMLtags" JPGFILE "whatanotherXMLtags" $0 "someotherXMLtags" }' INPUTFILELIST
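This assumes that the .jpg names are on odd-numbered lines: each odd line saves the name, and each even line prints the desired output. (In the sample listing above the .flv line comes first, so the roles of the two lines would need to be swapped.) Note that in awk, writing two expressions next to each other separated by a space, e.g. JPGFILE and "whatanotherXMLtags", concatenates them into one string.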
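A variant that does not depend on line order is to derive everything from the .flv line alone, since the .jpg name differs only in its extension. The following is a sketch under that assumption; the function name make_playlist is hypothetical, and the "Whatever_" prefix and the "Playlist marked_480p" annotation are hard-coded from the sample in the question.

```shell
# Read a list of URLs (stdin or files) and print one <track> per .flv entry.
make_playlist() {
  awk '
    /\.flv$/ {
      base = $0; sub(/\.flv$/, "", base)      # URL without the extension
      title = base; sub(/^.*\/_/, "", title)  # drop the path and leading underscore
      print "<track>"
      print "<title>" title "</title>"
      print "<creator>Whatever_" title "</creator>"
      print "<info>" base ".jpg</info>"
      print "<annotation>Playlist marked_480p</annotation>"
      print "<location>" $0 "</location>"
      print "<image>" base ".jpg</image>"
      print "</track>"
    }
  ' "$@"
}

# Sample input taken from the question
printf '%s\n' \
  'hxxp://site.com/video/_5712.480p.flv' \
  'hxxp://site.com/video/_5712.480p.jpg' | make_playlist
```

Lines that do not end in .flv are ignored, so the .jpg entries may appear in any position or be missing from the list entirely.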

How do I detect plaintext in a MIME file?

I have a large set of MIME files, which contain multiple parts. Many of the files contain parts labelled with the following headers:
Content-Type: application/octet stream
Content-Transfer-Encoding: Binary
However, sometimes the contents of these parts are some form of binary code, and sometimes they are plaintext.
Is there a clever way in either C++, Bash or Ruby to detect whether the contents of a MIME part labelled as application/octet stream is binary data or plaintext?
The -I option of grep will treat binary files as files without a match. Combined with the -q option grep will return a nonzero exit status if a file is binary. The empty pattern matches every line, so any non-empty text file succeeds.
if grep -qI -e '' "$file"
then
echo "plaintext"
else
echo "binary"
fi
The simplest method is to split the file into a set of multiple files, each of which contains one of the component parts. We can then use grep and other tools to ascertain whether each part is text.
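Combining the two answers, once each MIME part has been written to its own file the grep test can be wrapped in a function and run over all parts. This is only a sketch: the function name is_text and the part-*.bin file names are hypothetical, the splitting step itself is not shown, and it assumes a grep that supports -I (as GNU grep does).

```shell
# Succeeds (exit status 0) when grep considers the file to be text.
# An empty file has no lines to match, so it is reported as non-text.
is_text() {
  grep -qI -e '' -- "$1"
}

# Hypothetical part files standing in for the output of a MIME splitter
tmp=$(mktemp -d)
printf 'hello world\n' > "$tmp/part-1.bin"
printf 'a\000b\n' > "$tmp/part-2.bin"     # contains a NUL byte

for part in "$tmp"/part-*.bin; do
  if is_text "$part"; then
    echo "${part##*/}: plaintext"
  else
    echo "${part##*/}: binary"
  fi
done
rm -r "$tmp"
```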