Change the color of x SVG files - regex

I'd like to change the color of at least 1,000 SVG files. The main problem is that the current SVGs don't contain a "fill" attribute, so I have to add fill="X" at the end of the opening SVG tag.
Here is an example of one of the SVG files:
<?xml version="1.0" encoding="utf-8"?>
<!-- Generator: Adobe Illustrator 19.0.0, SVG Export Plug-In . SVG Version: 6.00 Build 0) -->
<svg version="1.1" id="Layer_1" x="0px" y="0px"
viewBox="-236 286 30 30" style="enable-background:new -236 286 30 30;" xml:space="preserve">
Thanks for your help.

There are many possibilities. The safest way would be to parse the XML structure and then manipulate it. But for this specific example you could also use the following regex with, e.g., sed or Python:
With sed (GNU sed; -i rewrites the files in place, so keep a backup):
sed -i 's/xml:space="preserve">/xml:space="preserve" fill="red">/' *.svg
With Python:
import re

# Match the end of the opening <svg> tag in this particular export
regex = r'xml:space="preserve">'
subst = 'xml:space="preserve" fill="red">'

test_str = ('<?xml version="1.0" encoding="utf-8"?>\n'
            '<!-- Generator: Adobe Illustrator 19.0.0, SVG Export Plug-In . SVG Version: 6.00 Build 0) -->\n'
            '<svg version="1.1" id="Layer_1" x="0px" y="0px"\n'
            ' viewBox="-236 286 30 30" style="enable-background:new -236 286 30 30;" xml:space="preserve">')

# count=0 (the default) replaces every match; pass count=N to limit the replacements
result = re.sub(regex, subst, test_str, count=0)
print(result)
You can also have a look at regex101.com/r/cIfbEd
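The answer notes that parsing the XML is the safest route. As a hedged sketch of that route (not the answerer's code), Python's standard xml.etree.ElementTree can set fill on the root <svg> element of every file in a folder. The helper names add_fill and recolor_dir are made up for illustration, and the sketch assumes the real files declare the usual SVG namespace (xmlns="http://www.w3.org/2000/svg"), which is not visible in the excerpt in the question. ElementTree drops comments outside the root element (such as the Illustrator generator comment) and rewrites the XML declaration, so compare a few outputs before running it over all the files.

```python
import xml.etree.ElementTree as ET
from pathlib import Path

# Assumed namespace: the standard SVG one (not shown in the question's excerpt)
SVG_NS = "http://www.w3.org/2000/svg"
# Serialize SVG elements with a default xmlns instead of an "ns0:" prefix
ET.register_namespace("", SVG_NS)


def add_fill(svg_text: str, color: str) -> str:
    """Hypothetical helper: return svg_text with fill=color on the root element."""
    root = ET.fromstring(svg_text)
    root.set("fill", color)  # adds the attribute, or overwrites an existing one
    # Re-emit an XML declaration; comments outside the root are not preserved
    data = ET.tostring(root, encoding="utf-8", xml_declaration=True)
    return data.decode("utf-8")


def recolor_dir(directory: str, color: str) -> int:
    """Hypothetical helper: rewrite every *.svg directly in directory, return the count."""
    count = 0
    for path in sorted(Path(directory).glob("*.svg")):
        new_text = add_fill(path.read_text(encoding="utf-8"), color)
        path.write_text(new_text, encoding="utf-8")
        count += 1
    return count
```

For example, recolor_dir("icons", "red") would rewrite every .svg file directly inside a hypothetical icons folder and return how many files it touched. Unlike the regex, this does not depend on xml:space="preserve" being the last attribute of the tag.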

Related

Rmd report does not import margins, header or footer from template when using officedown::rdocx_document

Edit:
I reinstalled the previous version of the package (0.2.1), and that solved the problem.
__
I am making an R Markdown report with the officedown package.
My Word output correctly imports the Word styles of the template document, but it does not import the template's margins, header or footer. Did the syntax change? Here is the YAML of the report:
---
output:
  officedown::rdocx_document:
    reference_docx: template.docx
---
The template works well when it is used with the officer package, or with the "normal" Word output, like this:
---
output:
  word_document:
    reference_docx: template.docx
---
Any ideas?
Thanks a lot for the help. I have been using the officeverse packages a lot over the last year, and they are awesome.
I had the same problem, but you can set these parameters in the YAML header as mentioned in the Officeverse book.
Units are in inches:
output:
  officedown::rdocx_document:
    page_size:
      width: 8.3
      height: 11.7
      orient: "portrait"
    page_margins:
      bottom: 1
      top: 1
      right: 1.25
      left: 1.25
      header: 0.5
      footer: 0.5
      gutter: 0.5

Another "LaTeX Error: Missing \begin{document}" in R Markdown

I'm automating a PDF report using R Markdown. I use a macro to run the code. I can run the code once and it works with no problems. When I call the macro again, it appears to work, but when it creates the PDF I get the error "LaTeX Error: Missing \begin{document}".
This is what I get the first time:
output file: L:/Statunit/morton/NCC R markdown reports/NCC Reports/NCC_Dashboard_Report_Dave.knit.md
"C:/Program Files/RStudio/bin/pandoc/pandoc" +RTS -K512m -RTS "L:/Statunit/morton/NCC R markdown reports/NCC Reports/NCC_Dashboard_Report_Dave.utf8.md" --to latex --from markdown+autolink_bare_uris+ascii_identifiers+tex_math_single_backslash --output pandoc9e03c3032cf.tex --template "C:\Users\Mortond\Documents\R\win-library\3.5\rmarkdown\rmd\latex\default-1.17.0.2.tex" --highlight-style tango --latex-engine xelatex --variable graphics=yes --variable "geometry:margin=1in" --variable "compact-title:yes" --include-in-header "C:\Users\Mortond\AppData\Local\Temp\Rtmp8cWvvQ\rmarkdown-str9e022b75c22.html"
Output created: Report-254-225573.pdf
The second time, I call the same code and only change the report name, so the data is the same, but I get:
output file: L:/Statunit/morton/NCC R markdown reports/NCC Reports/NCC_Dashboard_Report_Dave.knit.md
"C:/Program Files/RStudio/bin/pandoc/pandoc" +RTS -K512m -RTS "L:/Statunit/morton/NCC R markdown reports/NCC Reports/NCC_Dashboard_Report_Dave.utf8.md" --to latex --from markdown+autolink_bare_uris+ascii_identifiers+tex_math_single_backslash --output pandoc9e01f0a74c5.tex --template "C:\Users\Mortond\Documents\R\win-library\3.5\rmarkdown\rmd\latex\default-1.17.0.2.tex" --highlight-style tango --latex-engine xelatex --variable graphics=yes --variable "geometry:margin=1in" --variable "compact-title:yes"
! LaTeX Error: Missing \begin{document}.
Error: Failed to compile Report-253-225573.tex. See Report-253-225573.log for more info.
My YAML is:
---
title: ''
header-includes:
- \usepackage{fancyhdr}
- \addtolength{\headheight}{1.0cm} % make more space for the header
- \pagestyle{fancyplain} % use fancy for all pages except chapter start
- \lhead{\includegraphics[height=1.2cm]{TJC_logo_color.png}} % left logo
- \renewcommand{\headrulewidth}{0pt} % remove rule below header
output:
  pdf_document:
    latex_engine: xelatex
  word_document: default
  html_document: default
urlcolor: blue
classoption: landscape
---
My code that calls the R Markdown file is:
render_report = function(b, h, p) {
  rmarkdown::render(
    "L:/Statunit/morton/NCC R markdown reports/NCC Dashboard Report Dave.Rmd",
    params = list(
      b1 = b,
      h1 = h,
      p1 = p
    ),
    output_file = paste0("Report-", h, "-", p, ".pdf")
  )
}
render_report(b = "xxxx Hospital, Inc.", h = '253', p = '225573')
Here is the part of the log file with the error:
("C:\Users\Mortond\AppData\Local\Programs\MiKTeX 2.9\tex\latex\graphics-def\xet
ex.def"
File: xetex.def 2017/06/24 v5.0h Graphics/color driver for xetex
))
\Gin@req@height=\dimen160
\Gin@req@width=\dimen161
)
("C:\Users\Mortond\AppData\Local\Programs\MiKTeX 2.9\tex\latex\oberdiek\grffile
.sty"
Package: grffile 2017/06/30 v1.18 Extended file name support for graphics (HO)
Package grffile Info: Option `multidot' is set to `true'.
Package grffile Info: Option `extendedchars' is set to `false'.
Package grffile Info: Option `space' is set to `true'.
Package grffile Info: \Gin@ii of package `graphicx' fixed on input line 494.
)
("C:\Users\Mortond\AppData\Local\Programs\MiKTeX 2.9\tex\latex\parskip\parskip.
sty"
Package: parskip 2018-08-24 v2.0a non-zero parskip adjustments
)
("C:\Users\Mortond\AppData\Local\Programs\MiKTeX 2.9\tex\latex\titling\titling.
sty"
Package: titling 2009/09/04 v2.1d maketitle typesetting
\thanksmarkwidth=\skip53
\thanksmargin=\skip54
\droptitle=\skip55
)
("C:\Users\Mortond\AppData\Local\Programs\MiKTeX 2.9\tex\latex\fancyhdr\fancyhd
r.sty"
Package: fancyhdr 2017/06/30 v3.9a Extensive control of page headers and footer
s
\f@nch@headwidth=\skip56
\f@nch@O@elh=\skip57
\f@nch@O@erh=\skip58
\f@nch@O@olh=\skip59
\f@nch@O@orh=\skip60
\f@nch@O@elf=\skip61
\f@nch@O@erf=\skip62
\f@nch@O@olf=\skip63
\f@nch@O@orf=\skip64
)
! LaTeX Error: Missing \begin{document}.
See the LaTeX manual or LaTeX Companion for explanation.
Type H <return> for immediate help.
...
l.90 \addtolength{\headheight}{1.0cm} \%
make more space for the header
Here is how much of TeX's memory you used:
22493 strings out of 427767
408844 string characters out of 3146884
530389 words of memory out of 3000000
26423 multiletter control sequences out of 15000+200000
532722 words of font info for 28 fonts, out of 3000000 for 9000
1328 hyphenation exceptions out of 8191
45i,0n,68p,816b,443s stack positions out of 5000i,500n,10000p,200000b,50000s
No pages of output.
So why does it work once but not a second time? If I exit RStudio and start it up again, it appears to work. To clean things up, I've tried .rs.restartR(), to no avail, as well as
rm(list = ls(envir = globalenv()), envir = globalenv())
gc()
Any thoughts? I appreciate you reading through all this.
I don't know if I had the same issue, but I found that my document compiled fine the first time and failed the second time. I suspected a cache issue and added cache.rebuild=T to this chunk:
<<echo=F, cache=T, message=F, warning=F, cache.rebuild=T>>=
set_parent('../../Parent.Rnw')
@
Just FYI, the header of the parent not only includes the LaTeX setup but also sources my main .R file with the calculations.
Anyway, if you experience a similar problem, try adding cache.rebuild=T to your included script(s).

Unable to match XML element using Python regular expression

I have an XML document with the following structure:
> <?xml version="1.0" encoding="UTF-8"?> <!-- generated by CLiX/Wiki2XML
> [MPI-Inf, MMCI#UdS] $LastChangedRevision: 93 $ on 17.04.2009
> 12:50:48[mciao0826] --> <!DOCTYPE article SYSTEM "../article.dtd">
> <article xmlns:xlink="http://www.w3.org/1999/xlink"> <header>
> <title>Postmodern art</title> <id>192127</id> <revision>
> <id>244517133</id> <timestamp>2008-10-11T05:26:50Z</timestamp>
> <contributor> <username>FairuseBot</username> <id>1022055</id>
> </contributor> </revision> <categories> <category>Contemporary
> art</category> <category>Modernism</category> <category>Art
> movements</category> <category>Postmodern art</category> </categories>
> </header> <bdy> Postmodernism preceded by Modernism '' Postmodernity
> Postchristianity Postmodern philosophy Postmodern architecture
> Postmodern art Postmodernist film Postmodern literature Postmodern
> music Postmodern theater Critical theory Globalization Consumerism
> </bdy>
I am interested in capturing the text contained within ... and for that I wrote the following Python 3 regex code:
import re

with open("sample_xml.xml", "r") as file:
    xml_doc = file.read()
body_text = re.findall(r'<bdy>(.+)</bdy>', xml_doc)
But body_text is always an empty list. However, when I try to capture the text for the tags ... using this code:
category_text = re.findall(r'(.+)', xml_doc)
This does the job.
Any idea(s) as to why the ... XML element code is not working?
Thanks!
By default, the special character . does not match a newline, so that regex cannot match text that spans several lines.
You can change this behavior with the DOTALL flag. To set that flag inside the pattern itself, put (?s) at the start of your regular expression.
More information on Python's regular expression syntax can be found here: https://docs.python.org/3/library/re.html#regular-expression-syntax
You can use re.DOTALL:
body_text = re.findall(r'<bdy>(.+)</bdy>', xml_doc, re.DOTALL)
Output:
[" Postmodernism preceded by Modernism '' Postmodernity\n> Postchristianity Postmodern philosophy Postmodern architecture\n> Postmodern art Postmodernist film Postmodern literature Postmodern\n> music Postmodern theater Critical theory Globalization Consumerism\n> "]
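A small self-contained demonstration of the point above, using a shortened, made-up stand-in for the question's document. The last two calls go slightly beyond the answers: they show that with re.DOTALL a greedy (.+) runs to the last </bdy> in the text, and a non-greedy (.+?) is one way to get one match per element if a file contains several.

```python
import re

# Made-up, shortened stand-in for the question's XML
xml_doc = (
    "<header><title>Postmodern art</title></header>\n"
    "<bdy> Postmodernism preceded by Modernism\n"
    "Postchristianity Postmodern philosophy\n"
    "</bdy>"
)

# Without DOTALL, "." stops at "\n", so no match can span <bdy> ... </bdy>
plain = re.findall(r"<bdy>(.+)</bdy>", xml_doc)

# DOTALL passed as a flag argument ...
with_flag = re.findall(r"<bdy>(.+)</bdy>", xml_doc, re.DOTALL)
# ... or switched on inline with (?s) at the start of the pattern
inline = re.findall(r"(?s)<bdy>(.+)</bdy>", xml_doc)

# With several <bdy> elements, greedy ".+" runs to the LAST </bdy>;
# non-greedy ".+?" stops at the first </bdy> after each opening tag
two_bodies = "<bdy>first\nbody</bdy><bdy>second</bdy>"
greedy = re.findall(r"<bdy>(.+)</bdy>", two_bodies, re.DOTALL)
lazy = re.findall(r"<bdy>(.+?)</bdy>", two_bodies, re.DOTALL)

print(plain)                # []
print(inline == with_flag)  # True
print(greedy)               # ['first\nbody</bdy><bdy>second']
print(lazy)                 # ['first\nbody', 'second']
```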

Removing specific tags in a KML file

I have a KML file which is a list of places around the world with coordinates and some other attributes. It looks like this for one place:
<Placemark>
<name>Albania - Durrës</name>
<open>0</open>
<visibility>1</visibility>
<description>(Spot ID: 275801) show <![CDATA[forecast]]></description>
<styleUrl>#wgStyle001</styleUrl><Point>
<coordinates>19.489747,41.277806,0</coordinates>
</Point>
<LookAt><range>200000</range><longitude>19.489747</longitude><latitude>41.277806</latitude></LookAt>
</Placemark>
I would like to remove everything except the name of the place, so in this case I would like to keep only
<name>Albania - Durrës</name>
The problem is that this KML file contains more than 1,000 of these places. Doing this manually obviously isn't an option, so how can I remove all tags except the name tags for all of the items in the list? Can I use some kind of program for that?
Use a specialized command-line tool that understands XML documents.
One such tool is xmlstarlet, which is available for Linux, Windows and Solaris.
To address your particular problem, I used the xmlstarlet executable xml.exe like this (on Windows):
xml.exe sel -N ns=http://www.opengis.net/kml/2.2 -t -v /ns:kml/ns:Document/ns:Placemark/ns:name places.kml
This produces this output:
Albania - Durrës
Second Name
Third Name
...
Final Name
If you can guarantee that <name> occurs only as a child of <Placemark>, then this abbreviated version will produce the same result:
xml.exe sel -N ns=http://www.opengis.net/kml/2.2 -t -v //ns:name places.kml
(This is because this shorter version finds all <name> elements no matter where they occur in the document.)
If you really want an XML document, you'll need to do a little post-processing. Here's an example of a complete XML document:
<?xml version='1.0' encoding='utf-8'?>
<items>
<item>Albania - Durrës</item>
<item>Second Name</item>
<item>Third Name</item>
<!-- ... -->
<item>Final Name</item>
</items>
The first line is the XML declaration; it declares the utf-8 encoding. Including it makes the encoding explicit to XML processors (UTF-8 is also XML's default when no declaration is present), so that non-ASCII characters such as the ë in Durrës are read correctly.
Here's an enhanced xmlstarlet command that produces the XML document above:
xml.exe sel -N ns=http://www.opengis.net/kml/2.2 -T -t -o "<?xml version='1.0' encoding='utf-8'?>" -n -t -v "'<items>'" -n -t -m //ns:Placemark -v "concat('<item>',ns:name,'</item>')" -n -t -o "</items>" -n places.kml
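If installing xmlstarlet is not an option, roughly the same extraction can be sketched with Python's standard xml.etree.ElementTree. This is an alternative to the answer's tool, not part of it; the function names are hypothetical, and the sketch assumes the file uses the KML 2.2 namespace from the commands above. Like the longer XPath, it only picks up <name> elements whose parent is a <Placemark>, so a document-level <name> is skipped.

```python
import xml.etree.ElementTree as ET

# Namespace URI taken from the xmlstarlet commands above
KML_NS = {"kml": "http://www.opengis.net/kml/2.2"}


def placemark_names(kml_text: str) -> list[str]:
    """Hypothetical helper: text of every <name> that is a direct child of a <Placemark>."""
    root = ET.fromstring(kml_text)
    # ".//" searches at any depth below the root <kml> element
    return [el.text or "" for el in root.findall(".//kml:Placemark/kml:name", KML_NS)]


def names_to_items_xml(names: list[str]) -> str:
    """Hypothetical helper: build <items><item>...</item></items> with a utf-8 declaration."""
    items = ET.Element("items")
    for name in names:
        ET.SubElement(items, "item").text = name  # special characters are escaped automatically
    ET.indent(items)  # pretty-print one <item> per line (Python 3.9+)
    data = ET.tostring(items, encoding="utf-8", xml_declaration=True)
    return data.decode("utf-8")
```

Reading places.kml with open(..., encoding="utf-8"), passing its text through placemark_names and then names_to_items_xml, and writing the result to a new .xml file should give a document shaped like the example above.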
If you are on Linux or similar:
grep "<name>" your_file.kml > file_with_only_name_tags
On windows, see What are good grep tools for Windows?

Access attributes from XML in shell

I'm trying to parse out values from a Widget config.xml using shell. I do want to use sed for this task. If there is something that sucks less than xsltproc, I'd love to know.
In this example I am after the id attribute value from the config.xml below:
<?xml version="1.0" encoding="UTF-8"?>
<widget xmlns="http://www.w3.org/ns/widgets" id="http://example.org/exampleWidget" version="2.0 Beta" height="200" width="200">
<name short="123">Foo Widget</name>
</widget>
I wish it were as simple as jQuery's attr: var id = $("widget").attr("id");
Currently this shell code utilising xsltproc fails:
snag () {
    TMP=$(tempfile)
    cat << EOF > $TMP
<xsl:stylesheet xmlns:xsl="http://www.w3.org/1999/XSL/Transform" version="1.0">
<xsl:output method="text" encoding="utf-8" indent="no"/>
<xsl:template>
<xsl:value-of select="$1"/>
</xsl:template>
</xsl:stylesheet>
EOF
    echo $(xsltproc $TMP config.xml)
    rm -f $TMP
}
ID=$(snag "widget/@id")
if test "$ID" = "http://example.org/exampleWidget"
then
    echo Mission accomplished.
else
    echo "<$ID> is wrong."
fi
XMLStarlet (http://xmlstar.sourceforge.net/) is a nice command-line tool that supports such queries:
xmlstarlet sel -N w=http://www.w3.org/ns/widgets -T -t -m "/w:widget/@id" -v . -n config.xml
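In XSLT, use template match="widget" and value-of select="@id"; since widget is in the http://www.w3.org/ns/widgets namespace, the match needs a prefix bound to that namespace:
<xsl:template xmlns:wgt="http://www.w3.org/ns/widgets" match="/wgt:widget">
  <xsl:value-of select="@id" />
</xsl:template>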
You don't need XSLT if you're not doing a transform.
If you only need to grab a value, use XPath.
There's an xpath program that comes with Perl's XML::XPath module.
From the shell:
ID=$(xpath config.xml 'string(/widget/@id)' )
(The string() function returns only the value of the id; /widget/@id by itself returns "id=value".)
If you only need to produce some other output depending on the value, you could do it all in XSLT. There are also XPath implementations available from other scripting languages: I've used Java's XPath from both Rhino and Jython.
There's also XQuery from the command line with Saxon.
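Following the XPath suggestion, Python's standard library can do the same lookup without XSLT. This is a minimal sketch with hypothetical function names; ElementTree supports only a subset of XPath. Because <widget> declares the http://www.w3.org/ns/widgets default namespace, the lookup has to be namespace-aware. This is also why an unprefixed widget/@id path can come back empty: in XPath 1.0, an unprefixed name only matches elements in no namespace.

```python
import xml.etree.ElementTree as ET
from typing import Optional

# Namespace URI from the config.xml in the question
WIDGET_NS = "http://www.w3.org/ns/widgets"


def widget_id(config_text: str) -> Optional[str]:
    """Hypothetical helper: id attribute of a root <widget> element, or None."""
    root = ET.fromstring(config_text)
    # ElementTree spells namespaced tags as "{uri}localname"
    if root.tag != f"{{{WIDGET_NS}}}widget":
        return None
    # Unprefixed attributes such as id are in no namespace, so a plain key works
    return root.get("id")


def widget_name(config_text: str) -> Optional[str]:
    """Hypothetical helper: text of <name>, looked up through a prefix map."""
    root = ET.fromstring(config_text)
    el = root.find("w:name", {"w": WIDGET_NS})
    return None if el is None else el.text
```

A <widget> without the namespace is deliberately rejected by widget_id. This mirrors how a namespace-aware XPath would treat it.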