Golang template escapes the first char

I'm trying to build a sitemap XML file with the standard template package.
But the first character "<" becomes "&lt;", which makes the XML unreadable for clients.
package main
import (
"bytes"
"fmt"
"html/template"
)
const (
tmplStr = `{{define "indexSitemap"}}<?xml version="1.0" encoding="UTF-8"?>
<sitemapindex xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">
<sitemap>
<loc>https://www.test.com/sitemap.xml</loc>
</sitemap>
<sitemap>
<loc>https://www.test.com/events-sitemap.xml</loc>
</sitemap>
<sitemap>
<loc>https://www.test.com/gamesAndTeams-sitemap.xml</loc>
</sitemap>
</sitemapindex>{{end}}`
)
func main() {
// Parse the template and check for error
tmpl, parseErr := template.New("test").Parse(tmplStr)
if parseErr != nil {
fmt.Println(parseErr)
return
}
// Init the writer
buf := new(bytes.Buffer)
// Execute and get the template error if any
tmplExErr := tmpl.ExecuteTemplate(buf, "indexSitemap", nil)
if tmplExErr != nil {
fmt.Println(tmplExErr)
return
}
// Print the content malformed
fmt.Println(buf)
}
Is that normal?
How can I make it work normally?
Thanks in advance

Your example shows you're using the html/template package, which auto-escapes text for html usage.
If you want a raw template engine, use the text/template package instead - the html one just wraps it with context-aware escaping.
However, you'll then need to make sure yourself that the text you output with the raw template engine is XML-safe. You can do this by exposing an escape function to your template and passing all text through it instead of writing it directly.
[EDIT] It looks like a bug in html/template: if you omit the ? from the XML declaration, it works okay. But my advice still stands: if it's not HTML, you're better off using the text/template package. Even better, describe the sitemap as a struct and don't use a template at all, just XML serialization.

Also see issue #12496 on GitHub, which confirms they are not planning to fix this.
https://github.com/golang/go/issues/12496
Probably because this is the HTML templating package and you're trying to produce XML. I suspect that it doesn't know how to parse the directives with the question mark there.
You probably want to use the text/template package instead, if you're not going to be taking advantage of any of the HTML auto-escaping features.

Related

Regex Capture group does not operate as expected from regex builder website in golang

Essentially, I'm trying to build capture groups in Go. I'm using the following web page, which seems to indicate that this should work as I've written it.
For random reasons this is time-sensitive; I'm sure you can sympathize.
package main
import (
"fmt"
"regexp"
)
func main() {
var r = regexp.MustCompile(`/number(?P<value>.*?)into|field(?P<field>.*?)of|type(?P<type>.*?)$/g`)
fmt.Printf("%#v\n", r.FindStringSubmatch(`cannot unmarshal number 400.50 into Go struct field MyStruct.Numbers of type int64`))
fmt.Printf("%#v\n", r.SubexpNames())
}
This of course produces a result that I don't expect, and it is inconsistent with the results on the regex builder website. This is probably because the site was built for a different language, but I don't know of another website that is better suited to Go and also supports building capture groups, so I could use some help here, as it's out of my usual wheelhouse.
The output of the above code using the regex I provided is
[]string{"field", "", "", ""}
[]string{"", "value", "field", "type"}
I'd love for it to be as close as possible to
[]string{"field", "cannot unmarshal number (number)", "into go struct (Mystruct.Numbers)", "of type (int64)"}
[]string{"", "value", "field", "type"}
just as it shows on the regex scratchpad above.
It would also be convenient to only match the first instance that matches.
This looks like an XY Problem.
Extract the data directly from the json.UnmarshalTypeError instead of parsing the string representation of the error.
This program:
var v MyStruct
err := json.Unmarshal([]byte(`{"numbers": 400.50}`), &v)
if e, ok := err.(*json.UnmarshalTypeError); ok {
fmt.Printf("Value: %s\nStruct.Field: %s\nType: %s\n",
e.Value, e.Struct+"."+e.Field, e.Type)
}
prints the output:
Value: number 400.50
Struct.Field: MyStruct.Numbers
Type: int64
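For reference, a self-contained version of that program. `MyStruct` is not shown in the question, so the definition below (a single untagged `Numbers int64` field) is an assumption, as is the helper name `describeTypeError`. It uses `errors.As` instead of the plain type assertion, which also finds the error if it was wrapped.

```go
package main

import (
	"encoding/json"
	"errors"
	"fmt"
)

// MyStruct is an assumed shape: one int64 field named Numbers, no json tag.
type MyStruct struct {
	Numbers int64
}

// describeTypeError decodes data into MyStruct and, if decoding fails with
// a type mismatch, returns the pieces the regex was trying to extract.
func describeTypeError(data []byte) (value, field, typ string, ok bool) {
	var v MyStruct
	err := json.Unmarshal(data, &v)
	var te *json.UnmarshalTypeError
	if !errors.As(err, &te) {
		return "", "", "", false
	}
	return te.Value, te.Struct + "." + te.Field, te.Type.String(), true
}

func main() {
	value, field, typ, ok := describeTypeError([]byte(`{"numbers": 400.50}`))
	if !ok {
		fmt.Println("no type error")
		return
	}
	fmt.Printf("Value: %s\nStruct.Field: %s\nType: %s\n", value, field, typ)
}
```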
Run it on the Go playground.

In golang gin simple template example, how do you render a string without quotes?

Using the example gin code from the README:
func main() {
	router := gin.Default()
	router.LoadHTMLGlob("templates/*")
	router.GET("/", func(c *gin.Context) {
		c.HTML(http.StatusOK, "index.tmpl", gin.H{
			"foo": "bar",
		})
	})
	router.Run(":8080")
}
// in template index.tmpl
<script>
{{.foo}}
</script>
// result in html
<script>
"bar"
</script>
How can I get it without the quotes? I need just bar, not "bar".
The html/template package implements a context-aware HTML engine that produces injection-safe HTML.
In other words, it knows it is executing inside a script tag, so it does not output the raw string but a JSON-encoded string that is valid JavaScript.
To fix it, contrary to what the comment suggests, make the string a template.JS value; the security measures will then not attempt to protect it.
References:
- https://golang.org/pkg/html/template/
Package template (html/template) implements data-driven templates for generating HTML output safe against code injection.
- https://golang.org/pkg/html/template/#JS
Use of this type presents a security risk: the encapsulated content should come from a trusted source, as it will be included verbatim in the template output.
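package main

import (
	"html/template"
	"os"
)

func main() {
	c := `<script>
{{.foo}}
{{.oof}}
</script>`
	// "foo" is a plain string, so it is JSON-encoded ("bar");
	// "oof" is marked as trusted JS, so it is emitted verbatim (rab).
	d := map[string]interface{}{"foo": "bar", "oof": template.JS("rab")}
	if err := template.Must(template.New("").Parse(c)).Execute(os.Stdout, d); err != nil {
		panic(err)
	}
}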
https://play.golang.org/p/6qLnc9ALCeC

Automatic asset revision filenames in Go HTML templates

I'm looking for help on implementing something that automatically includes versioned filenames in a Go HTML template. For example, in my template I have something like this in the head:
<link rel="stylesheet" href="{{ .MyCssFile }}" />
The stylesheets themselves have a chunk of MD5 hash appended to the name from a gulp script called gulp-rev
stylesheet-d861367de2.css
The purpose is to ensure new changes are picked up by browsers, but also allow caching. Here is an example implementation in Django for a better explanation:
https://docs.djangoproject.com/en/1.9/ref/contrib/staticfiles/#manifeststaticfilesstorage
A subclass of the StaticFilesStorage storage backend which stores the file names it handles by appending the MD5 hash of the file’s content to the filename. For example, the file css/styles.css would also be saved as css/styles.55e7cbb9ba48.css.
The purpose of this storage is to keep serving the old files in case some pages still refer to those files, e.g. because they are cached by you or a 3rd party proxy server. Additionally, it’s very helpful if you want to apply far future Expires headers to the deployed files to speed up the load time for subsequent page visits.
Now I'm wondering how to best pull this off in Go? I intend to serve the files from the built in file server.
My current thoughts are:
Have a loop that checks for the newest stylesheet file in a directory. Sounds slow.
Do some kind of redirect/rewrite to a generically named file (as in file.css is served on a request to file-hash.css).
Have Go manage the asset naming itself, appending the hash or timestamp.
Maybe it's better handled with nginx or something else?
Write a template function to resolve the name. Here's an example template function:
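import (
	"fmt"
	"path/filepath"
	"strings"
)

// resolveName maps "css/stylesheet.css" to the single file on disk matching
// "css/stylesheet-*.css" (e.g. "css/stylesheet-d861367de2.css").
func resolveName(p string) (string, error) {
	i := strings.LastIndex(p, ".")
	if i < 0 {
		i = len(p) // no extension: put the hash pattern at the end
	}
	g := p[:i] + "-*" + p[i:]
	matches, err := filepath.Glob(g)
	if err != nil {
		return "", err
	}
	if len(matches) != 1 {
		return "", fmt.Errorf("%d matches for %s", len(matches), p)
	}
	return matches[0], nil
}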
and here's how to use it in a template when registered as the function "resolveName":
<link rel="stylesheet" href="{{ .MyCssFile | resolveName }}" />
This function resolves the name of the file every time the template is rendered. A more clever function might cache names as they are resolved or walk the directory tree at startup to prebuild a cache.
I know this question is old, but maybe this library will help you. It collects and hashes static files, and it has a function that maps a file path from its original location to the hashed location:
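// NewStorage, LoadManifest and Resolve come from the library mentioned above.
staticFilesPrefix := "/static/"
staticFilesRoot := "output/dir"
storage := NewStorage(staticFilesRoot)
if err := storage.LoadManifest(); err != nil {
	panic(err)
}
funcs := template.FuncMap{
	"static": func(relPath string) string {
		return staticFilesPrefix + storage.Resolve(relPath)
	},
}
// Name the template after the file so tmpl.Execute renders it directly.
tmpl := template.Must(
	template.New("main.tpl").Funcs(funcs).ParseFiles("templates/main.tpl"),
)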
Now you can call the static function in templates like this: {{static "css/style.css"}}. The call will be converted to /static/css/style.d41d8cd98f00b204e9800998ecf8427e.css.

XML Name space issue revisited

I am still not able to find a good solution to the problem that findnodes or findvalue does not work when xmlns has a value.
The moment I manually set xmlns="", it starts working, at least in my case. Now I need to automate this.
Consider this:
<root xmlns="something">
--
---
</root>
My proposed solution:
dynamically set xmlns="",
and when the work is done, automatically reset it to the original value xmlns="something".
This seems to work for my XML files, but it's still manual.
I need to automate this.
There are two ways to do it:
using a Perl regex, or
using the proper LibXML calls (setNamespace, etc.).
Please share your thoughts on this.
You register the namespace. The point of XML is not having to kludge around with regexes!
Besides, it's easier: you create an XML::LibXML::XPathContext, register your namespaces, and use its find* calls with your chosen prefixes.
The following example is verbatim from a script of mine to list references in Visual Studio projects:
(...)
# namespace handling, see the XML::LibXML::Node documentation
my $xpc = new XML::LibXML::XPathContext;
$xpc->registerNs( 'msb',
'http://schemas.microsoft.com/developer/msbuild/2003' );
(...)
my $tree; eval { $tree = $parser->parse_file($projfile) };
(...)
my $root = $tree->getDocumentElement;
(...)
foreach my $attr ( find( '//msb:*/@Include', $root ) )
{
(...)
}
(...)
sub find { $xpc->find(@_)->get_nodelist; }
(...)
That's all it takes!
I only have one xmlns attribute, at the top of the XML, so this works for me.
All I did was first remove the namespace part, i.e. remove the xmlns from my XML file.
NODE : for my $node ($conn->findnodes("//*[name()='root']")) {
my $att = $node->getAttribute('xmlns');
$node->setAttribute('xmlns', "");
last NODE;
}
I use last just to make sure I exit the for loop right away.
And then, once I am done with the XML parsing, I replace
<root>
with
<root xmlns="something">
using a simple Perl file operation or sed.

How to replace text in content control after, XML binding using docx4j

I am using docx4j 2.8.1 with Content Controls in my .docx file. I can replace the CustomXML part by injecting my own XML and then calling BindingHandler.applyBindings after supplying the input XML. I can add a token such as ¶ to my XML, and I would then like to replace that token in the MainDocumentPart. But when I iterate through the content of the MainDocumentPart with this (link) method, none of the text from my XML is in the collection extracted from the MainDocumentPart. I suspect that even after binding, the XML remains separate from the MainDocumentPart (??)
I haven't tried this with anything more than a little test doc yet. My token is the Pilcrow: ¶. Since it's a single character, it won't be split in separate runs. My code is:
private void injectXml (WordprocessingMLPackage wordMLPackage) throws JAXBException {
MainDocumentPart part = wordMLPackage.getMainDocumentPart();
String xml = XmlUtils.marshaltoString(part.getJaxbElement(), true);
xml = xml.replaceAll("¶", "</w:t><w:br/><w:t>");
Object obj = XmlUtils.unmarshalString(xml);
part.setJaxbElement((Document) obj);
}
The pilcrow character comes from the XML and is injected by applying the XML bindings to the content controls. The problem is that the content from the XML does not seem to be in the MainDocumentPart so the replace doesn't work.
(Using docx4j 2.8.1)