Parse string in libxml2 instead of xmlParseFile - c++

I have a string with XML content. Writing it to a file and reading it back with xmlParseFile hurts performance. Is there a way to parse the string directly? If so, can you please show it with an example?

Consider xmlParseMemory instead.
Since you already have the XML content in a string, you should be able to do it like so:
#include <libxml/parser.h>
const std::string xmlContent = "<something> </something>";
xmlDocPtr doc = xmlParseMemory(xmlContent.c_str(), static_cast<int>(xmlContent.length()));
// ... use doc, then release it with xmlFreeDoc(doc);

Related

How to search and replace from a SafeHtml variable in Angular?

I've a very simple question.
I've a sanitized string and its type in Angular is SafeHtml.
What would be the best approach to search and replace some HTML inside this SafeHtml variable?
...
const sanitzedHtml: SafeHtml = this.sanitizer.bypassSecurityTrustHtml(changes.pureHtml.currentValue);
...
My goal is to replace some string with some extra HTML code, so ideally I would search only within the content of the HTML nodes rather than everywhere in the markup.
Is there a faster way than converting the SafeHtml variable back into a string and applying a basic replace with a RegExp?
Thanks
Change the HTML before passing it to the sanitizer
1 - Using regex
You can modify your HTML string with a regex first, then pass the result to the sanitizer.
let html = "<div>myHtml</div>";
const regex = /myRegexPattern/i;
// replace() returns a new string, so keep the result
html = html.replace(regex, 'Replacement html part');
2 - Using DocumentFragment
You can also create a fragment from your HTML, modify what you want in it, and serialize it back to a string before calling the sanitizer
let str: string = "<div id='test'>myHtml</div>";
const sanitzedHtml: SafeHtml = this.sanitizer.bypassSecurityTrustHtml(changeMyHtml(str));
function changeMyHtml(htmlString: string): string {
  // Parse the string into a DocumentFragment
  const fragment = document.createRange().createContextualFragment(htmlString);
  // Do what you need to do here, for example:
  const target = fragment.getElementById('test');
  if (target) { target.innerHTML = "myHtmlTest"; }
  // Then return the modified HTML as a string
  return new XMLSerializer().serializeToString(fragment);
}

Regex to grab data from massive HTML string

I am grabbing an HTML source dump that includes some sort of JSON props created by React.
Trying to grab data in syntax like this: "siteName":"Example Site". I want to grab that "Example Site" text without the quotations.
I know I could be using an HTML parser but this is actually within some JS code in the source.
Any thoughts on how I could do this? Thanks
This regex gets you the value, but I would use something else, such as a JSON parser:
var regex = /"siteName":"(.+?)"/g;
var str = `{"siteName":"ABC Example Business","contactName":"Jeff","siteKey":"abcexample","tabKey":"service","entityKey":"1192289","siteId":152285976,"entityId":13123055221,"phone":"","mobile":"0100 000 000",}`;
var result = regex.exec(str);
console.log(result[1]);
How about this (the negated class keeps the match from running past the closing quote):
\"siteName\":\"([^\"]+)\"
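To illustrate why the quantifier matters here, the sketch below compares a lazy capture, a greedy capture, and a negated character class, and then shows the JSON parser route. It uses a shortened sample string (an assumption made for brevity, with field values borrowed from the example above).

```javascript
// Shortened sample; the field values are borrowed from the example above.
const str = '{"siteName":"ABC Example Business","contactName":"Jeff"}';

// Lazy quantifier: stops at the first closing quote.
const lazy = /"siteName":"(.+?)"/.exec(str)[1];

// Greedy quantifier: runs on to the last quote in the string.
const greedy = /"siteName":"(.+)"/.exec(str)[1];

// Negated character class: cannot cross a quote at all.
const negated = /"siteName":"([^"]*)"/.exec(str)[1];

// If the props blob is valid JSON, parsing it avoids the regex entirely.
const parsed = JSON.parse(str).siteName;

console.log(lazy);    // ABC Example Business
console.log(greedy);  // ABC Example Business","contactName":"Jeff
console.log(negated); // ABC Example Business
console.log(parsed);  // ABC Example Business
```

Note that JSON.parse only works on well-formed input. A string that ends with a trailing comma before the closing brace, like the longer sample above, would be rejected, and none of these patterns handle escaped quotes inside the value.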

Save Results in a list and give the list out to a csv

I work with the Tika file detector. It detects the type of each of my files.
At the moment my code looks like this:
if (Type.endsWith("application/msword")){ //Match if its .doc
}
else if (Type.endsWith("application/vnd.ms-powerpoint")){ //Match if its .ppt
}
else if (Type.endsWith("application/vnd.ms-excel")){ //Match if its .xls
}
else if (Type.endsWith("application/vnd.openxmlformats-officedocument.wordprocessingml.document")){ //Match if its .docx
}
Now I want to store the results in a list in which each item has two entries. Once I have checked all files, I want to save the list to a CSV file.
I tried this with a HashMap, but that didn't work.
You could use parallel arrays. I'm guessing one for the file name and one for the file type, but there is no need to store the info in a temporary data structure if you are just writing to .csv.
If you want to write filename, mime string and extension to a csv, do something like this, where you iterate through your files in main()...
static Tika tika = new Tika();
static MimeTypes mimeTypes = TikaConfig.getDefaultConfig().getMimeRepository();
static void processFile(Path p, Writer writer) throws IOException, MimeTypeException {
    // Detect the MIME type, then look up its registered extension
    String mimeString = tika.detect(p);
    MimeType mt = mimeTypes.forName(mimeString);
    // One CSV row per file: name, MIME type, extension
    writer.write(String.format("%s,%s,%s%n",
            p.getFileName(), mimeString, mt.getExtension()));
}
You'll want to add exception handling, and it is always better to use a genuine CSV writer (see Apache Commons CSV) than to "hope" that none of your data contains a comma or newline, or to roll your own.

How to replace text in a content control after XML binding using docx4j

I am using docx4j 2.8.1 with content controls in my .docx file. I can replace the CustomXML part by injecting my own XML and then calling BindingHandler.applyBindings. I can add a token such as ¶ to my XML, and I would like to replace that token in the MainDocumentPart. However, when I iterate through the content of the MainDocumentPart with this (link) method, none of the text from my XML is in the collection extracted from it. I suspect that even after binding, the XML remains separate from the MainDocumentPart (??)
I haven't tried this with anything more than a little test doc yet. My token is the Pilcrow: ¶. Since it's a single character, it won't be split in separate runs. My code is:
private void injectXml(WordprocessingMLPackage wordMLPackage) throws JAXBException {
    MainDocumentPart part = wordMLPackage.getMainDocumentPart();
    String xml = XmlUtils.marshaltoString(part.getJaxbElement(), true);
    xml = xml.replaceAll("¶", "</w:t><w:br/><w:t>");
    Object obj = XmlUtils.unmarshalString(xml);
    part.setJaxbElement((Document) obj);
}
The pilcrow character comes from the XML and is injected by applying the XML bindings to the content controls. The problem is that the content from the XML does not seem to be in the MainDocumentPart so the replace doesn't work.
(Using docx4j 2.8.1)

yaml-cpp parsing strings

Is it possible to parse YAML formatted strings with yaml-cpp?
There isn't a YAML::Parser::Parser(std::string&) constructor. (I'm getting a YAML string via libcurl from a http-server.)
Try using a stringstream:
std::string s = "name: YAML from libcurl";
std::stringstream ss(s);
YAML::Parser parser(ss);
In the new version, you can parse a string directly (see here):
YAML::Node node = YAML::Load("[1, 2, 3]");