Parsing with fscanf() ignoring spaces or missing values? - c++

I'm trying to scan a text file with XML, the XML has a number of items with this structure:
<enemy>
<type> 0 </type>
<x> 273 </x>
<y> 275 </y>
<event> </event>
</enemy>
The problem is that the XML may have spaces between tags or inside them. I created a loop and I'm trying to do a single scan in each iteration to read the int values type, x, y and event into one variable each. However, I don't know how to ignore whitespace or how to handle missing values, since some tags may or may not have a value (like event).
How can I scan this "enemy" regardless of spacing and missing values?

That's an easy one: you do not parse XML using fscanf(). Use a real XML parser; otherwise you will end up with very complicated code that will not work 80% of the time, either returning wrong data or crashing.
The XML format, despite its apparent simplicity, is complicated even in the most innocuous cases, and existing XML parsers are there for a reason. See libxml or one of the many others.
Still, if you are hell-bent on parsing XML yourself, the right way to do it is to first tokenize the input and then check that your token sequences form correct structures. That's way more complicated than a simple fscanf() call.
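As a rough illustration of the "use a real parser" advice, here is a minimal sketch. It is written in Python with the standard xml.etree.ElementTree module purely for brevity (the question itself is about C++, where libxml would play the same role). The <enemies> wrapper element and the helper names read_int and parse_enemies are assumptions made for this example, not part of the original file.

```python
import xml.etree.ElementTree as ET

# Assumption: the <enemy> items sit inside some root element; <enemies> is made up here.
SAMPLE = """
<enemies>
  <enemy>
    <type> 0 </type>
    <x> 273 </x>
    <y> 275 </y>
    <event> </event>
  </enemy>
</enemies>
"""

def read_int(parent, tag):
    """Return the integer inside <tag>, or None if the tag is absent or blank."""
    text = parent.findtext(tag)  # None when the tag is missing
    if text is None or not text.strip():
        return None
    return int(text.strip())  # strip() discards the padding spaces

def parse_enemies(xml_text):
    """Collect type, x, y and event for every <enemy> element."""
    root = ET.fromstring(xml_text)
    return [
        {name: read_int(enemy, name) for name in ("type", "x", "y", "event")}
        for enemy in root.iter("enemy")
    ]

enemies = parse_enemies(SAMPLE)
print(enemies)
```

The parser takes care of whitespace between tags, and the blank <event> simply comes back as None instead of derailing the scan.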

Related

How to get ampersand "&" in output of Transform xml activity of TIBCO

Could anyone please help me get an ampersand "&" in the output of the Transform XML activity in TIBCO?
My requirement: the xmlstring from the Transform XML activity is mapped to Parse XML (which gives the final output), e.g. Maitree&Sons. What should be passed in the XSLT so that when the output from Transform XML goes to Parse XML, the final result contains "&"?
I tried using CDATA and also disable-output-escaping in the XSLT, but Parse XML fails.
Please help.
Generally XSLT won't allow you to produce invalid output. The correct representation in XML is Maitree&amp;Sons and this is what it produces. If it produced Maitree&Sons, this would be invalid XML and would be thrown out by an XML parser trying to read the document.
Having said that, it's possible using disable-output-escaping to produce an unescaped ampersand if your XSLT processor supports this option. If it's not working for you we need to know exactly what you did and how it failed.
(General rule: on SO, always tell us exactly what you did and exactly how it failed. Saying in general terms that you tried lots of things and none of them worked doesn't get us any nearer to a solution.)
LATER
I'm reading the question again. You want to produce output from the transformer that will go into an XML parser, such that the output of the parser is Maitree&Sons. Well, in that case the lexical XML must be Maitree&amp;Sons, which it will be if you generate the string Maitree&Sons in XSLT. But XSLT is XML, so if you want to write this as a literal string in your stylesheet, it will be written Maitree&amp;Sons.
I guess we need a much clearer picture of what you are doing and where it is going wrong.
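The distinction between the lexical form and the parsed value can be checked with any XML parser. The sketch below uses Python's standard xml.etree.ElementTree only as a stand-in for TIBCO's Parse XML activity, which is an assumption; the <name> element and the is_well_formed helper are made up for the example.

```python
import xml.etree.ElementTree as ET

# Lexical XML, as it has to appear in the transformer's output.
lexical = "<name>Maitree&amp;Sons</name>"

element = ET.fromstring(lexical)
parsed_value = element.text  # the value the parser hands back

# Serializing the element escapes the ampersand again.
round_trip = ET.tostring(element, encoding="unicode")

def is_well_formed(text):
    """Return True if the parser accepts the text as XML."""
    try:
        ET.fromstring(text)
        return True
    except ET.ParseError:
        return False

print(parsed_value)
print(round_trip)
print(is_well_formed("<name>Maitree&Sons</name>"))
```

The escaped form parses to a plain "&", while the unescaped form is rejected outright, which is consistent with the Parse XML failure described in the question.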

How to change links in an XML or any other file with regex or some other way?

I have a file with a lot of xml nodes and they are linked together with an id. I need to change the id of a node as well as the link.
<event id="12345">
<action>6789</action>
</event>
<action id="6789">
<name>pre-filter1</name>
<someotherlink>45678</someotherlink>
</action>
I need to change the id of action nodes and the reference wherever it is linked from. I was looking into regex because I have to do it only for action nodes with a specific name, like pre-filter here. The id needs to be processed by some logic before being replaced with the new value. The order of the nodes is random.
I only need to do it once for the whole file, and any way is fine. Also, time complexity is not a constraint.
Any help is appreciated.
Perl supports using functions in the replacement part of a regular expression. Not sure about other languages.
If you are not using Perl, you may do the following:
1) Get all action ids for a given name with this regexp:
<action\s*id="(\d+)">(?=[^=]*<name>pre-filter\d<\/name>).*?<\/action>
https://regex101.com/r/Q7lKgx/1
2) Convert values and store both original id and converted value in a hash.
3) Loop the hash and use a regexp to replace the id with the new value
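This matches both the action definition (id attribute) and the action reference (element text). The trailing > lies outside the capture groups, so the replacement has to put it back:
(<action(?:\s*id="|>))(THE_ID)("|<\/action)> ==> replace with \1NEW_ID\3>
Anyway, parsing XML with regexes is usually not a good idea, so it would be even better to use a library that parses XML.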
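The three steps above can also be done with a parser instead of regexes. This is a minimal sketch in Python using the standard xml.etree.ElementTree module. Several things are assumptions: the <root> wrapper (the sample fragment has no single root element), the placeholder new_id function standing in for the asker's own logic, and the rule that references only appear as the text of id-less <action> elements, as in the sample.

```python
import xml.etree.ElementTree as ET

# Assumption: the nodes live under a single root element; <root> is made up here.
SAMPLE = """
<root>
  <event id="12345">
    <action>6789</action>
  </event>
  <action id="6789">
    <name>pre-filter1</name>
    <someotherlink>45678</someotherlink>
  </action>
</root>
"""

def new_id(old_id):
    # Hypothetical placeholder for the asker's own id logic.
    return str(int(old_id) + 1)

def remap_action_ids(xml_text, name_prefix="pre-filter"):
    root = ET.fromstring(xml_text)

    # Steps 1 and 2: map old id -> new id for the matching action definitions.
    mapping = {}
    for action in root.iter("action"):
        old = action.get("id")
        name = action.findtext("name", default="")
        if old is not None and name.startswith(name_prefix):
            mapping[old] = new_id(old)

    # Step 3: rewrite definitions (id attribute) and references (element text).
    for action in root.iter("action"):
        old = action.get("id")
        if old in mapping:
            action.set("id", mapping[old])
        elif old is None and action.text and action.text.strip() in mapping:
            action.text = mapping[action.text.strip()]
    return root

updated = remap_action_ids(SAMPLE)
print(ET.tostring(updated, encoding="unicode"))
```

Because each element is looked up by its original value exactly once, the order of the nodes in the file does not matter.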

XSLT output format

I am using XSLT to generate an .sql file from an .xml input file.
I have some problems with the indentation.
The way the stylesheet is formatted (how many line feeds, carriage returns and tabs it contains) directly affects the output file, i.e. if I include a few line feeds and CRs in my stylesheet to make it more readable, they show up in the output file as well (this would not be that bad if the tabs didn't affect the formatting of the output file too):
It looks like this:
SQLStatement1<CR><LF>
<CR><LF>
<CR><LF>
SQLStatement2<CR><LF>
.... (tabs are also outputted)
I use an ant task to create the .sql file. The target looks like this:
<xslt in="input.xml"
out="queries.sql"
style="createQueries.xls">
</xslt>
I am using XSLT 1.0 and cannot use XSLT 2.0.
I thought about modifying some output parameters. However it does not have any effect if I change the method attribute to e.g. 'html' (I guess that the method is set to 'text' since the type of the output file(sql) is not known)
Any ideas on how to fix this issue?
Cheers
You would make it much easier on us if you showed a small but complete XML input sample, an XSLT sample, the output you get and the output you want.
If you use xsl:output method="text" and want to control the white space then make sure you use xsl:text to output literal text and xsl:value-of to output computed text. That way you should be able to control the white space exactly.

Regex to Replace Node Attribute Contents

I have an xml document like the following:
<nodes> <node idName="employee">Some Text Here "employee" idName="employee" employee<innerNode idName="manager">Some Manager Text Here manager manager "manager" </innerNode> </node> </nodes>
How do I replace "employee" with "supervisor" and replace "manager" with "employee" ONLY in the attributes?
Thanks,
g
A regex is not able to handle the class of languages that XML belongs to. However, there is of course a hacky way to do this:
You could just match idName="something", including the equals sign and the quotes, and replace it with idName="somethingelse".
However, this only works when the exact string shown above is certain not to show up as text in any XML element body. If it can show up there, there is really no way around a proper XML parser.
Although modern regexes can often handle more than regular languages, they can only handle so much. You will need a context-free grammar to parse XML.
I agree that you should, in an ideal world, be using a proper XML parser.
However, the world isn't ideal, and regexes can handle this if you need them to.
Here is an example which will work with perl/sed, it should be easy to convert to any lang:
s/<node idName="employee">(.*?)<\/node>/<node idName="supervisor">$1<\/node>/g
This could easily be modified to include other attributes; it would look something like this:
s/<node (.*?idName=)"employee"(.*?)>(.*?)<\/node>/<node $1"supervisor"$2>$3<\/node>/g
And so on, watch out for it getting hungry for memory if the XML contains large chunks though.
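For comparison, the parser-based route mentioned in both answers can be sketched as follows. This uses Python's standard xml.etree.ElementTree and assumes the sample document is well-formed (matching innerNode open and close tags); the RENAMES table and the rename_attributes helper are names made up for this example.

```python
import xml.etree.ElementTree as ET

SAMPLE = (
    '<nodes> <node idName="employee">Some Text Here "employee" '
    'idName="employee" employee<innerNode idName="manager">Some Manager '
    'Text Here manager manager "manager" </innerNode> </node> </nodes>'
)

# Both renames are looked up against the ORIGINAL value, so an "employee"
# that becomes "supervisor" is never confused with a renamed "manager".
RENAMES = {"employee": "supervisor", "manager": "employee"}

def rename_attributes(xml_text, attr="idName", renames=RENAMES):
    root = ET.fromstring(xml_text)
    for element in root.iter():
        value = element.get(attr)  # None when the attribute is absent
        if value in renames:
            element.set(attr, renames[value])  # element text is left untouched
    return ET.tostring(root, encoding="unicode")

result = rename_attributes(SAMPLE)
print(result)
```

Only attribute values are touched, so the idName="employee" string that appears inside the element text survives unchanged, which is exactly the case the plain search-and-replace approach cannot guarantee.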

An XML to CSV transformation, with complications

I swear I have looked at the existing threads! But I still need help.
I need to take some very messy XML and convert it to a very neat CSV file for upload to a website database.
I don't really need a finished solution, but I need help with understanding the process I should follow to solve my problem in XSLT. I won't ask you all to code for me, just tell me the elements and template structure I need. I would also love if the community could explain the logic behind the process, so that I can modify it as needed.
I have xml that has records in all orders and numbers:
<record-list>
<record>
<title>Title One</title>
<author>Author One</author>
<subject>
Subject One A
Subject One B
Subject One C
</subject>
<subject>Subject Two</subject>
<subject>Subject Three</subject>
<subject>Subject Four</subject>
</record>
<record>
<subject>Subject Five</subject>
<title>Title Two</title>
<useless-element>Extra Stuff One</useless-element>
</record>
<record>
<title>Title Three</title>
<subject>Subject Six</subject>
<author/>
</record>
</record-list>
So I have multiple numbers of repeated elements, some missing elements, some empty elements, elements out of order, and some elements with extra line breaks.
I need a CSV file which reads as below, or with a different number of subject repeats (see requirements below)
"Title","Subject","Subject","Subject","Author"
"Title One","Subject One A ; Subject One B ; Subject One C","Subject Two","Subject Three","Author One"
"Title Two","Subject Five","","",""
"Title Three","Subject Six","","",""
Requirements for the final output
-The number of columns of any repeated elements either needs to match the record with the most repeats of that element, or the program needs to chop off any repeats past a certain number.
-Each new record needs a line break and no other line breaks can exist in the files (only as record delimiters).
-The elements each need to be in the same order for each record.
-Each element text needs quotes around it (to handle intrinsic commas).
-Missing or empty elements need blank, comma surrounded quotes.
-Extra elements can't be sent through to the output
What I have done:
I have figured out how to get rid of the extra line breaks within the elements using the translate function, although I would love a solution that lets me replace the line breaks with more than one character (right now I will have to run find-and-replace to change a placeholder character to a space-semicolon-space in my output). I can get the quotes, commas, and line breaks in the output with text elements and strip-whitespace.
However, I don't know how to straighten out the order of the elements, handle the element repeats, or put through only some elements while still using the element as the cue for the line-break.
Right now, I just need a solution that works, even if all sorts of manual manipulation or multiple stylesheets are required. I can even do a find and replace in a text editor, as long as the output is good. Please help with an XSLT solution; I don't even begin to know any other suitable programming languages (college MATLAB many years ago is not helping).
I think I need to run two transforms. I looked at the XSLT bible, Mangano's XSLT Cookbook, where he used two transforms for a similar problem. However, his solution is so generalized, I can't understand it. If I can't figure out how it works, I can't modify it for my needs. Sorry, but without a programming background, the explanations on this site and in the text are challenging at best. However, I think I am presenting a problem with some novel features, compared to others asked on this forum.
Any help, be it non-generalized code, or even just a suggested procedure for multiple runs through my processor would be wonderful. I have been struggling with this for over a week and have made very little progress.
Thanks
CAMc
I'd suggest having a look at A CSV to XML converter in XSLT 2.0. There's a lot of useful info on that page, including how to run it.
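This is not the XSLT solution the question asks for, but the logic behind the requirements (fixed column order, capped and padded repeats, joined line breaks, dropped extras) may be easier to see in a short script. The sketch below uses Python's standard library only; MAX_SUBJECTS, clean and records_to_csv are names and choices made up for this example, and the cap of three subject columns is an assumption taken from the desired output shown in the question.

```python
import csv
import io
import xml.etree.ElementTree as ET

SAMPLE = """<record-list>
<record>
<title>Title One</title>
<author>Author One</author>
<subject>
Subject One A
Subject One B
Subject One C
</subject>
<subject>Subject Two</subject>
<subject>Subject Three</subject>
<subject>Subject Four</subject>
</record>
<record>
<subject>Subject Five</subject>
<title>Title Two</title>
<useless-element>Extra Stuff One</useless-element>
</record>
<record>
<title>Title Three</title>
<subject>Subject Six</subject>
<author/>
</record>
</record-list>"""

MAX_SUBJECTS = 3  # assumption: repeats beyond this many columns are chopped off

def clean(text):
    """Join the non-blank lines of an element's text with ' ; '."""
    lines = [line.strip() for line in (text or "").splitlines()]
    return " ; ".join(line for line in lines if line)

def records_to_csv(xml_text, max_subjects=MAX_SUBJECTS):
    root = ET.fromstring(xml_text)
    out = io.StringIO()
    # QUOTE_ALL puts quotes around every field, including empty ones.
    writer = csv.writer(out, quoting=csv.QUOTE_ALL, lineterminator="\n")
    writer.writerow(["Title"] + ["Subject"] * max_subjects + ["Author"])
    for record in root.iter("record"):
        # Fields are looked up by name, so their order in the XML is irrelevant
        # and unknown elements are never written.
        subjects = [clean(s.text) for s in record.findall("subject")][:max_subjects]
        subjects += [""] * (max_subjects - len(subjects))  # pad missing repeats
        writer.writerow(
            [clean(record.findtext("title"))]
            + subjects
            + [clean(record.findtext("author"))]
        )
    return out.getvalue()

csv_text = records_to_csv(SAMPLE)
print(csv_text)
```

The same three ideas carry over to XSLT: select each field by name rather than in document order, limit repeated elements by position, and emit an empty quoted field whenever the selection comes back empty.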