Validate attribute names using regular expressions?

I would like to know if it is possible to validate attribute names by pattern using XML Schema. In other words, I would like to describe a set of acceptable attribute names for a given type, using a pattern (such as a regular expression).
Let's say I have the following XML data I would like to validate:
<?xml version="1.0" encoding="utf-8"?>
<root xmlns="http://mywebsite.com/myns">
<somename data-someattr1="value1"
data-someattr2="value2"/>
</root>
How can I specify that elements named "somename" may only have attributes whose names begin with "data-"? Is this even possible?

Try something along the lines of:
<xs:simpleType name="somename">
<xs:restriction base="xs:string">
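<xs:pattern value="data-.*"/>
</xs:restriction>
</xs:simpleType>
In XSD, a pattern must match the entire value (patterns are implicitly anchored, and ^ is an ordinary character rather than an anchor), so data-.* means "beginning with 'data-'", as you require.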
EDIT:
I misunderstood the question, sorry... Here is a more relevant answer:
As I understand it, you cannot pattern-match attribute names in XSD 1.0, so there is no solution to your problem using an XSD alone. However, you may find one of the following XML Schema constructs helpful in building a solution:
XML choice (http://www.w3schools.com/schema/el_choice.asp): you could list all permitted "data-" attribute names explicitly. Note that xs:choice itself only groups child elements, so the attributes would simply be declared one by one with xs:attribute and use="optional".
XML any (http://www.w3schools.com/schema/schema_complex_any.asp): its attribute counterpart, xs:anyAttribute, accepts attributes whatever their names, so you could then check the "data-" prefix via some other method.
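A minimal sketch of both routes, assuming the namespace from the question. For attributes, the wildcard counterpart of xs:any is xs:anyAttribute. The names data-someattr1 and data-someattr2 are taken from the sample and treated as the complete list of allowed names, which is an assumption; the type names ExplicitAttrsType and AnyAttrsType are hypothetical.

```xml
<?xml version="1.0" encoding="utf-8"?>
<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema"
           xmlns:my="http://mywebsite.com/myns"
           targetNamespace="http://mywebsite.com/myns"
           elementFormDefault="qualified">
  <xs:element name="root">
    <xs:complexType>
      <xs:sequence>
        <!-- switch the type to my:AnyAttrsType to try the second option -->
        <xs:element name="somename" type="my:ExplicitAttrsType" maxOccurs="unbounded"/>
      </xs:sequence>
    </xs:complexType>
  </xs:element>

  <!-- Option 1: declare each allowed "data-" attribute; any other attribute name is rejected -->
  <xs:complexType name="ExplicitAttrsType">
    <xs:attribute name="data-someattr1" type="xs:string" use="optional"/>
    <xs:attribute name="data-someattr2" type="xs:string" use="optional"/>
  </xs:complexType>

  <!-- Option 2: accept any unqualified attribute without checking its name;
       the "data-" prefix must then be enforced by other means -->
  <xs:complexType name="AnyAttrsType">
    <xs:anyAttribute namespace="##local" processContents="skip"/>
  </xs:complexType>
</xs:schema>
```

Option 1 is strict but only works when the set of names is known in advance; option 2 lets any name through and leaves the prefix check to application code.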

As already mentioned, you can't validate attribute names, but you can take a different route and transform your XML into something like this:
<root>
<element>
<data name='data-attr1' value='v1'/>
<data name='data-attr2' value='v2'/>
<data name='data-attr3' value='v3'/>
</element>
</root>
Now the former attribute name lives in the name attribute of data (e.g. name='data-attr1'), so you can validate it with a pattern, as well as the values. Your schema might look like this:
<xs:schema xmlns:xs='http://www.w3.org/2001/XMLSchema' elementFormDefault='qualified' attributeFormDefault='unqualified'>
<xs:element name='root'>
<xs:complexType>
<xs:sequence>
<xs:element name='element'>
<xs:complexType>
<xs:sequence>
<xs:element maxOccurs='10' name='data'>
<xs:complexType>
<xs:attribute name='name' use='required'>
<xs:simpleType>
<xs:restriction base='xs:string'>
<xs:pattern value='data-.*' />
</xs:restriction>
</xs:simpleType>
</xs:attribute>
<xs:attribute name='value' use='required' />
</xs:complexType>
</xs:element>
</xs:sequence>
</xs:complexType>
</xs:element>
</xs:sequence>
</xs:complexType>
</xs:element>
</xs:schema>
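The answer doesn't say how to do the transformation. One option, which is an assumption and not part of the original answer, is an XSLT 1.0 stylesheet along the lines of the sketch below. It drops namespaces and turns every attribute into a <data name="..." value="..."/> child while keeping element names, so for the sample document from the question the schema's element named 'element' would need to be renamed to 'somename'.

```xml
<?xml version="1.0" encoding="utf-8"?>
<xsl:stylesheet version="1.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
  <xsl:output method="xml" indent="yes"/>
  <xsl:strip-space elements="*"/>

  <!-- Rebuild every element without its namespace (no prefix, and no default
       namespace is declared in this stylesheet) -->
  <xsl:template match="*">
    <xsl:element name="{local-name()}">
      <!-- one <data> child per original attribute -->
      <xsl:for-each select="@*">
        <data name="{name()}" value="{.}"/>
      </xsl:for-each>
      <xsl:apply-templates select="node()"/>
    </xsl:element>
  </xsl:template>

  <!-- Copy text, comments and processing instructions unchanged -->
  <xsl:template match="text()|comment()|processing-instruction()">
    <xsl:copy/>
  </xsl:template>
</xsl:stylesheet>
```

Applied to the question's sample, this should yield roughly <root><somename><data name="data-someattr1" value="value1"/><data name="data-someattr2" value="value2"/></somename></root>, with exact whitespace depending on the processor.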

Related

Allow all characters except # and $ in XSD

We have a requirement to allow all characters (including special characters) except #, $ and space in an XSD element.
I've tried the regex [^$#\s]*, but it didn't work. Can you please help? I'm not able to figure it out.
I tried your regex in an XSD and it works as expected.
<?xml version="1.0" encoding="utf-16"?>
<xs:schema xmlns="http://Scratch.SO53903548" targetNamespace="http://Scratch.SO53903548" xmlns:xs="http://www.w3.org/2001/XMLSchema">
<xs:element name="Root">
<xs:complexType>
<xs:sequence>
<xs:element name="SpecialString2">
<xs:simpleType>
<xs:restriction base="xs:string">
<xs:pattern value="[^$#\s]*" />
</xs:restriction>
</xs:simpleType>
</xs:element>
</xs:sequence>
</xs:complexType>
</xs:element>
</xs:schema>
This will quite happily validate the document below, but validation fails if the value contains $, # or a space:
<ns0:Root xmlns:ns0="http://Scratch.SO53903548">
<SpecialString2>thequickbrownfoxjumpedoverthelazydog@THEQUICKBROWNFOXJUMPEDOVERTHELAZYDOG!~`@%^&amp;*()-_+=</SpecialString2>
</ns0:Root>
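The * quantifier also accepts an empty value, so <SpecialString2/> validates against the schema above. If at least one character is required, a minimal variant of the same declaration swaps * for +:

```xml
<xs:element name="SpecialString2">
  <xs:simpleType>
    <xs:restriction base="xs:string">
      <!-- one or more characters, none of which is $, # or whitespace
           (in XSD, \s covers space, tab, CR and LF) -->
      <xs:pattern value="[^$#\s]+" />
    </xs:restriction>
  </xs:simpleType>
</xs:element>
```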

XML Schema for multiple email recipients

I need a sample XSD that supports multiple email recipients in a new element, with each recipient's email address in its own element. Can anyone help, with an explanation?
Example:
<EmailReceipts>
<address1></address1>
<address2></address2>
</EmailReceipts>
First off, I'd recommend not embedding an index number in the address elements:
<EmailReceipts>
<address>john@example.com</address>
<address>mary@example.org</address>
</EmailReceipts>
Then this XSD will validate the above XML (as well as other XML documents with additional address elements):
<?xml version="1.0" encoding="UTF-8"?>
<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema">
<xs:element name="EmailReceipts">
<xs:complexType>
<xs:sequence>
<xs:element name="address" maxOccurs="unbounded" type="xs:string"/>
</xs:sequence>
</xs:complexType>
</xs:element>
</xs:schema>
The above XSD will allow any string content in the address elements. If you'd like to be stricter, you could use a regular expression to limit the values of address:
<?xml version="1.0" encoding="UTF-8"?>
<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema">
<xs:element name="EmailReceipts">
<xs:complexType>
<xs:sequence>
<xs:element name="address" maxOccurs="unbounded" type="EmailAddressType"/>
</xs:sequence>
</xs:complexType>
</xs:element>
<xs:simpleType name="EmailAddressType">
<xs:restriction base="xs:string">
<xs:pattern value="([0-9a-zA-Z]([-.\w]*[0-9a-zA-Z])*@([0-9a-zA-Z][-\w]*[0-9a-zA-Z]\.)+[a-zA-Z]{2,9})"/>
</xs:restriction>
</xs:simpleType>
</xs:schema>
Note that the above regular expression is one of many possible, each having various degrees of generality and specificity over a syntax that is more involved than you might imagine.
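At the permissive end of that range, a sketch of a much looser type that only checks for something@something.something, with a single @ and no whitespace. The name LooseEmailAddressType is hypothetical; it would be used via type="LooseEmailAddressType" on the address element.

```xml
<xs:simpleType name="LooseEmailAddressType">
  <xs:restriction base="xs:string">
    <!-- one or more non-@, non-whitespace characters, an @,
         then a domain part that contains at least one dot -->
    <xs:pattern value="[^@\s]+@[^@\s]+\.[^@\s]+"/>
  </xs:restriction>
</xs:simpleType>
```

It accepts addresses such as a@b.co and rejects values without an @ (e.g. not-an-address) or containing spaces; whether that trade-off is acceptable depends on how much checking you want to leave to the mail system.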

xsd:SimpleType: How to restrict attribute to specific values and regex values

I have an attribute that can be any string, but if it starts and ends with square brackets (i.e. matches ^\[.*\]$), it must be one of the following specific values:
"[EVENT]" and "[PROTOCOL]"
So "[EVENT]", "[PROTOCOL]" and "SomeString" are correct, but "[SomeString]" isn't.
How can I achieve this?
Use an xs:simpleType with regular expressions to restrict the base xs:string type. You can have more than one xs:pattern to keep the alternative patterns simple: patterns in the same restriction are combined with OR, so the value has to match at least one of them or validation will fail. Since the patterns are regular expressions, special characters such as "[" and "]" have to be escaped with a backslash when used as literals.
XSD:
<?xml version="1.0" encoding="UTF-8"?>
<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema">
<xs:element name="root">
<xs:complexType>
<xs:sequence>
<xs:element name="elem" type="myElemType" maxOccurs="unbounded"/>
</xs:sequence>
</xs:complexType>
</xs:element>
<xs:complexType name="myElemType">
<xs:attribute name="attrib" type="myAttribType"/>
</xs:complexType>
<xs:simpleType name="myAttribType">
<xs:restriction base="xs:string">
<xs:pattern value="\[EVENT\]"/><!-- "[EVENT]": okay -->
<xs:pattern value="\[PROTOCOL\]"/><!-- "[PROTOCOL]": okay -->
<xs:pattern value="[^\[].*"/><!-- Starts with anything but "[": okay -->
</xs:restriction>
</xs:simpleType>
</xs:schema>
XML:
<root>
<elem attrib="[EVENT]"/>
<elem attrib="[PROTOCOL]"/>
<elem attrib="SomeString"/>
<elem attrib="SomeString]"/>
<elem attrib=" [SomeString] "/>
<!-- All the above are okay; the ones below fail validation -->
<elem attrib="[SomeString]"/>
<elem attrib="[SomeString"/>
</root>
Modify the regular expressions to your heart's content, e.g., to reject the example with leading and trailing spaces.
Edited to reflect OP's comment that "[SomeString" should also be invalid.
I like @Burkart's use of separate xs:pattern elements better (+1), but I had this pending while awaiting clarification in the comments ("[SomeString" without a closing bracket should be invalid), so I'll post it anyway in case anyone finds it useful. Read the regular expression as EVENT or PROTOCOL within brackets, or any string that neither starts nor ends with a bracket.
<?xml version="1.0" encoding="utf-8"?>
<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema">
<xs:element name="root">
<xs:complexType>
<xs:attribute name="attr">
<xs:simpleType>
<xs:restriction base="xs:string">
<xs:pattern value="\[(EVENT|PROTOCOL)\]|[^\[].*[^\]]"/>
</xs:restriction>
</xs:simpleType>
</xs:attribute>
</xs:complexType>
</xs:element>
</xs:schema>
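One side effect of the single-pattern version: its second alternative needs at least two characters, so one-character values such as "A" are rejected, as is the empty string. If single characters should be allowed, a sketch with an extra alternative for exactly one non-bracket character:

```xml
<xs:simpleType>
  <xs:restriction base="xs:string">
    <!-- [EVENT] or [PROTOCOL]; or exactly one character that is not a bracket;
         or two or more characters that neither start with "[" nor end with "]" -->
    <xs:pattern value="\[(EVENT|PROTOCOL)\]|[^\[\]]|[^\[].*[^\]]"/>
  </xs:restriction>
</xs:simpleType>
```

With this, "A" is accepted while "[" and "]" on their own are still rejected; the empty string remains invalid, which may or may not match your idea of "any string".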

Restricting values in group of elements via XML Schema

I need to restrict the values of an element. According to the rules I want to enforce, the following values are possible:
<tags>
<tag>One of Audio, Video, Others.</tag>
<tag>For Audio, either Label or Record, For Video, either Studio or Producer, For Others this tag will be empty.</tag>
<tag>One of English, Spanish, French</tag>
</tags>
If the tags element held plain comma-separated text, I could have set a regex pattern restriction on it in my XSD, which might look like this:
<element name="tags">
<simpleType>
<restriction base="string">
<pattern value="(Audio, (Label|Record)|Video, (Studio|Producer)|Others), (English|Spanish|French)" />
</restriction>
</simpleType>
</element>
But since I have a sequence of elements all named tag, I am not sure it is even possible to restrict them this way via XSD. I know I can restrict the values via an enumeration, but then I cannot group them. I want the following XML to validate:
<tags>
<tag>Audio</tag>
<tag>Record</tag>
<tag>English</tag>
</tags>
and the following to fail validation:
<tags>
<tag>Others</tag>
<tag>Record</tag>
<tag>English</tag>
</tags>
My real case is much more complex, with nested restrictions, but if someone can help with the above condition, I think I can use it as a reference to solve my problem.
I don't think you can. If you control the schema, why do you want this specific rule set for validation? If you need validation in exactly this form, you may have to do it at the application level rather than at the document definition level.
It appears what you really want is a way to tag different information based on certain tag "types". There is really no reason to have a list of elements all named tag; you already know they are tags from the parent element's name. If you want validation based on the type of a tag, use different elements and structure your schema so that it specifies which ones are allowed where. For your data this can be done with complex types and a choice model:
<xs:element name="audio">
<xs:complexType>
<xs:choice>
<xs:element name="Label" type="xs:string"/>
<xs:element name="Record" type="xs:string"/>
</xs:choice>
</xs:complexType>
</xs:element>
<xs:complexType name="generic">
<xs:choice>
<xs:element name="Studio" type="xs:string"/>
<xs:element name="Producer" type="xs:string"/>
</xs:choice>
</xs:complexType>
<xs:element name="video" type="generic"/>
<xs:element name="other"><xs:complexType/></xs:element><!-- "Others" has no sub-tag, so other must be empty -->
<xs:element name="tags">
<xs:complexType>
<xs:sequence>
<xs:choice>
<xs:element ref="audio"/>
<xs:element ref="video"/>
<xs:element ref="other"/>
</xs:choice>
<xs:element name="language">
<xs:simpleType>
<xs:restriction base="xs:string">
<xs:enumeration value="English"/>
<xs:enumeration value="Spanish"/>
<xs:enumeration value="French"/>
</xs:restriction>
</xs:simpleType>
</xs:element>
</xs:sequence>
</xs:complexType>
</xs:element>
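For illustration, here is how the first example from the question might look under this restructuring, assuming the declarations above are wrapped in an xs:schema element with no target namespace. "Some record" is only placeholder content for the Record element.

```xml
<!-- "Audio" with a Record, then a language: should validate -->
<tags>
  <audio>
    <Record>Some record</Record>
  </audio>
  <language>English</language>
</tags>
```

The restructured form of the second example, with <other><Record>Some record</Record></other> in place of the audio element, is rejected because Record is not declared as a child of other.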
I took the liberty of assuming you'd also want values for Producer, Label, Studio, and Record. If not, for your original case you can just use an attribute on the parent elements instead, like this:
<xs:complexType name="generic">
<xs:attribute name="meta-type">
<xs:simpleType>
<xs:restriction base="xs:string">
<xs:enumeration value="Studio"/>
<xs:enumeration value="Producer"/>
</xs:restriction>
</xs:simpleType>
</xs:attribute>
</xs:complexType>
Instead of using a choice group you could use substitution groups, but this requires the type of each member element to be derived from the type of the group's head element, which you may not want.
These schemas can be expanded quite easily, and if you still need a generic list of <tag> elements that doesn't need strict validation, you could add it to the tags sequence definition.
Maybe someone can give you a better answer for your original requirements, but I hope this information helps.

Defining a tag or pattern for XML Schema

I am a newbie in XML Schema. Is there any possibility to define that an element's content starts with some character or symbol? I mean, given:
<xs:element minOccurs="1" maxOccurs="1" name="Header">
<xs:complexType>
<xs:sequence>
<xs:sequence>
<xs:element maxOccurs="1" minOccurs="1" name="NAME_STUDENTS">
<xs:simpleType>
<xs:restriction base="xs:string">
<xs:maxLength value="10"/>
</xs:restriction>
</xs:simpleType>
</xs:element>
</xs:sequence>
</xs:sequence>
</xs:complexType>
</xs:element>
is there any way to define a pattern or tag in XML Schema so that a student's name must start with 'P'? The schema should accept the text as a NAME_STUDENTS element only if it starts with 'P'.
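I'm not sure I fully understand your question, but for the name restriction you should look into XML Schema regular expressions (the xs:pattern facet). Note that a schema can't decide which element a piece of text belongs to; it can only reject NAME_STUDENTS values that don't match.
In your case this would look like this: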
<?xml version="1.0"?>
<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema" >
<xs:element name="Header">
<xs:complexType>
<xs:sequence>
<xs:element name="NAME_STUDENTS" type="filtered-students" />
</xs:sequence>
</xs:complexType>
</xs:element>
<xs:simpleType name="filtered-students">
<xs:restriction base="xs:string">
<xs:pattern value="P.*"/>
</xs:restriction>
</xs:simpleType>
</xs:schema>
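If the 10-character limit from the question's original declaration should still apply, both facets can sit in the same restriction, and a value must satisfy every facet. A sketch that would replace the filtered-students type:

```xml
<xs:simpleType name="filtered-students">
  <xs:restriction base="xs:string">
    <!-- XSD patterns must match the whole value and ^ is not an anchor,
         so "starts with P" is written as P followed by anything -->
    <xs:pattern value="P.*"/>
    <!-- length limit carried over from the question's NAME_STUDENTS declaration -->
    <xs:maxLength value="10"/>
  </xs:restriction>
</xs:simpleType>
```

With this type, <NAME_STUDENTS>Peter</NAME_STUDENTS> is accepted, while "Mary" (wrong first letter) and "Pennsylvania" (12 characters) are rejected.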
... but I must confess I'm not really a regex hero so you may want to check the pattern with somebody more proficient.