What is a better way to write this regular expression? - regex

I am converting XML child elements into attributes of their parent element, and I have a dirty regex that I used in TextMate. I know that dot (.) doesn't match newlines, so this is how I got it to work.
Search
language="(.*)"
(.*)<education>(.*)(\n)?(.*)?(\n)?(.*)?(\n)?(.*)?</education>
(.*)<years>(.*)</years>
(.*)<grade>(.*)</grade>
Replace
grade="$13" language="$1" years="$11">
<education>$3$4$5$6$7$8$9</education>
I know there's a better way to do this. Please help me build my regex skills further.

Use an XML parser; don't use regex to parse XML.
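As a rough illustration of the parser approach, here is a minimal Python sketch using the standard library's xml.etree.ElementTree. The element name "entry", the sample values, and the helper children_to_attributes are assumptions made up for this example; only the language attribute and the education, years and grade child names come from the question.

```python
import xml.etree.ElementTree as ET

# Hypothetical input: the element name "entry" and all values are assumptions.
SAMPLE = """<entry language="English">
  <education>BA in History
MA in Linguistics</education>
  <years>4</years>
  <grade>A</grade>
</entry>"""


def children_to_attributes(element, names=("years", "grade")):
    """Move the text of each named child into an attribute of the same name."""
    for name in names:
        child = element.find(name)
        if child is not None:
            element.set(name, (child.text or "").strip())
            element.remove(child)
    return element


entry = children_to_attributes(ET.fromstring(SAMPLE))
print(ET.tostring(entry, encoding="unicode"))
```

The multi-line education text needs no special handling here, because the parser does not care where the line breaks fall.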

If there are no other tags inside the <education> element, I would change that part to:
<education>([^<>]*)</education>
If possible, I would use the same technique everywhere else you're using .*. In the case of the language attribute, it would take this form:
language="([^"]*)"
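A short Python sketch of why the negated character classes help. The sample strings are made up for illustration; the patterns are the ones from this answer and from the question.

```python
import re

# Hypothetical samples; only the tag and attribute names come from the question.
line = 'language="English" level="native"'
block = "<education>BA in History\nMA in Linguistics</education>"

# Greedy .* runs to the LAST double quote on the line.
greedy = re.search(r'language="(.*)"', line).group(1)

# [^"]* cannot cross a quote, so it stops at the end of the attribute value.
bounded = re.search(r'language="([^"]*)"', line).group(1)

# [^<>] also matches newlines, so no (\n)? groups are needed.
education = re.search(r"<education>([^<>]*)</education>", block).group(1)

print(repr(greedy))
print(repr(bounded))
print(repr(education))
```

Because a negated class matches line breaks as well, the education text comes back as one capture group instead of seven.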

Related

Regular Expression to match parent and sub node

I want to develop a regular expression to match this tag:
<claim-text>aaaaaaa
<claim-text>bbbbbbb</claim-text>
<claim-text>ccccccc</claim-text>
</claim-text>
I tried
<claim-text>(.*)</claim-text>
But only bbbbbbb and ccccccc are matched. How can I capture aaaaaaa as well?
Thanks
For a generic solution that works at any depth, you will need at least a stack, which most regular expression implementations do not provide. However, if you know the structure will only have the depth you specified, you could use something like this:
<claim-text>([^<\r\n]*)
You can see a working example here: https://regex101.com/r/kbDbwF/1
It will search for your opening tag, and then find anything up to the next opening or closing tag [^<], or to the next line break [^\r\n]. I have combined both character classes to one definition [^<\r\n]. However, this is not a general solution!
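To try the pattern outside regex101, here is a small Python check against the sample from the question. It is only a sketch of this fixed-depth case, not a general solution.

```python
import re

SAMPLE = """<claim-text>aaaaaaa
<claim-text>bbbbbbb</claim-text>
<claim-text>ccccccc</claim-text>
</claim-text>"""

# Capture everything after an opening tag, up to the next "<" or line break.
matches = re.findall(r"<claim-text>([^<\r\n]*)", SAMPLE)
print(matches)
```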
Do not under any circumstances try to parse HTML with a regex unless you wish to invoke rite 666 Ph'nglui mglw'nafh Cthulhu R'lyeh wgah'nagl fhtagn.
Use an HTML parsing library; see this page for some ways to do it.
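For the sample in the question, which is well-formed XML, a parser-based version might look like the following Python sketch. It uses the standard library's xml.etree.ElementTree as one possible parser (an assumption; the answer does not name a specific library), and it works at any nesting depth.

```python
import xml.etree.ElementTree as ET

SAMPLE = """<claim-text>aaaaaaa
<claim-text>bbbbbbb</claim-text>
<claim-text>ccccccc</claim-text>
</claim-text>"""

root = ET.fromstring(SAMPLE)

# iter() yields the element itself and every descendant with that tag,
# in document order; .text is the text before the first child element.
texts = [(node.text or "").strip() for node in root.iter("claim-text")]
print(texts)
```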

Problems with finding and replacing

Hey Stack Overflow community. I need help with a huge data file. Is it possible to use a regular expression to find this tag:
<category_name><![CDATA[Prekiniai ženklai&gt;Adler|Kita buitinė technika&gt;Buičiai naudingi prietaisai|Kita buitinė technika&gt;Lygintuvai]]></category_name>
and somehow replace all the other data, leaving only 'Adler' or 'Lygintuvai'? I'm using Altova to edit XML files, so I can't find any way other than find-and-replace. I'm also new to regex, so I thought maybe you could help me.
#\<category_name\>.+?gt\;([\w]+?)\|.+?gt;([\w]+?)\]\]\>\<\/category_name\>#i
\1 - Adler
\2 - Lygintuvai
PHP
regex101.com
This assumes the fields contain only alphanumeric characters, without spaces.
If you want to change the set of accepted characters, replace [\w] with something else:
[a-z] - only letters
[0-9] - only digits
etc.
It's possible, but use of regular expressions to process XML will never be 100% correct (you can prove that using computer science theory), and it may also be very inefficient. For example, the solution given by Luk is incorrect because it doesn't allow whitespace in places where XML allows it. Much better to use XQuery or XSLT, both of which are designed for the job (and both work in Altova). You can then use XPath expressions to locate the element or attribute nodes you are interested in, and you can still use regular expressions (e.g. in the XPath replace() function) to process the content of text or attribute nodes.
Incidentally, your input is rather strange because it uses escape sequences like &gt; within a CDATA section, but XML escape sequences are not recognized in a CDATA section.
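The same division of labour (a parser to locate the node, plain string handling on its content) can be sketched in Python. This stands in for the XQuery/XSLT route recommended above and is not an Altova feature. It assumes the file really contains the literal text &gt; inside the CDATA section.

```python
import xml.etree.ElementTree as ET

SAMPLE = (
    "<category_name><![CDATA[Prekiniai ženklai&gt;Adler|"
    "Kita buitinė technika&gt;Buičiai naudingi prietaisai|"
    "Kita buitinė technika&gt;Lygintuvai]]></category_name>"
)

node = ET.fromstring(SAMPLE)

# Inside CDATA, "&gt;" is not an escape, so the parser returns it literally.
paths = node.text.split("|")

# The last segment of each "parent&gt;child" path is the leaf category.
leaves = [path.split("&gt;")[-1] for path in paths]
print(leaves[0], leaves[-1])
```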

Is there a function to create a regex pattern from a string input?

I'm lousy at regular expressions but occasionally they're the only thing that's the right solution for a problem.
Is there something in the .NET framework that allows you to input an unencoded string and get a pattern from it? Which you could then modify as required?
e.g. I want to remove a CDATA section that contains a file from some XML but I can't work out what the right pattern is for <![CDATA[hugepileofrandombinarydataherethatalsoneedstogo]]> and I don't want to ask for help each time I'm stuck on a regex pattern.
Such tools exist; search for "regex generator".
But, as suggested in the comments, it is better to learn regex; simple patterns are easy. In your case, something like <!\[.*?]]> would do.
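On the original question of turning a raw string into a pattern: most regex libraries have an escaping function for exactly this (Regex.Escape in .NET, re.escape in Python). The sketch below is in Python, and the helper name literal_pattern is made up for illustration. It escapes the two literal delimiters and joins them with a lazy wildcard. The DOTALL flag lets the dot cross line breaks, which a CDATA payload is likely to contain.

```python
import re


def literal_pattern(start, end):
    """Build a lazy, newline-spanning pattern between two literal delimiters."""
    return re.compile(re.escape(start) + r".*?" + re.escape(end), re.DOTALL)


CDATA = literal_pattern("<![CDATA[", "]]>")

# Hypothetical document; the payload spans two lines on purpose.
xml = "<file><name>a.bin</name><data><![CDATA[line one\nline two]]></data></file>"
stripped = CDATA.sub("", xml)
print(stripped)
```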
There are regex design tools such as Expresso:
http://www.ultrapico.com/expresso.htm
It's not perfect, but since there is no suitable .NET component, the text-to-regex page at txt2re.com is the best I've seen for people who occasionally need to build a regex to match a string but don't have the time to relearn regex each time they want to use one.

Need to create a gmail like search syntax; maybe using regular expressions?

I need to enhance the search functionality on a page listing user accounts. Rather than have multiple search boxes for each possible field, or a drop down menu where the user can only search against one field, I'd like a single search box and to use a gmail like syntax. That's the best way I can describe it, and what I mean by a gmail like search syntax is being able to type the following into the input box:
username:bbaggins type:admin "made up plc"
When the form is submitted, the search string should be split into its separate parts, which will allow me to construct a SQL query. So for example, type:admin would form part of the WHERE clause so that it would find any record where the field type is equal to admin, and the same for username. The text in quotes may be a free text search, but I'm not sure on that yet.
I'm thinking that a regular expression or two would be the best way to do this, but that's something I'm really not good at. Can anyone help to construct a regular expression which could be used for this purpose? I've searched around for some pointers but either I don't know what to search for or it's not out there as I couldn't find anything obvious. Maybe if I understood regular expressions better it would be easier :-)
Cheers,
Adam
No, you would not use regular expressions for this. Just split the string on spaces in whatever language you're using.
You don't necessarily have to use a regex. Regexes are powerful, but in many cases also slow. Regex also does not handle nested parameters very well. It would be easier for you to write a script that uses string manipulation to split the string and extract the keywords and the field names.
If you want to experiment with Regex, try the online REGex tester. Find a tutorial and play around, it's fun, and you should quickly be able to produce useful regexes that find any words before or after a : character, or any sentences between " quotation marks.
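A minimal Python sketch of the string-splitting approach, since the question does not name a language. It uses the standard library's shlex.split so that the quoted phrase survives as a single token; the function name parse_search and the rule that a token without a colon counts as free text are assumptions for this example.

```python
import shlex


def parse_search(query):
    """Split a search string into field:value pairs and free-text terms."""
    fields, free_text = {}, []
    # shlex.split breaks on whitespace but keeps "quoted phrases" together.
    for token in shlex.split(query):
        name, sep, value = token.partition(":")
        if sep and name and value:
            fields[name] = value
        else:
            free_text.append(token)
    return fields, free_text


fields, free_text = parse_search('username:bbaggins type:admin "made up plc"')
print(fields)
print(free_text)
```

When building the SQL query from the result, it would be safer to check each field name against a list of known columns and to pass the values as query parameters rather than concatenating them into the statement.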
Thanks for the answers. I did start doing it without regex and just wondered if a regex would be simpler. Sounds like it wouldn't though, so I'll go back to the way I was doing it and test it again.
Good old Mr Bilbo is my go to guy for any naming needs :-)
Cheers,
Adam

matching table tag by regular expression in php

I need to match a substring in PHP. The substring looks like this:
<table class="tdicerik" id="dgVeriler"
I wrote a regular expression for it, <table\s*\sid=\"dgVeriler\", but it did not work. Where is my problem?
You forgot a dot:
<table\s.*\sid="dgVeriler"
would have worked.
<table\s+.*?\s+id="dgVeriler"
would have been better (making the repetition lazy, matching as little as possible).
<table\s+[^>]*?\s+id="dgVeriler"
would have been better still (making sure that we don't accidentally match outside of the <table> tag).
And not trying to parse HTML with regular expressions, using a parser instead, would probably have been best.
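The difference between these variants can be checked quickly. The sketch below uses Python rather than PHP, and the second test string is a made-up case in which the id sits on a later tag instead of on the table.

```python
import re

tag = '<table class="tdicerik" id="dgVeriler"'
# Hypothetical markup where the id belongs to a later tag, not the table.
other = '<table class="x"><tr><td id="dgVeriler">'

original = re.compile(r'<table\s*\sid="dgVeriler"')
greedy = re.compile(r'<table\s.*\sid="dgVeriler"')
bounded = re.compile(r'<table\s+[^>]*?\s+id="dgVeriler"')

results = {
    name: (bool(pattern.search(tag)), bool(pattern.search(other)))
    for name, pattern in [("original", original), ("greedy", greedy), ("bounded", bounded)]
}
for name, (on_tag, on_other) in results.items():
    print(name, on_tag, on_other)
```

Only the [^>] version matches the real table tag while rejecting the case where the id belongs to another element.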
I don't know exactly what you want to get, but try this:
<table\s*.*id=\"dgVeriler\"