Regex: Get first value from single line [duplicate] - regex

This question already has answers here:
Can you provide some examples of why it is hard to parse XML and HTML with a regex? [closed]
(12 answers)
RegEx match open tags except XHTML self-contained tags
(35 answers)
Closed 7 years ago.
I have the XML below on a single line. I want to match the connection string for DBB and replace it using a regex.
<Configuration ConfiguredType="Property" Path="\Package.Connections[DBA DB].Properties[ConnectionString]" ValueType="String"><ConfiguredValue>Data Source=.\test;Initial Catalog=DBA;Provider=SQLNCLI10.1;Integrated Security=SSPI;Auto Translate=False;Application Name=B;</ConfiguredValue></Configuration><Configuration ConfiguredType="Property" Path="\Package.Connections[DBB DB].Properties[ConnectionString]" ValueType="String"><ConfiguredValue>Data Source=.\test;Initial Catalog=DBB;Provider=SQLNCLI10.1;Integrated Security=SSPI;Auto Translate=False;Application Name=C;</ConfiguredValue></Configuration></DTSConfiguration>
I have the following regex, which works on multi-line XML but not on this single-line example:
Data Source=.+?(?=[a-z])*\;Initial Catalog=DBB;(.*?)Integrated(.*?)[^;]*;
The regex above highlights both the DBA and the DBB entries and ends there, instead of matching only DBB.
Could you help me find the missing piece in the regex I have created?

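Replace Data Source=.+? with Data Source=[^<]+? so that the match cannot run across the start of a tag. On a single line, . also matches the < of the following tags, so .+? keeps expanding from the first Data Source= until it reaches Initial Catalog=DBB.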
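A minimal Python sketch of the difference, assuming a shortened copy of the XML from the question and a simplified form of the pattern (the lookahead and the trailing groups are dropped for brevity). The replacement value NEWSERVER is made up for illustration.

```python
import re

# Shortened copy of the single-line XML from the question.
xml = (
    '<Configuration Path="\\Package.Connections[DBA DB].Properties[ConnectionString]">'
    '<ConfiguredValue>Data Source=.\\test;Initial Catalog=DBA;Provider=SQLNCLI10.1;'
    'Integrated Security=SSPI;Auto Translate=False;Application Name=B;</ConfiguredValue>'
    '</Configuration>'
    '<Configuration Path="\\Package.Connections[DBB DB].Properties[ConnectionString]">'
    '<ConfiguredValue>Data Source=.\\test;Initial Catalog=DBB;Provider=SQLNCLI10.1;'
    'Integrated Security=SSPI;Auto Translate=False;Application Name=C;</ConfiguredValue>'
    '</Configuration>'
)

# Simplified original: "." may cross tag boundaries, so the match starts at DBA.
original = re.compile(r"Data Source=.+?;Initial Catalog=DBB;")
# Suggested fix: "[^<]" stops at the next tag, so only the DBB entry can match.
fixed = re.compile(r"Data Source=[^<]+?;Initial Catalog=DBB;")

print(original.search(xml).group()[:40])  # begins inside the DBA entry
print(fixed.search(xml).group())          # only the DBB entry

# Replace the DBB data source only (NEWSERVER is a hypothetical value).
replaced = fixed.sub("Data Source=NEWSERVER;Initial Catalog=DBB;", xml)
```

Because the regex engine retries from the next Data Source= when the first attempt hits a <, the DBA entry is left untouched by the substitution.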


Python regex (.*?) isn't giving an output [duplicate]

This question already has answers here:
Python regex, matching pattern over multiple lines.. why isn't this working?
(2 answers)
Closed 4 years ago.
I'm making a project and part of it is taking in a python file as a text file and parsing it using regular expressions.
I was able to use this fine (where program is a string containing the code with newlines):
findall(r"def (.*?)\((.*?)\)", program)
But this line just gives None when I expect it to give a Match object where .group() returns "func1(None, None)"
mainblock = search(r'if __name__ == "__main__":(.*?)#END', program)
An abbreviated version of the python file I'm parsing is below:
def func1(stuff, morestuff):
    pass
if __name__ == "__main__":
    func1(None, None)
#END
I've checked for any discrepancies in the regex itself and I can't find any. I also tried copy/pasting it directly from the code file and it still couldn't find a match.
You need to either include the newline characters \n in the regular expression, like this,
r'if __name__ == "__main__":\n(.*?)\n#END'
or enable the re.DOTALL flag, which makes . match line breaks as well.
(re.MULTILINE means something else, which can be counterintuitive: it only changes where ^ and $ match.)
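A small sketch of the DOTALL option, assuming the abbreviated file from the question is held in a string named program with its indentation restored:

```python
import re

# The abbreviated file from the question, with indentation restored.
program = '''def func1(stuff, morestuff):
    pass
if __name__ == "__main__":
    func1(None, None)
#END
'''

pattern = r'if __name__ == "__main__":(.*?)#END'

# Without DOTALL, "." stops at the newline after the colon, so there is no match.
no_flag = re.search(pattern, program)
# With DOTALL, "." also matches newlines and the lazy group spans the block.
with_dotall = re.search(pattern, program, re.DOTALL)

print(no_flag)                         # None
main_body = with_dotall.group(1).strip()
print(main_body)                       # func1(None, None)
```

Note that the text of the block is in group(1); group() with no argument returns the whole match, including the if line and #END.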

Regex with Javascript to return ONLY the string between two strings [duplicate]

This question already has answers here:
Parse XML using JavaScript [duplicate]
(2 answers)
Closed 6 years ago.
I am trying to parse an iTunes XML library. I want to get an array of all of the unique artist names from the XML file. I already tried converting it to JSON, but the way that iTunes stores its library in XML made it incredibly difficult to access all of the artist names in the library. A regex seemed much more effective for this purpose.
The format of the file is like this:
<dict>
<key>Track ID</key><integer>219</integer>
<key>Name</key><string>Something Sweet, Something Tender</string>
<key>Artist</key><string>Eric Dolphy</string>
<key>Album Artist</key><string>Eric Dolphy</string>
<key>Album</key><string>Out to Lunch (Remastered)</string>
<key>Genre</key><string>Jazz</string>
<key>Kind</key><string>Purchased AAC audio file</string>
<key>Size</key><integer>12175953</integer>
<key>Total Time</key><integer>363949</integer>
<key>Disc Number</key><integer>1</integer>
<key>Disc Count</key><integer>1</integer>
<key>Track Number</key><integer>2</integer>
<key>Track Count</key><integer>5</integer>
<key>Year</key><integer>1964</integer>
<key>Date Modified</key><date>2016-04-29T09:36:10Z</date>
<key>Date Added</key><date>2007-08-04T16:57:47Z</date>
<key>Bit Rate</key><integer>256</integer>
<key>Sample Rate</key><integer>44100</integer>
<key>Release Date</key><date>1964-02-25T00:00:00Z</date>
<key>Artwork Count</key><integer>1</integer>
<key>Sort Album</key><string>Out to Lunch</string>
<key>Sort Artist</key><string>Eric Dolphy</string>
<key>Sort Name</key><string>Something Sweet, Something Tender</string>
<key>Persistent ID</key><string>4AE13A27A2113C97</string>
<key>Track Type</key><string>Remote</string>
<key>Purchased</key><true/>
</dict>
I have an xml file, which can contain hundreds of different artists. "Data" is the contents of the xml file of which the above XML example is just one track.
I am using regex and string.match() to match:
<key>Artist</key><string>Eric Dolphy</string>
and return the artist's name. It returns an array of all of the matches, but I only want the artist name not the xml tags. I have found that using string.match() with regex /g in javascript returns an array containing all of the matched substrings, but the capture groups are not returned.
Is there a way in javascript that I can get an array returned of just the artist names without using str.replace() to replace everything I don't want with an empty string afterwards?
let artists = data.toString().match(/<key>Artist<\/key><string>(.*?)<\/string>/g);
let uniqueArtists = Array.from(new Set(artists))
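Without the g flag, match returns an array for the first match only: the first item is the complete match, the second item is your first sub-pattern (between parentheses), and so on. In that case what you're looking for is:
let artist = artists[1];
With the g flag, as in your code, match returns only the complete matches and drops the capture groups, so artists[1] would be the second full match. To get group 1 of every match, use String.prototype.matchAll or call RegExp.prototype.exec in a loop.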

Need a simple reg ex for url checking [duplicate]

This question already has answers here:
What is the best regular expression to check if a string is a valid URL?
(62 answers)
Closed 9 years ago.
I have been looking for about two hours but cannot find what I need.
What I need is very simple:
allow: google.com, http://google.com, https://google.com
disallow spaces: "goo gle.com"
require a valid domain: it should have a dot "." followed by any domain (.com, .net, etc.)
allow anything after that, without spaces: "googl.com/dsfsdf/sdfs/blablahblah/"
Thanks.
Edit:
Thanks all, I had to write it myself.
if (!/^((ftp|http|https):\/\/)?([a-z0-9_\.-]+)\.{1}([a-z0-9_\/\?\=\-\%-]+)$/.test(uri)
|| /([\._\/\?\=\-\%-])\1/.test(uri)) {
}
P.S. I am a noob with regexes.
www.google.com
http://www.google.com
mailto:somebody@google.com
somebody@google.com
www.url-with-querystring.com/?url=has-querystring
The regex below matches all of the above cases:
((([A-Za-z]{3,9}:(?:\/\/)?)(?:[-;:&=\+\$,\w]+@)?[A-Za-z0-9.-]+|(?:www.|[-;:&=\+\$,\w]+@)[A-Za-z0-9.-]+)((?:\/[\+~%\/.\w-_]*)?\??(?:[-\+=&;%@.\w_]*)#?(?:[\w]*))?)
REGEX Explanation can be found here
Working Example
Something that's working for me on a production product (haven't received any complaints yet):
((www\.|(http|https|ftp|news|file)+\:\/\/)?[_.a-z0-9-]+\.[a-z0-9\/_:#=.+?,##%&~-]*[^.|\'|\# |!|\(|?|,| |>|<|;|\)])

white space validation not working [duplicate]

This question already has answers here:
(Django) Trim whitespaces from charField
(6 answers)
Closed 8 years ago.
forms.py
I want to validate whitespace for the fields name1, name2 and name3. I tried this in clean(), where I do my other validation, but only the whitespace validation is not working.
Thanks
Have you thought about using a RegexField, which would only accept the formats (including whitespace) you want?
See RegexField in the docs

Pcrepp - Perl Regular Expression syntax to match host name [duplicate]

This question already has answers here:
Closed 12 years ago.
Possible Duplicate:
The Hostname Regex
I'm trying to use pcrepp (PCRE) to extract the hostname from a URL.
PCRE regular expressions use the same syntax as Perl 5 regular expressions.
For example:
url = "http://www.pandora.com/#/volume/73";
// the match will be "http://www.pandora.com/".
I can't find the correct regex syntax for this example.
It needs to work for any URL: amazon.com/sds/ should return amazon.com,
and abebooks.co.uk/isbn="62345627457245"/blabla/ should return abebooks.co.uk.
I don't need to check whether the URL is valid, just to get the hostname.
Something like this:
^(?:[a-z]+://)?[^/]+/?
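As a rough sketch, here is that pattern tried with Python's re module rather than pcrepp (this particular pattern uses only syntax common to both). The second pattern, with a capture group around the host, is an assumed variation and not part of the answer. It is there because the question asks for amazon.com without the scheme or trailing slash. The helper name hostname is made up for illustration.

```python
import re

# Pattern from the answer above: optional scheme, host, optional trailing slash.
answer_pattern = re.compile(r"^(?:[a-z]+://)?[^/]+/?")
# Assumed variation: capture only the host, excluding the scheme and the slash.
host_pattern = re.compile(r"^(?:[a-z]+://)?([^/]+)")


def hostname(url):
    """Return the host part of url, or None if nothing matches."""
    m = host_pattern.match(url)
    return m.group(1) if m else None


urls = [
    "http://www.pandora.com/#/volume/73",
    "amazon.com/sds/",
    'abebooks.co.uk/isbn="62345627457245"/blabla/',
]

whole = answer_pattern.match(urls[0]).group()
hosts = [hostname(u) for u in urls]

print(whole)  # http://www.pandora.com/
print(hosts)  # ['www.pandora.com', 'amazon.com', 'abebooks.co.uk']
```

Neither pattern validates anything: a port or user info before the first slash would be returned as part of the host, and an uppercase scheme would not be recognised.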
See Regexp::Common::URI::http which uses sub-patterns defined in Regexp::Common::URI::RFC2396. Examining the source code of those modules should give you a good idea how to put together a decent pattern.
Here is one possibility:
^[a-zA-Z0-9\-\.]+\.(com|org|net|mil|edu|COM|ORG|NET|MIL|EDU)$
And another:
^http\://[a-zA-Z0-9\-\.]+\.[a-zA-Z]{2,3}(/\S*)?$
These and other URL related regular expressions can be found here: Regular Expression Library
string regex1, regex2, finalRegex;
regex1 = "^((\\w+):\\/\\/\\/?)?((\\w+):?(\\w+)?@)?([^\\/\\?:]+):?(\\d+)?(\\/?[^\\?#;\\|]+)?([;\\|])?([^\\?#]+)?\\??";
regex2 = "([^#]+)?#?(\\w*)";
// concatenation
finalRegex = regex1 + regex2;
The hostname will be in the sixth capture group.
This was answered in another question I asked: Details.