Get digits between slashes or on the end in URL - regex

I need a regular expression (for Groovy) to match 7 digits between two slashes in a URL, or at the end of the URL. For example:
https://stackoverflow.com/questions/6032324/problem-with-this-reg-expression
I need 6032324, but it should also match:
https://stackoverflow.com/questions/6032324
If it has one digit more or fewer, it should not match.
Maybe it's an easy regex, but I'm not very familiar with this :)
Thanks for your help!

Since you are parsing a URL, it makes sense to use a URL parser to grab the path first and then split it on /. You then have direct access to the slash-separated path parts, which you can test against a very simple \d{7} pattern. Because matches() requires the whole part to match, parts with six or eight digits are rejected. You can collect all matching parts with
def results = new URL(surl).path.split("/").findAll { it.matches(/\d{7}/) }
You may also take the first match:
def results = new URL(surl).path.split("/").findAll { it.matches(/\d{7}/) }.first()
Or last:
def results = new URL(surl).path.split("/").findAll { it.matches(/\d{7}/) }.last()
See the Groovy demo:
def surl = "https://stackoverflow.com/questions/6032324/problem-with-this-reg-expression"
def url = new URL(surl)
final result = url.path.split("/").findAll { it.matches(/\d{7}/) }.first()
print(result) // => 6032324
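
For comparison, the same parse-then-filter idea can be sketched in Python. This is only an illustration under assumptions: the helper name seven_digit_segments and the third test URL (with eight digits) are made up for the example, and the Groovy code above remains the answer to the question as asked.

```python
import re
from urllib.parse import urlsplit

def seven_digit_segments(url):
    # Split only the path component, so host, query string and fragment are ignored.
    segments = urlsplit(url).path.split("/")
    # fullmatch requires the whole segment to be exactly seven ASCII digits.
    return [seg for seg in segments if re.fullmatch(r"[0-9]{7}", seg)]

urls = [
    "https://stackoverflow.com/questions/6032324/problem-with-this-reg-expression",
    "https://stackoverflow.com/questions/6032324",
    "https://stackoverflow.com/questions/60323245",  # hypothetical URL with 8 digits
]
results = [seven_digit_segments(u) for u in urls]
print(results)
```

The eight-digit URL yields an empty list, so code that takes the first element should check for an empty result first.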

Related

Entire text is matched but not able to group in named groups

I have the following example text:
my_app|key1=value1|user_id=testuser|ip_address=10.10.10.10
I want to extract sub-fields from it in the following way:
appName = my_app,
[
{key = key1, value = value1},
{key = user_id, value = testuser},
{key = ip_address, value = 10.10.10.10}
]
I have written the following regex for doing this:
(?<appName>\w+)\|(((?<key>\w+)?(?<equals>=)(?<value>[^\|]+))\|?)+
It matches the entire text but does not group it correctly into the named groups.
I tried testing it on https://regex101.com/
What am I missing here?
I think the main problem is that you are trying to write a regex that matches ALL the key=value pairs at once. That's not the way to do it. The usual approach is to write a pattern that matches ONLY ONE key=value pair and apply it with a function that finds all occurrences of the pattern. Most languages supply such a function. Here's the code in Python, for example:
import re
txt = 'my_app|key1=value1|user_id=testuser|ip_address=10.10.10.10'
pairs = re.findall(r'(\w+)=([^|]+)', txt)
print(pairs)
This gives:
[('key1', 'value1'), ('user_id', 'testuser'), ('ip_address', '10.10.10.10')]
The pattern matches a key consisting of word characters (letters, digits and underscore), (\w+), followed by = and a value. The value is matched by ([^|]+), that is, everything except a vertical bar, because the value can contain non-alphanumeric characters, such as the dots in the IP address.
Mind the findall function. There's a search function to catch a pattern once, and there's a findall function to catch all the patterns within the text.
I tested it on regex101 and it worked.
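
To make the search/findall distinction concrete, here is a minimal sketch that also recovers the app name the question asks for. The pattern used for the app name and the variable names app_name and pairs are assumptions of this example, not part of the answer above.

```python
import re

txt = 'my_app|key1=value1|user_id=testuser|ip_address=10.10.10.10'

# re.search stops at the first match: here, the leading name before the first '|'.
app_name = re.search(r'^\w+(?=\|)', txt).group()

# re.findall returns every non-overlapping match as a (key, value) tuple.
pairs = [{'key': k, 'value': v} for k, v in re.findall(r'(\w+)=([^|]+)', txt)]

print(app_name)
print(pairs)
```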
I must comment, though, that the specific text pattern you are working with doesn't require a regex. All high-level languages supply a split function. You can split on the vertical bar, and then split each piece you get (except the first one) again on the equals sign.
Use the PyPi regex module with the following code:
import regex
s = "my_app|key1=value1|user_id=testuser|ip_address=10.10.10.10"
rx = r"(?<appName>\w+)(?:\|(?<key>\w+)=(?<value>[^|]+))+"
print( [(m.group("appName"), dict(zip(m.captures("key"),m.captures("value")))) for m in regex.finditer(rx, s)] )
# => [('my_app', {'key1': 'value1', 'user_id': 'testuser', 'ip_address': '10.10.10.10'})]
See the Python demo online.
The .captures() method returns all the values captured by a group across all of its iterations.
A regular expression may be unnecessary here; splitting, similar to
data = 'my_app|key1=value1|user_id=testuser|ip_address=10.10.10.10'
x = data.split('|')
appName = []
for index, item in enumerate(x):
    if index > 0:
        element = item.split('=')
        temp = {"key": element[0], "value": element[1]}
        appName.append(temp)
appName = str(x[0] + ',' + str(appName))
print(appName)
might return an output similar to the desired output:
my_app,[{'key': 'key1', 'value': 'value1'}, {'key': 'user_id', 'value': 'testuser'}, {'key': 'ip_address', 'value': '10.10.10.10'}]
using dict:
temp = {"key":element[0],"value":element[1]}
temp can be changed to any other data structure you prefer.

how do you extract a certain part of a URL in python?

I am looking to extract only a portion of a patterned URL:
https://<some_server>/server/rest/services/<that part I need>/<other data I do not need>/...
I just need to extract the portion after 'rest/services/'. The first part can change and the last part can change, but the URL will always contain 'rest/services/', and what I need follows it, followed by '/...'
What you can do is turn this string into a list by splitting it, and then access the nth part.
m = 'https://<some_server>/server/rest/services/<that part I need>/<other data I do not need>'
a = m.split('/')
a[6]  # This is the part that you want
You could try this:
(?<=rest/services/) # look behind for this
[^/]+ # anything except a '/'
import re
rgxp = re.compile(r'(?<=rest/services/)[^/]+')
print(rgxp.findall(text))  # text holds the URL string
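
If the number of path segments before 'rest/services/' can vary, a fixed list index may pick the wrong part. A minimal sketch that anchors on the marker instead, using only string methods, is shown below. The function name service_name and the example.com URL are hypothetical and only serve the illustration.

```python
def service_name(url, marker="rest/services/"):
    # partition returns (before, marker, after); the marker slot is empty if not found.
    _, found, rest = url.partition(marker)
    if not found:
        return None
    # Keep only the text up to the next slash.
    return rest.split("/", 1)[0]

# Hypothetical URL following the pattern described in the question.
example = "https://example.com/server/rest/services/MyService/MapServer/0"
print(service_name(example))
```

Returning None when the marker is missing is a design choice of this sketch; raising an exception would work equally well.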

Using python regex to find repeated values after a header

If I have a string that looks something like:
s = """
...
Random Stuff
...
HEADER
a 1
a 3
# random amount of rows
a 17
RANDOM_NEW_HEADER
a 200
a 300
...
More random stuff
...
"""
Is there a clean way to use regex (in Python) to find all instances of a \d* after HEADER, but before the pattern is broken by SOMETHING_TOTALLY_DIFFERENT? I thought about something like:
import re
pattern = r'HEADER(?:\na \d*)*\na (\d*)'
print(re.findall(pattern, s))
Unfortunately, regex doesn't find overlapping matches. If there's no sensible way to do this with regex, I'm okay with anything faster than writing my own for loop to extract this data.
(TL;DR -- There's a distinct header, followed by a pattern that repeats. I want to catch each instance of that pattern, as long as there isn't a break in the repetition.)
EDIT:
To clarify, I don't necessarily know what SOMETHING_TOTALLY_DIFFERENT will be, only that it won't match a \d+. I want to collect all consecutive instances of \na \d+ that follow HEADER\n.
How about a simple loop?
import re
e = re.compile(r'(a\s+\d+)')
header = 'whatever your header field is'
breaker = 'something_different'
breaker_reached = False
header_reached = False
results = []
with open('yourfile.txt') as f:
    for line in f:
        line = line.rstrip('\n')  # lines read from a file keep their trailing newline
        if line == header:
            # skip processing lines unless we reach the header
            header_reached = True
            continue
        if header_reached:
            i = e.match(line)
            if i and not breaker_reached:
                results.append(i.groups()[0])
            else:
                # There was no match, check if we reached the breaker
                if line == breaker:
                    breaker_reached = True
I'm not completely sure where you want the regex to stop; please clarify.
'((a \d*)\s){1,}'
import re
sentinel_begin = 'HEADER'
sentinel_end = 'SOMETHING_TOTALLY_DIFFERENT'
re.findall(r'(a \d*)', s[s.find(sentinel_begin): s.find(sentinel_end)])
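
When the terminating line is not known in advance, one possible approach is a two-step match: first capture the whole run of consecutive "a <number>" lines that directly follows HEADER, then extract the numbers from that run. The sketch below assumes every data line has the exact form "a <digits>" and ends with a newline; the sample string is a shortened version of the one in the question.

```python
import re

s = """Random Stuff
HEADER
a 1
a 3
a 17
RANDOM_NEW_HEADER
a 200
a 300
More random stuff
"""

# Step 1: capture the run of consecutive 'a <digits>' lines right after HEADER.
# re.MULTILINE makes ^ match at line starts, so RANDOM_NEW_HEADER is not mistaken for HEADER.
block = re.search(r'^HEADER\n((?:a \d+\n)+)', s, flags=re.MULTILINE)

# Step 2: pull the numbers out of the captured run only.
numbers = re.findall(r'a (\d+)', block.group(1)) if block else []
print(numbers)
```

The run ends at the first line that does not fit the pattern, so the values after RANDOM_NEW_HEADER are not collected.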

Having trouble doing a search and replace in Ruby

I’m using Rails 4.2.3 and trying to do a regular expression search and replace. If my variable starts out like so …
url = "http://results.mydomain.com/json/search?eventId=974&subeventId=2320&callback=jQuery18305053194007595733_1464633458265&sEcho=3&iColumns=13&sColumns=&iDisplayStart=1&iDisplayLength=100&mDataProp_0="
and then I run that through
display_start = url.match(/iDisplayStart=(\d+)/).captures[0]
display_start = display_start.to_i + 1000
url = url.gsub(/iDisplayStart=(\d+)/) { display_start }
The result is
http://results.mydomain.com/json/search?eventId=974&subeventId=2320&callback=jQuery18305053194007595733_1464633458265&sEcho=3&iColumns=13&sColumns=&1001&iDisplayLength=100&mDataProp_0=
But what I want is to simply replace the “iDisplayStart” parameter with my new value, so I would like the result to be
http://results.mydomain.com/json/search?eventId=974&subeventId=2320&callback=jQuery18305053194007595733_1464633458265&sEcho=3&iColumns=13&sColumns=&iDisplayStart=1001&iDisplayLength=100&mDataProp_0=
How do I do this?
You can achieve what you want with
url = "http://results.mydomain.com/json/search?eventId=974&subeventId=2320&callback=jQuery18305053194007595733_1464633458265&sEcho=3&iColumns=13&sColumns=&iDisplayStart=1&iDisplayLength=100&mDataProp_0="
display_start = url.sub(/(?<=iDisplayStart=)\d+/) {|m| m.to_i+1000}
puts display_start
See the IDEONE demo
Since you replace only one substring, you do not need gsub; sub will do.
The block receives the whole match, m (that is, the one or more digits located immediately after iDisplayStart=), converts it to an integer, and adds 1000 to it. The block's return value replaces the match.
Another way is to use your regex (or add \b for a safer match) and access the captured value with Regexp.last_match[1] inside the block:
url = "http://results.mydomain.com/json/search?eventId=974&subeventId=2320&callback=jQuery18305053194007595733_1464633458265&sEcho=3&iColumns=13&sColumns=&iDisplayStart=1&iDisplayLength=100&mDataProp_0="
display_start = url.sub(/\biDisplayStart=(\d+)/) {|m| "iDisplayStart=#{Regexp.last_match[1].to_i+1000}" }
puts display_start
See this IDEONE demo
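
For readers more familiar with Python, the same "replace the match with a computed value" idea can be sketched with re.sub and a callback. This is only an analogue of the Ruby block form, not part of the answer; the URL is shortened for the example and the function name bump is made up.

```python
import re

# Shortened version of the URL from the question.
url = "http://results.mydomain.com/json/search?eventId=974&iDisplayStart=1&iDisplayLength=100"

def bump(match):
    # match.group() is the run of digits that follows 'iDisplayStart='.
    return str(int(match.group()) + 1000)

# The lookbehind keeps 'iDisplayStart=' out of the match, so only the digits are replaced.
# count=1 replaces the first occurrence only, like Ruby's sub.
new_url = re.sub(r"(?<=iDisplayStart=)\d+", bump, url, count=1)
print(new_url)
```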

Find and replace between second and third slash

I have URLs in the following formats:
/category1/1rwr23/item
/category2/3werwe4/item
/category3/123wewe23/item
/category4/132werw3/item
/category5/12werw33/item
I would like to replace the second path segment (the ID after the category) with {id} for further processing, for example:
/category1/{id}/item
How do I replace these IDs with {id}? I have spent the last 4 hours on this without reaching a proper solution.
Assuming you'll be running the regex in JavaScript, your regex will be:
/^(\/.*?\/)([^/]+)/gm
and the replacement string should look like $1whatever:
var str = "your url strings ..."
var replStr = 'replacement';
var re = /^(\/.*?\/)([^/]+)/gm;
var result = str.replace(re, '$1'+replStr);
console.log(result);
Based on your input, it should print:
/category1/replacement/item
/category2/replacement/item
/category3/replacement/item
/category4/replacement/item
/category5/replacement/item
See DEMO
We divide it into 3 groups:
1. the part before the replacement
2. the part to be replaced
3. the part after the replacement
yourString.replace(/([^/]*\/[^/]+\/)([^/]+)(\/[^/]+)/g, '$1' + replacement + '$3');
Here is the demo: https://jsfiddle.net/9sL1qj87/
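
The same "keep group 1, replace the second segment" idea can also be sketched in Python, in case the processing happens outside the browser. This is an illustrative analogue rather than part of the JavaScript answers, and it uses only the first three sample paths from the question.

```python
import re

paths = """/category1/1rwr23/item
/category2/3werwe4/item
/category3/123wewe23/item"""

# Group 1 keeps the leading '/categoryN/'; the segment after it is replaced by '{id}'.
# re.MULTILINE lets ^ match at the start of every line.
result = re.sub(r"^(/[^/\n]+/)[^/\n]+", r"\1{id}", paths, flags=re.MULTILINE)
print(result)
```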