Incrementing the last digit in a Python string - regex

I'd like to increment the last digit of a user-provided string in Python 2.7.
I can replace the first digit like this:
import re
def increment_hostname(name):
    try:
        number = re.search(r'\d+', name).group()
    except AttributeError:
        return False
    number = int(number) + 1
    number = str(number)
    return re.sub(r'\d+', number, name)
I can match all the digits with re.findall and then increment the last number in the list, but I'm not sure how to do the replacement:
numbers = re.findall(r'\d+', name)
number = numbers[-1]
number = int(number) + 1
number = str(number)

Use a negative lookahead to check that no digit follows a digit, pass a function as the replacement argument of re.sub(), and increment the digit inside it:
>>> import re
>>> s = "foo 123 bar"
>>> re.sub(r'\d(?!\d)', lambda x: str(int(x.group(0)) + 1), s)
'foo 124 bar'
You may also want to handle 9 in a special way, for example, replace it with 0:
>>> def repl(match):
...     digit = int(match.group(0))
...     return str(digit + 1 if digit != 9 else 0)
...
>>> s = "foo 789 bar"
>>> re.sub(r'\d(?!\d)', repl, s)
'foo 780 bar'
UPD (handling the new example):
>>> import re
>>> s = "f.bar-29.domain.com"
>>> re.sub(r'(\d+)(?!\d)', lambda x: str(int(x.group(0)) + 1), s)
'f.bar-30.domain.com'
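Note that the patterns above increment every number in the string, not only the last one. If only the final run of digits should change, something along these lines may work. It is a minimal sketch: the function name increment_last_number and the choice to preserve zero padding are assumptions, not part of the original answer.

```python
import re

def increment_last_number(name):
    """Increment only the final run of digits in name (hypothetical helper)."""
    def bump(match):
        digits = match.group(0)
        # zfill keeps the original width, e.g. '009' becomes '010' (assumed behaviour)
        return str(int(digits) + 1).zfill(len(digits))
    # \d+(?=\D*$): a run of digits followed only by non-digits up to the end
    return re.sub(r'\d+(?=\D*$)', bump, name)

print(increment_last_number("web1.rack29.domain.com"))
```

A string without any digits is returned unchanged, since re.sub finds nothing to replace.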

Related

Regex match string where symbol is not repeated

I have strings like these:
group items % together into% FALSE
characters % that can match any single TRUE
How can I match sentences where the % symbol is not repeated?
I tried the pattern below, but it matches the first % in any sentence:
[%]{1}
You may use this regex in python to return failure for lines that have more than one % in them:
^(?!([^%]*%){2}).+
(?!([^%]*%){2}) is a negative lookahead that fails the match if % is found twice after line start.
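As a rough illustration of the lookahead in Python (the names AT_MOST_ONE_PERCENT and has_at_most_one_percent are made up for this sketch, and the capturing group is written as a non-capturing one):

```python
import re

# Negative lookahead: reject any string in which '%' can be found twice.
AT_MOST_ONE_PERCENT = re.compile(r'^(?!(?:[^%]*%){2}).+')

def has_at_most_one_percent(sentence):
    # Hypothetical helper: True when the sentence contains zero or one '%'
    return AT_MOST_ONE_PERCENT.match(sentence) is not None

for sentence in ('group items % together into%',
                 'characters % that can match any single'):
    print('{0!r}: {1}'.format(sentence, has_at_most_one_percent(sentence)))
```

Keep in mind that this pattern also accepts a sentence with no % at all; it only rules out a second one.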
You could use re.search as follows:
import re
items = ['group items % together into%', 'characters % that can match any single']
for item in items:
    output = item
    if re.search(r'^.*%.*%.*$', item):
        output = output + ' FALSE'
    else:
        output = output + ' TRUE'
    print(output)
This prints:
group items % together into% FALSE
characters % that can match any single TRUE
Just count them (Python):
>>> s = 'blah % blah %'
>>> s.count('%') == 1
False
>>> s = 'blah % blah'
>>> s.count('%') == 1
True
With regex:
>>> re.match('[^%]*%[^%]*$','gfdg%fdgfgfd%')
>>> re.match('[^%]*%[^%]*$','blah % blah % blah')
>>> re.match('[^%]*%[^%]*$','blah % blah blah')
<re.Match object; span=(0, 16), match='blah % blah blah'>
re.match only matches at the start of the string. If you use re.search, which can match anywhere in the string, anchor the pattern with ^ (start of string):
>>> re.search('^[^%]*%[^%]*$','gfdg%fdgfgfd%')
>>> re.search('^[^%]*%[^%]*$','gfdg%fdgfgfd')
<re.Match object; span=(0, 12), match='gfdg%fdgfgfd'>
I am assuming that "sentence" in your question is the same as a line in the input text. With that assumption, you can use the following:
^[^%\r\n]*(%[^%\r\n]*)?$
This, along with the multi-line and global flags, will match all lines in the input string that contain 0 or 1 '%' symbols.
^ matches the start of a line
[^%\r\n]* matches 0 or more characters that are not '%' or a new line
(...)? matches 0 or 1 instance of the contents in parentheses
% matches '%' literally
$ matches the end of a line
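In Python there is no global flag; a sketch of the same idea could use re.MULTILINE together with findall. The sample text and the name LINE_PATTERN are assumptions for illustration, and the group is made non-capturing so that findall returns whole lines:

```python
import re

# Same pattern as above, with a non-capturing group so findall yields full matches
LINE_PATTERN = re.compile(r'^[^%\r\n]*(?:%[^%\r\n]*)?$', re.MULTILINE)

text = ("group items % together into%\n"
        "characters % that can match any single\n"
        "no percent sign here")

# Lines with zero or one '%' are kept; the first line has two and is skipped
matching_lines = LINE_PATTERN.findall(text)
print(matching_lines)
```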

Python 2.7 RE Search by condition

I have a problem using re.search.
For example:
a = '<span class="chapternum">1 </span>abc,def.</span>'
How can I search for the number '1'?
Or, more generally, how do I match digits that come right after ">" and are followed by whitespace?
I tried:
test = re.search('(^>)(\d+)(\s$)', a)
print test
>> None
It fails to find the number "1".
^ and $ indicate the beginning and the end of the string. If you get rid of them you have your answer:
>>> test = re.search(r'(>)(\d+)(\s)', a)
>>> test.groups()
('>', '1', ' ')
Not sure that you need the first and last groups though (groups are captured with parentheses):
>>> a = '<span class="chapternum">23 </span>abc,def.</span>'
>>> test = re.search(r'>(\d+)\s', a)
>>> test.group(1)
'23'
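Because re.search returns None when nothing matches, calling .group() directly on the result can raise AttributeError. A small wrapper along these lines guards against that; the function name chapter_number and the conversion to int are assumptions for this sketch:

```python
import re

def chapter_number(html):
    """Return the first number that follows '>' and precedes whitespace, or None."""
    match = re.search(r'>(\d+)\s', html)
    # re.search returns None when there is no match, so check before using it
    return int(match.group(1)) if match else None

a = '<span class="chapternum">1 </span>abc,def.</span>'
print(chapter_number(a))
```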

Extracting Numbers from a String Without Regular Expressions

I am trying to extract all the numbers from a string composed of digits, symbols and letters.
Multi-digit numbers have to be extracted whole: for example, from "shsgd89shs2011%%5swts" I have to pull out the numbers as they appear (89, 2011 and 5).
So far, my code loops through the string and returns the digits accumulated so far, which I like, but I cannot figure out how to make it stop after finishing one set of digits:
def StringThings(strng):
    nums = []
    number = ""
    for each in range(len(strng)):
        if strng[each].isdigit():
            number += strng[each]
        else:
            continue
        nums.append(number)
    return nums
Running it on the value "6wtwyw66hgsgs" returns ['6', '66', '666'].
What simple way is there of breaking out of the loop once I have gotten what I needed?
Using your function, just use a temp variable to concatenate each sequence of digits, yielding the group each time you encounter a non-digit, provided the temp variable is not an empty string:
def string_things(strng):
    temp = ""
    for ele in strng:
        if ele.isdigit():
            temp += ele
        elif temp:  # if we have a sequence
            yield temp
            temp = ""  # reset temp
    if temp:  # catch ending sequence
        yield temp
Output
In [9]: s = "shsgd89shs2011%%5swts"
In [10]: list(string_things(s))
Out[10]: ['89', '2011', '5']
In [11]: s ="67gobbledegook95"
In [12]: list(string_things(s))
Out[12]: ['67', '95']
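Or you could translate the string, replacing letters and punctuation with spaces, and then split it (this is Python 2 code; in Python 3, maketrans is str.maketrans rather than a function in the string module):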
from string import ascii_letters, punctuation, maketrans
s = "shsgd89shs2011%%5swts"
replace = ascii_letters+punctuation
tbl = maketrans(replace," " * len(replace))
print(s.translate(tbl).split())
['89', '2011', '5']
from itertools import groupby
L2 = []
file_Name1 = 'shsgd89shs2011%%5swts'
for k, g in groupby(file_Name1, str.isdigit):
    a = list(g)
    if k:  # k is True for a run of digits
        L2.append("".join(a))
print(L2)
Result ['89', '2011', '5']
Updated to account for trailing numbers:
def StringThings(strng):
    nums = []
    number = ""
    for each in range(len(strng)):
        if strng[each].isdigit():
            number += strng[each]
            # a number that runs to the end of the string
            if each == len(strng) - 1:
                if number != '':
                    nums.append(number)
        if each != 0:
            if not strng[each].isdigit():
                # a digit run has just ended: store it and reset
                if strng[each - 1].isdigit():
                    nums.append(number)
                    number = ""
    return nums
print(StringThings("shsgd89shs2011%%5swts34"))
# returns ['89', '2011', '5', '34']
So, when we reach a character which is not a number, and if the previously observed character was a number, append the contents of number to nums and then simply empty our temporary container number, to avoid it containing all the old stuff.
Note, I don't know Python so the solution may not be very pythonic.
Alternatively, save yourself all the work and just do:
import re
print(re.findall(r'\d+', 'shsgd89shs2011%%5swts'))

Removing punctuation, then counting the number of occurrences of every word using Python

Hello everybody, I am new to Python and need to write a program that removes punctuation and then counts the number of occurrences of each word in a string. So far I have this:
import sys
import string
def removepun(txt):
    for punct in string.punctuation:
        txt = txt.replace(punct, "")
    print(txt)
    mywords = {}
    for i in range(len(txt)):
        item = txt[i]
        count = txt.count(item)
        mywords[item] = count
    return sorted(mywords.items(), key=lambda item: item[1], reverse=True)
The problem is that it returns letters and their counts, not words as I hoped. Can you help me with this?
How about this?
>>> import string
>>> from collections import Counter
>>> s = 'One, two; three! four: five. six##$,.!'
>>> occurrence = Counter(s.translate(None, string.punctuation).split())
>>> print occurrence
Counter({'six': 1, 'three': 1, 'two': 1, 'four': 1, 'five': 1, 'One': 1})
After removing the punctuation:
numberOfWords = len(txt.split(" "))
This assumes exactly one space between words.
EDIT:
a = {}
for w in txt.split(" "):
    if w in a:
        a[w] += 1
    else:
        a[w] = 1
How it works:
a is initialized as an empty dict
the words in txt are iterated over
if there is already an entry a[w], one is added to it
if there is no entry, one is created, initialized to 1
The output is the same as Haidro's excellent answer: a dict with the words as keys and the count of each word as values.
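The s.translate(None, string.punctuation) call shown earlier is Python 2 only. For readers on Python 3, a rough equivalent might look like the following; the function name word_counts and the sample sentence are assumptions for this sketch:

```python
import string
from collections import Counter

def word_counts(text):
    """Strip ASCII punctuation, split on whitespace, and count each word."""
    # Python 3: build a table that deletes every punctuation character
    table = str.maketrans('', '', string.punctuation)
    return Counter(text.translate(table).split())

counts = word_counts('One, two; three! two: One. six##$,.!')
print(counts.most_common())
```

Splitting with split() and no argument treats any run of whitespace as one separator, so it does not depend on there being exactly one space between words.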

Using a Regex Back-reference In a Repetition Construct ({N})

I need to match a string that is prefixed with an acceptable length for that string.
For example, {3}abc would match, because the abc part is 3 characters long. {3}abcd would fail because abcd is not 3 characters long.
I would use ^\{(\d+)\}.{\1}$ (capture a number N inside curly braces, then any character N times) but it appears that the value in the repetition construct has to be a number (or at least, it won’t accept a backreference).
For example, in JavaScript this returns true:
/^\{(\d+)\}.{3}$/.test("{3}abc")
While this returns false:
/^\{(\d+)\}.{\1}$/.test("{3}abc")
Is this possible to do in a single regex, or would I need to resort to splitting it into two stages like:
/^\{(\d+)\}/.test("{3}abc") && RegExp("^\\{" + RegExp.$1 + "\\}.{" + RegExp.$1 + "}$").test("{3}abc")
Regular expressions can't calculate, so you can't do this with a regex only.
You could match the string to /^\{(\d+)\}(.*)$/, then check whether len($2)==int($1).
In Python, for example:
>>> import re
>>> t1 = "{3}abc"
>>> t2 = "{3}abcd"
>>> r = re.compile(r"^\{(\d+)\}(.*)$")
>>> m1 = r.match(t1)
>>> m2 = r.match(t2)
>>> len(m1.group(2)) == int(m1.group(1))
True
>>> len(m2.group(2)) == int(m2.group(1))
False
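The interactive session above raises AttributeError if the string has no {N} prefix, because match() then returns None. One way to wrap the two-stage check is sketched below; the function name matches_declared_length and the use of re.DOTALL (so that newlines in the body are counted as characters) are assumptions, not part of the original answer:

```python
import re

# Stage 1: capture the declared length and the rest of the string.
# re.DOTALL is an assumption so that newlines count as body characters.
PREFIXED = re.compile(r'^\{(\d+)\}(.*)$', re.DOTALL)

def matches_declared_length(text):
    match = PREFIXED.match(text)
    if match is None:
        return False  # no {N} prefix at all
    # Stage 2: compare the declared length with the actual body length
    return len(match.group(2)) == int(match.group(1))

print(matches_declared_length("{3}abc"))
print(matches_declared_length("{3}abcd"))
```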