Manipulating strings in Python 2.7

I am trying to write a program that inserts specific numbers before parts of an input. For example, given the input "171819-202122-232425", I would like to split the number into two-digit pieces, using the dash as a delimiter between groups. I have split up the number using list(str(input)) but have no idea how to insert the appropriate numbers. It has to work for any number. Thanks for the help.
Output =
(number)17
(number)18
(number)19
(number+1)20
(number+1)21
(number+1)22
(number+2)23
(number+2)24
(number+2)25

You could use split and regexps to dig out lists of your numbers:
Code
import re
mynum = "171819-202122-232425"
start_number = 5
groups = mynum.split('-')  # list of number groups separated by "-"
number_of_groups = xrange(start_number, start_number + len(groups))
for (i, number_group) in zip(number_of_groups, groups):
    numbers = re.findall(r"\d{2}", number_group)  # return list of two-digit numbers
    for x in numbers:
        print "(%s)%s" % (i, x)
Result
(5)17
(5)18
(5)19
(6)20
(6)21
(6)22
(7)23
(7)24
(7)25

Try this:
Code:
mInput = "171819-202122-232425"
number = 9  # Just an example
result = ""
i = 0
for n in mInput:
    if n == '-':  # To handle dash case
        number += 1
        continue
    i += 1
    if i % 2 == 1:  # Each two digits
        result += "\n(" + str(number) + ")"
    result += n  # Add current digit
print result
Output:
(9)17
(9)18
(9)19
(10)20
(10)21
(10)22
(11)23
(11)24
(11)25
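
Both answers above target Python 2.7. For comparison, here is a minimal sketch of the same idea as a reusable function in Python 3 syntax. The name label_pairs and the width and sep parameters are made up for illustration, and it assumes every group splits evenly into pieces of the given width.

```python
def label_pairs(text, start, width=2, sep="-"):
    """Return one "(label)piece" string per fixed-width piece of each group.

    The label starts at `start` and goes up by one for each `sep`-separated group.
    (Hypothetical helper, not taken from either answer.)
    """
    lines = []
    for offset, group in enumerate(text.split(sep)):
        # Walk through the group in steps of `width` characters.
        for k in range(0, len(group), width):
            lines.append("(%d)%s" % (start + offset, group[k:k + width]))
    return lines


for line in label_pairs("171819-202122-232425", 5):
    print(line)
```

Returning a list rather than printing directly makes it easy to join, write to a file, or check the result in a test.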

Related

Separate Numbers and Letters & Rearrange the String: Variable Letters Has an Incorrect Value

Write the function Separator that does the following:
Input: inString -- a random string scalar mixing numbers and letters
Tasks:
Separate numbers and letters
Calculate the sum of the numbers
Count the number of letters (white spaces are counted)
Output:
numbers, string scalar of all numbers in inString, in the same order.
letters, string scalar of all letters in inString, in the same order; white spaces are not removed.
sumofNumbers, double precision scalar of the sum of all numbers in inString (summed digit by digit).
numberofLetters, double precision scalar counting all letters in inString, white spaces included.
Expected Result:
inString: "eng12in13e143e553rin154g 6p547ro548bl645em 8s65ol9v56ing"
numbers: "12131435531546547548645865956"
letters: "engineering problem solving"
sumofNumbers: 131
numberofLetters: 27
Here's My Code:
function [numbers, letters, sumofNumbers, numberofLetters]=Separator(inString)
    %Insert your code here
    indexNum = regexp(char(inString), '[0-9]')
    numbers = []
    for i = 1:numel(indexNum)
        numbers = [numbers , inString{1}(indexNum(i))]
    end
    % numbers
    numbers = string(numbers)
    % sumofNumbers
    sumofNumbers = 0
    for i = 1:numel(indexNum)
        sumofNumbers = sumofNumbers + str2num(numbers{1}(i))
    end
    words = split(inString)
    letters = []
    count = 0
    for i = 1:numel(words)
        indexLett = regexp(char(words(i)), '[a-z]')
        count = count + numel(indexLett)
        for j = 1:numel(indexLett)
            letters = [letters, words{i}(indexLett(j))]
        end
        letters = strcat(string(letters), " ")
        letters = char(letters)
    end
    % letters
    letters = strip(string(letters))
    comb = split(letters)
    letters = join(comb)
    % number of Literal Letters
    numofTrueLetters = count
    % numberofLetters
    numberofLetters = 0
    numberofLetters = strlength(letters)
end
The code returns exactly the expected result:
numbers =
"12131435531546547548645865956"
letters =
"engineering problem solving"
sumofNumbers =
131
numberofLetters =
27
However, MATLAB Grader gives this answer: "Variable Letters Has an Incorrect Value", and I am confused.
I would greatly appreciate it if someone could point out the mistake. Thank you!
Alright, I got this!
I changed the character class in this line from "[a-z]" to "[A-Za-z]", which matches both upper-case and lower-case letters:
indexLett = regexp(char(words(i)), '[A-Za-z]')
This way, indexLett records the index of every alphabetic character in the string regardless of case,
so inputs with capitalized words, such as
"Apple is good for health"
or
"A42pp3113le31 is 31g11oo456d fo442r h105ea422l44t2h"
are handled correctly: the capital letters are kept instead of being dropped.
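
For readers who want to cross-check the expected values outside MATLAB, here is a rough Python 3 sketch of the same separation. It is not a translation of the code above: the function name separator is hypothetical, only ASCII letters and digits are considered, and spaces are kept exactly as they appear rather than stripped and re-joined.

```python
import string


def separator(in_string):
    """Split a mixed string into its digits and its letters (spaces kept).

    Hypothetical helper for checking the expected results; ASCII only.
    """
    # Digits in their original order.
    numbers = "".join(ch for ch in in_string if ch in string.digits)
    # Letters of either case, plus spaces, in their original order.
    letters = "".join(
        ch for ch in in_string if ch in string.ascii_letters or ch == " "
    )
    # The exercise sums the digits one by one.
    sum_of_numbers = sum(int(ch) for ch in numbers)
    # Spaces count towards the length, as in the exercise.
    number_of_letters = len(letters)
    return numbers, letters, sum_of_numbers, number_of_letters


print(separator("eng12in13e143e553rin154g 6p547ro548bl645em 8s65ol9v56ing"))
```

Because string.ascii_letters covers both cases, an input such as "A42pp3113le31 is 31g11oo456d fo442r h105ea422l44t2h" keeps its capital "A".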

regex: Match strings of numbers up to permutation of ciphers

I am trying to find a regex such that, for instance, the following strings all match the same expression:
"1116.67711..44."
"2224.43322..88."
"9993.35599..22."
"7779.91177..55."
I.e. formally "x1x1x1x2.x2x3x3x1x1..x4x4." where xi ≠ xj if i ≠ j, and where each xi is some digit from 1 to 9 inclusive.
Or (another example), the following strings match the same expression, but not the same expression as before:
"94..44.773399.4"
"25..55.886622.5"
"73..33.992277.3"
I.e. formally "x1x2..x2x2.x3x3x4x4x1x1.x2" where xi ≠ xj if i ≠ j, and where each xi is some digit from 1 to 9 inclusive.
That is, two strings should be considered equal if they have the same form, with the digits permuted so that they remain pairwise distinct.
The dots stand for a gap in the sequence. A gap can be any character that is not a single digit, and two "equal" strings must have their gaps in the same places. If it helps, the strings all have the same length of 81 (the examples above are shortened to 15 characters so they are easier to read).
That is, if I have some string as above, e.g. "3566.235.225..45", I want a regular expression that I can apply to a database to find out whether such a string already exists.
Is it possible to do this?
The answer is fairly straightforward:
import re
pattern = re.compile(r'^(\d)\1{3}$')
print(pattern.match('1234'))
print(pattern.match('333'))
print(pattern.match('3333'))
print(pattern.match('33333'))
You capture what you need once, then tell the regex engine how often you need to repeat it. You can refer back to it as often as you like, for example for a pattern that would match 11.222.1 you'd use ^(\d)\1{1}\.(\d)\2{2}\.(\1){1}$.
Note that the {1} in there is superfluous, but it shows that the pattern can be very regular. So much so, that it's actually easy to write a function that solves the problem for you:
def make_pattern(grouping, separators='.'):
    regex_chars = '.\\*+[](){}^$?!:'
    groups = {}
    i = 0
    j = 0
    last_group = 0
    result = '^'
    while i < len(grouping):
        if grouping[i] in separators:
            if grouping[i] in regex_chars:
                result += '\\'
            result += grouping[i]
            i += 1
        else:
            while i < len(grouping) and grouping[i] == grouping[j]:
                i += 1
            if grouping[j] in groups:
                group = groups[grouping[j]]
            else:
                last_group += 1
                groups[grouping[j]] = last_group
                group = last_group
                result += '(.)'
                j += 1
            result += f'\\{group}{{{i-j}}}'
        j = i
    return re.compile(result+'$')
print(make_pattern('111.222.11').match('aaa.bbb.aa'))
So, you can give make_pattern a good example of the pattern and it will return the compiled regex for you. If you'd like other separators than '.', you can just pass those in as well:
my_pattern = make_pattern('11,222,11', separators=',')
print(my_pattern.match('aa,bbb,aa'))
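
One caveat: backreferences force repeated positions to be equal, but on their own they do not force different groups to hold different characters, so a pattern built this way would also accept a string in which two groups happen to share a digit. If the pairwise-distinct requirement from the question matters, one alternative that avoids regex altogether is to normalize each string to a canonical form and compare (or store) that instead. A rough sketch, where canonical_form is a hypothetical helper and every character other than 1 to 9 is assumed to be a gap:

```python
def canonical_form(s):
    """Relabel digits 1-9 by order of first appearance; other chars become '.'.

    Two strings have the same shape (same gaps, same repetition structure,
    distinct digits staying distinct) exactly when their canonical forms match.
    Hypothetical helper; treating everything except 1-9 as a gap is an assumption.
    """
    mapping = {}
    out = []
    for ch in s:
        if ch in "123456789":
            if ch not in mapping:
                # First time this digit is seen: give it the next label.
                mapping[ch] = str(len(mapping) + 1)
            out.append(mapping[ch])
        else:
            out.append(".")
    return "".join(out)


print(canonical_form("1116.67711..44."))
print(canonical_form("2224.43322..88."))
print(canonical_form("94..44.773399.4"))
```

Storing the canonical form in its own database column would turn the lookup into a plain equality test, at the cost of computing it once per row.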

The words average from a File

I have this question: Write a program that will calculate the average word length of a text stored in a file (i.e. the sum of all the lengths of the word tokens in the text, divided by the number of word tokens).
my code:
allword = 0
words = 0
average = 0
with open('/home/......', 'r') as f:
    for i in f:
        me = i.split()
        allword += len(me)
        words += len(i)
        average += allword / float(words)
print average
The file has 4 lines and 55 characters, not counting whitespace. I get an average of 27.54, and I don't think that result is right.
Can anybody tell me, in simple words, where the problem is?
Thanks a lot!
@mustaccio
Maybe 27.54 is too high. Here is the code with a small change:
allword = 0
words = 0
average = 0
with open('/home/....', 'r') as f:
    for i in f:
        me = "".join(i.split(" "))
        allword += len(me)
        words += len(i)
        average += allword / float(words)
print average
Now I get 4.32.
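
Neither version above computes the quantity the exercise asks for: the first divides a word count by a character count, and the second divides the non-space characters by all characters. A minimal sketch of the definition from the exercise (sum of token lengths divided by number of tokens), written in Python 3 syntax. The name average_word_length is made up, and tokens are assumed to be whatever str.split() separates on whitespace, so punctuation attached to a word counts towards its length.

```python
def average_word_length(lines):
    """Average length of whitespace-separated tokens over an iterable of lines.

    Hypothetical helper: pass an open file object or any list of strings.
    """
    total_chars = 0
    total_words = 0
    for line in lines:
        tokens = line.split()  # splits on any whitespace, drops the newline
        total_words += len(tokens)
        total_chars += sum(len(token) for token in tokens)
    # Divide once at the end; guard against an empty text.
    return total_chars / total_words if total_words else 0.0


print(average_word_length(["the quick brown\n", "fox\n"]))
```

With a real file, the call would look like average_word_length(f) inside a with open(...) as f: block, since iterating over a file object yields its lines.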

Pad integer after hyphen

How do you pad a number 12345-9 to display as 12345-09? I tried split and replace but they don't work on integers. If I convert it to a string, it gets rid of the numbers after the hyphen.
As Adam said, split on the hyphen, pad the number, and then rejoin.
s = "12345-9"
sp = s.split("-")
sp[1] = "%02d" % int(sp[1])
s = "-".join(sp)
print s
>>> s = '12345-9'
>>> '%s-%02i' % tuple(int(v) for v in s.split('-'))
'12345-09'

VBA code for extracting 3 specific number patterns

I am working in Excel and need VBA code to extract 3 specific number patterns. In column A I have several rows of strings which include alphabetical characters, numbers, and punctuation. I need to remove all characters except those found in a 13-digit number (containing only digits), a 10-digit number (containing only digits), or a 9-digit number immediately followed by an "x" character. These are ISBNs.
The remaining characters should be separated by one, and only one, space. So, for the following string found in A1: "There are several books here, including 0192145789 and 9781245687456. Also, the book with isbn 045789541x is included. This book is one of 100000000 copies."
The output should be: 0192145789 9781245687456 045789541x
Note that the number 100000000 should not be included in the output because it does not match any of the three patterns mentioned above.
I'm not opposed to an Excel formula solution instead of VBA, but I assumed that VBA would be cleaner. Thanks in advance.
Here's a VBA function that will do specifically what you've specified
Function ExtractNumbers(inputStr As String) As String
    Dim outputStr As String
    Dim bNumDetected As Boolean
    Dim numCount As Long
    Dim numStart As Long
    Dim i As Long
    numCount = 0
    bNumDetected = False
    For i = 1 To Len(inputStr)
        If IsNumeric(Mid(inputStr, i, 1)) Then
            numCount = numCount + 1
            If Not bNumDetected Then
                bNumDetected = True
                numStart = i ' remember where this run of digits starts
            End If
            If (numCount = 9 And Mid(inputStr, i + 1, 1) = "x") Or _
               (numCount = 13 And Not IsNumeric(Mid(inputStr, i + 1, 1))) Or _
               (numCount = 10 And Not IsNumeric(Mid(inputStr, i + 1, 1))) Then
                If numCount = 9 Then
                    outputStr = outputStr & Mid(inputStr, numStart, numCount) & "x "
                Else
                    outputStr = outputStr & Mid(inputStr, numStart, numCount) & " "
                End If
            End If
        Else
            ' a non-digit ends the current run
            numCount = 0
            bNumDetected = False
        End If
    Next i
    ExtractNumbers = Trim(outputStr)
End Function
It's nothing fancy: it just uses string functions to go through your string one character at a time, looking for 9-digit numbers ending with x, 10-digit numbers, and 13-digit numbers, and extracts them into a new string.
It's a UDF, so you can use it as a formula in your workbook, e.g. =ExtractNumbers(A1).
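
For comparison, the same three rules can be written as a single regular expression. The sketch below uses Python rather than VBA, purely as an illustration: the names ISBN_LIKE and extract_numbers are made up, and like the function above it only accepts a lowercase "x".

```python
import re

# 13 digits, 10 digits, or 9 digits followed by "x".
# The lookbehind and lookahead stop a match from starting or ending
# in the middle of a longer run of digits.
ISBN_LIKE = re.compile(r"(?<!\d)(?:\d{13}|\d{10}|\d{9}x)(?!\d)")


def extract_numbers(text):
    """Return the matching numbers joined by single spaces (hypothetical helper)."""
    return " ".join(ISBN_LIKE.findall(text))


sample = ("There are several books here, including 0192145789 and "
          "9781245687456. Also, the book with isbn 045789541x is included. "
          "This book is one of 100000000 copies.")
print(extract_numbers(sample))
```

The 9-digit number 100000000 is skipped because it is not followed by "x", which matches the requirement in the question.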