Extract numbers from a string while maintaining the whitespaces - regex

I have some string like this
' 12 2 89 29 11 92 92 10'
(all the numbers are positive integers so no - and no .), and I want to extract all numbers from it, edit some of the numbers, and then put them all together with the same whitespaces. For example, if I change the number 11 to 22, I want the final string as
' 12 2 89 29 22 92 92 10'
I searched, but most questions disregard the whitespace and only care about the numbers. I tried
match = re.match(r'(\s*(\d+)){8}', str)
but match.group(0) gives me the whole string, match.group(1) gives me the first match \ 12 (I added the \ because otherwise the website won't show the leading whitespace), and match.group(2) gives me 12. It won't give me any numbers after that: any index higher than 2 gives me an error. I don't think my approach is the correct one. What is the right way to do this?
I just tried re.split('(\d+)', str) and that seems to be what I need.
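A minimal sketch of that re.split idea, assuming the sample string from the question. Because the pattern has a capturing group, the numbers are kept in the result, so they land at the odd indexes and the whitespace at the even ones:

```python
import re

text = ' 12 2 89 29 11 92 92 10'

# The capturing group makes re.split keep the numbers in the result list.
parts = re.split(r'(\d+)', text)

# Odd indexes hold the numbers; even indexes hold the whitespace between them.
parts[1::2] = ['22' if p == '11' else p for p in parts[1::2]]

result = ''.join(parts)
print(repr(result))
```

Joining the parts again restores every original whitespace run, since nothing was discarded by the split.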

I'd recommend using a regular expression with non-capturing groups, to get a list of 'space' parts and 'number' parts:
In [15]: text = ' 12 2 89 29 11 92 92 10'
In [16]: parts = re.findall('((?: +)|(?:[0-9]+))', text)
In [17]: parts
Out[17]: [' ', '12', ' ', '2', ' ', '89', ' ', '29', ' ',
'11', ' ', '92', ' ', '92', ' ', '10']
Then you can do:
for index, part in enumerate(parts):
    if part == '11':
        parts[index] = '22'
replaced = ''.join(parts)
(or whatever match and replacement you want to do).
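As an alternative sketch (not the approach above, but the same goal), re.sub with a replacement function leaves everything outside the matched digits untouched, so no list of parts is needed. The helper name replace_number is made up for this example:

```python
import re

def replace_number(text, old, new):
    """Replace every whole number equal to `old`; whitespace is never touched."""
    def swap(match):
        # match.group(0) is one complete run of digits
        return new if match.group(0) == old else match.group(0)
    return re.sub(r'\d+', swap, text)

text = ' 12 2 89 29 11 92 92 10'
replaced = replace_number(text, '11', '22')
print(repr(replaced))
```

Because \d+ always matches a complete run of digits, replacing '2' would not affect '12' or '29'.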

Match each number together with its leading whitespace, change the desired number, and join the list back into a string.
import re
newNum = '125'
text = ' 12 2 89 29 11 92 92 10'
# marray[6] below is the seventh number, the second 92
marray = re.findall(r'\s+\d+', text)
marray[6] = re.sub(r'\d+', newNum, marray[6])
print(marray)
[' 12', ' 2', ' 89', ' 29', ' 11', ' 92', ' 125', ' 10']
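A hedged sketch of the same position-based replacement that also returns the final string. It counts matches inside re.sub instead of building a list; replace_nth_number is a hypothetical helper name:

```python
import re
from itertools import count

def replace_nth_number(text, n, new):
    """Replace the n-th number (0-based) in text, keeping all whitespace."""
    counter = count()
    def swap(match):
        # next(counter) yields 0, 1, 2, ... once per number found
        return new if next(counter) == n else match.group(0)
    return re.sub(r'\d+', swap, text)

text = ' 12 2 89 29 11 92 92 10'
result = replace_nth_number(text, 6, '125')
print(repr(result))
```

This variant also keeps trailing whitespace, which a findall on \s+\d+ followed by a join would drop.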

list() in a single line, can't introduce /newlines

As a project, I'm coding a web scraper for a site with statistics on certain monsters from a game. The problem is that when I append the data to a list, it gets printed as one very long line.
I already tried .append(clean_data.getText().replace('\n', "\\n")).
Something to take into account is that if I don't use .getText(), I append a lot of [td] and [tr] tags to the list and it gets very messy.
I think the problem is that the text I'm getting is treated as plain text, so when I replace \n with \\n, it is inserted literally as \\n and is not recognized as a newline.
My code:
import requests
import pandas as pd
from bs4 import BeautifulSoup
import csv
url = 'https://guildstats.eu/monsters?world=Yonabra'
page = requests.get(url)
soup = BeautifulSoup(page.content, 'html.parser')
monsters = ('adult goannas', 'young goannas', 'manticores', 'feral sphinxes', 'ogre ruffians', 'ogre rowdies', 'ogre sages', 'dogs')
finding_td = soup.find_all('td', string=monsters)
list_of_monsters = []
for looking_for_parent in finding_td:
    parent_tr = looking_for_parent.find_parents('tr')
    for clean_data in parent_tr:
        list_of_monsters.append(clean_data.getText().replace('\n', " "))
print(list_of_monsters)
It gives the following output:
[' 7 adult goannas 2020-05-28 1519 0 736893 133 ', ' 222 dogs 2020-05-27 143 0 40043 0 ', ' 298 feral sphinxes 2020-05-28 1158 1 480598 152 ', ' 498 manticores 2020-05-28 961 1 299491 68 ', ' 581 ogre rowdies 2020-05-28 306 0 188324 13 ', ' 582 ogre ruffians 2020-05-29 217 0 121964 7 ', ' 583 ogre sages 2020-05-28 156 0 63489 8 ', ' 911 young goannas 2020-05-28 1880 0 972217 74 ']
I want it to look more like this:
[' 7 adult goannas 2020-05-28 1519 0 736893 133 '
' 222 dogs 2020-05-27 143 0 40043 0 '
' 298 feral sphinxes 2020-05-28 1158 1 480598 152 '
' 498 manticores 2020-05-28 961 1 299491 68 '
' 581 ogre rowdies 2020-05-28 306 0 188324 13 '
' 582 ogre ruffians 2020-05-29 217 0 121964 7 '
' 583 ogre sages 2020-05-28 156 0 63489 8 '
' 911 young goannas 2020-05-28 1880 0 972217 74 ']
What you want is to change the delimiter used when the list is printed: instead of a comma, you want a new line. As @QHarr mentioned, you can use Python's pprint to print the results in a more readable format.
Try:
import requests
import pandas as pd
from bs4 import BeautifulSoup
import csv
from pprint import pprint
url = 'https://guildstats.eu/monsters?world=Yonabra'
page = requests.get(url)
soup = BeautifulSoup(page.content, 'html.parser')
monsters = ('adult goannas', 'young goannas', 'manticores', 'feral sphinxes', 'ogre ruffians', 'ogre rowdies', 'ogre sages', 'dogs')
finding_td = soup.find_all('td', string=monsters)
list_of_monsters = []
for looking_for_parent in finding_td:
    parent_tr = looking_for_parent.find_parents('tr')
    for clean_data in parent_tr:
        list_of_monsters.append(clean_data.getText().replace("\n", " "))
pprint(list_of_monsters)
This gives:
[' 7 adult goannas 2020-05-28 1519 0 736893 133 ',
' 222 dogs 2020-05-27 143 0 40043 0 ',
' 298 feral sphinxes 2020-05-28 1158 1 480598 152 ',
' 498 manticores 2020-05-28 961 1 299491 68 ',
' 581 ogre rowdies 2020-05-28 306 0 188324 13 ',
' 582 ogre ruffians 2020-05-29 217 0 121964 7 ',
' 583 ogre sages 2020-05-28 156 0 63489 8 ',
' 911 young goannas 2020-05-28 1880 0 972217 74 ']
The \n characters you obtained are already newline characters, so there is no need to add an extra escape character in Python. As you have found, replace("\n", " ") already gives the replacement you want. Also, because you are printing a list, Python shows the repr of each element, so a newline inside an element would still be displayed as \n rather than as a line break. pprint does not modify the original list; it only prints it in a more readable format.
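If the brackets and commas are not wanted at all, joining the elements with a newline is another option. This is a small sketch that uses two rows copied from the output above as sample data, so it runs without the network request:

```python
# Sample rows copied from the output above (assumed data, no scraping here).
list_of_monsters = [
    ' 7 adult goannas 2020-05-28 1519 0 736893 133 ',
    ' 222 dogs 2020-05-27 143 0 40043 0 ',
]

# One element per line, with quotes kept so the surrounding spaces stay visible.
formatted = '\n'.join(repr(row) for row in list_of_monsters)
print(formatted)

# The same rows without quotes or surrounding spaces.
stripped = '\n'.join(row.strip() for row in list_of_monsters)
print(stripped)
```

The list itself is unchanged in both cases; only the printed representation differs.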

Try to split a string with particular regex expression

I'm trying to split a string using two separators and a regex. My string is, for example,
"test 10 20 middle 30 - 40 mm"
and I would like to split it into ["test 10", "20 middle 30", "40 mm"], that is, splitting on (and dropping) ' - ' and the space between two numbers.
I tried
result = re.split(r'[\d+] [\d+]', s)
> ['test 1', '0 middle 30 - 40 mm']
result2 = re.split(r' - |{\d+} {\d+}', s)
> ['test 10 20 middle 30', '40 mm']
Is there a regular expression that splits it into ['test 10', '20 middle 30', '40 mm']?
You may use
(?<=\d)\s+(?:-\s+)?(?=\d)
See the regex demo.
Details
(?<=\d) - a digit must appear immediately on the left
\s+ - 1+ whitespaces
(?:-\s+)? - an optional sequence of a - followed with 1+ whitespaces
(?=\d) - a digit must appear immediately on the right.
See the Python demo:
import re
text = "test 10 20 middle 30 - 40 mm"
print( re.split(r'(?<=\d)\s+(?:-\s+)?(?=\d)', text) )
# => ['test 10', '20 middle 30', '40 mm']
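To see why the first attempt in the question splits inside the numbers, it may help to compare the two patterns side by side. In this sketch, [\d+] is a character class that matches exactly one character (a digit or a literal +), so it consumes one digit on each side of the space, while the lookarounds only assert that digits are present:

```python
import re

s = "test 10 20 middle 30 - 40 mm"

# Character class: one digit (or '+') is consumed on each side of the space.
with_class = re.split(r'[\d+] [\d+]', s)
print(with_class)

# Lookarounds: the digits are checked but stay in the resulting pieces.
with_lookarounds = re.split(r'(?<=\d)\s+(?:-\s+)?(?=\d)', s)
print(with_lookarounds)
```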
Data
k = "test 10 20 middle 30 - 40 mm"
Please try
result2 = re.findall(r"^[a-z]+\s\d+|\d+\s[a-z]+\s\d+|\d+\s[a-z]+$", k)
result2
^[a-z]+\s\d+ - lowercase letters at the start of the string, followed by a whitespace character and digits
| - or
\d+\s[a-z]+\s\d+ - digits, a whitespace character, lowercase letters, a whitespace character, and digits
| - or
\d+\s[a-z]+$ - digits, a whitespace character, and lowercase letters at the end of the string
Output
['test 10', '20 middle 30', '40 mm']

dart regex remove space phone

I tried all the regex solutions from "REGEX Remove Space", but none of them match.
I work with Dart and Flutter, and I am trying to capture only the digits from these types of strings:
case 1
aaaaaaaaa 06 12 34 56 78 aaaaaa
case 2
aaaaaaaa 0612345678 aaaaaa
case 3
aaaaaa +336 12 34 56 78 aaaaa
I want to get only 0612345678, with no spaces and no +33: just 10 digits. In the case of +33, I need to replace +33 with 0.
Currently I have this pattern, \D*(\d+)\D*?, which works for case 2.
You may match and capture an optional +33 and then a digit followed with spaces or digits, and then check if Group 1 matched and then build the result accordingly.
Here is an example solution (tested):
var strs = ['aaaaaaaaa 06 12 34 56 78 aaaaaa', 'aaaaaaaa 0612345678 aaaaaa', 'aaaaaa +336 12 34 56 78 aaaaa', 'more +33 6 12 34 56 78'];
for (int i = 0; i < strs.length; i++) {
  var rx = new RegExp(r"(?:^|\D)(\+33)?\s*(\d[\d ]*)(?!\d)");
  var match = rx.firstMatch(strs[i]);
  var result = "";
  if (match != null) {
    if (match.group(1) != null) {
      result = "0" + match.group(2).replaceAll(" ", "");
    } else {
      result = match.group(2).replaceAll(" ", "");
    }
    print(result);
  }
}
This prints 0612345678 for each of the four test strings.
The pattern is
(?:^|\D)(\+33)?\s*(\d[\d ]*)(?!\d)
See its demo here.
(?:^|\D) - start of string or any char other than a digit
(\+33)? - Group 1 that captures +33 1 or 0 times
\s* - any 0+ whitespaces
(\d[\d ]*) - Group 2: a digit followed with spaces or/and digits
(?!\d) - no digit immediately to the right is allowed.
Spaces are removed from Group 2 with a match.group(2).replaceAll(" ", "") since one can't match discontinuous strings within one match operation.
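For readers who want to experiment with the pattern outside Dart, here is a rough Python port of the same logic. It is only a sketch: the function name normalize_phone is made up, and it assumes the same four sample strings as the Dart code:

```python
import re

# Same pattern as above: optional +33 in group 1, digits and spaces in group 2.
PHONE_RX = re.compile(r"(?:^|\D)(\+33)?\s*(\d[\d ]*)(?!\d)")

def normalize_phone(text):
    """Return the first phone number in text as bare digits, or None."""
    match = PHONE_RX.search(text)
    if match is None:
        return None
    digits = match.group(2).replace(" ", "")
    # A captured +33 prefix is replaced by a leading 0.
    return "0" + digits if match.group(1) else digits

samples = [
    'aaaaaaaaa 06 12 34 56 78 aaaaaa',
    'aaaaaaaa 0612345678 aaaaaa',
    'aaaaaa +336 12 34 56 78 aaaaa',
    'more +33 6 12 34 56 78',
]
for sample in samples:
    print(normalize_phone(sample))
```

As in the Dart version, the spaces are stripped in a second step because a single match cannot skip characters in the middle of a captured group.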

Use commas to join expression components

trafficLightsCount = 12
bufferDist = '5 mi. '
intersectionCount = 20
print('Found', trafficLightsCount, 'lights in the', bufferDist, 'buffer and', intersectionCount, 'intersections.')
a = 'something'
b = 12
c = 'Another thing'
print('Found : ' + '{0},{1},{2}'.format(a, b, c))
You will get the output: Found : something,12,Another thing
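The same message can also be built with an f-string, which avoids juggling commas and quotes. This is a sketch in Python 3.6 or later; the snake_case names and the buffer value without a trailing space are choices made for this example:

```python
traffic_lights_count = 12
buffer_dist = '5 mi.'
intersection_count = 20

# Each {name} is replaced by the value of that variable.
message = (
    f'Found {traffic_lights_count} lights in the {buffer_dist} '
    f'buffer and {intersection_count} intersections.'
)
print(message)
```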

How to fetch a particular pattern using regular expression in Robot Framework?

I have a scenario where I need to fetch a particular pattern from the string using regular expression.
The string looks like below:
${text} = Slot 0 l 5 3 24+6
Slot 1 l 3 16 10
Slot 3 l 4 3 32
Slot 8 l 2 3
Slot 9 l 1 3
Here, I need to fetch only
Slot 0
Slot 1
Slot 3
Slot 8
Slot 9
How do I do this?
I have tried using the keywords 'Replace String Using Regexp' and 'Get Regexp Matches' for the same.
${text}=    String.Replace String Using Regexp    ${response}    [^Slot\\s+\\d],    ${EMPTY}
The result was:
${text} = Slot 0 l 5 3 24+6 Slot 1 l 3 16 10 Slot 3 l 4 3 32 Slot 8 l 2 3 Slot 9 l 1 3
And, Get Regexp Matches gives the below result:
${matches}=    String.Get Regexp Matches    ${response}    [Slot\\s+\\d]
The result:
${matches}= ['S', 'l', 'o', 't', ' ', '0', ' ', ' ', ' ', 'l', ' ', ' ', ' ', ' ', ' ', ' ', ' ', '5', ' ', '3', ' ', ' ', '2', '4', '+', '6', '\r', '\n', 'S', 'l', 'o', 't', ' ', '1', ' ', ' ', ' ', 'l', ' ', ' ...
The solution is simply to remove the square brackets from the regular expression passed to the 'Get Regexp Matches' keyword, i.e., use Slot\s+\d+ instead of [Slot\s+\d+]. Square brackets define a character class, which matches a single character from the listed set, whereas my requirement was to fetch the whole substring. Thanks @Todor
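The difference can be reproduced directly with Python's re module, whose regular expression syntax the String library keywords use. The sketch below uses a simplified copy of the response text (single spaces, \r\n line endings), which is an assumption about the real data:

```python
import re

# Simplified stand-in for ${response}; the real text has wider column spacing.
response = (
    "Slot 0 l 5 3 24+6\r\n"
    "Slot 1 l 3 16 10\r\n"
    "Slot 3 l 4 3 32\r\n"
    "Slot 8 l 2 3\r\n"
    "Slot 9 l 1 3"
)

# With brackets: a character class, so every match is a single character.
single_chars = re.findall(r'[Slot\s+\d]', response)
print(single_chars[:6])

# Without brackets: the literal text "Slot", then whitespace, then digits.
slots = re.findall(r'Slot\s+\d+', response)
print(slots)
```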