Get data from string - regex

I have a string like
A & A COMPUTERS INC [RC1058054]
I want a regex to split all the data inside [ ]. Any ideas?

To capture the data between [ and ] you can use the regex:
\[([^]]*)\]
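A minimal Python sketch of this pattern (Python is an assumption, since the question names no language, and the second sample string is made up for illustration). The negated class stops at the first closing bracket, so several bracketed values on one line are captured separately:

```python
import re

# Negated character class: match anything except ']' between the brackets.
BRACKETED = re.compile(r"\[([^\]]*)\]")

s = "A & A COMPUTERS INC [RC1058054]"
first = BRACKETED.search(s).group(1)

# Hypothetical input with two bracketed values, to show findall.
multi = "ACME LTD [RC1] [RC2]"
all_values = BRACKETED.findall(multi)

print(first)
print(all_values)
```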

Since the current version of the question leaves out the programming language, I'll just pick one (Python).
>>> import re
>>> s = "A & A COMPUTERS INC [RC1058054]"
>>> re.search(r"\[(.*)\]", s).group(1)
'RC1058054'
>>> # If you want to "split all data" ...
>>> [x for x in re.search(r"\[(.*)\]", s).group(1)]
['R', 'C', '1', '0', '5', '8', '0', '5', '4']

The regex (?<=\[)[^]]*(?=\]) matches all data between [ and ] on the .NET and Java platforms.
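The lookaround form can also be tried in Python, whose re module supports fixed-width lookbehind. This is only a sketch: the closing bracket is escaped inside the class for clarity, and the second input is a made-up string showing why a greedy .* is riskier than the negated class:

```python
import re

s = "A & A COMPUTERS INC [RC1058054]"

# Lookbehind and lookahead assert the brackets without consuming them,
# so the whole match (group 0) is already the inner text.
inner = re.search(r"(?<=\[)[^\]]*(?=\])", s).group(0)

# Greedy ".*" runs to the LAST ']' when a line holds several brackets.
greedy = re.search(r"\[(.*)\]", "[a] and [b]").group(1)

print(inner)
print(greedy)
```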

Related

Uneven dataset regex

Please suggest a regex for the following dataset:
gSoCa,['25','78'],fa,GT,GTM_19,gPfRec,['22','78','78'],10,fa,GT,TS/C_LE_RE,
gPreRe,['12'],10,fa,GT,TS/C_L_OW,gTLAsTe,['2'],PT,TEST/UP_P_IST,gBeAdRe,['78','2'],5,fa,GET,ulr/UTC_9,gEdiRen,['2'],fa,GT,ua/ngs_2018-Copy,
tried:
(\w+,(\['?\d+'?(?:,\s*'?\d+'?)*\]),(\w+),([\w/_-]+|[\w/]+),([\w/_-]+|[\w/]+)),
The expected result is a Python tuple:
[gSoCa,25,78,fa,GT,GTM_19,.....ua/ngs_2018-Copy]
see the demo
https://regex101.com/r/zHXUmh/1
Maybe something like this could work:
text = """gSoCa,['25','78'],fa,GT,GTM_19,
gPfRec,['22','78','78'],10,fa,GT,TS/C_LE_RE,
gPreRe,['12'],10,fa,GT,TS/C_L_OW,
gTLAsTe,['2'],PT,TEST/UP_P_IST,
gBeAdRe,['78','2'],5,fa,GET,ulr/UTC_9,
gEdiRen,['2'],fa,GT,ua/ngs_2018-Copy,"""
# Skip the empty piece left by the trailing comma.
result = [s.strip("[]'\n") for s in text.split(",") if s.strip()]
print(result)
Your "expected" tuple is impossible in Python as it contains strings without quotes. That said, all it needs is a series of not-in-set characters:
import re
# \s is excluded too, so line breaks in the input do not end up inside tokens.
print(tuple(re.findall(r"[^\[\],'\s]+", text)))
result (as noted, not exactly what you want!):
('gSoCa', '25', '78', 'fa', 'GT', 'GTM_19', 'gPfRec', '22', '78', '78', '10', 'fa', 'GT', 'TS/C_LE_RE')
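If one tuple per record is more useful than a single flat tuple, the records can be grouped as well. This is a sketch under an assumption not stated in the question: every record starts with a name immediately followed by a bracketed list, and the next record begins at the next such name. RECORD is a name chosen here for illustration.

```python
import re

text = ("gSoCa,['25','78'],fa,GT,GTM_19,"
        "gPfRec,['22','78','78'],10,fa,GT,TS/C_LE_RE,")

# Assumption: a record is NAME,[quoted numbers],FIELDS..., and the next
# record begins at the next "NAME,[" (or the input ends).
RECORD = re.compile(r"(\w+),\[([^\]]*)\],([^\[\n]*?),(?=\s*\w+,\[|\s*$)")

records = []
for name, numbers, rest in RECORD.findall(text):
    # Flatten the name, the bracketed numbers and the trailing fields.
    fields = [name] + re.findall(r"\d+", numbers) + rest.split(",")
    records.append(tuple(fields))

for record in records:
    print(record)
```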

I want to get some letters using the Regular Expressions

As I said in the title, I want to get some letters using regular expressions, but I don't know how to do it.
re.findall("\d*\.?\d+[^Successful 50/50s]", a)
'Defence\nClean sheets\n53\nGoals conceded\n118\nTackles\n186\nTackle success %\n75%\nLast man tackles\n2\nBlocked shots\n24\nInterceptions\n151\nClearances\n805\nHeaded Clearance\n380\nClearances off line\n3\nRecoveries\n666\nDuels won\n435\nDuels lost\n330\nSuccessful 50/50s\n25\nAerial battles won\n206\nAerial battles lost\n193\nOwn goals\n1\nErrors leading to goal\n1Team Play\nAssists\n2\nPasses\n7,979\nPasses per match\n56.19\nBig chances created\n3\nCrosses\n48\nCross accuracy %\n25%\nThrough balls\n10\nAccurate long balls\n936Discipline\nYellow cards\n13\nRed cards\n0\nFouls\n48\nOffsides\n2Attack\nGoals\n6\nHeaded goals\n4\nGoals with right foot\n1\nGoals with left foot\n1\nHit woodwork\n3'
I want to get just the numbers, including floats and percentages, but excluding the 'Successful 50/50s' entry. I also want to keep the thousands separator, as in 7,979.
You can use this regex, which matches all numbers except those that are preceded or followed by a slash, as in 50/50:
(?<!/)\d*(?:,\d+)*\.?\d+\b(?!/)
Your updated Python code,
import re
s = '''Defence\nClean sheets\n53\nGoals conceded\n118\nTackles\n186\nTackle success %\n75%\nLast man tackles\n2\nBlocked shots\n24\nInterceptions\n151\nClearances\n805\nHeaded Clearance\n380\nClearances off line\n3\nRecoveries\n666\nDuels won\n435\nDuels lost\n330\nSuccessful 50/50s\n25\nAerial battles won\n206\nAerial battles lost\n193\nOwn goals\n1\nErrors leading to goal\n1',
'Team Play\nAssists\n2\nPasses\n7,979\nPasses per match\n56.19\nBig chances created\n3\nCrosses\n48\nCross accuracy %\n25%\nThrough balls\n10\nAccurate long balls\n936',
'Discipline\nYellow cards\n13\nRed cards\n0\nFouls\n48\nOffsides\n2',
'Attack\nGoals\n6\nHeaded goals\n4\nGoals with right foot\n1\nGoals with left foot\n1\nHit woodwork\n3'''
print(re.findall(r'(?<!/)\d*(?:,\d+)*\.?\d+\b(?!/)', s))
This prints all numbers except those in 50/50:
['53', '118', '186', '75', '2', '24', '151', '805', '380', '3', '666', '435', '330', '25', '206', '193', '1', '1', '2', '7,979', '56.19', '3', '48', '25', '10', '936', '13', '0', '48', '2', '6', '4', '1', '1', '3']
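Note that the output above drops the % sign, although the question asked to keep it. A possible variant that keeps it is sketched below; it is not the answer's pattern, and the sample is a shortened string in the same shape as the question's data:

```python
import re

# Shortened sample in the same shape as the question's data.
s = ("Tackle success %\n75%\nSuccessful 50/50s\n25\n"
     "Passes\n7,979\nPasses per match\n56.19\n"
     "Errors leading to goal\n1Team Play")

NUMBER = re.compile(
    r"(?<![/\d.,])"    # not preceded by a slash, digit, dot or comma
    r"\d+(?:,\d{3})*"  # integer part with optional thousands groups
    r"(?:\.\d+)?"      # optional decimal part
    r"%?"              # optional percent sign
    r"(?![/\d])"       # not followed by a slash or another digit
)

numbers = NUMBER.findall(s)
print(numbers)
```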

sort a list of strings with name and number in

I am trying to sort a list of 100 filenames so they will be used in the right order in later calculations. All the filenames have 'name_1' at the beginning and '_out.txt' at the end. The difference is a number in between, going from 1 to 100.
The list looks a bit like this:
['name_1_100_out.txt', 'name_1_10_out.txt', 'name_1_6_out.txt', 'name_1_5_out.txt', 'name_1_2_out.txt']
For this actual example I want:
['name_1_2_out.txt', 'name_1_5_out.txt', 'name_1_6_out.txt', 'name_1_10_out.txt', 'name_1_100_out.txt']
I have tried both list.sort and sorted(list), but with no luck. I have also tried key=int and key=str, but neither helped, since it seems they cannot convert only a part of the string to int.
Can anyone help me with advice?
You need leading zeros to sort the way you want.
#!/usr/bin/python
# -*- coding: utf-8 -*-
L = ['name_1_100_out.txt', 'name_1_10_out.txt', 'name_1_6_out.txt', 'name_1_5_out.txt', 'name_1_2_out.txt']
OUT = []
n = '100'  # max number
for item in L:
    old = item[7:-8]  # unpadded index between 'name_1_' and '_out.txt'
    new = old.zfill(len(n))  # zero-padded index
    # Rebuild by slicing; str.replace would also change the '1' in 'name_1_'
    # for an index such as '1'.
    item = item[:7] + new + item[-8:]
    OUT.append(item)
OUT.sort()
print(OUT)
Result
['name_1_002_out.txt', 'name_1_005_out.txt', 'name_1_006_out.txt', 'name_1_010_out.txt', 'name_1_100_out.txt']
I would suggest renaming files to make life easier later on since not all file managers display faulty filenames in order.
You can use the key function for this task:
>>> l = ['name_1_100_out.txt', 'name_1_10_out.txt', 'name_1_6_out.txt', 'name_1_5_out.txt', 'name_1_2_out.txt']
>>> sorted(l,key=lambda s: int(s.split('_')[2]))
['name_1_2_out.txt', 'name_1_5_out.txt', 'name_1_6_out.txt', 'name_1_10_out.txt', 'name_1_100_out.txt']
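The key above relies on the number always being the third underscore-separated field. A more general "natural sort" key is sketched below; natural_key is a helper name chosen here, not something from the standard library:

```python
import re

def natural_key(name):
    """Split a name into text and integer chunks so digits compare numerically."""
    # re.split with a capturing group keeps the digit runs; the result always
    # alternates text, digits, text, ..., so chunks at the same position in
    # two keys have the same type.
    return [int(part) if part.isdigit() else part
            for part in re.split(r"(\d+)", name)]

files = ['name_1_100_out.txt', 'name_1_10_out.txt', 'name_1_6_out.txt',
         'name_1_5_out.txt', 'name_1_2_out.txt']
ordered = sorted(files, key=natural_key)
print(ordered)
```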
lista = ['2','3','5','8','4','6','1']
listb = [('2','3'),('5','8'),('4','6'),('1','9')]
listc = {'a':'3','b':'5','c':'9','d':'4','e':'2','f':'0'}
d = sorted(lista, key=lambda item:int(item), reverse=True)
e = sorted(listb, key=lambda item:int(item[0]) + int(item[1]), reverse=True)
f = sorted(listc.items(), key=lambda item:int(item[1]), reverse=True)
print(d)
print(e)
print(f)
output:
['8', '6', '5', '4', '3', '2', '1']
[('5', '8'), ('4', '6'), ('1', '9'), ('2', '3')]
[('c', '9'), ('b', '5'), ('d', '4'), ('a', '3'), ('e', '2'), ('f', '0')]

Arranging nested tuples

I know that this is probably a silly question and I apologize for that, but I am very new to Python and have tried to solve this for a long time now, with no success.
I have a list of tuples similar to the one below:
data = [('ralph picked', ['nose', '4', 'apple', '30', 'winner', '3']),
('aaron popped', ['soda', '1', 'popcorn', '6', 'pill', '4', 'question', '29'])]
I would like to sort the nested list in descending order:
data = [('ralph picked', ['apple', '30', 'nose', '4', 'winner', '3']),
('aaron popped', ['question', '29', 'popcorn', '6', 'pill', '4', 'soda', '1'])]
I tried using simple
sorted(data)
but all I get is the list sorted by the first item of each tuple. What am I missing here? Thank you for any help.
Let's consider only the inner list. The first issue is that it seems like you want to keep word, number pairs together. We can use zip to combine them, remembering that seq[::2] gives us every second element starting at the 0th, and seq[1::2] gives us every second starting at the first:
>>> s = ['nose', '4', 'apple', '30', 'winner', '3']
>>> zip(s[::2], s[1::2])
<zip object at 0xb5e996ac>
>>> list(zip(s[::2], s[1::2]))
[('nose', '4'), ('apple', '30'), ('winner', '3')]
Now, as you've discovered, if you call sorted on a sequence, it sorts first by the first element, then by the second to break ties, etc., going as deep as it needs to. So if we call sorted on this:
>>> sorted(zip(s[::2], s[1::2]))
[('apple', '30'), ('nose', '4'), ('winner', '3')]
Well, that looks like it works, but only by fluke because apple-nose-winner is in alphabetical order. Really we want to sort by the second term. sorted takes a key parameter:
>>> sorted(zip(s[::2], s[1::2]), key=lambda x: x[1])
[('winner', '3'), ('apple', '30'), ('nose', '4')]
That didn't work either, because it's sorting the number strings lexicographically (dictionary-style, so '30' comes before '4'). We can tell it we want to use the numerical value, though:
>>> sorted(zip(s[::2], s[1::2]), key=lambda x: int(x[1]))
[('winner', '3'), ('nose', '4'), ('apple', '30')]
Almost there -- we want this reversed:
>>> sorted(zip(s[::2], s[1::2]), key=lambda x: int(x[1]), reverse=True)
[('apple', '30'), ('nose', '4'), ('winner', '3')]
And this is almost right, but we need to flatten it. We can use either a nested list comprehension:
>>> s2 = sorted(zip(s[::2], s[1::2]), key=lambda x: int(x[1]), reverse=True)
>>> [value for pair in s2 for value in pair]
['apple', '30', 'nose', '4', 'winner', '3']
or use itertools.chain:
>>> from itertools import chain
>>> list(chain.from_iterable(s2))
['apple', '30', 'nose', '4', 'winner', '3']
And I think that's where we wanted to go.
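Putting the steps together for the whole data list from the question might look like this. It is a sketch; sort_pairs_desc is a helper name introduced here:

```python
from itertools import chain

def sort_pairs_desc(flat):
    """Sort a flat [word, number, word, number, ...] list by number, descending."""
    pairs = zip(flat[::2], flat[1::2])
    ordered = sorted(pairs, key=lambda pair: int(pair[1]), reverse=True)
    return list(chain.from_iterable(ordered))

data = [('ralph picked', ['nose', '4', 'apple', '30', 'winner', '3']),
        ('aaron popped', ['soda', '1', 'popcorn', '6', 'pill', '4', 'question', '29'])]

# Keep each label and replace its inner list with the sorted version.
result = [(label, sort_pairs_desc(items)) for label, items in data]
print(result)
```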

Django paginator to include all elements of all previous pages

I read the docs on Pagination with Django and can't find a solution to my problem there. I want to paginate a queryset (5 elements per page) so that my object_list contains all elements of all previous pages up to the ones of the requested page.
This is what normally happens when I ask for the objects of page 2:
>>> p = Paginator(queryset, 5) # 5 elements per page
>>> page2 = p.page(2)
>>> page2.object_list
['6', '7', '8', '9', '10']
What I want to get is this:
['1', '2', '3', '4', '5', '6', '7', '8', '9', '10']
Any ideas?
This is normal, because that is what the Paginator object does:
>>> page1 = p.page(1)
>>> page1.object_list  # first page: items 1 to 5 (5 items per page)
['1', '2', '3', '4', '5']
>>> page2 = p.page(2)
>>> page2.object_list  # second page: items 6 to 10
['6', '7', '8', '9', '10']
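One way to get the cumulative list the question asks for is to slice from the start up to the end of the requested page. The sketch below uses a plain list as a stand-in for the queryset, and objects_up_to_page is a hypothetical helper, not part of Django:

```python
def objects_up_to_page(object_list, page_number, per_page):
    """Return every item from page 1 through page_number (hypothetical helper)."""
    if page_number < 1:
        raise ValueError("page_number must be at least 1")
    # Slicing past the end is safe: the last page may simply be shorter.
    return object_list[: page_number * per_page]

# A plain list stands in for the queryset here.
items = [str(n) for n in range(1, 13)]
cumulative = objects_up_to_page(items, page_number=2, per_page=5)
print(cumulative)
```

With a real Paginator, page2.number and p.per_page could supply the two numbers, and Django querysets accept the same slice syntax.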
The definition of the Paginator object:
Give Paginator a list of objects, plus the number of items you’d like to have on each page,