Uneven dataset regex - regex

Please suggest a regex for the following dataset:
gSoCa,['25','78'],fa,GT,GTM_19,gPfRec,['22','78','78'],10,fa,GT,TS/C_LE_RE,
gPreRe,['12'],10,fa,GT,TS/C_L_OW,gTLAsTe,['2'],PT,TEST/UP_P_IST,gBeAdRe,['78','2'],5,fa,GET,ulr/UTC_9,gEdiRen,['2'],fa,GT,ua/ngs_2018-Copy,
I tried:
(\w+,(\['?\d+'?(?:,\s*'?\d+'?)*\]),(\w+),([\w/_-]+|[\w/]+),([\w/_-]+|[\w/]+)),
The expected result is a Python tuple:
[gSoCa,25,78,fa,GT,GTM_19,.....ua/ngs_2018-Copy]
see the demo
https://regex101.com/r/zHXUmh/1

Maybe something like this could work:
text = """gSoCa,['25','78'],fa,GT,GTM_19,
gPfRec,['22','78','78'],10,fa,GT,TS/C_LE_RE,
gPreRe,['12'],10,fa,GT,TS/C_L_OW,
gTLAsTe,['2'],PT,TEST/UP_P_IST,
gBeAdRe,['78','2'],5,fa,GET,ulr/UTC_9,
gEdiRen,['2'],fa,GT,ua/ngs_2018-Copy,"""
# Skip the empty piece left behind by the trailing comma.
result = [s.strip("[]'\n") for s in text.split(",") if s.strip()]
print(result)

Your "expected" tuple is impossible in Python as it contains strings without quotes. That said, all it needs is a series of not-in-set characters:
import re
print(tuple(re.findall(r"[^\[\],']+", text)))
result (as noted, not exactly what you want!):
('gSoCa', '25', '78', 'fa', 'GT', 'GTM_19', 'gPfRec', '22', '78', '78', '10', 'fa', 'GT', 'TS/C_LE_RE')
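A minimal sketch of the same idea wrapped in a function, assuming the input may span several lines. Adding \s to the excluded set keeps newlines out of the tokens. The name flatten_fields and the two-row sample are only for illustration, and a field that contains spaces would be split by this version.

```python
import re

def flatten_fields(text):
    """Return every field in text as a flat tuple of strings."""
    # Match runs of characters that are not brackets, commas, quotes or whitespace.
    return tuple(re.findall(r"[^\[\],'\s]+", text))

# Two rows of the dataset, separated by a newline.
sample = "gSoCa,['25','78'],fa,GT,GTM_19,\ngPfRec,['22','78','78'],10,fa,GT,TS/C_LE_RE,"
fields = flatten_fields(sample)
print(fields)
```

Without \s in the set, the second row's first token would come back with a leading newline attached.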

Related

I want to get some letters using the Regular Expressions

As I said in the title, I want to get some letters using regular expressions, but I don't know how to do it.
re.findall("\d*\.?\d+[^Successful 50/50s]", a)
'Defence\nClean sheets\n53\nGoals conceded\n118\nTackles\n186\nTackle success %\n75%\nLast man tackles\n2\nBlocked shots\n24\nInterceptions\n151\nClearances\n805\nHeaded Clearance\n380\nClearances off line\n3\nRecoveries\n666\nDuels won\n435\nDuels lost\n330\nSuccessful 50/50s\n25\nAerial battles won\n206\nAerial battles lost\n193\nOwn goals\n1\nErrors leading to goal\n1Team Play\nAssists\n2\nPasses\n7,979\nPasses per match\n56.19\nBig chances created\n3\nCrosses\n48\nCross accuracy %\n25%\nThrough balls\n10\nAccurate long balls\n936Discipline\nYellow cards\n13\nRed cards\n0\nFouls\n48\nOffsides\n2Attack\nGoals\n6\nHeaded goals\n4\nGoals with right foot\n1\nGoals with left foot\n1\nHit woodwork\n3'
I want to get just the numbers, including floats and percentages, but excluding the 50/50 in 'Successful 50/50s'. I also want to keep the thousands separator, as in 7,979.
You can use this regex, which matches all numbers except those that are preceded or followed by a slash, like 50/50:
(?<!/)\d*(?:,\d+)*\.?\d+\b(?!/)
Your updated Python code,
import re
s = '''Defence\nClean sheets\n53\nGoals conceded\n118\nTackles\n186\nTackle success %\n75%\nLast man tackles\n2\nBlocked shots\n24\nInterceptions\n151\nClearances\n805\nHeaded Clearance\n380\nClearances off line\n3\nRecoveries\n666\nDuels won\n435\nDuels lost\n330\nSuccessful 50/50s\n25\nAerial battles won\n206\nAerial battles lost\n193\nOwn goals\n1\nErrors leading to goal\n1',
'Team Play\nAssists\n2\nPasses\n7,979\nPasses per match\n56.19\nBig chances created\n3\nCrosses\n48\nCross accuracy %\n25%\nThrough balls\n10\nAccurate long balls\n936',
'Discipline\nYellow cards\n13\nRed cards\n0\nFouls\n48\nOffsides\n2',
'Attack\nGoals\n6\nHeaded goals\n4\nGoals with right foot\n1\nGoals with left foot\n1\nHit woodwork\n3'''
print(re.findall(r'(?<!/)\d*(?:,\d+)*\.?\d+\b(?!/)', s))
Prints all numbers except the ones in 50/50:
['53', '118', '186', '75', '2', '24', '151', '805', '380', '3', '666', '435', '330', '25', '206', '193', '1', '1', '2', '7,979', '56.19', '3', '48', '25', '10', '936', '13', '0', '48', '2', '6', '4', '1', '1', '3']
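Note that the output above drops the percent signs, while the question asked to keep them. A possible variation, sketched under the assumption that a trailing % should stay attached to its number, appends an optional %? to the same pattern. The shortened sample string is only for illustration.

```python
import re

# Same pattern as above, with an optional trailing percent sign.
NUMBER = re.compile(r'(?<!/)\d*(?:,\d+)*\.?\d+\b(?!/)%?')

sample = 'Tackle success %\n75%\nSuccessful 50/50s\n25\nPasses\n7,979\nPasses per match\n56.19'
numbers = NUMBER.findall(sample)
print(numbers)
```

The pattern has no capturing groups, so findall returns the full text of each match.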

RegEx for matching the month, day and year

I'm trying to find a regular expression to extract the month, day and year from a datetime stamp in this format:
01/20/2019 12:34:54
It should return a list:
['01', '20', '2019']
I know this can be solved using:
dt.split(' ')[0].split('/')
But, I'm trying to find a regex to do it:
[^\/\s]+
But, I need it to exclude everything after the space.
Since you expect the month, day and year to be returned as a list, you can use this Python code:
import re
s = '01/20/2019 12:34:54'
print(re.findall(r'\d+(?=[ /])', s))
Prints,
['01', '20', '2019']
Alternatively, you could write your regex as:
(\d{2})/(\d{2})/(\d{4})
Then read the month, day and year from group 1, group 2 and group 3.
Python code in this way should be,
import re
s = '01/20/2019 12:34:54'
m = re.search(r'(\d{2})/(\d{2})/(\d{4})', s)
if m:
    print([m.group(1), m.group(2), m.group(3)])
Prints,
['01', '20', '2019']
You should absolutely be taking advantage of Python's date/time API here. Use strptime to parse your input datetime string to a bona fide Python datetime. Then, just build a list, accessing the various components you need.
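from datetime import datetime
dt = "01/20/2019 12:34:54"
dto = datetime.strptime(dt, '%m/%d/%Y %H:%M:%S')
# Avoid naming the variable "list", which would shadow the built-in.
parts = [dto.month, dto.day, dto.year]
print(parts)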
[1, 20, 2019]
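The strptime route returns integers, so the leading zero of the month is lost. If the zero-padded strings from the question are needed, one option (a sketch, not the only way) is to format the parsed datetime back with strftime and split the result:

```python
from datetime import datetime

dt = "01/20/2019 12:34:54"
dto = datetime.strptime(dt, "%m/%d/%Y %H:%M:%S")
# Format back to text, then split to get zero-padded string components.
parts = dto.strftime("%m/%d/%Y").split("/")
print(parts)
```

Compared with splitting the raw string, this has the side benefit that strptime raises ValueError when the input does not match the expected format.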
If you really want/need to work with the original datetime string, then split provides an option, without even formally using a regex:
dt = "01/20/2019 12:34:54"
dt = dt.split()[0].split('/')
print(dt)
['01', '20', '2019']
This RegEx might help you to do so.
([0-9]+)\/([0-9]+)\/([0-9]+)\s[0-9]+:[0-9]+:[0-9]+
Code:
import re
string = '01/20/2019 12:34:54'
matches = re.search(r'([0-9]+)/([0-9]+)/([0-9]+)', string)
if matches:
    print([matches.group(1), matches.group(2), matches.group(3)])
else:
    print('Sorry! No matches! Something is not right! Call 911')
Output
['01', '20', '2019']

Sort a list with Case

Example:
let list = [['aen', '2'], ['ben', '3'], ['Aen', '4'], ['Ben', '5']]
sort(list, 1)
Output:
let list = [['aen', '2'], ['Aen', '4'], ['ben', '3'], ['Ben', '5']]
Expected output:
let list = [['Aen', '4'], ['aen', '2'], ['Ben', '5'], ['ben', '3']]
In Python that would be easy to do, but it does not seem to be possible to use Python's sorted() function with a Vim list.
How can I sort the list as expected?
UPDATE:
Maybe a solution would be:
for i in range(0,len(list)-1)
let #b= join(list[i], "|||")
endfor
This would put all the lines in a register but how can I sort the lines in a register?
EDIT:
Found a solution with the help of python (within Vim Function)
python3 << endpython
import vim
list2 = vim.eval('list')
list3 = sorted(list2, key=lambda v: (v[0].upper(), v[0].islower()))
vim.command("let list= %s"% list3)
endpython
echo list --> [['Aen', '4'], ['aen', '2'], ['Ben', '5'], ['ben', '3']]
sort() takes an optional parameter, which is a comparison function. Define a function that returns -1, 0, or 1 depending on the first elements of your sub-lists.
Note that you'll have to define the function for your altered lexical order.
Maybe you could instead sort versions transformed with tr(), and then transform them back after the sort.
I think that to obtain your expected result in Python, you would have to write your own compare function too. In Vim, you can first implement the compare function and then pass a reference to it to sort().
See :h sort( for details.
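To illustrate what such a compare function has to do, here is a sketch in Python using functools.cmp_to_key. It is not Vim script, and compare_first is a hypothetical name, but the same -1/0/1 logic is what a function passed to Vim's sort() would need to return.

```python
from functools import cmp_to_key

def compare_first(a, b):
    """Return -1, 0 or 1 by comparing the first elements of two sub-lists."""
    # Compare case-insensitively first, then let uppercase sort before lowercase.
    key_a = (a[0].lower(), a[0])
    key_b = (b[0].lower(), b[0])
    return (key_a > key_b) - (key_a < key_b)

items = [['aen', '2'], ['ben', '3'], ['Aen', '4'], ['Ben', '5']]
ordered = sorted(items, key=cmp_to_key(compare_first))
print(ordered)
```

The tie-break relies on uppercase ASCII letters having lower code points than lowercase ones, which is what puts 'Aen' ahead of 'aen'.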

Arranging nested tuples

I know that this is probably a silly question and I apologize for that, but I am very new to python and have tried to solve this for a long time now, with no success.
I have a list of tuples similar to the one below:
data = [('ralph picked', ['nose', '4', 'apple', '30', 'winner', '3']),
('aaron popped', ['soda', '1', 'popcorn', '6', 'pill', '4', 'question', '29'])]
I would like to sort the nested list in descending order:
data = [('ralph picked', ['apple', '30', 'nose', '4', 'winner', '3']),
('aaron popped', ['question', '29', 'popcorn', '6', 'pill', '4', 'soda', '1'])]
I tried using simple
sorted(data)
but all I get is the list sorted by the first item of each tuple. What am I missing here? I would really appreciate any help.
Let's consider only the inner list. The first issue is that it seems like you want to keep word, number pairs together. We can use zip to combine them, remembering that seq[::2] gives us every second element starting at the 0th, and seq[1::2] gives us every second starting at the first:
>>> s = ['nose', '4', 'apple', '30', 'winner', '3']
>>> zip(s[::2], s[1::2])
<zip object at 0xb5e996ac>
>>> list(zip(s[::2], s[1::2]))
[('nose', '4'), ('apple', '30'), ('winner', '3')]
Now, as you've discovered, if you call sorted on a sequence, it sorts first by the first element, then by the second to break ties, etc., going as deep as it needs to. So if we call sorted on this:
>>> sorted(zip(s[::2], s[1::2]))
[('apple', '30'), ('nose', '4'), ('winner', '3')]
Well, that looks like it works, but only by fluke because apple-nose-winner is in alphabetical order. Really we want to sort by the second term. sorted takes a key parameter:
>>> sorted(zip(s[::2], s[1::2]), key=lambda x: x[1])
[('winner', '3'), ('apple', '30'), ('nose', '4')]
That didn't work either, because it's sorting the number strings lexicographically (dictionary-style, so '30' comes before '4'). We can tell it we want to use the numerical value, though:
>>> sorted(zip(s[::2], s[1::2]), key=lambda x: int(x[1]))
[('winner', '3'), ('nose', '4'), ('apple', '30')]
Almost there -- we want this reversed:
>>> sorted(zip(s[::2], s[1::2]), key=lambda x: int(x[1]), reverse=True)
[('apple', '30'), ('nose', '4'), ('winner', '3')]
And this is almost right, but we need to flatten it. We can use either a nested list comprehension:
>>> s2 = sorted(zip(s[::2], s[1::2]), key=lambda x: int(x[1]), reverse=True)
>>> [value for pair in s2 for value in pair]
['apple', '30', 'nose', '4', 'winner', '3']
or use itertools.chain:
>>> from itertools import chain
>>> list(chain.from_iterable(s2))
['apple', '30', 'nose', '4', 'winner', '3']
And I think that's where we wanted to go.
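The steps above only handle one inner list. A minimal sketch that applies them to the whole data structure from the question could look like this (sort_pairs_desc is a hypothetical helper name):

```python
from itertools import chain

def sort_pairs_desc(items):
    """Sort a flat [word, number, word, number, ...] list by number, descending."""
    # Pair each word with the number that follows it.
    pairs = zip(items[::2], items[1::2])
    # Sort by the numeric value of the second element, largest first.
    ordered = sorted(pairs, key=lambda pair: int(pair[1]), reverse=True)
    # Flatten the sorted pairs back into a single list.
    return list(chain.from_iterable(ordered))

data = [('ralph picked', ['nose', '4', 'apple', '30', 'winner', '3']),
        ('aaron popped', ['soda', '1', 'popcorn', '6', 'pill', '4', 'question', '29'])]

result = [(label, sort_pairs_desc(items)) for label, items in data]
print(result)
```

The outer list keeps its original order; only the list inside each tuple is rearranged.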

Get data from string

I have a string like
A & A COMPUTERS INC [RC1058054]
I want a regex to extract all the data inside [ ]. Any ideas?
To capture the data between [ and ] you can use the regex:
\[([^]]*)\]
Since the current version of the question leaves out the programming language, I'll just pick one (Python).
>>> import re
>>> s = "A & A COMPUTERS INC [RC1058054]"
>>> re.search(r"\[(.*)\]", s).group(1)
'RC1058054'
>>> # If you want to "split all data" ...
>>> [x for x in re.search(r"\[(.*)\]", s).group(1)]
['R', 'C', '1', '0', '5', '8', '0', '5', '4']
This regex, (?<=\[)[^]]*(?=\]), captures all the data between [ and ] on the .NET and Java platforms.
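If a string can contain more than one bracketed part, a greedy .* would run from the first [ to the last ]. A small Python sketch with a negated character class and re.findall avoids that. The function name bracket_contents and the second sample string are made up for illustration.

```python
import re

def bracket_contents(text):
    """Return the text found inside each [...] pair."""
    # [^\]]* cannot cross a closing bracket, so each pair is matched separately.
    return re.findall(r"\[([^\]]*)\]", text)

print(bracket_contents("A & A COMPUTERS INC [RC1058054]"))
print(bracket_contents("first [a] second [b]"))
```

With a single capturing group, findall returns only the captured text, so the brackets themselves are not included.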