Custom sort with regular expressions? - regex

I have two columns: the first holds a list of names and the second their ratings. I want to use a custom sort, for which I use the following (found it here):
=sort(A33:B50;match(B33:B50;{"Great";"Good";"OK";"Bad"};0);true)
Which works, but my ratings are actually:
Great+
Great
Great-
Good+
Good
Good-
OK+
...
Is there any way where I can combine the formula above with regular expressions? Something along the lines of this:
=sort(A33:B50;match(B33:B50;{"Great*";"Good*";"OK*";"Bad*"};0);true)
Which doesn't really do anything. I checked out the regex functions in Google Sheets, but couldn't find any that would do the trick in this situation.
Cheers!
PS: A workaround would be
=sort(A33:B50;match(B33:B50;{"Great+";"Great";"Great-";"Good+";"Good";"Good-";"OK+";"OK";"OK-";"Bad+";"Bad";"Bad-"};0);true)
but I'm curious whether there's a less tedious way of doing this.

=sort(A1:B7;match(regexextract(B1:B7;"Great|Good|OK|Bad");{"Great";"Good";"OK";"Bad"};0);true)
The pipe | means OR in regex.
Change A1:B7 and B1:B7 to your ranges.
Edit
To sort Good+, Good and Good- separately, change the regex to "Great|Good\+|Good\-|Good|OK|Bad" and the array to {"Great";"Good+";"Good";"Good-";"OK";"Bad"}.
Counter-intuitively, the order in the regexextract pattern is Good\+|Good\-|Good, while the order in the array is "Good+";"Good";"Good-". The bare Good has to come last in the regex because alternatives are tried left to right, so an earlier Good would already capture the Good- instances.
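The ordering rule from the edit above can be checked outside Sheets. This is a minimal sketch using Python's re module, which also tries alternatives left to right. The sample rows and the rank helper are made up for illustration, and it is only an assumption that REGEXEXTRACT in Sheets behaves the same way.

```python
import re

# Alternatives are tried left to right, so the suffixed forms must come
# before the bare word; otherwise "Good" matches first and hides "Good+".
PATTERN = re.compile(r"Great|Good\+|Good-|Good|OK|Bad")
ORDER = ["Great", "Good+", "Good", "Good-", "OK", "Bad"]

def rank(rating):
    """Return the position of the rating's matched prefix in ORDER."""
    m = PATTERN.match(rating)
    if m is None:
        return len(ORDER)  # unknown ratings sort last
    return ORDER.index(m.group(0))

# Hypothetical sample data: (name, rating) pairs.
rows = [("Ann", "Good-"), ("Bob", "Bad"), ("Cy", "Great+"),
        ("Di", "Good+"), ("Ed", "Good")]
sorted_rows = sorted(rows, key=lambda row: rank(row[1]))

# With the bare word first, the suffix is never seen.
naive = re.match(r"Great|Good|OK|Bad", "Good+").group(0)
```

Here sorted_rows lists Cy, Di, Ed, Ann, Bob in that order, and naive is just "Good", which is why the suffixed alternatives have to come first.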

Related

regex finding elements in xml which contain attributes whose values contain two periods

I'm searching some XML and my tool is regex (my only tools in this case are editors, so I'm using either Eclipse or Notepad++). I need to find all elements that contain attributes whose values contain two non-adjacent periods.
So it would find attr1 and attr3 in this:
<myelement attr1 = "ab.cd.ef", attr2="ab", attr3="zy.sa.xa"/>
I've tried this and variations in notepad++
^(([^\"\.])*(\")[^\"\.]*[\.][^\"\.]*[\.][^\"\.]*[\"])+$
but it isn't picking up second attributes with values containing two periods.
I'm going to keep trying but if someone can point me to an answer I'd appreciate it.
I don't think you can do this with regex, unless you create a monster regex that will create a black hole swallowing all life on Earth (politely speaking, of course).
Bear in mind that regex has no logic, only pattern matching. A number is just a number, for instance; there is no simple way to say "if I get a 1, then also get a 3".
You can use if-then-else in regex, like:
(?(?=condition)(then1|then2|then3)|(else1|else2|else3))
But what you want would mean nesting conditionals with multiple conditions for each case (if 1 then 3 | if 2 then 4 | if 3 then 5), which produces an enormous nested pattern.
Another regex approach would be to use multiple lookarounds (lookaheads in this case), which would make your regex impossible to read.
I think you would find XPath or XQuery expressions more useful for this. They are a better way to match XML than regex.
"I'm searching some xml and my tool is regex."
That's a bit like saying that you are cutting down trees and your tool is a screwdriver. Get the right tool for the job: an XML parser and an XPath engine.
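As a rough illustration of the parser route, here is a sketch using Python's standard xml.etree.ElementTree. It assumes a well-formed version of the sample (no commas between attributes, wrapped in a made-up root element) and reads "two periods not adjacent" as exactly two periods with no ".." in the value; both are assumptions, not something the question states.

```python
import xml.etree.ElementTree as ET

# Well-formed variant of the question's sample; the <root> wrapper and the
# removal of the commas between attributes are assumptions.
SAMPLE = '<root><myelement attr1="ab.cd.ef" attr2="ab" attr3="zy.sa.xa"/></root>'

def has_two_separated_periods(value):
    """True when the value has exactly two periods that are not adjacent."""
    return value.count(".") == 2 and ".." not in value

def find_attributes(xml_text):
    """Yield (tag, attribute name, value) for every matching attribute."""
    root = ET.fromstring(xml_text)
    for element in root.iter():
        for name, value in element.attrib.items():
            if has_two_separated_periods(value):
                yield element.tag, name, value

matches = list(find_attributes(SAMPLE))
```

The parser deals with quoting and attribute boundaries, so the period test stays a plain string check instead of growing into a nested pattern.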

Regex capture words inside tags

Given an XML document, I'd like to be able to pick out individual key/value pairs from a particular tag:
<aaa>key0:val0 key1:val1 key2:val2</aaa>
I'd like to get back
key0:val0
key1:val1
key2:val2
So far I have
(?<=<aaa>).*(?=<\/aaa>)
Which will match everything inside, but as one result.
I also have
[^\s][\w]*:[\w]*[^\s]
which will also match correctly, one pair per match, on this:
key0:val0 key1:val1 key2:val2
But not with the tags. I believe this is an issue with searching for subgroups and I'm not sure how to get around it.
Thanks!
You cannot combine the two expressions in the way you want, because you have to match each occurrence of "key:value".
So in what you came up with - (?<=<abc>)([\w]*:[\w]*[\s]*)+(?=<\/abc>) - there are two things being matched. The overall match covers everything inside the tags, while the capturing group matches a single "key:value" occurrence. The regex engine cannot give you each individual occurrence, because a repeated group only keeps its last repetition. So it just gives you the last one.
If you think in Python, on the match object obtained after applying your regex you will have access to matcher.group(0) (the whole match) and matcher.group(1) (the last "key:value"), because the regex has only one capturing ( ) group; the (?<=...) and (?=...) lookarounds do not capture.
But what you want is the n occurrences of "key:value". So it's easier to just run the simpler \w+:\w+ regex on the string inside the tags.
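A small sketch of that two-step approach in Python, using the sample from the question. The first search only demonstrates the "last repetition wins" behaviour described above; the variable names are illustrative.

```python
import re

text = "<aaa>key0:val0 key1:val1 key2:val2</aaa>"

# A repeated capturing group keeps only its last repetition.
single = re.search(r"<aaa>(\w+:\w+\s*)+</aaa>", text)
last_only = single.group(1)

# Two steps instead: isolate the tag body, then collect every pair in it.
body = re.search(r"<aaa>(.*?)</aaa>", text).group(1)
pairs = re.findall(r"\w+:\w+", body)
```

last_only holds only "key2:val2", while pairs holds all three key:value strings.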
I uploaded this one at parsemarket, and I'm not sure it's what you are looking for, but maybe something like this:
(<aaa>)((\w+:\w+\s)*(\w+:\w+)*)(<\/aaa>)
AFAIK, unless you know how many k:v pairs are in the tags, you can't capture all of them in one regex. So, if there are only three, you could do something like this:
<(?:aaa)>(\w+:\w+\s*)+(\w+:\w+\s*)+(\w+:\w+\s*)+<(?:\/aaa)>
But I would think you would want to do some sort of loop with whatever language you are using. Or, as some of the comments suggest, use the parser classes in the language. I've used BeautifulSoup in Python for HTML.

If duplicate within brackets, delete one of the lines

Hi, I have a long list of items (~6k) that comes in this format:
'Entry': ['Entry'],
What I want to do is this: if the words within the first bracket match, i.e.:
'ACT': ['KOSOV'],
'ACT': ['STIG'],
I want it to leave only one of the entries. It doesn't matter which one (the first, the second or whatever); I just need it to leave one of them.
If possible, I would like to accomplish that in Sublime or Notepad++ using regexp; if there is no way, then do whatever you think is best to solve this.
UPD: The AWK command did the job indeed, thank you
You can't solve this using just regular expressions. You either need to remember all entries you've seen so far while scanning the text (would require writing a small utility program, probably), or you could sort the entries and then remove any repeated entries.
If you have a sorted file, then you can solve it using a regular expression, such as this one:
^(([^:]+):.+\n)(?:\2.+\n)+
Replace with \1.
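Both options can be sketched in Python. The first function is the "remember what you've seen" utility, keyed on the text before the first colon; the second applies the regex above to sorted input. The sample entries (including the 'BOB' line) and the function name are made up for illustration.

```python
import re

def drop_duplicate_keys(lines):
    """Keep the first line seen for each key (the text before the first colon)."""
    seen = set()
    kept = []
    for line in lines:
        key = line.split(":", 1)[0].strip()
        if key not in seen:
            seen.add(key)
            kept.append(line)
    return kept

# Hypothetical sample in the question's format.
entries = ["'ACT': ['KOSOV'],", "'BOB': ['X'],", "'ACT': ['STIG'],"]
deduped = drop_duplicate_keys(entries)

# Regex variant: only works once lines with the same key are adjacent,
# hence the sort. MULTILINE lets ^ match at the start of every line.
sorted_text = "".join(line + "\n" for line in sorted(entries))
collapsed = re.sub(r"^(([^:]+):.+\n)(?:\2.+\n)+", r"\1",
                   sorted_text, flags=re.MULTILINE)
```

The set-based version keeps the original order of the file, while the regex version needs the sort first so that duplicates sit on consecutive lines.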

Regex, how to select all items outside of selection group

I'm a Regex noob and am pretty sure I'm not going about this in the most efficient way - wanted to get some advice.
I have a Regex expression ((\w+\b.*?){100}){1} which selects the first 100 words of my string, the length of which varies.
What I want is to select the entire string except for the first 100 words.
Is there syntax I can add to my current expression to do this, or am I better off trying to directly select the rest of the text instead?
Also, if anyone has any good resources for improving my Regex knowledge, I'd be very appreciative. Thus far I've found http://gskinner.com/RegExr/ to be very helpful.
Thanks in advance!
If you use the pattern below, everything after the first 100 words is captured in group 3, referenced as $3.
This one will treat hyphenated words as one word.
(\w+(-\w+|\b).*?){100}(.*)
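To see what group 3 holds, here is a sketch in Python with the word count turned into a parameter so it can be tried on a short string. The function name is hypothetical, and the DOTALL flag is an addition so that the remainder can span line breaks.

```python
import re

def split_after_words(text, n):
    """Return (first n words, remainder) using the pattern above."""
    # Same pattern with {100} replaced by {n}; group 3 is the remainder.
    pattern = re.compile(r"(\w+(-\w+|\b).*?){%d}(.*)" % n, re.DOTALL)
    m = pattern.match(text)
    if m is None:
        return text, ""  # fewer than n words: nothing is left over
    return text[: m.start(3)], m.group(3)

head, rest = split_after_words("one two-three four five six", 3)
```

With n set to 3, head is "one two-three four" and rest is " five six" (the separator before the next word stays in the remainder).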

Need to create a gmail like search syntax; maybe using regular expressions?

I need to enhance the search functionality on a page listing user accounts. Rather than have multiple search boxes for each possible field, or a drop down menu where the user can only search against one field, I'd like a single search box and to use a gmail like syntax. That's the best way I can describe it, and what I mean by a gmail like search syntax is being able to type the following into the input box:
username:bbaggins type:admin "made up plc"
When the form is submitted, the search string should be split into its separate parts, which will allow me to construct a SQL query. So, for example, type:admin would form part of the WHERE clause so that it would find any record where the field type is equal to admin, and the same for username. The text in quotes may be a free text search, but I'm not sure on that yet.
I'm thinking that a regular expression or two would be the best way to do this, but that's something I'm really not good at. Can anyone help to construct a regular expression which could be used for this purpose? I've searched around for some pointers but either I don't know what to search for or it's not out there as I couldn't find anything obvious. Maybe if I understood regular expressions better it would be easier :-)
Cheers,
Adam
No, you would not use regular expressions for this. Just split the string on spaces in whatever language you're using.
You don't necessarily have to use a regex. Regexes are powerful, but in many cases also slow. Regex also does not handle nested parameters very well. It would be easier for you to write a script that uses string manipulation to split the string and extract the keywords and the field names.
If you want to experiment with Regex, try the online REGex tester. Find a tutorial and play around, it's fun, and you should quickly be able to produce useful regexes that find any words before or after a : character, or any sentences between " quotation marks.
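For what the string-manipulation route might look like, here is a minimal sketch in Python. It uses the standard shlex module to split on spaces while keeping quoted phrases together, then splits each token on the first colon. The function name and the returned structure are assumptions made for illustration.

```python
import shlex

def parse_search(query):
    """Split a query into field filters and free-text terms."""
    filters = {}
    free_text = []
    # shlex.split keeps "quoted phrases" together as a single token.
    for token in shlex.split(query):
        field, sep, value = token.partition(":")
        if sep and field and value:
            filters[field] = value
        else:
            free_text.append(token)
    return filters, free_text

filters, free_text = parse_search('username:bbaggins type:admin "made up plc"')
```

filters ends up mapping username to bbaggins and type to admin, and free_text holds the single phrase "made up plc". Note that shlex.split raises ValueError on an unclosed quote. When building the SQL, field names should be checked against a list of known columns and the values passed as query parameters.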
Thanks for the answers... I did start doing it without regex and just wondered if a regex would be simpler. Sounds like it wouldn't though, so I'll go back to the way I was doing it and test it again.
Good old Mr Bilbo is my go-to guy for any naming needs :-)
Cheers,
Adam