How can I find a string in a cell containing a link? - regex

I have some cells in openoffice calc which contain links/URLs. They display, of course, in calc as text, and hovering the mouse shows the URL. Clicking on those cells brings up the URL referenced.
I want to match a string in the displayed text. The below shows the spreadsheet:
spreadsheet
Cell A1 contains the string searched for.
Cells A4:A7 contain the links/URLs.
Cells B4:B7 are copies of A4:A7 but with Default format to remove the link/URLs. Cell B3 contains my match formula, which successfully finds the string in B4:B7.
I've tried the following in cell A3 to find the string in A4:A7
`=MATCH("^"&A1&".*";B4:B7;0)` #only works on the default formatted cells.
`=MATCH(".*"&A1&".*";A4:A7;0)` #
`=MATCH(A1&".*";A4:A7;0)` #
`=MATCH(A1;A4:A7;0)` #
Also, tried several other regular expressions, none of which work. Yes, I'm rusty on regex's, but what am I doing wrong? Or, is the literal string actually not present in the search field unless I change the format?

All the problems with the searches were caused by the fact that
'Search criteria = and <> must apply to whole cells'
was enabled in Tools->Options->Openoffice Calc->Calculate.
Turning this setting off makes everything work as advertised. The clue was that the regex ".*"&A1&".*", which of course matches a full line of plain text, worked with the range B4:B7.
The simplest solution is the expression:
=MATCH(""&A1;A4:A7;0) # "" invoked to trigger regex

Related

How do I use regex to return text following specific prefixes?

I'm using an application called Firemon which uses regex to pull text out of various fields. I'm unsure what specific version of regex it uses, I can't find a reference to this in the documentation.
My raw text will always be in the following format:
CM: 12345
APP: App Name
BZU: Dept Name
REQ: First Last
JST: Text text text text.
CM will always be an integer, JST will be sentence that may span multiple lines, and the other fields will be strings that consist of 1-2 words - and there's always a return after each section.
The application, Firemon, has me create a regex entry for each field. Something simple that looks for each prefix and then a return should work, because I return after each value. I've tried several variations, such as "BZU:\s*(.*)", but can't seem to find something that works.
EDIT: To be clear I'm trying to get the value after each prefix. Firemon has a section for each field. "APP" for example is a field. I need a regex example to find "APP:" and return the text after it. So something as simple as regex that identifies "APP:", and grabs everything after the : and before the return would probably work.
You can use (?=\w+ )(.*)
Positive lookahead will remove prefix and space character from match groups and you will in each match get text after space.
I am a little late to the game, but maybe this is still an issue.
In the more recent versions of FireMon, sample regexes are provided. For instance:
jst:\s*([^;]?)\s;
will match on:
jst:anything in here;
and result in
anything in here

Trying to select all text except for Twitter handle in Regex

I've exhausted everything I could find and just can't seem to get this to work. I have a .txt with rows of Twitter posts and I'm trying to delete everything but the #handles mentioned in the text.
For example:
Row1: This is the text of the tweet #Handle1
Row2: This text is meant for #Handle2 and #Handle3
Would result in:
Row1: #Handle1
Row2: #Handle2 #Handle3
I've come up with a regex expression to select the handles as: #[^\W]*
That works for all the handles in the set even if they have a colon or period immediately after them without a space (happens often).
I tried adding the negative lookahead command to it: (?!(#[^\W]*))
But I don't really know what else to add to make it work?
Thanks!
So you can loop through each row, and scan for the twitter handles.
For example,
str = "This text is meant for #Handle2 and #Handle3"
str.scan(/#\w+/).to_a #=> ["#Handle2", "#Handle3"]
Then you can manipulate the array however you want.
the \w is any alphanumeric and underscore character, you can modify that if you need any other characters.

kimonolabs >Text before comma

I'm trying to scrape a piece of text from a website using Kimonolabs. The text is succesfully scraped using the advanced setting:
div > div > ul > li.location > span.value
The text being scraped using this CSS selector is:
Cityname, streetname 1
However, I wish to delete everything before the comma so that only remains:
Cityname
I wish to do this with regex, but I'm totally ignorant about it. What I do konw is that it has to containof 3 blocks when using Kimonolabs: https://help.kimonolabs.com/hc/en-us/articles/203043464-Manually-input-regular-expressions
Can anybody help me setting up the correct regex? All I got so far is the following, but it's not the correct markup for Kimonolabs (it doesn't allow for it in the dashboard):
^(.+?),
See the docs you referred to:
The regular expression pattern in kimono is defined in three parts. It's important that any custom regular expression you produce retains the three part notation, with the surrounding ( ) for each part. The first part refers to the pattern to the left of the desired content. The middle part refers to the pattern that the desired content must match and the third part refers to the pattern to the right of the desired content.
So, you seem to need:
/^()([^,]+)()/
Or, /(^)([^,]+)(,)/ (it should be equivalent), and the 2nd capture group (the middle part) should capture the Cityname.

RegEX: Matching everything but a specific value

How do i match everything in an html response but this piece of text
"signed_request" value="The signed_request is placed here"
The fast solution is:
^(.*?)"signed_request" value="The signed_request is placed here"(.*)$
If value can be random text you could do:
^(.*?)"signed_request" value="[^"]*"(.*)$
This will generate two groups that.
If the result was not successful the text does not contain the word.
If the text contains the text more than once, it is only the first time that is ignored.
If you need to remove all instances of the text you can just as well use a replace string method.
But usually it is a bad idea to use regex on html.

Django custom template filter to highlight a column in a block of text

I'm rendering a list in an HTML template using {{ my_list | join:"<\br>"}} , and it appears as...
$GPGGA,062511,2816.8178,S,15322.3185,E,6,04,2.6,72.6,M,37.5,M,,*68
$GPGGA,062512,2816.8177,S,15322.3184,E,1,04,2.6,72.6,M,37.5,M,,*62
$GPGGA,062513,2816.8176,S,15322.3181,E,1,04,2.6,72.6,M,37.5,M,,*67
$GPGGA,062514,2816.8176,S,15322.3180,E,1,03,2.6,72.6,M,37.5,M,,*66
$GPGGA,062515,2816.8176,S,15322.3180,E,6,03,2.6,72.6,M,37.5,M,,*60
I am attempting to use regular expressions to insert the CSS at the 4th and 5th commas so I can highlight the text in this column, however I'm not able to figure out the expression to do this. Other methods to achieve this also appreciated.
Other info:
1) each line ends with a '\n'. Although this can be removed and the HTML display is unchanged, I've left it in for the regular expression to use if required.
2) The string will not always have a nice header such as '$GPGGA' in this example, although I could add one to help ID the start of the line if required by the regex.
3) The columns may not be a uniform number of characters as indicated in this example.
The filters I'm working on are as follows
#register.filter(is_safe=True)
def highight_start(text):
return re.sub('regex to find 4th comma in each line', ",<span class='my_highlight'>", text, flags=re.MULTILINE)
#register.filter(is_safe=True)
def highight_end(text):
return re.sub('regex to find 5th comma in each line', "</span>,", text, flags=re.MULTILINE)
Regards
You can achieve that by replacing the 5th value with the value itself wrapped in your <span> tags.
RegEx: ^((?:[\w\d\.\$]+,){4})([\d\.]+)
Replacement: \1<span class='my_highlight'>\2</span>
Explained demo here: http://regex101.com/r/cX5iA0
Note: I assumed the 5th value will be digits and dots
Thanks #ka, who got me ontrack with this solution. My working filter uses:
expression = '^((?:[^,]+,){4})([^,]+)'
replace = r'\g<1><span class="my_highlight">\g<2></span>'
#[^,] also allows matching of hidden HTML tags in the text
#To get the groups to insert back into the text and not be overwritten, they need to be referenced as indicated in 'replace'.