Trying to select all text except for Twitter handle in Regex - regex

I've exhausted everything I could find and just can't seem to get this to work. I have a .txt with rows of Twitter posts and I'm trying to delete everything but the #handles mentioned in the text.
For example:
Row1: This is the text of the tweet #Handle1
Row2: This text is meant for #Handle2 and #Handle3
Would result in:
Row1: #Handle1
Row2: #Handle2 #Handle3
I've come up with a regex expression to select the handles as: #[^\W]*
That works for all the handles in the set even if they have a colon or period immediately after them without a space (happens often).
I tried adding the negative lookahead command to it: (?!(#[^\W]*))
But I don't really know what else to add to make it work?
Thanks!

You can loop through each row and scan it for the Twitter handles.
For example,
str = "This text is meant for #Handle2 and #Handle3"
str.scan(/#\w+/) #=> ["#Handle2", "#Handle3"]
scan already returns an array, so you can manipulate the result however you want, for instance join it with spaces to rebuild the row.
\w matches any alphanumeric character or underscore, so the match stops at a trailing colon or period. Extend the character class if your handles can contain other characters.
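For comparison, here is a rough Python sketch of the same idea, where re.findall plays the role of Ruby's scan. It assumes the handles are marked with # exactly as in the rows above; keep_handles and the sample rows are hypothetical names and data used only for illustration.

```python
import re

# Same pattern as the answer: '#' followed by one or more word characters.
HANDLE = re.compile(r"#\w+")

def keep_handles(line: str) -> str:
    """Return only the handles found in a line, separated by single spaces."""
    return " ".join(HANDLE.findall(line))

rows = [
    "This is the text of the tweet #Handle1",
    "This text is meant for #Handle2 and #Handle3",
    "Trailing punctuation: #Handle4: and #Handle5.",
]

cleaned = [keep_handles(row) for row in rows]
for row in cleaned:
    print(row)
```

Collecting the matches and discarding the rest avoids the negative lookahead entirely, which is usually simpler than trying to match "everything that is not a handle".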

Related

Get an exact regex match of an email value from a list of email addresses

I have a text field which stores a list of email addresses, e.g. x@demo.com; a.x@demo.com. I have another text field which stores the exact value matched from the list of emails, i.e. if /x@demo.com/i is in x@demo.com;a.x@demo.com then it should return x@demo.com.
The issue I am having is that if I have /a.x@demo.com/i, I will get x@demo.com instead of a.x@demo.com.
I know of the regex /^x@demo.com$/i, but this means I can only have one email in my list of email addresses, which won't help.
I have tried a couple of other regexes with no luck.
Any ideas on how I can achieve this?
You can use this slightly changed regex:
/(^|;)x@demo\.com($|;)/i
It will match starting either at the beginning of the string or after a semicolon, and ending either at the end of the string or at a semicolon. The dot is escaped so that it matches only a literal period.
Edit:
A small change: this version uses lookbehind and lookahead, so the match contains only the address you want:
(?<=^|;)x@demo\.com(?=$|;)
Edit2:
To allow spaces around the semicolon and at the start and end, use this (@-quoted):
@"(?<=^\s*|;\s*)x@demo\.com(?=\s*$|\s*;)"
or use double escaping:
"(?<=^\\s*|;\\s*)x@demo\\.com(?=\\s*$|\\s*;)"

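A minimal Python sketch of the same exact-match idea, assuming the addresses use the usual @ sign and are separated by semicolons with optional spaces. Python's re module only accepts fixed-width lookbehind, so this version swaps the lookbehind for a non-capturing group plus a capture group; find_exact_email is a hypothetical helper name.

```python
import re

def find_exact_email(address: str, email_list: str):
    """Return the list entry equal to `address` (case-insensitive), or None."""
    pattern = re.compile(
        # Start of string or a semicolon, optional spaces, the escaped
        # address, optional spaces, then a semicolon or the end of the string.
        r"(?:^|;)\s*(" + re.escape(address) + r")\s*(?=;|$)",
        re.IGNORECASE,
    )
    match = pattern.search(email_list)
    return match.group(1) if match else None

emails = "x@demo.com; a.x@demo.com"
first = find_exact_email("x@demo.com", emails)
second = find_exact_email("A.X@demo.com", emails)
missing = find_exact_email("demo.com", emails)
print(first, second, missing)
```

re.escape takes care of the dots in the address, so a pattern built from user input cannot match unintended characters.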
How do I use regex to return text following specific prefixes?

I'm using an application called Firemon which uses regex to pull text out of various fields. I'm unsure what specific version of regex it uses, I can't find a reference to this in the documentation.
My raw text will always be in the following format:
CM: 12345
APP: App Name
BZU: Dept Name
REQ: First Last
JST: Text text text text.
CM will always be an integer, JST will be a sentence that may span multiple lines, and the other fields will be strings of 1-2 words. There is always a return after each section.
The application, Firemon, has me create a regex entry for each field. Something simple that looks for each prefix and then a return should work, because I return after each value. I've tried several variations, such as "BZU:\s*(.*)", but can't seem to find something that works.
EDIT: To be clear I'm trying to get the value after each prefix. Firemon has a section for each field. "APP" for example is a field. I need a regex example to find "APP:" and return the text after it. So something as simple as regex that identifies "APP:", and grabs everything after the : and before the return would probably work.
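You can use (?=\w+ )(.*)
The lookahead is zero-width: it only succeeds where a word is immediately followed by a space, so the match starts at the first word of the value rather than at the prefix (the colon after the prefix stops \w+ from matching there). Note that this pattern is not tied to a particular prefix and will not match a single-word value such as 12345.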
I am a little late to the game, but maybe this is still an issue.
In the more recent versions of FireMon, sample regexes are provided. For instance:
jst:\s*([^;]*?)\s*;
will match on:
jst:anything in here;
and result in
anything in here
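As a hedged illustration of the "find the prefix, capture what follows" approach from the question, here is a Python sketch. It does not claim to reflect FireMon's regex engine, which the thread leaves unspecified; field and justification are hypothetical helpers, the sample text is made up in the format described above, and the JST pattern assumes JST is the last field. The final lines try the sample pattern written with * quantifiers, since that is the form that produces the stated result.

```python
import re

raw = """CM: 12345
APP: App Name
BZU: Dept Name
REQ: First Last
JST: Text text text text.
More justification text."""

def field(prefix: str, text: str):
    """Capture the rest of the line that starts with `prefix:`."""
    pattern = r"^%s:[ \t]*(.*)$" % re.escape(prefix)
    match = re.search(pattern, text, re.MULTILINE)
    return match.group(1) if match else None

def justification(text: str):
    """Capture everything after `JST:` up to the end of the text."""
    match = re.search(r"^JST:[ \t]*(.*)\Z", text, re.MULTILINE | re.DOTALL)
    return match.group(1) if match else None

print(field("CM", raw))
print(field("APP", raw))
print(justification(raw))

# Semicolon-terminated form, as in the sample pattern.
sample = re.search(r"jst:\s*([^;]*?)\s*;", "jst:anything in here;")
print(sample.group(1))
```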

Oracle - Search for text - Retrieve snippet of result

I'm currently building a simple search page in Node JS Express and Oracle.
I'd like to show the user a snippet of the matching text (the first instance would do) to add a bit of context about what the SQL found.
Example:
Search term: 'fish'
Results: Henry really likes going fishing, and once he caug ...
I'm not sure the best way to approach this - I could retrieve the whole block of text and do it in Node JS, but I don't really like the idea of dragging the whole text across to the app, just to get a snippet.
I've been thinking that REGEXP_SUBSTR could be a way to do it, but I'm not sure whether I could use a regular expression to retrieve a given number of characters before and after the matching word.
Have I got the right idea or am I going about it in the wrong way?
Thanks
SELECT text
, REGEXP_SUBSTR(LOWER(text), LOWER('fish')) AS potential_snippet
FROM table
WHERE LOWER(text) LIKE LOWER('%fish%');
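Try this:
SELECT text
, SUBSTR(text, GREATEST(INSTR(LOWER(text), 'fish', 1) - 50, 1), 100) AS snippet
FROM test
WHERE INSTR(LOWER(text), 'fish', 1) <> 0;
Play with the offset and length numbers (50 and 100 in my example) to limit the length of the string. GREATEST keeps the start position at 1 or later: without it, a match near the start of the text would give SUBSTR a zero or negative position, and Oracle counts a negative position backwards from the end of the string.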
If you need to extract some context with the help of JavaScript, you can use limiting quantifiers in a regex:
/\b.{0,15}fish.{0,15}\b/i
Here,
\b - matches at the word boundary (so that the context contains only whole words)
.{0,15} - 0 to 15 characters other than a newline (replace . with [\s\S] or [^] if you need to include newlines)
fish - the keyword
The /i modifier enables case-insensitive search.
If you need to build the regex dynamically, use the constructor notation (escape any regex metacharacters in keyword first):
RegExp("\\b.{0,15}" + keyword + ".{0,15}\\b", "i");
Also, if you need to find multiple matches, use the g modifier alongside the i.
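The limiting-quantifier pattern can be tried out with a short Python sketch. The snippet function, its context parameter, and the sample sentence are assumptions made for illustration; the pattern itself is the one from the answer, with the keyword escaped before it is spliced in.

```python
import re

def snippet(text: str, keyword: str, context: int = 15):
    """Return up to `context` characters on each side of the first match."""
    pattern = re.compile(
        r"\b.{0,%d}%s.{0,%d}\b" % (context, re.escape(keyword), context),
        re.IGNORECASE,
    )
    match = pattern.search(text)
    # Strip the whitespace that the word boundaries can leave at the edges.
    return match.group(0).strip() if match else None

text = "Henry really likes going fishing, and once he caught a very large trout."
print(snippet(text, "fish"))
print(snippet(text, "shark"))
```

Because the word boundaries force the context to start and end on whole words, the snippet can be shorter than the full 15 characters on either side.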

RegEX: Matching everything but a specific value

How do I match everything in an HTML response except this piece of text:
"signed_request" value="The signed_request is placed here"
The fast solution is:
^(.*?)"signed_request" value="The signed_request is placed here"(.*)$
If the value can be arbitrary text, you could do:
^(.*?)"signed_request" value="[^"]*"(.*)$
This will generate two groups: the text before the match and the text after it. Note that . does not match line breaks by default in most regex flavors, so enable single-line (dotall) mode if the response spans several lines.
If the match fails, the text does not contain the string.
If the text contains the string more than once, only the first occurrence is excluded.
If you need to remove all instances of the text, you can just as well use a string replace method.
But usually it is a bad idea to use regex on html.
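A small Python sketch of both variants, assuming a made-up HTML fragment (the html string below is hypothetical, not taken from the question). It shows the two capture groups and, as the alternative mentioned above, a plain substitution that removes every occurrence.

```python
import re

# Hypothetical response fragment used only for illustration.
html = '<input type="hidden" name="signed_request" value="abc123" />\n<p>rest of the page</p>'

# re.DOTALL lets '.' cross line breaks so the groups cover the whole response.
pattern = re.compile(r'^(.*?)"signed_request" value="[^"]*"(.*)$', re.DOTALL)
match = pattern.match(html)
before, after = match.group(1), match.group(2)

# Alternative: drop every occurrence with a substitution.
without_all = re.sub(r'"signed_request" value="[^"]*"', "", html)

print(before + after)
print(without_all)
```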

Using regexp with an html string to extract text

I have the following html string:
F.V.Adamian, G.G.Akopian
I want to form a single plain text string with the author names so that it looks something like (I can fine tune the punctuation later):
F.V.Adamian, G.G.Akopian.
I'm trying to use 'regexp' in Matlab. When I do the following:
regexpi(htmlstring,'">.*</a>','match')
I get:
">F.V.Adamian</a>, G.G.Akopian,
Why? I'm trying to get it to continuously output (hence I did not use the 'once' operator) all characters between "> and </a>, which is the author's name. It works fine for the first one but not for the second. I am happy to truncate the "> and </a> with a regexprep(regexpstring,'','') later.
I see that regexprep(htmlstr, '<.*?>','') works and does what I want. But I don't get it...
In .*? the ? tells .* to be lazy as opposed to greedy. By default, .* will try to match as much as it can, which is why the first pattern ran from the first "> all the way to the last </a>. When you add the ?, it matches as little as it can instead.
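The difference is easy to see side by side. This is a Python sketch rather than MATLAB, and the anchor markup in html is a hypothetical stand-in, since the original HTML string is not preserved here.

```python
import re

# Hypothetical markup: two author links separated by a comma.
html = '<a href="/authors/1">F.V.Adamian</a>, <a href="/authors/2">G.G.Akopian</a>'

# Greedy: .* runs to the LAST </a>, so both authors land in one match.
greedy = re.findall(r'">.*</a>', html)

# Lazy: .*? stops at the FIRST </a> it can, giving one match per author.
lazy = re.findall(r'">(.*?)</a>', html)

# Stripping every tag lazily leaves just the plain author list.
stripped = re.sub(r"<.*?>", "", html)

print(greedy)
print(lazy)
print(stripped)
```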