Regex: Matching everything but a specific value

How do I match everything in an HTML response except this piece of text:
"signed_request" value="The signed_request is placed here"

The fast solution is:
^(.*?)"signed_request" value="The signed_request is placed here"(.*)$
If value can be random text you could do:
^(.*?)"signed_request" value="[^"]*"(.*)$
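This produces two capture groups: the text before the excluded piece and the text after it.
If the match fails, the text does not contain the excluded string.
If the text contains it more than once, only the first occurrence is excluded.
Note that . does not match newlines by default, so for a multi-line HTML response you need to enable single-line (dotall) mode.
If you need to remove all instances of the text, you can just as well use a string replace method.
In general, though, it is a bad idea to use regex on HTML.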
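A minimal sketch in Python (the answer does not name a language), using a small made-up HTML snippet. It applies the random-value pattern with re.DOTALL so that . can cross the line breaks in the response, joins the two groups, and shows re.sub as the regex counterpart of a plain string replace for removing every occurrence.

```python
import re

# Hypothetical multi-line HTML response, used only for illustration.
html = (
    '<input name="a" value="1">\n'
    '<input name="signed_request" value="abc123">\n'
    '<p>tail</p>'
)

PATTERN = r'^(.*?)"signed_request" value="[^"]*"(.*)$'

# Without DOTALL, '.' stops at '\n', so the match fails on multi-line input.
no_dotall = re.match(PATTERN, html)

# With DOTALL, group 1 is everything before the first occurrence,
# group 2 everything after it.
m = re.match(PATTERN, html, re.DOTALL)
before, after = m.group(1), m.group(2)
everything_but = before + after

# Removing all occurrences: re.sub is the regex version of a string replace.
removed_all = re.sub(r'"signed_request" value="[^"]*"', "", html)

print(repr(everything_but))
```

With a single occurrence both approaches give the same result; with several, only re.sub removes them all.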

Related

How do I use regex to return text following specific prefixes?

I'm using an application called Firemon, which uses regex to pull text out of various fields. I'm unsure which regex flavor it uses; I can't find a reference to this in the documentation.
My raw text will always be in the following format:
CM: 12345
APP: App Name
BZU: Dept Name
REQ: First Last
JST: Text text text text.
CM is always an integer, JST is a sentence that may span multiple lines, and the other fields are strings of one or two words. There is always a line break after each section.
The application, Firemon, has me create a regex entry for each field. Something simple that looks for each prefix and then a line break should work, because each value is followed by a line break. I've tried several variations, such as "BZU:\s*(.*)", but can't seem to find something that works.
EDIT: To be clear, I'm trying to get the value after each prefix. Firemon has a section for each field; "APP", for example, is a field. I need a regex that finds "APP:" and returns the text after it, so something as simple as a regex that identifies "APP:" and grabs everything after the colon and before the line break would probably work.
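You can use (?=\w+ )(.*)
The positive lookahead is zero-width, so it does not consume anything by itself. The prefix is left out because a match can only start at a word that is followed by a space, and a prefix such as APP is followed by a colon. Each match therefore captures the text from the first word after the prefix to the end of the line. Note that a single-word value such as CM's 12345 has no space after it, so that line is not matched.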
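A sketch in Python comparing the lookahead answer with a per-field pattern, assuming FireMon's regex flavor behaves like Python's re for these constructs (the question says the flavor is unknown). field_value is a hypothetical helper: it anchors PREFIX: at the start of a line with re.MULTILINE and captures the rest of that line. It does not handle a JST value that continues onto further lines.

```python
import re
from typing import Optional

# Sample record from the question.
RAW = "CM: 12345\nAPP: App Name\nBZU: Dept Name\nREQ: First Last\nJST: Text text text text."

def field_value(text: str, prefix: str) -> Optional[str]:
    """Return the text after 'PREFIX:' on its line, or None if absent."""
    # ^ with re.MULTILINE anchors at each line start; '.' stops at '\n',
    # so (.*) captures only the rest of that line.
    pattern = rf"^{re.escape(prefix)}:[ \t]*(.*)$"
    m = re.search(pattern, text, re.MULTILINE)
    return m.group(1) if m else None

# The lookahead answer: each match starts at the first word followed by a space.
lookahead_values = re.findall(r"(?=\w+ )(.*)", RAW)

for name in ("CM", "APP", "BZU", "REQ", "JST"):
    print(name, "->", field_value(RAW, name))
print(lookahead_values)  # the CM value is missing: "12345" has no space after it
```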
I am a little late to the game, but maybe this is still an issue.
In more recent versions of FireMon, sample regexes are provided. For instance:
jst:\s*([^;]*?)\s*;
will match on:
jst:anything in here;
and result in
anything in here
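A quick check in Python of the sample pattern, written here as jst:\s*([^;]*?)\s*; (assumption: the asterisks are part of the original FireMon sample). The lazy group plus the trailing \s*; trims whitespace before the semicolon. The pattern is lowercase, so it would not match the question's JST: unless matching is case-insensitive.

```python
import re

# Sample FireMon-style pattern (assumed form): capture up to ';', trimming spaces.
JST = re.compile(r"jst:\s*([^;]*?)\s*;")

plain = JST.search("jst:anything in here;").group(1)
padded = JST.search("jst:   padded value   ;").group(1)

# Case-sensitive by default, so the uppercase prefix is not found...
upper = JST.search("JST: Text text;")
# ...unless the same pattern is applied with IGNORECASE.
upper_ci = re.search(JST.pattern, "JST: Text text;", re.IGNORECASE).group(1)

print(plain, "|", padded, "|", upper, "|", upper_ci)
```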

Trying to select all text except for Twitter handle in Regex

I've exhausted everything I could find and just can't seem to get this to work. I have a .txt with rows of Twitter posts and I'm trying to delete everything but the #handles mentioned in the text.
For example:
Row1: This is the text of the tweet #Handle1
Row2: This text is meant for #Handle2 and #Handle3
Would result in:
Row1: #Handle1
Row2: #Handle2 #Handle3
I've come up with a regex that selects the handles: #[^\W]*
That works for all the handles in the set even if they have a colon or period immediately after them without a space (happens often).
I tried adding a negative lookahead to it: (?!(#[^\W]*))
but I don't know what else to add to make it work.
Thanks!
You can loop through each row and scan it for the Twitter handles.
For example,
str = "This text is meant for #Handle2 and #Handle3"
str.scan(/#\w+/) #=> ["#Handle2", "#Handle3"]
Then you can manipulate the array however you want.
\w matches any letter, digit, or underscore; you can modify the pattern if you need other characters.
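The same idea sketched in Python, swapping Ruby's String#scan for re.findall and assuming the rows are already loaded as strings (reading them from the .txt file is left out). keep_handles is a hypothetical helper that reduces each row to its handles joined by spaces; \w+ stops before a trailing colon or period.

```python
import re

# Rows from the question (the "RowN:" labels are added back when printing).
rows = [
    "This is the text of the tweet #Handle1",
    "This text is meant for #Handle2 and #Handle3",
]

HANDLE = re.compile(r"#\w+")  # '#' followed by letters, digits or underscores

def keep_handles(row: str) -> str:
    """Drop everything except the #handles, separated by single spaces."""
    return " ".join(HANDLE.findall(row))

for i, row in enumerate(rows, start=1):
    print(f"Row{i}: {keep_handles(row)}")
# Row1: #Handle1
# Row2: #Handle2 #Handle3
```

One difference from Ruby: in Python 3, \w also matches non-ASCII letters by default.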

How to extract FirstName and LastName from html tags with regex?

I have a response body that contains
"<h3 class="panel-title">Welcome
First Last </h3>"
I want to fetch 'First Last' as the output.
The regular expressions I have tried are:
"Welcome(\s*([A-Za-z]+))(\s*([A-Za-z]+))"
"Welcome \s*([A-Za-z]+)\s*([A-Za-z]+)"
But I'm not able to get the result. If I remove the newline and use
"<h3 class="panel-title">Welcome First Last </h3>", the online regex tester finds a match.
I suspect your problem is the line break between "Welcome" and the user name. With the "single-line mode" flag (?s) in your regex, . also matches newline characters. Try these:
(?s)Welcome(\s*([A-Za-z]+))(\s*([A-Za-z]+))
(?s)Welcome \s*([A-Za-z]+)\s*([A-Za-z]+)
(This works in JMeter and any other Java- or PHP-based regex engine, but not in JavaScript. In the comments on the question you say you're using JavaScript and also JMeter; if it is a JMeter question, this will help. If it is JavaScript, try one of the other answers.)
I usually don't recommend regex for this kind of work; DOM manipulation is the better tool.
But you can use the following regex to extract the text:
/(?:<h3.*?>)([^<]+)(?:<\/h3>)/i
See demo at https://regex101.com/r/wA2sZ9/1
This captures everything between the <h3> tags: the "Welcome" greeting plus the first and last names, including the extra whitespace, so you still need to strip the greeting and the spaces.
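A Python sketch of both routes, using the response body from the question. For the DOM side it swaps in the standard-library html.parser rather than a browser DOM; H3TextCollector is a hypothetical helper. Either way the captured text still starts with "Welcome", which is dropped here by splitting on whitespace.

```python
import re
from html.parser import HTMLParser
from typing import List

body = '<h3 class="panel-title">Welcome\nFirst Last </h3>'

# Regex route: the pattern from the answer captures all text between the tags.
captured = re.search(r"(?:<h3.*?>)([^<]+)(?:<\/h3>)", body, re.IGNORECASE).group(1)

class H3TextCollector(HTMLParser):
    """Hypothetical helper that collects the text inside each <h3> element."""

    def __init__(self) -> None:
        super().__init__()
        self._in_h3 = False
        self.texts: List[str] = []

    def handle_starttag(self, tag, attrs):
        if tag == "h3":
            self._in_h3 = True
            self.texts.append("")

    def handle_endtag(self, tag):
        if tag == "h3":
            self._in_h3 = False

    def handle_data(self, data):
        # Data may arrive in several chunks, so append rather than assign.
        if self._in_h3:
            self.texts[-1] += data

parser = H3TextCollector()
parser.feed(body)
parser.close()

# Split on any whitespace (including the newline) and drop the greeting.
name = " ".join(parser.texts[0].split()[1:])
print(repr(captured), "->", name)
```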
In the JMeter Regular Expression Extractor you can use:
<h3 class="panel-title">Welcome(.*?)</h3>
Then take the value using the template $1$.
In the data you've shown, "Welcome" is followed by a line break. If that is actually part of the response, you have to use \n:
<h3 class="panel-title">Welcome\n(.*?)</h3>
Otherwise, the first pattern is enough.
First verify this in JMeter by running the regular expression tester against the response body.
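A Python check of the two extractor patterns against the body from the question. It assumes JMeter's extractor treats . and \n the way Python's re does here, i.e. . does not match a newline unless single-line (DOTALL) mode is on. If the response uses Windows line endings, \r?\n is the more tolerant choice.

```python
import re

body = '<h3 class="panel-title">Welcome\nFirst Last </h3>'

# '.' cannot cross the newline after "Welcome", so this finds nothing.
without_newline = re.search(r'<h3 class="panel-title">Welcome(.*?)</h3>', body)

# Matching the newline explicitly lets (.*?) capture the name line.
with_newline = re.search(r'<h3 class="panel-title">Welcome\n(.*?)</h3>', body)

# Equivalent of the $1$ template: the first capture group.
value = with_newline.group(1) if with_newline else None
print(without_newline, repr(value))
```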
Welcome([\s\S]+?)<
Try this: [\s\S] matches any character, including newlines, so the lazy group captures everything after "Welcome" up to the next <.
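The same body run through this pattern in Python. [\s\S] is a character class that matches any character, including a newline, so no flag is needed; the captured group keeps the surrounding whitespace, which strip() removes.

```python
import re

body = '<h3 class="panel-title">Welcome\nFirst Last </h3>'

m = re.search(r"Welcome([\s\S]+?)<", body)
raw = m.group(1)      # lazy: stops at the first '<' after "Welcome"
name = raw.strip()    # trim the newline and trailing space
print(repr(raw), "->", name)
```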
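Regular expressions are greedy by default; try this:
Welcome\s*([A-Za-z]+)\s*([A-Za-z]+)
Groups 1 and 2 contain your data. Unlike your second attempt, there is no literal space after Welcome, so \s* can match the line break that follows it.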
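A Python check of this pattern against the question's body, alongside the asker's second attempt. The only difference is the literal space after Welcome: in the sample the next character is a newline, which \s* matches but a literal space does not.

```python
import re

body = '<h3 class="panel-title">Welcome\nFirst Last </h3>'

answer = re.search(r"Welcome\s*([A-Za-z]+)\s*([A-Za-z]+)", body)
first, last = answer.group(1), answer.group(2)

# The asker's second pattern requires a space right after "Welcome".
asker = re.search(r"Welcome \s*([A-Za-z]+)\s*([A-Za-z]+)", body)
print(first, last, asker)
```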

Contents within an attribute for both single and multiple ending tags

How can I fetch the contents of the value attribute of the tags below, across files?
<h:graphicImage .... value="*1.png*" ...../>
<h:graphicImage .... value="*2.png*" ....>...</h:graphicImage>
My regular expression search should return:
1.png
2.png
All I could find was a solution for tags with a separate closing tag, but what about self-closing tags?
Use an XML parser instead; regex cannot truly parse XML properly unless you know the input will always follow a particular form.
However, here is a regex you can use to extract the value attribute of h:graphicImage tags, but read the caveats after:
<h:graphicImage[^>]+value="\*(.*?)\*"
and the 1.png or 2.png will be in the first captured group.
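A quick Python run of this regex over two sample tags, assuming the "...." placeholders in the question stand for other attributes (the id and alt values here are made up). findall returns the first capture group of each match.

```python
import re

# Sample tags from the question, with the "...." placeholders filled in (assumption).
tags = '''<h:graphicImage id="a" value="*1.png*" alt=""/>
<h:graphicImage id="b" value="*2.png*">caption</h:graphicImage>'''

# Group 1 is the text between the asterisks inside value="..."
pattern = re.compile(r'<h:graphicImage[^>]+value="\*(.*?)\*"')
print(pattern.findall(tags))  # ['1.png', '2.png']
```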
Caveats:
I have assumed that your 1.png, 2.png, etc. are always surrounded by asterisks, since that is how they appear in your question (that is what the \* is for).
This regex will fail if one of the attributes contains a ">" character, for example:
<h:graphicImage foo=">" value="*1.png*"
This is what I mentioned before about regex never being able to parse XML properly.
You could work around this by adjusting your regex:
<h:graphicImage.+?value="\*(.*?)\*"
But this means that if you had <h:graphicImage /><foo value="*1.png*"> then the 1.png from the foo tag is extracted, when you only want to extract from the graphicImage tag.
Again, regex will always have issues with corner cases in XML, so you need to adjust it to your application (for example, if you know that only the graphicImage tag will ever have a "value" attribute, then the second pattern may be better than the first).
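A Python sketch of the trade-off in the caveats, with the second regex written as a plain lazy .+?: the [^>]+ version misses a tag whose attribute contains >, while the .+? version picks up a value from a different tag. The last part follows the advice to use an XML parser, via xml.etree.ElementTree; it assumes the fragments are wrapped in a root element that declares the h: prefix, and the namespace URI urn:example:h is made up for illustration (real JSF pages declare their own).

```python
import re
import xml.etree.ElementTree as ET

STRICT = re.compile(r'<h:graphicImage[^>]+value="\*(.*?)\*"')  # stops at any '>'
LOOSE = re.compile(r'<h:graphicImage.+?value="\*(.*?)\*"')     # may run past the tag

gt_in_attribute = '<h:graphicImage foo=">" value="*1.png*"/>'
value_on_other_tag = '<h:graphicImage /><foo value="*1.png*">'

print(STRICT.findall(gt_in_attribute), STRICT.findall(value_on_other_tag))  # [] []
print(LOOSE.findall(gt_in_attribute), LOOSE.findall(value_on_other_tag))    # ['1.png'] ['1.png']

# Parser route: the namespace URI is hypothetical; it only binds the h: prefix.
doc = (
    '<root xmlns:h="urn:example:h">'
    '<h:graphicImage foo=">" value="*1.png*"/>'
    '<h:graphicImage value="*2.png*">caption</h:graphicImage>'
    '<foo value="*3.png*"/>'
    '</root>'
)
root = ET.fromstring(doc)
values = [
    img.get("value").strip("*")  # drop the surrounding asterisks
    for img in root.iter("{urn:example:h}graphicImage")
]
print(values)  # ['1.png', '2.png']
```

The parser handles both corner cases: the > inside an attribute is just attribute text, and the value on the foo element is ignored because only graphicImage elements are visited.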

Using regexp with an html string to extract text

I have the following html string:
F.V.Adamian, G.G.Akopian
I want to form a single plain-text string with the author names so that it looks something like this (I can fine-tune the punctuation later):
F.V.Adamian, G.G.Akopian.
I'm trying to use 'regexp' in Matlab. When I do the following:
regexpi(htmlstring,'">.*</a>','match')
I get:
">F.V.Adamian</a>, G.G.Akopian,
Why? I'm trying to get it to output, repeatedly (hence I did not use the 'once' option), all characters between "> and </a>, which is the author's name. It works fine for the first one but not for the second. I am happy to strip the "> and </a> with a regexprep(regexpstring,'','') later.
I see that regexprep(htmlstr, '<.*?>','') works and does what I want, but I don't understand why.
In .*? the ? tells the .* to be lazy instead of greedy. By default, .* matches as much as it can, so '">.*</a>' runs from the first "> to the last </a> in the string. When you add the ?, it matches as little as it can, so '<.*?>' stops at the first > and removes one tag per match.
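A Python illustration of greedy versus lazy matching; MATLAB's regexp and regexprep support the same *? quantifier. The question's original HTML was lost, so the anchor tags and href values below are made up.

```python
import re

# Hypothetical reconstruction of the author-list markup.
html = '<a href="#1">F.V.Adamian</a>, <a href="#2">G.G.Akopian</a>,'

greedy = re.findall(r'">.*</a>', html)   # runs to the LAST </a>
lazy = re.findall(r'">.*?</a>', html)    # stops at the FIRST </a> each time
plain_text = re.sub(r"<.*?>", "", html)   # removes one tag per match
greedy_strip = re.sub(r"<.*>", "", html)  # one match from first '<' to last '>'

print(greedy)
print(lazy)
print(plain_text)
print(greedy_strip)
```

The greedy tag-stripper swallows both names in a single match, which is why the lazy version is the one that works.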