Regex - return only first src url match

I'm trying to extract the first jpg url from a page of text. The page has multiple paragraphs with multiple urls in each, but I only want the first url/jpg: matching should stop as soon as the first one is returned.
Sample page:
this is some text and a url src="https://www.someurl.jpg" more text, more text, more text.
more text, more text
more text, more text.
this is some text and a url src="https://www.anotherurl.jpg" more text, more text, more text.
more text, more text.
Current code:
(?<=src=")(.*?)(?=")
This code returns both urls. I need the output to be just the first one it finds.
Required output:
https://www.someurl.jpg
Any help appreciated.

Your regex is fine. Wrap it in slashes to make it a JavaScript regex literal, and leave the g flag off:
/(?<=src=")(.*?)(?=")/
Without the g flag, match() stops at the first match and returns it at index 0. With g it returns every match, which is why you get both urls.
console.log(`sample page;
this is some text and a url src="https://www.so23123123123l.jpg" more text, more text, more text. more text, more text
more text, more text. this is some text and a url src="https://www.anotherurl.jpg" more text, more text, more text. more text, more text.`.match(/(?<=src=")(.*?)(?=")/)[0]);
You can read here about regexp flags.
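The same idea can be sketched in Python, where re.search() stops at the first match and re.findall() would return all of them. This is only an illustration: the function name first_src and the shortened sample page are assumptions made for the example, not part of the original question.

```python
import re

# Same lookbehind/lookahead pattern as in the question.
SRC_PATTERN = re.compile(r'(?<=src=")(.*?)(?=")')


def first_src(page):
    """Return the first src value found in page, or None if there is none."""
    match = SRC_PATTERN.search(page)  # search() stops at the first match
    return match.group(0) if match else None


# Shortened version of the sample page from the question.
sample_page = (
    'this is some text and a url src="https://www.someurl.jpg" more text.\n'
    'more text, more text\n'
    'this is some text and a url src="https://www.anotherurl.jpg" more text.\n'
)

print(first_src(sample_page))
```

Returning None for a page without any src attribute avoids an exception when nothing matches.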

Related

Can't find match with regex

Hi, I'm trying to find the lines that start with "CGK / WIII", but I can only find the first one.
What's wrong with my text? (It is rendered from a pdf file.)
Mytext
I am using Python with the invoice2data package to extract data from pdf invoices into a dataframe, and I get an error with the text rendered from one pdf file.
First I tried the regex \w{3}\s\/[\s\w{4}]* and found that it only matches 1 line.
Then I tried the fixed text "CGK / WIII", which should find 4 matches, but it does not.
I think there are font differences in my text, but I'm not sure.
When I turn on the global flag ("Don't return after the first match") in your linked example, it shows 4 matches.
Also, a quantifier like {4} does not work inside a character set (inside []); there the characters {, 4 and } are matched literally.
I'd do it like this:
\w{3}\s/\s\w{4}
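In Python the "global" behaviour corresponds to re.findall() (or re.finditer()), while re.search() returns only the first match. A minimal sketch, assuming a made-up four-line sample, since the real pdf text is not shown in the question:

```python
import re

# Suggested pattern: three word characters, " / ", four word characters.
ROUTE = re.compile(r"\w{3}\s/\s\w{4}")

# Hypothetical stand-in for the text rendered from the pdf.
sample = (
    "CGK / WIII GA100\n"
    "CGK / WIII GA200\n"
    "CGK / WIII GA300\n"
    "CGK / WIII GA400\n"
)

first_only = ROUTE.search(sample).group(0)  # stops at the first match
all_matches = ROUTE.findall(sample)         # collects every match

print(first_only)
print(len(all_matches))

# Inside a character set the braces are ordinary characters, not a quantifier:
# this set matches the literal text "{4}".
brace_is_literal = re.fullmatch(r"[\w{4}]+", "{4}") is not None
print(brace_is_literal)
```

If findall() still returns fewer matches on the real text, printing repr() of a failing line can reveal invisible characters such as non-breaking spaces left by the pdf rendering.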

Getting Only The Text and Not Any of the Trailing Blank Lines from a Text Widget in Python 2.7

I'm working with Python 2.7 and tkinter.
I have a text widget that I fill with lines of text where each line is terminated with a "\n" from a file. The text in the Text widget may be modified later.
Now I want to get only the text from the Text widget and ignore any trailing blank line that may be present. The get() method will get everything to the end of the Text widget including any trailing blank lines that may be present.
How can I get the text and not the trailing blank lines?
The text widget doesn't have a good way to filter out the results you get from get. You can get all but the last newline that is automatically added by tkinter by using the index end-1c, but if you have multiple blank lines at the end, the easiest way is to strip out the newlines after fetching the data:
data = the_widget.get("1.0", "end-1c").rstrip("\n")
Another way would be to use the search method going backwards from the end, with a regular expression that finds the last non-blank line (the first one found when searching backwards). You can then get all the text up to the end of that line:
index = the_widget.search('^.+', "end", backwards=True, regexp=True)
data = the_widget.get("1.0", "{} lineend".format(index)) if index else ""
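One difference between the two approaches is worth checking on a plain string: rstrip("\n") removes only newline characters, so a trailing line that contains spaces survives. The helper below is a sketch that works on the string returned by get("1.0", "end-1c"); the name drop_trailing_blank_lines is an assumption for illustration, and the code runs on Python 2.7 as well as Python 3.

```python
def drop_trailing_blank_lines(content):
    """Remove trailing lines that are empty or contain only whitespace."""
    lines = content.split("\n")
    # Pop lines from the end until a line with visible text is found.
    while lines and not lines[-1].strip():
        lines.pop()
    return "\n".join(lines)


# Hypothetical widget content: two text lines, then blank and space-only lines.
widget_content = "first line\n\nsecond line\n\n   \n\n"

print(repr(widget_content.rstrip("\n")))
print(repr(drop_trailing_blank_lines(widget_content)))
```

Blank lines in the middle of the text are kept in both cases; only the tail is trimmed.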

VBscript regular expression

There is a txt file containing multiple lines of the form Browser("something").page("something_else").webEdit("some").
I need to retrieve the names of the browser, page and field (the names surrounded by double quotes) and replace the line with "something_somethingelse_some" (concatenating the names of the browser, page and field respectively). Please help.
The names can be anything, so we should go with regex. Note that every line in this format must be converted, through to the end of the file.
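You may try this (the dots between the calls are escaped so that they match only a literal "."):
^Browser\("(.*?)"\)\.page\("(.*?)"\)\.webEdit\("(.*?)"\).*$
and replace by:
$1_$2_$3
In VBScript, set the RegExp object's Global and MultiLine properties to True so that ^ and $ match at every line and all lines are replaced.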
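The pattern and replacement can be tried out quickly in Python before porting them to VBScript. This is a sketch only: Python writes the backreferences as \1_\2_\3 instead of $1_$2_$3, and the function name join_names and the sample lines are assumptions for illustration.

```python
import re

# Same pattern as above; re.MULTILINE makes ^ and $ match at every line.
LINE_PATTERN = re.compile(
    r'^Browser\("(.*?)"\)\.page\("(.*?)"\)\.webEdit\("(.*?)"\).*$',
    re.MULTILINE,
)


def join_names(script_text):
    """Replace each matching line with browser_page_field."""
    return LINE_PATTERN.sub(r"\1_\2_\3", script_text)


# Hypothetical file content; the second line has extra text after webEdit(...).
sample = (
    'Browser("something").page("something_else").webEdit("some")\n'
    'Browser("shop").page("cart").webEdit("qty").Set "2"\n'
    'an unrelated line\n'
)

print(join_names(sample))
```

Lines that do not match the format are left unchanged, and the trailing .* drops anything that follows webEdit(...) on a matching line.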

Need to highlight particular pattern of text

I have a file with some paragraphs. I want to highlight certain words and phrases occurring in the text file with a yellow background and black text.
pattern = ["enough", "too much"];
Text file = "text.txt";
The result should be shown on a webpage, with every occurrence of "enough" and "too much" from the text file highlighted.
I want to use Perl for this task.
Please tell me how I can do this in an optimized way.
Make an array of all the words you want to highlight.
Read the input file into the $file variable.
Run foreach over that array and use a regular expression substitution to wrap each word in an HTML tag, i.e.:
foreach my $word (@words)
{
    # \Q...\E quotes any regex metacharacters in the word
    $file =~ s/(\Q$word\E)/<span style="color: black; background-color: yellow">$1<\/span>/g;
}
Save $file again as a file with an .html or .htm extension.
This was more of a logic question than a technical one, I guess.
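The same steps can be sketched in Python rather than Perl. This variant is an assumption-laden illustration, not the answer's method: it combines all phrases into one alternation so the text is scanned once, and it escapes HTML special characters first. The names highlight and HIGHLIGHT_STYLE are made up for the example.

```python
import html
import re

HIGHLIGHT_STYLE = "color: black; background-color: yellow"


def highlight(text, phrases):
    """Return text as HTML with every phrase wrapped in a styled span."""
    escaped_text = html.escape(text)
    if not phrases:
        return escaped_text
    # Longest phrases first, so a longer phrase wins over a shorter overlap.
    ordered = sorted(phrases, key=len, reverse=True)
    alternation = "|".join(re.escape(html.escape(p)) for p in ordered)
    pattern = re.compile(alternation)
    return pattern.sub(
        lambda m: '<span style="{}">{}</span>'.format(HIGHLIGHT_STYLE, m.group(0)),
        escaped_text,
    )


print(highlight("That is enough, this is too much.", ["enough", "too much"]))
```

Doing the substitution in a single pass means a later phrase can never match inside markup inserted for an earlier one.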

RegEX: Matching everything but a specific value

How do I match everything in an html response except this piece of text:
"signed_request" value="The signed_request is placed here"
The fast solution is:
^(.*?)"signed_request" value="The signed_request is placed here"(.*)$
If value can be random text you could do:
^(.*?)"signed_request" value="[^"]*"(.*)$
This will generate two groups that hold the text before and after the excluded piece.
If the match fails, the text does not contain that piece.
If the text contains it more than once, only the first occurrence is excluded.
If you need to remove all instances of the text, you can just as well use a string replace method.
But it is usually a bad idea to use regex on html.
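Both options can be sketched in Python. One assumption is added here: re.DOTALL is set so that "." also matches newlines, because an html response usually spans several lines. The helper names split_around and remove_all and the sample response are hypothetical.

```python
import re

# The piece to exclude, with an arbitrary value between the quotes.
SIGNED_REQUEST = r'"signed_request" value="[^"]*"'

# Group 1 is everything before the first occurrence, group 2 everything after.
AROUND = re.compile(r'^(.*?)' + SIGNED_REQUEST + r'(.*)$', re.DOTALL)


def split_around(response):
    """Return (before, after) around the first occurrence, or None."""
    match = AROUND.match(response)
    return match.groups() if match else None


def remove_all(response):
    """Remove every occurrence instead of only the first one."""
    return re.sub(SIGNED_REQUEST, "", response)


sample_response = '<input name="signed_request" value="abc123" />\n<p>ok</p>'

print(split_around(sample_response))
print(remove_all(sample_response))
```

For anything beyond a quick extraction, an HTML parser is the more robust choice, as the answer notes.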