Regular expression to parse an image URL from HTML code

Hello guys, I wrote a module that creates some articles and styles them with CSS, and I want to parse the images from the article content. The first thing that came to mind was regular expressions. I knew nothing about them until three hours ago, when I started reading regex tutorials, and I came up with a pattern that seems kind of OK to me.
$pattern='^src\="images\/([a-zA-Z]+|[0-9]+)+([a-zA-Z]*|[0-9]*)*\.[jpg|png|bmp|gif]"$';
$regstring=$introtext;
preg_match($pattern,$regstring,$matches);
var_dump($matches);
INPUT:
<p>ASDADSDSASADSADSASDADSDSASADSADSASDADSDSASA</p>
<p><img src="images/authentic.jpg" alt="authentic" /></p>
<p>SASDADSDSASADSADSASDADSDSASADSADS</p>
I found a lot of ready-made patterns on Stack Overflow that are completely different from mine, and I didn't want to just copy some lines without knowing what they do. I also found out ten minutes ago that I can do this with an HTML DOM parser, but I'm stubborn about making it work with a regex so I can learn something more about it.
Can someone help me find my mistake(s)?
Thanks for your time.

src="images\/[a-zA-Z0-9]+\.(?:jpg|png|bmp|gif)"
You can try this, a simpler version of your regex. See the demo:
http://regex101.com/r/oE6jJ1/36
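
To see why the simplified pattern matches where the original did not, it can help to run it against the sample input. The sketch below uses Python's re module rather than PHP's preg_match, purely as an assumption for illustration; IMAGE_ATTR and find_image_attrs are hypothetical names. It also shows two of the original mistakes: [jpg|png|bmp|gif] is a character class that matches a single character rather than one of four extensions, and the ^ and $ anchors require the whole input to be the src attribute.

```python
import re

# Sample input, shortened from the question.
HTML = (
    "<p>ASDADSDSASADSADS</p>\n"
    '<p><img src="images/authentic.jpg" alt="authentic" /></p>\n'
    "<p>SASDADSDSASADSADS</p>"
)

# The simplified pattern from the answer. (?:...) is a non-capturing group,
# so findall() returns the whole matched attribute.
IMAGE_ATTR = re.compile(r'src="images\/[a-zA-Z0-9]+\.(?:jpg|png|bmp|gif)"')


def find_image_attrs(html):
    """Return every src="images/..." attribute found in the HTML string."""
    return IMAGE_ATTR.findall(html)


# A character class matches exactly one character from the set.
one_char = re.fullmatch(r"[jpg|png|bmp|gif]", "j") is not None
whole_ext = re.fullmatch(r"[jpg|png|bmp|gif]", "jpg") is not None

# Anchoring with ^ and $ demands that the entire input be the attribute.
anchored = re.search(
    r'^src="images\/[a-zA-Z0-9]+\.(?:jpg|png|bmp|gif)"$', HTML
)

print(find_image_attrs(HTML))
print(one_char, whole_ext, anchored)
```

In PHP the same pattern additionally needs delimiters, for example a leading and trailing /, before it is passed to preg_match.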

$pattern='/(?<=[\'\"])[\w\/-]+[.]{1}[a-zA-Z]{3,4}(?=[\'\"])/i';
$regstring=$introtext;
preg_match_all($pattern,$regstring,$matches);
var_dump($matches[0]);
You can see how this works here: http://regex101.com/r/eV6gE4/1

Use a proper solution, and please stop killing kitties (every time you try to parse HTML with a regex, you kill a kitty). When you install the Perl module WWW::Mechanize, the mech-dump command becomes available:
$ mech-dump --images http://stackoverflow.com/questions/27151348
http://i.stack.imgur.com/qF63b.jpg?s=32&g=1
//cdn.sstatic.net/stackoverflow/img/apple-touch-icon.png
http://i.stack.imgur.com/nyAHT.jpg?s=32&g=1
/posts/27151348/ivc/677a
http://pixel.quantserve.com/pixel/p-c1rF4kxgLUzNc.gif
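
The same "use a parser" idea can be sketched without Perl. The example below swaps in Python's standard html.parser module instead of WWW::Mechanize, so it is an illustration of the approach rather than the tool named above; ImgSrcCollector and image_sources are hypothetical names, and the HTML is the sample from the question.

```python
from html.parser import HTMLParser


class ImgSrcCollector(HTMLParser):
    """Collects the src attribute of every <img> tag fed to the parser."""

    def __init__(self):
        super().__init__()
        self.sources = []

    def handle_starttag(self, tag, attrs):
        # Self-closing tags such as <img ... /> are routed here as well.
        if tag == "img":
            for name, value in attrs:
                if name == "src" and value:
                    self.sources.append(value)


def image_sources(html):
    """Return the src values of all <img> tags in the HTML string."""
    collector = ImgSrcCollector()
    collector.feed(html)
    collector.close()
    return collector.sources


sample = (
    "<p>ASDADSDSASADSADS</p>"
    '<p><img src="images/authentic.jpg" alt="authentic" /></p>'
    "<p>SASDADSDSASADSADS</p>"
)
print(image_sources(sample))
```

Because the parser handles quoting and attribute order itself, the same function keeps working if the img tag gains extra attributes or uses single quotes.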

Related

Regex: How to extract dialogue tags from fiction, with speaker information

Totally stumped on this. I need help extracting dialogue from a story so I can hand it off for narration.
Basically, this is a problem where I have a big chunk of text (a novel), and I want to extract all the dialogue from the text in a format I can pipe into a spreadsheet.
But, I also want, if it exists, the speaker information as well. So, given a string like:
'"I'm really hungry," she said.'
I would like the values returned as:
[ "I'm really hungry", "she said" ]
If there is no speaker information, as in this example:
"I'm not hungry."
the result would just be:
["I'm not hungry."]
Is this madness? Is it even possible? I have fooled around with this regex (am not a regex guru, knowing only enough to be dangerous):
"([^"]*)"
Which seems to get the dialogue tags, but doesn't get the speaker info. Any advice on how to get the speaker info as well would be greatly appreciated. I've been wrestling with this for a while now.
Maybe a better approach would be to get the dialogue in one field, and the entire paragraph it is found in as the second field. That could also work, but I have no idea where to start with this.
Basically I want to put these all into a spreadsheet so I can hand them off to a narrator with enough context that they know whose dialogue is whose in the story.
Any help is greatly appreciated!

It definitely is possible.
Look at this regex: ^.*?'?(?P<line>\".*\")(?P<actor>[^'\n]*)'?.*?$
Demo here: https://regex101.com/r/UCRZwY/5
It treats the outer quotes as optional and stores whatever follows the quoted dialogue as 'actor' (and the dialogue itself as 'line'). These are just names I've given the groups; feel free to change them.
Note: updated to also handle dialogue that appears as part of a regular sentence; see the example in the demo.
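
As a rough sketch of how that pattern could feed a spreadsheet, the snippet below applies it line by line with Python's re module. The cleanup step (dropping the quotes, a trailing comma, and the actor's final period) is an assumption added to match the output format asked for in the question, and extract_dialogue is a hypothetical helper name.

```python
import re

# The answer's pattern. MULTILINE lets ^ and $ match at every line of the text.
DIALOGUE = re.compile(
    r"""^.*?'?(?P<line>\".*\")(?P<actor>[^'\n]*)'?.*?$""",
    re.MULTILINE,
)


def extract_dialogue(text):
    """Return one [dialogue] or [dialogue, speaker] row per matching line."""
    rows = []
    for match in DIALOGUE.finditer(text):
        # Assumed cleanup: drop the surrounding quotes and a trailing comma.
        line = match.group("line").strip('"').rstrip(",")
        # Assumed cleanup: trim whitespace and the sentence-ending period.
        actor = match.group("actor").strip().rstrip(".")
        rows.append([line, actor] if actor else [line])
    return rows


story = (
    "She sat down at the table.\n"
    '"I\'m really hungry," she said.\n'
    '"I\'m not hungry."'
)
for row in extract_dialogue(story):
    print(row)
```

Each row can then be written out with the csv module. Note that the actor group stops at the first apostrophe, so speaker text containing one would be cut short.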

Regex to emulate GitHub autolink references in Markdown

What would be the regex emulating GitHub's autolinked references?
It takes Markdown on input and outputs enriched Markdown where strings like #123 are converted to [#123](https://github.com/owner/repo/issues/123).
These are some examples of the transformations that I'd like the regex to do:
Input:
1. #123
2. https://github.com/owner/repo/issues/123
3. https://github.com/shoptet/sofa/pull/456
4. owner/repo#123
5. https://github.com/owner/repo/issues/123#issuecomment-123456789
Output:
1. [#123](https://github.com/owner/repo/issues/123)
2. [#123](https://github.com/owner/repo/issues/123)
3. [#456](https://github.com/owner/repo/pull/456)
4. [owner/repo#123](https://github.com/owner/repo/issues/123)
5. [#123 (comment)](https://github.com/owner/repo/issues/123#issuecomment-123456789)
I'd prefer one giant regex if possible (I know it's not going to be nice but would allow me to process Markdown in a couple of my favorite editors directly).

If you don't mind changing the format a little (using [#123-comment] instead of [#123 (comment)] for comments), you may use this:
(?:(owner/repo)?#(\d+)\b|https?://github\.com/([^/]+/[^/]+/(?:issues|pull))/(\d+)(#issue(comment)(-)\d+)?)
Replace by: [\1#\2\4\7\6](https://github.com/owner/repo/issues/\2\4\5)
You have a demo here.
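
For anyone who wants to try the pattern outside an editor, here is a minimal sketch using Python's re.sub. It assumes Python 3.5 or later, where groups that did not take part in the match expand to an empty string in the replacement; autolink is a hypothetical function name.

```python
import re

# Pattern and replacement copied from the answer above.
AUTOLINK = re.compile(
    r"(?:(owner/repo)?#(\d+)\b"
    r"|https?://github\.com/([^/]+/[^/]+/(?:issues|pull))/(\d+)"
    r"(#issue(comment)(-)\d+)?)"
)
REPLACEMENT = r"[\1#\2\4\7\6](https://github.com/owner/repo/issues/\2\4\5)"


def autolink(markdown):
    """Rewrite issue references in the text as Markdown links."""
    return AUTOLINK.sub(REPLACEMENT, markdown)


examples = [
    "#123",
    "https://github.com/owner/repo/issues/123",
    "owner/repo#123",
    "https://github.com/owner/repo/issues/123#issuecomment-123456789",
]
for text in examples:
    print(autolink(text))
```

The replacement hard-codes owner/repo and the issues path, so a pull-request URL or a URL from another repository is rewritten to point at owner/repo/issues as well.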

I'd still prefer a (complex) regex, but if anyone is looking for the same kind of post-processing as I am, this package can do it in a Node.js script:
https://github.com/remarkjs/remark-github

VSCode Snippets: Format File Name from my_file_name to MyFileName

I am creating custom snippets for flutter/dart. My goal is to pull the file name (TM_FILENAME_BASE) remove all of the underscores and convert it to PascalCase (or camelCase).
Here is a link to what I have learned so far regarding regex and vscode's snippets.
https://code.visualstudio.com/docs/editor/userdefinedsnippets
I have been able to remove the underscores nicely with the following code
${TM_FILENAME_BASE/[\\_]/ /}
I can even make it all caps
${TM_FILENAME_BASE/(.*)/${1:/upcase}/}
However, it seems that I cannot do two steps at a time. I am not familiar with regex; this is just me fiddling around with it for the last couple of days.
If anyone could help out a fellow programmer just trying to make coding simpler, it would be really appreciated!
I expect the output of "my_file_name" to be "MyFileName".

It's as easy as that: ${TM_FILENAME_BASE/(.*)/${1:/pascalcase}/}

For the camelCase version you mentioned, you can use:
${TM_FILENAME_BASE/(.*)/${1:/camelcase}/}
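
To check the expected result outside the editor, the conversion can be approximated in a few lines. This Python sketch only mirrors what the question asks for with underscore-separated names; it is not VS Code's implementation of pascalcase or camelcase, and both function names are hypothetical.

```python
def to_pascal_case(name):
    """Convert an underscore-separated name such as my_file_name to MyFileName."""
    # Empty parts (from doubled or leading underscores) are skipped.
    return "".join(part[:1].upper() + part[1:] for part in name.split("_") if part)


def to_camel_case(name):
    """Same as to_pascal_case, but with a lowercase first letter."""
    pascal = to_pascal_case(name)
    return pascal[:1].lower() + pascal[1:]


print(to_pascal_case("my_file_name"))
print(to_camel_case("my_file_name"))
```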

How do I Build a Regex Expression to Find String

I've been studying content on the regex topic, but am having trouble understanding how to make it work! I need to build a regex to locate a particular string, potentially in multiple places throughout numerous log files. If I were keying the search expression into a text editor, it would look like this...
*Failed to Install*
Following is a typical example of a line containing the string I would like to search for (exit code # will vary)
!!! Failed to install, with exit code 1603
I would really appreciate any help on how to build the regex for this. I suspect I might need the end of line character too?
I plan on using it in a variation of the script that was provided by https://stackoverflow.com/users/3142139/m-hassan in the following thread
Use PowerShell to Quickly Search Files for Regex and Output to CSV
I'm a newbie to PowerShell scripts, but I'd rather spend the time to figure this out than pore over hundreds of log files!
Thanks,
Jim

You're in luck: you only need a very simple regex for this. Assuming you want to capture the error code, this will work fine:
^.*Failed to install.*(exit code \d+)$
Try it online!
If you don't care about the error code, and just want to know if it failed or not, you can honestly get away with something as simple as:
^.*Failed to install.*$
Hope this helps.
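
A small sketch of how the first pattern behaves on log text follows. It is written in Python rather than PowerShell purely for illustration, and find_failures and the sample log are hypothetical. Each line is tested on its own, so no multiline flag or explicit end-of-line character is needed.

```python
import re

# First pattern from the answer: group 1 holds "exit code <number>".
FAILED = re.compile(r"^.*Failed to install.*(exit code \d+)$")


def find_failures(log_text):
    """Return (line_number, exit_code) pairs for every failed install line."""
    failures = []
    for number, line in enumerate(log_text.splitlines(), start=1):
        match = FAILED.match(line)
        if match:
            # group(1) looks like "exit code 1603"; keep only the number.
            exit_code = int(match.group(1).split()[-1])
            failures.append((number, exit_code))
    return failures


sample_log = (
    "Starting setup\n"
    "!!! Failed to install, with exit code 1603\n"
    "Cleaning up\n"
    "!!! Failed to install, with exit code 5\n"
)
print(find_failures(sample_log))
```

The pattern is case-sensitive, so a line reading "Failed to Install" would need re.IGNORECASE to match.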

Notepad++ Find/Replace Wildcard problems

Hopefully this is simple, because I can't seem to figure it out.
I have a game that outputs a log with information I'd like to review, but it's bogged down with tags.
<color=#9B9B9BFF>abndnd_b9o66v</color>.<color=#1EFF00FF>out_ys0a67</color>
<color=#9B9B9BFF>uknown_ospiw8</color>.<color=#1EFF00FF>p_vyuxzb</color>
<color=#9B9B9BFF>anonymous_yzgoqq</color>.<color=#1EFF00FF>pub_info_o1rotu</color>
<color=#9B9B9BFF>unidentified_t7stef</color>.<color=#1EFF00FF>out_gems04</color>
<color=#9B9B9BFF>abndnd_5vs06o</color>.<color=#1EFF00FF>public_7gshh2</color>
<color=#9B9B9BFF>anon_7kq2k4</color>.<color=#1EFF00FF>pub_wxn46t</color>
<color=#9B9B9BFF>anon_i83kkg</color>.<color=#1EFF00FF>info_ev39gs</color>
I could filter it by hand, but I know a regex should be able to help. I just can't figure out the correct syntax for trimming the tags without tampering with the text I need.
The end result I'm trying to get is this:
abndnd_b9o66v.out_ys0a67
uknown_ospiw8.p_vyuxzb
anonymous_yzgoqq.pub_info_o1rotu
unidentified_t7stef.out_gems04
abndnd_5vs06o.public_7gshh2
anon_7kq2k4.pub_wxn46t
anon_i83kkg.info_ev39gs

Try this:
<color=.*?>(.*?)</color>\.<color=.*?>(.*?)</color>
Replace by this:
\1\.\2
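
The same find and replace can be checked outside Notepad++. Below is a minimal Python sketch of it; strip_color_tags is a hypothetical name, and the replacement is written as \1.\2 because a dot needs no escaping in a Python replacement string.

```python
import re

# Search pattern from the answer: two colored segments joined by a dot.
COLOR_PAIR = re.compile(r"<color=.*?>(.*?)</color>\.<color=.*?>(.*?)</color>")


def strip_color_tags(text):
    """Replace each tagged pair with its two inner values joined by a dot."""
    return COLOR_PAIR.sub(r"\1.\2", text)


log = (
    "<color=#9B9B9BFF>abndnd_b9o66v</color>.<color=#1EFF00FF>out_ys0a67</color>\n"
    "<color=#9B9B9BFF>anon_7kq2k4</color>.<color=#1EFF00FF>pub_wxn46t</color>"
)
print(strip_color_tags(log))
```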
