Following this tutorial by Ray Camden...
Trying to fetch the ID from the YouTube URL:
<cfset regex = "^(?:[^?]+\?v=|[^v]+/v/)([^&##/]+).*|http://youtu.be/">
<cfset videoid = rereplace(u, regex, "\1" ) />
But the youtu.be form does not seem to work here; other YouTube URLs seem fine.
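How about the pattern below? It relies on lookbehind, which rereplace does not support, so it has to go through the Java regex engine (the PHP-style # delimiters are also dropped, since # starts an expression in a CFML string):
<cfset regex = "(?<=v=)[a-zA-Z0-9-]+(?=&)|(?<=v\/)[^&\n]+(?=\?)|(?<=v=)[^&\n]+|(?<=youtu\.be/)[^&\n]+">
<cfset videoid = u.replaceAll("^.*?(" & regex & ").*$", "$1") />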
Source: PHP Regex to get youtube video ID?
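Since CF's built-in regex functions have no lookbehind, one way to cover all three URL shapes with rereplace alone is a single capture group placed after whichever marker precedes the ID. This is only a sketch: the sample URLs and the video ID in them are made up for illustration, and other URL shapes (embed links, extra path segments) are not handled.

```cfml
<!--- hypothetical sample URLs, one per shape discussed above --->
<cfset urls = [
    "http://www.youtube.com/watch?v=dQw4w9WgXcQ&feature=related",
    "http://www.youtube.com/v/dQw4w9WgXcQ?version=3",
    "http://youtu.be/dQw4w9WgXcQ"
] />
<!--- the ID follows "?v=" / "&v=", "/v/" or "youtu.be/" and ends at &, ##, ? or / --->
<cfset regex = "^.*(?:[?&]v=|/v/|youtu\.be/)([^&##?/]+).*$" />
<cfloop array="#urls#" index="u">
    <!--- the whole URL is matched, so replacing it with \1 leaves only the ID --->
    <cfset videoid = rereplace(u, regex, "\1") />
    <cfoutput>#videoid#<br></cfoutput>
</cfloop>
```

If a URL matches none of the three markers, rereplace returns it unchanged, so it is worth comparing the result with the input before trusting it.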
Related
Here is my scenario: I want to get the cid value, without the cid: prefix, from the img src in mail content. I have inline image code like <img src="cid:ii_k4ib6vux0" alt="image.png" width="195" height="162">.
I want to get the cid value ii_k4ib6vux0. When I use the regex cid([^""']+), I get the value including the prefix, as in cid:ii_k4ib6vux0, but I want only the value itself. Please guide me on how to get exactly that.
Thanks in advance!
You could try this:
<cfset aString = '<img src="cid:ii_k4ib6vux0" alt="image.png" width="195" height="162">' />
<!--- REMatch returns the whole match, so the surrounding quotes are left out of the pattern --->
<cfset aMatch = REMatch('cid:[^"]*', aString) />
<cfdump var="#replace(aMatch[1], "cid:", "")#" />
Example:
https://trycf.com/gist/9202b3dd2cca2cf0341a594dc007644f/acf2016?theme=monokai
Update:
Using REFind
<cfset aString = '<img src="cid:ii_k4ib6vux0" alt="image.png" width="195" height="162">' />
<cfset aMatch = REFind('"cid:([^"]*)"',aString,1,true,"ALL") />
<cfdump var="#aMatch[1].match[2]#" />
https://trycf.com/gist/809a30fdc16cc6deac6d1034dfb8adc2/acf2016?theme=monokai
To capture only that value, without having to inspect the groups (the parts between parentheses), you can use a positive lookbehind:
(?<=cid:)[^"']+
This matches any run of characters other than ' and " that comes directly after cid:.
Otherwise, using a regex similar to yours (note the : and the removed "):
cid:([^"']+)
You then need to read the first group. Depending on the language you are using, the first element returned may be the whole match, in which case the captured value is the second one.
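In ColdFusion specifically, the built-in regex functions do not accept lookbehind, so the capture-group form is the one that applies there. A minimal sketch, assuming the string holds a single cid: reference (with several inline images you would need to loop over the matches instead):

```cfml
<cfset aString = '<img src="cid:ii_k4ib6vux0" alt="image.png" width="195" height="162">' />
<!--- match the whole string and keep only group 1, the text after "cid:" up to the next quote --->
<cfset cid = rereplace(aString, '^.*cid:([^"'']+).*$', '\1') />
<cfoutput>#cid#</cfoutput>
```

Because the pattern is anchored at both ends, the replacement swaps the entire string for the captured value; if there is no cid: in the input, the string comes back unchanged.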
I have the following URL, but I am very weak at regular expressions. I am using ColdFusion,
and I want to remove part of the URL before every next-page call:
http://beta.mysite.com/?jobpage=2page=2#brands
What I am trying to do: if jobpage exists, jobpage=2 should be removed from the URL. The 2 is dynamic; it can be 1, 2, 3 and so on.
I tried listFirst, listLast and getToken, but none of them helped.
This should do it for you
<cfset myurl = "http://beta.mysite.com/?jobpage=2page=2##brands" />
<cfoutput>#myurl#</cfoutput><br><br>
<cfset myurl = ReReplaceNoCase(myurl,"(jobpage=[0-9]+[\&]?)","","ALL") />
<cfoutput>#myurl#</cfoutput>
I need to replace the text inside all href values. I think a regular expression is the way to do it, but I'm no regex pro. Any thoughts on how I'd do the following using ColdFusion?
so it is changed to:
Thanks!
Here's an update to the question: I have this code and need the pattern below:
<cfset matches = ReMatch('<a[^>]*href="http[^"]*"[^>]*>(.+?)</a>', arguments.htmlCode) />
<cfdump var="#matches#">
<cfset links = arrayNew(1)>
<cfloop index="a" array="#matches#">
<cfset arrayAppend(links, rereplace(a, 'need regex'," {clickurl}","all"))>
</cfloop>
<cfdump var="#links#">
Here's how to do it with the jsoup HTML parser:
<cfset jsoup = createObject('java','org.jsoup.Jsoup') />
<cfset Dom = jsoup.parse( InputHtml ) />
<cfset Dom.select('a[href]').attr('href','{replaced}') />
<cfset NewHtml = Dom.html() />
(On CF9 and earlier, this requires placing the jsoup jar in CF's lib directory, or using JavaLoader.)
Using an HTML parser is usually better than using regex, not least because it's easier to maintain and understand.
Here's an imperfect way of doing it with a regex:
<cfset NewHtml = InputHtml.replaceAll
( '(?<=<a.{0,99}?\shref\s{0,99}?=\s{0,99}?)(?:"[^"]+|''[^'']+)(["'])'
, '$1{replaced}$1'
)/>
Which hopefully demonstrates why using a tool such as jsoup is definitely the way to go...
(btw, the above is using the Java regex engine (via string.replaceAll), so it can use the lookbehind functionality, which doesn't exist in CF's built-in regex (rereplace/rematch/etc))
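To make that difference concrete, here is a deliberately simplified sketch: it assumes a double-quoted href with no whitespace around the =, and the sample markup is invented for illustration. The lookbehind is accepted because the call goes to java.lang.String's replaceAll rather than to rereplace.

```cfml
<!--- hypothetical input; real-world HTML has many more variations --->
<cfset InputHtml = '<a href="http://example.com/page">link</a>' />
<!--- Java regex engine: (?<=href=") asserts the prefix without consuming it --->
<cfset viaJava = InputHtml.replaceAll('(?<=href=")[^"]+', '{replaced}') />
<cfoutput>#htmlEditFormat(viaJava)#</cfoutput>
```

Only the URL itself is swapped out here; the quotes and the rest of the tag are untouched because the lookbehind keeps them out of the match.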
Update, based on the new code sample you've provided...
Here is an example of how to use jsoup for what you're doing - it might still need some updates (depending on what {clickurl} is eventually going to be doing), but it currently functions the same as your sample code is attempting:
<cfset jsoup = createObject('java','org.jsoup.Jsoup') />
<cfset links = jsoup.parse( Arguments.HtmlCode )
<!--- select all links beginning http and change their href --->
.select('a[href^=http]').attr('href',' {clickurl}')
<!--- get HTML for all links, then split into array. --->
.outerHtml().split('(?<=</a>)(?!$)')
/>
<cfdump var=#links# />
That middle bit is all a single cfset, but I split it up and added comments for clarity. (You could of course do this with multiple variables and 3+ cfsets if you preferred that.)
Again, it's not a regex, because what you're doing involves parsing HTML, and regex is not designed for parsing tag-based syntax, so isn't very good at it - there are too many quirks and variations with HTML and describing them in a single regex gets very complicated very quickly.
New to coldfusion, new to regex...
I have a directory of files, each named with "some" followed by a 13-digit number, then an underscore, an ID and the file extension, like so:
some0000000000000_ID.jpg
ID can be any string.
How would I get the ID using regex? I guess I'd be looking for something like this, which captures everything between the underscore and file ending dot:
_\A[A-Z]*[a-z]*[0-9]*$
but I'm really not getting anywhere. Can someone point me in the right direction?
Thanks!
EDIT:
I ended up doing it like this, which is hack-ish but works nicely:
<cfset cropFront = #ListRest(ReReplaceNoCase(name, ".png|.jpg", ""), "_")#>
<cfset cropFull = #ListFirst(ReReplaceNoCase( cropFront, "xxxxx", ""), "." )#>
Maybe useful for someone else, too!
<cfdirectory name="images" directory="#path#" filter="some?????????????_ID.jpg">
The filter is not a regex pattern. It only knows the ? and * wildcard characters.
Can't test at the moment but this is the idea...
<cfdirectory name="files" directory="#path#" action="list" />
<cfloop query="files">
<!--- capture the ID: everything between the underscore after the 13 digits and the file extension --->
<cfset findinfo = refind("^some\d{13}_(.+)\.[^.]+$", files.name, 1, true) />
<cfif arraylen(findinfo.pos) eq 2>
<cfset fileid = mid(files.name, findinfo.pos[2], findinfo.len[2]) />
<!--- do something --->
</cfif>
</cfloop>
Can anyone help with a function that will turn all URLs in a text string into valid HTML links?
For example:
"Welcome to www.nerds4life.com. View our articles at nerds4life.com or at http://nerds4life.com or also http://www.nerds4life.com"
would become:
"Welcome to www.nerds4life.com. View our articles at nerds4life.com or at http://nerds4life.com or also http://www.nerds4life.com"
What would be the best way to approach this? Regex (and if so, how?), or looping through each word in the text (which I would think is less efficient)?
Thanks
Again... there may be a more elegant regex...
Certainly feel free to google for "good" regexes for finding URLs if this one falls short.
<cfset myText = "Welcome to www.nerds4life.com. View our articles at nerds4life.com or at http://nerds4life.com or also http://www.nerds4life.com or at https://foo.com or http://123.com" />
<cfset myNewText = rereplaceNoCase( myText, '((http(s)?://)?((www\.)?\w+\.\w{2,6}))', '<a href="\1">\1</a>', 'all' ) />
This will pick up any URL in the string that starts with http or www and is terminated by a space:
<cfset myString = "Welcome to www.nerds4life.com. View our articles at nerds4life.com or at http://nerds4life.com or also http://www.nerds4life.com">
<cfset URLinString = rereplaceNoCase(myString, '(((http(s)?://)|(www))\.?(\S+))', '<a href="\1">\1</a>', 'all')>