Regular expression: match youtube links, but not youtube embed code

Could you help me, please? I need a regular expression that matches strings like this:
http://www.youtube.com/watch?v=eE4qPqMYsp8
but not this:
<object width="500" height="700"><param name="movie" value="http://www.youtube.com/v/eE4qPqMYsp8&hl=ru&fs=1&rel=0" /><param name="wmode" value="transparent" /><param name="allowFullScreen" value="true" /><param name="allowscriptaccess" value="always" /><embed src="http://www.youtube.com/v/eE4qPqMYsp8&hl=ru&fs=1&rel=0" type="application/x-shockwave-flash" allowscriptaccess="always" allowfullscreen="true" wmode="transparent" width="500" height="700">
I have this code:
%(?:(http://){0,1}(www.){0,1}youtube(?:-nocookie)?\.com/(?:[^/]+/.+/|(?:v|e(?:mbed)?)/|.*[?&]v=)|(http://){0,1}(www.){0,1}youtu\.be/)([^"&?/ ]{11})%
I don't know how to exclude some parameters.

How about an expression like this:
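(?:https?://)?(?:www\.)?youtube\.com/watch.+?\bv=[\w-]+
This only matches /watch URLs, so the /v/ URLs inside the embed code are not matched. YouTube video IDs can contain - and _, which is why the ID part is [\w-]+. You can certainly add more options (e.g. (?:-nocookie)), but it may already be specific enough as it is.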
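To check that the answer's pattern accepts the watch link but skips the embed code, here is a minimal Java sketch. The class and method names (YouTubeLinks, firstWatchUrl) are made up for illustration, and the ID class is widened to [\w-] on the assumption that IDs may contain - and _.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class YouTubeLinks {
    // The answer's pattern; the ID class is [\w-] so IDs containing '-' or '_' match too.
    static final Pattern WATCH_URL =
        Pattern.compile("(?:https?://)?(?:www\\.)?youtube\\.com/watch.+?\\bv=[\\w-]+");

    // Returns the first watch URL found in the text, or null if there is none.
    static String firstWatchUrl(String text) {
        Matcher m = WATCH_URL.matcher(text);
        return m.find() ? m.group() : null;
    }

    public static void main(String[] args) {
        String link = "http://www.youtube.com/watch?v=eE4qPqMYsp8";
        String embed = "<param name=\"movie\" value=\"http://www.youtube.com/v/eE4qPqMYsp8&hl=ru&fs=1&rel=0\" />"
            + "<embed src=\"http://www.youtube.com/v/eE4qPqMYsp8&hl=ru&fs=1&rel=0\">";
        System.out.println(firstWatchUrl(link));  // the watch URL itself
        System.out.println(firstWatchUrl(embed)); // null: the embed code uses /v/, not /watch
    }
}
```

The embed code is rejected only because it never contains "watch"; a page that mixes both kinds of link will still yield the watch URLs.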

Related

Ant task replace escaped URI param

I'm trying to use Ant to remove a url parameter in an xml file.
The line in the xml is similar to below.
<from uri="http://www.google.com?q=test&somethingElse=something" />
I want to remove the "&somethingElse=something". "something" could be different values so it must be generic.
I've tried
<replaceregexp file="somefile.xml" match="&somethingElse(.*)" replace='" />' flags="gs" byline="true" />
<replaceregexp file="somefile.xml" match="\&somethingElse(.*)" replace='" />' byline="true" />
<replaceregexp file="somefile.xml" match="(&)somethingElse(.*)" replace='" />' flags="gs" byline="true" />
but those don't seem to work.
${ant.regexp.regexpimpl} is not set, so the default engine is being used.
In order to get & you need to write &amp; in the Ant build file because it is XML. To match &somethingElse in the input of the replaceregexp Ant task you might therefore need to specify &amp;somethingElse in the Ant build file.
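The patterns in the question also replace everything after the parameter with `" />`. A narrower pattern removes only the parameter itself. Below is a sketch of that idea in plain Java, assuming the default engine here is java.util.regex (typical on a modern JDK) and that the file may contain either a raw & or an escaped &amp;. StripParam and strip are hypothetical names. If you copy the pattern into replaceregexp's match attribute, every literal & and " in it must itself be XML-escaped (&amp;, &quot;).

```java
import java.util.regex.Pattern;

class StripParam {
    // Matches "&somethingElse=<value>", where the '&' may be raw or escaped as "&amp;".
    // The value runs up to the next '&' or '"', so later parameters are kept.
    private static final Pattern PARAM =
        Pattern.compile("&(?:amp;)?somethingElse=[^&\"]*");

    static String strip(String line) {
        return PARAM.matcher(line).replaceAll("");
    }

    public static void main(String[] args) {
        System.out.println(strip("<from uri=\"http://www.google.com?q=test&somethingElse=something\" />"));
        System.out.println(strip("<from uri=\"http://www.google.com?q=test&amp;somethingElse=something\" />"));
    }
}
```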

Regular expression in java to extract URl from HTML

I am new to regexes. I need help.
My HTML source is
<img src ="planets.gif" width="145" height="126" alt="Planets" usemap ="#planetmap">
<map name="planetmap">
<area shape="rect" coords="0,0,82,126" href="http://www.sun.htm" alt="Sun">
<area shape="circle" coords="90,58,3" href="http://www.mercur.htm" alt="Mercury">
<area shape="circle" coords="124,58,8" href="http://www.www.venus.htm" alt="Venus">
</map>
I’m trying to extract all href links out like http://www.google.com.
kindly help.
My Regex is
"href=[\\\"\\'](http:\\/\\/|\\.\\/|\\/)?\\w+(\\.\\w+)*(\\/\\w+(\\.\\w+)?)*(\\/|\\?\\w*=\\w*(&\\w*=\\w*)*)?[\\\"\\']"
It will extract something like href="http://www.google.com",
but I need only the link http://www.google.com, without href=.
Please use an XML or HTML parser for this kind of stuff.
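If a regex is used anyway, the usual way to drop the href= prefix is a capturing group around the URL, read back with group(1) instead of group(). This Java sketch uses a deliberately simpler pattern than the asker's and assumes the attribute value is quoted. HrefExtractor and extractHrefs are illustrative names.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class HrefExtractor {
    // Group 1 captures only the text between the quotes after href=.
    private static final Pattern HREF =
        Pattern.compile("href\\s*=\\s*[\"']([^\"']*)[\"']", Pattern.CASE_INSENSITIVE);

    static List<String> extractHrefs(String html) {
        List<String> urls = new ArrayList<>();
        Matcher m = HREF.matcher(html);
        while (m.find()) {
            urls.add(m.group(1)); // group(1), not group(), so "href=" and the quotes are dropped
        }
        return urls;
    }

    public static void main(String[] args) {
        String html = "<area shape=\"rect\" coords=\"0,0,82,126\" href=\"http://www.sun.htm\" alt=\"Sun\">\n"
            + "<area shape=\"circle\" coords=\"90,58,3\" href=\"http://www.mercur.htm\" alt=\"Mercury\">";
        System.out.println(extractHrefs(html)); // [http://www.sun.htm, http://www.mercur.htm]
    }
}
```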

regex find word in string, replace word in new string (using Notepad++)

I posted a simplified version of this question before, but I think I might have simplified it too much, so here is the actual problem.
I want to use regex (in Notepad++ or similar) to find "a_dog" in the following (sorry about the wall):
<object classid="clsid:D27CDB6E-AE6D-11cf" id="FlashID">
<param name="movie" value="../flash/words/a_dog.swf">
<param name="quality" value="high">
<param name="wmode" value="opaque">
<param name="swfversion" value="6.0.65.0">
<!--[if !IE]>-->
<object data="../flash/words/a_dog.swf" type="application/x-shockwave-flash">
<!--<![endif]-->
<param name="quality" value="high">
<param name="wmode" value="opaque">
<param name="swfversion" value="6.0.65.0">
<!--[if !IE]>-->
</object>
<!--<![endif]-->
</object>
Then I want to use a back-reference to replace all instances of øø with a_dog in the following:
<input type="button" class="ButtonNormal" onClick="audio_func_øø()">
<script>
function audio_func_øø() {
var playAudio = document.getElementById("element_øø");
playAudio.play();
}
</script>
<audio id="element_øø">
<source src="../audio/words/øø.mp3" type='audio/mpeg'>
<source src="../audio/words/øø.wav" type='audio/wav'>
</audio>
So that only the second code is left (with a_dog instead of øø), and no trace of the first code remains.
I don't know how to do this in Notepad++, but you can do this in Sublime Text using regex, snippets, and multiple selection:
First make a new snippet (guide) with the following in it:
<snippet>
<content><![CDATA[
<input type="button" class="ButtonNormal" onClick="audio_func_$1()">
<script>
function audio_func_$2() {
var playAudio = document.getElementById("element_$3");
playAudio.play();
}
</script>
<audio id="element_$4">
<source src="../audio/words/$5.mp3" type='audio/mpeg'>
<source src="../audio/words/$6.wav" type='audio/wav'>
</audio>
]]></content>
<!-- Optional: Set a tabTrigger to define how to trigger the snippet -->
<tabTrigger>audioSnippet</tabTrigger>
</snippet>
Save it as whatever you like in your User package. Follow the linked article if you have any questions on how/where to save it to get it working. I will discuss how this works later on.
Next, search in Sublime Text (with regex enabled) for the following pattern:
(?<=value="../flash/words/).+(?=\.swf)
And hit "Find All" - this will select all the names (e.g. 'a_dog', 'a_cat', 'a_plane') using multiple selection.
Copy the selected words (Ctrl+C or equivalent on your system)
In the menu, Selection->Expand to Paragraph (This will select where the <object> begins, to where </object> ends)
Hit Delete/Backspace to remove the <object>'s
Type in your snippet shortcut (above I've defined it to be "audioSnippet") and hit Tab
Paste in your copied text (Ctrl+V or equivalent on your system)
You will notice that you have only replaced the text in the snippet where $1 appears. You will need to hit Tab to jump to $2, paste the text again (Ctrl+V), and repeat until you get to tab stop $6.
I've made a screen capture that you can look at here: http://youtu.be/oo2MQV3X244 (unlisted video on YouTube)
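Outside an editor, the same capture-then-substitute idea is a few lines of code. This Java sketch is not the Notepad++ or Sublime workflow: it only captures each word from the movie param's value attribute (the same text the lookbehind/lookahead above selects) and fills it into the question's markup. It does not rewrite the file or delete the <object> blocks. FlashToAudio and convert are hypothetical names, and %1$s stands in for the question's øø placeholder.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class FlashToAudio {
    // Group 1 captures the word between value="../flash/words/ and .swf"
    private static final Pattern MOVIE =
        Pattern.compile("value=\"\\.\\./flash/words/([^\"]+)\\.swf\"");

    // The replacement markup from the question; %1$s marks every spot the word goes.
    private static final String TEMPLATE = String.join("\n",
        "<input type=\"button\" class=\"ButtonNormal\" onClick=\"audio_func_%1$s()\">",
        "<script>",
        "function audio_func_%1$s() {",
        "var playAudio = document.getElementById(\"element_%1$s\");",
        "playAudio.play();",
        "}",
        "</script>",
        "<audio id=\"element_%1$s\">",
        "<source src=\"../audio/words/%1$s.mp3\" type='audio/mpeg'>",
        "<source src=\"../audio/words/%1$s.wav\" type='audio/wav'>",
        "</audio>");

    // Returns one filled-in audio block per movie <param> found in the input.
    static List<String> convert(String html) {
        List<String> blocks = new ArrayList<>();
        Matcher m = MOVIE.matcher(html);
        while (m.find()) {
            blocks.add(String.format(TEMPLATE, m.group(1)));
        }
        return blocks;
    }

    public static void main(String[] args) {
        String html = "<object classid=\"clsid:D27CDB6E-AE6D-11cf\" id=\"FlashID\">\n"
            + "<param name=\"movie\" value=\"../flash/words/a_dog.swf\">\n"
            + "<object data=\"../flash/words/a_dog.swf\" type=\"application/x-shockwave-flash\">\n"
            + "</object>\n</object>";
        convert(html).forEach(System.out::println);
    }
}
```

The inner <object>'s data attribute is deliberately not matched, so each Flash block yields exactly one audio block.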

Set Ant property based on a regular expression in a file

I have the following in a file
version: [0,1,0]
and I would like to set an Ant property to the string value 0.1.0.
The regular expression is
version:[[:space:]]\[([[:digit:]]),([[:digit:]]),([[:digit:]])\]
and I need to then set the property to
\1.\2.\3
to get
0.1.0
I can't work out how to use the Ant tasks together to do this.
I have Ant-contrib so can use those tasks.
Based on matt's second solution, this worked for me for any (text) file, one line or not. It has no ant-contrib dependencies.
<loadfile property="version" srcfile="version.txt">
<filterchain>
<linecontainsregexp>
<regexp pattern="version:[ \t]\[([0-9]),([0-9]),([0-9])\]"/>
</linecontainsregexp>
<replaceregex pattern="version:[ \t]\[([0-9]),([0-9]),([0-9])\]" replace="\1.\2.\3" />
<!-- drop the line terminator so the property is exactly 0.1.0 -->
<striplinebreaks/>
</filterchain>
</loadfile>
Solved it with this:
<loadfile property="burning-boots-js-lib-build.lib-version" srcfile="burning-boots.js"/>
<propertyregex property="burning-boots-js-lib-build.lib-version"
override="true"
input="${burning-boots-js-lib-build.lib-version}"
regexp="version:[ \t]\[([0-9]),([0-9]),([0-9])\]"
select="\1.\2.\3" />
But it seems a little wasteful - it loads the whole file into a property!
If anyone has any better suggestions please post :)
Here's a way that doesn't use ant-contrib, using loadproperties and a filterchain (note that replaceregex is a "string filter" - see the tokenfilter docs - and not the replaceregexp task):
<loadproperties srcFile="version.txt">
<filterchain>
<replaceregex pattern="\[([0-9]),([0-9]),([0-9])\]" replace="\1.\2.\3" />
</filterchain>
</loadproperties>
Note the regex is a bit different: we're treating the file as a properties file.
Alternatively you could use loadfile with a filterchain, for instance if the file you wanted to load from wasn't in properties format.
For example, if the file contents were just [0,1,0] and you wanted to set the version property to 0.1.0, you could do something like:
<loadfile srcFile="version.txt" property="version">
<filterchain>
<replaceregex pattern="\s*\[([0-9]),([0-9]),([0-9])\]" replace="\1.\2.\3" />
<striplinebreaks/>
</filterchain>
</loadfile>
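The Ant snippets use [0-9] rather than the question's [[:digit:]]. A likely reason, assuming Ant is on its usual java.util.regex engine, is that Java does not support POSIX bracket classes. The Java sketch below does the same extraction. VersionExtractor and dottedVersion are illustrative names. One syntax difference: Ant's replace attribute writes backreferences as \1.\2.\3, while java.util.regex's replacement strings use $1.$2.$3.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class VersionExtractor {
    // \d instead of [[:digit:]]: java.util.regex has no POSIX bracket classes.
    private static final Pattern VERSION =
        Pattern.compile("version:\\s*\\[(\\d+),(\\d+),(\\d+)\\]");

    // Returns e.g. "0.1.0" for "version: [0,1,0]", or null if no version is present.
    static String dottedVersion(String text) {
        Matcher m = VERSION.matcher(text);
        if (!m.find()) {
            return null;
        }
        // Rewrite only the matched part, using Java's $n backreference syntax.
        return VERSION.matcher(m.group()).replaceFirst("$1.$2.$3");
    }

    public static void main(String[] args) {
        System.out.println(dottedVersion("version: [0,1,0]\n")); // 0.1.0
    }
}
```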

Get a block of text in a list of blocks using Regular Expressions

Edit 2: only regex match solutions, please. Thank you!
Edit: I'm looking for a regex solution, if one exists. I have other blocks with the same data that are not XML, and I can't use Perl; I added the Perl tag because I'm more familiar with regexes in Perl. Thanks in advance!
I have a list like this:
<Param name="Application #" value="1">
<Param name="app_id" value="32767" />
<Param name="app_name" value="App01" />
<Param name="app_version" value="1.0.0" />
<Param name="app_priority" value="1" />
</Param>
<Param name="Application #" value="2">
<Param name="app_id" value="3221" />
<Param name="app_name" value="App02" />
<Param name="app_version" value="1.0.0" />
<Param name="app_priority" value="5" />
</Param>
<Param name="Application #" value="3">
<Param name="app_id" value="32" />
<Param name="app_name" value="App03" />
<Param name="app_version" value="1.0.0" />
<Param name="app_priority" value="2" />
</Param>
How can I get the block for one app if I only know, say, the value of app_name? For example, for App02 I want to get:
<Param name="Application #" value="2">
<Param name="app_id" value="3221" />
<Param name="app_name" value="App02" />
<Param name="app_version" value="1.0.0" />
<Param name="app_priority" value="5" />
</Param>
Is it possible to get it, if other "name=" lines are not known (but there's always name="app_name" and Param name="Application #")?
Can it be done in a single regex match? (doesn't have to be, but feels like there's probably a way).
Since your content seems to be XML, why not use a real parser for the task?
use XML::XPath;
use XML::XPath::XMLParser;
my $xp = XML::XPath->new(filename => 'test.xhtml');
my $nodeset = $xp->find('/Param[@name=\'Application #\']'); # find all applications
foreach my $node ($nodeset->get_nodelist) {
print "FOUND\n\n",
XML::XPath::XMLParser::as_string($node),
"\n\n";
}
You can read a bit more about XPath here, and the full reference is at the W3C.
I advise against using a regexp for this task, because it's going to be complicated and not maintainable.
Note: it's also possible to use the DOM API; it just depends on which one you like best.
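The same parser approach in Java needs only the JDK's built-in DOM and XPath APIs. The sketch below also selects the block by app_name, which is what the question asks. Because the list has several top-level <Param> elements and no single root, it wraps them in an invented <apps> element before parsing. AppBlockXPath and findApp are hypothetical names. The app name is pasted straight into the XPath string, so it is assumed not to contain a single quote.

```java
import java.io.StringReader;
import java.io.StringWriter;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.transform.OutputKeys;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.dom.DOMSource;
import javax.xml.transform.stream.StreamResult;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Node;
import org.xml.sax.InputSource;

class AppBlockXPath {
    // Returns the serialized <Param name="Application #"> element whose child
    // app_name has the given value, or null if there is no such block.
    static String findApp(String fragment, String appName) {
        try {
            // Wrap the blocks in a made-up root so the input is well-formed XML.
            Document doc = DocumentBuilderFactory.newInstance().newDocumentBuilder()
                .parse(new InputSource(new StringReader("<apps>" + fragment + "</apps>")));
            String expr = "/apps/Param[@name='Application #']"
                + "[Param[@name='app_name' and @value='" + appName + "']]";
            Node node = (Node) XPathFactory.newInstance().newXPath()
                .evaluate(expr, doc, XPathConstants.NODE);
            if (node == null) {
                return null;
            }
            Transformer t = TransformerFactory.newInstance().newTransformer();
            t.setOutputProperty(OutputKeys.OMIT_XML_DECLARATION, "yes");
            StringWriter out = new StringWriter();
            t.transform(new DOMSource(node), new StreamResult(out));
            return out.toString();
        } catch (Exception e) {
            throw new IllegalArgumentException("could not parse or query the XML", e);
        }
    }

    public static void main(String[] args) {
        String data = "<Param name=\"Application #\" value=\"1\">\n"
            + "<Param name=\"app_id\" value=\"32767\" />\n<Param name=\"app_name\" value=\"App01\" />\n</Param>\n"
            + "<Param name=\"Application #\" value=\"2\">\n"
            + "<Param name=\"app_id\" value=\"3221\" />\n<Param name=\"app_name\" value=\"App02\" />\n</Param>";
        System.out.println(findApp(data, "App02"));
    }
}
```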
This seems to be a sad case of bogus XML. A misguided attempt to create enterprisey software at best. The developers could have used a sane configuration file format such as:
[App03]
app_id = 32767
app_version = 1.0.0
...
but they decided to drive everyone insane with meaningless BSXML.
I would say, if this file is less than 10 MB in size, just go ahead and use XML::Simple. If the file indeed consists of nothing but repeated blocks of exactly what you posted, you can use the following solution:
#!/usr/bin/perl
use strict; use warnings;
my %apps;
{
local $/ = "</Param>\n";
while ( my $block = <DATA> ) {
last unless $block =~ /\S/;
my %appinfo = ($block =~ /name="([^"]+?)"\s+value="([^"]+?)"/g);
$apps{ $appinfo{app_name} } = \%appinfo;
}
}
use Data::Dumper;
print Dumper $apps{App03};
# the <Param> blocks from the question go after the __DATA__ marker below
__DATA__
Edit: If you cannot use Perl and you won't tell us what you can use, there is not much I can do but point out that
/name="([^"]+?)"\s+value="([^"]+?)"/g
will give you all name-value pairs.
I would prefer a parser solution, too. If you absolutely have to use a regex and understand all the disadvantages of this approach, then the following regex should work:
<Param name="Application #"[^>]*>\s+<Param[^>]*>\s+<Param name="app_name" value="App02" />\s+(?:<Param[^>]*>\s+){2}</Param>
This relies heavily on the structure present in your example. A re-ordering of tags, introduction of additional tags or (shudder) nesting of tags will break the regex.
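To relax the assumption about the number and order of the other <Param .../> lines, you can use a "tempered" dot, (?:(?!</Param>).), which never steps past a closing </Param>. A match therefore cannot spill from one block into the next. Here is a minimal Java sketch. The same lookahead syntax works in Perl and PCRE. It still assumes blocks are not nested and that app_name's attributes appear as name="app_name" followed by value="...". AppBlockRegex and findApp are illustrative names.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class AppBlockRegex {
    // Finds the <Param name="Application #"> ... </Param> block containing
    // name="app_name" value="<appName>", whatever other lines the block has.
    static String findApp(String text, String appName) {
        Pattern p = Pattern.compile(
            "<Param name=\"Application #\"[^>]*>"
            + "(?:(?!</Param>).)*?"                      // stay inside the current block
            + "name=\"app_name\"\\s+value=\"" + Pattern.quote(appName) + "\""
            + "(?:(?!</Param>).)*"                       // rest of the block
            + "</Param>",
            Pattern.DOTALL);                             // let '.' cross line breaks
        Matcher m = p.matcher(text);
        return m.find() ? m.group() : null;
    }

    public static void main(String[] args) {
        String data = "<Param name=\"Application #\" value=\"1\">\n"
            + "<Param name=\"app_id\" value=\"32767\" />\n<Param name=\"app_name\" value=\"App01\" />\n</Param>\n"
            + "<Param name=\"Application #\" value=\"2\">\n"
            + "<Param name=\"app_id\" value=\"3221\" />\n<Param name=\"app_name\" value=\"App02\" />\n</Param>";
        System.out.println(findApp(data, "App02"));
    }
}
```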
Seems like it would be more appropriate to use an XML reader library, but I don't know Perl enough to suggest one.
Perl's XML DOM Parser may be appropriate here.
I would suggest using one of the XML parsers, but if you cannot do so, the following quick-and-dirty code should do:
my ($rez) = $data =~/\<Param\s+name\s*=\s*"Application\s#"\s+value\s*=\s*"2"\>((?:.|\n)*?)^\<\/Param\>/m;
print $rez;
(assuming $data contains your XML as a single, possibly multiline, string)