regex to separate HTML GET parameters - regex

How can I use a regular expression to separate GET parameters in a URI and extract a certain one? Specifically, I'm trying to get just the v= part of a YouTube watch URI. I've come up with youtube.com\/watch\?(\w+=[\w-]+&?)*(v=[\w-]+)&?*(\w+=[\w-]+&?)*, but that looks awfully repetitive. Is there a better (shorter?) way to do this?

A simplified regex:
^(?:http://www\.)?youtube\.[^/]+?/watch\?(.*?)(v=([^&]+))(.*)$

I know there are a lot of similar questions out there, but none has quite what I wanted. I'm looking for something capable of pulling out just the video ID—regardless of whether it's first in the parameter list, last, or buried in between others. Nothing I've seen has worked quite like that yet.
For reference, I'm using this web app for testing, and this set of test URIs:
http://www.youtube.com/watch?v=XXXXXXXXXXX
http://www.youtube.com/watch?v=XXXXXXXXXXX&feature=results_video&playnext=1&list=XXXXXXXXXXXXXXXXXX
http://www.youtube.com/watch?feature=player_embedded&v=XXXXXXXXXXX#!
http://www.youtube.com/watch?annotation_id=annotation_xxxxxx&feature=iv&src_vid=XXXXXXXXXXX&v=XXXXXXXXXX
Fellow Stack Exchangers, I propose the following regular expression to solve this:
youtube.com\/watch\?(\S*)v=([\w-]+)
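To make the idea concrete, here is a minimal Python sketch (Python is my choice for illustration; the question does not name a language). It uses a slightly stricter variant of the pattern above that only accepts v= directly after the ? or after an &, so that a parameter whose name merely ends in v is not picked up by mistake. For comparison it also shows the non-regex route through the standard library's URL parser. The names VIDEO_ID, video_id_regex and video_id_parsed are made up for this example.

```python
import re
from urllib.parse import urlparse, parse_qs

# v= must follow either the '?' directly or an '&' somewhere in the query string.
# [^#\s]* keeps the search inside the query part (it stops at a '#' fragment).
VIDEO_ID = re.compile(r"youtube\.com/watch\?(?:[^#\s]*&)?v=([\w-]+)")


def video_id_regex(uri):
    """Return the value of the v= parameter found by the regex, or None."""
    match = VIDEO_ID.search(uri)
    return match.group(1) if match else None


def video_id_parsed(uri):
    """Return the value of the v= parameter using the URL parser, or None."""
    values = parse_qs(urlparse(uri).query).get("v")
    return values[0] if values else None


if __name__ == "__main__":
    uri = "http://www.youtube.com/watch?feature=player_embedded&v=XXXXXXXXXXX#!"
    print(video_id_regex(uri))   # XXXXXXXXXXX
    print(video_id_parsed(uri))  # XXXXXXXXXXX
```

If a URL parser is available, the second function is usually the easier one to maintain, since it does not depend on where v= sits in the parameter list.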

Related

folding sections denoted by headers in rmarkdown

I have been unsuccessful at getting Rmarkdown to fold sections denoted by a #section_header. I know code folding is great, but my goal is to compare results across a number of different analytic pipelines without having to scroll all over the place to find what I am looking for. If I could fold each iteration of my analyses neatly into a foldable section, I could easily compare any two pipelines while ignoring the rest. I've seen a bit posted about how to implement this in PDF output, but does anyone know how to implement something like this in HTML output?
Thanks in advance.
-N
You want to slide whole sections, not just code -- otherwise an HTML notebook would probably do it.
So take a look at this Gist, where I cobbled something together using a little bit of jQuery, probably from this answer here:
https://gist.github.com/flynn-d/b756e512f5be7f553aad007f0ac37220

Regex help for somebody who really needs help - Visual Basic

I'm starting to create a small language (in VB.NET; yes, I know, maybe not a good idea).
I have already started working through tutorials about regex, but apparently regex is telling me to get out.
I want to add some commands, such as a /print command that takes arguments, something like:
/PRINT["Hello world";"blue";propety:{bold;italic}]
So, to me, a regex looks like this:
"{{^\^{\|^#\^~\{}~\^]|\~^[}^\}^#~\[}~^\}^##{\~{^}^#\#~#}\^#}^]|\|}]#\|{"
So you can understand that it's not something I enjoy writing.
Could you show me how to construct a regex for the command shown above?
Regex alone isn't the best way to create a language that, well, actually works.
Read this article for more info. I'm sure you can find a better way to write a language if you really need to write it. In VB.NET...
Anyway, if you insist on writing it in vb, I found a video that will help you with it.
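For the single /PRINT line in the question, a regex can at least pull the pieces apart. Below is a minimal sketch in Python (used here only for illustration; the same pattern syntax should carry over to .NET's Regex class, but I have not run it there). The grammar is an assumption read off that one example: two quoted strings, then one name:{...} option with values separated by semicolons. The names PRINT_COMMAND and parse_print are invented for this example. Anything more complex than this one fixed shape, such as nesting or escaped quotes, is where a real tokenizer and parser become the better tool.

```python
import re

# Assumed shape: /PRINT["text";"color";name:{value;value;...}]
PRINT_COMMAND = re.compile(
    r'^/PRINT\['
    r'\s*"([^"]*)"\s*;'          # first quoted argument (the text)
    r'\s*"([^"]*)"\s*;'          # second quoted argument (the color)
    r'\s*(\w+)\s*:\s*'           # option name before the colon
    r'\{([^}]*)\}\s*'            # semicolon-separated values inside braces
    r'\]$'
)


def parse_print(line):
    """Split one /PRINT command into its parts, or return None if it does not match."""
    match = PRINT_COMMAND.match(line)
    if match is None:
        return None
    text, color, option, raw_values = match.groups()
    values = [v.strip() for v in raw_values.split(";") if v.strip()]
    return {"text": text, "color": color, "option": option, "values": values}


if __name__ == "__main__":
    print(parse_print('/PRINT["Hello world";"blue";propety:{bold;italic}]'))
```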

How to retrieve data from the web with Marmalade

I am trying to make a basic app using Marmalade and am currently looking into using regular expressions to retrieve a very specific piece of information from within a web page. Is there an easier way to do this? I literally want the contents of a single p tag, which annoyingly doesn't have a unique id hence my thought to use regex.
I am completely new to C++ and Marmalade, hence my asking.
Sorry if this is a stupid question, but I can't find anything helpful on the internet.
Don't know about regex, but if your webservice supports GET/POST, that's what you need -
CIwHTTP* m_Http = new CIwHTTP;
m_Http->SetFormData("user", user);
m_Http->SetFormData("pass", password);
// GotHeaders is the callback function that gets called when header information is received from the web service
m_Http->Post(LOGIN_PATH, NULL, 0, GotHeaders, NULL);
More info in this IwHTTP example.
There are C++ functions for finding things inside strings, so it might be better to write C++ logic to find it, unless you are better at regexes. It sounds like you are not in control of the target page's content, though, so it will only work as long as the page's format remains something where your code can find the text you want.

Solr PatternReplaceCharFilterFactory working unexpectedly

I am relatively new to Solr so please forgive me if I'm missing something obvious. I have an application that allows users to search for musical artists. The indexing comes from a read-only database with correct spellings so on the index side I have it figured out.
On the query side, however, I need to anticipate various spelling errors/differences and want to help Solr find those instances. From our old home-grown search solution, I have a list of regexes and the artists they apply to. When I was trying to translate those to Solr using the PatternReplaceCharFilterFactory, I noticed that some worked perfectly, while others didn't work at all ... with seemingly no rhyme or reason between them.
For example:
<charFilter class="solr.PatternReplaceCharFilterFactory" pattern="em[ei]n[ei]m" replacement="Eminem"/>
accurately captures the common misspellings of Eminem. But for the band 311:
<charFilter class="solr.PatternReplaceCharFilterFactory" pattern="[Tt]hree [Ee]leven" replacement="311"/>
does not work. Another example is Nine Inch Nails:
<charFilter class="solr.PatternReplaceCharFilterFactory" pattern="((nine|9).*inch.*nails\b)|(n\.? ?i\.? ?n\.?\b)" replacement="Nine Inch Nails"/>
works perfectly for finding the most common patterns for the band's name. But this one for Eve 6 does not:
<charFilter class="solr.PatternReplaceCharFilterFactory" pattern="[Ee]ve.{0,4}([Ss]ix|6)" replacement="Eve 6"/>
Is there something fundamental I'm missing about the usage of this filter? I've tried a number of variations on the regexes I've mentioned above (even going so far as using literals like 'three eleven'), but still with no success. I've tried making the filter in question the only PatternReplaceCharFilterFactory in the analyzer. I also know for sure that these items are in the index correctly, because when I search for the correct spelling it returns the proper results.
Any suggestions?
Snowdall
I suspect the problem is not with your char filter, but with what comes after it, specifically the tokenizer. If you use the standard tokenizer, it will get rid of the numbers you have just put into your stream. If you don't need the text to be split into tokens, you could look at KeywordTokenizerFactory instead.
In general, the best way to troubleshoot this in Solr 4+ is the Analysis screen in the Admin WebUI. It allows you to enter your text against a particular field type and see what happens to it after each component in the analysis chain.
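One quick way to rule out the patterns themselves is to run them outside Solr. The sketch below does that in Python, whose re module is not the Java regex engine Solr uses, although patterns this simple are written the same way in both. It only demonstrates that the regexes from the question match their intended input; it does not reproduce Solr's analysis chain or query parsing. FILTERS and apply_char_filters are names made up for this example.

```python
import re

# (pattern, replacement) pairs copied from the charFilter definitions in the question.
FILTERS = [
    (r"em[ei]n[ei]m", "Eminem"),
    (r"[Tt]hree [Ee]leven", "311"),
    (r"[Ee]ve.{0,4}([Ss]ix|6)", "Eve 6"),
]


def apply_char_filters(text):
    """Apply every pattern replacement in order, like a chain of char filters."""
    for pattern, replacement in FILTERS:
        text = re.sub(pattern, replacement, text)
    return text


if __name__ == "__main__":
    print(apply_char_filters("emenem"))        # Eminem
    print(apply_char_filters("three eleven"))  # 311
    print(apply_char_filters("eve six"))       # Eve 6
```

Since all three patterns rewrite their input here, whatever goes wrong in Solr most likely happens before or after the char filter, which is what the Analysis screen helps to pin down.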
I would recommend using the SynonymFilter for the kind of application you describe. It allows you to provide an external file where you list words and their synonyms, like:
eminem <=> emenem
nine <=> 9
If you precede this with a LowerCaseFilter, you won't have to fuss about case normalization in your synonyms. You should be able to handle the 311 case too, as long as you don't tokenize (i.e., use a KeywordTokenizer as Alexander Rafalovitch suggested).

How can I highlight different types of file in dired mode in Emacs?

In a nutshell, I want to have different faces for some types of file in dired mode. I don't think it matters, but I am using Aquamacs.
The example I will use here is .tex files. If I can do it for .tex, then I can apply the same structure to create other faces for other types of files.
From what I understand, I have to create a variable, write a regular expression, then apply a hook. I read a bit about regex and so far I have
^(.+)\.tex$
I think my structure and regular expression are not really correct. I am not a programmer (though I have an interest in it), and I have only been using Emacs for 2 weeks or so, so any help would be greatly appreciated.
What I need is at least the basic structure of what I have to do. I understand there may be modes already created that do something similar (such as maybe Wdired and Dired-X), and I would not complain if someone told me about them, but what I really want is some elisp code (either already written or that I can work on), as I plan on learning a bit of elisp to be able to write my own customisations, and this would be a way to learn.
Thank you!
Since you want to learn how to do it, try checking out the extension dired+.el. This mode does a lot more than what you want, but it does add new faces. Specifically, look for the variable diredp-font-lock-keywords-1 and how it is used. That should get you going.
Other SO questions that seem relevant are:
Match regular expression as keyword in define-generic-mode
Highlighting correctly in an emacs major mode
A hello world example for a major mode in emacs?