I am using vbscript regex to find self-defined tags within a file.
"\[\$[\s,\S]*\$\]"
Unfortunately, I am doing something wrong, so it will grab all of the text between two different tags. I know this is caused by not excluding "$]" between the pre and post tag, but I can't seem to find the right way to fix this. For example:
[$String1$]
useless text
[$String2$]
returns
[$String1$]
useless text
[$String2$]
as one match.
I want to get
[$String1$]
[$String2$]
as two different matches.
Any help is appreciated.
Wade
The RegEx is greedy and will try to match as much as it can in one go.
For this kind of matching where you have a specific format, instead of matching everything until the closing tag, try matching NOT CLOSING TAG until closing tag. This will prevent the match from jumping to the end.
"\[\$[^\$]*\$\]"
Make the * quantifier lazy by appending a ? to it:
"\[\$[\s\S]*?\$\]"
should work.
Or restrict what you allow to be matched between your delimiters:
"\[\$.*\$\]"
will work as long as there is only one [$String$] section per line and sections never span multiple lines (the . does not match a newline);
"\[\$(?:(?!\$\])[\s\S])*\$\]"
checks, before matching each character after a [$, that no $] starts at that position.
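To compare these patterns on the sample from the question, here is a minimal sketch in JavaScript, used only for illustration because it accepts the same pattern syntax for these constructs. The names text, patterns and findTags are made up for this example.

```javascript
// Sample input from the question: two tags with unrelated text between them.
const text = "[$String1$]\nuseless text\n[$String2$]";

const patterns = {
  // Original behaviour: * grabs as much as possible, up to the LAST "$]".
  greedy: /\[\$[\s\S]*\$\]/g,
  // Lazy quantifier: stops at the FIRST "$]" after each "[$".
  lazy: /\[\$[\s\S]*?\$\]/g,
  // Negated class: the tag body may not contain any "$" at all.
  negatedClass: /\[\$[^$]*\$\]/g,
  // Lookahead: each body character is consumed only if "$]" does not start there.
  lookahead: /\[\$(?:(?!\$\])[\s\S])*\$\]/g,
};

// Returns every match of the pattern as an array (empty if there is none).
function findTags(pattern, input) {
  return input.match(pattern) || [];
}

for (const [name, pattern] of Object.entries(patterns)) {
  console.log(name, findTags(pattern, text));
}
```

Only the greedy pattern returns a single match here. One difference worth knowing: the negated class rejects a tag whose body contains a lone $, such as [$a$b$], while the lazy and lookahead versions accept it.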
There is no need for a regex if your tags are always delimited by [$...$]. Try this:
Set objFS = CreateObject("Scripting.FileSystemObject")
strFile = WScript.Arguments(0)
Set objFile = objFS.OpenTextFile(strFile)
strContent = objFile.ReadAll
objFile.Close
' Split on the closing delimiter, so each piece ends where a tag would end.
strContent = Split(strContent, "$]")
For i = LBound(strContent) To UBound(strContent)
    ' Look for the opening delimiter inside this piece.
    m = InStr(strContent(i), "[$")
    If m > 0 Then
        WScript.Echo Mid(strContent(i), m) & "$]"
    End If
Next
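For comparison, the same split-based idea can be sketched in JavaScript. The function name extractTags is hypothetical. Unlike the script above, this sketch skips the last piece, on the assumption that text after the final "$]" cannot contain a complete tag.

```javascript
// Extracts [$...$] tags without a regex by splitting on the closing delimiter.
function extractTags(content) {
  const pieces = content.split("$]");
  const tags = [];
  // The last piece is whatever follows the final "$]", so it is skipped.
  for (const piece of pieces.slice(0, -1)) {
    const start = piece.indexOf("[$");
    if (start !== -1) {
      // Keep the text from the opening delimiter and restore the closing one.
      tags.push(piece.slice(start) + "$]");
    }
  }
  return tags;
}

console.log(extractTags("[$String1$]\nuseless text\n[$String2$]"));
```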
Related
I am trying to write a regular expression to match strings / code in a database.
Here is some of the sample code / strings that I need to remove using the regular expression.
[b:1wkvatkt]
[/b:1wkvatkt]
[b:3qo0q63v]
[/b:3qo0q63v]
[b:2r2hso9d]
[/b:2r2hso9d]
Anything that matches [b:********] and [/b:********].
Can anybody please help me out? Thanks in advance.
You can use the following pattern (as stated by LukStorms in the comments):
\[\/?b:[a-z0-9]+\]
If you want to replace [b:********] with <b> (and also the closing one), you can use the following snippet (here in JavaScript, other languages are similar):
var regex = /\[(\/)?b:[a-z0-9]+\]/g;
var testText = "There was once a guy called [b:12a345]Peter[/b:12a345]. He was very old.";
var result = testText.replace(regex, "<$1b>");
console.log(result);
It matches an optional / and puts it into the first group ($1). This group can then be used in the replacement string. If the slash is not found, it won't be added, but if it is found, it will be added to <b>.
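If the goal is only to remove the tags, as the question asks, the same pattern can be used with an empty replacement. This is a small sketch; the function name stripBoldTags is made up for illustration.

```javascript
// Removes every [b:xxxx] and [/b:xxxx] marker, leaving the enclosed text.
function stripBoldTags(input) {
  return input.replace(/\[\/?b:[a-z0-9]+\]/g, "");
}

console.log(stripBoldTags("[b:1wkvatkt]Hello[/b:1wkvatkt] world"));
```

The pattern only accepts lowercase letters and digits after the colon, which matches the samples in the question; add the i flag if uppercase identifiers can occur.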
I'm currently building a simple search page in Node JS Express and Oracle.
I'd like to show the user a snippet of the matching text (the first instance would do) to add a bit of context about what the SQL found.
Example:
Search term: 'fish'
Results: Henry really likes going fishing, and once he caug ...
I'm not sure of the best way to approach this. I could retrieve the whole block of text and do it in Node JS, but I don't really like the idea of dragging the whole text across to the app just to get a snippet.
I've been thinking that REGEXP_SUBSTR could be a way to do it, but I'm not sure whether I could use a regular expression to retrieve x characters before and after the matching word.
Have I got the right idea or am I going about it in the wrong way?
Thanks
SELECT text
, REGEXP_SUBSTR(LOWER(text), LOWER('fish')) AS potential_snippet
FROM table
WHERE LOWER(text) LIKE LOWER('%fish%');
Try this:
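select text
, SUBSTR( TEXT, GREATEST(INSTR(LOWER(TEXT),'fish', 1)-50, 1), 100 )
FROM test
WHERE INSTR(LOWER(text),'fish', 1)<>0;
Play with the position and length numbers (50 and 100 in my example) to limit the length of the string. The GREATEST(..., 1) guard matters when the match is within the first 50 characters: without it the start position becomes zero or negative, and SUBSTR counts a negative position backward from the end of the string.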
If you need to extract some context with the help of JavaScript, you can use limiting quantifiers in a regex:
/\b.{0,15}fish.{0,15}\b/i
Here,
\b - matches at the word boundary (so that the context contains only whole words)
.{0,15} - 0 to 15 characters other than a newline (replace . with [\s\S] or [^] if you need to include newlines)
fish - the keyword
The /i modifier enables case-insensitive search.
If you need to build the regex dynamically, use the constructor notation:
RegExp("\\b.{0,15}" + keyword + ".{0,15}\\b", "i");
Also, if you need to find multiple matches, use the g modifier alongside the i.
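When the keyword comes from user input, characters such as + or ( would be read as regex syntax, so it is safer to escape them before building the pattern. The sketch below assumes that approach; escapeRegExp and findSnippet are hypothetical helper names, and the context width is a parameter rather than the fixed 15.

```javascript
// Escapes characters that have a special meaning in a regular expression.
function escapeRegExp(literal) {
  return literal.replace(/[.*+?^${}()|[\]\\]/g, "\\$&");
}

// Returns the first occurrence of keyword with up to `context` characters
// on each side, cut at word boundaries, or null if there is no match.
function findSnippet(text, keyword, context = 15) {
  const pattern = new RegExp(
    "\\b.{0," + context + "}" + escapeRegExp(keyword) + ".{0," + context + "}\\b",
    "i"
  );
  const match = text.match(pattern);
  return match ? match[0].trim() : null;
}

console.log(findSnippet("Henry really likes going fishing, and once he caught a big one", "fish"));
```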
I have to split URIs on the second portion:
/directory/this-part/blah
The issue I'm facing is that I have 2 URIs which logically need to be one
/directory/house-&-home/blah
/directory/house-%26-home/blah
This comes back as:
house-&-home and house-%26-home
So logically I need a regex to retrieve the second portion but also remove everything between the hyphens.
I have this, so far:
/[^(/;\?)]*/([^(/;\?)]*).*
(?<=directory\/)(.+?)(?=\/)
Does this solve your issue? This returns:
house-&-home and house-%26-home
If you want to get the result:
house--home
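then you should use a replace method. Because I am not sure what language you are using, I will give my example in Java:
// uses java.util.regex.Matcher and java.util.regex.Pattern
String regex = "(?<=directory/)(.+?)(?=/)";
String str = "/directory/house-&-home/blah";
Matcher matcher = Pattern.compile(regex).matcher(str);
// Extract the second path segment, then remove the & from it.
String result = matcher.find() ? matcher.group(1).replace("&", "") : "";
The replace call substitutes a given literal (here the & symbol) with the empty string "", so result is house--home.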
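The example above only handles the literal &, while the question also has the percent-encoded form %26. One possible way to treat both URIs as the same, sketched here in JavaScript, is to decode the segment first and then strip the ampersand. The function name segmentKey is hypothetical, and the sketch assumes the URI is well formed (decodeURIComponent throws on a malformed escape such as %zz).

```javascript
// Returns a comparison key for the second path segment of a URI.
function segmentKey(uri) {
  // "/directory/this-part/blah".split("/") gives ["", "directory", "this-part", "blah"].
  const segment = uri.split("/")[2] || "";
  // Decode %26 to & so both spellings look the same, then drop the &.
  return decodeURIComponent(segment).replace(/&/g, "");
}

console.log(segmentKey("/directory/house-&-home/blah"));
console.log(segmentKey("/directory/house-%26-home/blah"));
```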
I am trying to filter out spam before being posted using a few routines and external services (akismet) but they all seem to fail when pushing in a comma delimited word or a word formed with empty tags. Eg
b[u][/u]u[u][/u]y[i][/i]m[b][/b] e <-> buyme
b,u,y,m,e <-> buyme
Does anyone know of a good ColdFusion regex to strip out this sort of behavior before I post it to Akismet for processing?
Firstly: have you checked whether Akismet is already doing this?
I would very much suspect it already does all this processing (and more), so you don't actually need to.
Anyway, assuming this is bbcode, and thus the relevant tags will be for bold/italic/underline, you can replace them with:
TextForAkismet = rereplace( TextForAkismet , '\[([biu])\]\[/\1\]' , '' , 'all' )
If there are other empty tags you want to remove, simply update the captured group (the bit in parentheses) as appropriate. To also cater for tags that have attributes (but are still empty), a quick and dirty way is to use [^\]]* after the tag name (outside the captured group).
'\[([biu]|img|url)[^\]]*\]\[/\1\]'
Depending on the dialect of bbcode you're working with, you may need to handle quoted brackets which would need a more complex expression.
To remove commas that appear between letters, use:
TextForAkismet = rereplace( TextForAkismet , '\b,\b' , '' , 'all' )
(Where \b matches any position between alphanumeric and non-alphanumeric.)
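To illustrate what the two replacements do together, here is a rough JavaScript equivalent; the function name normalizeForSpamCheck is invented for this sketch and is not part of ColdFusion or Akismet.

```javascript
// Applies both clean-up steps: drop empty b/i/u tag pairs, then drop
// commas that sit directly between two word characters.
function normalizeForSpamCheck(text) {
  return text
    .replace(/\[([biu])\]\[\/\1\]/g, "") // [u][/u], [i][/i], [b][/b]
    .replace(/\b,\b/g, "");              // b,u,y -> buy
}

console.log(normalizeForSpamCheck("b[u][/u]u[u][/u]y[i][/i]m[b][/b]e"));
console.log(normalizeForSpamCheck("b,u,y,m,e"));
```

A comma followed by a space, as in ordinary prose, is left alone because there is no word boundary between the comma and the space. Digits count as word characters, though, so 1,000 would become 1000.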
I am trying to get the content between Start and End tag for the below mentioned strings
Products
Services & Solution
Regex used:
<([a-z0-9]+)([^<]+)*(?:>(.*?)</\2>|\D+/>)
It works fine for the first string but not for the latter ones.
Why so complex? Won't simple />([^<]+)</ capture the content of an element?
Depending on the flavour of regex, use lookbehind and lookahead to get just the match between > and <, i.e.
(?<=>)[^>]*(?=<)
(?<=>) - looks behind for a >
(?=<) - looks ahead for a <
[^>]* - matches the text inside the element itself
Lookahead and lookbehind are zero-width matches, so you will get just what you need.
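A quick way to try the lookaround pattern is the JavaScript sketch below. It assumes an engine with lookbehind support (ES2018 or later). The sample markup is invented, since the original strings in the question are not shown with their tags, and innerText is a hypothetical helper name.

```javascript
// Lookbehind for ">" and lookahead for "<": only the text in between is matched.
const betweenTags = /(?<=>)[^>]*(?=<)/;

// Returns the text between the first ">" and the following "<", or null.
function innerText(html) {
  const match = html.match(betweenTags);
  return match ? match[0] : null;
}

console.log(innerText('<a href="#">Products</a>'));
console.log(innerText('<a href="#">Services & Solution</a>'));
```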
Usually you don't want to parse HTML yourself with a regex; parsers are better at that.
Assuming you are using PCRE here's a random guess at the expression you are looking for:
(?is)<([a-z]+)\b[^<>]*(?:>(.*?)</\1>|/>)
Note that this will not work with nested tags.
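The same expression can be tried in JavaScript, which supports the i and s flags and the \1 backreference. This is a sketch with invented sample markup and a hypothetical helper name, elementContent.

```javascript
// Group 1 is the tag name; group 2 is the content, absent for self-closing tags.
const element = /<([a-z]+)\b[^<>]*(?:>(.*?)<\/\1>|\/>)/is;

// Returns the tag name and content of the first element found, or null.
function elementContent(html) {
  const match = html.match(element);
  if (!match) return null;
  return { tag: match[1], content: match[2] === undefined ? null : match[2] };
}

console.log(elementContent('<a href="#">Services & Solution</a>'));
console.log(elementContent("<br/>"));
console.log(elementContent("<div><div>x</div></div>"));
```

The last call shows the nesting limitation: the lazy group stops at the first closing tag, so the content comes back as <div>x rather than the full inner element.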
Just get rid of the tags.
var str = 'Products '
var str2 = 'Services & Solution '
var RE_findOpenAndCloseTag = /^<[^>]+>|<\/[^>]+>$/g;
str.replace( RE_findOpenAndCloseTag, '' ) == "Products ";
str2.replace( RE_findOpenAndCloseTag, '' ) == "Services & Solution ";
Note that RE_findOpenAndCloseTag assumes that tags will always start with a < and not contain an > unless it's closing the tag.
Thus this will fail.
'>">This will fail
But an easier way would be to convert the tags into a node, then get the innerHTML.
Try this; it should resolve your issue (just add |</\1>):
<([a-z0-9]+)([^<]+)*(?:>(.*?)|\D+/>|</\1>)