I am wondering what the regex should be for a string containing '#',
e.g.
abc#def#ghj#ijk
I want to get
#def
#ghj
#ijk
I tried #[\S]+ but it selects the whole #def#ghj#ijk. Any ideas?
Edit
The code below selects only #Me instead of #MessageBox. Why?
var m = new RegExp('#[^\s#]+').exec('http://localhost/Lorem/10#MessageBox');
if (m != null) {
var s = '';
for (i = 0; i < m.length; i++) {
s = s + m[i] + "\n";
}
}
Edit 2
The double backslash solved that problem: '#[^\\s#]+'
Try #[^\s#]+ to match # followed by a sequence of one or more characters which are neither # nor whitespace.
Match all characters that are not #:
#[^#]+
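As a minimal JavaScript sketch of the two points above (the helper names extractHashParts and firstMatchFromString are made up for illustration): the g flag returns every #-segment, and a pattern written as a string literal needs the backslash doubled, otherwise '\s' silently becomes a plain 's'.

```javascript
// Return every "#..." segment; the g flag makes match() return all matches.
function extractHashParts(text) {
  return text.match(/#[^\s#]+/g) || [];
}

// Build the regex from a string, as in the question, and return the first match.
function firstMatchFromString(patternSource, text) {
  const m = new RegExp(patternSource).exec(text);
  return m ? m[0] : null;
}

const url = 'http://localhost/Lorem/10#MessageBox';

console.log(extractHashParts('abc#def#ghj#ijk')); // [ '#def', '#ghj', '#ijk' ]

// '\s' inside a string literal is just 's', so the class becomes [^s#]
// and the match stops at the first 's' in "MessageBox".
console.log(firstMatchFromString('#[^\s#]+', url));  // '#Me'

// With the backslash doubled, the regex engine receives \s as intended.
console.log(firstMatchFromString('#[^\\s#]+', url)); // '#MessageBox'
```

Using a regex literal (/#[^\s#]+/) avoids the string-escaping issue altogether.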
I want paragraphs to be up to 3 sentences only.
For that, my strategy is to loop on all paragraphs and find the 3rd sentence ending (see note). And then, to add a "\r" char after it.
This is the code I have:
for (var i = 1; i < paragraphs.length; i++) {
...
sentEnds = paragraphs[i].getText().match(/[a-zA-Z0-9_\u0590-\u05fe][.?!](\s|$)|[.?!][.?!](\s|$)/g);
//this array is used to count sentences in Hebrew/English/digits that end with 1 or more of either ".","?" or "!"
...
if ((sentEnds != null) && (sentEnds.length > 3)) {
lineBreakAnchor = paragraphs[i].getText().match(/.{10}[.?!](\s)/g);
paragraphs[i].replaceText(lineBreakAnchor[2],lineBreakAnchor[2] + "\r");
}
}
This works fine on the first run. But if I run the code again, the text after the inserted "\r" character is not recognized as a new paragraph, so more "\r" characters (new lines) are inserted each time the script runs.
How can I make the script "understand" that "\r" means new, separate paragraph?
OR
Is there another character/approach that will do the trick?
Thank you.
Note: I use the last 10 characters of the sentence assuming the match will be unique enough to make only 1 replacement.
You can achieve this without modifying your own regular expression.
Try this approach to split the paragraphs:
Grab the whole content of the document and create an array of sentences.
Insert paragraphs with up to 3 sentences after original paragraphs.
Remove original paragraphs from hell.
function sentenceMe() {
var doc = DocumentApp.getActiveDocument();
var paragraphs = doc.getBody().getParagraphs();
var sentences = [];
// Split paragraphs into sentences
for (var i = 0; i < paragraphs.length; i++) {
var parText = paragraphs[i].getText();
//Count sentences in Hebrew/English/digits that end with 1 or more of either ".","?" or "!"
var sentEnds = parText.match(/[a-zA-Z0-9_\u0590-\u05fe][.?!](\s|$)|[.?!][.?!](\s|$)/g);
if (sentEnds){
for (var j = 0; j < sentEnds.length; j++) {
// The sentence ends where the matched ending starts, plus the length of that ending
var endIdx = parText.indexOf(sentEnds[j]) + sentEnds[j].length;
// Trim so that joining sentences later does not produce double spaces
sentences.push(parText.substring(0, endIdx).trim());
// Continue with the rest of the paragraph
parText = parText.substring(endIdx);
}
}
// console.log(sentences);
}
inThrees(doc, paragraphs, sentences)
}
function inThrees(doc, paragraphs, sentences) {
// define offset
var offset = paragraphs.length;
// Create paragraphs with up to 3 sentences
var k=0;
do {
var parText = sentences.splice(0,3).join(' ');
doc.getBody().insertParagraph(k + offset , parText.concat('\n'));
k++
}
while (sentences.length > 0)
// Remove paragraphs from hell
for (var i = 0; i < offset; i++){
doc.getBody().removeChild(paragraphs[i]);
}
}
In case you are wondering about the custom menu, here it is:
function onOpen() {
var ui = DocumentApp.getUi();
ui.createMenu('Custom Menu')
.addItem("3's the magic number", 'sentenceMe')
.addToUi();
}
References:
DocumentApp.Body.insertParagraph
Actually the detection of sentences is not an easy task.
A sentence does not always end with a dot, a question mark or an exclamation mark. If the sentence ends with a quote then punctuation rules in some countries force you to put the end of the sentence mark inside the quote:
John asked: "Who's there?"
Not every dot marks the end of a sentence. A dot after an uppercase letter usually does not end the sentence, because it follows an initial. The sentence does not end after J. here:
The latest Star Wars movie has been directed by J.J. Abrams.
However, sometimes the sentence does end after a capital letter followed by a dot:
This project has been sponsored by NASA.
And abbreviations can make it very hard:
For more information check the article in Phys. Rev. Letters 66, 2697, 2013.
With these difficulties in mind, let's still try to build an expression that works in the "usual" cases.
Make a global match and substitution. Match
((?:[^.?!]+[.?!] +){3})
and substitute it with
\1\r
This looks for 3 sentences (a sentence is a sequence of not-dot, not-?, not-! characters followed by a dot, a ? or a ! and some spaces) and puts a \r after them.
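A minimal sketch of that substitution in plain JavaScript, outside of Apps Script (the function name breakEveryThreeSentences is made up for illustration; JavaScript writes the backreference as $1 rather than \1):

```javascript
// Insert "\r" after every run of three "sentences", where a sentence is
// any text without . ? ! followed by one of . ? ! and at least one space.
function breakEveryThreeSentences(text) {
  return text.replace(/((?:[^.?!]+[.?!] +){3})/g, '$1\r');
}

const sample = 'One. Two. Three. Four. Five. Six. Seven.';
console.log(JSON.stringify(breakEveryThreeSentences(sample)));
// "One. Two. Three. \rFour. Five. Six. \rSeven."
```

The last sentence is left alone because it is not followed by a space, and the "\r" lands after the trailing space of the third sentence. The caveats about initials and abbreviations listed above still apply.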
UPDATED 2020-03-04
Try this:
var regex = new RegExp('((?:[a-zA-Z0-9_\\u0590-\\u05fe\\s]+[.?!]+\\s+){3})', 'gi');
for (var i = 1; i < paragraphs.length; i++) {
paragraphs[i].replaceText(regex, '$1\\r');
}
I have a string as shown below. In Dart, trim() removes the whitespace at the ends of the string. My question is: how do I replace repeated spaces in the middle of a string in Dart?
Example-1:
- Original: String _myText = "Netflix.com. Amsterdam";
- Expected Text: "Netflix.com. Amsterdam"
Example-2:
- Original: String _myText = "The dog has a long tail. ";
- Expected Text: "The dog has a long tail."
Use a RegExp like this:
String result = _myText.replaceAll(RegExp(' +'), ' ');
In my case I had tabs, spaces and carriage returns mixed in (I thought it was just spaces to start with).
To replace every run of whitespace with a single space, you can use:
String result = _myText.replaceAll(RegExp('\\s+'), ' ');
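Dart's RegExp follows the same regular expression syntax as JavaScript, so the pattern can be tried out in JavaScript first. This is only a sketch: the function name collapseWhitespace is made up, and the extra trim() call is an assumption added here to also drop leading and trailing whitespace, as in the second example of the question.

```javascript
// Collapse every run of whitespace (spaces, tabs, line breaks) into one
// space, then strip whatever is left at the two ends.
function collapseWhitespace(text) {
  return text.replace(/\s+/g, ' ').trim();
}

console.log(collapseWhitespace('Netflix.com.    Amsterdam'));     // 'Netflix.com. Amsterdam'
console.log(collapseWhitespace('The dog has a long tail.   '));   // 'The dog has a long tail.'
console.log(collapseWhitespace('tabs\tand\r\nline   breaks'));    // 'tabs and line breaks'
```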
To remove the spaces from a string without a regular expression, we can iterate through the string and append every character that is not a space to a new string, as in the code below.
void main() {
String str = "Dart remove empty space ";
// Collect every character that is not a space
final buffer = StringBuffer();
for (int i = 0; i < str.length; i++) {
if (str[i] != ' ') {
buffer.write(str[i]);
}
}
print(buffer.toString()); // prints: Dartremoveemptyspace
}
Originally published at https://kodeazy.com/flutter-remove-whitespace-string/
I need to extract everything between a # and a space or any punctuation character ( .,;:_-)
This is what I have so far.
var str = "#hello. foo bar"
var filter:RegExp = /#(.*?)(!.,?;:_-)/g;
var matches:Object = filter.exec(str);
if(matches != null){
trace("Found: "+matches[1])
} else {
trace("nothing found")
}
It only works if the word is #hello! - I guess this part (!.,?;:_-) is wrong?
This example takes advantage of the negated character class [^...], which matches any character that is not in the list. That lets you simply say "any character except these".
#([^[\., -\/#!$%\^&\*;:{}=\-_`~()]+)
var str = "#hello. foo bar";
// Plain assignments: "var filter = RegExp = ..." would overwrite the global RegExp constructor
var filter = /#([^[\., -\/#!$%\^&\*;:{}=\-_`~()]+)/g;
var matches = filter.exec(str);
if (matches !== null) {
console.log("Found: " + matches[1]);
} else {
console.log("nothing found");
}
#(.*?)[!.,?;:_-]
What you need is a character class, [].
[!.,?;:_-] matches a single character from the list !.,?;:_- (each taken literally).
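A minimal JavaScript sketch of the character-class idea, using a negated class so that the tag simply stops at the first delimiter. The function name extractTag is made up, and adding \s to the list is an assumption made here because the question also wants a space to end the tag.

```javascript
// Capture everything after '#' up to (not including) the first whitespace
// or punctuation character from the question's list.
function extractTag(text) {
  const match = /#([^\s!.,?;:_-]+)/.exec(text);
  return match ? match[1] : null;
}

console.log(extractTag('#hello. foo bar')); // 'hello'
console.log(extractTag('#hello world'));    // 'hello'
console.log(extractTag('nothing here'));    // null
```

Unlike #(.*?)[!.,?;:_-], this version also finds a tag that runs to the end of the string, because it does not require a delimiter to follow.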
What is the best way to produce a highlighted string found within another string?
I want to ignore all characters that are not alphanumeric but retain them in the final output.
So for example a search for 'PC3000' in the following 3 strings would give the following results:
ZxPc 3000L = Zx<font color='red'>Pc 3000</font>L
ZXP-C300-0Y = ZX<font color='red'>P-C300-0</font>Y
Pc3 000 = <font color='red'>Pc3 000</font>
I have the following code, but the only way I can highlight the search term within the result is to remove all the whitespace and non-alphanumeric characters and then set both strings to lowercase. I'm stuck!
public string Highlight(string Search_Str, string InputTxt)
{
// Setup the regular expression and add the Or operator.
Regex RegExp = new Regex(Search_Str.Replace(" ", "|").Trim(), RegexOptions.IgnoreCase);
// Highlight keywords by calling the delegate each time a keyword is found.
string Lightup = RegExp.Replace(InputTxt, new MatchEvaluator(ReplaceKeyWords));
if (Lightup == InputTxt)
{
Regex RegExp2 = new Regex(Search_Str.Replace(" ", "|").Trim(), RegexOptions.IgnoreCase);
RegExp2.Replace(" ", "");
Lightup = RegExp2.Replace(InputTxt.Replace(" ", ""), new MatchEvaluator(ReplaceKeyWords));
int Found = Lightup.IndexOf("<font color='red'>");
if (Found == -1)
{
Lightup = InputTxt;
}
}
RegExp = null;
return Lightup;
}
public string ReplaceKeyWords(Match m)
{
return "<font color='red'>" + m.Value + "</font>";
}
Thanks guys!
Alter your search string by inserting an optional non-alphanumeric character class ([^a-z0-9]?) between each character. Instead of PC3000 use
P[^a-z0-9]?C[^a-z0-9]?3[^a-z0-9]?0[^a-z0-9]?0[^a-z0-9]?0
This matches Pc 3000, P-C300-0 and Pc3 000.
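The pattern does not have to be written by hand. A minimal JavaScript sketch of generating it from the search term (the names loosePattern and highlightWithPattern are made up for illustration; the question itself uses C#, where the same pattern string could be passed to Regex with RegexOptions.IgnoreCase):

```javascript
// Build "P[^a-z0-9]?C[^a-z0-9]?3..." from "PC3000". Non-alphanumerics in the
// keyword are dropped first, so nothing in it needs regex escaping.
function loosePattern(keyword) {
  const chars = keyword.replace(/[^a-z0-9]/gi, '').split('');
  return chars.join('[^a-z0-9]?');
}

// Wrap every match of the loose pattern in the highlight markup.
function highlightWithPattern(keyword, input) {
  const source = loosePattern(keyword);
  if (source === '') return input; // nothing to search for
  return input.replace(new RegExp(source, 'gi'), "<font color='red'>$&</font>");
}

console.log(highlightWithPattern('PC3000', 'ZxPc 3000L'));  // Zx<font color='red'>Pc 3000</font>L
console.log(highlightWithPattern('PC3000', 'ZXP-C300-0Y')); // ZX<font color='red'>P-C300-0</font>Y
console.log(highlightWithPattern('PC3000', 'Pc3 000'));     // <font color='red'>Pc3 000</font>
```

Because the class is followed by ?, at most one non-alphanumeric character is tolerated between two letters of the keyword; replacing ? with * would allow longer gaps.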
One way to do this would be to create a version of the input string that only contains alphanumerics and a lookup array that maps character positions from the new string to the original input. Then search the alphanumeric-only version for the keyword(s) and use the lookup to map the match positions back to the original input string.
Pseudo-code for building the lookup array:
cleanInput = "";
lookup = [];
lookupIndex = 0;
for ( index = 0; index < input.length; index++ ) {
if ( isAlphaNumeric(input[index]) ) {
cleanInput += input[index];
lookup[lookupIndex] = index;
lookupIndex++;
}
}
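The pseudo-code above can be turned into a small runnable JavaScript sketch. The function name highlightWithLookup is made up, the check is limited to ASCII letters and digits, and only the first occurrence is highlighted; all three are simplifying assumptions.

```javascript
function highlightWithLookup(keyword, input) {
  const isAlphaNumeric = (ch) => /[a-z0-9]/i.test(ch);

  // Alphanumeric-only, lower-cased copy of the input, plus a lookup array
  // mapping each position in that copy back to its index in the original.
  let cleanInput = '';
  const lookup = [];
  for (let index = 0; index < input.length; index++) {
    if (isAlphaNumeric(input[index])) {
      cleanInput += input[index].toLowerCase();
      lookup.push(index);
    }
  }

  // Normalise the keyword the same way and search the cleaned copy.
  const needle = keyword.split('').filter(isAlphaNumeric).join('').toLowerCase();
  const at = needle === '' ? -1 : cleanInput.indexOf(needle);
  if (at === -1) return input;

  // Map the first and last matched positions back to the original string.
  const start = lookup[at];
  const end = lookup[at + needle.length - 1] + 1;
  return input.slice(0, start) +
    "<font color='red'>" + input.slice(start, end) + "</font>" +
    input.slice(end);
}

console.log(highlightWithLookup('PC3000', 'ZxPc 3000L'));  // Zx<font color='red'>Pc 3000</font>L
console.log(highlightWithLookup('PC3000', 'ZXP-C300-0Y')); // ZX<font color='red'>P-C300-0</font>Y
```

Unlike the optional-character regex, this tolerates any number of ignored characters between two letters of the keyword.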
Problem: write a program in any language which, given a string of characters, generates a regex that matches any anagram of the input string. For all input strings longer than some length N, the regex must be shorter than the "brute force" solution listing all possible anagrams separated by "|", and the length of the regex should grow "slowly" as the input string grows (ideally linearly, but possibly n ln n).
Can you do it? I've tried, but my attempts are so far from succeeding, that I'm beginning to doubt it's possible. The only reason I ask is I thought I had seen a solution on another site, but much pointless googling failed to uncover it a second time.
I think this JavaScript code will work according to your specifications. The regex length increases linearly with the length of the input. It generates a regex which uses positive lookaheads to match any anagram of the input string. The lookahead part makes sure that every character of the input occurs in the test string at least as often as in the input, regardless of order, and the matching part ensures that the test string has the same length as the input string (for which the regex is constructed). Together these two conditions force the character counts to be equal.
function anagramRegexGenerator(input) {
var lookaheadPart = '';
var matchingPart = '^';
var positiveLookaheadPrefix='(?=';
var positiveLookaheadSuffix=')';
var inputCharacterFrequencyMap = {}
for ( var i = 0; i< input.length; i++ )
{
if (!inputCharacterFrequencyMap[input[i]]) {
inputCharacterFrequencyMap[input[i]] = 1
} else {
++inputCharacterFrequencyMap[input[i]];
}
}
for ( var j in inputCharacterFrequencyMap) {
lookaheadPart += positiveLookaheadPrefix;
for (var k = 0; k< inputCharacterFrequencyMap[j]; k++) {
lookaheadPart += '.*';
if (j == ' ') {
lookaheadPart += '\\s';
} else {
lookaheadPart += j;
}
matchingPart += '.';
}
lookaheadPart += positiveLookaheadSuffix;
}
matchingPart += '$';
return lookaheadPart + matchingPart;
}
Sample input and output:
anagramRegexGenerator('aaadaaccc')
//generates the following string.
"(?=.*a.*a.*a.*a.*a)(?=.*d)(?=.*c.*c.*c)^.........$"
anagramRegexGenerator('abcdef ghij');
//generates the following string.
"(?=.*a)(?=.*b)(?=.*c)(?=.*d)(?=.*e)(?=.*f)(?=.*\s)(?=.*g)(?=.*h)(?=.*i)(?=.*j)^...........$"
//test run returns true
/(?=.*a)(?=.*b)(?=.*c)(?=.*d)(?=.*e)(?=.*f)(?=.*\s)(?=.*g)(?=.*h)(?=.*i)(?=.*j)^...........$/.test('acdbefghij ')
//or using the RegExp object
//this returns true
new RegExp(anagramRegexGenerator('abcdef ghij')).test('acdbefghij ')
//this returns false
new RegExp(anagramRegexGenerator('abcdef ghij')).test('acdbefghijj')
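One possible refinement, sketched here under assumptions rather than taken from the answer: the generator above inserts input characters into the pattern unescaped, so an input containing regex metacharacters such as '.' produces a pattern that is too permissive. The variant below escapes each character and uses counted groups, (?:.*x){n} and .{n}, to keep the pattern short. The names escapeRegex and anagramPattern are made up, and the input is assumed to contain no line breaks (which '.' does not match).

```javascript
// Escape characters that have a special meaning in a regular expression.
function escapeRegex(ch) {
  return ch.replace(/[.*+?^${}()|[\]\\]/g, '\\$&');
}

function anagramPattern(input) {
  // Count how often each character occurs in the input.
  const counts = new Map();
  for (const ch of input) {
    counts.set(ch, (counts.get(ch) || 0) + 1);
  }
  // One lookahead per distinct character: it must occur at least n times.
  let pattern = '^';
  for (const [ch, n] of counts) {
    pattern += '(?=(?:.*' + escapeRegex(ch) + '){' + n + '})';
  }
  // The total length must equal the input length, which forces exact counts.
  return pattern + '.{' + input.length + '}$';
}

console.log(anagramPattern('aaadaaccc'));
// ^(?=(?:.*a){5})(?=(?:.*d){1})(?=(?:.*c){3}).{9}$

console.log(new RegExp(anagramPattern('a.b')).test('b.a')); // true
console.log(new RegExp(anagramPattern('a.b')).test('abb')); // false
```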