I am using:
#replaceList(oe.sql,";,&,<,>,`,',!,#,$,%,(,),=,+,{,},[,],\","")#
to remove unwanted characters from user input.
My problem is that when I try all the forbidden characters in the input field, ReplaceList removes every unwanted character but leaves every comma in place.
;,&,<,>,`,',!,#,$,%,(,),=,+,{,},[,],\ --> ,,,,,,,,,,,,,,,,,,,,
Does anyone know how to remove this? I tried:
,,, and ,[^,], in the "filter-string" and none of these worked...
#replace(#replaceList(oe.sql,";,&,<,>,`,',!,#,$,%,(,),=,+,{,},[,],\","")#,",","", "All")#
also did not work.
Use REReplace or REReplaceNoCase functions to remove unwanted characters (specified as regular expression) from a string:
#REReplace(";,&,<,>,`,',!,#,$,%,(,),=,+,{,},[,],\", "[;&<>`'!#$%()=+{}[\]\\,""]*", "", "ALL")#
The ReplaceList function is useful when you need to replace certain values from one list with the corresponding values from another.
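For readers outside ColdFusion, the same character-class idea can be sketched in Python. This is only an illustration: the names UNWANTED_CHARS and strip_unwanted are made up for the example, and the character set is copied from the pattern in the answer above (comma and double quote included).

```python
import re

# Characters to remove, taken from the REReplace pattern above.
UNWANTED_CHARS = ";&<>`'!#$%()=+{}[]\\,\""

# re.escape makes every character literal, so the class is safe to build.
UNWANTED_RE = re.compile("[" + re.escape(UNWANTED_CHARS) + "]")


def strip_unwanted(text):
    """Return text with every unwanted character removed."""
    return UNWANTED_RE.sub("", text)


print(strip_unwanted("a;b,c<d>"))  # prints: abcd
```

Because the comma is just another member of the character class here, it is removed along with everything else instead of being treated as a list delimiter.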
You can get rid of the empty list elements like this:
NewList = ArrayToList(ListToArray(OldList));
I am executing a regular expression against a long string, capturing portions of it.
One of these portions is between quotes and can have any number of subportions delimited by slashes, such as:
'george'
'paul/john'
'john/peter/charles'
...
the subportions are unknown and can be in any order.
I need to retrieve the string between the quotes, but also I would like to be able to remove unwanted leading and trailing groups while executing it.
For example, if the string starts with bruce or bongo, I want to remove it
'bruce/peter/marc' -> peter/marc
'bongo/bob/kevin/chris' -> bob/kevin/chris
However, if the string starts with anything else, then I want to keep it
'alfie/george/paul' -> alfie/george/paul
Only one word in the group can be present at a time; in the example above, only bruce or bongo can be present at the beginning.
To do it I successfully used the following regular expression:
/'(?:bruce|bongo|)\/?([^']+)'/
In a similar way I want to remove a trailing group.
Let's say that if the string ends with sam or mark I want to remove this portion as well, for example:
'emily/grace/poppy/sam' -> emily/grace/poppy
'connor/barnaby/mark' -> connor/barnaby
Again, only one word of the group can be present at the end, in the example only sam or mark can end the string.
I thought to use the same as above and going with something similar to:
/'(?:bruce|bongo|)\/?([^']+)(?:sam|mark|)'/
But it's not working: bruce or bongo are removed if present, while sam or mark are always kept if present.
I know I can extract the match as it is and remove it with string manipulation methods. I am using javascript at the moment, and I can use:
"bruce/john/charles/sam".replace(/^(?:bruce|bongo)\//, '').replace(/\/(?:sam|mark)$/, '');
But I was wondering if there's a way to achieve the same result using directly the initial regular expression I execute against the long original string.
What am I missing?
You just have to make the middle part lazy, by adding a ? after the +:
'(?:bruce|bongo|)\/?([^']+?)(?:sam|mark|)'
And if you want the capture group to exclude the / that occurs before sam or mark, then:
'(?:bruce|bongo|)\/?([^']+?)(?:\/sam|\/mark|)'
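To see the lazy quantifier at work, here is a small sketch in Python (the question uses JavaScript, but this pattern behaves the same way in both engines). The function name extract and the sample strings embedded in a longer text are assumptions made for the example.

```python
import re

# Lazy middle group: it expands only as far as needed, so the optional
# trailing "/sam" or "/mark" gets a chance to match before the closing quote.
PATTERN = re.compile(r"'(?:bruce|bongo|)/?([^']+?)(?:/sam|/mark|)'")


def extract(text):
    """Return the quoted portion without the unwanted leading/trailing word."""
    match = PATTERN.search(text)
    return match.group(1) if match else None


for sample in (
    "x='bruce/peter/marc' y=1",
    "x='emily/grace/poppy/sam' y=2",
    "x='alfie/george/paul' y=3",
):
    print(extract(sample))
```

With the greedy `+`, the capture group swallows everything up to the closing quote and the empty alternative in the last group matches, which is why sam or mark was always kept.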
I am attempting to use REGEXREPLACE in Google Sheets to remove the repeating special character \n.
I can't get it to replace all repeating instances of the characters with a single instance.
Here is my code:
REGEXREPLACE("Hi Gene\n\n\n\n\nHope","\\n+","\\n")
I want the results to be:
Hi Gene\nHope
But it always maintains the new lines.
Hi Gene\n\n\n\n\nHope
It has to be an issue with replacing the special characters because this:
REGEXREPLACE("Hi Gennnne\nHope","n+","n")
Produces:
Hi Gene\nHope
How do I remove repeating instances of special characters with a single instance of the special character in Google Sheets?
Edit
Just found easier way:
=REGEXREPLACE("Hi Gene\n\n\n\n\nHope","(\\n)+","\\n")
Original solution
Try this formula:
=REGEXREPLACE(A1,REPT(F2,(len(A1)-len(REGEXREPLACE(A1,"\\n","")))/2),"\\n")
Put your text in A1.
How it works
It's a workaround; we want to end up with a formula like this:
REGEXREPLACE("Hi Gene\n\n\n\n\nHope","\\n+\\n+\\n+\\n+\\n+","\\n")
The first step is to find how many times to repeat \\n+:
=(len(F1)-len(REGEXREPLACE(F1,F2,F3)))/2
Then just combine RegEx.
https://support.google.com/docs/answer/3098245?hl=en
REGEXREPLACE(text, regular_expression, replacement)
The problem seems to be how it interprets the "text". If I put this in a cell REGEXREPLACE("Hi Gene\n\n\n\n\nHope","","")
the output is Hi Gene\n\n\n\n\nHope as well.
If I place the text in a cell by itself with proper newlines and have this REGEXREPLACE(A1, "(\n)\n*", "$1") it works.
Note I could not just do s/\n+/\n/ as it still does not interpret the newline notation as anything special. It would just output \n instead of a newline.
I believe that you don't need to double escape the newlines, e.g. just search for \n:
REGEXREPLACE("Hi Gene\n\n\n\n\nHope", "\n+", "\n")
When you replace \\n you are searching for the literal text \n, rather than newline.
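The difference between the literal two-character text \n and a real newline can be reproduced outside Sheets. The sketch below uses Python's re module rather than Sheets' RE2 engine, so treat it as an illustration of the regex logic only; the variable names are made up for the example.

```python
import re

# The text holds a literal backslash followed by "n" (two characters),
# not real newline characters.
text = r"Hi Gene\n\n\n\n\nHope"
LITERAL = "\\n"  # backslash + n

# In the regex \\n+ the "+" applies only to the "n", so each backslash-n
# pair is matched on its own and replaced by itself: nothing changes.
unchanged = re.sub(r"\\n+", lambda m: LITERAL, text)

# Grouping makes "+" apply to the whole two-character sequence.
collapsed = re.sub(r"(?:\\n)+", lambda m: LITERAL, text)

# With real newline characters, a plain \n+ is enough.
real = re.sub(r"\n+", "\n", "Hi Gene\n\n\n\n\nHope")

print(unchanged == text)  # True
print(collapsed)
print(repr(real))
```

So which pattern is right depends on whether the cell contains actual line breaks or the visible characters backslash and n.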
I am trying to use REGEX in Google Sheets to clean up form data arriving as comma delimited data with arbitrary leading commas and single spaces.
sample data from form:
,,Refrigerator,,,,, ,,Slide,,Dual Slide,,Microwave Oven,,Indoor Shower,Built in Stereo,Day/Night Switch,,BluRay/DVD
I want to use
REGEXREPLACE(text, regular_expression, replacement)
to remove multiple commas and single spaces that may occur between commas, replacing with a single comma so the line reads
Refrigerator,Slide,Dual Slide,Microwave Oven, . . . etc
The match string (^,+|(,+ ,)|,+) works properly in the Rubular.com simulator, but when used in the Google Spreadsheet as in example with raw data above pasted in at cell M12 as source text:
REGEXREPLACE("M12","(^,+|(,+ ,)|,+)",",")
it fails by not removing one of the leading commas.
,Refrigerator,,,,, ,,Slide,,Dual Slide,,Microwave Oven,,Indoor Shower,Built in Stereo,Day/Night Switch,,BluRay/DVD
The Googlesheet REGEX help points to https://github.com/google/re2/blob/master/doc/syntax.txt which seems to describe the operations the same as the simulator.
From what you're describing, Google is working as expected and the other site linked isn't. Your regex is matching ^,+, amongst other things, (ie one or more commas at the start), and replacing them with a single comma. If the input string has commas at the start, I would expect the output to have one too.
You could build on what you've done with another regular expression replace, and strip any leading commas:
REGEXREPLACE(REGEXREPLACE(M12,"((,+ ,)|,+)",","), "^,+", "")
This uses your original one, minus the leading commas part, to do the original replace, then wraps it in a second call looking for just leading commas, and replacing those with nothing.
Having said that, your original regex is not quite working as expected either: it doesn't strip all the commas and spaces down to a single comma in all circumstances. Instead, you can use this one:
REGEXREPLACE(REGEXREPLACE(M12,"( ?(, *)+)",","), "^,+", "")
This looks for an optional space, followed by one or more commas, each with zero or more spaces after it, and replaces the whole lot with a single comma. The outer call then removes any commas left at the start.
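The two-step replace can be checked outside Sheets with a short Python sketch. It uses Python's re module instead of RE2 (this pattern uses nothing engine-specific), and the function name clean_list is an assumption for the example.

```python
import re


def clean_list(raw):
    """Collapse runs of commas (with stray spaces) and drop leading commas."""
    # Step 1: optional space, then one or more commas each followed by
    # zero or more spaces, all replaced by a single comma.
    collapsed = re.sub(r" ?(?:, *)+", ",", raw)
    # Step 2: remove whatever commas are left at the very start.
    return re.sub(r"^,+", "", collapsed)


raw = (",,Refrigerator,,,,, ,,Slide,,Dual Slide,,Microwave Oven,,"
       "Indoor Shower,Built in Stereo,Day/Night Switch,,BluRay/DVD")
print(clean_list(raw))
```

Spaces inside an item such as "Dual Slide" are left alone because the optional space only matches when a comma follows it.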
One more good way to do this:
=TEXTJOIN(", ",1,SPLIT(A1,", "))
Given a text "article_utf8", I want to remove a list of words:
remove = "el|la|de|que|y|a|en|un|ser|se|no|haber|..."
regex = re.compile(r'\b('+remove+r')\b', flags=re.IGNORECASE)
article_out = regex.sub("", article_utf8)
However, this incorrectly removes some words and parts of words, for example:
1- aseguro becomes seguro
2- sería becomes í
3- coma becomes com
4- miercoles becomes 'ercoles'
Technically parts of a word can match a regexp. To solve this you would have to make sure that whatever sequence of letters your regexp matches is a single word and not part of it.
One way would be to make the regexp contain leading and trailing spaces, but words could also be separated with periods or commas so you would have to take those into account too if you want to catch all instances.
Alternatively, you can try splitting the list first into words using the built-in split method (https://docs.python.org/2/library/stdtypes.html#str.split). Then I would check each word in the resulting list, remove the ones I don't want and rejoin the strings. This method, however doesn't even need regexps so it's probably not what you intended despite being simple and practical.
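A minimal sketch of the split, filter and rejoin approach, assuming Python 3 (where str is Unicode, so accented words like sería stay intact). STOPWORDS contains only the words visible in the question before the "..."; the function name remove_stopwords is made up for the example.

```python
# Only the words shown in the question; the real list is longer.
STOPWORDS = {"el", "la", "de", "que", "y", "a", "en", "un",
             "ser", "se", "no", "haber"}


def remove_stopwords(text):
    """Drop whole words that appear in STOPWORDS (case-insensitive)."""
    kept = [word for word in text.split() if word.lower() not in STOPWORDS]
    return " ".join(kept)


print(remove_stopwords("aseguro que la coma no sería el miercoles"))
```

Since whole words are compared, parts of longer words are never touched. One limitation of this simple version: a stop word with punctuation attached (for example "la,") is not recognized, so punctuation would have to be stripped before the comparison.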
After much testing, the following will remove the small words in a natural language string, without removing them from parts of other words:
regex = re.compile(r'\s?\b(' + remove + r')[\s.,]', flags=re.IGNORECASE)
Hey,
I can't figure out how to write a regular expression for my website. I would like to let the user input a list of items (tags) separated by a comma, or by a comma and a space, for example "apple, pie,applepie". Is such a regexp possible?
Thanks!
EDIT:
I would like a regexp for javascript in order to check the input before the user submits a form.
What you're looking for is deceptively easy:
[^,]+
This will give you every comma-separated token, and will exclude empty tokens (if the user enters "a,,b" you will only get 'a' and 'b'), BUT it will break if they enter "a, ,b".
If you want to strip the spaces from either side properly (and exclude whitespace only elements), then it gets a tiny bit more complicated:
[^,\s](?:[^,]*[^,\s])?
However, as has been mentioned in some of the comments, why do you need a regex where a simple split and trim will do the trick?
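For comparison, the split-and-trim alternative is only a couple of lines. The sketch below is in Python (in JavaScript the equivalent would use split, trim and filter); the function name parse_tags is an assumption for the example.

```python
def parse_tags(raw):
    """Split on commas, trim each piece, and drop empty pieces."""
    return [piece.strip() for piece in raw.split(",") if piece.strip()]


print(parse_tags("apple, pie,applepie"))
print(parse_tags("a, ,b"))
```

This handles both "a,,b" and "a, ,b" without any special cases, because whitespace-only pieces become empty after trimming and are filtered out.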
Assuming the words in your list may be letters from a to z and you allow, but do not require, a space after the comma separators, your reg exp would be
[a-z]+(,\s*[a-z]+)*
This will match "ab" or "ab, de", but not "ab ,dc".
Here's a simpler solution:
console.log("test, , test".match(/[^,(?! )]+/g));
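It doesn't break on empty properties and strips spaces before and after properties. Note that inside a character class (?! ) is not a lookahead: the class simply excludes commas, spaces and the characters (, ?, ! and ), so tags containing internal spaces or any of those characters are split as well.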
This thread is almost 7 years old and was last active 5 months ago, but I wanted to achieve the same results as OP and after reading this thread, came across a nifty solution that seems to work well
.match(/[^,\s?]+/g)
Regarding the regular expression... I suppose a more accurate statement would be to say "target anything that IS NOT a comma followed by any (optional) amount of white space" ?
I often work with comma-separated patterns, and for me this works:
((^|[,])pattern)+
where "pattern" is the single element regexp
This might work:
([^,]*)(, ?([^,]*))*
([^,]*)
Look for commas within the given string, then split on them. As for the whitespace: can't you just split on the commas and remove the whitespace afterwards?
I needed strict validation for comma-separated input of alphabetic characters, with no spaces. I ended up using this one, in case anyone needs it:
/^[a-z]+(,[a-z]+)*$/
Or, to support lower- and uppercase words:
/^[A-Za-z]+(?:,[A-Za-z]+)*$/
In case one needs to allow whitespace between words:
/^[A-Za-z]+(?:\s*,\s*[A-Za-z]+)*$/
/^[A-Za-z]+(?:,\s*[A-Za-z]+)*$/
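As a quick check of what the whitespace-tolerant pattern accepts, here is a sketch in Python; the regex is the one with \s* on both sides of the comma, and the function name is_valid_tag_list is made up for the example. In the browser the same pattern would be used with JavaScript's test method.

```python
import re

# Letters-only items separated by commas, optional whitespace around commas.
TAG_LIST_RE = re.compile(r"^[A-Za-z]+(?:\s*,\s*[A-Za-z]+)*$")


def is_valid_tag_list(raw):
    """Return True if raw is a well-formed comma-separated list of words."""
    # fullmatch also rejects a trailing newline, which "$" alone would allow.
    return TAG_LIST_RE.fullmatch(raw) is not None


for candidate in ("apple, pie,applepie", "apple,,pie", "apple,", "ab ,dc"):
    print(candidate, is_valid_tag_list(candidate))
```

Empty items and trailing commas are rejected because every comma must be followed by at least one letter.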
You can try this, it worked for me:
/.+?[\|$]/g
or
/[^\|?]+/g
but replace '|' with the delimiter you need. Also, don't forget to escape it where necessary.
Something like this should work: ((apple|pie|applepie),\s?)*