Google Apps Script replace all string except numbers with regex

I've got this text: 3,142 people. I need to remove the people from it and get only the number, also removing comma(s). I need it to work with any higher numbers too like 13,142 or even 130,142 (at every 3 digits it will get a new comma).
So, in short, I need to get the numeric characters only, without commas and people. Ex: 3,142 people -> 3142.
My first version that didn't work was:
var str2 = "3,142 people";
var patt2 = /\d+/g;
var result2 = str2.match(patt2);
But after I changed patt2 to /\d+[,]\d+/g, it worked.
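A hedged note on why the first pattern only looks broken: with the g flag, match returns each run of digits as a separate array element, so the pieces just need joining. A minimal sketch, where the helper name digitsFromMatch is made up for illustration:

```javascript
// Hypothetical helper: collect every run of digits and glue the runs together.
function digitsFromMatch(text) {
  var parts = text.match(/\d+/g); // e.g. ["3", "142"]; null when there are no digits
  return parts ? parts.join("") : "";
}

console.log(digitsFromMatch("3,142 people"));   // "3142"
console.log(digitsFromMatch("130,142 people")); // "130142"
```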

you can use this:
var test = '3,142 people';
var result = test.replace(/[^0-9.]/g, ""); // "3142"
It removes everything except digits and the decimal point.

'3,142 people'.replace(/[^\d]/g, ''); // 3142
JSFiddle Demo: http://jsfiddle.net/zjx2hn1f/1/
Explanation
[] // match any character in this set
[^] // match anything NOT in character set
\d // match any digit
[^\d] // match any character that is NOT a digit
string.replace(/[^\d]/g, '') // replace any character that is NOT a digit with an empty string, in other words, remove it.
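Putting the explanation above together, here is a small sketch. The function name toInteger and the sample inputs are assumptions for illustration, not part of the original answer.

```javascript
// Hypothetical helper: strip every non-digit, then convert the rest to a number.
function toInteger(text) {
  var digits = text.replace(/\D/g, ""); // \D is shorthand for [^\d]
  return digits === "" ? NaN : Number(digits);
}

console.log(toInteger("3,142 people"));   // 3142
console.log(toInteger("130,142 people")); // 130142
```

Note that this also drops a decimal point, so "1.5" would become 15; the [^0-9.] variant keeps the dot.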

Related

Regex: Separate a string of characters with a non-consistent pattern (Oracle) (POSIX ERE)

EDIT: This question pertains to Oracle implementation of regex (POSIX ERE) which does not support 'lookaheads'
I need to separate a string of characters with a comma, however, the pattern is not consistent and I am not sure if this can be accomplished with Regex.
Corpus: 1710ABCD.131711ABCD.431711ABCD.41711ABCD.4041711ABCD.25
The pattern is basically 4 digits, followed by 4 characters, followed by a dot, followed by 1,2, or 3 digits! To make the string above clear, this is how it looks like separated by a space 1710ABCD.13 1711ABCD.43 1711ABCD.4 1711ABCD.404 1711ABCD.25
So the output of a replace operation should look like this:
1710ABCD.13,1711ABCD.43,1711ABCD.4,1711ABCD.404,1711ABCD.25
I was able to match the pattern using this regex:
(\d{4}\w{4}\.\d{1,3})
It does insert a comma but after the third digit beyond the dot (wrong, should have been after the second digit), but I cannot get it to do it in the right position and globally.
Here is a link to a fiddle
https://regex101.com/r/qQ2dE4/329
All you need is a lookahead at the end of the regular expression, so that the greedy \d{1,3} backtracks until it's followed by 4 digits (indicating the start of the next substring):
(\d{4}\w{4}\.\d{1,3})(?=\d{4})
                     ^^^^^^^^^
https://regex101.com/r/qQ2dE4/330
To expand on @CertainPerformance's answer, if you want to be able to match the last token as well, you can add $ as an alternative inside the lookahead:
(\d{4}\w{4}\.\d{1,3})(?=\d{4}|$)
Demo: https://regex101.com/r/qQ2dE4/331
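In an engine that does support lookaheads, such as JavaScript, the pattern above can be used to collect the tokens and join them. This is only a sketch; the function name splitTokens is made up for illustration.

```javascript
// Hypothetical helper: each token ends where the next four-digit prefix
// (or the end of the input) begins.
function splitTokens(corpus) {
  var tokens = corpus.match(/\d{4}\w{4}\.\d{1,3}(?=\d{4}|$)/g);
  return tokens ? tokens.join(",") : "";
}

console.log(splitTokens("1710ABCD.131711ABCD.431711ABCD.41711ABCD.4041711ABCD.25"));
// 1710ABCD.13,1711ABCD.43,1711ABCD.4,1711ABCD.404,1711ABCD.25
```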
EDIT: Since you now mentioned in the comment that you're using Oracle's implementation, you can simply do:
regexp_replace(corpus, '(\d{1,3})(\d{4})', '\1,\2')
to get your desired output:
1710ABCD.13,1711ABCD.43,1711ABCD.4,1711ABCD.404,1711ABCD.25
Demo: https://regex101.com/r/qQ2dE4/333
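The same lookahead-free idea can be tried outside Oracle. The sketch below uses JavaScript only to show how the two capture groups split each run of digits; the function name insertCommas is an assumption.

```javascript
// Hypothetical helper: a run of 1-3 digits followed by 4 more digits is the
// tail of one token plus the head of the next, so put a comma between them.
function insertCommas(corpus) {
  return corpus.replace(/(\d{1,3})(\d{4})/g, "$1,$2");
}

console.log(insertCommas("1710ABCD.131711ABCD.431711ABCD.41711ABCD.4041711ABCD.25"));
// 1710ABCD.13,1711ABCD.43,1711ABCD.4,1711ABCD.404,1711ABCD.25
```

The leading 1710 is left alone because it is followed by a letter, so there are never four more digits for the second group to take.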
In order to continue finding matches after the first one you must use the global flag /g. The pattern is very tricky but it's feasible if you reverse the string.
Demo
var str = `1710ABCD.131711ABCD.431711ABCD.41711ABCD.4041711ABCD.25`;
// Reverse String
var rts = str.split("").reverse().join("");
// Do a reverse version of RegEx
/*In order to continue searching after the first match,
use the `g`lobal flag*/
var rgx = /(\d{1,3}\.\w{4}\d{4})/g;
// Replace on reversed String with a reversed substitution
var res = rts.replace(rgx, ` ,$1`);
// Revert the result back to normal direction
var ser = res.split("").reverse().join("");
console.log(ser);

Regex replacing special characters in a string

I have numerical values that contain special characters and I would like to replace those special characters with "x"
I already tried [^\w*], but it only works when there is a single special character.
When there is more than one, as in 1234?12?, it won't capture the second special character. What am I doing wrong?
Here is something you could use. It removes all non-numeric characters; pass "x" instead of '' as the replacement to substitute them. Good luck!
var str = "rt5121212?232?2*dse%e&323"
var pattern = /[^0-9]/g; // any character that is not a digit
var sanitized = str.replace(pattern,'');
console.log(sanitized);
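Since the question asks for an "x" in place of each special character, here is a minimal sketch of that variant. The function name maskNonDigits is made up, and it assumes every non-digit counts as special. The g flag is what makes the replacement apply to every occurrence rather than just the first.

```javascript
// Hypothetical helper: replace each non-digit character with "x".
function maskNonDigits(value) {
  return value.replace(/[^0-9]/g, "x");
}

console.log(maskNonDigits("1234?12?")); // "1234x12x"
```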

Regex to create url friendly string

I want to create a URL-friendly string (one that contains only letters, numbers and hyphens) from user input. The steps are:
remove all characters which are not a-z, 0-9, space or hyphens
replace all spaces with hyphens
replace multiple hyphens with a single hyphen
Expected outputs :
my project -> my-project
test project -> test-project
this is # long str!ng with spaces and symbo!s -> this-is-long-strng-with-spaces-and-symbos
Currently I'm doing this in 3 steps:
$identifier = preg_replace('/[^a-zA-Z0-9\-\s]+/','',strtolower($project_name)); // remove all characters which are not a-z, 0-9, space or hyphens
$identifier = preg_replace('/(\s)+/','-',strtolower($identifier)); // replace all spaces with hyphens
$identifier = preg_replace('/(\-)+/','-',strtolower($identifier)); // replace multiple hyphens with a single hyphen
Is there a way to do this with a single regex?
Yeah, @Jerry is correct in saying that you can't do this in one replacement, as you are trying to replace a match with two different things (nothing or a dash, depending on context). I think Jerry's answer is the best way to go about this, but something else you can do is use preg_replace_callback. This allows you to inspect each match and choose the replacement according to what was matched.
$string = 'my project
test project
this is # long str!ng with spaces and symbo!s';
$string = preg_replace_callback('/([^A-Z0-9]+|\s+|-+)/i', function($m){$a = '';if(preg_match('/(\s+|-+)/i', $m[1])){$a = '-';}return $a;}, $string);
print $string;
Here is what this means:
/([^A-Z0-9]+|\s+|-+)/i This looks for any one of three alternatives (a run of characters that are not letters or digits, a run of whitespace, or a run of hyphens) and, if it matches any of them, passes the match to the function for evaluation.
function($m){ ... } This is the function that will evaluate the matches. $m will hold the matches that it found.
$a = ''; Set a default of an empty string for the replacement
if(preg_match('/(\s+|-+)/i', $m[1])){$a = '-';} If our match (the value stored in $m[1]) contains whitespace or hyphens, then set $a to a dash instead of an empty string.
return $a; Since this is a function, we will return the value and that value will be plopped into the string wherever it found a match.
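The callback idea described above carries over to other languages. Here is a rough JavaScript sketch of it, not a port of the PHP answer; the function name slugifyOnePass is an assumption.

```javascript
// Hypothetical helper: one pass over the string. Each run of characters that
// are not a-z or 0-9 becomes "-" if it contains whitespace or a hyphen,
// and is dropped otherwise.
function slugifyOnePass(input) {
  return input.toLowerCase().replace(/[^a-z0-9]+/g, function (run) {
    return /[\s-]/.test(run) ? "-" : "";
  });
}

console.log(slugifyOnePass("my project"));
// my-project
console.log(slugifyOnePass("this is # long str!ng with spaces and symbo!s"));
// this-is-long-strng-with-spaces-and-symbos
```

This sketch does not trim a leading or trailing hyphen, which may or may not matter for your input.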
Here is a working demo
I don't think there's a way to do that in a single pass, but you could reduce the number of replacements and, in the extreme case, use a one-liner like this:
$text=preg_replace("/[\s-]+/",'-',preg_replace("/[^a-zA-Z0-9\s-]+/",'',$text));
It first removes every character that is not alphanumeric, whitespace, or a dash, then replaces each run of spaces and dashes with a single dash.
Since you want to replace each thing with something different, you will have to do this in multiple iterations.
Sorry D:

Regex match a string and allow specific character to appear randomly

I want to extract a portion of a string, allowing for the dash character to appear randomly throughout. In my match, I want the dash character occurrences to be included.
Let's say I have a scenario like so:
haystack = "arandomse-que-nce"
needle = "sequence"
and I want to come out on the other end with a string like se-que-nce. In this case, what would the regex pattern look like?
I would split the string and then join by -*; for example, in JavaScript:
var needle = "sequence"
var regex = new RegExp(needle.split('').join('-*'))
var result = "arandomse-que-nce".match(regex) // ["se-que-nce"]
var result2 = "a-bad-sequ_ence".match(regex) // null
You could also use a regex to insert -* between each character:
var regex = new RegExp(needle.replace(/(?!$|^)/g, '-*'))
Both the split/join method and the replace method return 's-*e-*q-*u-*e-*n-*c-*e' for the regex.
If your string contains characters such as * that have a special meaning in regular expressions, escape them first and then insert -* after every (possibly escaped) character except the last:
var regex = new RegExp(needle.replace(/[-\\^$*+?.()|[\]{}]/g, '\\$&')
.replace(/(\\.|[^\\])(?!$)/g, '$1-*'))
Then, if needle was 1+1, for example, it would give you 1-*\+-*1 for the regex.
s-*e-*q-*u-*e-*n-*c-*e-*
This assumes that multiple hyphens in a row are okay.
EDIT: Doorknob's split/join solution is good, but be aware that it only works for characters that aren't special characters (*, +, etc.)
I don't know what the specifications are, but if there are special characters, make sure to escape them (and only them, since a backslash before a letter such as s or n changes its meaning):
new RegExp(needle.split('').map(function(c) { return c.replace(/[-\\^$*+?.()|[\]{}]/, '\\$&'); }).join('-*'))
You could try to use:
s-?e-?q-?u-?e-?n-?c-?e
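To tie the answers above together, here is a self-contained sketch that escapes special characters per character and joins with -*. The names escapeRegExp and looseSource are made up for illustration.

```javascript
// Escape characters that have a special meaning in a regular expression.
function escapeRegExp(ch) {
  return ch.replace(/[.*+?^${}()|[\]\\]/g, "\\$&");
}

// Hypothetical helper: build a pattern source that allows any number of
// dashes between consecutive characters of the needle.
function looseSource(needle) {
  return needle.split("").map(escapeRegExp).join("-*");
}

var regex = new RegExp(looseSource("sequence"));
console.log("arandomse-que-nce".match(regex)[0]); // "se-que-nce"
console.log("a-bad-sequ_ence".match(regex));      // null
console.log(looseSource("1+1"));                  // 1-*\+-*1
```

Swapping "-*" for "-?" in the join would allow at most one dash between characters, as in the last suggestion.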

Regex to remove characters up to a certain point in a string

How do I use regex to convert
11111aA$xx1111xxdj$%%
to
aA$xx1111xxdj$%%
So, in other words, I want to remove (or match) the FIRST grouping of 1's.
Depending on the language, you should have a way to replace a string by regex. In Java, you can do it like this:
String s = "11111aA$xx1111xxdj$%%";
String res = s.replaceAll("^1+", "");
The ^ "anchor" indicates that the beginning of the input must be matched. The 1+ means a sequence of one or more 1 characters.
Here is a link to ideone with this running program.
The same program in C#:
var rx = new Regex("^1+");
var s = "11111aA$xx1111xxdj$%%";
var res = rx.Replace(s, "");
Console.WriteLine(res);
(link to ideone)
In general, if you would like to make a match of anything only at the beginning of a string, add a ^ prefix to your expression; similarly, adding a $ at the end makes the match accept only strings at the end of your input.
If this is the beginning, you can use this:
^[1]*
As for replacing, it depends on the language. In PowerShell, I would do this (single quotes keep $xx1111xxdj from being expanded as a variable):
[regex]::Replace('11111aA$xx1111xxdj$%%','^[1]*','')
This will return:
aA$xx1111xxdj$%%
If you only want to replace consecutive "1"s at the beginning of the string, replace the following with an empty string:
^1+
If the consecutive "1"s won't necessarily be the first characters in the string (but you still only want to replace one group), replace the following with the contents of the first capture group (usually \1 or $1):
1+(.*)
Note that this is only necessary if you only have a "replace all" capability available to you, but most regex implementations also provide a way to replace only one instance of a match, in which case you could just replace 1+ with an empty string.
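The three options above can be compared side by side. This is only a sketch in JavaScript, where replace without the g flag already replaces just the first match; the second sample string is made up for illustration.

```javascript
var input = "11111aA$xx1111xxdj$%%";

// Anchored: only a run of 1s at the very start is removed.
var anchored = input.replace(/^1+/, "");
console.log(anchored); // aA$xx1111xxdj$%%

// Unanchored, first match only: removes the first run wherever it occurs.
var firstRun = "xx1111yy11".replace(/1+/, "");
console.log(firstRun); // xxyy11

// Capture-group variant, for tools that only offer "replace all".
var viaGroup = "xx1111yy11".replace(/1+(.*)/g, "$1");
console.log(viaGroup); // xxyy11
```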
I'm not sure but you can try this
[^1](\w*\d*\W)* - intended to match everything after the leading run of "1" characters as a single group
In Javascript
var str = '11111aA$xx1111xxdj$%%';
var patt = /^1+/g;
str = str.replace(patt,"");