Regex Split after 20 characters

I have a fixed width text file where each field is given 20 characters total. Usually only 5 characters are used and then there is trailing whitespace. I'd like to use the Split function to extract the data, rather than the Match function. Can someone help me with a regex for this? Thanks in advance.

I would do this with string manipulation, rather than regex. If you're using JavaScript:
var results = [];
for (var i = 0; i < input.length; i += 20) {
  results.push(input.substring(i, i + 20));
}
Or to trim the whitespace:
var results = [];
for (var i = 0; i < input.length; i += 20) {
  results.push(input.substring(i, i + 20).trim());
}
If you must use regex, it should just be something like .{20}.
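A bare .{20} works with match, but as a split() separator it would consume every field and leave only empty strings. In JavaScript, a capturing group in the split pattern keeps the captured text in the result, so one workaround is to split on (.{20}) and drop the empty strings. A minimal sketch, assuming JavaScript; the helper name splitFixedWidth is hypothetical:

```javascript
// Hypothetical helper: split a fixed-width line into trimmed fields using
// split() with a capturing group, so each field is kept in the result.
function splitFixedWidth(line, width) {
  const re = new RegExp('(.{' + width + '})');
  return line
    .split(re)                       // e.g. ["", field1, "", field2, ""]
    .filter(field => field !== '')   // drop the empties between captured chunks
    .map(field => field.trim());     // remove the trailing padding
}
```

Filtering happens before trimming, so a field that is entirely blank still comes back as an empty string in its position. Note that . does not match newlines, so this expects one record per line.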

Split on whitespace and take the first returned element. This assumes that you do not have whitespace within the actual data.
cheers

If you must:
^(.{20})(.{20})(.{20})$ // repeat the part in parentheses for each field
You still need to trim each field to remove trailing whitespace.
It seems simpler to use substr() or your language's equivalent. Or in PHP you could use str_split($string, 20).
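For comparison, here is roughly how the anchored pattern above could be applied in JavaScript, trimming each captured field afterwards. A sketch assuming exactly three fields per line; recordPattern and parseRecord are illustrative names:

```javascript
// Hypothetical example with three 20-character fields, mirroring
// ^(.{20})(.{20})(.{20})$; add one (.{20}) per extra field.
const recordPattern = /^(.{20})(.{20})(.{20})$/;

function parseRecord(line) {
  const m = recordPattern.exec(line);
  if (m === null) {
    return null; // the line is not exactly 60 characters (or contains a newline)
  }
  // m[0] is the whole line; m[1]..m[3] are the fields, still padded
  return m.slice(1).map(field => field.trim());
}
```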

Related

Regex replace phone numbers with asterisks pattern

I want to apply a mask to my phone numbers replacing some characters with "*".
The specification is as follows:
Phone entry: (123) 123-1234
Output: (1**) ***-**34
I was trying this pattern: "\B\d(?=(?:\D*\d){2})", replacing the matches with "*".
But the output is something like (123)465-7891 -> (1**)4**-7*91
Pretty similar to what I want, but two digits (the 4 and the 7) are left unmasked. I was thinking of using the lazy zero-or-one quantifier (??) but I'm not sure how.
Try this Regex:
(?<!\()\d(?!\d?$)
Replace each match with *
Explanation:
(?<!\() - negative lookbehind to find the position which is not immediately preceded by (
\d - matches a digit
(?!\d?$) - negative lookahead to find the position not immediately followed by an optional digit followed by the end of the line
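In JavaScript the lookaround pattern can be used directly with String.prototype.replace and the g flag. A small sketch (the name maskWithLookarounds is hypothetical); lookbehind (?<!...) assumes an ES2018+ engine such as a current Node.js or browser:

```javascript
// Mask every digit except the one right after "(" and the last two digits.
// Without the m flag, $ means the end of the whole string.
const lookaroundMask = /(?<!\()\d(?!\d?$)/g;

function maskWithLookarounds(phone) {
  return phone.replace(lookaroundMask, '*');
}

console.log(maskWithLookarounds('(123) 123-1234')); // (1**) ***-**34
```

The same call turns (123)465-7891 into (1**)***-**91, since the pattern only looks at digits and does not care about the space.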
Alternative without lookarounds:
match \((\d)\d{2}\)\s+\d{3}-\d{2}(\d{2})
replace with (\1**) ***-**\2
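The \1 and \2 in the replacement are how many engines spell backreferences; in JavaScript's String.prototype.replace they are written $1 and $2. A sketch of the lookaround-free version under that assumption (maskWithGroups is a hypothetical name):

```javascript
// Capture the first digit and the last two digits; everything else is
// rebuilt from the replacement template.
const groupMask = /\((\d)\d{2}\)\s+\d{3}-\d{2}(\d{2})/;

function maskWithGroups(phone) {
  return phone.replace(groupMask, '($1**) ***-**$2');
}

console.log(maskWithGroups('(123) 123-1234')); // (1**) ***-**34
```

Because the pattern uses \s+, a number written without the space, such as (123)465-7891, does not match and is returned unchanged; \s* would make the space optional.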
In my opinion you should avoid lookarounds when possible. I find them less readable, they are less portable and often less performant.
Testing Gurman's regex and mine on regex101's PHP engine, mine completes in 14 steps while Gurman's takes 80 steps.
A "quickie" without regex:
function maskNumber(number) {
  var getNumLength = number.length;
  // The number of asterisks, when added to 4, should equal the length of the number
  var asteriskLength = getNumLength - 4;
  // Start with the last 4 characters, then append the asterisks
  var maskNumber = number.slice(-4);
  for (var i = 0; i < asteriskLength; i++) maskNumber += '*';
  // Fisher-Yates shuffle: the visible characters end up in random positions,
  // and formatting characters outside the last four are not kept
  var mask = maskNumber.split(''), maskLength = mask.length;
  for (var i = maskLength - 1; i > 0; i--) {
    var j = Math.floor(Math.random() * (i + 1));
    var tmp = mask[i];
    mask[i] = mask[j];
    mask[j] = tmp;
  }
  return mask.join('');
}

How can I add line breaks in JavaScript with a regex while keeping the separator?

What I want to do is change this:
1.apple2.cat3.green(1)table(2)computer①what②can i③do?●help●me●plz
to this:
1.apple
2.cat
3.green
(1)table
(2)computer
①what
②can i
③do?
●help
●me
●plz
There are many kinds of delimiters:
"1.", "2.", ..., "(1)", "(2)", ..., ■, ○
and so on.
The number is always a single digit.
I want to list many delimiters that split the string (or add a line break) but keep the delimiter;
the number or bullet should not be deleted.
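You can use a regex like the one below:
let s = '1.apple2.cat3.green(1)table(2)computer①what②can i③do?●help●me●plz';
// Match a delimiter, then any characters up to (not including) the next delimiter.
// ([a-z]+ alone would stop at the space in "can i" and the "?" in "do?".)
let regex = /(\d\.|\(\d\)|[①-⑳]|●|■|○)(?:(?!\d\.|\(\d\)|[①-⑳]|●|■|○)[\s\S])*/g;
let result = null;
while ((result = regex.exec(s)) !== null) {
  console.log(result[0]); // or you can push into an array, etc.
}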
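Another option is to split at every position that is immediately followed by a delimiter. A lookahead is zero-width, so nothing is removed and each delimiter stays at the start of its piece; a zero-width match at position 0 does not produce a leading empty string. A sketch assuming the same delimiter set as the question (breakAtDelimiters is a hypothetical name):

```javascript
// Split before "1." / "(1)" style numbers and the listed bullets.
// ①-⑳ is the circled-number range U+2460..U+2473.
const delimiterAhead = /(?=\d\.|\(\d\)|[①-⑳●■○])/;

function breakAtDelimiters(text) {
  return text.split(delimiterAhead);
}

const sample = '1.apple2.cat3.green(1)table(2)computer①what②can i③do?●help●me●plz';
console.log(breakAtDelimiters(sample).join('\n'));
```

To add another delimiter, extend the alternation or the character class inside the lookahead.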

Regex to match all occurrences that begin with n characters in sequence

I'm not sure if it's even possible for a regular expression to do this. Let's say I have a list of the following strings:
ATJFH
ABHCNEK
BKDFJEE
NCK
ABH
ABHCNE
KDJEWRT
ABHCN
EGTI
And I want to match all strings that consist of the first n characters (for any n) of the following string: ABHCNEK
The matches would be:
ABH
ABHCN
ABHCNE
ABHCNEK
I tried things like ^[A][B][H][C][N][E][K] and ^A[B[H[C[N[E[K]]]]]], but I can't seem to get it to work...
Can this be done in regex? If so, what would it be?
The simplest option is to list every prefix:
^(?:ABHCNEK|ABHCNE|ABHCN|ABHC|ABH|AB|A)$
See demo.
https://regex101.com/r/eB8xU8/6
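The nested-bracket attempt from the question works if the brackets are written as optional non-capturing groups: ^A(?:B(?:H(?:C(?:N(?:E(?:K)?)?)?)?)?)?$. A sketch that builds that pattern from any non-empty word, assuming JavaScript (prefixRegex is a hypothetical name):

```javascript
// Build ^A(?:B(?:H(?:...)?)?)?$ so that any non-empty prefix of `word` matches.
function prefixRegex(word) {
  if (word.length === 0) throw new Error('word must not be empty');
  // Escape regex metacharacters so arbitrary words are safe to embed
  const chars = [...word].map(c => c.replace(/[.*+?^${}()|[\]\\]/g, '\\$&'));
  // Wrap from the innermost character outwards: (?:X(?:rest)?)?
  let body = '';
  for (let i = chars.length - 1; i >= 1; i--) {
    body = '(?:' + chars[i] + body + ')?';
  }
  return new RegExp('^' + chars[0] + body + '$');
}

const candidates = ['ATJFH', 'ABHCNEK', 'BKDFJEE', 'NCK', 'ABH', 'ABHCNE', 'KDJEWRT', 'ABHCN', 'EGTI'];
const prefixRe = prefixRegex('ABHCNEK');
console.log(candidates.filter(s => prefixRe.test(s)));
```

The regex has no g flag, so calling test() repeatedly is stateless.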
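Use this regular expression:
^[ABHCNEK]+$
Note that this only checks that every letter comes from the set A, B, H, C, N, E, K, so it also matches strings that are not prefixes, such as NCK from the list above.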
You haven't said how you want to use it, but one option doesn't require regex. Loop through the various strings and check whether each one is a prefix of your test string:
var strings = ['ATJFH', 'ABHCNEK', 'BKDFJEE', 'NCK', 'ABH', 'ABHCNE', 'KDJEWRT', 'ABHCN', 'EGTI'];
var test = 'ABHCNEK';
for (var i = 0; i < strings.length; i++) {
  // startsWith checks for a prefix; match() would treat strings[i] as a regex
  // and accept it anywhere inside test
  if (test.startsWith(strings[i])) {
    console.log(strings[i]);
  }
}
This returns:
ABHCNEK
ABH
ABHCNE
ABHCN

How can I split a string into an array and determine WHICH character came after the split?

I'm trying to split all the words in a string into an array in AS3. The obvious answer of course would be to simply do this:
str.split(/\s/);
The problem here is that I need to be able to tell whether the split occurred on a newline or a space. I'm trying to put the words of a string into draggable boxes, and I want the ones after a newline to go, well, on a new line.
Any idea the best way to go about this? Clearly, the above split method will get rid of the crucial newline character that will tell me what I need to know. Should I use a regex.exec with a while loop, or is there any way to use split to preserve the characters I need?
Example string:
This is an example string
with spaces as well as newlines
and needs a regex
1/ Split the string on newlines to get array#1:
array#1 = [ "This is an example string","with spaces as well as newlines","and needs a regex" ]
2/ For each element in array#1, split based on your current regex. This breaks the strings only on spaces, since the newlines have already been dealt with. The resulting 2-D array is array#2:
array#2 = [
["This","is","an","example","string"] ,
["with","spaces","as","well","as","newlines"],
["and","needs","a","regex"]
]
3/ Process the elements of array#2 as you want.
First split your string at the newlines:
var lines:Array = str.split("\n");
Now you can loop over your lines and split each of them into separate words:
for (var i:int = 0; i < lines.length; i++) {
    var words:Array = lines[i].split(" ");
    for (var j:int = 0; j < words.length; j++) {
        trace("word", words[j]);
    }
    trace("newline");
}
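On the last part of the question (keeping the characters that split() removes): in JavaScript, a capturing group in the split pattern keeps each separator in the result array, so you can check whether the whitespace before a word contained a newline. This is a JavaScript sketch; it assumes, without having tested it, that AS3's String.split treats captured groups the same way. The name wordsWithBreaks is hypothetical:

```javascript
// Split on runs of whitespace but keep them, then tag each word with whether
// the whitespace just before it contained a newline.
function wordsWithBreaks(str) {
  const parts = str.split(/(\s+)/); // e.g. ["This", " ", "is", "\n", "with"]
  const result = [];
  let newLine = false; // true when the preceding separator contained "\n"
  for (const part of parts) {
    if (part === '') continue; // leading/trailing whitespace yields ''
    if (/^\s+$/.test(part)) {
      newLine = part.includes('\n');
    } else {
      result.push({ word: part, newLine: newLine });
      newLine = false;
    }
  }
  return result;
}
```

Each entry with newLine set to true would be the first box on a new row.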

RegEx to find words with characters

I've found answers to many of my questions here, but this time I'm stuck. I've looked at hundreds of questions but haven't found an answer that solves my problem, so I'm hoping for your help :D
Considering the following list of words:
iris
iridium
initialization
How can I use regex to find words in this list when I am looking using exactly the characters u, i, i? I'm expecting the regex to find "iridium" only because it is the only word in the list that has two i's and one u.
What I've tried
I've been searching both here and elsewhere but haven't come across any that helps me.
[i].*[i].*[u]
matches iridium, as expected, and not iris nor initialization. However, the characters i, i, u must be in that sequence in the word, which may or may not be the case. So trying with a different sequence
[u].*[i].*[i]
This does not match iridium (but I want it to, iridium contains u, i, i) and I'm stuck for what to do to make it match. Any ideas?
I know I could try all sequences (in the example above it would be iiu; iui; uii) but that gets messy when I'm looking for more characters (say 6, tnztii which would match initialization).
[t].*[n].*[z].*[t].*[i].*[i]
[t].*[z].*[n].*[t].*[i].*[i]
[t].*[z].*[n].*[i].*[t].*[i]
..... (long list until)
[i].*[n].*[i].*[t].*[z].*[t] (the first matching sequence)
Is there a way to use regex to find the word, irrespective of the sequence of the characters?
I don't think there's a way to solve this with regular expressions that doesn't end in a horribly convoluted expression. It might be possible with lookahead and lookbehind assertions, but I think it's probably faster and less messy to solve this programmatically.
Chop the string up at whitespace, then iterate over all the words and count how many times each of your characters appears in each word. To speed things up, discard all words shorter than the number of characters you require.
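The counting approach can be sketched like this in JavaScript. It assumes "at least as many of each letter" is the rule (so extra i's or u's are allowed); findWordsWithChars is a hypothetical name:

```javascript
// Return the words in `text` that contain at least as many of each letter
// as `chars` does; the order of the letters is ignored.
function findWordsWithChars(text, chars) {
  // Required count per letter, e.g. 'uii' -> { u: 1, i: 2 }
  const need = {};
  for (const c of chars) need[c] = (need[c] || 0) + 1;

  return text.split(/\s+/).filter(word => {
    // Quick discard: a word shorter than `chars` cannot contain them all
    if (word.length < chars.length) return false;
    const have = {};
    for (const c of word) have[c] = (have[c] || 0) + 1;
    return Object.keys(need).every(c => (have[c] || 0) >= need[c]);
  });
}

console.log(findWordsWithChars('iris iridium initialization', 'uii')); // [ 'iridium' ]
```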
Is this an academic exercise, or can you use more than a single regular expression? Is there a language wrapped around this? The simplest way to do what you want is to have a regexp that matches just i or u, and examine (count) the matches. Using python, it could be a one-liner. What are you using?
The part you haven't gotten around to yet is that there might be additional i's or u's in the word. So instead of matching on .*, match on [^iu]*.
Here's what I would do:
Array.prototype.findItemsByChars = function(charGroup) {
  console.log('charGroup:', charGroup);
  // Sort the letters so repeats are adjacent, e.g. 'iui' -> 'iiu'
  charGroup = charGroup.toLowerCase().split('').sort().join('');
  // Group runs of the same letter: 'iiu' -> ['ii', 'u']
  charGroup = charGroup.match(/(.)\1*/g);
  for (var i = 0; i < charGroup.length; i++) {
    charGroup[i] = {char: charGroup[i].charAt(0), count: charGroup[i].length};
    console.log('{char:' + charGroup[i].char + ', count:' + charGroup[i].count + '}');
  }
  var matches = [];
  for (var i = 0; i < this.length; i++) {
    var charMatch = 0;
    for (var j = 0; j < charGroup.length; j++) {
      // match() returns null when the letter is absent, so fall back to []
      // instead of relying on a caught TypeError
      var count = (this[i].match(new RegExp(charGroup[j].char, 'g')) || []).length;
      if (count < charGroup[j].count) break; // this word cannot match
      if (++charMatch == charGroup.length) matches.push(this[i]);
    }
  }
  return matches.length ? matches : false;
};
var words = ['iris','iridium','initialization','ulisi'];
var matches = words.findItemsByChars('iui');
console.log('matches:',matches);
EDIT: Let me know if you need any explanation.
I know this is a really old post, but I found this topic really interesting and thought people might look for a similar answer some day.
So the goal is to match all words with a specific set of characters in any order. There is a simple way to do this using lookaheads:
\b(?=(?:[^i\W]*i){2})(?=[^u\W]*u)\w+\b
Here is how it works:
We use one lookahead (?=...) for each letter to be matched.
In it, we put [^x\W]*x, where x is the letter that must be present.
We then make this pattern occur n times, where n is the number of times that x must appear in the word, using (?:...){n}.
The resulting regex for a letter x that has to appear (at least) n times in the word is then (?=(?:[^x\W]*x){n}).
All you have to do then is add this pattern for each letter and add \w+ at the end to match the word!
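Following that recipe, the pattern can be generated from the list of letters instead of being written by hand. A sketch in JavaScript that assumes the letters are plain word characters (so no escaping is needed) and the words are lowercase; lookaheadRegex is a hypothetical name:

```javascript
// Build \b(?=(?:[^x\W]*x){n})...\w+\b with one lookahead per distinct letter,
// where n is how many times that letter appears in `chars`.
function lookaheadRegex(chars) {
  const counts = new Map();
  for (const c of chars.toLowerCase()) {
    counts.set(c, (counts.get(c) || 0) + 1);
  }
  let lookaheads = '';
  for (const [c, n] of counts) {
    lookaheads += `(?=(?:[^${c}\\W]*${c}){${n}})`;
  }
  return new RegExp(`\\b${lookaheads}\\w+\\b`, 'g');
}

const wordList = 'iris iridium initialization';
console.log(wordList.match(lookaheadRegex('uii')));    // [ 'iridium' ]
console.log(wordList.match(lookaheadRegex('tnztii'))); // [ 'initialization' ]
```

For 'uii' this produces \b(?=(?:[^u\W]*u){1})(?=(?:[^i\W]*i){2})\w+\b, which is the regex above with the lookaheads in a different order.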