Find all possible positions for a wildcard in string - combinations

Let's say I have the string "hello". I would like to find every possible way to replace characters in the string with a wildcard, using anywhere from 1 up to (string length - 1) wildcards (so at most 4 in this example).
e.g.
_ello
h_llo
he_lo
hel_o
hell_
__llo
_e_lo
etc. (there should be 30 possible ones in this case)
Are there any simple algorithms for this?
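One simple approach is to pick every set of k positions (for k from 1 to length - 1) and put a wildcard at each chosen position. A minimal Python sketch, assuming "_" is the wildcard and does not occur in the input; the function name wildcard_variants is made up for illustration:

```python
from itertools import combinations

def wildcard_variants(word, wildcard="_"):
    """Return every variant of word with 1 .. len(word) - 1 wildcards."""
    variants = []
    # k is the number of wildcards placed in this round.
    for k in range(1, len(word)):
        # Each combination is one distinct set of positions to blank out.
        for positions in combinations(range(len(word)), k):
            chars = list(word)
            for p in positions:
                chars[p] = wildcard
            variants.append("".join(chars))
    return variants

variants = wildcard_variants("hello")
print(len(variants))   # 30
print(variants[:5])    # ['_ello', 'h_llo', 'he_lo', 'hel_o', 'hell_']
```

The count follows from the binomial coefficients: 5 + 10 + 10 + 5 = 30 for a five-letter word.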

Related

How to get a count of the word sizes in a large amount of text?

I have a large amount of text, roughly 7000 words.
I would like to get a count of the word sizes, e.g. the number of 4-letter words, 6-letter words and so on, using regex.
I am unsure how to go about this. My thought process so far is to split the text into a String array, which would allow me to check the size of each individual element. Is there an easier way to go about this using a regex? I am using Groovy for this task.
EDIT: I did get this working using a normal array, but it was slightly messy. For anyone who comes across a similar problem, the final solution simply used Groovy's countBy() method coupled with a small amount of logic.
Don't forget the word boundary token \b. If you don't put it at both ends of a \w{n} token, the pattern also matches inside every word longer than n characters. For a 4-character word use \b\w{4}\b; for a 6-character word use \b\w{6}\b. Here is a demo with 7000 words as input string.
Java implementation:
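import java.util.regex.Matcher;
import java.util.regex.Pattern;

String dummy = ".....";  // placeholder for the text to scan
// \b on both sides limits matches to words of exactly 6 characters
Pattern pattern = Pattern.compile("\\b\\w{6}\\b");
Matcher matcher = pattern.matcher(dummy);
int count = 0;
while (matcher.find()) {
    count++;  // one match per 6-character word
}
System.out.println(count);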
Read the file using any stream word by word and calculate their length. Store counters in an array and increment values after reading each word.
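A rough Python sketch of that idea, in case it helps: it walks the text word by word and keeps one counter per word length. It uses a Counter rather than a plain array, and the function name word_length_counts and the sample sentence are just for illustration.

```python
import re
from collections import Counter

def word_length_counts(text):
    """Map each word length to the number of words of that length."""
    # \w+ grabs each maximal run of word characters, i.e. one whole word.
    return Counter(len(word) for word in re.findall(r"\w+", text))

counts = word_length_counts("The quick brown fox jumps over the lazy dog")
print(counts[3], counts[4], counts[5])   # 4 2 3
```

This gives the counts for all sizes in a single pass, so there is no need to run a separate regex per size.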
You could generate a regex for each size you want.
\b\w{6}\b would match each word with exactly 6 letters
\b\w{7}\b would match each word with exactly 7 letters
and so on...
You could then run each of these regexes on the text with the global flag enabled (so that it finds every instance in the whole string). This gives you an array of all the matches, and the length of that array is the count for that word size.
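To see why the word boundaries mentioned in the earlier answer matter for the per-size approach, here is a small Python comparison. The sample text is made up, and re.findall plays the role of the "global flag":

```python
import re

text = "four fives sixsix seventy"

# Without \b the pattern also matches the first 4 characters of longer words.
without_boundaries = re.findall(r"\w{4}", text)

# With \b on both sides only words of exactly 4 characters match.
with_boundaries = re.findall(r"\b\w{4}\b", text)

print(without_boundaries)   # ['four', 'five', 'sixs', 'seve']
print(with_boundaries)      # ['four']
```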

Advanced Lua Pattern Matching

I would like to know if either/both of these two scenarios are possible in Lua:
I have a string that looks like such: some_value=averylongintegervalue
Say I know there are exactly 21 characters after the = sign in the string. Is there a short way to replace the string averylongintegervalue with my own? (i.e. a simpler way than typing out: string.gsub("some_value=averylongintegervalue", "some_value=.....................", "some_value=anewintegervalue"))
Say we edit the original string to look like such: some_value=averylongintegervalue&
Assuming we do not know how many characters are after the = sign, is there a way to replace the string in between the some_value= and the &?
I know this is an oddly specific question but I often find myself needing to perform similar tasks using regex and would like to know how it would be done in Lua using pattern-matching.
Yes, you can use something like the following (%1 refers to the first capture in the pattern, which in this case captures some_value=):
local str = ("some_value=averylongintegervalue"):gsub("(some_value=)[^&]+", "%1replaced")
This should assign some_value=replaced to str.
Do you know if it is also possible to replace every character between the = and & with a single character repeated (such as a * symbol repeated 21 times instead of a constant string like replaced)?
Yes, but you need to use a function:
local str = ("some_value=averylongintegervalue")
:gsub("(some_value=)([^&]+)", function(a,b) return a..("#"):rep(#b) end)
This will assign some_value=#####################. If you need to limit this to just one replacement, then add ,1 as the last parameter to gsub (as Wiktor suggested in the comment).
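For comparison with regex engines outside Lua, the same two replacements can be sketched in Python. This is only an analogue of the Lua answers above, not Lua code: \g<1> plays the role of %1, and a function passed to re.sub plays the role of the function passed to gsub.

```python
import re

s = "some_value=averylongintegervalue&"

# Constant replacement: keep the captured prefix, replace everything up to '&'.
replaced = re.sub(r"(some_value=)[^&]+", r"\g<1>replaced", s, count=1)

# Masked replacement: one '#' per character of the captured value.
masked = re.sub(
    r"(some_value=)([^&]+)",
    lambda m: m.group(1) + "#" * len(m.group(2)),
    s,
    count=1,
)

print(replaced)   # some_value=replaced&
print(masked)     # some_value=#####################&
```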

Regex for UK registration number

I've been playing with creating a regular expression for UK registration numbers but have hit a wall when it comes to restricting overall length of the string in question. I currently have the following:
^(([a-zA-Z]?){1,3}(\d){1,3}([a-zA-Z]?){1,3})
This allows for an optional string (lower or upper case) of between 1 and 3 characters, followed by a mandatory numeric of between 1 and 3 characters and finally, a mandatory string (lower or upper case) of between 1 and 3 characters.
This works fine but I then want to apply a max length of 7 characters to the entire string but this is where I'm failing. I tried adding a 1,7 restriction to the end of the regex but the three 1,3 checks are superseding it and therefore allowing a max length of 9 characters.
Examples of registration numbers that need to pass are as follows:
A1
AAA111
AA11AAA
A1AAA
A11AAA
A111AAA
In the examples above, the A's represent any letter, upper or lower case, and the 1's represent any digit. The max length is the only restriction that appears not to be working. I disable the entry of spaces, so they can be assumed never to be present in the string.
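If you know what lengths you are after, I'd recommend you use the .length property which some languages expose for string length. If this is not an option, you could try a lookahead, like so: ^(?=.{1,7}$)(([a-zA-Z]?){1,3}(\d){1,3}([a-zA-Z]?){1,3})$. Note the $ inside the lookahead: without it, the lookahead only checks that the string has at least one character, so it does not cap the length at 7.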
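A quick way to check the length cap is to run the sample registrations through the pattern. The Python sketch below assumes a simplified but equivalent form of the expression, writing ([a-zA-Z]?){1,3} as [a-zA-Z]{0,3}; the names PLATE and is_valid_plate are made up for illustration. It also shows that a lookahead without the trailing $ lets a 9-character string through.

```python
import re

# Lookahead (?=.{1,7}$) caps the whole string at 7 characters.
PLATE = re.compile(r"^(?=.{1,7}$)[a-zA-Z]{0,3}\d{1,3}[a-zA-Z]{0,3}$")

# Same pattern, but the lookahead is not anchored, so it caps nothing.
UNANCHORED = re.compile(r"^(?=.{1,7})[a-zA-Z]{0,3}\d{1,3}[a-zA-Z]{0,3}$")

def is_valid_plate(s):
    return PLATE.match(s) is not None

samples = ["A1", "AAA111", "AA11AAA", "A1AAA", "A11AAA", "A111AAA"]
print(all(is_valid_plate(s) for s in samples))     # True
print(is_valid_plate("AAA111AAA"))                 # False (9 characters)
print(UNANCHORED.match("AAA111AAA") is not None)   # True (length not capped)
```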

Separate text using regex

I have a string like
abcdefangners
and a set of numbers that specifies how to group the above string, such as
3,4
In this case, the output should be
abc,defa,ngners
Is something like this possible using regex? I have one option of using a loop to get the comparisons of the set one by one, but is there a better way to do it?
You could do:-
/(.{3})(.{4})(.*)/
This would give you the substrings which you'd then have to join together.
You'd have to create the regexp for each set of numbers so it would not be as easy as other methods of string manipulation.
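Building the regexp from the set of numbers can itself be automated. A minimal Python sketch, assuming the sizes arrive as a list of integers and whatever is left over goes into the last group; the function name split_by_sizes is hypothetical:

```python
import re

def split_by_sizes(text, sizes):
    """Split text into groups of the given sizes plus the remainder."""
    # For sizes [3, 4] this builds the pattern (.{3})(.{4})(.*)
    pattern = "".join(f"(.{{{n}}})" for n in sizes) + "(.*)"
    match = re.fullmatch(pattern, text, flags=re.DOTALL)
    if match is None:
        raise ValueError("text is shorter than the requested group sizes")
    return list(match.groups())

parts = split_by_sizes("abcdefangners", [3, 4])
print(",".join(parts))
```

Plain slicing in a loop would do the same job without a regex, which is probably what the note about other methods of string manipulation is getting at.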

How to split string into chunks using regular expressions while keeping URI coded special characters together

Let's assume you have a string that you want to split into chunks having a maximum size of x characters. If you ignore new lines, a suitable regular expression would be .{1,x}
The problem I have is that I want to keep URI coded special characters like %20 together.
Example:
Hello%20world%20how%20are%20you%20today
Doing a "dumb" chunking with 5 character chunks, you end up with:
Hello
%20wo
rld%2
0how%
20are
%20yo
u%20t
oday
What I want to achieve is this:
Hello
%20wo
rld
%20ho
w%20a
re%20
you
%20to
day
Is this even possible with only regular expressions? I currently have a working solution with a loop that goes through each character and fills a bucket. If the bucket is full, it adds its content to an array of chunks and empties it. However, it also checks if the current character is a % and if the bucket would be able to hold 3 more characters (% plus the two hex digits). If it can, OK, otherwise it would push the content of the bucket in the chunks array and start with a fresh bucket.
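For reference, the bucket loop described above might look roughly like this in Python. It is a sketch under two assumptions: the input is well-formed (every % is followed by two hex digits) and the chunk size is at least 3. The function name chunk_uri is made up.

```python
def chunk_uri(text, size):
    """Split text into chunks of at most size characters, keeping %XX intact."""
    chunks = []
    bucket = ""
    i = 0
    while i < len(text):
        # A percent-encoded octet is treated as one indivisible 3-character token.
        token = text[i:i + 3] if text[i] == "%" else text[i]
        # Start a fresh bucket when the token no longer fits in the current one.
        if bucket and len(bucket) + len(token) > size:
            chunks.append(bucket)
            bucket = ""
        bucket += token
        i += len(token)
    if bucket:
        chunks.append(bucket)
    return chunks

chunks = chunk_uri("Hello%20world%20how%20are%20you%20today", 5)
print(chunks)
# ['Hello', '%20wo', 'rld', '%20ho', 'w%20a', 're%20', 'you', '%20to', 'day']
```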
Keep it simple and stay with your working solution with a loop; it's probably faster and ten times more readable. http://www.codinghorror.com/blog/2008/06/regular-expressions-now-you-have-two-problems.html
Try this regular expression to match all parts:
/(%[0-9A-F]{2}[^%]?[^%]?|[^%]%[0-9A-F]{2}[^%]?|[^%][^%]%[0-9A-F]{2}|[^%]{1,5})/
This basically lists all possible options to get at most five characters:
%[0-9A-F]{2}[^%]?[^%]? – a percent-encoded octet followed by at most two non-% characters
[^%]%[0-9A-F]{2}[^%]? – one non-% character, followed by a percent-encoded octet, followed by at most one non-% character
[^%][^%]%[0-9A-F]{2} – two non-% characters followed by a percent-encoded octet
[^%]{1,5} – one to five non-% characters
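The expression can be checked against the example string with a few lines of Python (the outer capture group is dropped here so that findall returns the whole match). Two things to keep in mind: the chunk size of 5 is baked into the alternatives, and the pattern only recognises upper-case hex digits.

```python
import re

CHUNK = re.compile(
    r"%[0-9A-F]{2}[^%]?[^%]?"    # octet first, then up to two plain characters
    r"|[^%]%[0-9A-F]{2}[^%]?"    # one plain character, octet, optional plain character
    r"|[^%][^%]%[0-9A-F]{2}"     # two plain characters, then an octet
    r"|[^%]{1,5}"                # one to five plain characters
)

regex_chunks = CHUNK.findall("Hello%20world%20how%20are%20you%20today")
print(regex_chunks)
# ['Hello', '%20wo', 'rld', '%20ho', 'w%20a', 're%20', 'you', '%20to', 'day']
```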