Matching word pattern with character pattern - regex

So I have a very interesting question where I have a long string s such as:
eatsleepeatwalksleepwalk
and a smaller string p such as:
esetst
so on a quick look you can deduce that:
eat = e
sleep = s
walk = t
The problem statement is to tell whether the pattern of characters in the smaller string p matches the words in the bigger string s.
Size of s = 0 to 1000
Size of p = 0 to 1000
I'm aware of simple pattern matching using KMP; however, this problem seems quite tricky and I can't find a starting point for solving it.
Any hints?
Edit 1: Look at @Neverever's answer below. Seems quite interesting, awaiting examination of space/time complexity.

Tried to solve it using JavaScript RegExp
$("button").click(function() {
  let p = $("#p").val()
    , s = $("#s").val()
    , regMap = []
    , regStr = "";
  for (let c of p) {
    let idx = regMap.indexOf(c);
    if (idx === -1) {
      regMap.push(c);
      regStr += "(.+)";
    } else {
      regStr += "\\" + (idx + 1);
    }
  }
  let reg = new RegExp("^" + regStr + "$");
  console.log("RegExp used: " + regStr)
  console.log("Result: " + reg.test(s));
});
<script src="https://ajax.googleapis.com/ajax/libs/jquery/2.1.1/jquery.min.js"></script>
<label>String `s`: <input type="text" id="s" value="eatsleepeatwalksleepwalk" /></label><br>
<label>String `p`: <input type="text" id="p" value="esetst" /></label><br>
<button type="button">Run</button>
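For comparison, here is a minimal Python sketch of the same backreference idea: the first time a character of p appears it becomes a capturing group, and every later appearance becomes a reference to that group. The names `build_pattern` and `matches` are made up for this example. Like the JavaScript version, it does not force different characters to map to different words, and the backtracking it relies on may be slow for long inputs.

```python
import re

def build_pattern(p):
    """Turn a character pattern such as 'esetst' into a regex with backreferences."""
    groups = {}   # character -> name of the group that captured its word
    parts = []
    for ch in p:
        if ch not in groups:
            # First occurrence: capture any non-empty word.
            groups[ch] = "g%d" % len(groups)
            parts.append("(?P<%s>.+)" % groups[ch])
        else:
            # Repeat occurrence: must equal the word captured earlier.
            parts.append("(?P=%s)" % groups[ch])
    return "".join(parts)

def matches(p, s):
    """Return True if the whole of s can be split into words following p."""
    return re.fullmatch(build_pattern(p), s) is not None

print(matches("esetst", "eatsleepeatwalksleepwalk"))  # True
print(matches("ee", "eatsleep"))                      # False
```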

Related

Backspace String Compare Leetcode Question

I have a question about the following problem on Leetcode:
Given two strings S and T, return if they are equal when both are typed into empty text editors. # means a backspace character.
Example 1:
Input: S = "ab#c", T = "ad#c"
Output: true
Explanation: Both S and T become "ac".
Example 2:
Input: S = "ab##", T = "c#d#"
Output: true
Explanation: Both S and T become "".
Example 3:
Input: S = "a##c", T = "#a#c"
Output: true
Explanation: Both S and T become "c".
Example 4:
Input: S = "a#c", T = "b"
Output: false
Explanation: S becomes "c" while T becomes "b".
Note:
1 <= S.length <= 200
1 <= T.length <= 200
S and T only contain lowercase letters and '#' characters.
Follow up:
Can you solve it in O(N) time and O(1) space?
My answer:
def backspace_compare(s, t)
  if (s.match?(/[^#[a-z]]/) || t.match?(/[^#[a-z]]/)) || (s.length > 200 || t.length > 200)
    return "fail"
  else
    rubular = /^[\#]+|([^\#](\g<1>)*[\#]+)/
    if s.match?(/#/) && t.match?(/#/)
      s.gsub(rubular, '') == t.gsub(rubular, '')
    else
      new_s = s.match?(/#/) ? s.gsub(rubular, '') : s
      new_t = t.match?(/#/) ? t.gsub(rubular, '') : t
      new_s == new_t
    end
  end
end
It works in the terminal and passes the given examples, but when I submit it on leetcode it tells me Time Limit Exceeded. I tried shortening it to:
rubular = /^[\#]+|([^\#](\g<1>)*[\#]+)/
new_s = s.match?(/#/) ? s.gsub(rubular, '') : s
new_t = t.match?(/#/) ? t.gsub(rubular, '') : t
new_s == new_t
But that gives the same error.
So far, I believe my code fulfills the O(n) time, because there are only two ternary operators, which overall is O(n). I'm making 3 assignments and one comparison, so I believe that fulfills the O(1) space complexity.
I have no clue how to proceed beyond this, been working on it for a good 2 hours..
Please point out if there are any mistakes in my code, and how I am able to fix it.
Thank you! :)
Keep in mind that with N <= 200, your problem is more likely to be the constant factor than the algorithmic complexity. Using O(N) space is immaterial here; with only 400 characters in total, space is not an issue. You have six regex matches, two of which are redundant. More importantly, regex processing is slow for such a specific application.
For speed, drop the regex stuff and do this in one of the straightforward, brute-force ways: run through each string in order, applying the backspaces as appropriate. For instance, change both the backspace and the preceding letter to spaces. When you have finished scanning, remove all the spaces to build a new string. Do this with both S and T, then compare the results for equality.
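A minimal Python sketch of that brute-force pass, for illustration only. Instead of overwriting characters with spaces it uses a list as a stack, which gives the same final text. The function names are invented for this example.

```python
def apply_backspaces(text):
    """Return the text left after every '#' has deleted the character before it."""
    kept = []
    for ch in text:
        if ch == "#":
            if kept:          # a backspace on empty text does nothing
                kept.pop()
        else:
            kept.append(ch)
    return "".join(kept)

def backspace_compare(s, t):
    """Compare the two strings after their backspaces have been applied."""
    return apply_backspaces(s) == apply_backspaces(t)

for s, t in [("ab#c", "ad#c"), ("ab##", "c#d#"), ("a##c", "#a#c"), ("a#c", "b")]:
    print(s, t, backspace_compare(s, t))
```

This runs in O(N) time but uses O(N) extra space, so it does not address the follow-up about O(1) space.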
It may be easiest to start at the end of the string and work towards the beginning:
def process(str)
  n = 0
  str.reverse.each_char.with_object('') do |c,s|
    if c == '#'
      n += 1
    else
      n.zero? ? (s << c) : n -= 1
    end
  end.reverse
end
%w|ab#c ad#c ab## c#d# a##c #a#c a#c b|.each_slice(2) do |s1, s2|
  puts "\"%s\" -> \"%s\", \"%s\" -> \"%s\" %s" %
    [s1, process(s1), s2, process(s2), (process(s1) == process(s2)).to_s]
end
"ab#c" -> "ac", "ad#c" -> "ac" true
"ab##" -> "", "c#d#" -> "" true
"a##c" -> "c", "#a#c" -> "c" true
"a#c" -> "c", "b" -> "b" false
Let's look at a longer string.
require 'time'
alpha = ('a'..'z').to_a
#=> ["a", "b", "c",..., "z"]
s = (10**6).times.with_object('') { |_,s|
s << (rand < 0.4 ? '#' : alpha.sample) }
#=> "h####fn#fjn#hw###axm...#zv#f#bhqsgoem#glljo"
s.size
#=> 1000000
s.count('#')
#=> 398351
and see how long it takes to process.
require 'time'
start_time = Time.now
(u = process(s)).size
#=> 203301
puts (Time.now - start_time).round(2)
#=> 0.28 (seconds)
u #=> "ffewuawhfa...qsgoeglljo"
As u will be missing the 398351 pound signs in s, plus an almost equal number of other characters removed by the pound signs, we would expect u.size to be about:
10**6 - 2 * s.count('#')
#=> 203298
In fact, u.size #=> 203301, meaning that, at the end, 203301 - 203298 #=> 3 pound signs were unable to remove a character from s.
In fact, process can be simplified. I leave that as an exercise for the reader.
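Scanning from the end also gives a way to meet the question's follow-up (O(N) time, O(1) space), because no intermediate strings need to be built. The Python sketch below is one possible two-pointer version, not the code from this answer; the helper name `next_kept_index` is hypothetical.

```python
def next_kept_index(text, i):
    """Index of the nearest character at or before i that survives, or -1."""
    skip = 0                      # backspaces still waiting to delete something
    while i >= 0:
        if text[i] == "#":
            skip += 1
        elif skip > 0:
            skip -= 1             # this character is deleted by a later '#'
        else:
            return i
        i -= 1
    return -1

def backspace_compare_two_pointers(s, t):
    i, j = len(s) - 1, len(t) - 1
    while True:
        i = next_kept_index(s, i)
        j = next_kept_index(t, j)
        if i < 0 or j < 0:
            # Equal only if both strings ran out at the same time.
            return i < 0 and j < 0
        if s[i] != t[j]:
            return False
        i -= 1
        j -= 1

print(backspace_compare_two_pointers("a##c", "#a#c"))  # True
print(backspace_compare_two_pointers("a#c", "b"))      # False
```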
import java.util.Stack;

class Solution {
    public boolean backspaceCompare(String s, String t) {
        // Reduce each string to the characters that survive the backspaces.
        Stack<Character> st1 = convertToStack(s);
        Stack<Character> st2 = convertToStack(t);
        // Stack inherits List.equals, which compares size and elements in order.
        return st1.equals(st2);
    }

    public Stack<Character> convertToStack(String s) {
        Stack<Character> st = new Stack<>();
        for (int i = 0; i < s.length(); i++) {
            char c = s.charAt(i);
            if (c != '#') {
                st.push(c);
            } else if (!st.empty()) {
                // A backspace removes the last typed character, if there is one.
                st.pop();
            }
        }
        return st;
    }
}

Regex for extracting the exception names

I want to extract the exception name from the sentences below using a regex pattern:
Error: MYTERA RuntimeException: No task output
Error: android.java.lang.NullPointerException.checked
I need the terms RuntimeException and NullPointerException with a single Regex pattern.
This expression might help you to do so:
([A-Za-z]+Exception)
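As a quick check, here is a small Python sketch that applies this expression to the two sample sentences. The helper name `exception_name` is made up for the example.

```python
import re

# The expression from the answer; the outer group is not needed when using group(0).
EXCEPTION_NAME = re.compile(r"[A-Za-z]+Exception")

def exception_name(message):
    """Return the first word ending in 'Exception', or None if there is none."""
    m = EXCEPTION_NAME.search(message)
    return m.group(0) if m else None

print(exception_name("Error: MYTERA RuntimeException: No task output"))
# RuntimeException
print(exception_name("Error: android.java.lang.NullPointerException.checked"))
# NullPointerException
```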
Performance
This JavaScript snippet measures the performance of a similar expression by running it one million times in a simple for loop.
repeat = 1000000;
start = Date.now();
for (var i = repeat; i >= 0; i--) {
  var string = 'Error: android.java.lang.NullPointerException.checked';
  var regex = /(.*)\.([A-Za-z]+Exception)(.*)/g;
  var match = string.replace(regex, "$2");
}
end = Date.now() - start;
console.log("YAAAY! \"" + match + "\" is a match 💚💚💚 ");
console.log(end / 1000 + " is the runtime of " + repeat + " times benchmark test. 😳 ");

How to replace multiple value in python with the re module

I need to replace some text inside a file with the Python re module.
Here is the input value:
<li><span class="PCap CharOverride-4">Contrôles</span> <span class="PCap CharOverride-4">Testes</span></li>
and the expected output is this:
<li><span class="PCap CharOverride-4">C<span style="font-size:83%">ONTRôLES</span></span>
<span class="PCap CharOverride-4">T<span style="font-size:83%">ESTES</span></span></li>
but instead, I get this as the result:
<li><span class="PCap CharOverride-4">C<span style="font-size:83%">ONTRôLES</span></span> <span class="PCap CharOverride-4">C<span style="font-size:83%">ONTRôLES</span></span></li>
Is there something that I missed?
Here is what I've done so far:
for line in file_data.readlines():
    #print(line)
    reg = re.compile(r'(?P<b1>(<'+balise_name+' class="(([a-zA-Z0-9_\-]*?) |)'+class_value+')(| ([a-zA-Z0-9_\-]*?))">)(?P<maj>([A-ZÀÁÂÄÅÆÇÈÉÊËÌÍÎÏÐÑÒÓÔÕÖØÙÚÛÜÝ]))(?P<min>([a-zàáâãäåæçèéëìíîïðòóôõöøùúûüýÿµœš]*?))(?P<b2>(<\/'+balise_name+'>))')
    #print(reg)
    search = reg.findall(line)
    print(search)
    if (search != None):
        for matchObj in search:
            print(matchObj)
            #print(matchObj[8])
            print(line)
            balise1 = matchObj[0] #search.group('b1')
            print(balise1)
            balise2 = matchObj[10] #matchObj.group('b2')
            print(balise2)
            maj = matchObj[6] #matchObj.group('maj')
            print(maj)
            min = matchObj[8] #matchObj.group('min')
            print(min)
            sub_str = balise1+""+maj+"<span style=\"font-size:83%\">"+min.upper()+"</span>"+balise2
            line = re.sub(reg, sub_str, line)
    # open the file to append the line
    filename = file_name.split(".")
    #file_result = open(filename[0]+"-OK."+filename[1], "a")
    #file_result.writelines(line)
    #file_data.writelines(line)
    #file_result.close()
    print(line)
NB: I don't know how to use Python's BeautifulSoup module, which is why I'm doing this manually.
Pardon my poor English.
Thanks for your answers!
So, I totally forgot about this question, but here is the solution I came up with after fixing the code I wrote a long time ago:
for line in file_data.readlines():
    reg = re.compile(r'(?P<b1>(\<' + balise_name + ' class=\"(([a-zA-Z0-9_\-]*?) |)' + class_value +
                     ')(| ([a-zA-Z0-9_\-]*?))\"\>)(?P<maj>([A-ZÀÁÂÄÅÆÇÈÉÊËÌÍÎÏÐÑÒÓÔÕÖØÙÚÛÜÝ]))(?P<min>([a-zàáâãäåæçèéëìíîïðòóôõöøùúûüýÿµœš]*?))(?P<b2>(\<\/' + balise_name + '\>))')
    print(line)
    while reg.search(line):
        search = reg.search(line)
        if search:
            print(search)
        while search:
            balise1 = search[0]  # search.group('b1')
            print('b1 : ' + str(balise1))
            balise2 = search[11]  # search.group('b2')
            print('b2 : ' + str(balise2))
            maj = search[7]  # search.group('maj')
            print('maj : ' + str(maj))
            min = search[9]  # search.group('min')
            print('min : ' + str(min))
            sub_str = search[1] + "" + maj + "<span style=\"font-size:83%\">" + min.upper() + \
                "</span>" + balise2
            print(sub_str)
            line = re.sub(str(search[0]), sub_str, line)
            print(line)
            search = None
Here is what I changed in the code:
Fixed some unescaped characters inside the pattern
Iterated over the results one by one
Fixed the group numbers used for the sub function
I hope it helps someone facing the same problem.
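Another way to treat each match separately is to pass a function to `re.sub`, so the replacement is computed from the match it is replacing. The sketch below is only an illustration: it uses a simplified, hypothetical pattern (a span whose class contains `PCap`) rather than the full pattern above, and the names `SPAN` and `small_caps` are invented.

```python
import re

# Simplified stand-in for the real pattern: opening tag, one capital letter,
# the remaining lowercase letters, closing tag.
SPAN = re.compile(
    r'(?P<open><span class="[^"]*\bPCap\b[^"]*">)'
    r'(?P<maj>[A-Z\u00c0-\u00dd])'
    r'(?P<min>[a-z\u00e0-\u00ff]*)'
    r'(?P<close></span>)'
)

def small_caps(match):
    """Build the replacement for one match from that match's own groups."""
    return (match.group("open") + match.group("maj")
            + '<span style="font-size:83%">' + match.group("min").upper()
            + "</span>" + match.group("close"))

line = ('<li><span class="PCap CharOverride-4">Contr\u00f4les</span> '
        '<span class="PCap CharOverride-4">Testes</span></li>')
result = SPAN.sub(small_caps, line)
print(result)
```

Because `small_caps` is called once per match, the second span gets its own text ("Testes") instead of a copy of the first replacement.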

Find value when not between quotes

Using JavaScript and regex, I want to split a string on every %20 that is not within quotes. For example:
Here%20is%20"a%20statement%20"%20for%20Testing%20"%20The%20Values%20"
//easy to read version: Here is "a statement " for Testing " The Values "
______________ ______________
would return
{"Here","is","a statement ","for","Testing"," The Values "}
but it seems my regex skills are not strong enough to build the expression. Thanks for any help!
Here is a way that uses the replace method without using its result. The idea is to use a closure to fill the result variable at each occurrence:
var txt = 'Here%20is%20"a%20statement%20"%20for%20Testing%20"%20The%20Values%20"';
var result = Array();
txt.replace(/%20/g, ' ').replace(/"([^"]+)"|\S+/g, function (m,g1) {
result.push( (g1==undefined)? m : g1); });
console.log(result);
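The same pattern carries over to other regex engines. As a rough Python sketch (the function name `split_outside_quotes` is made up here), a quoted run is matched first and its inner text kept, otherwise a run of non-space characters is taken:

```python
import re

TOKEN = re.compile(r'"([^"]+)"|\S+')

def split_outside_quotes(text):
    """Split on %20 except inside double quotes; quoted parts keep their spaces."""
    decoded = text.replace("%20", " ")
    return [m.group(1) if m.group(1) is not None else m.group(0)
            for m in TOKEN.finditer(decoded)]

txt = 'Here%20is%20"a%20statement%20"%20for%20Testing%20"%20The%20Values%20"'
print(split_outside_quotes(txt))
# ['Here', 'is', 'a statement ', 'for', 'Testing', ' The Values ']
```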
Just try with:
var input = 'Here%20is%20"a%20statement%20"%20for%20Testing%20"%20The%20Values%20"',
tmp = input.replace(/%20/g, ' ').split('"'),
output = []
;
for (var i = 0; i < tmp.length; i++) {
var part = tmp[i].trim();
if (!part) continue;
if (i % 2 == 0) {
output = output.concat(part.split(' '));
} else {
output.push(part);
}
}
Output:
["Here", "is", "a statement", "for", "Testing", "The Values"]

Replace each RegExp match with different text in ActionScript 3

I'd like to know how to replace each match with a different text?
Let's say the source text is:
var strSource:String = "find it and replace what you find.";
..and we have a regex such as:
var re:RegExp = /\bfind\b/g;
Now, I need to replace each match with different text (for example):
var replacement:String = "replacement_" + increment.toString();
So the output would be something like:
output = "replacement_1 it and replace what you replacement_2";
Any help is appreciated..
You could also use a replacement function, something like this:
var increment : int = -1; // start at -1 so the first replacement will be 0
strSource = strSource.replace( /(\b_)(.*?_ID\b)/gim , function() {
    // pre-increment so the counter is bumped before it is used
    return arguments[1] + "replacement_" + (++increment).toString();
} );
I came up with a solution finally..
Here it is, if anyone needs:
var re:RegExp = /(\b_)(.*?_ID\b)/gim;
var increment:int = 0;
var output:Object = re.exec(strSource);
while (output != null)
{
var replacement:String = output[1] + "replacement_" + increment.toString();
strSource = strSource.substring(0, output.index) + replacement + strSource.substring(re.lastIndex, strSource.length);
output = re.exec(strSource);
increment++;
}
Thanks anyway...
Leave off the g (global) flag and repeat the search with the appropriate replacement string. Loop until the search fails.
Not sure about ActionScript, but in many other regex implementations you can usually pass a callback function that runs for each match and returns its replacement.
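To illustrate the callback idea in a language where it is known to work, here is a minimal Python sketch using the sample text and pattern from the question. The function name `number_matches` is hypothetical; this is not ActionScript code.

```python
import itertools
import re

def number_matches(pattern, text, prefix="replacement_"):
    """Replace each match with prefix + a running number, starting at 1."""
    counter = itertools.count(1)
    # re.sub calls the function once per match, in order from left to right.
    return re.sub(pattern, lambda m: prefix + str(next(counter)), text)

source = "find it and replace what you find."
print(number_matches(r"\bfind\b", source))
# replacement_1 it and replace what you replacement_2.
```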