C++ to Ruby: find substring in a string

I learned Rails and now would like to expand my knowledge of Ruby. So I'm doing some C++ exercises in Ruby. Specifically I need to find if a substring exists in a string. If it does I need it to return its starting index. If it doesn't exist have it return -1. I came up with a Ruby solution that's very similar to C++ and was wondering if there's a "better", more idiomatic solution in Ruby?
C++
int find(char str[], char sub_str[])
{
int str_length = strlen(str);
int sub_str_length = strlen(sub_str);
bool match = false;
for(int i=0; i<str_length; i++)
{
if(str[i] == sub_str[0])
{
for(int j=1; j<sub_str_length; j++)
{
if(str[i+j] == sub_str[j])
match = true;
else
{
match = false;
break;
}
}
if(match)
return i;
}
}
return -1;
}
Ruby
def find_sub_str(str, sub_str)
  match = false
  for i in 0...str.length
    if str[i] == sub_str[0]
      for j in 1...sub_str.length
        if str[i+j] == sub_str[j]
          match = true
        else
          match = false
          break
        end
      end
      if match == true
        return i
      end
    end
  end
  return -1
end

You can use the index method of String. It returns nil on failure to match, which is more idiomatic Ruby than returning -1.
"SubString".index("String") # -> 3
"SubString".index("C++") # -> nil
You could wrap it in a test that returns -1 for nil if you really wanted this behavior.
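As a minimal sketch of that wrapper (reusing the method name find_sub_str from the question; the fallback via || is just one way to write it):

```ruby
# Returns the starting index of sub_str in str, or -1 when there is no match.
# String#index returns nil on failure, and nil || -1 evaluates to -1.
def find_sub_str(str, sub_str)
  str.index(sub_str) || -1
end

puts find_sub_str("SubString", "String")
puts find_sub_str("SubString", "C++")
```

This works because 0 is truthy in Ruby, so a match at index 0 is still returned as 0 rather than being replaced by -1.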

1. Don’t use for in Ruby; it just calls each and doesn’t introduce a new scope. So for i in 0...str.length becomes (0...str.length).each do |i|.
2. Higher-order functions are your friend! Using each_cons and find_index makes things much cleaner (study Enumerable; it’s home to many useful methods):
def find_sub_str(str, sub_str)
  str.chars.each_cons(sub_str.length).find_index do |s|
    s.join == sub_str
  end
end
find_sub_str('foobar', 'ob') #=> 2
3. Just use Ruby core’s index :):
'foobar'.index('ob') #=> 2
Both #2 and #3 return nil, not -1, when there is no match. This is best because nil is falsey in Ruby.

# How about this solution? It gets the job done in O(n).
# Note: resetting ptr to 0 on a mismatch can miss overlapping matches.
given_string = "Replace me with your code!"
chars_given_string = given_string.split('')
chars_of_substr = "ith".split('')
is_substr = false
ptr = 0
chars_given_string.each do |c|
  if c == chars_of_substr[ptr]
    ptr += 1
  else
    ptr = 0
  end
  if ptr == chars_of_substr.length
    is_substr = true
    break
  end
end
puts is_substr
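One thing to watch with a single-pointer scan like this: resetting the pointer to 0 on a mismatch can skip a match that begins inside a partial match (for example, looking for "ith" in "iith"). A rough sketch of the Knuth-Morris-Pratt approach, which keeps the single O(n) pass but falls back through a prefix table instead of restarting, might look like this. The names kmp_index and fallback are my own and do not come from the answer above.

```ruby
# Knuth-Morris-Pratt substring search (illustrative sketch).
# Returns the index of the first occurrence of sub_str in str, or -1.
def kmp_index(str, sub_str)
  return 0 if sub_str.empty?

  # fallback[i] = length of the longest proper prefix of sub_str[0..i]
  # that is also a suffix of sub_str[0..i].
  fallback = Array.new(sub_str.length, 0)
  k = 0
  (1...sub_str.length).each do |i|
    k = fallback[k - 1] while k > 0 && sub_str[i] != sub_str[k]
    k += 1 if sub_str[i] == sub_str[k]
    fallback[i] = k
  end

  # Scan str once; k is the number of sub_str characters matched so far.
  k = 0
  str.each_char.with_index do |c, i|
    k = fallback[k - 1] while k > 0 && c != sub_str[k]
    k += 1 if c == sub_str[k]
    return i - k + 1 if k == sub_str.length
  end
  -1
end

puts kmp_index("Replace me with your code!", "ith")
puts kmp_index("iith", "ith")
```

In everyday Ruby code String#index remains the simpler choice; the sketch is only meant to show what a correct linear scan involves.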

Related

Idiomatic Rust rewrite of Rob Pike's regex

In the book The Practice of Programming, there's a short snippet of C code that implements a regex matcher. Brian W Kernighan has expanded on that chapter and published it online at https://www.cs.princeton.edu/courses/archive/spr09/cos333/beautiful.html
Rob's C code relies on the '\0' sentinels for length matching, but that won't work in Rust. So my port of the code ends up with lots of length checks.
I use slices to byte arrays to simplify the task a bit, instead of supporting utf-8 encoded strings.
The implementation consists of three functions re_match, match_here and match_star. I use ranges for iterating over the slices.
// Search for regex anywhere in the text.
fn re_match(regex: &[u8], text: &[u8]) -> bool {
if regex.len() > 0 && regex[0] == b'^' {
return match_here(&regex[1..], &text);
}
// We need to check even if text is empty
if match_here(regex, text) {
return true
}
for i in 0..text.len() {
if match_here(regex, &text[i..]) {
return true
}
}
false
}
// Search for regex at beginning of text
fn match_here(regex: &[u8], text: &[u8]) -> bool {
if regex.len() == 0 {
return true;
}
if regex.len() > 1 && regex[1] == b'*' {
return match_star(regex[0], &regex[2..], text);
}
if regex.len() == 1 && regex[0] == b'$' {
return text.len() == 0;
}
if text.len() > 0 && (regex[0] == b'.' || regex[0] == text[0]) {
return match_here(&regex[1..], &text[1..]);
}
false
}
// Search for c* regex at beginning of text.
fn match_star(c: u8, regex: &[u8], text: &[u8]) -> bool {
if match_here(regex, text) {
return true;
}
let mut i = 0;
while i < text.len() && (text[i] == c || c == b'.') {
if match_here(regex, &text[i + 1..]) { // try the rest of the regex after consuming text[i]
return true;
}
i += 1;
}
false
}
Question
How can I rewrite these kinds of functions to not need so many length checks?
META: Should I use iterators instead of slices as parameters? When should I choose one over the other?
For the match_here I would use match with the underappreciated slice-pattern syntax. That would avoid most len() uses. Using is_empty() is nicer than len() == 0:
fn match_here(regex: &[u8], text: &[u8]) -> bool {
    match regex {
        &[] => true,
        &[b'$'] => {
            text.is_empty()
        }
        &[z, b'*', ref tail @ ..] => {
            match_star(z, tail, text)
        }
        &[z, ref tail @ ..] if !text.is_empty() && (z == b'.' || z == text[0]) => {
            match_here(tail, &text[1..])
        }
        _ => false,
    }
}
Or if you feel fancy you can do a double match:
fn match_here(regex: &[u8], text: &[u8]) -> bool {
    match (regex, text) {
        (&[], _) => true,
        (&[b'$'], &[]) => true,
        (&[b'$'], _) => false,
        (&[z, b'*', ref tail @ ..], txt) => {
            match_star(z, tail, txt)
        }
        (&[_, ..], &[]) => false,
        (&[z, ref tail @ ..], &[tz, ref ttail @ ..]) if z == b'.' || z == tz => {
            match_here(tail, ttail)
        }
        _ => false,
    }
}
The cool thing about this latter option is that since you are never using the index operator [x] you are sure you will never go out of bounds, without ever checking the len() of your slices.
The len() of the other functions, I don't particularly see them as non-idiomatic. They may be rewritten in a more rusty way, but then the equivalence to the C code would not be so obvious. You would need to stop and think!
About using iterators or slices, I personally prefer slices for this kind of thing. The problem with iterators is the backbuffer: for example, to check for the x* you need to take two bytes from the iterator, but then, if the second one is not a * you have to put it back... Naturally you can wrap the iterator with Iterator::peekable and call peek, but that will only give you one element. If at any time you need to look ahead two bytes, you have a problem.
So unless you need to parse a very big input (> hundreds of MBs) I would stick to the plain slices.

Backspace String Compare Leetcode Question

I have a question about the following problem on Leetcode:
Given two strings S and T, return if they are equal when both are typed into empty text editors. # means a backspace character.
Example 1:
Input: S = "ab#c", T = "ad#c"
Output: true
Explanation: Both S and T become "ac".
Example 2:
Input: S = "ab##", T = "c#d#"
Output: true
Explanation: Both S and T become "".
Example 3:
Input: S = "a##c", T = "#a#c"
Output: true
Explanation: Both S and T become "c".
Example 4:
Input: S = "a#c", T = "b"
Output: false
Explanation: S becomes "c" while T becomes "b".
Note:
1 <= S.length <= 200
1 <= T.length <= 200
S and T only contain lowercase letters and '#' characters.
Follow up:
Can you solve it in O(N) time and O(1) space?
My answer:
def backspace_compare(s, t)
if (s.match?(/[^#[a-z]]/) || t.match?(/[^#[a-z]]/)) || (s.length > 200 || t.length > 200)
return "fail"
else
rubular = /^[\#]+|([^\#](\g<1>)*[\#]+)/
if s.match?(/#/) && t.match?(/#/)
s.gsub(rubular, '') == t.gsub(rubular, '')
else
new_s = s.match?(/#/) ? s.gsub(rubular, '') : s
new_t = t.match?(/#/) ? t.gsub(rubular, '') : t
new_s == new_t
end
end
end
It works in the terminal and passes the given examples, but when I submit it on leetcode it tells me Time Limit Exceeded. I tried shortening it to:
rubular = /^[\#]+|([^\#](\g<1>)*[\#]+)/
new_s = s.match?(/#/) ? s.gsub(rubular, '') : s
new_t = t.match?(/#/) ? t.gsub(rubular, '') : t
new_s == new_t
But also the same error.
So far, I believe my code fulfills the O(n) time, because there are only two ternary operators, which overall is O(n). I'm making 3 assignments and one comparison, so I believe that fulfills the O(1) space complexity.
I have no clue how to proceed beyond this, been working on it for a good 2 hours..
Please point out if there are any mistakes in my code, and how I am able to fix it.
Thank you! :)
Keep in mind that with N <= 200, your problem is more likely to be the constant factor than the algorithmic complexity. O(N) space is immaterial for this; with only 400 chars total, space is not an issue. You have six regex matches, two of which are redundant. More importantly, regex processing is slow for such a specific application.
For speed, drop the regex stuff and do this one of the straightforward, brute-force ways: run through each string in order, applying the backspaces as appropriate. For instance, change both the backspace and the preceding letter to spaces. At the end of your checking, remove all the spaces in making a new string. Do this with both S and T; compare those for equality.
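A minimal sketch of this kind of brute-force pass, using an array as a stack instead of the space-marking described above (the helper name apply_backspaces is hypothetical; backspace_compare matches the question's method name):

```ruby
# Replays the keystrokes in str and returns the resulting text.
def apply_backspaces(str)
  typed = []
  str.each_char do |c|
    if c == '#'
      typed.pop # Array#pop on an empty array is a harmless no-op
    else
      typed.push(c)
    end
  end
  typed.join
end

def backspace_compare(s, t)
  apply_backspaces(s) == apply_backspaces(t)
end

puts backspace_compare("ab#c", "ad#c")
puts backspace_compare("a#c", "b")
```

This is O(N) time and O(N) space, with no regex involved.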
It may be easiest to start at the end of the string and work towards the beginning:
def process(str)
  n = 0
  str.reverse.each_char.with_object('') do |c,s|
    if c == '#'
      n += 1
    else
      n.zero? ? (s << c) : n -= 1
    end
  end.reverse
end
%w|ab#c ad#c ab## c#d# a##c #a#c a#c b|.each_slice(2) do |s1, s2|
puts "\"%s\" -> \"%s\", \"%s\" -> \"%s\" %s" %
[s1, process(s1), s2, process(s2), (process(s1) == process(s2)).to_s]
end
"ab#c" -> "ac", "ad#c" -> "ac" true
"ab##" -> "", "c#d#" -> "" true
"a##c" -> "c", "#a#c" -> "c" true
"a#c" -> "c", "b" -> "b" false
Let's look at a longer string.
require 'time'
alpha = ('a'..'z').to_a
#=> ["a", "b", "c",..., "z"]
s = (10**6).times.with_object('') { |_,s|
s << (rand < 0.4 ? '#' : alpha.sample) }
#=> "h####fn#fjn#hw###axm...#zv#f#bhqsgoem#glljo"
s.size
#=> 1000000
s.count('#')
#=> 398351
and see how long it takes to process.
require 'time'
start_time = Time.now
(u = process(s)).size
#=> 203301
puts (Time.now - start_time).round(2)
#=> 0.28 (seconds)
u #=> "ffewuawhfa...qsgoeglljo"
As u will be missing the 398351 pound signs in s, plus an almost equal number of other characters removed by the pound signs, we would expect u.size to be about:
10**6 - 2 * s.count('#')
#=> 203298
In fact, u.size #=> 203301, meaning that, at the end, 203301 - 203298 #=> 3 pound signs were unable to remove a character from s.
In fact, process can be simplified. I leave that as an exercise for the reader.
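For the O(1)-space follow-up in the problem statement, one common approach is to walk both strings from the end with two indices, skipping characters that a later '#' deletes. The sketch below is my own illustration under that assumption; the names next_kept_index and backspace_compare_in_place are hypothetical.

```ruby
# Starting at index i and moving left, returns the index of the next
# character of str that survives the backspaces, or -1 if none is left.
def next_kept_index(str, i)
  skip = 0
  while i >= 0
    if str[i] == '#'
      skip += 1
    elsif skip > 0
      skip -= 1
    else
      break
    end
    i -= 1
  end
  i
end

# Compares the two typed results without building new strings.
def backspace_compare_in_place(s, t)
  i = s.length - 1
  j = t.length - 1
  loop do
    i = next_kept_index(s, i)
    j = next_kept_index(t, j)
    return true if i < 0 && j < 0    # both exhausted together
    return false if i < 0 || j < 0   # one has characters left over
    return false if s[i] != t[j]
    i -= 1
    j -= 1
  end
end

puts backspace_compare_in_place("ab##", "c#d#")
puts backspace_compare_in_place("a#c", "b")
```

Each index only ever moves left, so the total work is linear in the combined length of the two strings.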
import java.util.Stack;
class Solution {
public boolean backspaceCompare(String s, String t) {
try {
Stack<Character> st1 = new Stack<>();
Stack<Character> st2 = new Stack<>();
st1 = convertToStack(s);
st2 = convertToStack(t);
if (st1.size() != st2.size()) {
return false;
} else {
int length = st1.size();
for (int i = 0; i < length; i++) {
if (!st1.peek().equals(st2.peek())) // compare values, not Character references
return false;
else {
st1.pop();
st2.pop();
}
if (st1.isEmpty() && st2.isEmpty())
return true;
}
}
} catch (Exception e) {
System.out.print(e);
}
return true;
}
public Stack<Character> convertToStack(String s){
Stack<Character> st1 = new Stack<>();
for (int i = 0; i < s.length(); i++) {
if (s.charAt(i) != '#') {
st1.push(s.charAt(i));
} else if (st1.empty()) {
continue;
} else {
st1.pop();
}
}
return st1;
}
}

Match word by its prefix

I'm trying to match a string by its prefix that ends with a particular character. For example, if my string is "abcd" and ends in #, then any word which is a prefix of "abcd" should be matched as long as it ends with #. Here are some examples to help illustrate the pattern:
Input: "ab#" gives true (as "ab" is a prefix of "abcd" and end with a #).
Input: "abcd#" gives true (as "abcd" is a prefix of "abcd" and end with a #).
Input: "bc#" gives false (as "bc" is a not a prefix of "abcd" ).
Input: "ab" gives false (while "ab" is a prefix of "abcd", it doesn't end with #).
Input: "ac#" gives false (while "ac" is contained within "abcd", it doesn't begin with a prefix from "abcd") .
So far, I've managed to come up with the following expression which seems to be working fine:
/(abcd|abc|ab|a)#/
While this is working, it isn't very practical, as larger words of length n will make the expression quite large:
/(n|n-1|n-2| ... |1)#/
Is there a way to rewrite this expression so it is more scalable and concise?
Example of my attempt (in JS):
const regex = /(abcd|abc|ab|a)#/;
console.log(regex.test("abcd#")); // true
console.log(regex.test("ab#")); // true
console.log(regex.test("abc#")); // true
console.log(regex.test("abz#")); // false
console.log(regex.test("bc#")); // false
Edit: Some of the solutions provided are nice and do do what I'm after, however, for this particular question, I'm after a solution which uses pure regular expressions to match the prefix.
Just use String#startsWith and String#endsWith here:
String input = "abcd";
String prefix = "ab#";
if (input.startsWith(prefix.replaceAll("#$", "")) && prefix.endsWith("#")) {
System.out.println("MATCH");
}
else {
System.out.println("NO MATCH");
}
Edit: A JavaScript version of the above:
var input = "abcd";
var prefix = "ab#";
if (input.startsWith(prefix.replace(/#$/, "")) && prefix.endsWith("#")) {
console.log("MATCH");
}
else {
console.log("NO MATCH");
}
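Try ^ab?c?d?#$
Explanation:
`^` - match the beginning of the string
`b?` - match zero or one `b`
The rest is analogous to the above.
Note that each optional letter is independent, so this pattern also accepts inputs such as "ad#" or "ac#", which are not prefixes of "abcd".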
Here's a left-field JavaScript option. Build an array of valid prefixes, then use join on the array to make your regex pattern.
var validPrefixes = ["abcd",
"abc",
"ab",
"a",
"areallylongprefix"];
var regexp = new RegExp("^(" + validPrefixes.join("|") + ")#$");
console.log(regexp.test("abcd#"));// true
console.log(regexp.test("ab#")); // true
console.log(regexp.test("abc#")); // true
console.log(regexp.test("abz#")); // false
console.log(regexp.test("bc#")); // false
console.log(regexp.test("areallylongprefix#")); //true
This can be adapted to the language of your choosing, and it is also handy if your prefixes are dynamically retrieved from a database or similar.
Here's my c# attempt:
private static bool test(string v)
{
var pattern = "abcd#";
//No error handling
return v.EndsWith(pattern[pattern.Length-1])
&& pattern.Replace("#", "").StartsWith(v.Replace("#",""));
}
Console.WriteLine(test("abcd#")); // true
Console.WriteLine(test("ab#")); // true
Console.WriteLine(test("abc#")); // true
Console.WriteLine(test("abz#")); // false
Console.WriteLine(test("bc#")); // false
Console.WriteLine(test("abc")); //false
/a(b(cd?)?)?#/
Or for a longer example, to match a prefix of "abcdefg#":
/a(b(c(d(e(fg?)?)?)?)?)?#/
Generating this regex isn't completely trivial, but some options are:
function createPrefixRegex(s) {
// This method creates an unnecessary set of parentheses
// around the last letter, but that won't harm anything.
return new RegExp(s.slice(0,-1).split('').join('(') + ')?'.repeat(s.length - 2) + '#');
}
function createPrefixRegex2(s) {
var r = s[0];
for (var i = 1; i < s.length - 2; ++i) {
r += '(' + s[i];
}
r += s[s.length - 2] + '?' + ')?'.repeat(s.length - 3) + '#';
return new RegExp(r);
}
function createPrefixRegex3(s) {
var recurse = function(i) {
if (i >= s.length - 1) {
return '';
}
if (i === s.length - 2) {
return s[i] + '?';
}
return '(' + s[i] + recurse(i + 1) + ')?';
}
return new RegExp(s[0] + recurse(1) + '#');
}
These may fail if the input string has no prefix before the '#' character, and they assume the last character in the string is '#'.
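For comparison, here is a sketch of the same nested-optional construction in Ruby that sidesteps those two caveats by taking the word without the trailing '#' and rejecting an empty word. The name prefix_regex is hypothetical, and anchoring with \A and \z is my assumption (the patterns above are unanchored); Regexp.escape guards against letters that are regex metacharacters.

```ruby
# Builds a regex that matches any non-empty prefix of word followed by '#'.
# For "abcd" the generated pattern is \Aa(?:b(?:c(?:d)?)?)?#\z
def prefix_regex(word)
  raise ArgumentError, 'word must not be empty' if word.empty?

  first, *rest = word.chars.map { |ch| Regexp.escape(ch) }
  # Wrap from the innermost (last) letter outwards.
  nested = rest.reverse.inject('') { |inner, ch| "(?:#{ch}#{inner})?" }
  Regexp.new("\\A#{first}#{nested}#\\z")
end

re = prefix_regex("abcd")
puts re.match?("ab#")
puts re.match?("ac#")
```

The pattern grows linearly with the word length rather than quadratically, as the alternation (abcd|abc|ab|a) does.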

Regular expression: match an incomplete search string

Is there a regular expression for matching a string that is not necessarily complete?
Example:
some other supercalifragilisticexpialidocious random things
and maybe supercalifragilistic meaningless padding
lorem superca ipsum dolor
I would like to match whichever left part of supercalifragilisticexpialidocious there is each time. There are not necessarily spaces around words.
The expected result would be to find:
supercalifragilisticexpialidocious
supercalifragilistic
superca
This is similar to matching the same character any number of times, but more universal.
Thank you!
I know this isn't a regex, but I think your objective can be accomplished better using code. Here's an example of a JavaScript function that matches as much of an input string as it can:
function matchMost(find, string){
for(var i = 0 ; i < find.length ; i++){
for(var j = find.length ; j > i ; j--){
if(string.indexOf(find.substring(i, j)) !== -1){
return find.substring(i, j);
}
}
}
return false;
}
For example, if you call matchMost("supercalifragilisticexpialidocious", "lorem superca ipsum dolor"), it will return the string "superca". If string doesn't contain a single character from find, the function will return false.
Here's a JS Fiddle where you can test this code: http://jsfiddle.net/n252eyw1/
UPDATE
This function matches as much of the left side of an input string as it can:
function matchMostLeft(find, string){
for(var j = find.length ; j > 0 ; j--){
if(string.indexOf(find.substring(0, j)) !== -1){
return find.substring(0, j);
}
}
return false;
}
JS Fiddle: http://jsfiddle.net/sjy312ae/
There is, but it's not tidy at all (and probably not very performant either). This regex matches at least 3 characters on the left side and up to supercal as written; the way to extend it should be fairly plain.
(?:sup(?:e(?:r(?:c(?:a(?:l)?)?)?)?)?)?
Paul's answer is likely far more useful in the general case.
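To show how that nested pattern extends to the full word, here is a sketch in Ruby that generates it. The name partial_word_regex and the min_len parameter are my own; unlike the hand-written regex above, this version always requires the first min_len characters, and it assumes the word is at least min_len characters long.

```ruby
# Builds a regex matching the longest available left part of word,
# requiring at least the first min_len characters.
def partial_word_regex(word, min_len = 3)
  head = Regexp.escape(word[0, min_len])
  rest = word[min_len..-1].chars
  # Nest each remaining letter as an optional group, innermost last.
  nested = rest.reverse.inject('') do |inner, ch|
    "(?:#{Regexp.escape(ch)}#{inner})?"
  end
  Regexp.new(head + nested)
end

re = partial_word_regex("supercalifragilisticexpialidocious")
[
  "some other supercalifragilisticexpialidocious random things",
  "and maybe supercalifragilistic meaningless padding",
  "lorem superca ipsum dolor"
].each { |line| puts line[re] }
```

Because the optional groups are greedy, each match extends as far into the word as the text allows.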
Below is an alternative solution. It is similar to Paul's in that it uses indexOf rather than regular expressions. This also makes it equally case-sensitive. My approach should perform better in exceptional cases where Paul's solution would cause excessive calls to indexOf; typically when:
the needle is very long (worse than supercalifragilisticexpialidocious), and
you have a lot of separate texts to scan, and
the majority of texts either do not match, or contain only short matches.
If this is not the case with you, then please use Paul's solution, as it is clean, simple and readable.
function getLongestMatchingPrefix(needle, haystack) {
var len = 0;
var i = 0;
while (len < needle.length && (i = haystack.indexOf(needle.substring(0, len + 1), i)) >= 0) {
while (++len < needle.length && haystack.substring(i, i + len + 1) == needle.substring(0, len + 1)) {}
}
return needle.substring(0, len);
}
Fiddle: http://jsfiddle.net/ov26msj5/

VB.Net Remove Everything After Third Hyphen

Here is the string I am looking to modify:
170-0175-00B-BEARING PLATE MACHINING.asm:2
I want to keep "170-0175-00B". So I need to remove the third hyphen and whatever is after it.
A rapid solution
string test = "170-0175-00B-BEARING PLATE MACHINING.asm:2";
int num = 2;
int index = test.IndexOf('-');
while(index > 0 && num > 0)
{
index = test.IndexOf('-', index+1);
num--;
}
if(index > 0)
test = test.Substring(0, index);
Of course, if you are searching for the last hyphen, then it is simpler to do
int index = test.LastIndexOf('-');
if(index > 0)
test = test.Substring(0, index);
What about some LINQ?
Dim str As String = "170-0175-00B-BEARING PLATE MACHINING.asm:2"
MsgBox(String.Join("-"c, str.Split("-"c).Take(3)))
With this approach, you can take anything out after Nth hyphen, where N is easily controlled (a const).
Something like this?
Regex.Replace(sourcestring, "^((?:[^-]*-){2}[^-]*).*", "$1", RegexOptions.Singleline)
You may not want the Singleline option though, depending how you use it.
Thanks so much for such fast replies.
Here's the path that I took:
FormatDessinName("170-0175-00B-BEARING PLATE MACHINING.asm:2")
Private Function FormatDessinName(DessinName As String) As String
Dim match As Match = Regex.Match(DessinName, "[0-9]{3}-[0-9]{4}-[0-9]{2}[A-Za-z]([0-9]+)?") 'Matches 000-0000-00A(optional numbers after the last letter)
Dim formattedName As String = ""
If match.Success Then 'Returns true or false
formattedName = match.Value 'Returns the actual matched value
End If
Return formattedName
End Function
Works great!