Regular Expression to find string starts with letter and ends with slash /

I have a collection of 1000 records with a string column.
I'm using the Jongo API to query MongoDB.
I need to find the records where the string column starts with the letters "AB" and ends with a slash "/".
I need help writing a query that selects them.
Thanks.

I'm going to assume you know how to query using regular expressions in the Jongo API and are just looking for the necessary regex.
If so, this regex will find any string that begins with 'AB' (case sensitive), is followed by any number of other characters, and then ends with a forward slash ('/'):
^AB.*\/$
^ - matches the start of the string
AB - matches the string 'AB' exactly
.* - matches any character ('.') any number of times ('*')
\/ - matches the literal character '/' (backslash is the escape character)
$ - matches the end of the string
If you're just getting started with regex, I highly recommend the Regex 101 website. It's a fantastic sandbox for testing regexes, and it explains each step of your expression, which makes debugging much simpler.
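To try the pattern outside MongoDB, here is a minimal sketch using Python's `re` module rather than Jongo. The sample values are made up for illustration and do not come from the original collection.

```python
import re

# Same pattern as above. In a plain string pattern the '/' needs no escaping.
PATTERN = re.compile(r"^AB.*/$")

# Hypothetical sample values (assumed, not taken from the question).
samples = ["AB/", "ABC/def/", "AB/x", "ab/", "XAB/"]

# Keep only the values that start with 'AB' and end with '/'.
matching = [s for s in samples if PATTERN.search(s)]
print(matching)
```

The lowercase "ab/" is rejected because the pattern is case sensitive, and "XAB/" is rejected because of the `^` anchor.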

I found the solution; the following queries worked.
// Mongo shell
db.getCollection('employees').find({employeeName: {$regex: '^Raju.*\\/$'}})
db.getCollection('employees').find({employeeName: {$regex: '\/$'}})
// Jongo (Java): backslashes are doubled again inside the Java string literal
getCollection().find("{employeeName: {$regex: '^Raju.*\\\\/$'}}").as(Employee.class);
getCollection().find("{employeeName: {$regex: '\\/$'}}").as(Employee.class);
// Jongo with a value bound to the '#' placeholder
getCollection().find("{employeeName: #}", Pattern.compile("\\/$")).as(Employee.class);
getCollection().find("{employeeName: {$regex: #}}", "\\/$").as(Employee.class);
// Plain MongoDB Java driver through the underlying DBCollection
Map<String, Object> map = new HashMap<>();
map.put("employeeName", Pattern.compile("\\/$"));
DBCursor cursor = getCollection().getDBCollection().find(new BasicDBObject(map));

You could try this.
public List<Product> searchProducts(String keyword) {
    MongoCursor<Product> cursor = collection.find("{name:#}", Pattern.compile(keyword + ".*")).as(Product.class);
    List<Product> products = new ArrayList<Product>();
    while (cursor.hasNext()) {
        Product product = cursor.next();
        products.add(product);
    }
    return products;
}
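One thing to watch with a pattern built from user input like `keyword + ".*"`: any regex metacharacters in the keyword are interpreted, and without `^` the match is not tied to the start of the value. The sketch below shows the idea in Python's `re` module (not Jongo); the helper name `prefix_pattern` and the sample names are assumptions for illustration. In Java, `Pattern.quote` plays the role of `re.escape`.

```python
import re

def prefix_pattern(keyword):
    # re.escape neutralises metacharacters such as '.' or '+' in the keyword;
    # '^' anchors the search to the start of the value.
    return re.compile("^" + re.escape(keyword))

# Hypothetical product names (assumed for the example).
names = ["C++ Primer", "C Programming", "Intro to C++"]

hits = [n for n in names if prefix_pattern("C++").search(n)]
print(hits)
```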

Related

Getting words Starting with symbol in dart

I'm trying to parse long strings containing hashtags in Dart. So far I have tried various regexp combinations, but I cannot find the right one.
My code is
String mytestString = "#one #two, #three#FOur,#five";
RegExp regExp = new RegExp(r"/(^|\s)#\w+/g");
print(regExp.allMatches(mytestString).toString());
The desired output would be a list of hashtags:
#one #two #three #FOur #five
Thank you in advance.

You should not use a regex literal inside a string literal, or backslashes and flags will become part of the regex pattern. Also, omit the left-hand boundary pattern (that matches start of string or whitespace) if you need to match # followed with 1+ word chars in any context.
Use
String mytestString = "#one #two, #three#FOur,#five";
final regExp = RegExp(r"#\w+");
// m[0] is nullable under null safety, hence the '!'
Iterable<String> matches = regExp.allMatches(mytestString).map((m) => m[0]!);
print(matches);
Output: (#one, #two, #three, #FOur, #five)
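For comparison, the same pattern behaves the same way in other regex engines. A minimal sketch in Python (not Dart), using the test string from the question:

```python
import re

text = "#one #two, #three#FOur,#five"

# '#' followed by one or more word characters, matched anywhere in the string.
hashtags = re.findall(r"#\w+", text)
print(hashtags)  # ['#one', '#two', '#three', '#FOur', '#five']
```

Because `\w` does not match `#`, the run `#three#FOur` is split into two separate hashtags.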

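String mytestString = "#one #two, #three#FOur,#five";
RegExp regExp = RegExp(r"(#\w+)");
print(regExp.allMatches(mytestString).map((m) => m.group(1)).toList());
This matches all of the hashtags and places each one in capture group 1 for later use. The pattern is written without the surrounding slashes and the g flag, because inside a Dart string those would become part of the pattern itself.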

Scala regex: Find a matching URL within a sentence

import java.util.regex._
object RegMatcher extends App {
val str="facebook.com"
val urlpattern="(http://|https://|file://|ftp://)?(www.)?([a-zA-Z0-9]+).[a-zA-Z0-9]*.[a-z]{3}.?([a-z]+)?"
var regex_list: Set[(String, String)] = Set()
val url=Pattern.compile(urlpattern)
var m=url.matcher(str)
if (m.find()) {
regex_list += (("date", m.group(0)))
println("match: " + m.group(0))
}
val str2="url is ftp://filezilla.com"
m=url.matcher(str2)
if (m.find()) {
regex_list += (("date", m.group(0)))
println("str 2 match: " + m.group(0))
}
}
This returns
match: facebook.com
str 2 match: url is ftp:
How do I change the regex pattern so that both strings are matched correctly?
What do the symbols actually mean in a regex? I am very new to regex. Please help.

I read your regex as:
0 or 1 (? modifier) of the schemes (http://, https://, etc.)
followed by 0 or 1 instance of www.,
followed by 1 or more (+ modifier) alphanumeric characters,
followed by any character (. is a regex special character, remember, standing for any one character),
followed by 0 or more (* modifier) alphanumerics,
followed by any character (. again),
followed by 3 lowercase letters ({3} being an exact count modifier),
followed by 0 or 1 of any character (.?),
optionally followed by one or more lowercase letters (([a-z]+)?).
If you plug your regex into regex101.com, you'll not only see a similar breakdown (without any errors I might have made, though I think I nailed it), but you'll also have a chance to test various strings against it. Then, once you have your regexes working the way you want, you can bring them back to your script. It's a solid workflow for both learning regexes and developing an expression for a particular purpose.
If you drop your regex and your inputs into regex 101, you'll see why you're getting the output you see. But here's a hint: when you ask your regular expression to match "url is ftp://filezilla.com", nothing excludes "url is" from being part of the match. That's why you're not matching the scheme you want. Regex101 really is a great way to investigate this further.
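The effect of the unescaped dots can be reproduced in a few lines. This sketch uses Python's `re` module instead of `java.util.regex`; for this pattern the two engines behave the same way. The `escaped` pattern is only an illustrative rewrite with literal dots (an assumption, not the asker's final regex), and `first_match` is a hypothetical helper.

```python
import re

# The asker's pattern, unchanged: every '.' matches any character.
original = r"(http://|https://|file://|ftp://)?(www.)?([a-zA-Z0-9]+).[a-zA-Z0-9]*.[a-z]{3}.?([a-z]+)?"

# Illustrative rewrite (assumed): dots escaped, host labels separated by literal '.'.
escaped = r"(?:(?:https?|ftp|file)://)?(?:www\.)?[a-zA-Z0-9-]+(?:\.[a-zA-Z0-9-]+)*\.[a-z]{2,}"

def first_match(pattern, text):
    # Return the first match anywhere in the text, or None.
    m = re.search(pattern, text)
    return m.group(0) if m else None

print(first_match(original, "url is ftp://filezilla.com"))  # url is ftp:
print(first_match(escaped, "url is ftp://filezilla.com"))   # ftp://filezilla.com
print(first_match(escaped, "facebook.com"))                 # facebook.com
```

With the original pattern, the "any character" dots happily consume the spaces in "url is ", so the match starts at the beginning of the sentence and stops at the colon.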

The regex can be updated to
((ftp|https?):\/\/(?:www\.|(?!www))[a-zA-Z0-9][a-zA-Z0-9-]+[a-zA-Z0-9]\.[^\s]{2,}|www\.[a-zA-Z0-9][a-zA-Z0-9-]+[a-zA-Z0-9]\.[^\s]{2,})
This is all I needed.

Surrounding one group with special characters in using substitute in vim

Given string:
some_function(inputId = "select_something"),
(...)
some_other_function(inputId = "some_other_label")
I would like to arrive at:
some_function(inputId = ns("select_something")),
(...)
some_other_function(inputId = ns("some_other_label"))
The key change here is the element ns( ... ) that surrounds the string available in the "" after the inputId
Regex
So far, I have come up with this regex:
:%substitute/\(inputId\s=\s\)\(\"[a-zA-Z]"\)/\1ns(/2/cgI
However, when deployed, it produces an error:
E488: Trailing characters
A simpler version of that regex works, the syntax:
:%substitute/\(inputId\s=\s\)/\1ns(/cgI
would correctly insert ns( after finding inputId = and create the string
some_other_function(inputId = ns("some_other_label")
Challenge
I'm struggling to match the remaining part of the string, ex. "select_something") and return it as:
"select_something")).

You have many problems with your regex.
[a-zA-Z] will only match one letter. Presumably you want to match everything up to the next ", so you'll need a \+ and you'll also need to match underscores too. I would recommend \w\+. Unless more than [a-zA-Z_] might be in the string, in which case I would do .\{-}.
You have a /2 instead of \2. This is why you're getting E488.
I would do this:
:%s/\(inputId = \)\(".\{-}"\)/\1ns(\2)/cgI
Or use the match-start atom, \zs, so that only the quoted string is replaced:
:%s/inputId = \zs\".\{-}"/ns(&)/cgI

You can use a negated character class "[^"]*" to match a quoted string:
%s/\(inputId\s*=\s*\)\("[^"]*"\)/\1ns(\2)/g
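The same two-group substitution can be checked outside Vim. A minimal sketch with Python's `re.sub` (Python syntax, so the groups are written without Vim's backslashes), using the two lines from the question as input:

```python
import re

source = (
    'some_function(inputId = "select_something"),\n'
    'some_other_function(inputId = "some_other_label")'
)

# Group 1: 'inputId = ' with flexible whitespace.
# Group 2: the quoted string, matched with a negated character class.
wrapped = re.sub(r'(inputId\s*=\s*)("[^"]*")', r'\1ns(\2)', source)
print(wrapped)
```

Keeping both quotes inside group 2 is what lets `\2` put the complete string back inside `ns(...)`.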

How to create "blocks" with Regex

For a project of mine, I want to create 'blocks' with Regex.
\xyz\yzx //wrong format
x\12 //wrong format
12\x //wrong format
\x12\x13\x14\x00\xff\xff //correct format
When using Regex101 to test my regular expressions, I came to this result:
([\\x(0-9A-Fa-f)])/gm
This leads to an incorrect output, because
12\x
still gets detected as a correct string even though the order is wrong. Each block needs to be in the order specified below, and in no other order:
backslash x 0-9A-Fa-f 0-9A-Fa-f
Can anyone explain how that works and why it works in that way? Thanks in advance!

To match a \ followed by x, followed by 2 hex chars, anywhere in the string, you need to use
\\x[0-9A-Fa-f]{2}
To force it match all non-overlapping occurrences, use the specific modifiers (like /g in JavaScript/Perl) or specific functions in your programming language (Regex.Matches in .NET, or preg_match_all in PHP, etc.).
The ^(?:\\x[0-9A-Fa-f]{2})+$ regex validates a whole string that consists of the patterns like above. It happens due to the ^ (start of string) and $ (end of string) anchors. Note the (?:...)+ is a non-capturing group that can repeat in the string 1 or more times (due to + quantifier).
Some Java demo:
String s = "\\x12\\x13\\x14\\x00\\xff\\xff";
// Extract valid blocks
Pattern pattern = Pattern.compile("\\\\x[0-9A-Fa-f]{2}");
Matcher matcher = pattern.matcher(s);
List<String> res = new ArrayList<>();
while (matcher.find()){
res.add(matcher.group(0));
}
System.out.println(res); // => [\x12, \x13, \x14, \x00, \xff, \xff]
// Check if a string consists of valid "blocks" only
boolean isValid = s.matches("(?i)(?:\\\\x[a-f0-9]{2})+");
System.out.println(isValid); // => true
Note that we may shorten [0-9A-Fa-f] to [a-f0-9] if we add the case-insensitive modifier (?i) to the start of the pattern, or just use \p{XDigit}, which matches any hexadecimal digit in a Java regex.
The String#matches method requires the whole string to match the pattern, so we do not need the leading ^ and trailing $ anchors when using the pattern inside it.
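The extract-versus-validate distinction can be sketched the same way in Python, where `re.findall` plays the role of the `find()` loop and `re.fullmatch` the role of `String#matches`. The helper names `extract_blocks` and `is_valid` are assumptions for illustration; the test strings are the ones from the question.

```python
import re

# One block: a backslash, an 'x', then exactly two hex digits.
BLOCK = r"\\x[0-9A-Fa-f]{2}"

def extract_blocks(s):
    # All non-overlapping blocks found anywhere in the string.
    return re.findall(BLOCK, s)

def is_valid(s):
    # True only if the whole string is one or more blocks and nothing else.
    return re.fullmatch(f"(?:{BLOCK})+", s) is not None

print(extract_blocks(r"\x12\x13\x14\x00\xff\xff"))
print(is_valid(r"\x12\x13\x14\x00\xff\xff"))  # True
print(is_valid(r"12\x"))                      # False
```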

Capturing a repeated group

I am attempting to parse a string like the following using a .NET regular expression:
H3Y5NC8E-TGA5B6SB-2NVAQ4E0
and return the following using Split:
H3Y5NC8E
TGA5B6SB
2NVAQ4E0
I validate each character against a specific character set (note that the letters 'I', 'O', 'U' & 'W' are absent), so using string.Split is not an option. The number of characters in each group can vary and the number of groups can also vary. I am using the following expression:
([ABCDEFGHJKLMNPQRSTVXYZ0123456789]{8}-?){3}
This will match exactly 3 groups of 8 characters each. Any more or less will fail the match.
This works insofar as it correctly matches the input. However, when I use the Split method to extract each character group, I just get the final group. RegexBuddy complains that I have repeated the capturing group itself and that I should put a capture group around the repeated group. However, none of my attempts to do this achieve the desired result. I have been trying expressions like this:
(([ABCDEFGHJKLMNPQRSTVXYZ0123456789]{8})-?){4}
But this does not work.
Since I generate the regex in code, I could just expand it out by the number of groups, but I was hoping for a more elegant solution.
Please note that the character set does not include the entire alphabet. It is part of a product activation system. As such, any characters that can be accidentally interpreted as numbers or other characters are removed. e.g. The letters 'I', 'O', 'U' & 'W' are not in the character set.
The hyphens are optional, since a user does not need to type them in, but they can be present if the user has done a copy & paste.

BTW, you can replace the [ABCDEFGHJKLMNPQRSTVXYZ0123456789] character class with a more readable character class subtraction (.NET syntax):
[A-Z\d-[IOUW]]
If you just want to match 3 groups like that, why not use this pattern 3 times in your regex and use the captured subgroups 1, 2 and 3 to form the new string? Note that the {8} quantifier goes inside each capturing group so that the group captures all 8 characters:
([A-Z\d-[IOUW]]{8})-?([A-Z\d-[IOUW]]{8})-?([A-Z\d-[IOUW]]{8})
In PHP I would return (I don't know .NET)
return "$1 $2 $3";

I have discovered the answer I was after. Here is my working code:
static void Main(string[] args)
{
    string pattern = @"^\s*((?<group>[ABCDEFGHJKLMNPQRSTVXYZ0123456789]{8})-?){3}\s*$";
    string input = "H3Y5NC8E-TGA5B6SB-2NVAQ4E0";
    Regex re = new Regex(pattern);
    Match m = re.Match(input);
    if (m.Success)
        foreach (Capture c in m.Groups["group"].Captures)
            Console.WriteLine(c.Value);
}
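Most regex engines do not keep a per-iteration capture list the way .NET does; a repeated group only remembers its last iteration. As a hedged comparison, here is how the same task might look in Python's `re` module, where the usual workaround is to validate the whole key first and then extract the groups in a second pass. The function name `parse_key` and its `groups` parameter are assumptions for illustration.

```python
import re

ALPHABET = "ABCDEFGHJKLMNPQRSTVXYZ0123456789"  # no I, O, U or W
GROUP = f"[{ALPHABET}]{{8}}"                    # one group of 8 allowed characters

def parse_key(text, groups=3):
    # Step 1: the whole input must be exactly `groups` groups, hyphens optional.
    if re.fullmatch(rf"\s*(?:{GROUP}-?){{{groups}}}\s*", text) is None:
        return None
    # Step 2: pull out each group separately.
    return re.findall(GROUP, text)

key = "H3Y5NC8E-TGA5B6SB-2NVAQ4E0"
print(parse_key(key))

# A repeated capturing group keeps only its final iteration in Python:
last_only = re.fullmatch(rf"(?:({GROUP})-?){{3}}", key).group(1)
print(last_only)
```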

After reviewing your question and the answers given, I came up with this:
RegexOptions options = RegexOptions.None;
Regex regex = new Regex(@"([ABCDEFGHJKLMNPQRSTVXYZ0123456789]{8})", options);
string input = @"H3Y5NC8E-TGA5B6SB-2NVAQ4E0";
MatchCollection matches = regex.Matches(input);
for (int i = 0; i != matches.Count; ++i)
{
    string match = matches[i].Value;
}
Since the "-" is optional, you don't need to include it in the pattern. I am not sure what you were using the {4} at the end for. This finds each group as a separate match; you can then use the MatchCollection to access each match and rebuild the string.

Why use Regex? If the groups are always split by a -, can't you use Split()?

Sorry if this isn't what you intended, but if your string always has a hyphen separating the groups, couldn't you use the String.Split() method instead of a regex?
Dim stringArray As Array = someString.Split("-")

What are the defining characteristics of a valid block? We'd need to know that in order to really be helpful.
My generic suggestion: validate the character set in a first step, then split and parse in a separate method based on what you expect. If this is in a web site/app, you can use the ASP Regex validation on the front end and then break the string up on the back end.

If you're just checking the value of the group, with group(i).value, then you will only get the last one. However, if you want to enumerate over all the times that group was captured, use group(2).captures(i).value, as shown below.
system.text.RegularExpressions.Regex.Match("H3Y5NC8E-TGA5B6SB-2NVAQ4E0","(([ABCDEFGHJKLMNPQRSTVXYZ0123456789]+)-?)*").Groups(2).Captures(i).Value

Mike,
You can use character set of your choice inside character group. All you need is to add "+" modifier to capture all groups. See my previous answer, just change [A-Z0-9] to whatever you need (i.e. [ABCDEFGHJKLMNPQRSTVXYZ0123456789])

You can use this pattern:
Regex.Split("H3Y5NC8E-TGA5B6SB-2NVAQ4E0", "([ABCDEFGHJKLMNPQRSTVXYZ0123456789]{8})-?")
But you will need to filter out the empty strings from the resulting array.
Quoting MSDN:
If multiple matches are adjacent to one another, an empty string is inserted into the array.
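Python's `re.split` behaves analogously when the pattern contains a capturing group, so it can serve as a quick illustration of why the filtering step is needed. This is a sketch of the Python behaviour, not of .NET's `Regex.Split`.

```python
import re

pattern = r"([ABCDEFGHJKLMNPQRSTVXYZ0123456789]{8})-?"
key = "H3Y5NC8E-TGA5B6SB-2NVAQ4E0"

# The captured groups are included in the result, with empty strings
# for the (empty) text between adjacent matches.
parts = re.split(pattern, key)
print(parts)

# Drop the empty strings to keep only the groups.
groups = [p for p in parts if p]
print(groups)
```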
